42 Students Take On Unity Obstacle Tower Challenge as Part of a Reinforcement Learning Project

This shows 16 environments running in parallel, but it can be scaled to many more. Our current standard is 32 environments on our GCP compute engines.

42 Students Place in the Top 20 of Unity Obstacle Tower Challenge

42 students Eamon Ito-Fisher and Louis Young took on the Unity Obstacle Tower Challenge as part of a reinforcement learning (RL) project. Eamon and Louis had no prior knowledge of Python or reinforcement learning. They had also never worked together before they met in the machine learning piscine. Dan Goncharov oversees 42’s Robotics Lab, where the machine learning piscine takes place. He also provides valuable mentoring to 42 students who want to challenge themselves during their time here. Through their hard work and determination, Eamon and Louis placed in the top 20 of the Unity Obstacle Tower Challenge.

Before joining the machine learning piscine, Louis had been at 42 for over a year. He had worked on a Computer Graphics project known as C.L.I.V.E. After that, he interned with a FileMaker consultancy. Eamon has been at 42 for 6 months as part of a gap year between high school and traditional college. After 42, he will be attending Franklin W. Olin College of Engineering outside of Boston.

More About Reinforcement Learning

Reinforcement learning is an area of machine learning where the computer learns by trial and error. This is very similar to how humans learn. A recent article in Forbes explains further, “Reinforcement learning is an autonomous, self-teaching system that essentially learns by trial and error. It performs actions with the aim of maximizing rewards, or in other words, it is learning by doing in order to achieve the best outcomes. This is similar to how we learn things like riding a bike where in the beginning we fall off a lot and make too heavy and often erratic moves, but over time we use the feedback of what worked and what didn’t to fine-tune our actions and learn how to ride a bike. The same is true when computers use reinforcement learning, they try different actions, learn from the feedback whether that action delivered a better result, and then reinforce the actions that worked, i.e. reworking and modifying its algorithms autonomously over many iterations until it makes decisions that deliver the best result.”
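To make the trial-and-error idea concrete, here is a minimal sketch in Python. It is not the team's code: it assumes a toy "multi-armed bandit" with three actions whose win probabilities are made up for illustration, and the function name `run_bandit` is hypothetical. The program tries actions, keeps a running estimate of how well each one paid off, and gradually favors the one that worked best.

```python
import random

def run_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a toy multi-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    values = [0.0] * len(win_probs)  # running estimate of each action's average reward
    counts = [0] * len(win_probs)    # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(win_probs))
        else:
            # Exploit: pick the action that has worked best so far.
            action = max(range(len(win_probs)), key=lambda a: values[a])
        # Feedback from the "environment": 1 for a win, 0 otherwise.
        reward = 1.0 if rng.random() < win_probs[action] else 0.0
        counts[action] += 1
        # Reinforce: move the estimate toward the observed reward (incremental mean).
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Made-up win probabilities; the third action is the best one.
values, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("action tried most often:", counts.index(max(counts)))
```

The small `epsilon` keeps the program experimenting now and then, which is what lets it discover that an action it initially ignored is actually the better one.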

The Unity Obstacle Tower Challenge has been a great way for Eamon and Louis to learn more about reinforcement learning. The Unity blog noted how the challenge can contribute to this field, “We believe that the Obstacle Tower has the potential to help contribute to research into AI, specifically a sub-field called Deep Reinforcement Learning (Deep RL), which focuses on agents which learn from trial-and-error experience.”

The entire Tower Challenge is a 3D labyrinth. Eamon and Louis recently wrote about their experience with the project on Medium.

Using a Game-Focused AI Competition to Help Create Better AI Algorithms  

The Unity Obstacle Tower Challenge may sound like just another fun game, but there is more to it. According to the Google Cloud blog, “As games have become a prominent arena for AI, Google Cloud and Unity Technologies decided to collaborate on a game-focused AI competition: the Obstacle Tower Challenge. Competitors create advanced AI agents in a game environment. The agents they create are AI programs that take as inputs the image data of the simulation, including obstacles, walls, and the main character’s avatar. They then provide the next action that the character takes in order to solve a puzzle or advance to the next level. The Unity engine runs the logic and graphics for the environment, which operates very much like a video game.”

The Unity blog further explains the development of the Obstacle Tower Challenge, “The idea for the Obstacle Tower came from looking at the current field of benchmarks being used in Artificial Intelligence research today. Despite the great theoretical and engineering work being put into developing new algorithms, many researchers were still focused on using decades-old home console games such as Pong, Breakout, or Ms. PacMan. Aside from containing crude graphics and gameplay mechanics, these games are also completely deterministic, meaning that a player (or computer) could memorize a series of button presses, and even be able to solve them blindfolded. Given these drawbacks, we wanted to start from scratch and build a procedurally generated environment that we believe can be a benchmark that pushes modern AI algorithms to their limits. Specifically, we wanted to focus on AI agents vision, control, planning, and generalization abilities.”

We sat down with Eamon and Louis to learn more about their first time jumping into reinforcement learning research:

What Was Your Experience Like in the Machine Learning Piscine?

Eamon: The machine learning piscine was a lot of work. I learned a lot, but I wouldn’t recommend going into it unless you are really passionate about machine learning and are willing to commit time. We put a lot of work into it, about 80-100 hours per week. That being said, we weren’t part of the piscine for long. I did the first 2 weeks, then we got pulled into the reinforcement learning project. We started with the Google Hash Code problem. Then we were the only ones in the piscine who moved on to the Obstacle Tower Challenge.

Louis: It was intense; it is a commitment for sure. Dan likes to put pressure on students; it is his teaching strategy. You think you know something, but once you are in the situation you realize you didn’t. He forces you to go back to the drawing board and process everything perfectly. Explaining things perfectly is hard. You may understand it, but to lay out the big picture and fill in details is a skill in itself. Dan does a circle time thing at the end of the day. You have to contribute to the conversation, talk about what you learned and what you are hoping to learn. It can be uncomfortable but is good for students.

What Is the Inspiration Behind Your Project?

Louis: There aren’t a lot of constraints with the Unity Obstacle Tower Challenge. You are given the environment. Inspiration for how we approached it came from papers and research in general. It is more about problem-solving.

Eamon: I did a lot of reading and research. It is an academic field that has a lot of recent hype. The final algorithm we used to get through the first round was used by OpenAI to beat Dota 2. But past that, we look for cool ideas and research papers and see if they stick. There is a lot of interesting, groundbreaking research coming out right now, so we’re in the right spot.

In this particular shot we can see that our agent is focused on the locked door on the left and the key required to open it on the right!

How Does Your Project Work?

Louis: The Obstacle Tower is a video game that is playable by a human player; you can get on a keyboard and press the inputs like you normally would. But they also shipped the game with a small API that allows it to connect to an agent, in this case some sort of predictive model or algorithm that will play the game. So the game itself we call the environment, and the model we are training is called the agent. The benefit of AI as a whole is to automate systems on a large scale and to replace basic engineering. With reinforcement learning, you provide the architecture and the environment to learn in, and any problem is solvable.

Eamon: We use reinforcement learning to beat the game. In normal machine learning, you need massive datasets labeled by people. But, in reinforcement learning all we have to do is drop the algorithm in a game, give it an objective and leave it to learn. It’s a lot like how people learn, through trial and error. It’s a process of learning by doing.
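The environment/agent split that Louis and Eamon describe can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Obstacle Tower API or the team's code: `CorridorEnv` is a hypothetical five-cell corridor standing in for the game, and `QLearningAgent` is a simple lookup-table learner standing in for their neural network. The `reset`/`step` shape is only meant to show how an agent is dropped into an environment, given an objective (a reward), and left to learn.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""
    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation is just the current cell

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # the only objective: reach the last cell
        return self.position, reward, done


class QLearningAgent:
    """Tabular Q-learning agent (a stand-in for a neural network policy)."""
    def __init__(self, n_states, n_actions=2, lr=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))  # explore
        best = max(self.q[state])
        # Break ties randomly so an untrained agent does not favor one action.
        return self.rng.choice([a for a, v in enumerate(self.q[state]) if v == best])

    def learn(self, state, action, reward, next_state, done):
        # Move the estimate toward reward plus discounted value of the next state.
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.lr * (target - self.q[state][action])


def train(episodes=300, max_steps=200):
    env = CorridorEnv()
    agent = QLearningAgent(n_states=env.length)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            agent.learn(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return agent

agent = train()
# Preferred action in each non-goal cell after training (1 means "move right").
greedy_policy = [row.index(max(row)) for row in agent.q[:-1]]
print(greedy_policy)
```

Nobody tells the agent that "right" is correct; it has no labeled data at all. It only sees a reward when it happens to reach the goal, and that signal is enough for it to work out a policy.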

What Part of the Project Did You Work On?

Louis: For most of the first round, Eamon and I worked closely together on research and implementation. We had a lot of crazy ideas in the first round and we spent a lot of time experimenting. It was in the last week or so before the first round deadline that we started diverging a bit. I was focused on optimizing our working A3C model to get us past the level 5 threshold, while Eamon was refactoring the whole code base and switching to a better-parallelized architecture in A2C and PPO. Once we secured our spot in the final round, we let our newest models train and I switched gears to evaluation and some desperately needed visualizations.

Eamon: Yeah, Louis and I spent most of the project working on the same things, just splitting what needed to get done with each other. But by the end of the first month, we had kinda hit a dead end and our code was all over the place. When we decided we had to refactor, I got to sit down and rewrite everything while Louis kept making progress with our old codebase. Afterwards, I also did research on a couple of improvements we could make and implemented a couple more to build the codebase we’re still using today.

What Was the Most Difficult Aspect?

Eamon: Understanding what our network was and wasn’t doing was the hardest part. The agent needs to know what constitutes good or bad in order to get better, and the critic model is what tells it: you have a second neural network to actually improve the neural network that is learning your game. We figured that the critic network would figure itself out and be fine, but apparently it was horrible and was giving us random data, so the agent was getting crazy mixed signals. It required a small thing to fix that entirely and catapult that algorithm into what ended up passing the first level. There are two thresholds: you have to be in the top 50 and beat the first 5 levels. We ended up in 19th place out of 200.

Louis: One of the hardest things about RL, in general, is that it’s really difficult to tell whether you’ve done something right or wrong. In RL, things have a tendency to ‘fail silently’. We might implement something with a flawed understanding, or with a hyper-parameter that isn’t ideal, and we wouldn’t know until after potentially hundreds of hours of training. And when we finally get results we don’t expect, we might suspect a bug in the code, but we’d often have no idea what the source might be. Without the proper metrics, visuals, and intuition, it’s easy to waste a lot of time working on the wrong things or just trying to find a bug.
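One way to catch a silently failing critic like the one Eamon describes is to log a simple metric that compares the critic's predictions with what actually happened. The Python sketch below is an illustration, not the team's code or their actual fix: the function names and the sample numbers are made up, and "explained variance" is simply one commonly used diagnostic for this purpose. A score near 1 means the critic predicts the real returns well; a score at or below 0 means its predictions are no better than a constant guess.

```python
from statistics import pvariance

def discounted_returns(rewards, gamma=0.99):
    """Return-to-go for each step of one episode: r_t + gamma * r_{t+1} + ..."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    return returns[::-1]

def advantages(returns, values):
    """How much better each outcome was than the critic predicted.
    In actor-critic methods this is the signal used to update the actor."""
    return [g - v for g, v in zip(returns, values)]

def explained_variance(returns, values):
    """1.0: critic predicts returns perfectly. <= 0: no better than a constant guess."""
    var_returns = pvariance(returns)
    if var_returns == 0:
        return float("nan")  # undefined when every return is identical
    residuals = [g - v for g, v in zip(returns, values)]
    return 1.0 - pvariance(residuals) / var_returns

# Made-up episode: the only reward arrives on the last step.
returns = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)

good_critic = list(returns)       # predictions that match what actually happened
bad_critic = [1.0, 0.25, 0.5]     # predictions unrelated to the real returns

print("good critic:", explained_variance(returns, good_critic))
print("bad critic: ", explained_variance(returns, bad_critic))
print("advantages under the bad critic:", advantages(returns, bad_critic))
```

With the bad critic, the advantages point in misleading directions, which is the kind of mixed signal that keeps the actor from improving even though nothing in the code crashes.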

What Did You Enjoy Most About Your Project?

Eamon: Hanging out in the Robo Lab was fun, being with people who are into machine learning, the future of machine learning, and who are excited about what is going to happen. We were working for a month before we got results, and it was an amazing moment to see that we made an algorithm that can learn. We didn’t think it would learn a strategy we didn’t expect. That feeling also is pretty amazing.

Louis: One of the parts I enjoyed the most is the kind of algorithms we are learning about. With reinforcement learning, in particular, the algorithms are very human-inspired. They have characteristics like curiosity, and researchers attach these visceral names to the algorithms that give you an idea of where you are going. You want to encourage your model to do new things in the environment, and as it goes further along it must explore new opportunities instead of focusing on one particular thing. That is inspired by the idea of humans wanting to explore, like when you get bored. There are a lot of things in reinforcement learning that tie closely to the way we think. At the end of the day, AI is trying to accomplish that on a holistic level.

One of the video renders the team used to gain intuition about the agent’s decision making.

What Did You Learn From This Project?

Eamon: There are technical things and I learned a lot of soft skills too. Regarding soft skills, for me 42’s learning how to learn was pretty huge. A lot of the technical things came from figuring out all the algorithm stuff. It was hard because there were only the research papers and a few blog posts to learn from.

I learned a lot about learning from myself and being forced to explain things in a step-by-step manner. That is something we had to do a lot, and I had never done that before, but it is totally the best way to learn something and remember it. When you can break down every step, you understand it. I have never been challenged to do that, and being active and in that position has put it into context. Also, it was helpful to go to meetups and talk to people.

Louis: We had to question our own intuition all of the time. We think it is important to pick each other’s brains before we jump into it. Learn the reinforcement learning algorithms and machine learning practices. You also have to learn a lot about how to explain what you are doing to others. It is not just getting it to do the right thing, but in industry, you have to explain why your algorithm did something or was built that way.

What Future Do You See for Reinforcement Learning?

Louis: I see this field as the future of technology as a whole. I am confident in it. AI is seeping into every industry and impacting all parts of the world. In 50 years it will be all-encompassing. If we aren’t working on that we aren’t working on what is relevant.

Eamon: I am a little less optimistic. In the short term, everything is valuable; in the long term, everything will be augmented by AI.

L to R: Eamon Ito-Fisher and Louis Young in 42’s Robotics Lab

Meet the Team

Name: Eamon Ito-Fisher

Hometown: Born in Tokyo, but grew up in LA

Interests: Drawing, machine learning, reinforcement learning

Dream Job/Career Interests: Reinforcement learning research, working at Google would be the dream

Name: Louis Young

Hometown: I was born in Tokyo, but I moved around a lot

Interests: Computer programming, specifically computer graphics and machine learning. Playing and writing music, and video games

Dream Job/Career Interests: Not super clear, but would be some CG machine learning with a large company

published by admin – June 11, 2019