
Recently, machine learning research has increasingly focused on general learning algorithms, where the same algorithm can perform a huge variety of tasks. The ultimate goal of many reinforcement learning researchers is to create machines that can learn to solve any general task, just like humans! Yesterday's article introduced you to the concept of reinforcement learning, and today we're going to take a brief look at some of the coolest projects and greatest breakthroughs in the field of self-learning machines.

While there are many companies specializing in reinforcement learning these days, we'll take a deeper look at one that has focused on teaching machines to play increasingly complex games, namely DeepMind. By having the same algorithm learn to play multiple games from scratch, without being explicitly programmed to play any of them, they get one step closer to general artificial intelligence (or the singularity, which may well be the end of all humanity).

This is what DeepMind demonstrated when they had a go at Atari games in 2013. As mentioned in yesterday's article, they presented a deep learning model that learnt to play seven different Atari games from scratch. The groundbreaking thing about this achievement was that the machines didn't need any information other than the raw game frames and the score to learn. They were essentially able to learn to play the games just by looking at the screen (like humans!) while smashing random buttons (like humans playing Tekken!) – until their random actions eventually started paying off. The machines would then recognize which actions paid off and which didn't, and after trying again and again, occasionally succeeding while failing a million times along the way, they would become experts. With no human guidance.

Breakout

Breakout was one of the Atari games that DeepMind's reinforcement learning model learnt to play.
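To make the trial-and-error idea concrete, here's a minimal sketch in Python. It's a toy stand-in of my own, not DeepMind's actual deep Q-network (which used a convolutional neural network over real screen frames), but the principle is the same: observe, act, and reinforce whatever pays off.

```python
import random
from collections import defaultdict

# Toy "game": each frame is a number 0-4 telling us where the ball is;
# pressing the matching button scores a point, anything else scores nothing.
def step(frame, action):
    reward = 1.0 if action == frame else 0.0
    return random.randrange(5), reward  # next frame, reward

Q = defaultdict(float)  # how much pressing `action` pays off on `frame`
frame, epsilon, lr = random.randrange(5), 0.3, 0.1

for _ in range(10_000):
    # Mostly repeat what has paid off so far; sometimes smash a random button.
    if random.random() < epsilon:
        action = random.randrange(5)
    else:
        action = max(range(5), key=lambda a: Q[(frame, a)])
    next_frame, reward = step(frame, action)
    # Reinforce the action by how much it actually paid off.
    Q[(frame, action)] += lr * (reward - Q[(frame, action)])
    frame = next_frame

# After enough button-smashing, the agent should follow the ball perfectly.
print(all(max(range(5), key=lambda a: Q[(f, a)]) == f for f in range(5)))
```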

In the following years, DeepMind went on to beat increasingly complex games, using novel reinforcement learning techniques to crack games that were previously thought too complex for computers to master. In 2015, they gave birth to AlphaGo.

While machines have been better than every living human at chess for about 20 years, they long struggled to beat humans at its Asian cousin Go. The number of possible moves at each turn is so high that it's infeasible for computers to play it with the same brute-force search that made them succeed at chess. To beat Go, you needed something more intuitive, something more human. This led DeepMind to develop AlphaGo, the first computer program to surpass humans in the ancient board game.
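To get a feel for why brute force breaks down, here's a quick back-of-the-envelope calculation using the rough branching factors and game lengths commonly cited in the AlphaGo literature:

```python
# Rough numbers from the AlphaGo literature: chess has a branching
# factor of ~35 over ~80 plies, Go ~250 over ~150 plies.
chess_tree = 35 ** 80
go_tree = 250 ** 150
print(f"chess game tree ~ 10^{len(str(chess_tree)) - 1}")  # ~ 10^123
print(f"Go game tree    ~ 10^{len(str(go_tree)) - 1}")     # ~ 10^359
```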

AlphaGo

Instead of exhaustively searching all possible moves, AlphaGo uses a Monte Carlo tree search to explore promising moves, guided by knowledge its neural networks have previously learnt. This makes it "think" in a way that arguably resembles human intuition more than traditional computer search.
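Here's a heavily simplified sketch of what network-guided tree search looks like, using a toy Nim game and a dummy "network" of my own invention – the real AlphaGo is vastly more sophisticated, but the selection logic (known as PUCT) has the same flavour:

```python
import math

# Toy game: single-pile Nim. Players alternate removing 1-3 stones;
# whoever takes the last stone wins. The game and the "network" below
# are illustrative stand-ins, not DeepMind's code.
def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

# Stand-in for AlphaGo's policy/value network: uniform move priors and
# a neutral value estimate. A trained network would return informed ones.
def network(pile):
    moves = legal_moves(pile)
    return {m: 1 / len(moves) for m in moves}, 0.0

class Node:
    def __init__(self, prior):
        self.prior = prior     # P(s, a): how much the network likes this move
        self.visits = 0        # N(s, a)
        self.value_sum = 0.0   # W(s, a), from the parent player's view
        self.children = {}     # move -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct(parent, child, c=1.5):
    # Exploration bonus favours high-prior, rarely visited moves.
    return child.value() + c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)

def simulate(node, pile):
    """One simulation; returns the position's value for the player to move."""
    if pile == 0:
        return -1.0  # no stones left: the previous player just won
    if not node.children:  # leaf: expand using the network instead of a rollout
        priors, value = network(pile)
        node.children = {m: Node(p) for m, p in priors.items()}
        return value
    move = max(node.children, key=lambda m: puct(node, node.children[m]))
    child = node.children[move]
    value = -simulate(child, pile - move)  # the opponent's gain is our loss
    child.visits += 1
    child.value_sum += value
    return value

def best_move(pile, n_simulations=2000):
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        root.visits += 1
        simulate(root, pile)
    # Like AlphaGo, play the most-visited move, not the highest raw value.
    return max(root.children, key=lambda m: root.children[m].visits)

print(best_move(10))  # with enough simulations, should print 2 (leaving 8)
```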

Remember: DeepMind's ultimate goal is to solve general intelligence, to create computers capable of learning any task. While AlphaGo trained and improved on its own, it also needed human games to learn from, so the natural next step was to create a version that could learn entirely by itself; humans are, after all, fallible. In 2017, they created AlphaGo Zero and AlphaZero, programs capable of teaching themselves how to play with no human guidance. AlphaZero also learnt to play chess and shogi better than all previous players, human and machine alike. With its ability to learn any two-player perfect-information game, it had reached another milestone in the quest for general intelligence.
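As a toy illustration of the self-play principle – an agent improving with zero human data, purely by playing against itself – here's a tabular sketch on the same simple Nim game as before (my own example; AlphaZero itself uses deep networks and tree search at an entirely different scale):

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # (pile, move) -> estimated value for the player moving

def self_play_episode(pile=10, epsilon=0.2, lr=0.1):
    history = []  # moves of both "players" (the same agent plays both sides)
    while pile > 0:
        moves = [m for m in (1, 2, 3) if m <= pile]
        if random.random() < epsilon:  # explore
            move = random.choice(moves)
        else:                          # exploit current knowledge
            move = max(moves, key=lambda m: Q[(pile, m)])
        history.append((pile, move))
        pile -= move
    # Whoever took the last stone won; propagate the result backwards,
    # flipping the sign between the winner's and the loser's moves.
    outcome = 1.0
    for state, move in reversed(history):
        Q[(state, move)] += lr * (outcome - Q[(state, move)])
        outcome = -outcome

for _ in range(20_000):
    self_play_episode()

# After pure self-play, the greedy move from 10 stones should be 2
# (leaving a multiple of 4), which is the known optimal play in Nim.
print(max((1, 2, 3), key=lambda m: Q[(10, m)]))
```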

But we're not done yet. Only one month ago, DeepMind presented a new model, MuZero, with an even higher degree of generality. Although AlphaZero learnt to play games with no human guidance on how to excel at them, it obviously needed to know the rules in advance, right? Well, this new model doesn't even need to know the rules of the games it plays. By looking at "pictures" of different games, it learns to predict the value function (who's winning), the policy (which action or move to play) and the reward function (the action's immediate payoff). Consequently, MuZero achieved state-of-the-art performance in 57 Atari games while matching the performance of AlphaZero at chess, Go and shogi! By not needing to know the rules of the games it plays, MuZero can more easily generalize to real-life problems where the dynamics of the environment are unknown.
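To illustrate the structure (with toy deterministic stand-ins of my own, not DeepMind's networks): MuZero's core is three learned functions – a representation function h, a dynamics function g, and a prediction function f – and planning happens entirely inside the learned model, with no access to the real game's rules:

```python
import random

def representation(observation):
    # h: raw observation ("pixels") -> initial hidden state
    return hash(tuple(observation)) % 1000

def dynamics(hidden_state, action):
    # g: (hidden state, action) -> (predicted reward, next hidden state).
    # A real MuZero learns this transition model; we fake it deterministically.
    next_state = hash((hidden_state, action)) % 1000
    predicted_reward = (next_state % 3) - 1  # pretend reward in {-1, 0, 1}
    return predicted_reward, next_state

def prediction(hidden_state):
    # f: hidden state -> (policy over actions, value estimate)
    rng = random.Random(hidden_state)  # deterministic stub
    policy = [rng.random() for _ in range(4)]
    total = sum(policy)
    return [p / total for p in policy], rng.uniform(-1, 1)

# Plan a few steps ahead using only the learned model, never the real game:
state = representation([0, 1, 2])  # a made-up "frame"
for step in range(3):
    policy, value = prediction(state)
    action = policy.index(max(policy))       # greedy pick from the policy
    reward, state = dynamics(state, action)  # step the model, not the game
    print(f"step {step}: action={action}, predicted reward={reward}, value={value:.2f}")
```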

Seven years ago, none of the reinforcement learning examples mentioned in this article existed, and every week researchers come up with new techniques that improve the machines' ability to learn.

Who knows where we (or rather, the machines) will be after the next 7?
