See Robot Play: an exploration of curiosity in humans and machines.

On the role of curiosity in humans and artificial intelligence agents.

Norman Di Palo
Towards Data Science


You can find a GitHub repository with the code to reproduce the experiments at the end of the article.

From a survival point of view, the basic biological needs that drive humans and other animals are not particularly different: both need to eat and drink to survive, seek shelter, and feel an impulse to reproduce to keep their species alive. And yet, as is evident, human and animal behavior differ completely. Why is that?

The evolution of the human brain has produced areas that are absent, or far less developed, in other animals, such as the prefrontal cortex, a region broadly responsible for reasoning, planning, and logical thinking. This development has given rise to entirely new impulses and driving forces. One of them shapes human behavior in a substantial way: it is the reason the movie and book industries exist, the reason explorers set sail for months at a time a few centuries ago, and the reason you are reading this article. It is curiosity.

Curiosity is the impulse to look for the unexpected, the need to explore, to discover, to unveil. It is often described as a hallmark of intelligent species: the drive to explore and expand one's knowledge has been cited as a defining trait of women and men of intelligence and virtue as far back as the great Greek myths.

It is thus natural to draw a link between curiosity in human beings and artificial intelligence: researchers believe that emulating this phenomenon in artificial brains will be fundamental to creating true machine intelligence. In this article we will explore how to instill curiosity in the electronic brain of an agent and what effect it has on the agent's actions, discovering how it leads to behaviors that are, in a sense, very human.

We started this article by talking about needs and impulses. How do we simulate those biological signals in machines? One very popular framework is Reinforcement Learning (RL). In RL, an agent, such as a robot, can observe the environment and its own state, like its position in space, and take actions accordingly. The main goal of the agent is to collect a high reward. A reward is a signal that we define and that tells the agent which behaviors are good and which should be avoided. From a biological point of view, positive rewards could come from eating and drinking, while negative rewards could come from the agent damaging itself. In a videogame like Pac-Man, for example, rewards are given for eating fruit, while negative rewards are given for being caught by ghosts, and this is enough to teach a player how to beat the game.
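
To make this concrete, here is a minimal sketch of the observe-act-reward loop at the heart of RL, written against the classic gym API and using a random policy as a stand-in for a learned one (the specific environment here is just illustrative):

```python
import gym

# A minimal agent-environment loop (classic gym API, pre-0.26).
env = gym.make("CartPole-v1")
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    # A real agent would choose actions with a learned policy;
    # here we simply sample random ones.
    action = env.action_space.sample()
    # The environment returns the next observation and an extrinsic reward.
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("Episode return:", total_reward)
```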

In recent years, deep reinforcement learning has achieved impressive results, such as learning to play Atari videogames at a superhuman level, beating the world's best Go players, and learning how to control complex simulated robots. Let's focus on this last example: to teach robots to move, the environment rewards the agent for moving forward without falling, and an RL algorithm learns, by trial and error, a behavior that produces forward movement, such as a gait or walking pattern. These rewards are called extrinsic: they come from the environment and are part of the task, so that the agent can learn to accomplish it. But there is a second kind of reward: intrinsic rewards. Intrinsic rewards are not part of the environment; they are generated by the agent itself. The agent can reward itself for discovering new things or reaching unseen states, whatever the final environment task is. Curiosity belongs to this category: it is a form of reward for discovering new, surprising things, much like the reward you feel when you discover an interesting new article, book, or restaurant. So how is curiosity created in an agent? To understand this, we first have to understand what a predictive forward model is.

Explorers sailed to discover new lands driven by curiosity.

In computer science and control theory, we can build models that predict the next state given the current state and an action. These models, often neural networks, learn from experience to predict the immediate effect of actions. You know what will happen when you throw a ball in the air with a certain strength because you have experienced it before. In the same way, a neural network can learn to predict what will happen next to, say, a robot arm, from previous experiences of observations, actions, and new observations. This is a predictive forward model in a nutshell.
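
As a rough sketch of what such a model can look like in code, here is a small fully connected network in PyTorch that maps a (state, action) pair to a predicted next state. The architecture and sizes are illustrative assumptions, not the exact ones used in the experiments:

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next state from the current state and the action taken."""
    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state, action):
        # Concatenate state and action, then predict the next state.
        return self.net(torch.cat([state, action], dim=-1))
```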

Your brain is, in a sense, constantly predicting the near future based on the immediate past. As some neuroscience studies suggest, the brain is a predictive machine, and you are surprised when things don't go as expected. You probably take the same route to work almost every day, and all those memories blur into each other. But what if one day you saw a vehicle catch fire in the middle of the road? You would remember it for years afterwards, possibly down to the exact date. Similarly, this is why you read a book or an article, watch a movie, or travel somewhere: to see and learn something you didn't expect or already know. Now that we know what a forward model is, this driving impulse can be reproduced in machines: an artificial intelligence agent can reward itself for actions that lead to surprising states. Computationally, surprise is the prediction error: the difference between the next state the forward model expected, given a state and an action, and the next state that actually occurred.
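
Using the forward model sketched above, this notion of surprise can be turned into an intrinsic reward in a few lines. The squared prediction error used here is one common choice, not the only one:

```python
import torch

def intrinsic_reward(model, state, action, next_state):
    """Surprise: how wrong the forward model was about the future."""
    with torch.no_grad():
        predicted_next_state = model(state, action)
    # Mean squared prediction error: large when the outcome was unexpected.
    return ((predicted_next_state - next_state) ** 2).mean().item()
```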

We can now introduce the environment for our experiments: a simulated Fetch robot that can push a box around with its arm. This is the FetchPush-v1 environment from OpenAI's well-known gym library. The goal of the agent is to push the box to its target position, the red ball. But we don't care about that task for now: as explained above, we are interested in intrinsic rewards, and we want to see what happens when the agent is guided by curiosity alone.
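
Creating the environment takes a couple of lines (note that the Fetch environments need the MuJoCo physics engine installed alongside gym):

```python
import gym

env = gym.make("FetchPush-v1")
obs = env.reset()
# Observations are a dict: 'observation' holds the robot and cube state,
# while 'achieved_goal' and 'desired_goal' describe the (ignored) push task.
print(obs["observation"].shape)
print(env.action_space)  # small end-effector displacements plus the gripper
```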

Our robot, Fetch, can move its end-effector around in three dimensions. At every step, it observes the end-effector's position in space as well as the position of the cube. We can give the agent an internal predictive forward model by letting it experience its environment, simply by moving the arm around. From this experience, the agent quickly learns what happens when it commands its arm: the arm moves a little in the specified direction. The forward model therefore quickly becomes very good at predicting the movement of the arm. But a second thing can happen when an action is commanded: the arm can touch the cube and move it. While this is intuitive for us, it is not for the robot, which is exploring the world for the first time like an infant. Learning to predict what happens to the cube when it is touched is much harder, both because of the complex physics of contact forces and because, during its initial exploration, the robot touches the cube only a few times: the operational space is quite large and the cube is small. So the robot experiences a cube unaffected by the arm's movement 99% of the time, and may well conclude that the cube is forever still. The predictive forward model will thus have a hard time predicting the movement of the cube. And here is where we can see the effects of curiosity on the robot.
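
A sketch of this initial learning phase, reusing the ForwardModel defined earlier: the robot moves its arm at random, records what it observes, and fits the model to the collected transitions. Step counts and hyperparameters here are illustrative:

```python
import numpy as np
import torch

def collect_random_transitions(env, steps=10000):
    """Let the robot flail around and record (state, action, next state) triples."""
    states, actions, next_states = [], [], []
    obs = env.reset()["observation"]
    for _ in range(steps):
        action = env.action_space.sample()
        next_obs, _, done, _ = env.step(action)  # classic gym API
        states.append(obs)
        actions.append(action)
        next_states.append(next_obs["observation"])
        obs = env.reset()["observation"] if done else next_obs["observation"]
    as_tensor = lambda x: torch.tensor(np.array(x), dtype=torch.float32)
    return as_tensor(states), as_tensor(actions), as_tensor(next_states)

def train_forward_model(model, states, actions, next_states, epochs=100, lr=1e-3):
    """Fit the forward model by minimizing next-state prediction error."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((model(states, actions) - next_states) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```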

Two of the Fetch gym environments. We will focus on FetchPush (left).

Now that the robot has learned an initial forward model, we can let it explore the environment further, driven by curiosity. The robot will try to find actions whose outcomes are surprising. As anticipated, moving the arm around is by now rather boring for the robot, since it knows exactly what will happen. But, little by little, it learns that what happens when it touches the cube is surprising. The cube moves in ways the robot cannot predict, and this surprise tickles its curiosity. And so, just like a baby, the robot learns to play with the cube because it is, in some sense, fun. Interestingly, it discovers that the most unpredictable thing happens when it pushes the box off the table and the box falls to the ground.
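
Closing the loop, here is a sketch of one curiosity-driven episode: the task reward from the environment is discarded entirely and replaced by the intrinsic_reward helper from before. The resulting trajectories, labeled with surprise instead of task reward, can then be fed to any standard RL algorithm, which will learn to seek out the surprising interactions with the cube:

```python
import torch

def curious_rollout(env, model, policy, max_steps=50):
    """Run one episode where the reward is surprise, not the task reward."""
    obs = env.reset()["observation"]
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)  # any policy: random, epsilon-greedy, a learned actor...
        next_obs, _, done, _ = env.step(action)  # the extrinsic reward is ignored
        next_state = next_obs["observation"]
        r_int = intrinsic_reward(
            model,
            torch.tensor(obs, dtype=torch.float32),
            torch.tensor(action, dtype=torch.float32),
            torch.tensor(next_state, dtype=torch.float32),
        )
        trajectory.append((obs, action, r_int, next_state))
        obs = next_state
        if done:
            break
    return trajectory  # train any RL agent to maximize the intrinsic return
```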

Fetch learns to play with the cube guided by curiosity.

The robot has learned to play with the cube without any external guidance or signal. It knows nothing about the goal of the task; it is just exploring the world around it and looking for surprising things because it is curious. And this simple intrinsic motivation has made it discover the cube, which, like a toy, is now its main interest. As described in this blog post, curiosity can greatly help an agent explore its surroundings, finding things that would otherwise remain unseen. With curiosity, an agent can learn to trigger rare events in an environment and, little by little, learn about all the underlying mechanics of a complex system.

In recent years, AI researchers have studied the effects of curiosity on agents in a wide range of environments. One of the most interesting results came from applying curiosity to agents playing videogames. A recent study demonstrated that, guided by curiosity alone, an agent can learn to play several levels of Super Mario Bros. without any external reward. It has no interest in breaking records, only a strong urge to discover what happens next and to find new and unexpected things. In this game, the best way to do so is to proceed through a level and discover new areas. And to do that, the agent has to learn how to survive, avoiding enemies and traps, just to keep discovering new things.

Another study showed how this plays out across several Atari videogames: an agent can learn to play a game quite well just by following this intrinsic reward. What is really interesting is a conclusion the researchers drew in the paper: this result is not only an achievement for AI, but also a good insight into the effect of curiosity on humans. We play videogames because they are fun, a source of new stimuli, experiences, and challenges. A well-designed videogame is thus built around rewarding the curiosity of the player, and this is why an AI agent driven by curiosity can learn to play those games.

Curiosity is an essential part of human intelligence. It doesn't just characterize human behavior; it is also an essential tool for building further intelligence and knowledge: without curiosity, we cannot discover new things unless they bump into us. This is why, to build truly intelligent machines, it is fundamental to characterize and model curiosity and the other intrinsic stimuli that our brains generate, which have driven mankind in its constant evolution.

Feel curious? You can find all the code in this GitHub repository.

Thank you for reading this far! You can follow me on Twitter (@normandipalo) to keep up with my work and research.
