Humanoid robots can play football (known as soccer in the United States) in the real world, thanks to reinforcement learning.
What’s new: Tuomas Haarnoja and colleagues at Google and the University of Oxford trained an agent to play one-on-one football in a simulated environment, then deployed it on 20-inch hardware robots on a scaled-down field. You can see it in action here.
Key insight: In reinforcement learning, an agent improves as it explores various motions. However, such exploration risks damaging expensive hardware. Training in simulation lets the agent attempt a wide variety of motions without risking a physical robot. Once trained, the agent can make the leap from simulation to reality.
How it works: In a virtual world, the agent learned to control the robot’s motion given (i) the simulated robot’s state (including the position, velocity, and acceleration of each of its 20 joints), (ii) the current game state (including the location and velocity of the ball and the opponent), (iii) the game state at each of the last five time steps, and (iv) the agent’s five previous actions. Training proceeded via reinforcement learning in two stages, described in the list below.
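The observation described above can be thought of as one flat vector fed to the policy network. The sketch below illustrates that idea; the function name, argument names, and layout are hypothetical, not the authors’ actual interface.

```python
import numpy as np

HISTORY = 5  # the agent sees the last five time steps of game state and actions

def build_observation(joint_pos, joint_vel, joint_acc,           # 20 joints each
                      ball_pos, ball_vel, opp_pos, opp_vel,
                      game_state_history, action_history):
    """Concatenate proprioception, game state, and short histories into one
    flat policy input. Shapes and ordering are assumptions for illustration."""
    robot_state = np.concatenate([joint_pos, joint_vel, joint_acc])       # (i)
    game_state = np.concatenate([ball_pos, ball_vel, opp_pos, opp_vel])   # (ii)
    past_states = np.concatenate(game_state_history[-HISTORY:])           # (iii)
    past_actions = np.concatenate(action_history[-HISTORY:])              # (iv)
    return np.concatenate([robot_state, game_state, past_states, past_actions])
```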
- During the first stage of training, the authors trained two teachers, both vanilla neural networks. (i) The first teacher learned to predict movements that helped a simulated robot score goals against an untrained opponent that immediately fell over. It earned rewards for scoring and was penalized for falling over or letting the opponent score, among other rewards and penalties. (ii) The second teacher learned to make a fallen simulated robot stand up. Its reward grew as the robot’s joint positions came closer to key poses recorded from a manually designed stand-up routine (a pose-matching reward of this kind is sketched in the first code example after this list).
- The second stage of training involved another agent, also a vanilla neural network. This agent controlled a simulated robot in matches against a previous version of itself. It received rewards for moving the robot’s joints in ways that helped it win the match or resembled the two teachers’ movements, which encouraged it to score goals and to stand up after falling. To better approximate real-world conditions, the authors randomly perturbed the simulation, adding noise to the sensors that tracked the robot’s movements and delaying parts of the simulation (see the second sketch after this list). They also restricted the joints’ range of motion to keep the simulated robot from acting in ways that would damage a hardware robot.
- At inference, the trained agent controlled an off-the-shelf Robotis OP3 humanoid robot, which costs around $14,000.
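The summary above says only that the get-up teacher’s reward shrinks as the joints drift from recorded key poses. A minimal sketch of such a pose-matching reward, assuming a squared-error term inside an exponential and a hypothetical scale parameter:

```python
import numpy as np

def get_up_reward(joint_pos, key_pose, scale=5.0):
    """Largest when the simulated robot's joints match a key pose recorded
    from the manually designed stand-up routine, decaying smoothly as the
    difference grows. The exponential kernel and `scale` are illustrative
    assumptions, not the authors' exact formula."""
    error = np.sum((np.asarray(joint_pos) - np.asarray(key_pose)) ** 2)
    return float(np.exp(-scale * error))
```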
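Likewise, the second-stage simulation perturbations (sensor noise, delays, restricted joint ranges) can be sketched as a thin wrapper around a simulator. The class name, noise level, delay length, and joint limits below are placeholders, not the values the authors used:

```python
import random
from collections import deque

class RandomizedSim:
    """Wraps a simulator with sensor noise, delayed actions, and clipped joint
    commands. All names and numbers here are placeholders for illustration."""

    def __init__(self, sim, sensor_noise=0.01, delay_steps=2, joint_limits=(-1.5, 1.5)):
        self.sim = sim                  # assumed to expose step(action) -> sensor readings
        self.sensor_noise = sensor_noise
        self.delay_steps = delay_steps
        self.joint_limits = joint_limits
        self.pending = deque()          # queue of not-yet-applied actions

    def step(self, action):
        lo, hi = self.joint_limits
        # Clip commands so learned behavior stays within what the hardware can tolerate.
        safe = [min(max(a, lo), hi) for a in action]
        self.pending.append(safe)
        # Apply an action from a few steps ago to emulate actuation latency.
        delayed = self.pending.popleft() if len(self.pending) > self.delay_steps else safe
        obs = self.sim.step(delayed)
        # Corrupt every sensor reading with a little Gaussian noise.
        return [o + random.gauss(0.0, self.sensor_noise) for o in obs]
```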
Results: The agent learned not only to turn and kick but also to anticipate the ball’s motion and block an opponent’s shots. It scored penalties against a stationary goalie with 90 percent success in simulation and 70 percent success in the physical world. It stood up in 0.9 seconds on average, while a manually designed agent stood up in 2.5 seconds. Its maximum walking speed of 0.69 meters per second beat the manually designed agent’s 0.27 meters per second. However, its kicks propelled the ball at 2.0 meters per second on average, slower than the manually designed agent’s 2.1 meters per second.
Why it matters: Controlling humanoid robots is challenging because they’re less stable than quadrupeds. Just getting them to perform one type of motion, such as jumping, can require dedicated research. This work elicits complex, full-body motion from humanoid robots by combining established techniques: training in a noisy, randomized simulation, self-play, and teacher agents that reward particular behaviors.
We’re thinking: This work demonstrates that robots get a kick out of machine learning.