Playing a team sport involves a fluid blend of individual and group skills. Researchers integrated both types of skill into realistic humanoid agents that play football (known as soccer in the U.S.).
What's new: Siqi Liu, Guy Lever, Zhe Wang, and colleagues at DeepMind developed a method for training simulated football teams that learned to run, pass, defend, and score goals on a physically accurate virtual field.
Key insight: Football players must control their own muscle motions over time spans measured in milliseconds while collaborating with teammates over longer intervals. Training in stages lets agents learn to move both independently and cooperatively: first lower-level controllers that operate on short time scales to handle skills like running, then higher-level controllers that operate on longer time scales to handle, say, teamwork.
How it works: The authors trained 16 agents to compete in two-member teams. An agent could apply torques to its 56 joints; track its own joint angles, positions, and velocities; and observe the positions and velocities of other players and objects on the field. All model architectures were vanilla neural networks.
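To make the two time scales concrete, here is a minimal control-loop sketch, not the authors' code: a high-level policy refreshes a latent command at a slow rate, and a low-level controller converts body state plus that command into joint torques at every physics step. Only the 56 joints come from the description above; the other dimensions, the refresh interval K, and the single-layer stand-in networks are assumptions.

```python
import numpy as np

N_JOINTS = 56                # stated above
PROPRIO_DIM = 3 * N_JOINTS   # assumed: joint angles, positions, velocities
EXTERO_DIM = 40              # assumed: other players, ball, goals
LATENT_DIM = 32              # assumed size of the high-level command
K = 10                       # assumed: low-level steps per high-level decision

rng = np.random.default_rng(0)

def layer(x, w, b):
    """Single 'vanilla network' layer standing in for a trained model."""
    return np.tanh(x @ w + b)

# Randomly initialized weights stand in for trained parameters.
w_hi = rng.normal(size=(PROPRIO_DIM + EXTERO_DIM, LATENT_DIM)) * 0.1
b_hi = np.zeros(LATENT_DIM)
w_lo = rng.normal(size=(PROPRIO_DIM + LATENT_DIM, N_JOINTS)) * 0.1
b_lo = np.zeros(N_JOINTS)

def high_level_policy(proprio, extero):
    """Slow time scale: pick a latent command from the full observation."""
    return layer(np.concatenate([proprio, extero]), w_hi, b_hi)

def low_level_controller(proprio, latent):
    """Fast time scale: turn body state plus command into 56 joint torques."""
    return layer(np.concatenate([proprio, latent]), w_lo, b_lo)

# Control loop: the command is refreshed only every K physics steps.
proprio, extero = np.zeros(PROPRIO_DIM), np.zeros(EXTERO_DIM)
latent = np.zeros(LATENT_DIM)
for t in range(100):
    if t % K == 0:
        latent = high_level_policy(proprio, extero)
    torques = low_level_controller(proprio, latent)
    # A physics simulator would apply `torques` and return new observations here.
```

In the system described below, the low-level role is played by a decoder that turns a learned representation into torques, and the command comes from encoders trained in later stages.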
- In the first stage of training, a model learned motions like running and turning. Using 105 minutes of motion-capture data from real players in scripted scenes, the authors trained an encoder and decoder via supervised learning to predict an agent's motion. The encoder learned to convert the agent's physical state into a representation, while the decoder learned to convert the representation into joint torques. The same decoder was reused in subsequent stages.
- In the second stage, separate encoders learned via reinforcement learning to perform four drills: following a point, following a point while dribbling, kicking a ball to a point on the field, and shooting at the goal. Each encoder learned representations of not only the agent's physical state but also the drill, such as the point to be followed. The decoder determined how the agent should move its joints.
- Four additional encoders learned via supervised learning to re-create the drill encoders' representations without access to information about where to run or kick the ball.
- Finally, the agents learned via reinforcement learning to compete in teams. An encoder learned to combine the drill representations and passed the result to the decoder to determine the agent's motion. An agent received a reward of +1 when its team scored a goal and -1 when its team was scored upon. Further rewards encouraged the player closest to the ball to advance it toward the opponents' goal. (A sketch of how these pieces fit together follows.)
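The sketch below is not the authors' implementation; it only wires together the pieces named above: a shared torque decoder, per-drill encoders, distilled encoders that work without drill goals, a team-play encoder that combines the drill representations, and the goal-plus-shaping reward. Layer shapes, the goal encoding, and the shaping weight are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_JOINTS, STATE_DIM, LATENT_DIM, N_DRILLS, GOAL_DIM = 56, 168, 32, 4, 4

def linear(in_dim, out_dim):
    """Random weights standing in for a trained 'vanilla' network layer."""
    return rng.normal(size=(in_dim, out_dim)) * 0.1

# Stage 1: encoder and decoder trained on motion capture (supervised learning).
w_enc_mocap = linear(STATE_DIM, LATENT_DIM)
w_dec = linear(LATENT_DIM, N_JOINTS)        # reused unchanged in later stages

# Stage 2: one encoder per drill, trained with RL; each sees the agent's state
# plus drill information such as the target point (GOAL_DIM is assumed).
w_enc_drill = [linear(STATE_DIM + GOAL_DIM, LATENT_DIM) for _ in range(N_DRILLS)]

# Stage 3: distilled encoders trained (supervised) to reproduce the drill
# encoders' representations from the state alone, with no drill goal as input.
w_enc_distilled = [linear(STATE_DIM, LATENT_DIM) for _ in range(N_DRILLS)]

# Stage 4: a team-play encoder, trained with RL in two-member teams, combines
# the distilled drill representations before they reach the shared decoder.
w_enc_team = linear(N_DRILLS * LATENT_DIM, LATENT_DIM)

def act(state):
    """Play-time pipeline: drill representations -> combined latent -> torques."""
    drill_reps = [np.tanh(state @ w) for w in w_enc_distilled]
    combined = np.tanh(np.concatenate(drill_reps) @ w_enc_team)
    return np.tanh(combined @ w_dec)

def reward(scored, conceded, closest_to_ball, ball_progress):
    """Sparse goal reward plus a shaping term for the player nearest the ball."""
    r = float(scored) - float(conceded)
    if closest_to_ball:
        r += 0.01 * ball_progress            # shaping weight is an assumption
    return r

torques = act(np.zeros(STATE_DIM))           # 56 torque values
```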
Results: The agents’ skills increased with the number of training episodes. For example, at initialization, when an agent fell, it got up 30 percent of the time. After 375 million training steps in competition, it righted itself 80 percent of the time. Likewise, at initialization, when an agent touched the ball, it executed a pass 0 percent of the time. After 80 billion training steps in competition, it passed the ball in 6 percent of touches.
Why it matters: It may take more than one training mode to teach all the skills required to perform a complex task. In this case, the authors combined supervised learning, reinforcement learning, and training in teams.
We’re thinking: How to build agents that operate at both short and long time scales is a longstanding problem in reinforcement learning. The authors solved it by specifying the skills at each time scale manually. The next step is to design agents that can learn that abstraction on their own.