Individual deep learning models proved their mettle in hundreds of tasks.
What happened: The scope of multi-task models expanded dramatically in the past year.
Driving the story: Researchers pushed the limits of how many different skills a neural network can learn. They were inspired by the emergent skills of large language models — say, the ability to compose poetry and write computer programs without architectural tuning for either — as well as the capacity of models trained on both text and images to find correspondences between the disparate data types.
- In spring, Google’s PaLM showed state-of-the-art results in few-shot learning on hundreds of tasks that involve language understanding and generation. In some cases, it outperformed fine-tuned models or average human performance.
- Shortly afterward, DeepMind announced Gato, a transformer that It learned over 600 diverse tasks — playing Atari games, stacking blocks using a robot arm, generating image captions, and so on — though not necessarily as well as separate models dedicated to those tasks. The system underwent supervised training on a wide variety of datasets simultaneously, from text and images to actions generated by reinforcement learning agents.
- As the year drew to a close, researchers at Google brought a similar range of abilities to robotics. RT-1 is a transformer that enables robots to perform over 700 tasks. The system, which tokenizes actions as well as images, learned from a dataset of 130,000 episodes collected from a fleet of robots over nearly a year and a half. It achieved outstanding zero-shot performance in new tasks, environments, and objects compared to prior techniques.
Behind the news: The latest draft of the European Union’s proposed AI Act, which could become law in 2023, would require users of general-purpose AI systems to register with the authorities, assess their systems for potential misuse, and conduct regular audits. The draft defines general-purpose systems as those that “perform generally applicable functions such as image/speech recognition, audio/video generation, pattern-detection, question-answering, translation, etc.,” and are able to “have multiple intended and unintended purposes.” Some observers have criticized the definition as too broad. The emerging breed of truly general-purpose models may prompt regulators to sharpen their definition.
Where things stand: We’re still in the early phases of building algorithms that generalize to hundreds of different tasks, but the year showed that deep learning has the potential to get us there.