Which comes first, training a reinforcement learning model or extracting high-quality features? New work avoids this chicken-or-egg dilemma by doing both simultaneously.
What’s new: Aravind Srinivas and Michael Laskin at UC Berkeley offer Contrastive Unsupervised Representations for Reinforcement Learning (CURL). The authors propose using contrastive learning to extract features while the RL model trains.
Key insight: In many RL scenarios, the model learns by interacting with its environment, so it gathers its training data only as it learns. That’s why feature extractors pretrained on other data often don’t generalize well to novel situations. Contrastive learning, which has driven recent successes in self-supervised learning, extracts similar features for similar inputs and dissimilar features for dissimilar inputs. It doesn’t require pretraining, so the researchers figured that reinforcement and contrastive learning could go hand in hand.
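For intuition, here is a toy version of such a contrastive objective in PyTorch, in the SimCLR-style InfoNCE form (an illustrative sketch of the general idea, not the authors’ implementation; the function name and temperature value are ours):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k, temperature=0.1):
    """q[i] and k[i] are features of two views of the same input (a positive pair);
    every other k[j] in the batch serves as a negative for q[i]."""
    q = F.normalize(q, dim=1)                           # (batch, dim)
    k = F.normalize(k, dim=1)                           # (batch, dim)
    logits = q @ k.t() / temperature                    # pairwise cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```

Minimizing this loss pushes features of two views of the same input together and features of different inputs apart, which is exactly the similar-features-for-similar-inputs property described above.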
How it works: The authors combined an RL agent of the user’s choice with a high-performance contrastive learning model that draws techniques from SimCLR, MoCo, and CPC. The two learn independently (a rough sketch of one training step follows the list below).
- The RL agent observes multiple images in sequence.
- The contrastive learning model applies two data augmentations to each observation, for instance a pair of random crops.
- CURL learns to extract similar feature vectors from the two augmented versions.
- The RL agent learns from the extracted features.
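Below is a rough sketch of what one CURL-style training step might look like, assuming a PyTorch setup. All names here (random_crop, rl_agent.update, W, the momentum value) are placeholders chosen for illustration, the MoCo-style momentum key encoder is simplified rather than copied from the paper’s code, and W is assumed to be among the encoder optimizer’s parameters:

```python
import torch
import torch.nn.functional as F

def random_crop(batch, size=84):
    """Placeholder augmentation: crops one random window for the whole batch
    (per-image random crops would be a more faithful choice)."""
    _, _, h, w = batch.shape
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()
    return batch[:, :, top:top + size, left:left + size]

def curl_step(obs, encoder, key_encoder, W, rl_agent, encoder_optimizer,
              momentum=0.95):
    # Two independently cropped views of the same batch of observations.
    query_obs, key_obs = random_crop(obs), random_crop(obs)

    q = encoder(query_obs)                 # query features
    with torch.no_grad():
        k = key_encoder(key_obs)           # key features, no gradient

    # Bilinear similarity scores; each query's matching key sits on the diagonal.
    logits = q @ W @ k.t()
    labels = torch.arange(q.size(0), device=q.device)
    contrastive_loss = F.cross_entropy(logits, labels)

    # Contrastive update: teach the encoder to give both crops similar features.
    encoder_optimizer.zero_grad()
    contrastive_loss.backward()
    encoder_optimizer.step()

    # RL update: the agent of the user's choice learns from the extracted features.
    rl_agent.update(q.detach())

    # Momentum update keeps the key encoder a slowly moving copy of the query encoder.
    with torch.no_grad():
        for p_k, p_q in zip(key_encoder.parameters(), encoder.parameters()):
            p_k.mul_(momentum).add_((1.0 - momentum) * p_q)
```

The contrastive and RL updates share the extracted features but are applied separately, matching the note above that the two learn independently.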
Results: The researchers tested CURL in 42 tasks, pairing it with Rainbow DQN on Atari games and a soft actor-critic agent on the DeepMind Control Suite (DMControl). They compared its performance against state-of-the-art pixel-based models given similar amounts of training. CURL collected rewards on average 2.8 times higher in DMControl and 1.6 times higher in Atari, and it achieved its DMControl performance in half as many training steps.
Why it matters: A typical solution to the chicken-or-egg problem is to collect enough data so that it doesn’t matter whether RL or feature extraction comes first. CURL cuts the data requirement.
We’re thinking: We’ve been excited about self-supervised learning for some time and are glad to see these techniques being applied to speed up RL as well.