Multimodal Event Representation Learning Over Time (MERLOT)
Richer Video Representations: Pretraining Method Improves AI's Ability to Understand Video
To understand a movie scene, viewers must often remember or infer previous events and extrapolate potential consequences. New work improved a model’s ability to do the same.