ConVIRT
Multimodal AI Takes Off: Multimodal Models, such as CLIP and DALL·E, are taking over AI.
While models like GPT-3 and EfficientNet, which work on text and images respectively, are responsible for some of deep learning’s highest-profile successes, approaches that find relationships between text and images made impressive