ConVIRT

3 Posts

Multimodal AI Takes Off: Multimodal Models, such as CLIP and DALL·E, are taking over AI.

While models like GPT-3 and EfficientNet, which work on text and images respectively, are responsible for some of deep learning’s highest-profile successes, approaches that find relationships between text and images made impressive

AI-generated images with the model DALL-E

ConVIRT

Tell Me a Picture: OpenAI's two new multimodal AI models, CLIP and DALL·E

Two new models show a surprisingly sharp sense of the relationship between words and images. OpenAI, the for-profit research lab, announced a pair of models that have produced impressive results in multimodal learning: DALL·E.

ConVIRT

Learning From Words and Pictures: A deep learning method for medical x-rays with text

It’s expensive to pay doctors to label medical images, and the relative scarcity of high-quality training examples can make it hard for neural networks to learn features that make for accurate diagnoses.

ConVIRT

Multimodal AI Takes Off: Multimodal Models, such as CLIP and DALL·E, are taking over AI.

Tell Me a Picture: OpenAI's two new multimodal AI models, CLIP and DALL·E

Learning From Words and Pictures: A deep learning method for medical x-rays with text

Subscribe to The Batch