Mar 15, 2023
Vision and Language Tightly Bound: Training on a single loss function improves multimiodal AI.
Recent multimodal models process both text and images as sequences of tokens, but they learn to represent these distinct data types using separate loss functions. Recent work unifies the loss function as well.