GILL (Generating Images with Large Language Models)
Text or Images, Input or Output: GILL, an innovative approach to multimodal model training
With GPT-4V, OpenAI introduced a large multimodal model that generates text from images and, with help from DALL-E 3, generates images from text. However, the company hasn’t fully explained how it built the system. A separate group of researchers described their own method, GILL, for training a model that accepts both text and images as input and produces both as output.