Emu3 claims “next-token prediction is all you need”; Black Forest updates FLUX image model, adds a developer API

Published Oct 4, 2024 · 3 min read

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • OpenAI introduces new canvas UI for editing with ChatGPT
  • Google’s chip design model gets a new (but familiar) name
  • Microsoft launches some of the features it promised with Recall
  • Aider wants to give the right models the right jobs

But first:

Emu3, a next-token, open-source multimodal model

BAAI unveiled Emu3, a suite of multimodal AI models trained solely with next-token prediction on tokenized images, text, and videos. The models (including chat, generation, and tokenizer versions) outperform established open models such as SDXL, LLaVA-1.6, and OpenSora-1.2 on both generation and perception tasks, without using diffusion or compositional architectures. BAAI released Emu3 on GitHub under the Apache 2.0 license, allowing developers and researchers to freely use, modify, and distribute the models. (GitHub)
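
To make the approach concrete, here’s a minimal, illustrative sketch of next-token training over a mixed token sequence. It is not BAAI’s code; the model, tokenizers, and modality-boundary handling are placeholders.

```python
# Illustrative sketch only, not BAAI's implementation. It shows the core idea
# behind Emu3-style training: tokenize every modality, concatenate into one
# sequence, and train a single autoregressive transformer with next-token
# cross-entropy over a shared vocabulary.
import torch
import torch.nn.functional as F

def multimodal_next_token_loss(model, text_tokens, image_tokens):
    # Interleave modalities into one flat token sequence (text then image here;
    # real data would follow the document's natural ordering, with special
    # boundary tokens marking where images begin and end).
    seq = torch.cat([text_tokens, image_tokens], dim=-1)   # (batch, seq_len)
    inputs, targets = seq[:, :-1], seq[:, 1:]               # shift by one position
    logits = model(inputs)                                  # (batch, seq_len-1, vocab)
    # Standard next-token prediction loss, regardless of whether the target
    # token encodes a word piece or an image patch.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```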

Black Forest’s image generator gets an update along with a new API

Black Forest Labs released FLUX1.1 [pro], a text-to-image model that is three times faster than its predecessor and outperforms competitors on the Artificial Analysis image arena benchmark. The company also launched a beta version of its API, allowing developers to integrate FLUX’s capabilities into their applications with advanced customization options and competitive pricing; FLUX1.1 [pro] is priced at 4 cents per image. This release challenges larger tech companies by offering developers a cost-effective alternative for integrating cutting-edge image generation into their products and workflows. (Black Forest Labs)
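
For developers curious about the API, here’s a hedged sketch of submitting a prompt and polling for a result with Python’s requests library. The endpoint paths, header name, and response fields below are assumptions; check Black Forest Labs’ official API reference before relying on them.

```python
# Hedged sketch of calling Black Forest Labs' beta API. The base URL, auth
# header, endpoint paths, and response fields are assumptions, not confirmed
# details from the announcement.
import os
import time
import requests

API_BASE = "https://api.bfl.ml"                  # assumed base URL
headers = {"x-key": os.environ["BFL_API_KEY"]}   # assumed auth header name

# Submit a generation request for FLUX1.1 [pro].
resp = requests.post(
    f"{API_BASE}/v1/flux-pro-1.1",               # assumed endpoint path
    headers=headers,
    json={"prompt": "a watercolor fox in a misty forest",
          "width": 1024, "height": 768},
)
task_id = resp.json()["id"]

# Poll until the image is ready, then print the result URL.
while True:
    result = requests.get(f"{API_BASE}/v1/get_result",
                          headers=headers, params={"id": task_id}).json()
    if result.get("status") == "Ready":
        print(result["result"]["sample"])        # assumed field holding the image URL
        break
    time.sleep(1)
```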

ChatGPT’s canvas offers new interfaces to edit writing and code

OpenAI launched canvas, a new interface for ChatGPT that allows users to collaborate on writing and coding projects beyond simple chat interactions. Canvas opens in a separate window, enabling users to edit text or code directly while receiving inline feedback and suggestions from ChatGPT. The feature aims to provide a more context-aware environment for complex projects, allowing users to highlight specific sections for focused assistance and offering shortcuts for common tasks like adjusting the length of text sections or debugging code. (OpenAI)

Google revisits its learning-based chip design model, names it “AlphaChip”

Google officially named its deep reinforcement learning method for chip layout generation “AlphaChip” and addressed misconceptions about its capabilities. The company emphasized that AlphaChip’s performance improves with pre-training on chip blocks and scales with computational resources, achieving up to 6.2 percent wirelength reduction compared to human experts in recent Tensor Processing Unit designs. Google also clarified that AlphaChip doesn’t require initial placement data and may need adjustments for older chip technologies, while highlighting its successful deployment in multiple generations of Google’s AI accelerators and its adoption by other chipmakers like MediaTek. (DeepMind and Nature)

Copilot gets new eyes and a voice, with privacy baked in

Microsoft launched new capabilities for its Copilot AI assistant, including Copilot Vision, which can analyze and respond to questions about on-screen content in Microsoft Edge. The company also introduced Think Deeper, a feature designed to tackle more complex problems, and Copilot Voice, which enables voice interactions with the AI. All of these features are based on OpenAI models fine-tuned by Microsoft. Microsoft addressed privacy concerns raised after its initial announcement of Recall, stating that Copilot Vision deletes data immediately after conversations and doesn’t store processed audio, images, or text for model training. (Microsoft)

Aider’s coding assistant tests models’ performance in different tasks

Aider, an AI coding assistant, now uses separate “Architect” and “Editor” models to handle code reasoning and editing tasks respectively. This approach achieved state-of-the-art results on Aider’s code editing benchmark, scoring 85 percent with OpenAI’s o1-preview as the Architect and either DeepSeek or o1-mini as the Editor. The two-model system lets each model focus on its specific task, potentially improving overall performance and efficiency for AI developers. (Aider)
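
Here’s a schematic sketch of the Architect/Editor split; it is not aider’s implementation, and call_model and the model names are placeholders standing in for any chat-completion client.

```python
# Schematic sketch of the two-model Architect/Editor pattern, not aider's code.
# One model reasons about what change to make; a second model turns that plan
# into concrete edited code.
def call_model(model_name: str, prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM and return its text reply."""
    raise NotImplementedError

def architect_editor_edit(task: str, source_code: str) -> str:
    # Step 1: the Architect reasons about the change but does not write the edit.
    plan = call_model(
        "o1-preview",  # strong reasoning model as the Architect
        f"Describe precisely how to change this code to accomplish: {task}\n\n{source_code}",
    )
    # Step 2: the Editor converts the plan into a well-formed updated file.
    edited = call_model(
        "deepseek-chat",  # fast, format-reliable model as the Editor
        f"Apply the following plan to the code and return the full updated file.\n"
        f"Plan:\n{plan}\n\nCode:\n{source_code}",
    )
    return edited
```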


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng celebrated the veto of California’s anti-innovation bill SB 1047 by Governor Newsom, highlighting the efforts of AI experts and advocates who worked to defeat the legislation and stressing the importance of evidence-based regulation in the field of AI.

“The fight to protect open source is not yet over, and we have to continue our work to make sure regulations are based on science, not science fiction.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: Meta expands its Llama Herd with updates to its Llama models, adding vision-language capabilities, edge sizes, and agentic APIs; Adobe integrates AI video generation tools into Premiere Pro, bringing generative video directly into the editing suite; a global coalition endorses international guidelines for the responsible use of AI in military applications; and researchers develop a method enabling large language models to accurately process and answer questions from complex spreadsheets.


Subscribe to Data Points

Your accelerated guide to AI news and research