Molmo’s impressive open multimodal models
Llama 3.2 adds vision models and small LLMs

Published: Sep 27, 2024
Reading time: 4 min read

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • Google’s top Gemini models cut prices, boost performance
  • Microsoft’s new approach to correct hallucinations
  • OpenAI releases a multilingual training dataset
  • Chain-of-thought reasoning has limits

But first:

Ai2 (slightly) beats Meta in releasing open vision-language models

Molmo, a series of open multimodal AI models, matched or exceeded proprietary systems like GPT-4 on various benchmarks. The 72-billion-parameter model outperformed Gemini 1.5 Pro and Claude 3.5 Sonnet on academic tests and certain vision benchmarks, while the smaller 7-billion-parameter models scored between GPT-4V and GPT-4o. Even the 1-billion-parameter MolmoE-1B model nearly matched GPT-4V’s capabilities. This development demonstrates that vision models trained on fully open, high-quality datasets can compete with closed systems built using massive computational resources. (Ai2)

Meta’s Llama 3.2 goes multimodal

Meta released Llama 3.2, a family of vision-capable large language models and lightweight text models for edge devices. The new lineup includes 11-billion- and 90-billion-parameter multimodal models that can reason about images, outperforming Claude 3 Haiku and GPT-4o mini on visual understanding tasks. Meta also launched 1-billion- and 3-billion-parameter models optimized for on-device use with 128K-token context lengths, with the 3B model outperforming Gemma 2 2.6B and Phi 3.5-mini on tasks like instruction following and summarization. The company also introduced Llama Stack distributions to simplify deployment across various environments, and updated its safety tools, including Llama Guard 3 for vision tasks. (Meta AI)

Google’s Gemini 1.5 Pro and Flash get updates and price cuts

Google announced updated versions of Gemini 1.5 Pro and Gemini 1.5 Flash, offering performance improvements and cost reductions. The new models show a 7% increase in MMLU-Pro scores and approximately 20% improvement on MATH and HiddenMath benchmarks, along with 2-7% gains in visual understanding and Python code generation tests. Google also announced a 50% price cut for Gemini 1.5 Pro, plus increased rate limits and faster output speeds for both models. These updates enable developers to process longer documents, analyze extensive codebases, and create content from hour-long videos more efficiently and at a lower cost. (Google)

Microsoft’s “Correction” seeks to fix LLM hallucinations and other errors

Microsoft introduced “Correction,” a new Azure AI Content Safety feature that uses a two-model approach to detect and revise ungrounded AI-generated content. A classifier model first identifies potentially incorrect, fabricated, or irrelevant text snippets, then a language model rewrites the flagged sections to align with specified grounding documents. The system can be used with various text-generating AI models including Meta’s Llama and OpenAI’s GPT-4, and is built to enhance the reliability of AI outputs in fields like medicine or science where accuracy is crucial. Critics argue that this approach doesn’t address the fundamental issue of AI hallucinations and may create a false sense of security, potentially introducing new problems as the correction models themselves could be prone to errors. (Microsoft and TechCrunch)
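
To make the detect-then-rewrite idea concrete, here is a minimal sketch in Python. It is not Microsoft's implementation and does not call Azure AI Content Safety. The function names (`support`, `correct`), the word-overlap score, and the 0.9 threshold are all assumptions chosen for illustration: a word-overlap check stands in for the classifier model, and swapping in the closest grounding sentence stands in for the rewriting language model.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for real text features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text):
    """Naive sentence splitter on ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def support(sentence, grounding_sentences):
    """Return (score, source): the share of the sentence's words found in
    the closest grounding sentence, and that grounding sentence."""
    words = tokens(sentence)
    best_score, best_source = 0.0, None
    for source in grounding_sentences:
        score = len(words & tokens(source)) / max(len(words), 1)
        if score > best_score:
            best_score, best_source = score, source
    return best_score, best_source


def correct(draft, grounding_docs, threshold=0.9):
    """Hypothetical two-stage pass: flag weakly supported sentences,
    then revise them against the grounding documents."""
    grounding_sentences = [
        s for doc in grounding_docs for s in split_sentences(doc)
    ]
    corrected, flagged = [], []
    for sentence in split_sentences(draft):
        # Stage 1 (stand-in for the classifier): is the sentence grounded?
        score, source = support(sentence, grounding_sentences)
        if score >= threshold:
            corrected.append(sentence)
            continue
        flagged.append(sentence)
        # Stage 2 (stand-in for the rewriting model): use the closest
        # grounding sentence, or drop the claim if nothing overlaps.
        if source is not None:
            corrected.append(source)
    return " ".join(corrected), flagged


if __name__ == "__main__":
    docs = ["The clinic opens at 9 am. It closes at 5 pm."]
    draft = "The clinic opens at 9 am. It closes at 8 pm."
    text, flagged = correct(draft, docs)
    print("Flagged:", flagged)
    print("Corrected:", text)
```

The sketch also illustrates the critics' point: both stages are themselves fallible, so a sentence that overlaps heavily with the grounding text but changes one key detail can slip past a weak detector.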

OpenAI makes one of its multilingual datasets available to developers and researchers

OpenAI released a dataset of 100 million human-written sentences across 514 languages to help train AI models in non-English languages. The dataset, called OpenAI Translate, was created by translating English texts into other languages using GPT-4 and human reviewers. This release aims to address the global language divide in AI development and improve language models’ capabilities in underrepresented languages. (VentureBeat)

Research suggests chain-of-thought works best for limited subjects

Researchers at UT-Austin, Princeton, and Johns Hopkins analyzed over 100 papers and tested 14 AI models to determine when asking AI to explain its reasoning improves performance. They found that chain-of-thought prompting mainly helps with math and logic tasks but offers little benefit for other problems like language understanding, common sense reasoning, or factual recall. This finding suggests AI developers can use this method selectively to save resources and points to the need for new approaches to enhance reasoning across various tasks. (arXiv)
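
One way a developer might apply this finding is to add a chain-of-thought instruction only for task types where it tends to help. The sketch below is an illustration, not the researchers' method: the task labels, the `build_prompt` helper, and the instruction wording are assumptions, and a real system would need its own way of deciding which category a request falls into.

```python
# Task types where, per the finding above, step-by-step reasoning tends to help.
# The label names are hypothetical.
COT_TASKS = {"math", "logic"}

# A commonly used chain-of-thought instruction; the exact wording is an assumption.
COT_SUFFIX = "Let's think step by step."


def build_prompt(question, task_type):
    """Append a chain-of-thought instruction only for math and logic tasks."""
    question = question.strip()
    if task_type.lower() in COT_TASKS:
        return f"{question}\n{COT_SUFFIX}"
    # Other tasks get a direct prompt, which avoids generating extra
    # reasoning tokens that are unlikely to improve the answer.
    return question


if __name__ == "__main__":
    print(build_prompt("What is 17 * 24?", "math"))
    print(build_prompt("Who wrote Hamlet?", "factual_recall"))
```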


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng discussed AI’s transformative potential in education, highlighting Coursera’s generative AI tools and the ongoing need for innovation in the field.

“There has been a lot of hype about generative AI’s ability to transform industries overnight. Certainly many industries — including education — will be transformed. But we’re about 15 years into the deep learning revolution, and we’re not yet done identifying and building useful deep learning applications. Despite the exciting progress to date with generative AI, I expect that a decade from now we will still be far from finished identifying and building generative AI applications for education and numerous other sectors.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: California passed new laws regulating deepfakes, a local move that could influence national and global legislation; Qwen 2.5 continues the trend of ever-improving open-source large language models; Lionsgate, the studio behind blockbuster franchises like The Hunger Games and John Wick, is embracing video generation technology with the help of AI startup Runway; and a robot capable of playing table tennis is beating human beginners while entertaining expert players.


Subscribe to Data Points

Your accelerated guide to AI news and research