Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- Copilot may lead to more bugs than productivity gains
- OpenAI prunes its Whisper model for faster transcription
- A new study measures top AI companies’ red-teaming efforts
- An open-source CLI coding assistant, o1-engineer
But first:
Apple’s multimodal models focus on data curation
Apple introduced MM1.5, a new series of multimodal large language models designed to improve text-rich image understanding, visual referring and grounding, and multi-image reasoning. The models, ranging from 1 billion to 30 billion parameters, include dense and mixture-of-experts variants and demonstrate strong performance even at smaller scales. Apple’s approach focuses on careful data curation and training strategies, offering insights that could guide future research in multimodal large language model development. (arXiv)
Liquid announces benchmarks for a new family of math-driven language models
Liquid claims its new series of Liquid Foundation Models (LFMs) achieves state-of-the-art performance for their size classes (1.3B, 3.1B, and 40.3B parameters) on multiple benchmarks, with smaller memory footprints and more efficient inference. The models can output multiple media types, including text, audio, images, and video, using a process that converts raw data into structured feature representations. The models’ unusual architecture incorporates specialized computational units for token and channel mixing, adaptive linear operators, and weight- and feature-sharing mechanisms, which could lead to more versatile and resource-efficient AI systems. (Liquid)
Study suggests coding assistants may not boost productivity
A recent study by Uplevel found no significant productivity gains for developers using GitHub Copilot, contrary to widespread claims about AI coding assistants. The research, which compared the output of 800 developers before and after adopting Copilot, measured pull request cycle time and throughput, finding no substantial improvements. Additionally, the study revealed that Copilot usage introduced 41 percent more bugs, challenging the notion that AI coding tools consistently enhance developer efficiency and code quality. (CIO.com)
OpenAI speeds up Whisper model for quicker speech recognition
OpenAI released Whisper large-v3-turbo, a streamlined version of its leading speech recognition model. The new variant cuts the number of decoder layers from 32 to 4, significantly increasing speed with only a slight drop in quality. This gives AI developers a more efficient option for adding advanced speech recognition to their applications. (Hugging Face)
Evaluating AI leaders’ red-teaming and other safety measures
A new risk management assessment from Safer AI (part of the US AI Safety Consortium) reveals shortcomings in AI companies’ risk management practices. The report ranks companies on a 0-5 scale, with Meta, Mistral AI, and xAI scoring lowest at 0.7, 0.1, and 0, respectively, while Anthropic, OpenAI, and Google DeepMind lead with scores of 2.2, 1.6, and 1.5. These findings are based on the companies’ own disclosures of their red-teaming and risk management practices, suggesting that the lowest-scoring organizations have either been slow to implement standard safety measures or have not been transparent about them. (Safer AI)
New open-source CLI tool leverages o1-preview
O1-engineer is a command-line tool that uses OpenAI’s API to assist developers with code generation, file management, project planning, and code review. The tool features an interactive console, conversation history management, and enhanced file and folder operations to help streamline development workflows. O1-engineer can also automate routine tasks and provide intelligent support throughout the development process. (GitHub)
Still want to know more about what matters in AI right now?
Read last week’s issue of The Batch for in-depth analysis of news and research.
Last week, Andrew Ng celebrated Governor Newsom’s veto of SB 1047, California’s anti-innovation bill. He highlighted the efforts of the AI experts and advocates who worked to defeat the legislation and stressed the importance of evidence-based regulation in the field of AI.
“SB 1047 makes a fundamental mistake of trying to regulate technology rather than applications. It was also a very confusing law that would have been hard to comply with. That would have driven up costs without improving safety.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth: Meta expands its Llama Herd with updates to its Llama models, adding vision-language capabilities, edge sizes, and agentic APIs; Adobe integrates AI video generation tools into Premiere Pro, bringing generative video directly into the editing suite; a global coalition endorses international guidelines for the responsible use of AI in military applications; and researchers develop a method enabling large language models to accurately process and answer questions from complex spreadsheets.