Mistral’s Ministral family tops other local models, OpenAI’s Swarm orchestrates teams of agents

Published Oct 18, 2024 · Reading time: 3 min
[Image: A programmer working at a computer in a futuristic factory, with robotic arms assembling machines in the background.]

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • NotebookLM lets you add prompts for better podcasts
  • Apple researchers doubt whether current LLMs can reason
  • OpenAI tests ChatGPT for gender and racial bias
  • Perplexity squares off with The New York Times

But first:

Mistral releases powerful new models for local and edge computing

Mistral AI introduced two new language models, Ministral-3B and Ministral-8B, designed to run on personal computers and smaller devices. Both models outperform competitors of similar size on knowledge, reasoning, and coding benchmarks. Released under the Mistral Research License (a paid license is required for commercial use), the models accommodate a 128,000-token context window, offer multilingual and code capabilities, and enable function calling. These features make them a strong option for researchers and developers building local AI applications. (Mistral AI)
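If you want to try the 8B model on your own hardware, a minimal sketch using Hugging Face transformers might look like the following. The repository ID is an assumption based on Mistral’s naming conventions; check the official release page for the exact identifier and license terms.

```python
# Minimal sketch: local inference with Ministral-8B via transformers.
# The repo ID below is assumed; verify it against Mistral's release page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Ministral-8B-Instruct-2410"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Write a haiku about edge devices."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```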

Swarm helps developers experiment with multi-agent systems

OpenAI’s open-source experimental framework Swarm showcases how multiple AI agents can work together smoothly. While not officially supported or intended for production use, it serves as an educational tool for developers exploring multi-agent systems. Swarm uses two key concepts: agents (with defined instructions and tools) and handoffs (allowing agents to pass tasks to one another). Built on OpenAI’s Chat Completions API, Swarm operates statelessly between calls. It’s particularly useful for scenarios requiring diverse, independent capabilities: example cases include customer service, personal shopping, and weather forecasting. (GitHub)
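Here’s a minimal example of the two concepts in action, adapted from the patterns in the Swarm repository’s README: a handoff happens when one agent’s tool function returns another agent.

```python
from swarm import Swarm, Agent

client = Swarm()

def transfer_to_agent_b():
    # Returning an Agent from a tool call hands the conversation off.
    return agent_b

agent_a = Agent(
    name="Agent A",
    instructions="You are a helpful agent.",
    functions=[transfer_to_agent_b],
)

agent_b = Agent(
    name="Agent B",
    instructions="Only speak in haikus.",
)

response = client.run(
    agent=agent_a,
    messages=[{"role": "user", "content": "I want to talk to agent B."}],
)
print(response.messages[-1]["content"])
```

Because Swarm is stateless between calls, the returned messages (and the current agent) are everything you need to carry the conversation forward.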

NotebookLM adds new features and a pilot to test future tools

Google removed the “Experimental” label from NotebookLM, its AI-powered tool for understanding complex information. The company introduced new features for Audio Overviews, including the ability to guide conversations and listen in the background while working within the app. Google also announced NotebookLM Business, an upcoming version for organizations with enhanced features, and opened applications for a pilot program for business users. (Google)

New test shows flaws in AI models’ math and logic abilities

Apple researchers developed GSM-Symbolic, a benchmark that generates variants of grade-school math problems from symbolic templates to assess large language models’ mathematical reasoning skills. Their study found that even state-of-the-art models show significant performance variations across different versions of the same math problem: accuracy decreased when numerical values were altered or question complexity increased, and adding irrelevant information to problems led to substantial performance drops across all tested models. These findings suggest that current AI systems may not truly understand mathematical concepts or perform logical reasoning, but instead rely on sophisticated pattern matching learned from their training data. (arXiv)
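To make the templating idea concrete, here’s a toy sketch (not the paper’s code): each problem becomes a template whose names and numbers are re-sampled, so a model that genuinely reasons should score the same on every variant.

```python
# Toy illustration of symbolic problem templating, GSM-Symbolic style.
import random

TEMPLATE = ("{name} picks {x} apples on Monday and {y} apples on Tuesday. "
            "How many apples does {name} have in total?")

def make_variant(rng: random.Random) -> tuple[str, int]:
    # Re-sample the surface details; the underlying logic stays fixed.
    x, y = rng.randint(2, 50), rng.randint(2, 50)
    name = rng.choice(["Sophia", "Liam", "Ava", "Noah"])
    question = TEMPLATE.format(name=name, x=x, y=y)
    answer = x + y  # ground truth recomputed from the sampled values
    return question, answer

rng = random.Random(0)
for question, answer in (make_variant(rng) for _ in range(3)):
    print(question, "->", answer)
```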

OpenAI releases study of first-person bias in its own systems

OpenAI researchers examined how ChatGPT’s responses varied when given identical prompts under usernames that signal different genders, races, or ethnicities. The study found that different names frequently elicited different responses, but less than 0.1 percent of responses on average contained harmful stereotypes, with older models showing rates of up to 1 percent for certain tasks. The paper shows how OpenAI’s use of human feedback in post-training helped mitigate these biases. The work provides a benchmark for measuring bias in AI language models and highlights the importance of ongoing efforts to improve fairness in AI systems. (OpenAI)
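A toy illustration of the paired-prompt idea (not OpenAI’s actual methodology): send the same request under different names and compare the replies. The model name, the name list, and the system-message framing here are all placeholders.

```python
# Toy sketch of name-based bias probing: identical prompt, varying names.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
PROMPT = "Suggest three career paths for me based on my interest in biology."

# Hypothetical names for illustration; a real study would use names that
# statistically signal gender, race, or ethnicity.
for name in ["Emily", "Lakisha", "Wei", "Carlos"]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": f"The user's name is {name}."},
            {"role": "user", "content": PROMPT},
        ],
    )
    print(f"{name}: {response.choices[0].message.content[:100]}")
```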

The New York Times and Perplexity clash over news summaries

The New York Times sent a cease-and-desist letter to Perplexity AI, demanding the startup stop using the newspaper’s content for generative AI purposes, claiming copyright violations. Perplexity responded that it doesn’t scrape data for building foundation models, but instead indexes web pages and surfaces factual content as citations when users ask questions. This marks the latest in a series of disputes between Perplexity and news publishers, highlighting anxieties over AI search engines and summaries of copyrighted material. (Reuters)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng argued for considering geoengineering as an important potential tool to mitigate climate change.

“While stratospheric aerosol injection (SAI) — which sprays particles (aerosols) in the atmosphere to provide a small amount of shade from the sun — is far from a perfect solution, we should take it seriously as a possible tool for saving lives.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth:

  • Malaysia experiences a data center boom driven by its strategic location, natural resources, and investor-friendly policies
  • The U.S. launches Operation AI Comply to crack down on AI applications that overpromise and underdeliver
  • A new report highlights the contending forces shaping AI, including the battle between open and proprietary technology
  • Researchers introduce a better text embedding model with adapters specialized for tasks like retrieval, clustering, and text classification


Subscribe to Data Points

Your accelerated guide to AI news and research