Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- Nvidia promises to open source Run:ai
- SALT inverts distillation by having a smaller model train a larger one
- SWE-Gym offers a new way to fine-tune coding agents
- Llama put to work to recommend books on Scribd
But first:
ElevenLabs introduces Flash, a low-latency speech generation model
ElevenLabs unveiled a new AI model that generates speech in as little as 75 milliseconds (plus application and network latency). The model is available in two versions: Flash v2 for English and Flash v2.5 for 32 languages, both accessible through ElevenLabs’ Conversational AI platform or API. While Flash sacrifices some quality and emotional depth compared to ElevenLabs’ Turbo models, it outperforms comparable ultra-low-latency models in blind tests, making it a strong fit for developers building real-time conversational AI applications. (ElevenLabs)
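Developers who want to measure that latency themselves can call ElevenLabs’ documented text-to-speech REST endpoint and time the round trip. A minimal sketch follows; the model_id string follows ElevenLabs’ published naming for the multilingual Flash model, and the API key and voice ID are placeholders you supply yourself.

```python
import time
import requests

API_KEY = "your-elevenlabs-api-key"  # placeholder: use your own key
VOICE_ID = "your-voice-id"           # placeholder: any voice in your account

# ElevenLabs' standard text-to-speech endpoint; model_id selects Flash.
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
payload = {
    "text": "Hello! How can I help you today?",
    "model_id": "eleven_flash_v2_5",  # assumed ID per ElevenLabs' naming; check current docs
}
headers = {"xi-api-key": API_KEY}

start = time.perf_counter()
response = requests.post(url, json=payload, headers=headers)
elapsed_ms = (time.perf_counter() - start) * 1000

response.raise_for_status()
with open("reply.mp3", "wb") as f:
    f.write(response.content)

# The measured time includes network latency on top of the ~75 ms generation.
print(f"Round trip: {elapsed_ms:.0f} ms, audio bytes: {len(response.content)}")
```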
Falcon3 models push boundaries for smaller model performance
The Technology Innovation Institute in Abu Dhabi released Falcon3, a family of large language models, none larger than 10 billion parameters. The new models, which include five base versions ranging from 1 billion to 10 billion parameters, rely on a single pre-training run plus depth up-scaling and knowledge distillation to improve performance while reducing training costs. Falcon3 models demonstrate strong capabilities in areas such as math, coding, and scientific knowledge, outperforming larger models on several benchmarks and offering AI developers more efficient open options for their applications. (Hugging Face)
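Because the checkpoints are distributed on Hugging Face, trying one takes a few lines of transformers code. A minimal sketch, assuming the tiiuae/Falcon3-7B-Instruct repository name from TII’s announcement (verify the exact ID on the Hub):

```python
# Load a Falcon3 checkpoint with Hugging Face transformers and generate text.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-7B-Instruct"  # assumed repo name; confirm on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "What is the derivative of x**3 + 2*x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```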
Nvidia acquires Run:ai, will open source its GPU orchestration software
Nvidia finalized its acquisition of Run:ai, a GPU orchestration software company, for a reported $700 million. Run:ai’s founders, Omri Geller and Ronen Dar, announced plans to open source the company’s software while maintaining their “open-platform philosophy” and continuing to support multiple AI chips and platforms. The acquisition further strengthens Nvidia’s position in AI infrastructure and puts pressure on competitors like AMD and Intel to respond. (Yahoo and Run:ai)
Google DeepMind’s SALT method speeds up large language model training
Researchers at Google DeepMind introduced SALT (Small model Aided Large model Training), a novel approach that uses smaller language models to improve the efficiency of training large language models (LLMs). The two-phase method leverages smaller models to provide soft labels and select valuable data subsets, reducing computational requirements by 28 percent while improving model performance. SALT-trained LLMs outperformed baseline models on various benchmarks, including reading comprehension and commonsense reasoning, demonstrating better generalization capabilities. This technique could help democratize access to advanced AI technologies by making LLM development more accessible to institutions with limited computational resources. (arXiv)
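The paper’s full recipe includes data selection and transition schedules, but the core two-phase idea can be sketched as a single loss function. Below is a hypothetical PyTorch sketch, not the authors’ code, assuming Hugging Face-style causal language models whose forward pass returns logits:

```python
# Minimal sketch of SALT's two phases (an illustration, not the paper's code).
# Assumes causal LMs whose forward pass returns .logits and batches whose
# labels are already shifted for next-token prediction.
import torch
import torch.nn.functional as F

def salt_loss(large_model, small_model, batch, step, distill_steps, T=2.0):
    input_ids, labels = batch["input_ids"], batch["labels"]
    logits = large_model(input_ids).logits  # (batch, seq, vocab)

    if step < distill_steps:
        # Phase 1: distill from the *smaller* model's soft labels.
        with torch.no_grad():
            teacher_logits = small_model(input_ids).logits
        loss = F.kl_div(
            F.log_softmax(logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
    else:
        # Phase 2: standard self-supervised next-token cross-entropy.
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
    return loss
```

The inversion is the interesting part: distillation usually flows from a large teacher to a small student, whereas here the small model’s soft labels guide the large model’s early training before standard training takes over.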
New environment SWE-Gym fine-tunes software engineering agents
Researchers at UC Berkeley, UIUC, Carnegie Mellon, and Apple developed SWE-Gym, a novel environment for training software engineering AI agents. Using 2,438 real-world Python tasks from GitHub issues, SWE-Gym offers pre-configured executable environments and expert-validated test cases, addressing limitations of previous benchmarks that lacked comprehensive training environments. Post-training with SWE-Gym significantly improved AI agents’ performance on existing benchmarks, with fine-tuned models showing increased task resolution rates and reduced failures in real-world settings. (arXiv)
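SWE-Gym’s executable environments make a rejection-sampling loop natural: roll out an agent on real tasks, keep only the trajectories that pass the validated tests, and fine-tune on the successes. The schematic sketch below assumes that pattern; the dataset ID, task fields, and the run_agent/run_tests helpers are all hypothetical placeholders for the paper’s actual release and your own agent scaffold.

```python
# Schematic rejection-sampling loop over SWE-Gym-style tasks.
from datasets import load_dataset

def run_agent(problem_statement: str, repo: str) -> str:
    """Placeholder: run your coding agent, return a candidate patch."""
    raise NotImplementedError

def run_tests(task: dict, patch: str) -> bool:
    """Placeholder: apply the patch in the task's container, run its tests."""
    raise NotImplementedError

tasks = load_dataset("SWE-Gym/SWE-Gym", split="train")  # assumed dataset ID

successes = []
for task in tasks:
    patch = run_agent(task["problem_statement"], task["repo"])  # assumed fields
    if run_tests(task, patch):  # expert-validated test cases decide success
        successes.append({"task": task, "patch": patch})

# `successes` becomes fine-tuning data for the next round of the agent.
```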
Llama models power Scribd’s new AI book discovery tool
Scribd enhanced Everand’s Ask AI feature using three open source Llama models to improve content discovery across its library of over 195 million items. The new system combines Llama 3.1’s 8B, 70B, and 405B models to create a more intuitive AI assistant that understands user intent and provides personalized recommendations. This new tool highlights the potential of open source AI models to change how users interact with large digital libraries, offering more precise and engaging content discovery experiences. (Meta)
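Meta’s post doesn’t detail exactly how Scribd wires the three model sizes together, but a common pattern for combining them is a cascade: a small model handles cheap intent classification and easy queries, escalating harder ones to a larger model. The sketch below illustrates that generic pattern only; llama_call and the routing logic are hypothetical.

```python
# Generic model-cascade sketch (not Scribd's published architecture).
def llama_call(model: str, prompt: str) -> str:
    """Placeholder: send `prompt` to a hosted Llama model, return its reply."""
    raise NotImplementedError

def recommend(query: str) -> str:
    # Cheap first pass: the 8B model classifies the user's intent.
    intent = llama_call("llama-3.1-8b", f"Classify this request: {query}")
    if "simple" in intent.lower():
        # Easy lookups stay on the small, fast model.
        return llama_call("llama-3.1-8b", f"Recommend titles for: {query}")
    # Nuanced requests escalate to the 70B model; the 405B model might
    # instead generate offline training or evaluation data.
    return llama_call("llama-3.1-70b", f"Recommend titles for: {query}")
```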
Still want to know more about what matters in AI right now?
Read last week’s special issue of The Batch for an inspiring glimpse into AI’s potential in 2025, featuring insights from leading experts on generative AI, cinematic creativity, generalized intelligence, and the future of prosocial platforms.
In last week’s letter to readers and learners, Andrew Ng highlighted the excitement around AI’s potential in 2025, emphasizing the ease of building software prototypes with AI-assisted coding and its impact on productivity, creativity, and learning. He encouraged readers to make a learning plan, build prototypes, and embrace the fun and educational journey of creating with AI.
“Even small wins — like the flash cards I printed out, which inspired my daughter to spend an extra 20 minutes practicing her multiplication table last night — make life better. Perhaps you’ll invent something that really takes off. And even if you don’t, you’ll have fun and learn a lot along the way.”
Read Andrew’s full letter here.
Our New Year special issue explores the transformative potential of AI in 2025:
- Generative AI liberating artists to focus on creativity while ensuring safety and accessibility
- Video models revolutionizing cinematic storytelling with integrated audio and video
- AGI driving personalized and contextual interactions
- Data-efficient models enabling broader accessibility and sustainability
- Autonomous agents taking meaningful actions to simplify our lives and enhance productivity
- AI-powered platforms fostering empathy, collaboration, and unity in digital spaces