EAGLE-3 speeds up language models, and the 2024 Turing Award goes to…

Published Mar 10, 2025 · 3 min read

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • Music and lyrics in one diffusion model
  • Manus AI’s impressive demos spark excitement and backlash
  • OpenAI sees AGI as a gradual evolution
  • Google unveils its first Gemini-branded embedding models

But first:

EAGLE-3 introduces new techniques for accelerating inference

Researchers at Peking University, Microsoft, and elsewhere developed EAGLE-3, an updated method for speculative sampling that aims to speed up large language model inference. The approach removes feature prediction constraints and introduces a “training-time test” technique for directly predicting draft tokens. EAGLE-3 also incorporates a fusion of low, middle, and high-level features from the target model, moving beyond the use of only top-layer features. Experiments show EAGLE-3 achieves faster inference speeds compared to standard autoregressive decoding and previous speculative sampling methods across various tasks and model sizes. (arXiv)
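
The paper has the full details; for context, here is a minimal NumPy sketch of the vanilla speculative-sampling accept/reject rule that draft-and-verify methods like EAGLE build on. Everything below is illustrative, not EAGLE-3's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft, drafted_token):
    """One accept/reject step of speculative sampling.

    p_target, q_draft: target and draft model distributions over the vocabulary
    drafted_token: token index proposed by the draft model (sampled from q_draft)
    Returns (token, accepted).
    """
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p_target[drafted_token] / q_draft[drafted_token]):
        return drafted_token, True
    # On rejection, resample from the normalized residual max(0, p - q).
    # This correction keeps the output distribution exactly equal to p_target,
    # so speculation changes speed but not sampling quality.
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual), False

# Toy demo with a made-up 4-token vocabulary:
p = np.array([0.4, 0.3, 0.2, 0.1])      # target model's distribution
q = np.array([0.25, 0.25, 0.25, 0.25])  # draft model's distribution
print(speculative_step(p, q, drafted_token=1))
```

The draft model proposes several tokens cheaply, and the target model verifies them in a single forward pass, so every accepted token is one the expensive model didn't have to generate serially. EAGLE-3's contribution is making those drafts cheaper to produce and more likely to be accepted.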

Reinforcement learning pioneers honored with top computing award 

Andrew Barto and Richard Sutton won the 2024 Turing Award for their groundbreaking work on reinforcement learning, a method for AI systems to learn from digital rewards and punishments. Their research, which began in the late 1970s, laid the foundation for major AI breakthroughs like AlphaGo and ChatGPT. The $1 million prize acknowledges Barto and Sutton’s role in developing a fundamental AI technique that continues to shape the field’s rapid advancement and future potential. (Association for Computing Machinery)

DiffRhythm generates full-length songs within seconds

Chinese researchers developed DiffRhythm, a diffusion-based model capable of generating complete songs up to 4 minutes 45 seconds long, including both vocals and accompaniment. The model uses a variational autoencoder to compress audio into latent representations, which are then generated by a diffusion transformer conditioned on lyrics and style prompts. DiffRhythm can produce high-quality 4-minute songs in just 10 seconds, significantly faster than previous language model approaches. The researchers released their model and code under a noncommercial license. (GitHub and arXiv)
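
As a rough illustration of the latent-diffusion pipeline the paper describes (start from noise, iteratively denoise the latents under lyric/style conditioning, then decode back to audio with the VAE), here is a toy PyTorch sketch. All module names and dimensions are made up and do not come from the DiffRhythm repo:

```python
import torch
import torch.nn as nn

LATENT_DIM = 64   # toy size; the real model's latent is much larger
FRAMES = 128      # stand-in for a full song's worth of latent frames
STEPS = 32        # number of denoising steps

class ToyDenoiser(nn.Module):
    """Stand-in for the diffusion transformer: maps the current latent plus
    lyric/style conditioning to a cleaner latent estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * LATENT_DIM, 256), nn.GELU(),
            nn.Linear(256, LATENT_DIM),
        )

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond.expand_as(z)], dim=-1))

denoiser = ToyDenoiser()
vae_decode = nn.Linear(LATENT_DIM, 1)   # stand-in for the VAE's audio decoder

cond = torch.randn(1, 1, LATENT_DIM)    # pretend lyric + style embedding
z = torch.randn(1, FRAMES, LATENT_DIM)  # start from pure noise

with torch.no_grad():
    for _ in range(STEPS):
        # Each step nudges the latent toward the denoiser's estimate.
        z = z + (denoiser(z, cond) - z) / STEPS
    audio = vae_decode(z)                # latents -> waveform samples

print(audio.shape)  # torch.Size([1, 128, 1])
```

The speed claim follows from this design: diffusion denoises all latent frames in parallel over a few dozen steps, rather than generating audio tokens one at a time the way a language model would.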

Manus AI attracts plenty of attention but little consensus

Chinese startup The Butterfly Effect launched Manus, an AI agent platform that uses Claude and various undisclosed models to autonomously perform complex tasks without human oversight. Despite generating significant buzz and comparisons to breakthroughs like DeepSeek, some early users report Manus struggling with basic requests and crashing frequently. Manus is still in private preview, and the widely differing reports seem to stem from a combination of users’ limited access and very different expectations. (Manus, Forbes, and TechCrunch)

OpenAI outlines its evolving approach to AI safety and alignment

OpenAI detailed its current principles for ensuring artificial general intelligence benefits humanity. The company now views AGI development as a continuous process rather than a sudden leap, emphasizing iterative deployment to learn from real-world usage. OpenAI’s core safety principles include embracing uncertainty, layering multiple safeguards, developing scalable alignment methods, maintaining human control, and collaborating with the wider AI community. (OpenAI)

Gemini Embedding model tops multilingual benchmarks

Google unveiled a new experimental Gemini Embedding text model, available through the Gemini API, which outperforms its previous embedding models and tops the Massive Text Embedding Benchmark (MTEB) Multilingual leaderboard. The model features an 8K-token input limit, 3K output dimensions, and support for over 100 languages, making it suitable for diverse tasks like retrieval-augmented generation and text classification. This release gives developers early access to Gemini Embedding capabilities, with Google working toward a stable, generally available version in the coming months. (Google)
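
For developers who want to try it, the call below uses the google-generativeai Python package's embed_content function; the model ID shown is the experimental name announced at launch and may change once the stable version ships:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

result = genai.embed_content(
    model="models/gemini-embedding-exp-03-07",  # experimental ID at launch
    content="Twice a week, Data Points brings you the latest AI news.",
    task_type="retrieval_document",  # or retrieval_query, classification, clustering
)
vector = result["embedding"]
print(len(vector))  # embedding dimensionality (up to 3K)
```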


Still want to know more about what matters in AI right now?

Read last week’s issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng discussed the challenges of Voice Activity Detection (VAD) in noisy environments and highlighted Moshi, a model that continuously listens and decides when to speak, eliminating the need for explicit turn-taking detection. He emphasized ongoing innovations in voice AI and the potential for improved voice-to-voice interactions.

“Just as the architecture of text-only transformers has gone through many evolutions (such as encoder-decoder models, decoder-only models, and reasoning models that generate a lot of ‘reasoning tokens’ before the final output), voice models are going through a lot of architecture explorations.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: Mercury Coder released a fast text generator with a non-transformer architecture, introducing what may be the first commercially available Language Diffusion Model; OpenAI unveiled GPT-4.5, its most powerful non-reasoning model to date, promising enhanced performance and efficiency; Claude 3.7 Sonnet introduced a budget for reasoning tokens, a hybrid approach to reasoning models; and Amazon launched Alexa+, integrating generative AI and intelligent agents powered by Claude and other models to create a more advanced voice assistant.


Subscribe to Data Points

Your accelerated guide to AI news and research