Mar 05, 2025

6 Posts

Diagram of an RQ-Transformer speech system with Helium and Depth Transformers for audio processing.
Mar 05, 2025

Wait Your Turn! Conversation by Voice Versus Text: Text interactions require taking turns, but voices may interrupt or overlap. Here’s how AI is evolving for voice interactions.

Continuing our discussion on the Voice Stack, I’d like to explore an area that today’s voice-based systems mostly struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.
Diagram of an RQ-Transformer speech system with Helium and Depth Transformers for audio processing.
Mar 05, 2025

GPT-4.5 Goes Big, Claude 3.7 Reasons, Alexa+ Goes Agentic, Generating Text Like an Image

The Batch AI News and Insights: Continuing our discussion on the Voice Stack, I’d like to explore an area that today’s voice-based systems mostly struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.
Amazon smart display with widgets for recipes, calendar, weather, events, and streaming (Prime Video, Netflix, Disney+).
Mar 05, 2025

Amazon’s Next-Gen Voice Assistant: Alexa+ adds generative AI and agents, using Claude and other models

Amazon announced Alexa+, a major upgrade to its long-running voice assistant.
Table comparing Claude 3.7, 3.5, o1, o3-mini, DeepSeek R1, and Grok 3 Beta on reasoning, coding, tools, visuals, and math.
Mar 05, 2025

Budget for Reasoning to the Token: Claude 3.7 Sonnet introduces hybrid reasoning and extended thinking

Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.
Table comparing GPT-4.5, GPT-4o, and o3-mini on GPQA, AIME 2024, MMLU, MMMU, and coding tests.
Mar 05, 2025

OpenAI’s GPT-4.5 Goes Big: OpenAI releases GPT-4.5—its most powerful non-reasoning model yet

OpenAI launched GPT-4.5, which may be its last non-reasoning model.
Table comparing AI models on throughput, HumanEval, MBPP, EvalPlus, MultiPL-E, and code completion.
Mar 05, 2025

Text Generation by Diffusion: Mercury Coder may be the first commercially available Language Diffusion Model

Typical large language models are autoregressive, predicting the next token, one at a time, from left to right. A new model hones all text tokens at once.

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox