Mar 05, 2025

6 Posts

Diagram of an RQ-Transformer speech system with Helium and Depth Transformers for audio processing.

Mar 05, 2025

Wait Your Turn! Conversation by Voice Versus Text: Text interactions require taking turns, but voices may interrupt or overlap. Here’s how AI is evolving for voice interactions.

Continuing our discussion on the Voice Stack, I’d like to explore an area that today’s voice-based systems mostly struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.

Mar 05, 2025

GPT-4.5 Goes Big, Claude 3.7 Reasons, Alexa+ Goes Agentic, Generating Text Like an Image

The Batch AI News and Insights: Continuing our discussion on the Voice Stack, I’d like to explore an area that today’s voice-based systems mostly struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.

Amazon smart display with widgets for recipes, calendar, weather, events, and streaming (Prime Video, Netflix, Disney+).

Mar 05, 2025

Amazon’s Next-Gen Voice Assistant: Alexa+ adds generative AI and agents, using Claude and other models

Amazon announced Alexa+, a major upgrade to its long-running voice assistant.

Table comparing Claude 3.7, 3.5, o1, o3-mini, DeepSeek R1, and Grok 3 Beta on reasoning, coding, tools, visuals, and math.

Mar 05, 2025

Budget for Reasoning to the Token: Claude 3.7 Sonnet adds extended thinking mode

Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.

Table comparing GPT-4.5, GPT-4o, and o3-mini on GPQA, AIME 2024, MMLU, MMMU, and coding tests.

Mar 05, 2025

OpenAI’s GPT-4.5 Goes Big: OpenAI releases GPT-4.5, its most powerful non-reasoning model and maybe its last

OpenAI launched GPT-4.5, which may be its last non-reasoning model.

Table comparing AI models on throughput, HumanEval, MBPP, EvalPlus, MultiPL-E, and code completion.

Mar 05, 2025

Text Generation by Diffusion: Mercury Coder uses diffusion to generate text

Typical large language models are autoregressive: they predict the next token, one at a time, from left to right. A new model refines all the tokens of its output at once.
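The teaser does not describe how Mercury Coder works internally. The following toy Python sketch only illustrates the general contrast between the two decoding styles, and it rests on several assumptions. The "model" is a hypothetical lookup into a fixed target sequence. The parallel decoder uses a generic masked iterative-refinement scheme: it scores every masked position at once and commits the most confident few each step. This stands in for diffusion-style decoding in general, not for Mercury's actual method. All names (`toy_next_token`, `toy_denoise`, `per_step`) are hypothetical.

```python
# Toy contrast: autoregressive decoding vs. parallel iterative refinement.
# The "models" below are hypothetical stand-ins that know a fixed target;
# they are NOT Mercury Coder or any real LLM.

TARGET = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
MASK = "<mask>"


def toy_next_token(prefix):
    """Hypothetical autoregressive model: predicts the token after `prefix`."""
    return TARGET[len(prefix)]


def autoregressive_generate(length):
    """Emit exactly one token per step, strictly left to right."""
    tokens, steps = [], 0
    while len(tokens) < length:
        tokens.append(toy_next_token(tokens))
        steps += 1
    return tokens, steps


def toy_denoise(tokens):
    """Hypothetical denoiser: proposes (token, confidence) for EVERY masked slot at once."""
    proposals = {}
    for i, tok in enumerate(tokens):
        if tok == MASK:
            # Toy confidence heuristic: slots next to revealed tokens score higher.
            revealed = [j for j in (i - 1, i + 1)
                        if 0 <= j < len(tokens) and tokens[j] != MASK]
            proposals[i] = (TARGET[i], len(revealed))
    return proposals


def parallel_refine_generate(length, per_step):
    """Start fully masked; each step commits the `per_step` most confident proposals."""
    tokens, steps = [MASK] * length, 0
    while MASK in tokens:
        proposals = toy_denoise(tokens)
        # Highest confidence first; ties broken by position.
        best = sorted(proposals, key=lambda i: (-proposals[i][1], i))[:per_step]
        for i in best:
            tokens[i] = proposals[i][0]
        steps += 1
    return tokens, steps


if __name__ == "__main__":
    ar_tokens, ar_steps = autoregressive_generate(len(TARGET))
    pr_tokens, pr_steps = parallel_refine_generate(len(TARGET), per_step=4)
    print(" ".join(ar_tokens), "| steps:", ar_steps)
    print(" ".join(pr_tokens), "| steps:", pr_steps)
```

Both decoders produce the same 12 tokens. The autoregressive loop needs 12 sequential steps, while the refinement loop needs 3 because it fills 4 positions per pass. In a real system, each step's cost and the quality of the parallel proposals determine whether that reduction in steps translates into faster generation.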
