AI Agents

16 Posts

Science Research Proposals Made to Order: AI Co-Scientist, an agent that generates research hypotheses, aiding drug discovery

An AI agent synthesizes novel scientific research hypotheses. It's already making an impact in biomedicine.

Table comparing Claude 3.7, 3.5, o1, o3-mini, DeepSeek R1, and Grok 3 Beta on reasoning, coding, tools, visuals, and math.

Budget for Reasoning to the Token: Claude 3.7 Sonnet adds extended thinking mode

Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.

A person typing a prompt in an AI-powered mobile app with a button to improve the input.

Mobile Apps to Order: Replit’s agent-powered mobile app expands to full app development

Replit, an AI-driven integrated development environment, updated its mobile app so that it can generate other mobile apps to order.

Diagram showing GPT-4o with and without search, highlighting task execution success and failure.

Tree Search for Web Agents: How tree search improves AI agents’ ability to browse the web and complete tasks

Browsing the web to achieve a specific goal can be challenging for agents based on large language models and even for vision-language models that can process onscreen images of a browser.

ChatGPT interface drafting a research report on retail trends, including AI, e-commerce, and inflation.

Agents Go Deep: OpenAI’s Deep Research agent generates detailed reports by analyzing web sources

OpenAI introduced a state-of-the-art agent that produces research reports by scouring the web and reasoning over what it finds.

Diagram illustrating Moshi’s use of an LLM to process user audio input, inner monologue, and output.

Okay, But Please Don’t Stop Talking: Moshi, an open alternative to OpenAI’s Realtime API for Speech

Even cutting-edge, end-to-end, speech-to-speech systems like ChatGPT’s Advanced Voice Mode tend to get interrupted by interjections like “I see” and “uh-huh” that keep human conversations going. Researchers built an open alternative that’s designed to go with the flow of overlapping speech.

Flowchart illustrating the automation of opening, editing, and saving a Word document using PyAutoGUI.

Training for Computer Use: UI-TARS shows strong computer use capabilities in benchmarks

As Anthropic, Google, OpenAI, and others roll out agents that are capable of computer use, new work shows how underlying models can be trained to do this.

AI assistant processes ‘Find me a family-friendly campsite’ and suggests options.

Computer Use Gains Momentum: OpenAI’s Operator automates online tasks with a new AI agent

OpenAI introduced an AI agent that performs simple web tasks on a user’s behalf.

Mustafa Suleyman: Agents of action

In 2025, AI will have learned to see, it will be way smarter and more accurate, and it will start to do things on your behalf.

Santas in line with gifts and a ‘Photos with Santa’ sign.

Agents Ascendant: LLMs evolve with agentic workflows, enabling autonomous reasoning and collaboration

The AI community laid the foundation for systems that can act by prompting large language models iteratively, leading to much higher performance across a range of applications.

Berkeley Function Calling Leaderboard with metrics like accuracy, latency, and relevance.

Competitive Performance, Competitive Prices: Amazon introduces Nova models for text, image, and video

Amazon introduced a range of models that confront competitors head-on.

Flow diagram of an application using LLMs to process prompts and tools for responses.

Agents Open the Wallet: Stripe builds ecommerce agent toolkit for AI to securely spend money

One of the world’s biggest payment processors is enabling large language models to spend real money.

OpenDevin animation illustrating open-source AI model collaboration.

Free Agents: OpenHands launches as an open toolkit for advanced code generation and automation

An open-source package inspired by the commercial agentic code generator Devin aims to automate computer programming and more.

MLE-Bench workflow showing competition steps for model training, testing, and leaderboard scoring.

When Agents Train Algorithms: OpenAI’s MLE-bench tests AI coding agents

Coding agents are improving, but can they tackle machine learning tasks?

User retrieves vendor contact information to fill out a request form, verifying each entry.

Claude Controls Computers: Anthropic empowers Claude 3.5 Sonnet to operate desktop apps, but cautions remain

API commands for Claude 3.5 Sonnet enable Anthropic’s large language model to operate desktop apps much like humans do. Be cautious, though: It’s a work in progress.

