AI Agents

8 Posts

Mustafa Suleyman: Agents of Action

In 2025, AI will have learned to see, it will be way smarter and more accurate, and it will start to do things on your behalf.

Agents Ascendant: LLMs evolve with agentic workflows, enabling autonomous reasoning and collaboration

By prompting large language models iteratively, the AI community laid the foundation for systems that can act, leading to much higher performance across a range of applications.

Competitive Performance, Competitive Prices: Amazon introduces Nova models for text, image, and video

Amazon introduced a range of models that confront competitors head-on.

Agents Open the Wallet: Stripe builds ecommerce agent toolkit for AI to securely spend money

One of the world’s biggest payment processors is enabling large language models to spend real money.

Free Agents: OpenHands launches as an open toolkit for advanced code generation and automation

An open-source package inspired by the commercial agentic code generator Devin aims to automate computer programming and more.

When Agents Train Algorithms: OpenAI’s MLE-bench tests AI coding agents

Coding agents are improving, but can they tackle machine learning tasks? 

Claude Controls Computers: Anthropic empowers Claude 3.5 Sonnet to operate desktop apps, but cautions remain

API commands for Claude 3.5 Sonnet enable Anthropic’s large language model to operate desktop apps much like humans do. Be cautious, though: It’s a work in progress.

Agentic Coding Strides Forward: Genie coding assistant outperforms competitors on SWE-bench by over 30 percent

An agentic coding assistant boosted the state of the art on an important benchmark by more than 30 percent.

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox