AI Agents

6 Posts

Competitive Performance, Competitive Prices: Amazon introduces Nova models for text, image, and video

Amazon introduced a range of models that confront competitors head-on.

Agents Open the Wallet: Stripe builds ecommerce agent toolkit for AI to securely spend money

One of the world’s biggest payment processors is enabling large language models to spend real money.

Free Agents: OpenHands launches as an open toolkit for advanced code generation and automation

An open source package inspired by the commercial agentic code generator Devin aims to automate computer programming and more.

When Agents Train Algorithms: OpenAI’s MLE-bench tests AI coding agents

Coding agents are improving, but can they tackle machine learning tasks? 

Claude Controls Computers: Anthropic empowers Claude 3.5 Sonnet to operate desktop apps, but cautions remain

API commands for Claude 3.5 Sonnet enable Anthropic’s large language model to operate desktop apps much like humans do. Be cautious, though: It’s a work in progress.

Agentic Coding Strides Forward: Genie coding assistant outperforms competitors on SWE-bench by over 30 percent

An agentic coding assistant boosted the state of the art in an important benchmark by more than 30 percent.
