OpenAI introduced an AI agent that performs simple web tasks on a user’s behalf.
What’s new: Operator automates online actions like buying goods, booking tickets and completing forms by navigating websites in a browser-like environment within ChatGPT. It’s available on desktops as a research preview for subscribers to ChatGPT Pro ($200 per month). OpenAI promises broader availability to come as well as API access to the underlying model and improved ability to coordinate multi-step tasks like scheduling meetings across calendars from different vendors.
How it works: Operator uses a new model called Computer-Using Agent (CUA) that accepts text input and responds with web actions.
- Users type commands into ChatGPT. GPT-4o translates these inputs into structured instructions, and CUA executes them by interacting directly with web elements like buttons, menus, and text fields. OpenAI didn’t disclose CUA’s architecture or training methods but said it was trained on simulated and real-world browser scenarios via reinforcement learning.
- CUA earns high marks on some measures in tests performed by OpenAI. On WebVoyager, which evaluates web tasks, CUA succeeded 87 percent of the time. On OSWorld, a benchmark that evaluates the ability of multimodal agents to perform complex tasks that involve real-world web and desktop apps, CUA achieved a success rate of 38.1 percent. In separate tests performed by Kura and Anthropic, on WebVoyager, Kura achieved 87 percent while DeepMind’s Mariner achieved 83.5 percent, and on OSWorld, Claude Sonnet 3.5 with Computer Use achieved 22 percent.
- Operator is restricted from interacting with unverified websites and sharing sensitive data without the user’s consent. It offers content filters, and a separate model monitors Operator in real time and pauses the agent in case of suspicious behavior.
Behind the news: Operator rides a wave of agents designed to automate everyday tasks. Last week, OpenAI introduced ChatGPT Tasks, which lets users schedule reminders and alerts but doesn’t support web interaction. (Early users complained that Tasks was buggy and required overly precise instructions.) Anthropic’s Computer Use focuses on basic desktop automation, while DeepMind’s Project Mariner is a web-browsing assistant built on Gemini 2.0. Perplexity Assistant automates mobile apps such as booking Uber rides on Android phones.
Why it matters: In early reports, users said Operator sometimes was less efficient than a human performing the same tasks. Nevertheless, agentic AI is entering the consumer market, and Operator is poised to give many people their first taste. It’s geared to provide AI assistance for an endless variety of personal and business uses, and — like ChatGPT was for other developers of LLMs — and it’s bound to serve as a template for next-generation products.
We’re thinking: Computer use is maturing, and the momentum behind it is palpable. AI developers should have in their toolbox.