An open source package inspired by the commercial agentic code generator Devin aims to automate computer programming and more.
What’s new: OpenHands, previously known as OpenDevin, implements a variety of agents for coding and other tasks. It was built by Xingyao Wang and a team at University of Illinois Urbana-Champaign, Carnegie Mellon, Yale, University of California Berkeley, Contextual AI, King Abdullah University of Science and Technology, Australian National University, Ho Chi Minh City University of Technology, Alibaba, and All Hands AI. The code is free to download, use, and modify.
How it works: OpenHands provides a set of agents, or workflows for the user’s choice of large language models. Users can command various agents to generate, edit, and run code; interact with the web; and perform auxiliary tasks related to coding and other work. The agents run in a secure Docker container with access to a server to execute code, a web browser, and tools that, say, copy text from pdfs or transcribe audio files.
- The CodeAct agent follows the CodeAct framework, which specifies an agentic workflow for code generation. Given a prompt or results of a code execution, it can ask for clarification, write code and execute it, and deliver the result. It can also retrieve relevant information from the web.
- The browsing agent controls a web browser. At every time step, it receives the user’s prompt and a text description of each element it sees on the resulting webpage. The description includes a numerical identifier, words like “paragraph” or “button” (and associated text), a list of possible actions (such as scroll, click, wait, drag and drop, and send a message to the user), an example chain of thought for selecting an action, and a list of previous actions taken. It executes actions iteratively until it has sent a message to the user.
- A set of “micro agents” perform auxiliary tasks such as writing commit messages, writing Postgres databases, summarizing codebases, solving math problems, delegating actions to other agents, and the like. Users can write their own prompts to define micro agents.
Results: Overall, OpenHands agents achieve similar performance to previous agents on software engineering problems, web browsing, and miscellaneous tasks like answering questions. For example, fixing issues in Github in SWE-Bench, the CodeAct agent using Claude 3.5 Sonnet solved 26 percent while Moatless Tools using the same model solved 26.7 percent. On GPQA Diamond, a set of graduate-level questions about physics, chemistry, and biology, the CodeAct agent using GPT-4-turbo with search wrote code to perform the necessary calculations and found relevant information to answer the questions, achieving 51.8 percent accuracy. GPT-4 with search achieved 38.8 percent accuracy.
Why it matters: Agentic workflows are rapidly expanding the scope and capabilities of large language models. As open source software, this system gives developers an extensible toolkit for designing agentic systems. Although it’s oriented toward coding, it accommodates a variety of information-gathering, -processing, and -publishing tasks.
We’re thinking: This system lets users tailor custom agents simply by rewriting prompts. We look forward to seeing what non-programmers do with it!