New coding tools act like agents to automate software programming tasks.
What’s new: A wave of open-source software-development tools takes advantage of the ability of large language models to plan, critique their own work, and extend themselves by calling functions.
How it works: These projects follow hot on the heels of Cognition’s Devin, a commercial system billed as a semi-autonomous software developer that’s available to selected customers upon request. Some, like Devin, provide a sandboxed chat for natural-language commands, a command-line shell, a code editor, and/or a web browser through which the agent can test code or find documentation. Given a prompt, they generate a step-by-step plan and execute it. They may ask for further information or instructions, and users can interrupt to modify their requests.
- Devika uses Anthropic’s Claude 3, OpenAI’s GPT-4 and GPT-3.5, and models supported by Ollama, a tool that runs large language models locally. Like Devin, Devika runs in a web browser and includes an agent that performs planning and reasoning. A persistent knowledge base and database keep track of active projects.
- OpenDevin is based on GPT-4 but has access to more than 100 models via litellm, a package that simplifies API calls. OpenDevin’s developers aim to match Devin’s user interface and enable the system to evaluate its own accuracy.
- SWE-agent addresses bugs and issues in GitHub repositories. It can use any language model. Using GPT-4, it resolved 12.3 percent of tasks in the SWE-bench dataset of real-world GitHub issues. (Devin resolved 13.9 percent of SWE-bench tasks. Claude 3, the highest-scoring model not specifically trained for coding, resolved 4.8 percent of SWE-bench tasks.)
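The plan-then-execute loop described under “How it works” can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by Devin, Devika, OpenDevin, or SWE-agent: `run_agent`, `plan`, and `execute` are hypothetical names, with `plan` standing in for a language-model call and `execute` for tools such as a shell or code editor. A queue of user messages models the ability to interrupt and modify a request mid-task.

```python
from collections import deque
from typing import Callable


def run_agent(
    prompt: str,
    plan: Callable[[str, list[str]], list[str]],
    execute: Callable[[str], None],
    interrupts: deque,
) -> list[str]:
    """Plan a task, run the steps in order, and re-plan when the user interrupts.

    Hypothetical sketch: plan(prompt, done) stands in for an LLM call that returns
    the remaining steps; execute(step) stands in for tool use (shell, editor, browser).
    """
    done: list[str] = []
    steps = deque(plan(prompt, done))
    while steps or interrupts:
        if interrupts:
            # Fold the user's new instruction into the prompt and re-plan,
            # telling the planner which steps are already finished.
            prompt += f"\nUser update: {interrupts.popleft()}"
            steps = deque(plan(prompt, done))
            continue
        step = steps.popleft()
        execute(step)
        done.append(step)
    return done


# Toy stand-ins for the model and tools (hypothetical, for illustration only).
interrupts: deque = deque()


def toy_plan(prompt: str, done: list[str]) -> list[str]:
    wanted = ["write function", "write tests"]
    if "docs" in prompt:
        wanted.append("update docs")
    return [step for step in wanted if step not in done]


def toy_execute(step: str) -> None:
    if step == "write function":
        interrupts.append("also update the docs")  # the user interrupts mid-run


log = run_agent("add a date-parsing helper", toy_plan, toy_execute, interrupts)
print(log)  # ['write function', 'write tests', 'update docs']
```

Passing the finished steps back to the planner is a design choice that keeps a re-plan from repeating completed work; a real agent would also feed tool outputs and errors into the next planning call.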
Behind the news: Code-completion tools like GitHub Copilot and Code Llama have quickly become ubiquitous. AutoGPT, released in 2023, is an open-source generalist AI agent based on GPT-4 that has been used to write and debug code. Recently, Replit, known for its Ghostwriter code-completion and chatbot applications, began building its own LLMs for automated code repair.
Why it matters: Agentic coding tools are distinguished by techniques that enable large language models to plan, reflect on their work, call tools, and collaborate with one another. Users report that the new tools are better than previous coding assistants at sustaining extended tasks and correcting their own work.
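The ability to reflect on and correct their own work, mentioned above, often amounts to a loop: generate code, run it through a tool such as a test suite, and return any failure message to the model as feedback. The sketch below assumes that pattern; `generate_with_reflection`, `generate`, and `run_tests` are hypothetical stand-ins for a model call and a tool call, not an API from any of the projects named here.

```python
from typing import Callable, Optional


def generate_with_reflection(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    run_tests: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Generate code, test it with a tool, and feed failures back to the generator.

    Hypothetical sketch: generate(task, feedback) stands in for an LLM call;
    run_tests(code) stands in for a tool call returning None on success or an error.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        error = run_tests(code)
        if error is None:
            return code, attempt
        feedback = error  # the critique the model sees on its next attempt
    raise RuntimeError(f"no passing solution after {max_attempts} attempts: {feedback}")


# Toy stand-ins: the first draft has a bug that the test feedback exposes.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_run_tests(code: str) -> Optional[str]:
    namespace: dict = {}
    exec(code, namespace)  # acceptable for this trusted toy string; real agents use a sandbox
    result = namespace["add"](2, 3)
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"


code, attempts = generate_with_reflection("write add(a, b)", toy_generate, toy_run_tests)
print(attempts)  # 2
```

Capping the number of attempts keeps the loop from running indefinitely when the model cannot converge on passing code.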
We’re thinking: Many software developers worry that large language models will make human coders obsolete. We doubt that AI will replace coders, but we believe that coders who use AI will replace those who don’t. Agent-based tools still have a long way to go, but they seem likely to augment programmers’ abilities in a larger development pipeline.