An AI agent synthesizes novel scientific research hypotheses. It's already making an impact in biomedicine.
What’s new: Google introduced AI co-scientist, a general multi-agent system designed to generate in-depth research proposals within constraints specified by the user. The team generated proposals for repurposing drugs, identifying drug targets, and explaining antimicrobial resistance, and evaluated them in real-world laboratories. It’s available to research organizations on a limited basis.
How it works: AI co-scientist accepts a text description of a research goal, including relevant constraints or ideas. In response, it generates research proposals and reviews, ranks, and improves them using seven agents based on Google’s Gemini 2.0 family of large language models. The completed proposals include sections that explain background, unmet needs, a proposed solution, goals, hypotheses, reasoning, study steps, and relevant articles. Each agent performs its prompted task using feedback and outputs from the other agents, and multiple agents can run at the same time (a simplified sketch of this loop follows the list below).
- The supervisor agent periodically decides how often to run each of the other six agents, how heavily to weight their output, and whether the system has finished. To do this, it computes statistics such as how many proposals have been generated so far, how many have been reviewed, and so on.
- The generation agent generates a list of proposals. It searches the web for relevant research articles, identifies testable assumptions, and debates with itself to improve ambiguous statements and adhere to constraints.
- The reflection agent filters the generated proposals according to correctness, quality, safety, and novelty. First, it reviews each proposal without web search and discards obviously bad ones. Then it reviews the remaining proposals against literature it finds online: It breaks down and checks each proposal’s assumptions, considers whether the proposal might explain observations in previous work, and simulates the proposed experiment (via text generation, much as a person performs a thought experiment).
- The proximity agent computes similarity between proposals to avoid redundancy.
- The ranking agent determines the best proposals via a tournament. It examines one pair of proposals at a time (including reviews from the reflection agent) and debates with itself to pick the better one. To save computation, it prioritizes comparisons among similar proposals, new proposals, and top-ranking proposals (a toy version of this tournament appears in the second sketch below).
- The evolution agent generates new proposals by improving existing ones. It does this in several different ways, including simplifying current ideas, combining top-ranking ideas, and generating proposals that are very different from current ones.
- The meta-review agent identifies common patterns in the reflection agent’s reviews and the ranking agent’s debates. Its feedback goes to the reflection and generation agents, which use it to address recurring issues in future reviews and to avoid regenerating ideas the system has already explored, respectively.
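The list above describes an orchestration loop rather than an exact algorithm. The minimal sketch below shows one way such a loop could be wired together; the names (Proposal, run_generation, supervisor, and so on) are hypothetical, and each stub stands in for a prompted call to a large language model rather than Google’s implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class Proposal:
    idea: str
    review: str | None = None   # attached by the reflection agent
    rating: float = 1200.0      # tournament rating maintained by the ranking agent


def run_generation(goal: str, meta_feedback: str) -> list[Proposal]:
    # Real system: searches the web, debates with itself, and drafts proposals.
    note = meta_feedback or "no feedback yet"
    return [Proposal(idea=f"candidate {i} for {goal} ({note})") for i in range(3)]


def run_reflection(proposal: Proposal) -> None:
    # Real system: checks assumptions against literature and simulates the experiment.
    proposal.review = "plausible"


def run_ranking(pool: list[Proposal]) -> None:
    # Real system: pairwise debates between proposals; see the tournament sketch below.
    for p in pool:
        p.rating += random.uniform(-10.0, 10.0)


def run_evolution(top: list[Proposal]) -> list[Proposal]:
    # Real system: simplifies, combines, or diverges from the best current ideas.
    return [Proposal(idea="refined: " + p.idea) for p in top]


def run_meta_review(pool: list[Proposal]) -> str:
    # Real system: summarizes recurring patterns across reviews and debates.
    return f"{len(pool)} proposals so far; avoid repeating earlier ideas"


def supervisor(goal: str, rounds: int = 6) -> list[Proposal]:
    pool: list[Proposal] = []
    meta_feedback = ""
    for _ in range(rounds):
        # Supervisor-style bookkeeping: simple statistics decide what runs next.
        unreviewed = [p for p in pool if p.review is None]
        if len(pool) < 6:
            pool += run_generation(goal, meta_feedback)
        elif unreviewed:
            for p in unreviewed:
                run_reflection(p)
        else:
            run_ranking(pool)
            top = sorted(pool, key=lambda p: p.rating, reverse=True)[:2]
            pool += run_evolution(top)
            meta_feedback = run_meta_review(pool)
    return sorted(pool, key=lambda p: p.rating, reverse=True)


if __name__ == "__main__":
    for p in supervisor("repurposing approved drugs for a target disease")[:3]:
        print(round(p.rating), p.idea)
```

In the real system, the supervisor also weights each agent’s importance and runs agents asynchronously; the fixed thresholds here are placeholders.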
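The ranking step can also be made concrete as a toy tournament. The sketch below assumes an Elo-style rating (the article does not specify the scoring scheme), replaces the LLM debate with a placeholder judge, and pairs similar proposals first, loosely mirroring the proximity and ranking agents; all names are illustrative.

```python
import itertools
import random


def similarity(a: str, b: str) -> float:
    # Crude stand-in for the proximity agent: word overlap between two proposals.
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / max(1, len(words_a | words_b))


def debate_winner(a: str, b: str) -> str:
    # Placeholder for a prompted LLM debate over the pair; here, a coin flip.
    return a if random.random() < 0.5 else b


def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def tournament(proposals: list[str], matches: int = 20, k: float = 32.0) -> dict[str, float]:
    ratings = {p: 1200.0 for p in proposals}
    # Prioritize the most similar pairs so redundant ideas get compared early.
    # (The real system also prioritizes new and top-ranked proposals.)
    pairs = sorted(itertools.combinations(proposals, 2),
                   key=lambda pair: similarity(*pair), reverse=True)
    for a, b in pairs[:matches]:
        winner = debate_winner(a, b)
        score_a = 1.0 if winner == a else 0.0
        ratings[a] += k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[b] += k * ((1.0 - score_a) - expected_score(ratings[b], ratings[a]))
    return ratings
```

Comparing proposals in pairs rather than scoring each one in isolation sidesteps the difficulty of getting a model to grade individual proposals on a consistent absolute scale.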
Results: In tests, AI co-scientist achieved a number of impressive biomedical outcomes.
- Google researchers generated proposals for experiments that would repurpose drugs to treat acute myeloid leukemia. They shared the 30 highest-ranked proposals with human experts, who chose five for lab tests. Of the five drugs tested, three killed acute myeloid leukemia cells.
- From the 15 top-ranked proposals for repurposing existing drugs to treat liver fibrosis, experts selected three for lab tests. Two of the drugs significantly inhibited liver fibrosis without harming healthy cells. (One of them already was approved by the United States Food and Drug Administration for a different illness, which could shorten its path to becoming a new treatment for liver fibrosis.)
- AI co-scientist invented a hypothesis to explain how microbes become resistant to antibiotics. Human researchers had proposed and experimentally validated the same hypothesis, but their work had not yet been published at the time, and AI co-scientist did not have access to it.
Behind the news: A few AI systems have begun to produce original scientific work. For instance, a model generated research proposals that human judges deemed more novel than proposals written by flesh-and-blood scientists, and an agentic workflow produced research papers that met standards for acceptance by top conferences.
Why it matters: While previous work used agentic workflows to propose research ideas on a general topic, this work generates proposals that address a specific research goal under a researcher’s constraints and further instructions. (For example, a researcher could require that a proposed treatment for a particular disease consider only drugs already approved for human trials for other uses.) AI co-scientist can take feedback at any point, allowing humans to collaborate with the machine: People provide ideas, feedback, and guidance, and the model researches and proposes ideas in return.
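As a purely hypothetical illustration of the kind of goal-plus-constraints input described above (the field names are made up, not AI co-scientist’s actual format):

```python
# Hypothetical example of a goal-plus-constraints specification; the field
# names are illustrative, not the system's actual input format.
research_goal = {
    "goal": "Propose a new treatment approach for liver fibrosis",
    "constraints": [
        "Consider only drugs already approved for human trials in other indications",
        "Proposals must be testable with standard cell-culture assays",
    ],
    "additional_instructions": "Explain the hypothesized mechanism for each candidate",
}
```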
We’re thinking: I asked my AI system to propose a new chemical experiment. But there was no reaction!