GPT-4 poses a negligible additional risk of helping a malefactor build a biological weapon, according to a new study.
What’s new: OpenAI compared the ability of GPT-4 and web search to contribute to the creation of a dangerous virus or bacterium. The large language model was barely more helpful than the web.
How it works: The researchers asked both trained biologists and biology students to design a biological threat using either web search or web search plus GPT-4.
- The authors recruited 50 experts who had doctorates and experience in a laboratory equipped to handle biohazards, and 50 students who had taken a biology course at an undergraduate level or higher. All participants were U.S. citizens or permanent residents and passed a criminal background check.
- Half of each group could use web search only. The other half could use web search plus GPT-4. (The experts were given a research version of the model that was capable of answering dangerous questions with limited safeguards.)
- Participants were asked to complete five tasks that corresponded to steps in building a biological threat: (i) choose a suitable biohazard, (ii) find a way to obtain it, (iii) plan a process to produce the threat in a sufficient quantity, (iv) determine how to formulate and stabilize it for deployment as a bioweapon, and (v) identify mechanisms to release it.
- The authors scored completion of each task for accuracy, completeness, and innovation (0 to 10) as well as time taken (in minutes). Participants rated each task for difficulty (0 to 10).
Results: Participants who used GPT-4 showed slight increases in accuracy and completeness. Students with GPT-4 scored 0.25 and 0.41 more points on average, respectively, than students in the control group. Experts with access to the less restricted version of GPT-4 scored 0.88 and 0.82 points higher on average, respectively, than experts in the control group. However, these increases were not statistically significant. Moreover, participants who used GPT-4 didn't show greater innovation, take less time per task, or rate their tasks as easier. Even if GPT-4 could be prompted to provide information that would facilitate a biological attack, the model didn't provide more information than a user could glean by searching the web.
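The headline finding is a statistical one: with 50 experts split across two conditions, a mean uplift of under one point on a 0-to-10 scale is hard to distinguish from noise. Below is a minimal sketch of that arithmetic using synthetic scores and Welch's t-test; the score distributions, group sizes per condition, and choice of test are our assumptions for illustration, not the study's published analysis.

```python
# Illustrative only: synthetic stand-in scores, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 25  # assumed experts per condition (50 experts split across two groups)
control = rng.normal(loc=5.0, scale=2.0, size=n).clip(0, 10)  # web search only
gpt4 = rng.normal(loc=5.88, scale=2.0, size=n).clip(0, 10)    # web search + GPT-4

uplift = gpt4.mean() - control.mean()
# Welch's t-test: compares group means without assuming equal variances
t_stat, p_value = stats.ttest_ind(gpt4, control, equal_var=False)

print(f"mean uplift: {uplift:.2f} points")
print(f"p-value: {p_value:.3f}")  # typically > 0.05 at this effect size and n
```

At this sample size and spread, an effect of this magnitude usually fails to clear the conventional p < 0.05 threshold, which is consistent with the study's conclusion that the observed uplift was not statistically significant.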
Why it matters: AI alarmists have managed to create a lot of anxiety by promoting disaster scenarios, such as human extinction, that the technology has no clear way to bring about. Meanwhile, these unfounded fears stand to slow down developments that could do tremendous good in the world. Evidence that GPT-4 is no more likely than web search to aid in building a bioweapon is a welcome antidote. (Though we would do well to consider removing from the web unnecessary information that may aid in the making of bioweapons.)
We’re thinking: Large language models, like other multipurpose productivity tools such as web search or spreadsheet software, are potentially useful for malicious actors who want to do harm. Yet AI’s potential in biothreat development garners headlines, while Excel’s is rarely mentioned. That makes it doubly important to quantify the risk in ways that can guide regulators and other decision makers.