It’s surprisingly easy to turn a well-intentioned machine learning model to the dark side.
What’s new: In an experiment, Fabio Urbina and colleagues at Collaborations Pharmaceuticals, who had built a drug-discovery model to design useful compounds and avoid toxic ones, retrained it to generate poisons. In six hours, the model generated 40,000 toxins, some of them actual chemical warfare agents that weren’t in the initial dataset.
How it works: To avoid encouraging bad actors, the authors didn’t detail the architecture, dataset, or method. The following description is drawn from the few particulars they did reveal, along with accounts of the company’s existing generative model, MegaSyn.
- The authors pretrained an LSTM to generate compounds, expressed in a standardized text format, from a large database of chemical structures and their substructures (a minimal sketch of such a generator follows this list).
- They fine-tuned the LSTM to generate compounds similar to VX, a deadly nerve agent, saving different models along the way. Models saved early in the fine-tuning process generated a wide variety of chemicals, while those saved later generated chemicals almost identical to the fine-tuning set.
- They used each fine-tuned model to generate thousands of compounds and ranked them according to predicted toxicity and impact on the human body. MegaSyn’s ranking function penalizes toxicity and rewards greater biological impact, so the authors reversed the toxicity factor, prioritizing the deadliest compounds with the greatest effect (a sketch of such a scoring loop also appears below).
- They further fine-tuned each model on the most harmful 10 percent of compounds it generated, spurring it to design ever more deadly chemicals.
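To make the described pipeline concrete, here is a minimal sketch of a character-level LSTM language model over SMILES-style strings trained by next-character prediction. The paper does not disclose the architecture’s specifics, so the model class, dimensions, tokenization, and training step below are illustrative assumptions, not the authors’ implementation.

```python
# Illustrative sketch only: a character-level LSTM language model over
# SMILES-like strings. All names, sizes, and the tokenization scheme are
# assumptions, not details from the paper.
import torch
import torch.nn as nn

class SmilesLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) integer-encoded characters of a compound string
        x = self.embed(tokens)
        out, state = self.lstm(x, state)
        return self.head(out), state  # logits over the next character

def train_step(model, batch, optimizer, loss_fn):
    # Standard next-token prediction: inputs and targets are shifted by one.
    # Pretraining and fine-tuning both use this step; fine-tuning simply
    # reuses it on a much smaller, targeted set of compounds.
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits, _ = model(inputs)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```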
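The generate-rank-refine loop could look roughly like the following. The toxicity and bioactivity predictors, weights, sampling routine, and fine-tuning helper are hypothetical placeholders; the sketch shows MegaSyn’s stated convention of penalizing toxicity and rewarding biological impact, with the authors’ subversion amounting to reversing the sign of the toxicity term.

```python
# Illustrative sketch of the generate-score-select loop. predict_toxicity,
# predict_bioactivity, sample_fn, and fine_tune_fn are hypothetical stand-ins;
# the paper does not specify them.
def score(smiles, predict_toxicity, predict_bioactivity, w_tox=1.0, w_act=1.0):
    # MegaSyn-style objective: reward predicted biological activity and
    # penalize predicted toxicity. Per the article, the authors' change was
    # to reverse the toxicity term's sign.
    return w_act * predict_bioactivity(smiles) - w_tox * predict_toxicity(smiles)

def refinement_round(model, sample_fn, scorer, fine_tune_fn,
                     n_samples=10_000, top_frac=0.1):
    # 1) Generate candidate compounds from the current model.
    candidates = [sample_fn(model) for _ in range(n_samples)]
    # 2) Rank them with the scoring function.
    ranked = sorted(candidates, key=scorer, reverse=True)
    # 3) Fine-tune the generator on the top fraction (10 percent in the paper),
    #    biasing the next round toward higher-scoring compounds.
    top = ranked[: max(1, int(top_frac * len(ranked)))]
    fine_tune_fn(model, top)
    return model, top
```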
Why it matters: The authors took an industrial model and turned it into what they call “a computational proof of concept for making biochemical weapons.” They emphasize that the experiment would not be difficult to replicate using publicly available datasets and models. It may be similarly easy to subvert models built for tasks other than drug discovery, turning helpful models into harmful ones.
We’re thinking: Despite machine learning’s enormous potential to do good, it can be harnessed for evil. Designing effective safeguards for machine learning research and implementation is a very difficult problem. What is clear is that we in the AI community need to recognize the destructive potential of our work and move with haste and deliberation toward a framework that can minimize it. NeurIPS’ efforts to promote introspection on the part of AI researchers are a notable start — despite arguments that they politicize basic research — and much work remains to be done.