Dear friends,
Russian troops have invaded Ukraine, and the terrifying prospect of a war in Europe weighs on my mind. My heart goes out to all the civilians affected, and I hope we won’t see the loss of life, liberty, or property that many people fear.
I’ve often thought about the role of AI in military applications, but I haven’t spoken much about it because I don’t want to contribute to the proliferation of AI arms. Many people in AI believe that we shouldn’t have anything to do with military use cases, and I sympathize with that idea. War is horrific, and perhaps the AI community should just avoid it. Nonetheless, I believe it’s time to wrestle with hard, ugly questions about the role of AI in warfare, recognizing that sometimes there are no good options.
Full disclosure: My early work on deep learning was funded by the U.S. Defense Advanced Research Projects Agency (DARPA). Last week, Wired mentioned my early work on drone helicopters, also funded by DARPA. During the U.S.-Iraq war, when IEDs (roadside bombs) were killing civilians and soldiers, I spent time thinking about how computer vision could help robots that dispose of IEDs.
What may not be so apparent is that forces that oppose democracy and civil liberties also have access to AI technology. Russian drones have been found to contain parts made in the U.S. and Europe. I wouldn’t be surprised if they also contain open-source software that our community has contributed to. Despite efforts to control exports of advanced chips and other parts that go into AI systems, the prospects are dim for keeping such technology out of the hands of people who would use it to cause harm.
So I see little choice but to make sure the forces of democracy and civil liberties have the tools they need to protect themselves.
Several organizations have come to the same conclusion, and they’ve responded by proposing principles designed to tread a fine line between developing AI’s capacity to confer advantage on the battlefield and blunting its potential to cause a catastrophe. For example, the United Nations has issued guidance that all decisions to take human life must involve human judgment. Similarly, the U.S. Department of Defense requires that its AI systems be responsible, equitable, traceable, reliable, and governable.
I support these principles. Still, I’m concerned that such guidelines, while necessary, aren’t sufficient to prevent military abuses. User interfaces can be designed to lead people to accept an automated decision — consider the pervasive “will you accept all cookies from this website?” pop-ups that make it difficult to do anything else. An automated system may comply technically with the U.N. guidance, but if it provides little context and time for its human operator to authorize a kill mission, that person is likely to do so without the necessary oversight or judgment.
While it’s important to establish high-level principles, they must be implemented in a way that enables people to make fateful decisions — perhaps the most difficult decisions anyone can make — in a responsible way. I think of the protocols that govern the use of nuclear weapons, which so far have helped to avoid accidental nuclear war. The systems involved must be subject to review, auditing, and civilian oversight. A plan to use automated weapons could trigger protocols to ensure that the situation, legality, and schedule meet strict criteria, and that the people who are authorized to order such use are clearly identified and held accountable for their decisions.
War is tragic. Collectively we’ve invented wondrous technologies that also have unsettling implications for warfare. Even if the subject presents only a menu of unpalatable options, let’s play an active role in navigating the tough choices needed to foster democracy and civil liberties.
Keep learning,
Andrew
News
High-Energy Deep Learning
Nuclear fusion technology, long touted as an unlimited source of safe, clean energy, took a step toward reality with a machine learning algorithm that molds the fuel in a reactor’s core.
What’s new: Researchers at DeepMind and École Polytechnique Fédérale de Lausanne (EPFL) developed a reinforcement learning algorithm to manipulate hydrogen plasma — an extremely high-energy form of matter — into an optimal shape for energy production.
How it works: Reactors that confine plasma in a chamber known as a tokamak generate energy by pushing its atoms so close together that they fuse. A tokamak uses powerful magnetic coils to compress the plasma, heating it to the neighborhood of 100 million degrees Celsius to overcome the electrostatic force that normally pushes the nuclei apart. The authors trained a reinforcement learning model to control the voltage of 19 magnetic coils in a small, experimental tokamak reactor, enabling it to shape the plasma in ways consistent with maintaining an ongoing fusion reaction.
- The authors initially trained the algorithm in a simulated tokamak. Its reward function scored how well the plasma shape, position, and current matched the desired configuration.
- The training harnessed maximum a posteriori policy optimization, an actor-critic algorithm in which an actor learns to take actions that maximize rewards delivered by a critic. The actor, a vanilla neural network, learned how to control the simulated coils based on the current state of the plasma. The critic, a recurrent neural network, learned to predict the reward function’s score after each action. (A simplified sketch of this actor-critic pair appears after this list.)
- At inference, the critic was discarded while the actor continued to choose actions 10,000 times per second.
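To make the setup concrete, here is a minimal sketch of an asymmetric actor-critic pair of this kind: a feed-forward actor that maps a plasma-state measurement to 19 coil voltages, and a recurrent critic that scores a short history of states and actions. The state size, network widths, and dummy sensor input are illustrative assumptions, not details of the authors' system.

```python
# Minimal sketch of the asymmetric actor-critic pair described above.
# The state/action sizes, network widths, and the dummy input are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

STATE_DIM = 92     # assumed size of the measured plasma/magnetic state
N_COILS = 19       # voltages for the tokamak's 19 control coils

class Actor(nn.Module):
    """Feed-forward policy: plasma state -> coil voltages."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, N_COILS), nn.Tanh(),  # voltages scaled to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Recurrent value estimator: sequence of (state, action) -> predicted reward."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(STATE_DIM + N_COILS, 128, batch_first=True)
        self.head = nn.Linear(128, 1)

    def forward(self, states, actions):
        x = torch.cat([states, actions], dim=-1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])  # reward estimate for the latest step

actor, critic = Actor(), Critic()

# One control step at inference: only the actor runs (the critic is training-only).
state = torch.randn(1, STATE_DIM)   # stand-in for real sensor measurements
coil_voltages = actor(state)
print(coil_voltages.shape)          # torch.Size([1, 19])
```

At the reported control rate, a loop like this would run roughly every 0.1 millisecond, which is why the lightweight feed-forward actor, rather than the recurrent critic, handles inference.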
Results: In experimental runs with the real-world reactor, a previous algorithm controlled the coils to form a preliminary plasma shape before handing off the task to the authors’ model. Plasma can't be observed directly, so the authors calculated its shape and position properties based on measurements of the magnetic field within the tokamak. In five separate experiments, the controller formed the plasma into distinct shapes, such as a conventional elongated shape and a prospective “snowflake” shape, within particular tolerances (2 centimeters root mean squared error for shape, 5 kiloamperes root mean squared error for current passing through the plasma). In a novel feat, the algorithm maintained two separate plasma droplets for 200 milliseconds.
Behind the news: Conventional nuclear energy results from nuclear fission. Scientists have been trying to harness nuclear fusion since the 1950s. Yet no fusion reactor has generated more energy than it consumed. (The U.S. National Ignition Facility came the closest yet last year.) A growing number of scientists are enlisting machine learning to manage the hundreds of factors involved in sustaining a fusion reaction.
- Researchers at the Joint European Torus, another tokamak reactor, trained a variety of deep learning models on sensor data from within the reactor. A convolutional neural network visualized the plasma, reducing the time required to compute its behavior. A recurrent neural network predicted the risk of disruptions such as plasma escaping the magnetic field, which could damage the reactor’s walls. A variational autoencoder identified subtle anomalies in plasma that can cause such disruptions.
- Google AI and the startup TAE Technologies developed algorithms designed to improve fusion reactor performance. For instance, a set of Markov chain Monte Carlo models computes starting conditions that enable plasma to remain stable for longer periods of time.
Why it matters: Plasma in a tokamak, which is several times hotter than the sun’s core and reverts to vapor if its electromagnetic container falters, is continually in flux. This work not only shows that deep learning can shape it in real time, it also opens the door to forming plasma in ways that might yield more energy. The next challenge: Scale up to a reactor large enough to produce meaningful quantities of energy.
We’re thinking: Fusion energy — if it ever works — would be a game changer for civilization. It’s thrilling to see deep learning potentially playing a key role in this technology.
Remote Meter Reader
Industrial gauges are often located on rooftops, underground, or in tight spaces — but they’re not out of reach of computer vision.
What’s new: The Okinawa startup LiLz Gauge provides a system that reads analog gauges and reports their output to a remote dashboard. The system is available in Japan and set to roll out globally in 2023.
How it works: The system automates inspection in places that have no computer network or power. It ties together remote units that integrate a camera, processor, cellular and Bluetooth connectivity, and a battery designed to last up to three years.
- Users position the camera where it can see a gauge.
- They can configure the algorithm to recognize a style of gauge — circular, rectangular, counter, or seven-segment alphanumeric — and its range of readings.
- The algorithm extracts readings continuously or periodically and transmits them to a dashboard or via an API. (A simplified sketch of the angle-to-reading step appears after this list.)
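Reading an analog dial with computer vision typically comes down to finding the needle's angle and mapping it onto the gauge's configured range. The sketch below shows only that final mapping step; the angle limits, value range, and function names are hypothetical configuration for illustration, not LiLz Gauge's actual implementation.

```python
# Illustrative only: map a detected needle angle to a gauge reading.
# The angle limits and value range below are hypothetical configuration,
# not LiLz Gauge's actual parameters.
from dataclasses import dataclass

@dataclass
class GaugeConfig:
    angle_min: float   # needle angle (degrees) at the lowest marking
    angle_max: float   # needle angle (degrees) at the highest marking
    value_min: float   # reading at the lowest marking
    value_max: float   # reading at the highest marking

def reading_from_angle(angle_deg: float, cfg: GaugeConfig) -> float:
    """Linearly interpolate a needle angle into a gauge value."""
    span = cfg.angle_max - cfg.angle_min
    frac = (angle_deg - cfg.angle_min) / span
    frac = min(max(frac, 0.0), 1.0)  # clamp to the dial's face
    return cfg.value_min + frac * (cfg.value_max - cfg.value_min)

cfg = GaugeConfig(angle_min=-135.0, angle_max=135.0, value_min=0.0, value_max=10.0)
print(reading_from_angle(45.0, cfg))  # roughly 6.67
```

The harder part in practice is detecting the needle itself under glare, dust, and odd camera angles, which is where the trained vision model earns its keep.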
Behind the news: AI increasingly enables inspectors to do their jobs at a distance. For instance, drones equipped with computer vision have been used to spot damage and deficiencies in buildings, dams, solar and wind farms, and power lines.
Why it matters: Given the complexity of replacing some gauges, computer vision may be more cost effective than installing a smart meter. More broadly, industrial operations don’t necessarily need to replace old gear if machine learning can give it new life. Well-established machine learning approaches can be engineered to meet the needs of low-tech industries.
We’re thinking: This application looks like low-hanging fruit for computer vision. There’s ample room for clever engineers to adapt older practices to newer ways of doing things.
A MESSAGE FROM DEEPLEARNING.AI
Looking to prepare for Google’s TensorFlow Certificate exam? Gain the skills you need to build scalable AI-powered applications with the TensorFlow Developer Professional Certificate program! Enroll today
Scam Definitely
Robocalls slip through smartphone spam filters, but a new generation of deep learning tools promises to tighten the net.
What’s new: Researchers proposed fresh approaches to thwarting robocalls. Such innovations could soon be deployed in apps, IEEE Spectrum reported.
How it works: RobocallGuard, devised by researchers at Georgia Institute of Technology and the University of Georgia, answers the phone and determines whether a call is malicious based on what the caller says. TouchPal, proposed by a team at Shanghai Jiao Tong University, UC Berkeley, and TouchPal Inc., analyzes the call histories of users en masse to identify nuisance calls.
- RobocallGuard starts by checking the caller ID. It passes along known callers and blocks blacklisted callers. Otherwise, it asks the caller who they are trying to reach and listens to the reply using a neural network that recognizes keywords. If the caller states the user’s name, it passes along the call with a transcript of the interaction generated using Google’s Speech-to-Text API. If not, it disconnects and saves the recording and transcript. (A simplified sketch of this gatekeeper logic appears after this list.)
- TouchPal collected a dataset by enabling users to label incoming calls as harassment, fraud, delivery, sales, or other categories. It used these labels along with information including contacts, anonymized phone numbers, call times, and call durations to train a vanilla neural network to classify nuisance calls before they’re answered.
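As a rough illustration of the RobocallGuard-style gatekeeper described in the first bullet, the sketch below screens a caller based on caller ID and a keyword check of the caller's stated recipient. The contact lists, the name matching, and the transcribe() stand-in are assumptions for illustration, not the researchers' code.

```python
# Rough sketch of a RobocallGuard-style gatekeeper. The contact/blocklist data,
# the keyword check, and the transcribe() stand-in are illustrative assumptions.
KNOWN_CONTACTS = {"+15551234567"}
BLOCKLIST = {"+15557654321"}
USER_NAMES = {"alex", "alex smith"}   # names the user answers to

def transcribe(audio_reply: bytes) -> str:
    """Stand-in for a speech-to-text call (e.g., a cloud API)."""
    return audio_reply.decode("utf-8", errors="ignore").lower()

def screen_call(caller_id: str, audio_reply: bytes) -> str:
    if caller_id in KNOWN_CONTACTS:
        return "pass_through"
    if caller_id in BLOCKLIST:
        return "block"
    # Ask "Who are you trying to reach?" and check the reply for the user's name.
    reply = transcribe(audio_reply)
    if any(name in reply for name in USER_NAMES):
        return "pass_through_with_transcript"
    return "disconnect_and_log"

print(screen_call("+15550000000", b"Hi, I'm calling for Alex Smith"))
# -> pass_through_with_transcript
```

In the published system, the keyword check is a trained speech model rather than a string match, but the decision flow follows the same shape.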
Behind the news: Many robocallers have outgrown the fixed phone numbers, obviously prerecorded messages, and “press-1” phone trees that were dead giveaways in the past, making it harder for recipients to recognize spam calls even after answering the phone.
- Some robocallers use personalized audio, often including clips of recorded human voices that play in response to specific keywords, to simulate a human on the other end of the line.
- Robocallers commonly falsify the number they’re dialing from, making them hard to trace and virtually impossible to block.
Why it matters: Robocallers placed nearly 4 billion nuisance calls in the U.S. in January 2021. These numbers have hardly budged since 2019 despite government efforts to combat them. The problem is even worse elsewhere. In Brazil, the average user of one call-blocking app received more than one spam call daily. It’s unlikely that robocalls will ever disappear entirely, but machine learning could relegate them to the background, like email spam.
We’re thinking: If everybody blocks robocalls, maybe robocallers will start sending nuisance calls to each other.
Fine-Tune Your Fine-Tuning
Let’s say you have a pretrained language model and a small amount of data to fine-tune it to answer yes-or-no questions. Should you fine-tune it to classify yes/no or to fill in missing words — both viable approaches that are likely to yield different results? New work offers a way to decide.
What’s new: Yanan Zheng and collaborators at Beijing Academy of Artificial Intelligence, Carnegie Mellon University, DeepMind, Massachusetts Institute of Technology, and Tsinghua University proposed FewNLU, a method that compares fine-tuning algorithms in few-shot natural language understanding, or language comprehension tasks in which a model must learn from a few examples. They also provide a toolkit for optimizing fine-tuned performance.
Key insight: Previous comparisons of fine-tuning algorithms used fixed hyperparameter values; the researchers chose values known to work with a particular algorithm and maintained them with other algorithms. But different combinations of algorithm and architecture require different hyperparameter values to achieve their optimal performance. So, to compare fine-tuning algorithms, it’s best to determine hyperparameter values separately for each combination.
How it works: The authors compared various data-split strategies and hyperparameter values for different fine-tuning algorithms applied to DeBERTa and ALBERT. They fine-tuned the models on 64 labeled examples for each of seven tasks in the SuperGLUE benchmark (such as answering yes-or-no questions about a text passage or multiple-choice questions about causes of events) to find the best data-split strategy and most important hyperparameters. Then they compared fine-tuning algorithms using different values for the most important hyperparameters.
- The authors considered three data-split strategies: minimum description length, K-fold cross validation, and one they created called Multi-Splits. Whereas K-fold cross validation splits the dataset into K parts and uses a different part for validation K times, Multi-Splits shuffles and splits the data randomly into training and validation sets according to a fixed ratio K times. (A minimal sketch of Multi-Splits appears after this list.)
- They compared different values for six hyperparameters, varying one at a time: the order in which they provided the 64 labeled examples during training, the pattern used to convert various types of examples into fill-in-the-blank examples, training batch size, learning rate, evaluation frequency, and maximum training steps.
- They compared the performance of four fine-tuning algorithms on ALBERT and DeBERTa using the best data-split strategy (Multi-Splits) and various combinations of hyperparameter values. The algorithm known as CLS adds a special token at the beginning of an input example, and the model uses the token’s representation to classify it. PET, ADAPET, and P-tuning change the classification task into a fill-in-the-blank procedure.
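Here is a minimal sketch of the Multi-Splits idea as described above: shuffle the labeled examples and split them into training and validation sets at a fixed ratio, repeated K times. The ratio, the value of K, and the function name are illustrative choices, not necessarily FewNLU's defaults.

```python
# Minimal sketch of Multi-Splits: K independent random train/validation splits
# at a fixed ratio. The ratio and K below are illustrative, not FewNLU's defaults.
import random

def multi_splits(examples, k=4, train_ratio=0.5, seed=0):
    """Yield K (train, validation) splits of a small labeled dataset."""
    rng = random.Random(seed)
    n_train = int(len(examples) * train_ratio)
    for _ in range(k):
        shuffled = examples[:]      # copy so each split reshuffles fresh
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]

examples = [f"example_{i}" for i in range(64)]  # e.g., 64 labeled SuperGLUE items
for train, dev in multi_splits(examples):
    print(len(train), len(dev))  # 32 32 each time, with different memberships
```

Unlike K-fold cross validation, the validation sets can overlap across splits; the point is to average out the high variance that comes with only 64 labeled examples.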
Results: Multi-Splits led to superior test performance on 4 of the 7 tasks, and it had the greatest correlation between validation and test performance on 5 of the 7 tasks. Changes in the prompt pattern led to the greatest standard deviation in performance across hyperparameters (average of 5.5 percent accuracy, compared to the next-highest, training order, at 2.0 percent), suggesting that it was the most important hyperparameter to optimize. Using Multi-Splits and the optimal hyperparameter values for each fine-tuning algorithm (specific to each model and task), PET, ADAPET, and P-tuning performed similarly and typically outperformed CLS by 15 to 20 percentage points in accuracy and F1 score. There was no clear winner among PET, ADAPET, and P-tuning, each of which achieved the highest accuracy or F1 score on one task or another, often within 1 standard deviation of each other.
Why it matters: It’s certainly good to know how to get the most out of fine-tuning. Beyond that, this work reinforces the notion that, since the only way to know the best hyperparameter values is to find them empirically, it pays to keep guessing to a minimum.
We’re thinking: Here’s a puzzler: If the choice of a fine-tuning algorithm changes a model’s optimal hyperparameter values, is the choice itself a hyperparameter?