Dear friends,
It's official: Elon Musk will buy Twitter, pending approval of the transaction by the company's stockholders and the U.S. government. While some people are celebrating the deal in the name of free speech, others are worried about the platform’s future. Will the rules change to favor Musk’s personal views? Will trolling, harassment, and disinformation run rampant?
I hope the change in management will improve governance and conversation on Twitter. But I wonder whether an open standard for social media might be a better way to improve social networks.
Think about email. The open protocol SMTP has enabled many companies to provide email services so that anyone with an email address can communicate freely with anyone else, regardless of their provider. A similar open standard could underpin social media.
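To see how little a client depends on any particular provider, here's a minimal Python sketch of sending a message over SMTP (the hostnames, addresses, and password are placeholders):

```python
import smtplib
from email.message import EmailMessage

# Compose a message. Sender and recipient can use different providers;
# SMTP doesn't care who operates either mailbox.
msg = EmailMessage()
msg["From"] = "alice@provider-a.example"
msg["To"] = "bob@provider-b.example"
msg["Subject"] = "Interoperability via an open protocol"
msg.set_content("Any SMTP server can relay this to any other.")

# Connect to the sender's outgoing mail server. Pointing this at a
# different provider's host changes nothing else in the code.
with smtplib.SMTP("smtp.provider-a.example", 587) as server:
    server.starttls()  # upgrade to an encrypted connection
    server.login("alice@provider-a.example", "app-password")
    server.send_message(msg)
```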
Platforms such as Facebook, Instagram, LinkedIn, and Twitter implement similar features: posting, liking, commenting, and sharing. Why not enable key features to work across all platforms, including newcomers? This would let users interact even if their accounts were on different platforms, just as people who have email accounts with Gmail, Outlook, Yahoo, or any other provider can communicate with one another.
Open standards for social media have been discussed for a long time. Some people argue that only a central gatekeeper can moderate online conversations effectively, so they don’t degenerate into toxicity. This is false. Again, think of email. Spam filters do a good job of eliminating toxic messages, and the fact that different providers filter spam in different ways allows consumers to choose the gatekeeper they like best — or none at all. Meanwhile, adherence to an open protocol has prevented any single company from monopolizing email.
Open standards have driven huge amounts of innovation in computing and communications. They do evolve slowly, by committee. But when a technology is sufficiently mature, setting an open standard makes it difficult for any one company to change the rules to benefit itself at others’ expense. Any developer can plug into an ecosystem, and the best implementations rise to the top. In contrast, proprietary platforms can change on a whim to, say, charge to reach followers or disallow apps from sharing. This makes it harder for innovators to build large and thriving businesses.
The web is another example. The HTTP protocol lets developers worldwide build whatever website they want. The resulting wave of innovation has lasted for decades. When Larry Page and Sergey Brin wanted to set up google.com, no one could stop them, and it was up to them to make it work. Yes, HTTP has spawned scams such as phishing schemes that lure victims to bogus websites, but competition among web browsers ensures that users have a choice of anti-phishing gatekeepers. This helps keep the web ecosystem healthy.
Creating an open standard for social media and getting many companies and users to adopt it would be difficult. It would require technical contributions from computer scientists and likely an assist from regulators. It would push against the tide of Facebook-style walled gardens (in which a single company sets the rules and controls access to content).
The recent U.S. court ruling that affirmed the legality of scraping publicly accessible websites is a welcome step toward the free flow of information online. Standards that ensure interoperability among social media platforms would be another, major step.
Keep learning!
Andrew
News
The View Through the Windshield
Overhead cameras equipped with computer vision are spotting distracted drivers on the road.
What’s new: A system from Melbourne-based Acusensus alerts police when drivers are engaged in risky activities such as using a cell phone, not wearing a seatbelt, or speeding, The New York Times reported.
How it works: The Heads-Up system uses sensors mounted over the road on overpasses, signs, or movable structures. An infrared flash camera captures images through windshield glare, heavy weather, and nighttime darkness. Radar gauges a vehicle’s speed.
- The camera snaps an image of each passing car and sends it to the cloud, where models analyze it and score the likelihood of various risky behaviors.
- The system forwards high-scoring images to a central police office that evaluates whether to charge the driver with a legal offense (see the sketch after this list).
- The system can also identify sections of road where drivers are more likely to engage in risky behaviors to inform changes in infrastructure, law enforcement, or legislation.
- The company is developing a successor system designed to directly alert officers on patrol and enable them to review images on laptops installed in service vehicles.
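Acusensus hasn't published its pipeline, so the Python sketch below only illustrates the shape of the scoring-and-triage step; the behavior names, scores, and threshold are all hypothetical:

```python
from dataclasses import dataclass

# Hypothetical behavior names and review threshold.
RISKY_BEHAVIORS = ["phone_use", "no_seatbelt", "speeding"]
REVIEW_THRESHOLD = 0.85  # forward only high-confidence detections

@dataclass
class Capture:
    image_id: str
    scores: dict  # behavior -> model's estimated likelihood

def should_forward(capture: Capture) -> bool:
    """Send an image to human reviewers if any risky behavior scores high."""
    return any(capture.scores.get(b, 0.0) >= REVIEW_THRESHOLD
               for b in RISKY_BEHAVIORS)

# Example: a capture with a confident phone-use detection is forwarded.
capture = Capture("cam42_000123", {"phone_use": 0.93, "speeding": 0.40})
if should_forward(capture):
    print(f"forward {capture.image_id} for human review")
```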
Results: New South Wales, Australia, deployed the system in 2019. In its first two years, it contributed to a 22 percent decline in road fatalities and an 80 percent decline in mobile phone use behind the wheel. An 18-hour assessment along a stretch of road in Missouri that saw an average of three and a half crashes daily found that 6.5 percent of drivers used mobile phones and around 5 percent engaged in more than one risky behavior.
Behind the news: AI is being applied to traffic safety worldwide — and not always by surveilling drivers.
- By 2024, every new vehicle sold in the European Union will be required to automatically brake in emergencies, stay in a lane, control speed, and detect drowsy or distracted drivers.
- Numerous Chinese cities along with the Malaysian capital Kuala Lumpur are using Alibaba’s City Brain platform to ease traffic congestion. The system collects video from intersections and GPS data from cars, which it analyzes to coordinate traffic lights across a metropolitan area.
- Since 2017, buses in Barcelona have used a computer vision system from Mobileye to identify cyclists, pedestrians, and other potential hazards.
Why it matters: About 1.3 million people worldwide die in road accidents every year, according to the World Health Organization. Many fatalities are associated with speeding, distracted driving, and not wearing seatbelts. AI systems that identify these behaviors can help save lives.
We’re thinking: People tend to buckle up when they see a police car and slow down when they see their current speed flashing on a sign ahead. If cameras looming over the road can save lives — given adequate controls on who has access to the data and how they can use it — it’s worth a try.
Bridge to Explainable AI
DeepMind’s AlphaGo famously dominated Go, a game in which players can see the state of play at all times. A new AI system demonstrated similar mastery of bridge, in which crucial information remains hidden.
What’s new: NooK, built by Jean-Baptiste Fantun, Véronique Ventos, and colleagues at the French startup NukkAI, recently beat eight world champions at bridge — or rather, at a core aspect of the game.
Rules of the game: Bridge is played by four players grouped into teams of two. Each player is dealt a hand of cards, after which the game is played in two phases:
- Bidding, in which an auction determines a suit (spades, hearts, diamonds, clubs, or neither), called trump, that’s more valuable than other suits.
- Play, in which the players show one card each, and the team playing the most valuable card wins a trick.
This study focused on the play phase, pitting NooK and human champions against previous automated bridge-playing systems, none of which has proven superior to an excellent human player. Each deal had a preassigned bid and trump suit, and competitors played the same 800 deals, divided into sets of 10. The player with the highest average score in the most sets won.
How it works: The developers didn’t reveal the mechanisms behind NooK, but we can offer a guess based on press reports and the company’s research papers.
- Human experts came up with a list of situations to model separately, taking into account variables like the number of cards the player held in each suit, current bid, and number and value of high cards.
- For each of these situations, the developers generated groups of four hands. They played those hands using a computer solver that knew which cards all players held and assumed they would be played perfectly. Then they trained a vanilla neural network to copy the solver’s decisions without knowing which cards its opponents held, resulting in a separate model for each situation.
- At inference, NooK used the vanilla neural networks for the first few tricks in a given deal. After that, it used probabilistic logic programming to estimate the probability that each of its own cards would win the current trick, as well as Monte Carlo sampling to estimate how many tricks it could win afterwards. It determined which card to play based on those two statistics. (It used a vanilla neural network for the first few tricks because the search space is too large for Monte Carlo sampling to pick the best card to play.)
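NukkAI hasn't released NooK's code, so the Python sketch below is only a guess at the shape of this hybrid decision rule; the two stand-in functions mark where the probabilistic logic estimate and the Monte Carlo rollouts would go:

```python
import random

def win_probability(card, state):
    # Stand-in for the probabilistic logic program's estimate that
    # `card` wins the current trick. Toy heuristic: higher rank, better.
    return card / 14.0

def rollout_tricks(card, state, rng):
    # Stand-in for one Monte Carlo rollout: sample hidden hands
    # consistent with `state`, play the deal out, count later tricks.
    return rng.randint(0, 4)

def choose_card(legal_cards, state, n_samples=100, seed=0):
    """Score each legal card by its chance of winning the current trick
    plus the average number of later tricks won in rollouts, then play
    the card with the best combined estimate."""
    rng = random.Random(seed)
    def value(card):
        future = sum(rollout_tricks(card, state, rng)
                     for _ in range(n_samples)) / n_samples
        return win_probability(card, state) + future  # expected tricks
    return max(legal_cards, key=value)

# Toy usage: card ranks 2..14 (ace high) in a single suit.
print(choose_card(legal_cards=[2, 7, 11, 14], state=None))
```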
Results: Pitted against the previous systems, NooK scored higher than the human champions in 67 out of 80 sets, or about 84 percent of the time.
Why it matters: Neural networks would be more useful in many situations if they were more interpretable; that is, if they could tell us why they classified a cat as a cat, or misclassified a cat as an iguana. This work’s approach offers one way to build more interpretable systems: a neurosymbolic hybrid that combines rules (symbolic AI, also known as good old-fashioned AI) describing various situations with neural networks trained to handle specific cases of each situation.
We’re thinking: In bridge, bidding is a way to hint to your partner (and deceive your opponents) about what you have in your hand, and thus a vital strategic element. NooK is impressive as far as it goes, but mastering bidding and teamwork lies ahead.
A MESSAGE FROM DEEPLEARNING.AI
More than 4.7 million learners took the original Machine Learning course by Andrew Ng. A decade later, a new and updated Machine Learning Specialization is set to launch in June! #BreakIntoAI with this foundational three-course program. Sign up here
Efficiency Experts
The emerging generation of trillion-parameter language models takes significant computation to train. Activating only a portion of the network at a time can cut the requirement dramatically and still achieve exceptional results.
What’s new: Researchers at Google led by Nan Du, Yanping Huang, and Andrew M. Dai developed Generalist Language Model (GLaM), a trillion-parameter model for language tasks. Like the company’s earlier Switch Transformer, this work uses mixture-of-experts (MoE) layers to select which subsets of the network to use depending on the input. It provides a clearer picture of how MoE can save time and electricity in practical language tasks.
Key insight: A neural network’s parameter count entails a compromise between performance (bigger is better) and energy cost (smaller is better). MoE architectures use different subsets of their parameters to learn from different examples. Each MoE layer contains a group of vanilla neural networks, or experts, preceded by a gating module that learns to choose which ones to use based on the input, enabling different experts to specialize in particular types of examples. In this way, the network uses less energy and learns more than the size of any given subset might suggest.
How it works: The authors trained a transformer model equipped with MoE layers (similar to GShard) to generate the next word or part of a word in a text sequence using a proprietary 1.6-trillion-word corpus of webpages, books, social media conversations, forums, and news articles. They fine-tuned the model to perform 29 natural language tasks in seven categories such as question answering and logical reasoning.
- During training, each input token (a word or part of a word) passed through a stack of alternating self-attention and MoE layers.
- Each MoE layer started with a gating module. Given a representation from the preceding self-attention layer, it selected two experts (out of 64) and passed the representation to them. The two experts refined the representation separately, creating two new representations, and the weighted average of those representations went to the next self-attention layer (see the sketch after this list).
- After the last attention layer, a fully connected layer computed the word most likely to follow the input. Since two out of 64 experts were active in any given MoE layer, the network used only 8 percent of its parameters to render each output token.
- At inference, the authors evaluated their approach on zero- and one-shot tasks. In zero-shot tasks, given a prompt, the model generated an output (for example, an answer to an unseen question). In one-shot tasks, it received a randomly selected example of a completed task from a training set along with an input, and generated an output. (For instance, the model received a paragraph, a question about it, and the correct answer, and then answered a new question about a different paragraph.)
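Top-2 gating is simple enough to sketch. The NumPy toy below illustrates the routing idea only, not GLaM's actual implementation (which adds load balancing, distributed experts, and far larger feed-forward networks); the dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, D = 64, 512  # 64 experts as in GLaM; D is a made-up width

# Each expert is a feed-forward network; here, a single weight matrix.
experts = [rng.standard_normal((D, D)) * 0.02 for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D, N_EXPERTS)) * 0.02  # the gating module

def moe_layer(x):
    """Route a token representation x (shape [D]) through the two
    highest-scoring experts and return a weighted average of their outputs."""
    logits = x @ gate_w                # one score per expert
    top2 = np.argsort(logits)[-2:]     # indices of the two best experts
    w = np.exp(logits[top2])
    w /= w.sum()                       # normalize over the chosen pair
    # Only the two selected experts run; the other 62 stay idle for this token.
    outs = [x @ experts[i] for i in top2]
    return w[0] * outs[0] + w[1] * outs[1]

token = rng.standard_normal(D)
print(moe_layer(token).shape)  # (512,)
```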
Results: Training the 1.2 trillion-parameter GLaM required 456 megawatt-hours, while training the 175 billion-parameter GPT-3 required 1,287 megawatt-hours. Moreover, GLaM outperformed GPT-3 in six categories of zero-shot tasks and five categories of one-shot tasks. For example, answering trivia questions in one-shot TriviaQA, it achieved 75 percent accuracy — a state-of-the-art result — compared to GPT-3’s 68 percent.
Why it matters: Increased computational efficiency means lower energy costs, presumably making it easier for everyday engineers to train state-of-the-art models. It also means reduced CO2 emissions, sparing the planet some of the environmental impact incurred by AI.
We’re thinking: MoE models are attracting a lot of attention amid the public-relations race to claim ever-higher parameter counts. Yes, building a mixture of 64 experts boosts the parameter count by 64 times, but it also means building 64 models instead of one. While this can work better than building a single model, it also diverts attention from other architectures that may yield insights deeper than “bigger is better.”
Training Mission
An experimental AI system is helping train the next generation of fighter pilots.
What’s new: The U.S. Air Force is using deep learning to evaluate the progress of around 50 pilots in one of its training squadrons, Popular Science reported.
Cloud-based data: Built by the California startup Crowdbotics, the system harnesses data generated in flight by F-15E airplanes (or simulations). Each aircraft records numerous data streams, such as air speed and position, multiple times per second. Instructors use the system’s output to tailor feedback to each student.
- The system grades trainees on their landings by monitoring the aircraft’s angle of approach, position on the runway, and remaining fuel. A plane that’s heavy with fuel may need to maintain a higher speed as it touches down than one that’s almost empty (see the sketch after this list).
- It compares a trainee’s performance across different flights to evaluate improvement over time. It also compares trainees within a group, helping instructors to home in on areas for improvement.
- The project is funded by Small Business Innovation Research, a competitive government program to nurture technologies that show potential for commercialization. The program will determine the project’s commercial viability within two years.
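Crowdbotics hasn't published its grading rules, so the Python sketch below is purely illustrative of a fuel-aware landing check; the thresholds and the linear fuel adjustment are made-up numbers:

```python
from dataclasses import dataclass

@dataclass
class Landing:
    approach_angle_deg: float  # glide-path angle on final approach
    runway_offset_m: float     # lateral distance from the centerline
    touchdown_speed_kt: float
    fuel_kg: float

def target_touchdown_speed(fuel_kg: float) -> float:
    # A heavier (fuel-laden) aircraft needs a higher touchdown speed;
    # the baseline and per-kilogram adjustment here are made up.
    return 140.0 + 0.004 * fuel_kg

def grade_landing(s: Landing) -> float:
    """Score a landing from 0 to 1 by penalizing deviations from a
    3-degree glide path, the centerline, and the fuel-adjusted speed."""
    angle = abs(s.approach_angle_deg - 3.0) / 3.0
    offset = abs(s.runway_offset_m) / 10.0
    speed = abs(s.touchdown_speed_kt - target_touchdown_speed(s.fuel_kg)) / 20.0
    return max(0.0, 1.0 - (angle + offset + speed) / 3)

print(grade_landing(Landing(3.2, 1.5, 148.0, 2000.0)))  # ~0.93
```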
Behind the news: Several machine learning projects aim to improve pilot safety by taking advantage of the data produced by modern aircraft.
- Paladin AI, based in Montreal, analyzes flight and simulator data to help train commercial pilots by assessing their in-flight maneuvers, awareness of their surroundings, and ability to follow procedures.
- Aura built a computer vision system that monitors helicopter instrument displays to generate performance reports for helicopter pilots-in-training. Purportedly it cuts training time by as much as 10 percent.
Why it matters: Training pilots is costly, time-consuming, and risky to both personnel and aircraft, which can cost tens of millions of dollars each. It’s also ongoing, as each type of aircraft requires unique instruction. AI can make training more effective, efficient, and safe. It can also allow instructors to focus on trainees who need the most attention.
We’re thinking: The sky's the limit for machine learning in training applications.