Online sorting algorithms run trillions of times a day to organize lists according to users’ interests. New work found faster alternatives.
What’s new: Daniel J. Mankowitz and colleagues at Google developed AlphaDev, a system that learned to generate algorithms that sort three to five numbers faster than previous state-of-the-art methods. Accelerating such algorithms can expedite the sorting of lists of any size — say, for search engines, ecommerce sites, and the like — since algorithms that sort more elements often call algorithms that sort fewer elements.
Key insight: Most programmers implement sorting algorithms in a high-level programming language like C++, which a compiler translates into Assembly Language instructions that control the processor and memory. A compiler can translate a single line of C++ into a variety of sequences of Assembly instructions that are equivalent functionally but vary in their speed (number of Assembly instructions required). A reinforcement learning agent can learn to choose a translation that maximizes speed.
How it works: AlphaDev is a collection of neural networks that learn jointly via reinforcement learning. The authors initialized the system by giving it a sequence of unsorted numbers and an empty list of Assembly instructions. It built algorithms by adding Assembly instructions one by one. It earned rewards for choosing instructions that sorted the numbers correctly and quickly.
- With each new instruction selected, a transformer computed an embedding of the instructions so far, and a vanilla neural network computed an embedding of the order of the numbers after applying those instructions. The system concatenated the two embeddings to represent the current state.
- Given the embeddings, two vanilla neural networks selected instructions. The first network (i) predicted the total future reward for the current state and (ii) calculated the probability that any given instruction would improve the algorithm. The second network (iii) predicted the reward after adding each possible instruction and (iv) predicted an embedding to represent the resulting state.
- The system searched through possible sequences of instructions to find which instruction most often led to the highest predicted rewards. It added that instruction to the algorithm.
- Once the system had built an algorithm, the authors uploaded it to the main C++ library, which had not been updated in over a decade. The resulting algorithms now serve as open source subroutines in C++’s default sorting algorithm.
Results: The authors tested two approaches to rewarding speed, minimizing either Assembly instructions or average runtime over a number of inputs. When AlphaDev minimized the number of Assembly instructions, it found an algorithm that sorted three integers using 17 instructions instead of the previous state-of-the-art algorithm, a human-engineered one that used 18 instructions. Its algorithm for sorting four integers used 28 instructions, equal to the typical one. Its algorithm for sorting five integers had 42 instructions, compared to the alternative’s 46 instructions. When AlphaDev optimized for runtime (running on Intel 6th-generation Core “Skylake” processor), sorting three integers took 2.18 nanoseconds, compared to the typical algorithm’s 4.86 nanoseconds. Sorting four unsigned integers took 1.96 nanoseconds instead of 5.43 nanoseconds and sorting five of them took 1.98 nanoseconds instead of 6.79 nanoseconds. AlphaDev achieved smaller speedups with longer number sequences: Sorting 16 unsigned integers took 9.5 nanoseconds instead of 10.5 nanoseconds, and sorting 262,144 numbers took 60.8 nanoseconds instead of 61.4 nanoseconds.
Why it matters: This work repurposes the training method and architecture of game-playing models like AlphaZero to solve real-world problems. The trick is to reframe the task of writing a sorting algorithm as a reinforcement learning problem.
We’re thinking: What other algorithms can this approach optimize? How much faster will they be? Let’s get these questions sorted!