Language models have lately become so good at generating coherent text that some researchers hesitate to release them for fear they'll be misused to auto-generate disinformation. Yet they're still bad at basic tasks like understanding nested statements and ambiguous language. A new advance shatters previous comprehension benchmarks, portending even more capable models to come.
What’s new: Researchers at Google Brain and Carnegie Mellon introduce XLNet, a pre-training algorithm for natural language processing systems. It helps NLP models (in this case, based on Transformer-XL) achieve state-of-the-art results in 18 diverse language-understanding tasks including question answering and sentiment analysis.
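To make the fine-tuning step concrete, here's a minimal sketch of adapting a pre-trained XLNet to sentiment analysis, one of the tasks above. The Hugging Face transformers library, the xlnet-base-cased checkpoint, and the toy batch are illustrative assumptions, not part of the paper.

```python
# Minimal sketch: fine-tune a pre-trained XLNet for binary sentiment analysis.
# Assumptions for illustration: the Hugging Face `transformers` library, the
# `xlnet-base-cased` checkpoint, and a two-example toy batch.
import torch
from transformers import XLNetTokenizer, XLNetForSequenceClassification

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetForSequenceClassification.from_pretrained(
    "xlnet-base-cased", num_labels=2  # 0 = negative, 1 = positive
)

# Tokenize a toy batch and take a single gradient step on the labeled examples.
batch = tokenizer(
    ["A thoroughly enjoyable read.", "I gave up halfway through."],
    padding=True,
    return_tensors="pt",
)
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)
outputs.loss.backward()  # a real run would loop over a full labeled dataset
```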
Key insight: XLNet builds on BERT's innovation, but it differs in key ways:
- Language models typically evaluate the meaning of a word by looking at the words leading up to and following it. Previous algorithms like BERT examined that context in a fixed direction: forward, backward, or both at once. XLNet instead considers the words in a variety of randomly sampled orders.
- BERT learns by masking words and trying to reconstruct them, but those masks never appear during inference, and that mismatch hurts accuracy. XLNet doesn't use masks, so it avoids BERT's training/inference gap (a toy comparison of the two objectives appears after this list).
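To make the contrast concrete, here's a toy Python sketch (not the actual model) of the two objectives: BERT-style masked prediction, which corrupts the input with a [MASK] symbol, versus XLNet-style prediction over a randomly sampled factorization order, which leaves the text intact. The example sentence and printout are illustrative only.

```python
# Toy comparison of the two pre-training objectives (illustrative only).
import random

tokens = ["the", "old", "man", "the", "boat"]

# BERT-style: replace a word with [MASK] and predict it from the corrupted
# text. The [MASK] symbol never appears in real inputs at inference time.
masked_input = tokens[:2] + ["[MASK]"] + tokens[3:]
bert_target = (2, tokens[2])  # predict "man" at position 2

# XLNet-style: leave the text intact and sample a random factorization order;
# each word is predicted from the words that precede it in that sampled order.
order = random.sample(range(len(tokens)), len(tokens))  # e.g. [3, 0, 4, 2, 1]
for step, pos in enumerate(order):
    seen = sorted(order[:step])                  # positions already visible
    context = [tokens[p] for p in seen]
    print(f"predict {tokens[pos]!r} at position {pos} given {context}")
```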
How it works: XLNet teaches a network to structure text into phrase vectors before fine-tuning for a specific language task.
- XLNet computes a vector for an entire phrase as well as each word in it.
- It trains the phrase vector by randomly selecting a target word and learning to predict that word from the phrase vector.
- Doing this repeatedly for every word in a phrase forces the model to learn good phrase vectors.
- The trick is that the words aren't necessarily processed in their original order. XLNet samples various word orders, producing a different phrase vector for each sample.
- By training over many randomly sampled orders, XLNet learns phrase vectors that are invariant to word order (a minimal sketch of this training loop follows the list).
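Here's a minimal PyTorch sketch of that training loop, under simplifying assumptions: a mean-pooled embedding stands in for the phrase vector and a single linear layer stands in for the decoder, whereas the actual model uses Transformer-XL with two-stream attention.

```python
# Minimal sketch of the permutation training loop described above.
# Simplifications: mean-pooled embeddings as the "phrase vector" and a single
# linear decoder; the real XLNet uses Transformer-XL with two-stream attention.
import torch
import torch.nn as nn

vocab = {"<pad>": 0, "the": 1, "old": 2, "man": 3, "boat": 4}
sentence = torch.tensor([1, 2, 3, 1, 4])  # "the old man the boat"

embed = nn.Embedding(len(vocab), 16)
decode = nn.Linear(16, len(vocab))
optimizer = torch.optim.Adam(list(embed.parameters()) + list(decode.parameters()))

for _ in range(100):                             # many randomly sampled orders
    order = torch.randperm(len(sentence))        # one factorization order
    loss = torch.tensor(0.0)
    for step in range(1, len(order)):
        seen = sentence[order[:step]]            # words visible at this step
        phrase_vec = embed(seen).mean(dim=0)     # crude stand-in for a phrase vector
        target = sentence[order[step]]           # next word in the sampled order
        logits = decode(phrase_vec)
        loss = loss + nn.functional.cross_entropy(
            logits.unsqueeze(0), target.unsqueeze(0)
        )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```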
Why it matters: NLP models using XLNet vectors achieved stellar results on a variety of tasks. They answered multiple-choice questions 7.6 percent more accurately than the previous state of the art (other efforts have yielded gains of less than 1 percent) and classified subject matter with 98.6 percent accuracy, 3 percent better than the prior best.
Takeaway: XLNet’s output can be applied to a variety of NLP tasks, raising the bar throughout the field. It takes us another step toward a world where computers can decipher what we’re saying — even ambiguous yet grammatical sentences like “the old man the boat” — and stand in for human communications in a range of contexts.