Despite dramatic recent progress, natural language generation remains an iffy proposition. Even users of the muscular GPT-2 text generator have to press the button a number of times to get sensible output. But researchers are figuring out how to exert greater control over generated text.
What’s new: Pre-trained text generators generally require fine-tuning for a specific sort of output. A team at Salesforce developed a model aptly named CTRL that lets users determine the output genre, from news story to horror script, without further training.
Key insight: The model is guided by control codes, human-written text tags that describe a desired output genre — including, yes, jokes. The model learns relationships between a given code and the intended style and content.
How it works: CTRL, like the state-of-the-art language model BERT, is a modified transformer network trained in an unsupervised fashion on large-scale text corpora. Its training data include Wikipedia, Reddit, and Project Gutenberg’s library of digitized books.
- CTRL predicts the next word in a sequence based on learned relationships among words.
- During training, each input sequence comes with a control code. For example, material drawn from a contract would be coded Legal.
- During generation, one of these codes directs the model to produce text similar to the associated subset of the training data.
Results: The researchers provide qualitative results demonstrating that control codes generate different styles of text in response to the same prompt. For example, given the prompt “the knife,” the Reviews code produces “a knife is a tool and this one does the job well,” while the Horror code yields “a knife handle pulled through the open hole in the front.” The paper offers no quantitative evaluation.
Why it matters: The ideal text generator would produce diverse, relevant passages appropriate to a wide variety of uses. CTRL suggests that a single model with unsupervised training could meet this requirement.
We’re thinking: Many people including GPT-2’s creators worry that more-capable text generators invite misuse. Could a CTRL-style approach reduce abuse by suppressing certain genres (say, blatant political disinformation) as well as supporting more effective text generation?