State-of-the-art chatbots are typically trained on short dialogs, so they often respond with off-point statements in extended conversations. To improve their performance over long exchanges, researchers developed a way to track context throughout a conversation.
What's new: Jing Xu, Arthur Szlam, and Jason Weston at Facebook released a chatbot that summarizes dialog on the fly and uses the summary to generate further repartee.
Key insight: Chatbots based on the transformer architecture typically generate replies by analyzing up to 1,024 of the most recent tokens (usually characters, words, or portions of words). Facebook previously used a separate transformer to determine which earlier statements were most relevant to a particular reply — but in long conversations, the relevant statements may encompass more than 1,024 tokens. Summarizing such information can give a model access to more context than is available to even large, open-domain chatbots like BlenderBot, Meena, and BART.
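To make the token arithmetic concrete, here's a minimal sketch, not the authors' code, of how a fixed token window forgets early turns while prepending compact summaries preserves them. The whitespace tokenizer and helper names are illustrative assumptions.

```python
# Sketch: fixed context window vs. summary-augmented context.
# Whitespace splitting stands in for a real subword tokenizer.

MAX_TOKENS = 1024  # typical budget for the transformers discussed above

def truncate(turns: list[str], budget: int = MAX_TOKENS) -> list[str]:
    """Keep only the most recent turns that fit in the token budget;
    a plain context-window chatbot silently drops everything earlier."""
    kept, used = [], 0
    for turn in reversed(turns):
        n = len(turn.split())
        if used + n > budget:
            break
        kept.append(turn)
        used += n
    return kept[::-1]

def build_context(summaries: list[str], turns: list[str]) -> str:
    """Spend a few tokens on one-line summaries of earlier sessions,
    then fill the rest of the budget with verbatim recent turns."""
    header = " ".join(summaries)
    remaining = MAX_TOKENS - len(header.split())
    return "\n".join([header] + truncate(turns, remaining))
```

Because a one-sentence summary costs a few dozen tokens while the session it condenses may run to hundreds, the second approach fits far more of a conversation's history into the same budget.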
How it works: The authors built a dataset of over 5,000 conversations. They trained a system of three transformers to, respectively, summarize conversations as they occurred, select the five summaries most relevant to the latest turn, and generate a response (a sketch of how the pieces fit together follows the list below).
- The authors recorded text chats between pairs of volunteers. Each conversation consisted of three or four sessions (up to 14 messages each) separated by pauses that lasted up to seven days.
- After each session, a volunteer summarized it to serve as a reference for subsequent sessions (which might involve different conversants). In addition, the volunteer either summarized each turn or labeled it to indicate that no summary was needed.
- A BlenderBot, given the dialog from the start through each turn, learned to reproduce the volunteers' turn-by-turn summaries.
- A dense passage retriever, pretrained on question-answer pairs, ranked the turn-by-turn summaries via nearest-neighbor search and selected those most relevant to the session so far.
- A separate BlenderBot received the top summaries and generated the next response.
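Taken together, the three models form a summarize-retrieve-generate loop. The following is a rough reconstruction rather than the released code: `summarize`, `embed`, and `generate` are hypothetical stand-ins for the fine-tuned BlenderBot summarizer, the dense passage retriever's encoder, and the response-generating BlenderBot.

```python
import numpy as np

NO_SUMMARY = "<no-summary>"  # hypothetical label for turns that need no summary

def update_memory(memory: list[str], dialog_so_far: str, summarize) -> None:
    """After each turn, store a summary in long-term memory unless the
    summarizer indicates none is needed."""
    summary = summarize(dialog_so_far)
    if summary != NO_SUMMARY:
        memory.append(summary)

def top_summaries(memory: list[str], query: str, embed, k: int = 5) -> list[str]:
    """Rank stored summaries by cosine similarity to the current exchange
    (nearest-neighbor search) and keep the five most relevant."""
    if not memory:
        return []
    q = embed(query)  # embed() returns a 1-D numpy vector
    vecs = np.stack([embed(s) for s in memory])
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [memory[i] for i in np.argsort(-sims)[:k]]

def respond(memory: list[str], recent_dialog: str, embed, generate) -> str:
    """Condition the generator on retrieved summaries plus the recent dialog."""
    context = "\n".join(top_summaries(memory, recent_dialog, embed))
    return generate(context + "\n" + recent_dialog)
```

Retrieving a fixed five summaries keeps the generator's input within its token budget no matter how long the conversation grows.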
Results: Human evaluators compared the authors' model to a garden-variety BlenderBot, which draws context from only the most recent 128 tokens. They scored the authors' model 3.65 out of 5 on average versus 3.47 for the BlenderBot, and they found 62.1 percent of the authors' model's responses engaging versus 56.5 percent of the BlenderBot's.
Why it matters: After much work on enabling chatbots to discuss a variety of topics, it's good to see improvement in their ability to converse at length. Conversation is inherently dynamic, and if we want chatbots to keep up with us, we need them to ride a train of thought, hop off, switch to a new line, and circle back to the first, all without losing track.
We're thinking: If Facebook were to use this system to generate chatter on the social network, could we call its output Meta data? (Hat tip to Carole-Jean Wu!)