Transformer-XL
Selective Attention: More efficient NLP training without sacrificing performance
Large transformer networks work wonders with natural language, but they require enormous amounts of computation. New research slashes processor cycles without compromising performance.
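The blurb doesn't spell out how the savings are achieved, so here is a minimal, purely generic sketch of one way attention computation can be cut down: scoring the context tokens and keeping only the most relevant ones before the softmax. The function name, the keep_ratio parameter, and the pruning rule are all hypothetical illustrations, not the specific mechanism the research proposes.

```python
# Generic illustration: prune low-relevance context tokens before computing
# attention, so the cost of the softmax and the weighted sum shrinks with the
# number of tokens kept. This is a toy sketch, not the paper's method.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pruned_attention(q, k, v, keep_ratio=0.5):
    """Single-head attention for one query over only the top-scoring context tokens.

    q: (d,) query vector; k, v: (n, d) keys and values for the context.
    keep_ratio: fraction of context tokens retained (hypothetical knob).
    """
    d = q.shape[-1]
    scores = k @ q / np.sqrt(d)              # (n,) relevance of each context token
    n_keep = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[-n_keep:]      # indices of the most relevant tokens
    weights = softmax(scores[keep])          # softmax over the retained subset only
    return weights @ v[keep]                 # weighted sum of the retained values

# Example: a context of 8 tokens, but attention is computed over only 4 of them.
rng = np.random.default_rng(0)
q = rng.normal(size=16)
k = rng.normal(size=(8, 16))
v = rng.normal(size=(8, 16))
out = pruned_attention(q, k, v, keep_ratio=0.5)
print(out.shape)  # (16,)
```

Halving the retained context roughly halves the per-query attention work in this toy setting; the research claims a comparable kind of saving without hurting accuracy, though by its own, different mechanism.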