Transformer-XL
Selective Attention: More efficient NLP training without sacrificing performance
Large transformer networks work wonders with natural language, but they require enormous amounts of computation. New research slashes processor cycles without compromising performance.
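The blurb doesn't spell out how the savings are achieved, so here is a minimal, purely generic sketch of one way attention computation can be cut down: scoring the context tokens and keeping only the most relevant ones before the softmax. The function name, the keep_ratio parameter, and the pruning rule are all hypothetical illustrations, not the specific mechanism the research proposes.

```python
# Generic illustration: prune low-relevance context tokens before computing
# attention, so the cost of the softmax and the weighted sum shrinks with the
# number of tokens kept. This is a toy sketch, not the paper's method.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pruned_attention(q, k, v, keep_ratio=0.5):
    """Single-head attention for one query over only the top-scoring context tokens.

    q: (d,) query vector; k, v: (n, d) keys and values for the context.
    keep_ratio: fraction of context tokens retained (hypothetical knob).
    """
    d = q.shape[-1]
    scores = k @ q / np.sqrt(d)              # (n,) relevance of each context token
    n_keep = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[-n_keep:]      # indices of the most relevant tokens
    weights = softmax(scores[keep])          # softmax over the retained subset only
    return weights @ v[keep]                 # weighted sum of the retained values

# Example: a context of 8 tokens, but attention is computed over only 4 of them.
rng = np.random.default_rng(0)
q = rng.normal(size=16)
k = rng.normal(size=(8, 16))
v = rng.normal(size=(8, 16))
out = pruned_attention(q, k, v, keep_ratio=0.5)
print(out.shape)  # (16,)
```

Halving the retained context roughly halves the per-query attention work in this toy setting; the research claims a comparable kind of saving without hurting accuracy, though by its own, different mechanism.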