Gopher
Right-Sizing Models for the Dataset: Finding the Best Data-To-Parameter Ratio for NLP Models
The route to improving transformer-based language models like GPT-3 and Gopher, which are trained on immense quantities of text scraped from the web, has been to increase their size. But research shows that, given a fixed compute budget, bigger doesn’t necessarily mean better: how training data is balanced against parameter count matters just as much.
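To make the trade-off concrete, here is a minimal sketch of the compute-optimal sizing idea from DeepMind's follow-up work on scaling laws. It assumes the common approximations that training cost is roughly C ≈ 6·N·D FLOPs (N parameters, D training tokens) and that the optimal token count is roughly 20 tokens per parameter; both are rules of thumb, not exact results, and the function name is illustrative.

```python
import math

def compute_optimal_size(flops_budget: float, tokens_per_param: float = 20.0):
    """Estimate a compute-optimal (parameters, tokens) split for a FLOPs budget.

    Uses two rough approximations:
      C ~= 6 * N * D          (training cost in FLOPs)
      D ~= tokens_per_param * N  (roughly 20 tokens per parameter)
    Substituting gives C ~= 6 * tokens_per_param * N**2, so we solve for N.
    """
    n = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    d = tokens_per_param * n
    return n, d

# Roughly the compute budget of a 70B-parameter model trained on 1.4T tokens:
params, tokens = compute_optimal_size(5.9e23)
```

Under this rule of thumb, doubling the compute budget should go toward scaling parameters and training tokens in equal proportion, rather than pouring it all into a larger model.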