Switch
Efficiency Experts: Mixture of Experts Makes Language Models More Efficient
The emerging generation of trillion-parameter language models takes significant computation to train. Activating only a portion of the network at a time can cut that requirement dramatically while still achieving exceptional results.