DeepNorm
Pile on the Layers! DeepNorm Allows Transformers to Accommodate More Layers
Adding layers to a neural network puts the “deep” in deep learning, but it also increases the chance that the network will get stuck during training. A new approach, DeepNorm, successfully trains transformers with an order of magnitude more layers than previous methods could handle.