More Efficient Transformers
BigBird is an efficient attention mechanism for transformers.
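As transformer networks move to the fore in applications from language to vision, the time they take to crunch longer sequences becomes a more pressing issue. Full self-attention compares every token with every other token, so its cost grows quadratically with sequence length. A new method, BigBird, lightens the computational load by using sparse attention.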
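To make the idea of sparse attention concrete, here is a minimal token-level sketch in plain Python. It assumes a BigBird-style pattern that mixes three kinds of connections: a sliding window over neighboring tokens, a few global tokens that see and are seen by everything, and a few random keys per query. The function names, parameters (`window`, `num_global`, `num_random`), and defaults are hypothetical choices for illustration. The published model works on blocks of tokens for hardware efficiency, so treat this as an illustration of the principle rather than the actual implementation.

```python
import math
import random


def bigbird_style_pattern(n, window=3, num_global=2, num_random=2, seed=0):
    """Return allowed[i], the set of key positions query i may attend to.

    Hypothetical token-level sketch: sliding window + global tokens + random keys.
    """
    rng = random.Random(seed)
    allowed = [set() for _ in range(n)]
    half = window // 2
    global_tokens = range(min(num_global, n))
    for i in range(n):
        # Sliding window: each token attends to itself and its near neighbors.
        allowed[i].update(range(max(0, i - half), min(n, i + half + 1)))
        # Random connections: a few extra keys chosen per query.
        allowed[i].update(rng.sample(range(n), min(num_random, n)))
        # Every token attends to the global tokens.
        allowed[i].update(global_tokens)
    # Global tokens attend to every position in the sequence.
    for g in global_tokens:
        allowed[g].update(range(n))
    return allowed


def sparse_attention(q, k, v, allowed):
    """Scaled dot-product attention that scores only the allowed (query, key) pairs."""
    d = len(q[0])
    out = []
    for i, keys in enumerate(allowed):
        keys = sorted(keys)
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        row = [0.0] * len(v[0])
        for w, j in zip(weights, keys):
            for c, val in enumerate(v[j]):
                row[c] += (w / total) * val
        out.append(row)
    return out


if __name__ == "__main__":
    for n in (64, 256, 1024):
        pattern = bigbird_style_pattern(n)
        pairs = sum(len(keys) for keys in pattern)
        print(f"n={n}: {pairs} scored pairs vs {n * n} for full attention")
```

Each non-global query scores at most `window + num_random + num_global` keys, and only the few global tokens score all `n`, so the number of scored pairs grows roughly linearly with sequence length instead of quadratically.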