[2109.12188] Predicting Attention Sparsity in Transformers