[2310.03294] DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training