[2406.10774] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference