[2406.13807] AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding