[2406.10424] What is the Visual Cognition Gap between Humans and Multimodal LLMs?
