[2407.08156v1] AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization