Search | arXiv e-print repository

Showing 1–2 of 2 results for author: Ranzinger, M

  1. arXiv:2312.06709  [pdf, other]

    cs.CV

    AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One

    Authors: Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov

    Abstract: A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks. VFMs like CLIP, DINOv2, SAM are trained with distinct objectives, exhibiting unique characteristics for various downstream tasks. We find that despite their conceptual differences, these models can be effectively merged into a unified model through multi-teacher distillation. We name…

    Submitted 30 April, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Version 3: CVPR Camera Ready, reconfigured full paper, table 1 is now more comprehensive. Version 2: Added more acknowledgements and updated table 7 with more recent results. Ensured that the link in the abstract to our code is working properly. Version 3: Fix broken hyperlinks.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 12490-12500

  2. arXiv:2310.19731  [pdf, other]

    cs.CV cs.AI cs.LG

    ViR: Towards Efficient Vision Retention Backbones

    Authors: Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

    Abstract: Vision Transformers (ViTs) have attracted a lot of popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and scalability for large scale training. Although the training parallelism of self-attention mechanism plays an important role in retaining great performance, its quadratic complexity baffles the application of ViTs in many scenarios whic…

    Submitted 26 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Introduction of Vision Retention Networks (ViR) for Efficient Visual Modeling