Search | arXiv e-print repository

Showing 1–18 of 18 results for author: Bhalgat, Y

Searching in archive cs.
  1. arXiv:2405.11574  [pdf, other]

    cs.CV cs.AI cs.LG

    Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification

    Authors: Manan Shah, Yash Bhalgat

    Abstract: This report is a reproducibility study of the paper "CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification" (Abdelfattah et al., ICCV 2023). Our report makes the following contributions: (1) We provide a reproducible, well-commented and open-sourced code implementation for the entire method specified in the original paper. (2) We try to verify the effectiveness of the novel a…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Reproducibility study

  2. arXiv:2405.10255  [pdf, other]

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear…

    Submitted 16 May, 2024; originally announced May 2024.

  3. arXiv:2403.10997  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

    Authors: Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi

    Abstract: Understanding complex scenes at multiple levels of abstraction remains a formidable challenge in computer vision. To address this, we introduce Nested Neural Feature Fields (N2F2), a novel approach that employs hierarchical supervision to learn a single feature field, wherein different dimensions within the same high-dimensional feature encode scene properties at varying granularities. Our method…

    Submitted 28 July, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: ECCV 2024

  4. arXiv:2403.06877  [pdf, other]

    cs.RO cs.CV

    SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection

    Authors: Yifu Tao, Yash Bhalgat, Lanke Frank Tarimo Fu, Matias Mattamala, Nived Chebrolu, Maurice Fallon

    Abstract: We present a neural-field-based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photo-realistic textures. This system adapts the state-of-the-art neural radiance field (NeRF) representation to also incorporate lidar data which adds strong geometric constraints on the depth and surface normals. W…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted at ICRA 2024; Website: https://ori-drs.github.io/projects/silvr/

  5. arXiv:2306.04633  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion

    Authors: Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi

    Abstract: Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by instead leveraging 2D pre-trained models for instance segmentation. We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation, which encourages multi-view consistency across fra…

    Submitted 1 December, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (Spotlight). Code: https://github.com/yashbhalgat/Contrastive-Lift

  6. arXiv:2303.10087  [pdf, other]

    cs.CV

    Neural Refinement for Absolute Pose Regression with Feature Synthesis

    Authors: Shuai Chen, Yash Bhalgat, Xinghui Li, Jiawang Bian, Kejie Li, Zirui Wang, Victor Adrian Prisacariu

    Abstract: Absolute Pose Regression (APR) methods use deep neural networks to directly regress camera poses from RGB images. However, the predominant APR architectures only rely on 2D operations during inference, resulting in limited accuracy of pose estimation due to the lack of 3D geometry constraints or priors. In this work, we propose a test-time refinement pipeline that leverages implicit geometric cons…

    Submitted 29 February, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: Paper Accepted by CVPR 2024. Project Page: http://nefes.active.vision. Code will be released at https://github.com/ActiveVisionLab/NeFeS

  7. arXiv:2211.15107  [pdf, other]

    cs.CV cs.AI cs.LG

    A Light Touch Approach to Teaching Transformers Multi-view Geometry

    Authors: Yash Bhalgat, Joao F. Henriques, Andrew Zisserman

    Abstract: Transformers are powerful visual learners, in large part due to their conspicuous lack of manually-specified priors. This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility), and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propo…

    Submitted 2 April, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: Camera-ready version. Accepted to CVPR 2023

  8. arXiv:2203.11933  [pdf, other]

    cs.LG cs.CL cs.CV cs.CY

    A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

    Authors: Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

    Abstract: Vision-language models can encode societal biases and stereotypes, but there are challenges to measuring and mitigating these multimodal harms due to lacking measurement robustness and feature degradation. To address these challenges, we investigate bias measures and apply ranking metrics for image-text representations. We then investigate debiasing methods and show that prepending learned embeddi…

    Submitted 25 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: 17 pages, 4 figures, 7 tables. For code and trained token embeddings, see https://github.com/oxai/debias-vision-lang; Changed to use ACL layout, added joint training with comparison figure, corrected spelling and formatting errors; This paper is accepted for publication at AACL 2022, the official version of record is in the ACL Anthology

  9. arXiv:2111.06500  [pdf, other]

    cs.CV

    Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation

    Authors: John Yang, Yash Bhalgat, Simyung Chang, Fatih Porikli, Nojun Kwak

    Abstract: While hand pose estimation is a critical component of most interactive extended reality and gesture recognition systems, contemporary approaches are not optimized for computational and memory efficiency. In this paper, we propose a tiny deep neural network of which partial layers are recursively exploited for refining its previous estimations. During its iterative refinements, we employ learned ga…

    Submitted 11 November, 2021; originally announced November 2021.

  10. arXiv:2105.10335  [pdf, other]

    cs.NE cs.CV cs.LG

    Data-driven Weight Initialization with Sylvester Solvers

    Authors: Debasmit Das, Yash Bhalgat, Fatih Porikli

    Abstract: In this work, we propose a data-driven scheme to initialize the parameters of a deep neural network. This is in contrast to traditional approaches which randomly initialize parameters by sampling from transformed standard distributions. Such methods do not use the training data to produce a more informed initialization. Our method uses a sequential layer-wise approach where each layer is initializ…

    Submitted 2 May, 2021; originally announced May 2021.

    Comments: Practical Machine Learning for Developing Countries Workshop, International Conference on Learning Representations, 2021

  11. arXiv:2008.02454  [pdf, other]

    cs.CV cs.LG cs.NE

    Structured Convolutions for Efficient Neural Network Design

    Authors: Yash Bhalgat, Yizhe Zhang, Jamie Lin, Fatih Porikli

    Abstract: In this work, we tackle model efficiency by exploiting redundancy in the implicit structure of the building blocks of convolutional neural networks. We start our analysis by introducing a general definition of Composite Kernel structures that enable the execution of convolution operations in the form of efficient, scaled, sum-pooling components. As its special case, we propose Str…

    Submitted 31 October, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: Camera-ready for NeurIPS 2020

  12. arXiv:2004.09576  [pdf, other]

    cs.CV cs.LG stat.ML

    LSQ+: Improving low-bit quantization through learnable offsets and better initialization

    Authors: Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak

    Abstract: Unlike ReLU, newer activation functions (like Swish, H-swish, Mish) that are frequently employed in popular efficient architectures can also result in negative activation values, with skewed positive and negative ranges. Typical learnable quantization schemes [PACT, LSQ] assume unsigned quantization for activations and quantize all negative activations to zero which leads to significant loss in pe…

    Submitted 20 April, 2020; originally announced April 2020.

    Comments: Camera-ready for Joint Workshop on Efficient Deep Learning in Computer Vision, CVPR 2020

  13. arXiv:2003.00075  [pdf, other]

    cs.LG stat.ML

    Learned Threshold Pruning

    Authors: Kambiz Azarian, Yash Bhalgat, Jinwon Lee, Tijmen Blankevoort

    Abstract: This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they are set as input. Making thresholds trainable also makes LTP computationally efficient, hence scalable to deeper networks. For example, it takes 30 epochs for…

    Submitted 18 March, 2021; v1 submitted 28 February, 2020; originally announced March 2020.

  14. arXiv:1911.12491  [pdf, other]

    cs.CV cs.LG

    QKD: Quantization-aware Knowledge Distillation

    Authors: Jangho Kim, Yash Bhalgat, Jinwon Lee, Chirag Patel, Nojun Kwak

    Abstract: Quantization and knowledge distillation (KD) methods are widely used to reduce memory and power consumption of deep neural networks (DNNs), especially for resource-constrained edge devices. Although their combination is quite promising to meet these requirements, it may not work as desired. This is mainly because the regularization effect of KD further diminishes the already reduced representation p…

    Submitted 27 November, 2019; originally announced November 2019.

  15. arXiv:1909.11233  [pdf, other]

    cs.LG cs.HC

    Teacher-Student Learning Paradigm for Tri-training: An Efficient Method for Unlabeled Data Exploitation

    Authors: Yash Bhalgat, Zhe Liu, Pritam Gundecha, Jalal Mahmud, Amita Misra

    Abstract: Given that labeled data is expensive to obtain in real-world scenarios, many semi-supervised algorithms have explored the exploitation of unlabeled data. The traditional tri-training algorithm and tri-training with disagreement have shown promise in tasks where labeled data is limited. In this work, we introduce a new paradigm for tri-training, mimicking the real-world teacher-student learning…

    Submitted 24 September, 2019; originally announced September 2019.

  16. arXiv:1812.11302  [pdf, other]

    cs.CV

    Annotation-cost Minimization for Medical Image Segmentation using Suggestive Mixed Supervision Fully Convolutional Networks

    Authors: Yash Bhalgat, Meet Shah, Suyash Awate

    Abstract: For medical image segmentation, most fully convolutional networks (FCNs) need strong supervision through a large sample of high-quality dense segmentations, which is taxing in terms of costs, time and logistics involved. This burden of annotation can be alleviated by exploiting weak inexpensive annotations such as bounding boxes and anatomical landmarks. However, it is very difficult to a…

    Submitted 29 December, 2018; originally announced December 2018.

    Comments: Medical Imaging meets NeurIPS 2018 Workshop

  17. arXiv:1810.00136  [pdf, other]

    cs.CV

    FusedLSTM: Fusing frame-level and video-level features for Content-based Video Relevance Prediction

    Authors: Yash Bhalgat

    Abstract: This paper describes two of my best performing approaches on the Content-based Video Relevance Prediction challenge. In the FusedLSTM based approach, the inception-pool3 and the C3D-pool5 features are combined using an LSTM and a dense layer to form embeddings with the objective to minimize the triplet loss function. In the second approach, an Online Kernel Similarity Learning method is proposed t…

    Submitted 28 September, 2018; originally announced October 2018.

    Comments: Submission report for the ACMMM CBVRP challenge 2018

  18. arXiv:1609.05001  [pdf, other]

    cs.CV

    Stamp processing with examplar features

    Authors: Yash Bhalgat, Mandar Kulkarni, Shirish Karande, Sachin Lodha

    Abstract: Document digitization is becoming increasingly crucial. In this work, we propose a shape-based approach for automatic stamp verification/detection in document images using unsupervised feature learning. Given a small set of training images, our algorithm learns an appropriate shape representation using unsupervised clustering. Experimental results demonstrate the effectiveness of our framewo…

    Submitted 16 September, 2016; originally announced September 2016.