Search | arXiv e-print repository

Showing 1–9 of 9 results for author: Mustikovela, S K

  1. arXiv:2312.00784  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

    Authors: Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee

    Abstract: While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary visual…

    Submitted 26 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR2024. Project page: https://vip-llava.github.io/

  2. arXiv:2308.12560  [pdf, other]

    cs.CV

    NOVA: NOvel View Augmentation for Neural Composition of Dynamic Objects

    Authors: Dakshit Agrawal, Jiajie Xu, Siva Karthik Mustikovela, Ioannis Gkioulekas, Ashish Shrivastava, Yuning Chai

    Abstract: We propose a novel-view augmentation (NOVA) strategy to train NeRFs for photo-realistic 3D composition of dynamic objects in a static scene. Compared to prior work, our framework significantly reduces blending artifacts when inserting multiple dynamic objects into a 3D scene at novel views and times; achieves comparable PSNR without the need for additional ground truth modalities like optical flow…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted for publication in ICCV Computer Vision for Metaverse Workshop 2023 (code is available at https://github.com/dakshitagrawal/NoVA)

  3. arXiv:2110.09848  [pdf, other]

    cs.CV

    Self-Supervised Object Detection via Generative Image Synthesis

    Authors: Siva Karthik Mustikovela, Shalini De Mello, Aayush Prakash, Umar Iqbal, Sifei Liu, Thu Nguyen-Phuoc, Carsten Rother, Jan Kautz

    Abstract: We present SSOD, the first end-to-end analysis-by-synthesis framework with controllable GANs for the task of self-supervised object detection. We use collections of real world images without bounding box annotations to learn to synthesize and detect objects. We leverage controllable GANs to synthesize images with pre-defined object properties and use them to train object detectors. We propose a ti…

    Submitted 19 October, 2021; originally announced October 2021.

  4. arXiv:2006.16011  [pdf, other]

    cs.CV cs.GR

    Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition

    Authors: Hassan Abu Alhaija, Siva Karthik Mustikovela, Justus Thies, Varun Jampani, Matthias Nießner, Andreas Geiger, Carsten Rother

    Abstract: Neural rendering techniques promise efficient photo-realistic image synthesis while at the same time providing rich control over scene parameters by learning the physical image formation process. While several supervised methods have been proposed for this task, acquiring a dataset of images with accurately aligned 3D models is very difficult. The main contribution of this work is to lift this res…

    Submitted 29 March, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

  5. arXiv:2004.01793  [pdf, other]

    cs.CV

    Self-Supervised Viewpoint Learning From Image Collections

    Authors: Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Carsten Rother, Jan Kautz

    Abstract: Training deep neural networks to estimate the viewpoint of objects requires large labeled training datasets. However, manually labeling viewpoints is notoriously hard, error-prone, and time-consuming. On the other hand, it is relatively easy to mine many unlabelled images of an object category from the internet, e.g., of cars or faces. We seek to answer the research question of whether such unlabe…

    Submitted 3 April, 2020; originally announced April 2020.

    Comments: Accepted at CVPR 20

  6. arXiv:1809.04696  [pdf, other]

    cs.CV

    Geometric Image Synthesis

    Authors: Hassan Abu Alhaija, Siva Karthik Mustikovela, Andreas Geiger, Carsten Rother

    Abstract: The task of generating natural images from 3D scenes has been a long standing goal in computer graphics. On the other hand, recent developments in deep neural networks allow for trainable models that can produce natural-looking images with little or no knowledge about the scene structure. While the generated images often consist of realistic looking local patterns, the overall structure of the gen…

    Submitted 1 December, 2018; v1 submitted 12 September, 2018; originally announced September 2018.

  7. arXiv:1712.01924  [pdf, other]

    cs.CV

    iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects

    Authors: Omid Hosseini Jafari, Siva Karthik Mustikovela, Karl Pertsch, Eric Brachmann, Carsten Rother

    Abstract: We address the task of 6D pose estimation of known rigid objects from single input images in scenarios where the objects are partly occluded. Recent RGB-D-based methods are robust to moderate degrees of occlusion. For RGB inputs, no previous method works well for partly occluded objects. Our main contribution is to present the first deep learning-based system that estimates accurate poses for part…

    Submitted 18 June, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

  8. arXiv:1708.01566  [pdf, other]

    cs.CV

    Augmented Reality Meets Computer Vision : Efficient Data Generation for Urban Driving Scenes

    Authors: Hassan Abu Alhaija, Siva Karthik Mustikovela, Lars Mescheder, Andreas Geiger, Carsten Rother

    Abstract: The success of deep learning in computer vision is based on availability of large annotated datasets. To lower the need for hand labeled images, virtually rendered 3D worlds have recently gained popularity. Creating realistic 3D content is challenging on its own and requires significant human effort. In this work, we propose an alternative paradigm which combines real and synthetic data for learni…

    Submitted 4 August, 2017; originally announced August 2017.

  9. arXiv:1610.00731  [pdf, other]

    cs.CV

    Can Ground Truth Label Propagation from Video help Semantic Segmentation?

    Authors: Siva Karthik Mustikovela, Michael Ying Yang, Carsten Rother

    Abstract: For state-of-the-art semantic segmentation task, training convolutional neural networks (CNNs) requires dense pixelwise ground truth (GT) labeling, which is expensive and involves extensive human effort. In this work, we study the possibility of using auxiliary ground truth, so-called pseudo ground truth (PGT) to improve the performance. The PGT is obtained by propagating the labels of a…

    Submitted 3 October, 2016; originally announced October 2016.

    Comments: To appear at ECCV 2016 Workshop on Video Segmentation