Search | arXiv e-print repository

Showing 1–13 of 13 results for author: DeTone, D

Searching in archive cs. Search in all archives.
  1. arXiv:2406.10224  [pdf, other]

    cs.CV

    EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models

    Authors: Julian Straub, Daniel DeTone, Tianwei Shen, Nan Yang, Chris Sweeney, Richard Newcombe

    Abstract: The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs) we establish EFM3D,…

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:2308.13561  [pdf, other]

    cs.HC cs.CV

    Project Aria: A New Tool for Egocentric Multi-Modal AI Research

    Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira, et al. (49 additional authors not shown)

    Abstract: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, mul…

    Submitted 1 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  3. arXiv:2304.02009  [pdf, other]

    cs.CV

    OrienterNet: Visual Localization in 2D Public Maps with Neural Matching

    Authors: Paul-Edouard Sarlin, Daniel DeTone, Tsun-Yi Yang, Armen Avetisyan, Julian Straub, Tomasz Malisiewicz, Samuel Rota Bulo, Richard Newcombe, Peter Kontschieder, Vasileios Balntas

    Abstract: Humans can orient themselves in their 3D environments using simple 2D maps. Differently, algorithms for visual localization mostly rely on complex 3D point clouds that are expensive to build, store, and maintain over time. We bridge this gap by introducing OrienterNet, the first deep neural network that can localize an image with sub-meter accuracy using the same 2D semantic maps that humans use.…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  4. arXiv:2207.09442  [pdf, other]

    cs.RO cs.CV cs.LG math.OC

    Theseus: A Library for Differentiable Nonlinear Optimization

    Authors: Luis Pineda, Taosha Fan, Maurizio Monge, Shobha Venkataraman, Paloma Sodhi, Ricky T. Q. Chen, Joseph Ortiz, Daniel DeTone, Austin Wang, Stuart Anderson, Jing Dong, Brandon Amos, Mustafa Mukadam

    Abstract: We present Theseus, an efficient application-agnostic open source library for differentiable nonlinear least squares (DNLS) optimization built on PyTorch, providing a common framework for end-to-end structured learning in robotics and vision. Existing DNLS implementations are application specific and do not always incorporate many ingredients important for efficiency. Theseus is application-agnost…

    Submitted 18 January, 2023; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Advances in Neural Information Processing Systems (NeurIPS), 2022

  5. arXiv:2206.01916  [pdf, other]

    cs.CV

    Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation

    Authors: Gil Avraham, Julian Straub, Tianwei Shen, Tsun-Yi Yang, Hugo Germain, Chris Sweeney, Vasileios Balntas, David Novotny, Daniel DeTone, Richard Newcombe

    Abstract: This paper presents a framework that combines traditional keypoint-based camera pose optimization with an invertible neural rendering mechanism. Our proposed 3D scene representation, Nerfels, is locally dense yet globally sparse. As opposed to existing invertible neural rendering systems which overfit a model to the entire scene, we adopt a feature-driven approach for representing scene-agnostic,…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: Published at CVPRW with supplementary material

  6. arXiv:2112.12785  [pdf, other]

    cs.CV

    NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning

    Authors: Tony Ng, Hyo Jin Kim, Vincent Lee, Daniel DeTone, Tsun-Yi Yang, Tianwei Shen, Eddy Ilg, Vassileios Balntas, Krystian Mikolajczyk, Chris Sweeney

    Abstract: In the light of recent analyses on privacy-concerning scene revelation from visual descriptors, we develop descriptors that conceal the input image content. In particular, we propose an adversarial learning framework for training visual descriptors that prevent image reconstruction, while maintaining the matching accuracy. We let a feature encoding network and image reconstruction network compete…

    Submitted 29 March, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: Accepted at CVPR 2022. Supplementary material included after references. 15 pages, 14 figures, 6 tables

  7. arXiv:2108.10165  [pdf, other]

    cs.CV

    ODAM: Object Detection, Association, and Mapping using Posed RGB Video

    Authors: Kejie Li, Daniel DeTone, Steven Chen, Minh Vo, Ian Reid, Hamid Rezatofighi, Chris Sweeney, Julian Straub, Richard Newcombe

    Abstract: Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them t…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: Accepted at ICCV 2021 as an oral

  8. arXiv:1911.11763  [pdf, other]

    cs.CV

    SuperGlue: Learning Feature Matching with Graph Neural Networks

    Authors: Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

    Abstract: This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Assignments are estimated by solving a differentiable optimal transport problem, whose costs are predicted by a graph neural network. We introduce a flexible context aggregation mechanism based on attention, enabling SuperGlue to reason ab…

    Submitted 28 March, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: Oral at CVPR 2020, with appendix and link to publicly available code

  9. arXiv:1812.03247  [pdf, other]

    cs.CV

    Deep ChArUco: Dark ChArUco Marker Pose Estimation

    Authors: Danying Hu, Daniel DeTone, Vikram Chauhan, Igor Spivak, Tomasz Malisiewicz

    Abstract: ChArUco boards are used for camera calibration, monocular pose estimation, and pose verification in both robotics and augmented reality. Such fiducials are detectable via traditional computer vision methods (as found in OpenCV) in well-lit environments, but classical methods fail when the lighting is poor or when the image undergoes extreme motion blur. We present Deep ChArUco, a real-time pose es…

    Submitted 1 July, 2019; v1 submitted 7 December, 2018; originally announced December 2018.

    Comments: Published in CVPR 2019

  10. arXiv:1812.03245  [pdf, other]

    cs.CV

    Self-Improving Visual Odometry

    Authors: Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

    Abstract: We propose a self-supervised learning framework that uses unlabeled monocular video sequences to generate large-scale supervision for training a Visual Odometry (VO) frontend, a network which computes pointwise data associations across images. Our self-improving method enables a VO frontend to learn over time, unlike other VO and SLAM systems which require time-consuming hand-tuning or expensive d…

    Submitted 7 December, 2018; originally announced December 2018.

  11. arXiv:1712.07629  [pdf, other]

    cs.CV

    SuperPoint: Self-Supervised Interest Point Detection and Description

    Authors: Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

    Abstract: This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We int…

    Submitted 19 April, 2018; v1 submitted 20 December, 2017; originally announced December 2017.

    Comments: Camera-ready version for CVPR 2018 Deep Learning for Visual SLAM Workshop (DL4VSLAM2018)

  12. arXiv:1707.07410  [pdf, other]

    cs.CV

    Toward Geometric Deep SLAM

    Authors: Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

    Abstract: We present a point tracking system powered by two deep convolutional neural networks. The first network, MagicPoint, operates on single images and extracts salient 2D points. The extracted points are "SLAM-ready" because they are by design isolated and well-distributed throughout the image. We compare this network against classical point detectors and discover a significant performance gap in the…

    Submitted 24 July, 2017; originally announced July 2017.

  13. arXiv:1606.03798  [pdf, other]

    cs.CV

    Deep Image Homography Estimation

    Authors: Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

    Abstract: We present a deep convolutional neural network for estimating the relative homography between a pair of images. Our feed-forward network has 10 layers, takes two stacked grayscale images as input, and produces an 8 degree of freedom homography which can be used to map the pixels from the first image to the second. We present two convolutional neural network architectures for HomographyNet: a regre…

    Submitted 12 June, 2016; originally announced June 2016.

    Comments: RSS Workshop on Limits and Potentials of Deep Learning in Robotics