Showing 1–25 of 25 results for author: Newcombe, R

Searching in archive cs.
  1. arXiv:2406.10224  [pdf, other]

    cs.CV

    EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models

    Authors: Julian Straub, Daniel DeTone, Tianwei Shen, Nan Yang, Chris Sweeney, Richard Newcombe

    Abstract: The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs) we establish EFM3D,…

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:2406.09905  [pdf, other]

    cs.CV cs.GR

    Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

    Authors: Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David Soriano Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, Richard Newcombe

    Abstract: We introduce Nymeria - a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices. The dataset comes with a) full-body 3D motion ground truth; b) egocentric multimodal recordings from Project Aria devices with RGB, grayscale, eye-tracking cameras, IMUs, magnetometer, barometer, and microphones; and c) an additional "observer" dev…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2406.09598  [pdf, other]

    cs.CV

    Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking

    Authors: Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan

    Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground truth annotations including 3D poses of object…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2403.13064  [pdf, other]

    cs.CV

    SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model

    Authors: Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas

    Abstract: We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers & LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: see project page, https://projectaria.com/scenescript

  5. arXiv:2402.13349  [pdf, other]

    cs.CV cs.AI cs.HC

    Aria Everyday Activities Dataset

    Authors: Zhaoyang Lv, Nicholas Charron, Pierre Moulon, Alexander Gamino, Cheng Peng, Chris Sweeney, Edward Miller, Huixuan Tang, Jeff Meissner, Jing Dong, Kiran Somasundaram, Luis Pesqueira, Mark Schwesinger, Omkar Parkhi, Qiao Gu, Renzo De Nardi, Shangyi Cheng, Steve Saarinen, Vijay Baiyya, Yuyang Zou, Richard Newcombe, Jakob Julian Engel, Xiaqing Pan, Carl Ren

    Abstract: We present Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each of the recording contains multimodal sensor data recorded through the Project Aria glasses. In addition, AEA provides machine perception data includi…

    Submitted 21 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Dataset website: https://www.projectaria.com/datasets/aea/

  6. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  7. arXiv:2308.13561  [pdf, other]

    cs.HC cs.CV

    Project Aria: A New Tool for Egocentric Multi-Modal AI Research

    Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira, et al. (49 additional authors not shown)

    Abstract: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, mul…

    Submitted 1 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  8. arXiv:2308.13093  [pdf, other]

    cs.CV

    EgoBlur: Responsible Innovation in Aria

    Authors: Nikhil Raina, Guruprasad Somasundaram, Kang Zheng, Sagar Miglani, Steve Saarinen, Jeff Meissner, Mark Schwesinger, Luis Pesqueira, Ishita Prasad, Edward Miller, Prince Gupta, Mingfei Yan, Richard Newcombe, Carl Ren, Omkar M Parkhi

    Abstract: Project Aria pushes the frontiers of Egocentric AI with large-scale real-world data collection using purposely designed glasses with privacy first approach. To protect the privacy of bystanders being recorded by the glasses, our research protocols are designed to ensure recorded video is processed by an AI anonymization model that removes bystander faces and vehicle license plates. Detected face a…

    Submitted 6 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  9. arXiv:2306.06362  [pdf, other]

    cs.CV cs.AI cs.LG

    Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception

    Authors: Xiaqing Pan, Nicholas Charron, Yongqian Yang, Scott Peters, Thomas Whelan, Chen Kong, Omkar Parkhi, Richard Newcombe, Carl Yuheng Ren

    Abstract: We introduce the Aria Digital Twin (ADT) - an egocentric dataset captured using Aria glasses with extensive object, environment, and human level ground truth. This ADT release contains 200 sequences of real-world activities conducted by Aria wearers in two real indoor scenes with 398 object instances (324 stationary and 74 dynamic). Each sequence consists of: a) raw data of two monochrome camera s…

    Submitted 13 June, 2023; v1 submitted 10 June, 2023; originally announced June 2023.

  10. arXiv:2305.16487  [pdf, other]

    cs.CV cs.AI

    EgoHumans: An Egocentric 3D Multi-Human Benchmark

    Authors: Rawal Khirodkar, Aayush Bansal, Lingni Ma, Richard Newcombe, Minh Vo, Kris Kitani

    Abstract: We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking. Existing egocentric benchmarks either capture single subject or indoor-only scenarios, which limit the generalization of computer vision algorithms for real-world applications. We propose a novel 3D capture setup to construct a comprehensive egocen…

    Submitted 18 August, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted to ICCV 2023 (Oral)

  11. arXiv:2304.02009  [pdf, other]

    cs.CV

    OrienterNet: Visual Localization in 2D Public Maps with Neural Matching

    Authors: Paul-Edouard Sarlin, Daniel DeTone, Tsun-Yi Yang, Armen Avetisyan, Julian Straub, Tomasz Malisiewicz, Samuel Rota Bulo, Richard Newcombe, Peter Kontschieder, Vasileios Balntas

    Abstract: Humans can orient themselves in their 3D environments using simple 2D maps. Differently, algorithms for visual localization mostly rely on complex 3D point clouds that are expensive to build, store, and maintain over time. We bridge this gap by introducing OrienterNet, the first deep neural network that can localize an image with sub-meter accuracy using the same 2D semantic maps that humans use.…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  12. arXiv:2206.01916  [pdf, other]

    cs.CV

    Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation

    Authors: Gil Avraham, Julian Straub, Tianwei Shen, Tsun-Yi Yang, Hugo Germain, Chris Sweeney, Vasileios Balntas, David Novotny, Daniel DeTone, Richard Newcombe

    Abstract: This paper presents a framework that combines traditional keypoint-based camera pose optimization with an invertible neural rendering mechanism. Our proposed 3D scene representation, Nerfels, is locally dense yet globally sparse. As opposed to existing invertible neural rendering systems which overfit a model to the entire scene, we adopt a feature-driven approach for representing scene-agnostic,…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: Published at CVPRW with supplementary material

  13. arXiv:2205.08525  [pdf, other]

    cs.CV

    Self-supervised Neural Articulated Shape and Appearance Models

    Authors: Fangyin Wei, Rohan Chabra, Lingni Ma, Christoph Lassner, Michael Zollhöfer, Szymon Rusinkiewicz, Chris Sweeney, Richard Newcombe, Mira Slavcheva

    Abstract: Learning geometry, motion, and appearance priors of object classes is important for the solution of a large variety of computer vision problems. While the majority of approaches has focused on static objects, dynamic objects, especially with controllable articulation, are less explored. We propose a novel approach for learning a representation of the geometry, appearance, and motion of a class of…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 15 pages. CVPR 2022. Project page available at https://weify627.github.io/nasam/

  14. arXiv:2204.01695  [pdf, other]

    cs.CV

    LISA: Learning Implicit Shape and Appearance of Hands

    Authors: Enric Corona, Tomas Hodan, Minh Vo, Francesc Moreno-Noguer, Chris Sweeney, Richard Newcombe, Lingni Ma

    Abstract: This paper proposes a do-it-all neural model of human hands, named LISA. The model can capture accurate hand shape and appearance, generalize to arbitrary hand subjects, provide dense surface correspondences, be reconstructed from images in the wild and easily animated. We train LISA by minimizing the shape and appearance losses on a large set of multi-view RGB image sequences annotated with coars…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Published at CVPR 2022

  15. arXiv:2203.00051  [pdf, other]

    cs.CV cs.GR

    ERF: Explicit Radiance Field Reconstruction From Scratch

    Authors: Samir Aroudj, Steven Lovegrove, Eddy Ilg, Tanner Schmidt, Michael Goesele, Richard Newcombe

    Abstract: We propose a novel explicit dense 3D reconstruction approach that processes a set of images of a scene with sensor poses and calibrations and estimates a photo-real digital model. One of the key innovations is that the underlying volumetric representation is completely explicit in contrast to neural network-based (implicit) alternatives. We encode scenes explicitly using clear and understandable m…

    Submitted 28 February, 2022; originally announced March 2022.

    Comments: 23 pages, 18 figures

    ACM Class: I.3.3; I.4.5

  16. arXiv:2110.07058  [pdf, other]

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  17. arXiv:2108.10165  [pdf, other]

    cs.CV

    ODAM: Object Detection, Association, and Mapping using Posed RGB Video

    Authors: Kejie Li, Daniel DeTone, Steven Chen, Minh Vo, Ian Reid, Hamid Rezatofighi, Chris Sweeney, Julian Straub, Richard Newcombe

    Abstract: Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them t…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: Accepted in ICCV 2021 as oral

  18. arXiv:2103.02597  [pdf, other]

    cs.CV cs.GR

    Neural 3D Video Synthesis from Multi-view Video

    Authors: Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, Zhaoyang Lv

    Abstract: We propose a novel approach for 3D video synthesis that is able to represent multi-view video recordings of a dynamic real-world scene in a compact, yet expressive representation that enables high-quality view synthesis and motion interpolation. Our approach takes the high quality and compactness of static neural radiance fields in a new direction: to a model-free, dynamic setting. At the core of…

    Submitted 2 May, 2022; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: Accepted as an oral presentation for CVPR 2022. Project website: https://neural-3d-video.github.io/

  19. arXiv:2005.05125  [pdf, other]

    cs.CV

    FroDO: From Detections to 3D Objects

    Authors: Kejie Li, Martin Rünz, Meng Tang, Lingni Ma, Chen Kong, Tanner Schmidt, Ian Reid, Lourdes Agapito, Julian Straub, Steven Lovegrove, Richard Newcombe

    Abstract: Object-oriented maps are important for scene understanding since they jointly capture geometry and semantics, allow individual instantiation and meaningful reasoning about objects. We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object shapes in a novel le…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: To be published in CVPR 2020. The first two authors contributed equally

  20. arXiv:2003.10983  [pdf, other]

    cs.CV cs.CG cs.LG

    Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction

    Authors: Rohan Chabra, Jan Eric Lenssen, Eddy Ilg, Tanner Schmidt, Julian Straub, Steven Lovegrove, Richard Newcombe

    Abstract: Efficiently reconstructing complex and intricate surfaces at scale is a long-standing goal in machine perception. To address this problem we introduce Deep Local Shapes (DeepLS), a deep shape representation that enables encoding and reconstruction of high-quality 3D shapes without prohibitive memory requirements. DeepLS replaces the dense volumetric signed distance function (SDF) representation us…

    Submitted 21 August, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

    Comments: Accepted at ECCV 2020

  21. arXiv:1906.05797  [pdf, other]

    cs.CV cs.GR eess.IV

    The Replica Dataset: A Digital Replica of Indoor Spaces

    Authors: Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, et al. (5 additional authors not shown)

    Abstract: We introduce Replica, a dataset of 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale. Each scene consists of a dense mesh, high-resolution high-dynamic-range (HDR) textures, per-primitive semantic class and instance information, and planar mirror and glass reflectors. The goal of Replica is to enable machine learning (ML) research that relies on visually, geometr…

    Submitted 13 June, 2019; originally announced June 2019.

  22. arXiv:1904.02251  [pdf, other]

    cs.CV

    StereoDRNet: Dilated Residual Stereo Net

    Authors: Rohan Chabra, Julian Straub, Chris Sweeney, Richard Newcombe, Henry Fuchs

    Abstract: We propose a system that uses a convolution neural network (CNN) to estimate depth from a stereo pair followed by volumetric fusion of the predicted depth maps to produce a 3D reconstruction of a scene. Our proposed depth refinement architecture, predicts view-consistent disparity and occlusion maps that helps the fusion system to produce geometrically consistent reconstructions. We utilize 3D dil…

    Submitted 2 June, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

    Comments: Accepted at CVPR 2019

  23. arXiv:1901.05103  [pdf, other]

    cs.CV

    DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation

    Authors: Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove

    Abstract: Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to representing 3D geometry for rendering and reconstruction. These provide trade-offs across fidelity, efficiency and compression capabilities. In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape re…

    Submitted 15 January, 2019; originally announced January 2019.

  24. arXiv:1809.02057  [pdf]

    cs.CV cs.GR

    Surface Light Field Fusion

    Authors: Jeong Joon Park, Richard Newcombe, Steve Seitz

    Abstract: We present an approach for interactively scanning highly reflective objects with a commodity RGBD sensor. In addition to shape, our approach models the surface light field, encoding scene appearance from all directions. By factoring the surface light field into view-independent and wavelength-independent components, we arrive at a representation that can be robustly estimated with IR-equipped comm…

    Submitted 6 September, 2018; originally announced September 2018.

    Comments: Project Website: http://grail.cs.washington.edu/projects/slfusion/

    Journal ref: 3DV 2018

  25. arXiv:1512.05471  [pdf, other]

    cs.HC

    Breaking the Barriers to True Augmented Reality

    Authors: Christian Sandor, Martin Fuchs, Alvaro Cassinelli, Hao Li, Richard Newcombe, Goshiro Yamamoto, Steven Feiner

    Abstract: In recent years, Augmented Reality (AR) and Virtual Reality (VR) have gained considerable commercial traction, with Facebook acquiring Oculus VR for $2 billion, Magic Leap attracting more than $500 million of funding, and Microsoft announcing their HoloLens head-worn computer. Where is humanity headed: a brave new dystopia-or a paradise come true? In this article, we present discussions, which…

    Submitted 17 December, 2015; originally announced December 2015.