Search | arXiv e-print repository

Showing 1–17 of 17 results for author: Gutfreund, D

  1. arXiv:2405.20592  [pdf, other]

    cs.LG cs.AI

    LInK: Learning Joint Representations of Design and Performance Spaces through Contrastive Learning for Mechanism Synthesis

    Authors: Amin Heyrani Nobari, Akash Srivastava, Dan Gutfreund, Kai Xu, Faez Ahmed

    Abstract: In this paper, we introduce LInK, a novel framework that integrates contrastive learning of performance and design space with optimization techniques for solving complex inverse problems in engineering design with discrete and continuous variables. We focus on the path synthesis problem for planar linkage mechanisms. By leveraging a multi-modal and transformation-invariant contrastive learning fra…

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2306.15166  [pdf, other]

    cs.LG cs.CE

    Learning from Invalid Data: On Constraint Satisfaction in Generative Models

    Authors: Giorgio Giannone, Lyle Regenwetter, Akash Srivastava, Dan Gutfreund, Faez Ahmed

    Abstract: Generative models have demonstrated impressive results in vision, language, and speech. However, even with massive datasets, they struggle with precision, generating physically invalid or factually incorrect data. This is particularly problematic when the generated data must satisfy constraints, for example, to meet product specifications in engineering design or to adhere to the laws of physics i…

    Submitted 26 June, 2023; originally announced June 2023.

  3. arXiv:2302.03744  [pdf, other]

    cs.CV

    3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation

    Authors: Guangyao Zhou, Nishad Gothoskar, Lirui Wang, Joshua B. Tenenbaum, Dan Gutfreund, Miguel Lázaro-Gredilla, Dileep George, Vikash K. Mansinghka

    Abstract: The ability to perceive and understand 3D scenes is crucial for many applications in computer vision and robotics. Inverse graphics is an appealing approach to 3D scene understanding that aims to infer the 3D scene structure from 2D images. In this paper, we introduce probabilistic modeling to the inverse graphics framework to quantify uncertainty and achieve robustness in 6D pose estimation tasks…

    Submitted 6 September, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: ICCV 2023 camera ready

  4. arXiv:2302.02913  [pdf, other]

    cs.LG

    Beyond Statistical Similarity: Rethinking Metrics for Deep Generative Models in Engineering Design

    Authors: Lyle Regenwetter, Akash Srivastava, Dan Gutfreund, Faez Ahmed

    Abstract: Deep generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion Models, and Transformers, have shown great promise in a variety of applications, including image and speech synthesis, natural language processing, and drug discovery. However, when applied to engineering design problems, evaluating the performance of these models can be challenging, a…

    Submitted 14 October, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  5. arXiv:2208.14567  [pdf, other]

    cs.LG cs.RO

    LINKS: A dataset of a hundred million planar linkage mechanisms for data-driven kinematic design

    Authors: Amin Heyrani Nobari, Akash Srivastava, Dan Gutfreund, Faez Ahmed

    Abstract: In this paper, we introduce LINKS, a dataset of 100 million one degree of freedom planar linkage mechanisms and 1.1 billion coupler curves, which is more than 1000 times larger than any existing database of planar mechanisms and is not limited to specific kinds of mechanisms such as four-bars, six-bars, etc., which are typically what most databases include. LINKS is made up of various components in…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: Code & Data: https://github.com/ahnobari/LINKS

  6. arXiv:2208.02914  [pdf, other]

    cs.AI

    Solving the Baby Intuitions Benchmark with a Hierarchically Bayesian Theory of Mind

    Authors: Tan Zhi-Xuan, Nishad Gothoskar, Falk Pollok, Dan Gutfreund, Joshua B. Tenenbaum, Vikash K. Mansinghka

    Abstract: To facilitate the development of new models to bridge the gap between machine and human social intelligence, the recently proposed Baby Intuitions Benchmark (arXiv:2102.11938) provides a suite of tasks designed to evaluate commonsense reasoning about agents' goals and actions that even young infants exhibit. Here we present a principled Bayesian solution to this benchmark, based on a hierarchicall…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: 6 pages, 2 figures. Presented at the Robotics: Science and Systems 2022 Workshop on Social Intelligence in Humans and Robots

  7. arXiv:2207.03483  [pdf, other]

    cs.CV cs.LG cs.RO cs.SD eess.AS

    Finding Fallen Objects Via Asynchronous Audio-Visual Integration

    Authors: Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz, Seth Alter, James Traer, Dan Gutfreund, Joshua B. Tenenbaum, Josh McDermott, Antonio Torralba

    Abstract: The way an object looks and sounds provide complementary reflections of its physical properties. In many settings cues from vision and audition arrive asynchronously but must be integrated, as when we hear an object dropped on the floor and then must find it. In this paper, we introduce a setting in which to study multi-modal object localization in 3D virtual environments. An object is dropped som…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: CVPR 2022. Project page: http://fallen-object.csail.mit.edu

  8. arXiv:2111.00312  [pdf, other]

    cs.CV cs.AI

    3DP3: 3D Scene Perception via Probabilistic Programming

    Authors: Nishad Gothoskar, Marco Cusumano-Towner, Ben Zinberg, Matin Ghavamizadeh, Falk Pollok, Austin Garrett, Joshua B. Tenenbaum, Dan Gutfreund, Vikash K. Mansinghka

    Abstract: We present 3DP3, a framework for inverse graphics that uses inference in a structured generative model of objects, scenes, and images. 3DP3 uses (i) voxel models to represent the 3D shape of objects, (ii) hierarchical scene graphs to decompose scenes into objects and the contacts between them, and (iii) depth image likelihoods based on real-time graphics. Given an observed RGB-D image, 3DP3's infe…

    Submitted 30 October, 2021; originally announced November 2021.

    Comments: NeurIPS 2021

  9. arXiv:2110.10298  [pdf, other]

    cs.RO

    Incorporating Rich Social Interactions Into MDPs

    Authors: Ravi Tejwani, Yen-Ling Kuo, Tianmin Shu, Bennett Stankovits, Dan Gutfreund, Joshua B. Tenenbaum, Boris Katz, Andrei Barbu

    Abstract: Much of what we do as humans is engage socially with other agents, a skill that robots must also eventually possess. We demonstrate that a rich theory of social interactions originating from microsociology and economics can be formalized by extending a nested MDP where agents reason about arbitrary functions of each other's hidden rewards. This extended Social MDP allows us to encode the five basi…

    Submitted 7 February, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted to the 39th International Conference on Robotics and Automation (ICRA 2022)

  10. arXiv:2103.14025  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI

    Authors: Chuang Gan, Siyuan Zhou, Jeremy Schwartz, Seth Alter, Abhishek Bhandwaldar, Dan Gutfreund, Daniel L. K. Yamins, James J DiCarlo, Josh McDermott, Antonio Torralba, Joshua B. Tenenbaum

    Abstract: We introduce a visually-guided and physics-driven task-and-motion planning benchmark, which we call the ThreeDWorld Transport Challenge. In this challenge, an embodied agent equipped with two 9-DOF articulated arms is spawned randomly in a simulated physical home environment. The agent is required to find a small set of objects scattered around the house, pick them up, and transport them to a desi…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: Project page: http://tdw-transport.csail.mit.edu/

  11. arXiv:2102.12321  [pdf, other]

    cs.AI cs.CV cs.LG

    AGENT: A Benchmark for Core Psychological Reasoning

    Authors: Tianmin Shu, Abhishek Bhandwaldar, Chuang Gan, Kevin A. Smith, Shari Liu, Dan Gutfreund, Elizabeth Spelke, Joshua B. Tenenbaum, Tomer D. Ullman

    Abstract: For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life. Intuitive psychology, the ability to reason about hidden mental variables that drive observable actions, comes naturally to people: even pre-verbal infants can tell agents from objects, expecting agents to act efficiently to achieve goals given constraint…

    Submitted 25 July, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: ICML 2021, 12 pages, 7 figures

  12. arXiv:2011.03722  [pdf, other]

    cs.AI cs.CL cs.LG cs.NE

    Template Controllable keywords-to-text Generation

    Authors: Abhijit Mishra, Md Faisal Mahbub Chowdhury, Sagar Manohar, Dan Gutfreund, Karthik Sankaranarayanan

    Abstract: This paper proposes a novel neural model for the understudied task of generating text from keywords. The model takes as input a set of un-ordered keywords, and part-of-speech (POS) based template instructions. This makes it ideal for surface realization in any NLG setup. The framework is based on the encode-attend-decode paradigm, where keywords and templates are encoded first, and the decoder jud…

    Submitted 7 November, 2020; originally announced November 2020.

  13. arXiv:2010.13187  [pdf, other]

    stat.ML cs.CV cs.LG

    Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling

    Authors: Akash Srivastava, Yamini Bansal, Yukun Ding, Cole Lincoln Hurwitz, Kai Xu, Bernhard Egger, Prasanna Sattigeri, Joshua B. Tenenbaum, Phuong Le, Arun Prakash R, Nengfeng Zhou, Joel Vaughan, Yaquan Wang, Anwesha Bhattacharyya, Kristjan Greenewald, David D. Cox, Dan Gutfreund

    Abstract: Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors. This approach introduces a trade-off between disentangled representation learning and reconstruction quality since the model does not have enough capacity to learn correlated latent variables that capture…

    Submitted 3 April, 2024; v1 submitted 25 October, 2020; originally announced October 2020.

  14. arXiv:2007.04954  [pdf, other]

    cs.CV cs.GR cs.LG cs.RO

    ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation

    Authors: Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, Daniel L. K. Yamins

    Abstract: We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments. Unique properties include: real-time near-photo-realistic image rendering; a library of objects and environments, and routines for their customization; generative procedu…

    Submitted 28 December, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Oral Presentation at NeurIPS 21 Datasets and Benchmarks Track. Project page: http://www.threedworld.org

  15. arXiv:1911.08051  [pdf, other]

    stat.ML cs.LG

    SimVAE: Simulator-Assisted Training for Interpretable Generative Models

    Authors: Akash Srivastava, Jessie Rosenberg, Dan Gutfreund, David D. Cox

    Abstract: This paper presents a simulator-assisted training method (SimVAE) for variational autoencoders (VAE) that leads to a disentangled and interpretable latent space. Training SimVAE is a two-step process in which first a deep generator network (decoder) is trained to approximate the simulator. During this step, the simulator acts as the data source or as a teacher network. Then an inference network (en…

    Submitted 18 November, 2019; originally announced November 2019.

  16. arXiv:1911.00232  [pdf, other]

    cs.CV cs.LG eess.IV

    Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

    Authors: Mathew Monfort, Bowen Pan, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva

    Abstract: Videos capture events that typically contain multiple sequential, and simultaneous, actions even in the span of only a few seconds. However, most large-scale datasets built to train models for action recognition in video only provide a single label per video. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled and do not…

    Submitted 27 September, 2021; v1 submitted 1 November, 2019; originally announced November 2019.

  17. arXiv:1909.04743  [pdf, other]

    cs.CV

    Reasoning About Human-Object Interactions Through Dual Attention Networks

    Authors: Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou

    Abstract: Objects are entities we act upon, where the functionality of an object is determined by how we interact with it. In this work we propose a Dual Attention Network model which reasons about human-object interactions. The dual-attentional framework weights the important features for objects and actions respectively. As a result, the recognition of objects and actions mutually benefit each other. The…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: ICCV 2019