Search | arXiv e-print repository

Showing 1–50 of 59 results for author: Mozer, M

  1. arXiv:2405.17283  [pdf, other]

    cs.LG cs.NE

    Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery

    Authors: Anand Gopalakrishnan, Aleksandar Stanić, Jürgen Schmidhuber, Michael Curtis Mozer

    Abstract: Current state-of-the-art synchrony-based models encode object bindings with complex-valued activations and compute with real-valued weights in feedforward architectures. We argue for the computational advantages of a recurrent architecture with complex-valued weights. We propose a fully convolutional autoencoder, SynCx, that performs iterative constraint satisfaction: at each iteration, a hidden l…

    Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: minor typo fixed

  2. arXiv:2405.12205  [pdf, other]

    cs.AI cs.LG

    Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

    Authors: Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora

    Abstract: Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including the ability to name skills and procedures to apply given a task. We explore this primarily in the context of math reasoning, developing a prompt-guided interac…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Preprint. Under review

  3. arXiv:2403.09613  [pdf, other]

    cs.LG cs.CL

    Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training

    Authors: Yanlai Yang, Matt Jones, Michael C. Mozer, Mengye Ren

    Abstract: We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory behavior, reco…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 19 pages, 18 figures

  4. arXiv:2401.01623  [pdf, other]

    cs.AI cs.CL

    Can AI Be as Creative as Humans?

    Authors: Haonan Wang, James Zou, Michael Mozer, Anirudh Goyal, Alex Lamb, Linjun Zhang, Weijie J Su, Zhun Deng, Michael Qizhe Xie, Hannah Brown, Kenji Kawaguchi

    Abstract: Creativity serves as a cornerstone for societal progress and innovation. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. In this paper, we prove in theory that AI can be as creative as humans under the condition that it can properly fit the da…

    Submitted 25 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: The paper examines AI's creativity, introducing Relative and Statistical Creativity for theoretical and practical analysis, along with practical training guidelines. Project Page: ai-relative-creativity.github.io

  5. arXiv:2311.15268  [pdf, other]

    cs.LG cs.AI

    Unlearning via Sparse Representations

    Authors: Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

    Abstract: Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's p…

    Submitted 26 November, 2023; originally announced November 2023.

  6. arXiv:2310.16228  [pdf, other]

    cs.LG cs.CV

    On the Foundations of Shortcut Learning

    Authors: Katherine L. Hermann, Hossein Mobahi, Thomas Fel, Michael C. Mozer

    Abstract: Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on predictivity (how reliably a feature indicates train-set labels) but also on availability (how easily the feature can be extracted, or leveraged, from inputs). The literature on shortcut learning has noted examples in which models privilege one feature over another, for example tex…

    Submitted 24 October, 2023; originally announced October 2023.

  7. arXiv:2307.09542  [pdf, other]

    cs.LG cs.CV

    Can Neural Network Memorization Be Localized?

    Authors: Pratyush Maini, Michael C. Mozer, Hanie Sedghi, Zachary C. Lipton, J. Zico Kolter, Chiyuan Zhang

    Abstract: Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks memorize "hard" examples in the final few layers of the model. Memorization refers to the ability to correctly predict on atypical examples of the training set. In this work, we show that rather than being confined to individual lay…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted at ICML 2023

  8. arXiv:2305.19550  [pdf, other]

    cs.CV cs.AI cs.LG

    Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior

    Authors: Ayush Chakravarthy, Trang Nguyen, Anirudh Goyal, Yoshua Bengio, Michael C. Mozer

    Abstract: The aim of object-centric vision is to construct an explicit representation of the objects in a scene. This representation is obtained via a set of interchangeable modules called slots or object files that compete for local patches of an image. The competition has a weak inductive bias to preserve spatial continuity; consequently, one slot may claim patches scattered diffusely throug…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 16 pages, 3 figures, under review at NeurIPS 2023

  9. arXiv:2304.05823  [pdf, other]

    q-bio.MN cs.LG q-bio.GN

    DiscoGen: Learning to Discover Gene Regulatory Networks

    Authors: Nan Rosemary Ke, Sara-Jane Dunn, Jorg Bornschein, Silvia Chiappa, Melanie Rey, Jean-Baptiste Lespiau, Albin Cassirer, Jane Wang, Theophane Weber, David Barrett, Matthew Botvinick, Anirudh Goyal, Mike Mozer, Danilo Rezende

    Abstract: Accurately inferring Gene Regulatory Networks (GRNs) is a critical and challenging task in biology. GRNs model the activatory and inhibitory interactions between genes and are inherently causal in nature. To accurately identify GRNs, perturbational data is required. However, most GRN discovery methods only operate on observational data. Recent advances in neural network-based causal discovery meth…

    Submitted 12 April, 2023; originally announced April 2023.

  10. arXiv:2301.11790  [pdf, other]

    cs.CV cs.LG stat.ML

    Leveraging the Third Dimension in Contrastive Learning

    Authors: Sumukh Aithal, Anirudh Goyal, Alex Lamb, Yoshua Bengio, Michael Mozer

    Abstract: Self-Supervised Learning (SSL) methods operate on unlabeled data to learn robust representations useful for downstream tasks. Most SSL methods rely on augmentations obtained by transforming the 2D image pixel map. These augmentations ignore the fact that biological vision takes place in an immersive three-dimensional, temporally contiguous environment, and that low-level biological vision relies h…

    Submitted 27 January, 2023; originally announced January 2023.

  11. arXiv:2211.10193  [pdf, other]

    cs.LG

    Layer-Stack Temperature Scaling

    Authors: Amr Khalifa, Michael C. Mozer, Hanie Sedghi, Behnam Neyshabur, Ibrahim Alabdulmohsin

    Abstract: Recent works demonstrate that early layers in a neural network contain useful information for prediction. Inspired by this, we show that extending temperature scaling across all layers improves both calibration and accuracy. We call this procedure "layer-stack temperature scaling" (LATES). Informally, LATES grants each layer a weighted vote during inference. We evaluate it on five popular convolut…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: 10 pages, 7 figures, 3 tables

    ACM Class: I.2.6; I.2.10

  12. arXiv:2211.05183  [pdf, other]

    cs.CV cs.LG

    An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better?

    Authors: Tyler R. Scott, Ting Liu, Michael C. Mozer, Andrew C. Gallagher

    Abstract: Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods -- including k-means and hierarchical agglomerative clustering -- underperform supervised, deep, inductive methods. While the reported improvements are indeed impressive, experiments are mostly limited to face datasets, where the clustered embeddings are highly discriminative or well-separ…

    Submitted 9 November, 2022; originally announced November 2022.

  13. arXiv:2210.03022  [pdf, other]

    cs.AI cs.LG

    Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning

    Authors: Dianbo Liu, Vedant Shah, Oussama Boussif, Cristian Meo, Anirudh Goyal, Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio

    Abstract: In cooperative multi-agent reinforcement learning, a team of agents works together to achieve a common goal. Different environments or tasks may require varying degrees of coordination among agents in order to achieve the goal in an optimal way. The nature of coordination will depend on the properties of the environment -- its spatial layout, distribution of obstacles, dynamics, etc. We term this…

    Submitted 6 October, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Published at ICLR 2023

  14. arXiv:2207.11240  [pdf, other]

    cs.LG cs.AI

    Discrete Key-Value Bottleneck

    Authors: Frederik Träuble, Anirudh Goyal, Nasim Rahaman, Michael Mozer, Kenji Kawaguchi, Yoshua Bengio, Bernhard Schölkopf

    Abstract: Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant. Challenges emerge with non-stationary training data streams such as continual learning. One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning. Given a new task, however,…

    Submitted 12 June, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: 40th International Conference on Machine Learning (ICML 2023)

  15. arXiv:2206.07764  [pdf, other]

    cs.CV cs.LG

    SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos

    Authors: Gamaleldin F. Elsayed, Aravindh Mahendran, Sjoerd van Steenkiste, Klaus Greff, Michael C. Mozer, Thomas Kipf

    Abstract: The visual world can be parsimoniously characterized in terms of distinct entities with sparse interactions. Discovering this compositional structure in dynamic visual scenes has proven challenging for end-to-end computer vision approaches unless explicit instance-level supervision is provided. Slot-based models leveraging motion cues have recently shown great promise in learning to represent, seg…

    Submitted 23 December, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Project page at https://slot-attention-video.github.io/savi++/

  16. arXiv:2205.10607  [pdf, other]

    cs.AI

    Coordinating Policies Among Multiple Agents via an Intelligent Communication Channel

    Authors: Dianbo Liu, Vedant Shah, Oussama Boussif, Cristian Meo, Anirudh Goyal, Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio

    Abstract: In Multi-Agent Reinforcement Learning (MARL), specialized channels are often introduced that allow agents to communicate directly with one another. In this paper, we propose an alternative approach whereby agents communicate through an intelligent facilitator that learns to sift through and interpret signals provided by all agents to improve the agents' collective performance. To ensure that this…

    Submitted 25 May, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

  17. arXiv:2204.04875  [pdf, other]

    stat.ML cs.LG

    Learning to Induce Causal Structure

    Authors: Nan Rosemary Ke, Silvia Chiappa, Jane Wang, Anirudh Goyal, Jorg Bornschein, Melanie Rey, Theophane Weber, Matthew Botvinick, Michael Mozer, Danilo Jimenez Rezende

    Abstract: The fundamental challenge in causal induction is to infer the underlying graph structure given observational and/or interventional data. Most existing causal induction algorithms operate by generating candidate graphs and evaluating them using either score-based methods (including continuous optimization) or independence tests. In our work, we instead treat the inference process as a black box and…

    Submitted 7 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

  18. arXiv:2203.05782  [pdf, other]

    cs.LG q-bio.NC

    Overcoming Temptation: Incentive Design For Intertemporal Choice

    Authors: Shruthi Sukumar, Adrian F. Ward, Camden Elliott-Williams, Shabnam Hakimi, Michael C. Mozer

    Abstract: Individuals are often faced with temptations that can lead them astray from long-term goals. We're interested in developing interventions that steer individuals toward making good initial decisions and then maintaining those decisions over time. In the realm of financial decision making, a particularly successful approach is the prize-linked savings account: individuals are incentivized to make de…

    Submitted 14 March, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

  19. arXiv:2202.01334  [pdf, other]

    cs.LG cs.AI

    Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization

    Authors: Dianbo Liu, Alex Lamb, Xu Ji, Pascal Notsawo, Mike Mozer, Yoshua Bengio, Kenji Kawaguchi

    Abstract: Vector Quantization (VQ) is a method for discretizing latent representations and has become a major part of the deep learning toolkit. It has been theoretically and empirically shown that discretization of representations leads to improved generalization, including in reinforcement learning where discretization can be used to bottleneck multi-agent communication to promote agent specialization and…

    Submitted 2 February, 2022; originally announced February 2022.

  20. arXiv:2201.03529  [pdf, other]

    cs.LG cs.CV

    Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

    Authors: Utku Evci, Vincent Dumoulin, Hugo Larochelle, Michael C. Mozer

    Abstract: Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method -- fine-tuning all parameters of the source mo…

    Submitted 25 July, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: presented at ICML 2022 (Oral)

    Journal ref: ICML 2022, Proceedings of the 39th International Conference on Machine Learning

  21. arXiv:2109.05675  [pdf, other]

    cs.CV cs.LG stat.ML

    Online Unsupervised Learning of Visual Representations and Categories

    Authors: Mengye Ren, Tyler R. Scott, Michael L. Iuzzolino, Michael C. Mozer, Richard Zemel

    Abstract: Real-world learning scenarios involve a nonstationary distribution of classes with sequential dependencies among the samples, in contrast to the standard machine learning formulation of drawing samples independently from a fixed, typically uniform distribution. Furthermore, real-world interactions demand learning on-the-fly from few or no class labels. In this work, we propose an unsupervised mode…

    Submitted 28 May, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: Technical report, 32 pages

  22. arXiv:2109.02429  [pdf, other]

    stat.ML cs.LG

    Learning Neural Causal Models with Active Interventions

    Authors: Nino Scherrer, Olexa Bilaniuk, Yashas Annadani, Anirudh Goyal, Patrick Schwab, Bernhard Schölkopf, Michael C. Mozer, Yoshua Bengio, Stefan Bauer, Nan Rosemary Ke

    Abstract: Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far, differentiable causal discovery has focused on static datasets of observational or fixed int…

    Submitted 5 March, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

  23. arXiv:2108.00106  [pdf, other]

    cs.LG cs.AI

    Soft Calibration Objectives for Neural Networks

    Authors: Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs

    Abstract: Optimal decision making requires that classifiers produce uncertainty estimates consistent with their empirical accuracy. However, deep neural networks are often under- or over-confident in their predictions. Consequently, methods have been developed to improve the calibration of their predictive uncertainty both during training and post-hoc. In this work, we propose differentiable losses to impro…

    Submitted 7 December, 2021; v1 submitted 30 July, 2021; originally announced August 2021.

    Comments: 17 pages total, 10 page main paper, 5 page appendix, 10 figures total, 8 figures in main paper, 2 figures in appendix

  24. arXiv:2107.02367  [pdf, other]

    cs.LG cs.AI

    Discrete-Valued Neural Communication

    Authors: Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael Curtis Mozer, Yoshua Bengio

    Abstract: Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. In structured models, an interesting question is how to conduct dynamic and possibly sparse communication among the separate components. Here, we explore…

    Submitted 10 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

  25. arXiv:2107.00848  [pdf, other]

    stat.ML cs.LG

    Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

    Authors: Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal

    Abstract: Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables,…

    Submitted 2 July, 2021; originally announced July 2021.

  26. arXiv:2105.07601  [pdf]

    astro-ph.SR physics.space-ph

    On the origin of switchbacks observed in the solar wind

    Authors: Forrest S. Mozer, Stuart Bale, John Bonnell, James Drake, Elizabeth Hanson, Michael C. Mozer

    Abstract: The origin of switchbacks in the solar wind is discussed in two classes of theory that differ in whether the source is located in the transition region near the Sun or in the solar wind itself. The two classes of theory differ in their predictions of the switchback rate as a function of distance from the Sun. To test these theories, one-hour averages of Parker Solar Probe data were summ…

    Submitted 12 August, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

    Comments: 10 figures and an appendix with four more figures

  27. arXiv:2103.15718  [pdf, other]

    cs.LG cs.CV

    von Mises-Fisher Loss: An Exploration of Embedding Geometries for Supervised Learning

    Authors: Tyler R. Scott, Andrew C. Gallagher, Michael C. Mozer

    Abstract: Recent work has argued that classification losses utilizing softmax cross-entropy are superior not only for fixed-set classification tasks, but also by outperforming losses developed specifically for open-set tasks including few-shot learning and retrieval. Softmax classifiers have been studied using different embedding geometries -- Euclidean, hyperbolic, and spherical -- and claims have been mad…

    Submitted 3 December, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: ICCV 2021

  28. arXiv:2103.07470  [pdf, other]

    cs.LG

    Understanding Invariance via Feedforward Inversion of Discriminatively Trained Classifiers

    Authors: Piotr Teterwak, Chiyuan Zhang, Dilip Krishnan, Michael C. Mozer

    Abstract: A discriminatively trained neural net classifier can fit the training data perfectly if all information about its input other than class membership has been discarded prior to the output layer. Surprisingly, past research has discovered that some extraneous visual detail remains in the logit vector. This finding is based on inversion techniques that map deep embeddings back to images. We explore t…

    Submitted 21 July, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Comments: Camera Ready ICML 2021

  29. arXiv:2103.01937  [pdf, other]

    cs.AI cs.LG stat.ML

    Neural Production Systems: Learning Rule-Governed Visual Dynamics

    Authors: Anirudh Goyal, Aniket Didolkar, Nan Rosemary Ke, Charles Blundell, Philippe Beaudoin, Nicolas Heess, Michael Mozer, Yoshua Bengio

    Abstract: Visual environments are structured, consisting of distinct objects or entities. These entities have properties -- both visible and latent -- that determine the manner in which they interact with one another. To partition images into entities, deep-learning researchers have proposed structural inductive biases such as slot-based architectures. To model interactions among entities, equivariant graph…

    Submitted 23 March, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: NeurIPS'21

  30. arXiv:2103.01197  [pdf, other]

    cs.LG cs.AI stat.ML

    Coordination Among Neural Modules Through a Shared Global Workspace

    Authors: Anirudh Goyal, Aniket Didolkar, Alex Lamb, Kartikeya Badola, Nan Rosemary Ke, Nasim Rahaman, Jonathan Binas, Charles Blundell, Michael Mozer, Yoshua Bengio

    Abstract: Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorp…

    Submitted 22 March, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: ICLR'22 accepted paper

  31. arXiv:2102.09808  [pdf, other]

    cs.LG cs.CV

    Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss

    Authors: Michael L. Iuzzolino, Michael C. Mozer, Samy Bengio

    Abstract: Although deep feedforward neural networks share some characteristics with the primate visual system, a key distinction is their dynamics. Deep nets typically operate in serial stages wherein each layer completes its computation before processing begins in subsequent layers. In contrast, biological systems have cascaded dynamics: information propagates from neurons at all layers in parallel but tra…

    Submitted 2 November, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

  32. arXiv:2012.08668  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Mitigating Bias in Calibration Error Estimation

    Authors: Rebecca Roelofs, Nicholas Cain, Jonathon Shlens, Michael C. Mozer

    Abstract: For an AI system to be reliable, the confidence it expresses in its decisions must match its accuracy. To assess the degree of match, examples are typically binned by confidence and the per-bin mean confidence and accuracy are compared. Most research in calibration focuses on techniques to reduce this empirical measure of calibration error, ECE_bin. We instead focus on assessing statistical bias i…

    Submitted 10 February, 2022; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: To be published in AISTATS 2022. Code is available at https://github.com/google-research/google-research/tree/master/caltrain

  33. arXiv:2010.08012  [pdf, other]

    cs.LG stat.ML

    Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

    Authors: Alex Lamb, Anirudh Goyal, Agnieszka Słowik, Michael Mozer, Philippe Beaudoin, Yoshua Bengio

    Abstract: Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods whi…

    Submitted 15 October, 2020; originally announced October 2020.

  34. arXiv:2010.06512  [pdf, other]

    cs.NE

    Transforming Neural Network Visual Representations to Predict Human Judgments of Similarity

    Authors: Maria Attarian, Brett D. Roads, Michael C. Mozer

    Abstract: Deep-learning vision models have shown intriguing similarities and differences with respect to human vision. We investigate how to bring machine visual representations into better alignment with human representations. Human representations are often inferred from behavioral evidence such as the selection of an image most similar to a query image. We find that with appropriate linear transformation…

    Submitted 11 January, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

  35. arXiv:2007.04546  [pdf, other]

    cs.LG cs.CV stat.ML

    Wandering Within a World: Online Contextualized Few-Shot Learning

    Authors: Mengye Ren, Michael L. Iuzzolino, Michael C. Mozer, Richard S. Zemel

    Abstract: We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting. In this setting, episodes do not have separate training and testing phases, and instead models are evaluated online while learning novel classes. As in the real world, where the presence of spatiotemporal context helps us retriev…

    Submitted 22 April, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: ICLR 2021

  36. arXiv:2006.16981  [pdf, other]

    cs.LG cs.NE stat.ML

    Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

    Authors: Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

    Abstract: Robust perception relies on both bottom-up and top-down signals. Bottom-up signals consist of what's directly observed through sensation. Top-down signals consist of beliefs and expectations based on past experience and short-term memory, such as how the phrase 'peanut butter and ...' will be completed. The optimal combination of bottom-up and top-down information remains an open question, but the…

    Submitted 15 November, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: ICML 2020

  37. arXiv:2006.16225  [pdf, other]

    cs.LG stat.ML

    Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems

    Authors: Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Sergey Levine, Charles Blundell, Yoshua Bengio, Michael Mozer

    Abstract: Modeling a structured, dynamic environment like a video game requires keeping track of the objects and their states (declarative knowledge) as well as predicting how objects behave (procedural knowledge). Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly, i.e., they lack systematicity. For example, in a video game, correct prediction…

    Submitted 12 November, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Type/Token Distinction in Deep learning Framework

  38. arXiv:2002.04193  [pdf, other]

    cs.LG stat.ML

    Compositional Embeddings for Multi-Label One-Shot Learning

    Authors: Zeqian Li, Michael C. Mozer, Jacob Whitehill

    Abstract: We present a compositional embedding framework that infers not just a single class per input image, but a set of classes, in the setting of one-shot learning. Specifically, we propose and evaluate several novel models consisting of (1) an embedding function f trained jointly with a "composition" function g that computes set union operations between the classes encoded in two embedding vectors; and…

    Submitted 13 November, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  39. arXiv:2002.03206  [pdf, other]

    cs.LG stat.ML

    Characterizing Structural Regularities of Labeled Data in Overparameterized Models

    Authors: Ziheng Jiang, Chiyuan Zhang, Kunal Talwar, Michael C. Mozer

    Abstract: Humans are accustomed to environments that contain both regularities and exceptions. For example, at most gas stations, one pays prior to pumping, but the occasional rural station does not accept payment in advance. Likewise, deep neural networks can generalize across instances that share common patterns or structures, yet have the capacity to memorize rare or irregular forms. We analyze how indiv…

    Submitted 15 June, 2021; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: 17 pages, 20 figures, ICML 2021

  40. arXiv:1910.01075  [pdf, other]

    stat.ML cs.AI cs.LG

    Learning Neural Causal Models from Unknown Interventions

    Authors: Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C. Mozer, Chris Pal, Yoshua Bengio

    Abstract: Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained from observational data alone. Interventional data provides much richer information about the underlying data-generating process. However, the…

    Submitted 23 August, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

  41. arXiv:1909.11702  [pdf, other]

    stat.ML cs.LG

    Stochastic Prototype Embeddings

    Authors: Tyler R. Scott, Karl Ridgeway, Michael C. Mozer

    Abstract: Supervised deep-embedding methods project inputs of a domain to a representational space in which same-class instances lie near one another and different-class instances lie far apart. We propose a probabilistic method that treats embeddings as random variables. Extending a state-of-the-art deterministic method, Prototypical Networks (Snell et al., 2017), our approach supposes the existence of a c…

    Submitted 25 September, 2019; originally announced September 2019.

    Comments: 15 pages, 8 figures

  42. arXiv:1906.11332  [pdf, other]

    hep-ph hep-ex

    VBSCan Thessaloniki 2018 Workshop Summary

    Authors: Riccardo Bellan, Jakob Beyer, Carsten Bittrich, Giacomo Boldrini, Ilaria Brivio, Lucrezia Stella Bruni, Diogo Buarque Franzosi, Claude Charlot, Vitaliano Ciulli, Roberto Covarelli, Duje Giljanovic, Giulia Gonella, Pietro Govoni, Philippe Gras, Michele Grossi, Tim Herrmann, Jan Kalinowski, Alexander Karlberg, Kimmo Kallonen, Eirini Kasimi, Aysel Kayis Topaksu, Borut Kersevan, Henning Kirschenmann, Michael Kobel, Konstantinos Kordas, et al. (39 additional authors not shown)

    Abstract: This document reports the first year of activity of the VBSCan COST Action network, as summarised by the talks and discussions held during the VBSCan Thessaloniki 2018 workshop. The VBSCan COST action is aiming at a consistent and coordinated study of vector-boson scattering from the phenomenological and experimental point of view, for the best exploitation of the data that will be delivered b…

    Submitted 26 June, 2019; originally announced June 2019.

    Comments: Editors: Lucrezia Stella Bruni, Roberto Covarelli, Pietro Govoni, Piergiulio Lenzi, Narei Lorenzo-Martinez, Joany Manjarres, Matthias Ulrich Mozer, Giacomo Ortona, Mathieu Pellen, Daniela Rebuzzi, Magdalena Slawinska, Marco Zaro. Proceedings for the second annual meeting of the VBSCan COST action

    Report number: VBSCAN-PUB-05-19, DESY 19-108, Nikhef/2019-025, UWThPh 2019-20

  43. arXiv:1906.03504  [pdf, other]

    cs.LG cs.NE stat.ML

    Convolutional Bipartite Attractor Networks

    Authors: Michael Iuzzolino, Yoram Singer, Michael C. Mozer

    Abstract: In human perception and cognition, a fundamental operation that brains perform is interpretation: constructing coherent neural states from noisy, incomplete, and intrinsically ambiguous evidence. The problem of interpretation is well matched to an early and often overlooked architecture, the attractor network---a recurrent neural net that performs constraint satisfaction, imputation of missing fea…

    Submitted 26 September, 2019; v1 submitted 8 June, 2019; originally announced June 2019.

  44. arXiv:1905.11382  [pdf, other]

    cs.LG cs.AI stat.ML

    State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations

    Authors: Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Denis Kazakov, Yoshua Bengio, Michael C. Mozer

    Abstract: Machine learning promises methods that generalize well from finite labeled data. However, the brittleness of existing neural net approaches is revealed by notable failures, such as the existence of adversarial examples that are misclassified despite being nearly identical to a training example, or the inability of recurrent sequence-processing nets to stay on track without teacher forcing. We intr…

    Submitted 26 May, 2019; originally announced May 2019.

    Comments: ICML 2019 [full oral]. arXiv admin note: text overlap with arXiv:1805.08394

  45. arXiv:1905.10837  [pdf, other]

    cs.LG stat.ML

    Sequential mastery of multiple visual tasks: Networks naturally learn to learn and forget to forget

    Authors: Guy Davidson, Michael C. Mozer

    Abstract: We explore the behavior of a standard convolutional neural net in a continual-learning setting that introduces visual classification tasks sequentially and requires the net to master new tasks while preserving mastery of previously learned tasks. This setting corresponds to that which human learners face as they acquire domain expertise serially, for example, as an individual studies a textbook. T…

    Submitted 30 March, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

  46. arXiv:1903.01069  [pdf, other]

    cs.LG stat.ML

    Neural Networks Trained on Natural Scenes Exhibit Gestalt Closure

    Authors: Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio, Michael C. Mozer

    Abstract: The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity. We use deep-learning methods to investigate whether natural scene statistics might be sufficient to derive the Gestalt laws. We examine the law of closure, which asserts that human visual perception…

    Submitted 29 June, 2020; v1 submitted 3 March, 2019; originally announced March 2019.

  47. arXiv:1902.04698  [pdf, other]

    stat.ML cs.AI cs.LG

    Identity Crisis: Memorization and Generalization under Extreme Overparameterization

    Authors: Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer

    Abstract: We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task. We examine fully-connected and convolutional networks (FCN and CNN), both linear and nonlinear, initialized randomly and then trained to minimize the reconstruction error. The trained networks stereotypically take one of two for…

    Submitted 8 January, 2020; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: ICLR 2020

  48. arXiv:1810.00110  [pdf, other]

    cs.LG stat.ML

    Open-Ended Content-Style Recombination Via Leakage Filtering

    Authors: Karl Ridgeway, Michael C. Mozer

    Abstract: We consider visual domains in which a class label specifies the content of an image, and class-irrelevant properties that differentiate instances constitute the style. We present a domain-independent method that permits the open-ended recombination of style of one image with the content of another. Open ended simply means that the method generalizes to style and content not present in the training…

    Submitted 28 September, 2018; originally announced October 2018.

  49. arXiv:1809.03702  [pdf, other]

    cs.LG stat.ML

    Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

    Authors: Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio

    Abstract: Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes c…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: To appear as a Spotlight presentation at NIPS 2018

  50. arXiv:1805.08402  [pdf, other]

    cs.LG stat.ML

    Adapted Deep Embeddings: A Synthesis of Methods for $k$-Shot Inductive Transfer Learning

    Authors: Tyler R. Scott, Karl Ridgeway, Michael C. Mozer

    Abstract: The focus in machine learning has branched beyond training classifiers on a single task to investigating how previously acquired knowledge in a source domain can be leveraged to facilitate learning in a related target domain, known as inductive transfer learning. Three active lines of research have independently explored transfer learning using neural networks. In weight transfer, a model trained…

    Submitted 27 October, 2018; v1 submitted 22 May, 2018; originally announced May 2018.