Search | arXiv e-print repository

Showing 1–50 of 99 results for author: Zemel, R

  1. arXiv:2406.14562  [pdf, other]

    cs.CL cs.AI cs.CV

    Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

    Authors: Sachit Menon, Richard Zemel, Carl Vondrick

    Abstract: When presented with questions involving visual thinking, humans naturally switch reasoning modalities, often forming mental images or drawing visual aids. Large language models have shown promising results in arithmetic and symbolic reasoning by expressing intermediate reasoning in text as a chain of thought, yet struggle to extend this capability to answer text queries that are easily solved by v…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Project website: whiteboard.cs.columbia.edu/

  2. arXiv:2404.19132  [pdf, other]

    cs.LG cs.CV

    Integrating Present and Past in Unsupervised Continual Learning

    Authors: Yipeng Zhang, Laurent Charlin, Richard Zemel, Mengye Ren

    Abstract: We formulate a unifying framework for unsupervised continual learning (UCL), which disentangles learning objectives that are specific to the present and the past data, encompassing stability, plasticity, and cross-task consolidation. The framework reveals that many existing UCL approaches overlook cross-task consolidation and try to balance plasticity and stability in a shared embedding space. Thi…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: CoLLAs 2024

  3. arXiv:2404.02323  [pdf, other]

    cs.CL

    Toward Informal Language Processing: Knowledge of Slang in Large Language Models

    Authors: Zhewei Sun, Qian Hu, Rahul Gupta, Richard Zemel, Yang Xu

    Abstract: Recent advancement in large language models (LLMs) has offered a strong potential for natural language systems to process informal language. A representative form of informal language is slang, used commonly in daily conversations and online social media. To date, slang has not been comprehensively evaluated in LLMs due partly to the absence of a carefully designed and publicly accessible benchmar…

    Submitted 12 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 main conference

  4. arXiv:2403.01615  [pdf, other]

    cs.LG cs.DC

    Partial Federated Learning

    Authors: Tiantian Feng, Anil Ramakrishna, Jimit Majmudar, Charith Peris, Jixuan Wang, Clement Chung, Richard Zemel, Morteza Ziyadi, Rahul Gupta

    Abstract: Federated Learning (FL) is a popular algorithm to train machine learning models on user data constrained to edge devices (for example, mobile phones) due to privacy concerns. Typically, FL is trained with the assumption that no part of the user data can be egressed from the edge. However, in many production settings, specific data-modalities/meta-data are limited to be on device while others are n…

    Submitted 3 March, 2024; originally announced March 2024.

  5. arXiv:2401.00055  [pdf, other]

    cs.LG

    Online Algorithmic Recourse by Collective Action

    Authors: Elliot Creager, Richard Zemel

    Abstract: Research on algorithmic recourse typically considers how an individual can reasonably change an unfavorable automated decision when interacting with a fixed decision-making system. This paper focuses instead on the online setting, where system parameters are updated dynamically according to interactions with data subjects. Beyond the typical individual-level recourse, the online setting opens up n…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: Appeared in the ICML 2021 Workshop on Algorithmic Recourse

  6. arXiv:2312.17463  [pdf, other]

    cs.LG stat.ML

    Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift

    Authors: Benjamin Eyre, Elliot Creager, David Madras, Vardan Papyan, Richard Zemel

    Abstract: Designing deep neural network classifiers that perform robustly on distributions differing from the available training data is an active area of machine learning research. However, out-of-distribution generalization for regression, the analogous problem for modeling continuous targets, remains relatively unexplored. To tackle this problem, we return to first principles and analyze how the closed-for…

    Submitted 28 December, 2023; originally announced December 2023.

  7. arXiv:2312.11779  [pdf, other]

    cs.CL cs.AI cs.LG

    Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies

    Authors: Anaelia Ovalle, Ninareh Mehrabi, Palash Goyal, Jwala Dhamala, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Yuval Pinter, Rahul Gupta

    Abstract: Gender-inclusive NLP research has documented the harmful limitations of gender binary-centric large language models (LLM), such as the inability to correctly use gender-diverse English neopronouns (e.g., xe, zir, fae). While data scarcity is a known culprit, the precise mechanisms through which scarcity affects this behavior remain underexplored. We discover LLM misgendering is significantly influ…

    Submitted 6 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted to NAACL 2024 findings

  8. arXiv:2312.07405  [pdf, other]

    cs.CL cs.LG

    ICL Markup: Structuring In-Context Learning using Soft-Token Tags

    Authors: Marc-Etienne Brunet, Ashton Anderson, Richard Zemel

    Abstract: Large pretrained language models (LLMs) can be rapidly adapted to a wide variety of tasks via a text-to-text approach, where the instruction and input are fed to the model in natural language. Combined with in-context learning (ICL), this paradigm is impressively flexible and powerful. However, it also burdens users with an overwhelming number of choices, many of them arbitrary. Inspired by markup…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: R0-FoMo: Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models at NeurIPS 2023

  9. arXiv:2311.13628  [pdf, other]

    cs.LG cs.AI cs.CL

    Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models

    Authors: Thomas P. Zollo, Todd Morrill, Zhun Deng, Jake C. Snell, Toniann Pitassi, Richard Zemel

    Abstract: The recent explosion in the capabilities of large language models has led to a wave of interest in how best to prompt a model to perform a given task. While it may be tempting to simply choose a prompt based on average performance on a validation set, this can lead to a deployment where unexpectedly poor responses are generated, especially for the worst-off users. To mitigate this prospect, we pro…

    Submitted 27 March, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 34 pages, 10 figures, published as conference paper at ICLR 2024, and accepted to the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023

  10. arXiv:2311.09473  [pdf, other]

    cs.AI cs.CL

    JAB: Joint Adversarial Prompting and Belief Augmentation

    Authors: Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Jwala Dhamala, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta

    Abstract: With the recent surge of language models in different applications, attention to safety and robustness of these models has gained significant importance. Here we introduce a joint framework in which we simultaneously probe and improve the robustness of a black-box target model via adversarial prompting and belief augmentation using iterative feedback loops. This framework utilizes an automated red…

    Submitted 15 November, 2023; originally announced November 2023.

  11. arXiv:2311.04978  [pdf, other]

    cs.CL

    On the steerability of large language models toward data-driven personas

    Authors: Junyi Li, Ninareh Mehrabi, Charith Peris, Palash Goyal, Kai-Wei Chang, Aram Galstyan, Richard Zemel, Rahul Gupta

    Abstract: Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented. Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs, that can be leveraged to produce multiple perspectives and to reflect the diverse opinions. Moving beyond the traditional reliance on demographics like a…

    Submitted 2 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

  12. arXiv:2310.15054  [pdf, other]

    cs.LG

    Coordinated Replay Sample Selection for Continual Federated Learning

    Authors: Jack Good, Jimit Majmudar, Christophe Dupuy, Jixuan Wang, Charith Peris, Clement Chung, Richard Zemel, Rahul Gupta

    Abstract: Continual Federated Learning (CFL) combines Federated Learning (FL), the decentralized learning of a central model on a number of client devices that may not communicate their data, and Continual Learning (CL), the learning of a model from a continual stream of data without keeping the entire history. In CL, the main challenge is forgetting what was learned from past data. While replay-ba…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 7 pages, 6 figures, accepted to EMNLP (industry track)

  13. arXiv:2309.13786  [pdf, other]

    cs.LG stat.ML

    Distribution-Free Statistical Dispersion Control for Societal Applications

    Authors: Zhun Deng, Thomas P. Zollo, Jake C. Snell, Toniann Pitassi, Richard Zemel

    Abstract: Explicit finite-sample statistical guarantees on model performance are an important ingredient in responsible machine learning. Previous work has focused mainly on bounding either the expected loss of a predictor or the probability that an individual prediction will incur a loss value in a specified range. However, for many high-stakes applications, it is crucial to understand and control the disp…

    Submitted 6 March, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: Accepted by NeurIPS as spotlight (top 3% among submissions)

  14. arXiv:2308.04265  [pdf, other]

    cs.AI

    FLIRT: Feedback Loop In-context Red Teaming

    Authors: Ninareh Mehrabi, Palash Goyal, Christophe Dupuy, Qian Hu, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta

    Abstract: Warning: this paper contains content that may be inappropriate or offensive. As generative models become available for public use in various applications, testing and analyzing vulnerabilities of these models has become a priority. Here we propose an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities against unsafe and inappropriate content generation. O…

    Submitted 8 August, 2023; originally announced August 2023.

  15. arXiv:2305.09941  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    "I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation

    Authors: Anaelia Ovalle, Palash Goyal, Jwala Dhamala, Zachary Jaggers, Kai-Wei Chang, Aram Galstyan, Richard Zemel, Rahul Gupta

    Abstract: Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life. Given the recent popularity and adoption of language generation technologies, the potential to further marginalize this population only grows. Although a multitude of NLP fairness literature focuses on illuminating and addressing gender biases, assessing gender harms for TGNB i…

    Submitted 1 June, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    ACM Class: I.2; I.7; K.4

    Journal ref: 2023 ACM Conference on Fairness, Accountability, and Transparency

  16. arXiv:2304.06197  [pdf, other]

    cs.LG physics.flu-dyn

    SURFSUP: Learning Fluid Simulation for Novel Surfaces

    Authors: Arjun Mani, Ishaan Preetam Chandratreya, Elliot Creager, Carl Vondrick, Richard Zemel

    Abstract: Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics. Learning-based methods provide fast and differentiable fluid simulators, however most prior work is unable to accurately model how fluids interact with genuinely novel surfaces not seen during training. We introduce SURFSUP, a framework that represents objects implicitly using signed dista…

    Submitted 8 September, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Website: https://surfsup.cs.columbia.edu/

  17. arXiv:2212.13629  [pdf, other]

    cs.LG stat.ML

    Quantile Risk Control: A Flexible Framework for Bounding the Probability of High-Loss Predictions

    Authors: Jake C. Snell, Thomas P. Zollo, Zhun Deng, Toniann Pitassi, Richard Zemel

    Abstract: Rigorous guarantees about the performance of predictive algorithms are necessary in order to ensure their responsible use. Previous work has largely focused on bounding the expected loss of a predictor, but this is not sufficient in many risk-sensitive applications where the distribution of errors is important. In this work, we propose a flexible framework to produce a family of bounds on quantile…

    Submitted 27 December, 2022; originally announced December 2022.

    Comments: 24 pages, 4 figures. Code is available at https://github.com/jakesnell/quantile-risk-control

  18. arXiv:2211.12503  [pdf, other]

    cs.CL cs.CV cs.LG cs.MM

    Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models

    Authors: Ninareh Mehrabi, Palash Goyal, Apurv Verma, Jwala Dhamala, Varun Kumar, Qian Hu, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Rahul Gupta

    Abstract: Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions and/or relying on contextual cues and common-sense knowledge, resolving ambiguities can be notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate a benc…

    Submitted 17 November, 2022; originally announced November 2022.

  19. arXiv:2205.13621  [pdf, other]

    cs.CL cs.LG

    Differentially Private Decoding in Large Language Models

    Authors: Jimit Majmudar, Christophe Dupuy, Charith Peris, Sami Smaili, Rahul Gupta, Richard Zemel

    Abstract: Recent large-scale natural language processing (NLP) systems use a pre-trained Large Language Model (LLM) on massive and diverse corpora as a headstart. In practice, the pre-trained model is adapted to a wide array of tasks via fine-tuning on task-specific datasets. LLMs, while effective, have been shown to memorize instances of training data thereby potentially revealing private information proce…

    Submitted 8 September, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

  20. arXiv:2205.00616  [pdf, other]

    cs.CL cs.AI

    Semantically Informed Slang Interpretation

    Authors: Zhewei Sun, Richard Zemel, Yang Xu

    Abstract: Slang is a predominant form of informal language making flexible and extended use of words that is notoriously hard for natural language processing systems to interpret. Existing approaches to slang interpretation tend to rely on context but ignore semantic extensions common in slang word usage. We propose a semantically informed slang interpretation (SSI) framework that considers jointly the cont…

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: Accepted as a long paper at NAACL 2022

  21. arXiv:2204.03558  [pdf, other]

    cs.CL

    Mapping the Multilingual Margins: Intersectional Biases of Sentiment Analysis Systems in English, Spanish, and Arabic

    Authors: António Câmara, Nina Taneja, Tamjeed Azad, Emily Allaway, Richard Zemel

    Abstract: As natural language processing systems become more widespread, it is necessary to address fairness issues in their implementation and deployment to ensure that their negative impacts on society are understood and minimized. However, there is limited work that studies fairness using a multilingual and intersectional framework or on downstream tasks. In this paper, we introduce four multilingual Equ…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: LT-EDI 2022

  22. arXiv:2202.06985  [pdf, other]

    cs.LG stat.ML

    Deep Ensembles Work, But Are They Necessary?

    Authors: Taiga Abe, E. Kelly Buchanan, Geoff Pleiss, Richard Zemel, John P. Cunningham

    Abstract: Ensembling neural networks is an effective way to increase accuracy, and can often match the performance of individual larger models. This observation poses a natural question: given the choice between a deep ensemble and a single neural network with similar accuracy, is one preferable over the other? Recent work suggests that deep ensembles may offer distinct benefits beyond predictive power: nam…

    Submitted 13 October, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

  23. arXiv:2201.10787  [pdf, other]

    cs.LG cs.CR

    Variational Model Inversion Attacks

    Authors: Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, Alireza Makhzani

    Abstract: Given the ubiquity of deep neural networks, it is important that these models do not reveal information about sensitive data that they have been trained on. In model inversion attacks, a malicious user attempts to recover the private dataset used to train a supervised neural network. A successful model inversion attack should generate realistic and diverse samples that accurately describe each of…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  24. arXiv:2112.14754  [pdf, other]

    cs.LG cs.CV stat.ML

    Disentanglement and Generalization Under Correlation Shifts

    Authors: Christina M. Funke, Paul Vicol, Kuan-Chieh Wang, Matthias Kümmerer, Richard Zemel, Matthias Bethge

    Abstract: Correlations between factors of variation are prevalent in real-world data. Exploiting such correlations may increase predictive performance on noisy data; however, often correlations are not robust (e.g., they may change between domains, datasets, or applications) and models that exploit them do not generalize when correlations shift. Disentanglement methods aim to learn representations which cap…

    Submitted 23 December, 2022; v1 submitted 29 December, 2021; originally announced December 2021.

    Comments: CoLLAs 2022

  25. arXiv:2110.13223  [pdf, other]

    cs.LG cs.CV

    Identifying and Benchmarking Natural Out-of-Context Prediction Problems

    Authors: David Madras, Richard Zemel

    Abstract: Deep learning systems frequently fail at out-of-context (OOC) prediction, the problem of making reliable predictions on uncommon or unusual inputs or subgroups of the training distribution. To this end, a number of benchmarks for measuring OOC performance have recently been introduced. In this work, we introduce a framework unifying the literature on OOC performance measurement, and demonstrate ho…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021

  26. arXiv:2109.05675  [pdf, other]

    cs.CV cs.LG stat.ML

    Online Unsupervised Learning of Visual Representations and Categories

    Authors: Mengye Ren, Tyler R. Scott, Michael L. Iuzzolino, Michael C. Mozer, Richard Zemel

    Abstract: Real world learning scenarios involve a nonstationary distribution of classes with sequential dependencies among the samples, in contrast to the standard machine learning formulation of drawing samples independently from a fixed, typically uniform distribution. Furthermore, real world interactions demand learning on-the-fly from few or no class labels. In this work, we propose an unsupervised mode…

    Submitted 28 May, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: Technical report, 32 pages

  27. arXiv:2108.04227  [pdf, other]

    cs.CV cs.LG

    Directly Training Joint Energy-Based Models for Conditional Synthesis and Calibrated Prediction of Multi-Attribute Data

    Authors: Jacob Kelly, Richard Zemel, Will Grathwohl

    Abstract: Multi-attribute classification generalizes classification, presenting new challenges for making accurate predictions and quantifying uncertainty. We build upon recent work and show that architectures for multi-attribute prediction can be reinterpreted as energy-based models (EBMs). While existing EBM approaches achieve strong discriminative performance, they are unable to generate samples conditio…

    Submitted 19 July, 2021; originally announced August 2021.

  28. arXiv:2106.13435  [pdf, other]

    cs.CV cs.LG

    NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation

    Authors: Xiaohui Zeng, Raquel Urtasun, Richard Zemel, Sanja Fidler, Renjie Liao

    Abstract: In this paper, we present a non-parametric structured latent variable model for image generation, called NP-DRAW, which sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas. Our key contributions are as follows. 1) We propose a non-parametric prior distribution over the appearance of image parts so that the latent variable "what-to-draw" per…

    Submitted 4 July, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: UAI 2021, code at https://github.com/ZENGXH/NPDRAW

  29. arXiv:2105.07029  [pdf, other]

    cs.LG cs.CV

    Learning a Universal Template for Few-shot Dataset Generalization

    Authors: Eleni Triantafillou, Hugo Larochelle, Richard Zemel, Vincent Dumoulin

    Abstract: Few-shot dataset generalization is a challenging variant of the well-studied few-shot classification problem where a diverse training set of several datasets is given, for the purpose of training an adaptable model that can then learn classes from new datasets using only a few examples. To this end, we propose to utilize the diverse training set to construct a universal template: a partial model t…

    Submitted 21 June, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

  30. arXiv:2104.11044  [pdf, other]

    cs.LG cs.AI stat.ML

    Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

    Authors: James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

    Abstract: Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective. This Monotonic Linear Interpolation (MLI) property, first observed by Goodfellow et al. (2014), persists in spite of the non-convex objectives and highly non-linear training dynamics of neural…

    Submitted 23 April, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: 15 pages in main paper, 4 pages of references, 24 pages in appendix. 29 figures in total

  31. A Computational Framework for Slang Generation

    Authors: Zhewei Sun, Richard Zemel, Yang Xu

    Abstract: Slang is a common type of informal language, but its flexible nature and paucity of data resources present challenges for existing natural language systems. We take an initial step toward machine generation of slang by developing a framework that models the speaker's word choice in slang context. Our framework encodes novel slang meaning by relating the conventional and slang senses of a word whil…

    Submitted 22 May, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: Accepted for publication in TACL 2021. Author's final version

    Journal ref: Transactions of the Association for Computational Linguistics 2021; 9 462-478

  32. arXiv:2012.07690  [pdf, other]

    cs.LG

    A PAC-Bayesian Approach to Generalization Bounds for Graph Neural Networks

    Authors: Renjie Liao, Raquel Urtasun, Richard Zemel

    Abstract: In this paper, we derive generalization bounds for the two primary classes of graph neural networks (GNNs), namely graph convolutional networks (GCNs) and message passing GNNs (MPGNNs), via a PAC-Bayesian approach. Our result reveals that the maximum node degree and spectral norm of the weights govern the generalization bounds of both models. We also show that our bound for GCNs is a natural gener…

    Submitted 14 December, 2020; originally announced December 2020.

  33. arXiv:2012.05895  [pdf, other]

    cs.LG cs.CV stat.ML

    Probing Few-Shot Generalization with Attributes

    Authors: Mengye Ren, Eleni Triantafillou, Kuan-Chieh Wang, James Lucas, Jake Snell, Xaq Pitkow, Andreas S. Tolias, Richard Zemel

    Abstract: Despite impressive progress in deep learning, generalizing far beyond the training distribution is an important open challenge. In this work, we consider few-shot classification, and aim to shed light on what makes some novel classes easier to learn than others, and what types of learned representations generalize better. To this end, we define a new paradigm in terms of attributes: simple build…

    Submitted 30 May, 2022; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: Technical report, 26 pages

  34. arXiv:2011.06485  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Fairness and Robustness in Invariant Learning: A Case Study in Toxicity Classification

    Authors: Robert Adragna, Elliot Creager, David Madras, Richard Zemel

    Abstract: Robustness is of central importance in machine learning and has given rise to the fields of domain generalization and invariant learning, which are concerned with improving performance on a test distribution distinct from but related to the training distribution. In light of recent work suggesting an intimate connection between fairness and robustness, we investigate whether algorithms from robust…

    Submitted 1 December, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: 12 pages, 5 figures. Appears in the NeurIPS 2020 Workshop on Algorithmic Fairness through the Lens of Causality and Interpretability

  35. arXiv:2010.07249  [pdf, other]

    cs.LG cs.AI

    Environment Inference for Invariant Learning

    Authors: Elliot Creager, Jörn-Henrik Jacobsen, Richard Zemel

    Abstract: Learning models that gracefully handle distribution shifts is central to research on domain generalization, robust optimization, and fairness. A promising formulation is domain-invariant learning, which identifies the key issue of learning which features are domain-specific versus domain-invariant. An important assumption in this area is that the training examples are partitioned into "domains" or…

    Submitted 15 July, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

  36. arXiv:2010.07140  [pdf, other]

    stat.ML cs.LG math.ST

    Theoretical bounds on estimation error for meta-learning

    Authors: James Lucas, Mengye Ren, Irene Kameni, Toniann Pitassi, Richard Zemel

    Abstract: Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent success in few-shot learning and related problems are encouraging signs that these models can be adapted to more realistic settings where train and test distributions differ. Unfortunately, there is severely limited theoretical support for these alg…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 12 pages in main paper, 22 pages in appendix, 4 figures total

  37. arXiv:2009.04806  [pdf, other]

    cs.CV cs.LG cs.NE stat.ML

    SketchEmbedNet: Learning Novel Concepts by Imitating Drawings

    Authors: Alexander Wang, Mengye Ren, Richard S. Zemel

    Abstract: Sketch drawings capture the salient information of visual concepts. Previous work has shown that neural networks are capable of producing sketches of natural objects drawn from a small number of classes. While earlier approaches focus on generation quality or retrieval, we explore properties of image representations learned by training a model to produce sketches of images. We show that this gener…

    Submitted 22 June, 2021; v1 submitted 27 August, 2020; originally announced September 2020.

    Comments: ICML 2021

  38. arXiv:2008.00104  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach

    Authors: Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier

    Abstract: Most recommender systems (RS) research assumes that a user's utility can be maximized independently of the utility of the other agents (e.g., other users, content providers). In realistic settings, this is often not true; the dynamics of an RS ecosystem couple the long-term utility of all agents. In this work, we explore settings in which content providers cannot remain viable unless they receive…

    Submitted 18 August, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

  39. arXiv:2007.10417  [pdf, other]

    cs.LG stat.ML

    Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes

    Authors: Jake Snell, Richard Zemel

    Abstract: Few-shot classification (FSC), the task of adapting a classifier to unseen classes given a small labeled dataset, is an important step on the path toward human-like machine learning. Bayesian methods are well-suited to tackling the fundamental issue of overfitting in the few-shot scenario because they allow practitioners to specify prior beliefs and update those beliefs in light of observed data.…

    Submitted 21 January, 2021; v1 submitted 20 July, 2020; originally announced July 2020.

    Comments: Extended version of accepted ICLR 2021 submission. 34 pages, 9 figures

  40. arXiv:2007.04546  [pdf, other]

    cs.LG cs.CV stat.ML

    Wandering Within a World: Online Contextualized Few-Shot Learning

    Authors: Mengye Ren, Michael L. Iuzzolino, Michael C. Mozer, Richard S. Zemel

    Abstract: We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting. In this setting, episodes do not have separate training and testing phases, and instead models are evaluated online while learning novel classes. As in the real world, where the presence of spatiotemporal context helps us retriev…

    Submitted 22 April, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: ICLR 2021

  41. arXiv:2006.10833  [pdf, other]

    cs.LG stat.ML

    Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data

    Authors: Sindy Löwe, David Madras, Richard Zemel, Max Welling

    Abstract: On time-series data, most causal discovery methods fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information which is lost when following this approach. Specifically, different samples may share the dynamics which describe the effects of their causal relations. We propose Amortized Causal Discovery, a novel framework…

    Submitted 21 February, 2022; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted as a conference paper at CLeaR 2022

  42. arXiv:2004.07780  [pdf, other

    cs.CV cs.AI cs.LG q-bio.NC

    Shortcut Learning in Deep Neural Networks

    Authors: Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, Felix A. Wichmann

    Abstract: Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today's machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distill how many of deep learning's problems can be seen as different symptoms of the same underlying…

    Submitted 21 November, 2023; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: perspective article published at Nature Machine Intelligence (https://doi.org/10.1038/s42256-020-00257-z)

  43. arXiv:2002.05616  [pdf, other

    stat.ML cs.LG

    Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling

    Authors: Will Grathwohl, Kuan-Chieh Wang, Jorn-Henrik Jacobsen, David Duvenaud, Richard Zemel

    Abstract: We present a new method for evaluating and training unnormalized density models. Our approach only requires access to the gradient of the unnormalized model's log-density. We estimate the Stein discrepancy between the data density $p(x)$ and the model density $q(x)$ defined by a vector function of the data. We parameterize this function with a neural network and fit its parameters to maximize the…

    Submitted 14 August, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  44. arXiv:1911.02256  [pdf, other

    cs.LG stat.ML

    A Divergence Minimization Perspective on Imitation Learning Methods

    Authors: Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shixiang Gu

    Abstract: In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited se…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: Published at Conference on Robot Learning (CoRL) 2019. For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/fmax_paper.md

  45. arXiv:1910.00760  [pdf, other

    cs.LG stat.ML

    Efficient Graph Generation with Graph Recurrent Attention Networks

    Authors: Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Charlie Nash, William L. Hamilton, David Duvenaud, Raquel Urtasun, Richard S. Zemel

    Abstract: We propose a new family of efficient and expressive deep generative models of graphs, called Graph Recurrent Attention Networks (GRANs). Our model generates graphs one block of nodes and associated edges at a time. The block size and sampling stride allow us to trade off sample quality for efficiency. Compared to previous RNN-based graph generative models, our framework better captures the auto-re…

    Submitted 17 July, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: Neural Information Processing Systems (NeurIPS) 2019

  46. arXiv:1909.09141  [pdf, other

    cs.LG cs.AI cs.CY stat.ML

    Causal Modeling for Fairness in Dynamical Systems

    Authors: Elliot Creager, David Madras, Toniann Pitassi, Richard Zemel

    Abstract: In many application areas (lending, education, and online recommenders, for example), fairness and equity concerns emerge when a machine learning system interacts with a dynamically changing environment to produce both immediate and long-term effects for individuals and demographic groups. We discuss causal directed acyclic graphs (DAGs) as a unifying framework for the recent literature on fairne…

    Submitted 6 July, 2020; v1 submitted 18 September, 2019; originally announced September 2019.

  47. arXiv:1906.09427  [pdf, other

    cs.LG stat.ML

    Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models

    Authors: Guangyong Chen, Pengfei Chen, Chang-Yu Hsieh, Chee-Kong Lee, Benben Liao, Renjie Liao, Weiwen Liu, Jiezhong Qiu, Qiming Sun, Jie Tang, Richard Zemel, Shengyu Zhang

    Abstract: We introduce a new molecular dataset, named Alchemy, for developing machine learning models useful in chemistry and material science. As of June 20th, 2019, the dataset comprises 12 quantum mechanical properties of 119,487 organic molecules with up to 14 heavy atoms, sampled from the GDB MedChem database. The Alchemy dataset expands the volume and diversity of existing molecular datasets. Our ex…

    Submitted 22 June, 2019; originally announced June 2019.

    Comments: Authors are listed in alphabetical order

  48. arXiv:1906.02589  [pdf, other

    cs.LG cs.AI stat.ML

    Flexibly Fair Representation Learning by Disentanglement

    Authors: Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa A. Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel

    Abstract: We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes. Taking inspiration from the disentangled representation learning literature, we propose an algorithm for learning compact representations of datasets that are useful for reconstruction and prediction, but are also flexibly fair, meaning they can be easi…

    Submitted 6 June, 2019; originally announced June 2019.

    Journal ref: Proceedings of the International Conference on Machine Learning (ICML), 2019

  49. arXiv:1906.01171  [pdf, other

    cs.LG stat.ML

    Understanding the Limitations of Conditional Generative Models

    Authors: Ethan Fetaya, Jörn-Henrik Jacobsen, Will Grathwohl, Richard Zemel

    Abstract: Class-conditional generative models hold promise to overcome the shortcomings of their discriminative counterparts. They are a natural choice to solve discriminative tasks in a robust manner as they jointly optimize for predictive performance and accurate modeling of the input distribution. In this work, we investigate robust classification with likelihood-based generative models from a theoretica…

    Submitted 17 February, 2020; v1 submitted 3 June, 2019; originally announced June 2019.

  50. arXiv:1903.10920  [pdf, other

    cs.CV cs.AI cs.LG

    High-Level Perceptual Similarity is Enabled by Learning Diverse Tasks

    Authors: Amir Rosenfeld, Richard Zemel, John K. Tsotsos

    Abstract: Predicting human perceptual similarity is a challenging subject of ongoing research. The visual process underlying this aspect of human vision is thought to employ multiple different levels of visual analysis (shapes, objects, texture, layout, color, etc.). In this paper, we postulate that the perception of image similarity is not an explicitly learned capability, but rather one that is a byproduct…

    Submitted 26 March, 2019; originally announced March 2019.