Showing 1–49 of 49 results for author: Kornblith, S

  1. arXiv:2312.05355  [pdf, ps, other]

    cs.LG cs.CV q-bio.NC

    Neither hype nor gloom do DNNs justice

    Authors: Felix A. Wichmann, Simon Kornblith, Robert Geirhos

    Abstract: Neither the hype exemplified in some exaggerated claims about deep neural networks (DNNs), nor the gloom expressed by Bowers et al. do DNNs as models in vision science justice: DNNs rapidly evolve, and today's limitations are often tomorrow's successes. In addition, providing explanations as well as prediction and image-computability are model desiderata; one should not be favoured at the expense…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Preprint version of a commentary published by Behavioral and Brain Sciences (https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/abs/neither-hype-nor-gloom-do-dnns-justice/639AA5BC7F6E3B91E9B9EC8463D39F77)

  2. arXiv:2311.07864  [pdf, other]

    cs.LG cs.CV

    Probing clustering in neural network representations

    Authors: Thao Nguyen, Simon Kornblith

    Abstract: Neural network representations contain structure beyond what was present in the training labels. For instance, representations of images that are visually or semantically similar tend to lie closer to each other than to dissimilar images, regardless of their labels. Clustering these representations can thus provide insights into dataset properties as well as the network internals. In this work, we…

    Submitted 13 November, 2023; originally announced November 2023.

  3. arXiv:2311.07587  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

    Authors: C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, et al. (5 additional authors not shown)

    Abstract: We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial string inserted before the question is complete. Even in the simple setting of 1-digit addition problems, it is easy to find adversarial prompts that mak…

    Submitted 15 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  4. arXiv:2310.13018  [pdf, other]

    q-bio.NC cs.AI cs.LG cs.NE

    Getting aligned on representational alignment

    Authors: Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, et al. (5 additional authors not shown)

    Abstract: Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of an…

    Submitted 2 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Working paper, changes to be made in upcoming revisions

  5. arXiv:2309.14322  [pdf, other]

    cs.LG

    Small-scale proxies for large-scale Transformer training instabilities

    Authors: Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

    Abstract: Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study train…

    Submitted 16 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  6. arXiv:2309.08586  [pdf, other]

    cs.CV cs.LG

    Replacing softmax with ReLU in Vision Transformers

    Authors: Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith

    Abstract: Previous research observed accuracy degradation when replacing the attention softmax with a point-wise activation such as ReLU. In the context of vision transformers, we find that this degradation is mitigated when dividing by sequence length. Our experiments training small to large vision transformers on ImageNet-21k indicate that ReLU-attention can approach or match the performance of softmax-at…

    Submitted 16 October, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

  7. arXiv:2308.01390  [pdf, other]

    cs.CV cs.AI cs.LG

    OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

    Authors: Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt

    Abstract: We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind's Flamingo models. On seven vision-language datasets, OpenFlamingo models average between 80 - 89% of corresponding Flamingo performance. This technical report describes our models, training data, hyperpar…

    Submitted 7 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  8. arXiv:2307.16686  [pdf, other]

    cs.CV cs.LG

    Guiding Image Captioning Models Toward More Specific Captions

    Authors: Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen

    Abstract: Image captioning is conventionally formulated as the task of generating captions for images that match the distribution of reference image-caption pairs. However, reference captions in standard captioning datasets are short and may not uniquely identify the images they describe. These problems are further exacerbated when models are trained directly on image-alt text pairs collected from the inter…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: ICCV 2023

  9. arXiv:2307.14334  [pdf, other]

    cs.CL cs.CV

    Towards Generalist Biomedical AI

    Authors: Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral, et al. (7 additional authors not shown)

    Abstract: Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench…

    Submitted 26 July, 2023; originally announced July 2023.

  10. arXiv:2306.04507  [pdf, other]

    cs.CV cs.LG

    Improving neural network representations using human similarity judgments

    Authors: Lukas Muttenthaler, Lorenz Linhardt, Jonas Dippel, Robert A. Vandermeulen, Katherine Hermann, Andrew K. Lampinen, Simon Kornblith

    Abstract: Deep neural networks have reached human-level performance on many computer vision tasks. However, the objectives used to train these networks enforce only that similar images are embedded at similar locations in the representation space, and do not directly constrain the global structure of the resulting space. Here, we explore the impact of supervising this global structure by linearly aligning i…

    Submitted 26 September, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Published as a conference paper at NeurIPS 2023

  11. arXiv:2304.08466  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Synthetic Data from Diffusion Models Improves ImageNet Classification

    Authors: Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, David J. Fleet

    Abstract: Deep generative models are becoming increasingly powerful, now generating diverse high fidelity photo-realistic samples given text prompts. Have they reached the point where models of natural images can be used for generative data augmentation, helping to improve challenging discriminative tasks? We show that large-scale text-to-image diffusion models can be fine-tuned to produce class conditional…

    Submitted 17 April, 2023; originally announced April 2023.

  12. arXiv:2301.04644  [pdf, other]

    cs.CV

    Does progress on ImageNet transfer to real-world datasets?

    Authors: Alex Fang, Simon Kornblith, Ludwig Schmidt

    Abstract: Does progress on ImageNet transfer to real-world datasets? We investigate this question by evaluating ImageNet pre-trained models with varying accuracy (57% - 83%) on six practical image classification datasets. In particular, we study datasets collected with the goal of solving real-world tasks (e.g., classifying images from camera traps or satellites), as opposed to web-scraped benchmarks collec…

    Submitted 11 January, 2023; originally announced January 2023.

  13. arXiv:2212.08013  [pdf, other]

    cs.CV cs.AI cs.LG

    FlexiViT: One Model for All Patch Sizes

    Authors: Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic

    Abstract: Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of w…

    Submitted 23 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Code and pre-trained models available at https://github.com/google-research/big_vision. All authors made significant technical contributions. CVPR 2023

  14. arXiv:2212.06925  [pdf, other]

    cs.LG stat.ME stat.ML

    On the Relationship Between Explanation and Prediction: A Causal View

    Authors: Amir-Hossein Karimi, Krikamol Muandet, Simon Kornblith, Bernhard Schölkopf, Been Kim

    Abstract: Being able to provide explanations for a model's decision has become a central requirement for the development, deployment, and adoption of machine learning models. However, we are yet to understand what explanation methods can and cannot do. How do upstream factors such as data, model prediction, hyperparameters, and random initialization influence downstream explanations? While previous work rai…

    Submitted 12 May, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

  15. arXiv:2212.00653  [pdf, other]

    cs.CV cs.LG

    Hyperbolic Contrastive Learning for Visual Representations beyond Objects

    Authors: Songwei Ge, Shlok Mishra, Simon Kornblith, Chun-Liang Li, David Jacobs

    Abstract: Although self-/un-supervised methods have led to rapid progress in visual representation learning, these methods generally treat objects and scenes using the same lens. In this paper, we focus on learning representations for objects and scenes that preserve the structure among them. Motivated by the observation that visually similar objects are close in the representation space, we argue that th…

    Submitted 1 December, 2022; originally announced December 2022.

  16. arXiv:2211.01201  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    Human alignment of neural network representations

    Authors: Lukas Muttenthaler, Jonas Dippel, Lorenz Linhardt, Robert A. Vandermeulen, Simon Kornblith

    Abstract: Today's computer vision models achieve human or near-human level performance across a wide variety of vision tasks. However, their architectures, data, and learning algorithms differ in numerous ways from those that give rise to human vision. In this paper, we investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations i…

    Submitted 3 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted for publication at ICLR 2023

  17. arXiv:2210.10318  [pdf, other]

    cs.LG cs.AI stat.ML

    Gaussian-Bernoulli RBMs Without Tears

    Authors: Renjie Liao, Simon Kornblith, Mengye Ren, David J. Fleet, Geoffrey Hinton

    Abstract: We revisit the challenging problem of training Gaussian-Bernoulli restricted Boltzmann machines (GRBMs), introducing two innovations. We propose a novel Gibbs-Langevin sampling algorithm that outperforms existing methods like Gibbs sampling. We propose a modified contrastive divergence (CD) algorithm so that one can generate images with GRBMs starting from noise. This enables direct comparison of…

    Submitted 19 October, 2022; originally announced October 2022.

  18. arXiv:2210.05063  [pdf, other]

    cs.CV

    Improving Dense Contrastive Learning with Dense Negative Pairs

    Authors: Berk Iskender, Zhenlin Xu, Simon Kornblith, En-Hung Chu, Maryam Khademi

    Abstract: Many contrastive representation learning methods learn a single global representation of an entire image. However, dense contrastive representation learning methods such as DenseCL (Wang et al., 2021) can learn better representations for tasks requiring stronger spatial localization of features, such as multi-label classification, detection, and segmentation. In this work, we study how to improve…

    Submitted 10 January, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

  19. arXiv:2210.03310  [pdf, other]

    cs.LG cs.CV cs.NE

    Scaling Forward Gradient With Local Losses

    Authors: Mengye Ren, Simon Kornblith, Renjie Liao, Geoffrey Hinton

    Abstract: Forward gradient learning computes a noisy directional gradient and is a biologically plausible alternative to backprop for learning deep neural networks. However, the standard forward gradient algorithm, when applied naively, suffers from high variance when the number of parameters to be learned is large. In this paper, we propose a series of architectural and algorithmic modifications that toget…

    Submitted 1 March, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: 31 pages, ICLR 2023

  20. arXiv:2208.05592  [pdf, other]

    cs.CV cs.LG

    Patching open-vocabulary models by interpolating weights

    Authors: Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt

    Abstract: Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method tha…

    Submitted 11 October, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  21. arXiv:2207.04186  [pdf, other]

    cs.CV

    A Study on Self-Supervised Object Detection Pretraining

    Authors: Trung Dang, Simon Kornblith, Huy Thong Nguyen, Peter Chin, Maryam Khademi

    Abstract: In this work, we study different approaches to self-supervised pretraining of object detection models. We first design a general framework to learn a spatially consistent dense representation from an image, by randomly sampling and projecting boxes to each augmented view and maximizing the similarity between corresponding box features. We study existing design choices in the literature, such as bo…

    Submitted 10 August, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

  22. Interpretability of artificial neural network models in artificial Intelligence vs. neuroscience

    Authors: Kohitij Kar, Simon Kornblith, Evelina Fedorenko

    Abstract: Computationally explicit hypotheses of brain function derived from machine learning (ML)-based models have recently revolutionized neuroscience. Despite the unprecedented ability of these artificial neural networks (ANNs) to capture responses in biological neural networks (brains), and our full access to all internal model components (unlike the brain), ANNs are often referred to as black-boxes wi…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: 3 pages

    Journal ref: Nat Mach Intell 4, 1065-1067 (2022)

  23. arXiv:2205.11423  [pdf, other]

    cs.CV

    Decoder Denoising Pretraining for Semantic Segmentation

    Authors: Emmanuel Brempong Asiedu, Simon Kornblith, Ting Chen, Niki Parmar, Matthias Minderer, Mohammad Norouzi

    Abstract: Semantic segmentation labels are expensive and time consuming to acquire. Hence, pretraining is commonly used to improve the label-efficiency of segmentation models. Typically, the encoder of a segmentation model is pretrained as a classifier and the decoder is randomly initialized. Here, we argue that random initialization of the decoder can be suboptimal, especially when few labeled examples are…

    Submitted 23 May, 2022; originally announced May 2022.

    ACM Class: I.4.6; I.5.4; I.2.10

  24. arXiv:2205.09723  [pdf, other]

    cs.CV cs.AI cs.LG

    Robust and Efficient Medical Imaging with Self-Supervision

    Authors: Shekoofeh Azizi, Laura Culp, Jan Freyberg, Basil Mustafa, Sebastien Baur, Simon Kornblith, Ting Chen, Patricia MacWilliams, S. Sara Mahdavi, Ellery Wulczyn, Boris Babenko, Megan Wilson, Aaron Loh, Po-Hsuan Cameron Chen, Yuan Liu, Pinal Bavishi, Scott Mayer McKinney, Jim Winkens, Abhijit Guha Roy, Zach Beaver, Fiona Ryan, Justin Krogue, Mozziyar Etemadi, Umesh Telang, Yun Liu, et al. (9 additional authors not shown)

    Abstract: Recent progress in Medical Artificial Intelligence (AI) has delivered systems that can reach clinical expert level performance. However, such systems tend to demonstrate sub-optimal "out-of-distribution" performance when evaluated in clinical settings different from the training environment. A common mitigation strategy is to develop separate systems for each clinical setting using site-specific d…

    Submitted 3 July, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

  25. arXiv:2203.05482  [pdf, other]

    cs.LG cs.CL cs.CV

    Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

    Authors: Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

    Abstract: The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low…

    Submitted 1 July, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: ICML 2022. The last three authors contributed equally

  26. arXiv:2202.07184  [pdf, other]

    cs.LG

    On the Origins of the Block Structure Phenomenon in Neural Network Representations

    Authors: Thao Nguyen, Maithra Raghu, Simon Kornblith

    Abstract: Recent work has uncovered a striking phenomenon in large-capacity neural networks: they contain blocks of contiguous hidden layers with highly similar representations. This block structure has two seemingly contradictory properties: on the one hand, its constituent layers exhibit highly similar dominant first principal components (PCs), but on the other hand, their representations, and their commo…

    Submitted 14 February, 2022; originally announced February 2022.

  27. arXiv:2111.01754  [pdf, other]

    cs.LG

    Meta-Learning to Improve Pre-Training

    Authors: Aniruddh Raghu, Jonathan Lorraine, Simon Kornblith, Matthew McDermott, David Duvenaud

    Abstract: Pre-training (PT) followed by fine-tuning (FT) is an effective method for training neural networks, and has led to significant performance improvements in many domains. PT can incorporate various design choices such as task and data reweighting strategies, augmentation policies, and noise models, all of which can significantly impact the quality of representations learned. The hyperparameters intr…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021

  28. arXiv:2110.14739  [pdf, other]

    stat.ML cs.LG

    Generalized Shape Metrics on Neural Representations

    Authors: Alex H. Williams, Erin Kunz, Simon Kornblith, Scott W. Linderman

    Abstract: Understanding the operation of biological and artificial networks remains a difficult and important challenge. To identify general principles, researchers are increasingly interested in surveying large collections of networks that are trained on, or biologically adapted to, similar tasks. A standardized set of analysis tools is now needed to identify how network-level covariates -- such as archite…

    Submitted 12 January, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: 26 pages, 7 figures, NeurIPS 2021

  29. arXiv:2109.01903  [pdf, other]

    cs.CV cs.LG

    Robust fine-tuning of zero-shot models

    Authors: Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt

    Abstract: Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning methods substantially improve accuracy on a given target distribution, they often reduce robustness to distribution shifts. We address this tension by introducing a simple a…

    Submitted 21 June, 2022; v1 submitted 4 September, 2021; originally announced September 2021.

    Comments: CVPR 2022

  30. arXiv:2108.08810  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Do Vision Transformers See Like Convolutional Neural Networks?

    Authors: Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

    Abstract: Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This raises a central question: how are Vision Transformers solving these tasks? Are they acting like convolutional networks, or learning entirely different visual re…

    Submitted 3 March, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

  31. arXiv:2101.05224  [pdf, other]

    eess.IV cs.CV cs.LG

    Big Self-Supervised Models Advance Medical Image Classification

    Authors: Shekoofeh Azizi, Basil Mustafa, Fiona Ryan, Zachary Beaver, Jan Freyberg, Jonathan Deaton, Aaron Loh, Alan Karthikesalingam, Simon Kornblith, Ting Chen, Vivek Natarajan, Mohammad Norouzi

    Abstract: Self-supervised pretraining followed by supervised fine-tuning has seen success in image recognition, especially when labeled examples are scarce, but has received limited attention in medical image analysis. This paper studies the effectiveness of self-supervised learning as a pretraining strategy for medical image classification. We conduct experiments on two distinct tasks: dermatology skin con…

    Submitted 1 April, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

  32. arXiv:2011.11765  [pdf, other]

    cs.CV cs.LG

    Boosting Contrastive Self-Supervised Learning with False Negative Cancellation

    Authors: Tri Huynh, Simon Kornblith, Matthew R. Walter, Michael Maire, Maryam Khademi

    Abstract: Self-supervised representation learning has made significant leaps fueled by progress in contrastive learning, which seeks to learn transformations that embed positive input pairs nearby, while pushing negative pairs far apart. While positive pairs can be generated reliably (e.g., as different views of the same image), it is difficult to accurately establish negative pairs, defined as samples from…

    Submitted 2 January, 2022; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: Code is available at https://github.com/google-research/fnc

  33. arXiv:2011.03037  [pdf, other]

    cs.LG

    Teaching with Commentaries

    Authors: Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

    Abstract: Effective training of deep neural networks can be challenging, and there remain many open questions on how to best learn these models. Recently developed methods to improve neural network training examine teaching: providing learned information during the training process to improve downstream model performance. In this paper, we take steps towards extending the scope of teaching. We propose a fle…

    Submitted 11 March, 2021; v1 submitted 5 November, 2020; originally announced November 2020.

    Comments: ICLR 2021

  34. arXiv:2010.16402  [pdf, other]

    cs.CV cs.LG

    Why Do Better Loss Functions Lead to Less Transferable Features?

    Authors: Simon Kornblith, Ting Chen, Honglak Lee, Mohammad Norouzi

    Abstract: Previous work has proposed many new loss functions and regularizers that improve test accuracy on image classification tasks. However, it is not clear whether these loss functions learn better representations for downstream tasks. This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. W…

    Submitted 3 November, 2021; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2021

  35. arXiv:2010.15327  [pdf, other]

    cs.LG

    Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth

    Authors: Thao Nguyen, Maithra Raghu, Simon Kornblith

    Abstract: A key factor in the success of deep neural networks is the ability to scale models to improve performance by varying the architecture depth and width. This simple property of neural network design has resulted in highly effective architectures for a variety of tasks. Nevertheless, there is limited understanding of effects of depth and width on the learned representations. In this paper, we study t…

    Submitted 9 April, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: ICLR 2021

  36. arXiv:2006.10029  [pdf, other]

    cs.LG cs.CV stat.ML

    Big Self-Supervised Models are Strong Semi-Supervised Learners

    Authors: Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton

    Abstract: One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on Ima…

    Submitted 25 October, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: NeurIPS'2020. Code and pretrained models at https://github.com/google-research/simclr

  37. arXiv:2002.05709  [pdf, other]

    cs.LG cs.CV stat.ML

    A Simple Framework for Contrastive Learning of Visual Representations

    Authors: Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton

    Abstract: This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framewo…

    Submitted 30 June, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: ICML'2020. Code and pretrained models at https://github.com/google-research/simclr

  38. arXiv:2002.04197  [pdf, other]

    cs.LG stat.ML

    Generalised Lipschitz Regularisation Equals Distributional Robustness

    Authors: Zac Cranko, Zhan Shi, Xinhua Zhang, Richard Nock, Simon Kornblith

    Abstract: The problem of adversarial examples has highlighted the need for a theory of regularisation that is general enough to apply to exotic function classes, such as universal approximators. In response, we give a very general equality result regarding the relationship between distributional robustness and regularisation, as defined with a transportation cost uncertainty set. The theory allows us to (ti…

    Submitted 10 February, 2020; originally announced February 2020.

  39. arXiv:2002.03936  [pdf, other]

    cs.LG stat.ML

    Subclass Distillation

    Authors: Rafael Müller, Simon Kornblith, Geoffrey Hinton

    Abstract: After a large "teacher" neural network has been trained on labeled data, the probabilities that the teacher assigns to incorrect classes reveal a lot of information about the way in which the teacher generalizes. By training a small "student" model to match these probabilities, it is possible to transfer most of the generalization ability of the teacher to the student, often producing a much bette…

    Submitted 10 June, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: Under review, corrected citation spelling

  40. arXiv:2002.02959  [pdf, other]

    cs.CV cs.LG stat.ML

    Revisiting Spatial Invariance with Low-Rank Local Connectivity

    Authors: Gamaleldin F. Elsayed, Prajit Ramachandran, Jonathon Shlens, Simon Kornblith

    Abstract: Convolutional neural networks are among the most successful architectures in deep learning with this success at least partially attributable to the efficacy of spatial invariance as an inductive bias. Locally connected layers, which differ from convolutional layers only in their lack of spatial invariance, usually perform poorly in practice. However, these observations still leave open the possibi…

    Submitted 14 August, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

    Journal ref: International Conference on Machine Learning, 2020

  41. arXiv:1911.09071  [pdf, other]

    cs.CV cs.LG q-bio.NC

    The Origins and Prevalence of Texture Bias in Convolutional Neural Networks

    Authors: Katherine L. Hermann, Ting Chen, Simon Kornblith

    Abstract: Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on Im…

    Submitted 3 November, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: NeurIPS'2020

  42. arXiv:1908.07644  [pdf, other]

    cs.CV cs.LG stat.ML

    Saccader: Improving Accuracy of Hard Attention Models for Vision

    Authors: Gamaleldin F. Elsayed, Simon Kornblith, Quoc V. Le

    Abstract: Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, their decisions are difficult to interpret. One approach that offers some level of interpretability by design is \textit{hard attention}, which uses only relevant portions of the image. However, training hard attention models with only class label supervision is challengin…

    Submitted 6 December, 2019; v1 submitted 20 August, 2019; originally announced August 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  43. arXiv:1906.02629  [pdf, other]

    cs.LG stat.ML

    When Does Label Smoothing Help?

    Authors: Rafael Müller, Simon Kornblith, Geoffrey Hinton

    Abstract: The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident and label smoothing has been used in many state-of-the-art models, including image classification,…

    Submitted 10 June, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: Accepted at NeurIPS 2019, corrected mutual information formulas

  44. arXiv:1905.11940  [pdf, other]

    cs.CV cs.AI cs.LG

    Cerberus: A Multi-headed Derenderer

    Authors: Boyang Deng, Simon Kornblith, Geoffrey Hinton

    Abstract: To generalize to novel visual scenes with new viewpoints and new object poses, a visual system needs representations of the shapes of the parts of an object that are invariant to changes in viewpoint or pose. 3D graphics representations disentangle visual factors such as viewpoints and lighting from object structure in a natural way. It is possible to learn to invert the process that converts 3D g…

    Submitted 28 May, 2019; originally announced May 2019.

  45. arXiv:1905.00414  [pdf, other]

    cs.LG q-bio.NC stat.ML

    Similarity of Neural Network Representations Revisited

    Authors: Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey Hinton

    Abstract: Recent work has sought to understand the behavior of neural networks by comparing representations between layers and between different trained models. We examine methods for comparing neural network representations based on canonical correlation analysis (CCA). We show that CCA belongs to a family of statistics for measuring multivariate similarity, but that neither CCA nor any other statistic tha…

    Submitted 19 July, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  46. arXiv:1811.10725  [pdf, other]

    cs.CV

    MIST: Multiple Instance Spatial Transformer Network

    Authors: Baptiste Angles, Yuhe Jin, Simon Kornblith, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: We propose a deep network that can be trained to tackle image reconstruction and classification problems that involve detection of multiple object instances, without any supervision regarding their whereabouts. The network learns to extract the most significant top-K patches, and feeds these patches to a task-specific network -- e.g., auto-encoder or classifier -- to solve a domain specific proble…

    Submitted 4 December, 2020; v1 submitted 26 November, 2018; originally announced November 2018.

  47. arXiv:1811.07056  [pdf, other]

    cs.CV cs.LG

    Domain Adaptive Transfer Learning with Specialist Models

    Authors: Jiquan Ngiam, Daiyi Peng, Vijay Vasudevan, Simon Kornblith, Quoc V. Le, Ruoming Pang

    Abstract: Transfer learning is a widely used method to build high performing computer vision models. In this paper, we study the efficacy of transfer learning by examining how the choice of data impacts performance. We find that more pre-training data does not always help, and transfer performance depends on a judicious choice of pre-training data. These findings are important given the continued increase i…

    Submitted 11 December, 2018; v1 submitted 16 November, 2018; originally announced November 2018.

  48. arXiv:1809.01129  [pdf, ps, other]

    stat.ML cs.LG

    Lipschitz Networks and Distributional Robustness

    Authors: Zac Cranko, Simon Kornblith, Zhan Shi, Richard Nock

    Abstract: Robust risk minimisation has several advantages: it has been studied with regards to improving the generalisation properties of models and robustness to adversarial perturbation. We bound the distributionally robust risk for a model class rich enough to include deep neural networks by a regularised empirical risk involving the Lipschitz constant of the model. This allows us to interpret and quantif…

    Submitted 3 September, 2018; originally announced September 2018.

  49. arXiv:1805.08974  [pdf, other]

    cs.CV cs.LG stat.ML

    Do Better ImageNet Models Transfer Better?

    Authors: Simon Kornblith, Jonathon Shlens, Quoc V. Le

    Abstract: Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks. However, this hypothesis has never been systematically tested. Here, we compare the performance…

    Submitted 17 June, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: CVPR 2019 Oral