Search | arXiv e-print repository

Showing 1–22 of 22 results for author: Achille, A

Searching in archive stat.
  1. arXiv:2308.12221  [pdf, other]

    cs.LG cs.AI q-bio.NC stat.ML

    Critical Learning Periods Emerge Even in Deep Linear Networks

    Authors: Michael Kleinman, Alessandro Achille, Stefano Soatto

    Abstract: Critical learning periods are periods early in development where temporary sensory deficits can have a permanent effect on behavior and learned representations. Despite the radical differences between biological and artificial networks, critical learning periods have been empirically observed in both systems. This suggests that critical periods may be fundamental to learning and not an accident of…

    Submitted 24 May, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: ICLR 2024 (Spotlight)

  2. arXiv:2203.16701  [pdf, other]

    cs.LG cs.CR stat.ML

    Towards Differential Relational Privacy and its use in Question Answering

    Authors: Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto

    Abstract: Memorization of the relation between entities in a dataset can lead to privacy issues when using a trained model for question answering. We introduce Relational Memorization (RM) to understand, quantify and control this phenomenon. While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning. The difference…

    Submitted 30 March, 2022; originally announced March 2022.

  3. arXiv:2202.12457  [pdf, other]

    cs.LG eess.SY stat.ML

    Stacked Residuals of Dynamic Layers for Time Series Anomaly Detection

    Authors: L. Zancato, A. Achille, G. Paolini, A. Chiuso, S. Soatto

    Abstract: We present an end-to-end differentiable neural network architecture to perform anomaly detection in multivariate time series by incorporating a Sequential Probability Ratio Test on the prediction residual. The architecture is a cascade of dynamical systems designed to separate linearly predictable components of the signal such as trends and seasonality, from the non-linear ones. The former are mod…

    Submitted 24 February, 2022; originally announced February 2022.

  4. arXiv:2101.06640  [pdf, other]

    cs.LG stat.ML

    Estimating informativeness of samples with Smooth Unique Information

    Authors: Hrayr Harutyunyan, Alessandro Achille, Giovanni Paolini, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

    Abstract: We define a notion of information that an individual sample provides to the training of a neural network, and we specialize it to measure both how much a sample informs the final weights and how much it informs the function computed by the weights. Though related, we show that these quantities have a qualitatively different behavior. We give efficient approximations of these quantities using a lin…

    Submitted 28 March, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

    Comments: ICLR 2021, 22 pages

  5. arXiv:2012.11140  [pdf, other]

    cs.LG cs.CV stat.ML

    LQF: Linear Quadratic Fine-Tuning

    Authors: Alessandro Achille, Aditya Golatkar, Avinash Ravichandran, Marzia Polito, Stefano Soatto

    Abstract: Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization. Such desirable properties are absent in deep neural networks (DNNs), typically trained by non-linear fine-tuning of a pre-trained model. Previous attempts to linearize DNNs have led to intere…

    Submitted 21 December, 2020; originally announced December 2020.

  6. arXiv:2010.02459  [pdf, other]

    cs.LG cs.IT stat.ML

    Usable Information and Evolution of Optimal Representations During Training

    Authors: Michael Kleinman, Alessandro Achille, Daksh Idnani, Jonathan C. Kao

    Abstract: We introduce a notion of usable information contained in the representation learned by a deep network, and use it to study how optimal representations for the task emerge during training. We show that the implicit regularization coming from training with Stochastic Gradient Descent with a high learning-rate and small batch size plays an important role in learning minimal sufficient representations…

    Submitted 28 February, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: ICLR 2021

  7. arXiv:2008.12478  [pdf, other]

    cs.LG stat.ML

    Predicting Training Time Without Training

    Authors: Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

    Abstract: We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function. To do so, we leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model. This allows us to approximate the training loss and accuracy at any point during training by…

    Submitted 28 August, 2020; originally announced August 2020.

  8. arXiv:2007.11259  [pdf, other]

    cs.LG cs.CV stat.ML

    Adversarial Training Reduces Information and Improves Transferability

    Authors: Matteo Terzi, Alessandro Achille, Marco Maggipinto, Gian Antonio Susto

    Abstract: Recent results show that features of adversarially trained networks for classification, in addition to being robust, enable desirable properties such as invertibility. The latter property may seem counter-intuitive as it is widely accepted by the community that classification models should only capture the minimal information (features) required for the task. Motivated by this discrepancy, we inve…

    Submitted 15 December, 2020; v1 submitted 22 July, 2020; originally announced July 2020.

  9. arXiv:2003.02960  [pdf, other]

    cs.LG cs.CV cs.IT stat.ML

    Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations

    Authors: Aditya Golatkar, Alessandro Achille, Stefano Soatto

    Abstract: We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon and generalizes previous methods to different readout functions and can be extended to ensure forgetting in the activations of the network. We introduce a new bound on how much information can be extracted per query about the forgotten cohort from a black-box network for whic…

    Submitted 28 October, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: ECCV 2020

  10. arXiv:2002.04162  [pdf, other]

    cs.LG cs.CV stat.ML

    Incremental Meta-Learning via Indirect Discriminant Alignment

    Authors: Qing Liu, Orchid Majumder, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

    Abstract: Majority of the modern meta-learning methods for few-shot classification tasks operate in two phases: a meta-training phase where the meta-learner learns a generic representation by solving multiple few-shot tasks sampled from a large dataset and a testing phase, where the meta-learner leverages its learnt internal representation for a specific few-shot task involving classes which were not seen d…

    Submitted 21 April, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  11. arXiv:1911.04933  [pdf, other]

    cs.LG stat.ML

    Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks

    Authors: Aditya Golatkar, Alessandro Achille, Stefano Soatto

    Abstract: We explore the problem of selectively forgetting a particular subset of the data used for training a deep neural network. While the effects of the data to be forgotten can be hidden from the output of the network, insights may still be gleaned by probing deep into its weights. We propose a method for "scrubbing" the weights clean of information about a particular set of training data. The method…

    Submitted 31 March, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: Accepted at CVPR 2020

  12. arXiv:1908.01091  [pdf, other]

    cs.LG cs.CV stat.ML

    Toward Understanding Catastrophic Forgetting in Continual Learning

    Authors: Cuong V. Nguyen, Alessandro Achille, Michael Lam, Tal Hassner, Vijay Mahadevan, Stefano Soatto

    Abstract: We study the relationship between catastrophic forgetting and properties of task sequences. In particular, given a sequence of tasks, we would like to understand which properties of this sequence influence the error rates of continual learning algorithms trained on the sequence. To this end, we propose a new procedure that makes use of recent developments in task space modeling as well as correlat…

    Submitted 2 August, 2019; originally announced August 2019.

  13. arXiv:1905.13277  [pdf, other]

    cs.LG cs.AI stat.ML

    Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence

    Authors: Aditya Golatkar, Alessandro Achille, Stefano Soatto

    Abstract: Regularization is typically understood as improving generalization by altering the landscape of local extrema to which the model eventually converges. Deep neural networks (DNNs), however, challenge this view: We show that removing regularization after an initial transient period has little effect on generalization, even if the final loss landscape is the same as if there had been no regularizatio…

    Submitted 30 May, 2019; originally announced May 2019.

  14. arXiv:1905.12213  [pdf, other]

    cs.LG cs.AI cs.IT stat.ML

    Where is the Information in a Deep Neural Network?

    Authors: Alessandro Achille, Giovanni Paolini, Stefano Soatto

    Abstract: Whatever information a deep neural network has gleaned from training data is encoded in its weights. How this information affects the response of the network to future data remains largely an open question. Indeed, even defining and measuring information entails some subtleties, since a trained network is a deterministic map, so standard information measures can be degenerate. We measure informati…

    Submitted 21 June, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Report number: UCLA-TR:190005

  15. arXiv:1904.03292  [pdf, other]

    cs.LG cs.IT stat.ML

    The Information Complexity of Learning Tasks, their Structure and their Distance

    Authors: Alessandro Achille, Giovanni Paolini, Glen Mbeng, Stefano Soatto

    Abstract: We introduce an asymmetric distance in the space of learning tasks, and a framework to compute their complexity. These concepts are foundational for the practice of transfer learning, whereby a parametric model is pre-trained for a task, and then fine-tuned for another. The framework we develop is non-asymptotic, captures the finite nature of the training dataset, and allows distinguishing learnin…

    Submitted 14 July, 2020; v1 submitted 5 April, 2019; originally announced April 2019.

    Report number: UCLA CSD180003

  16. arXiv:1902.03545  [pdf, other]

    cs.LG cs.AI stat.ML

    Task2Vec: Task Embedding for Meta-Learning

    Authors: Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona

    Abstract: We introduce a method to provide vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. Given a dataset with ground-truth labels and a loss function defined over those labels, we process images through a "probe network" and compute an embedding based on estimates of the Fisher information matrix associated with the…

    Submitted 10 February, 2019; originally announced February 2019.

  17. arXiv:1810.02440  [pdf, other]

    cs.LG cs.AI stat.ML

    Dynamics and Reachability of Learning Tasks

    Authors: Alessandro Achille, Glen Mbeng, Stefano Soatto

    Abstract: We compute the transition probability between two learning tasks, and show that it decomposes into two factors. The first depends on the geometry of the loss landscape of a model trained on each task, independent of any particular model used. This is related to an information theoretic distance function, but is insufficient to predict success in transfer learning, as nearby tasks can be unreachabl…

    Submitted 29 May, 2019; v1 submitted 4 October, 2018; originally announced October 2018.

  18. arXiv:1808.06508  [pdf, other]

    cs.LG stat.ML

    Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies

    Authors: Alessandro Achille, Tom Eccles, Loic Matthey, Christopher P. Burgess, Nick Watters, Alexander Lerchner, Irina Higgins

    Abstract: Intelligent behaviour in the real-world requires the ability to acquire new knowledge from an ongoing sequence of experiences while preserving and reusing past knowledge. We propose a novel algorithm for unsupervised representation learning from piece-wise stationary visual data: Variational Autoencoder with Shared Embeddings (VASE). Based on the Minimum Description Length principle, VASE automati…

    Submitted 20 August, 2018; originally announced August 2018.

  19. arXiv:1711.08856  [pdf, other]

    cs.LG q-bio.NC stat.ML

    Critical Learning Periods in Deep Neural Networks

    Authors: Alessandro Achille, Matteo Rovere, Stefano Soatto

    Abstract: Similar to humans and animals, deep artificial neural networks exhibit critical periods during which a temporary stimulus deficit can impair the development of a skill. The extent of the impairment depends on the onset and length of the deficit window, as in animal models, and on the size of the neural network. Deficits that do not affect low-level statistics, such as vertical flipping of the imag…

    Submitted 25 February, 2019; v1 submitted 23 November, 2017; originally announced November 2017.

    Report number: UCLA-TR-170017

  20. arXiv:1711.03321  [pdf, ps, other]

    stat.ML cs.LG

    A Separation Principle for Control in the Age of Deep Learning

    Authors: Alessandro Achille, Stefano Soatto

    Abstract: We review the problem of defining and inferring a "state" for a control system based on complex, high-dimensional, highly uncertain measurement streams such as videos. Such a state, or representation, should contain all and only the information needed for control, and discount nuisance variability in the data. It should also have finite complexity, ideally modulated depending on available resource…

    Submitted 9 November, 2017; originally announced November 2017.

  21. arXiv:1706.01350  [pdf, other]

    cs.LG cs.AI stat.ML

    Emergence of Invariance and Disentanglement in Deep Representations

    Authors: Alessandro Achille, Stefano Soatto

    Abstract: Using established principles from Statistics and Information Theory, we show that invariance to nuisance factors in a deep neural network is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then decompose the cross-entropy loss used during traini…

    Submitted 28 June, 2018; v1 submitted 5 June, 2017; originally announced June 2017.

    Comments: Deep learning, neural network, representation, flat minima, information bottleneck, overfitting, generalization, sufficiency, minimality, sensitivity, information complexity, stochastic gradient descent, regularization, total correlation, PAC-Bayes

  22. arXiv:1611.01353  [pdf, other]

    stat.ML cs.LG stat.CO

    Information Dropout: Learning Optimal Representations Through Noisy Computation

    Authors: Alessandro Achille, Stefano Soatto

    Abstract: The cross-entropy loss commonly used in deep learning is closely related to the defining properties of optimal representations, but does not enforce some of the key properties. We show that this can be solved by adding a regularization term, which is in turn related to injecting multiplicative noise in the activations of a Deep Neural Network, a special case of which is the common practice of drop…

    Submitted 12 February, 2017; v1 submitted 4 November, 2016; originally announced November 2016.