Search | arXiv e-print repository

Maximum Weight Entropy

Authors: Antoine de Mathelin, François Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods. It proposes a practical solution to the lack of prediction diversity observed recently for standard approaches when used out-of-distribution (Ovadia et al., 2019; Liu et al., 2021). Considering that this issue is mainly related to a lack of weight diversity, we claim that standard methods sample in "over-restricted" regions of the weight space due to the use of "over-regularization" processes, such as weight decay and zero-mean centered Gaussian priors. We propose to solve the problem by adopting the maximum entropy principle for the weight distribution, with the underlying idea of maximizing the weight diversity. Under this paradigm, the epistemic uncertainty is described by the weight distribution of maximal entropy that produces neural networks "consistent" with the training observations. Considering stochastic neural networks, a practical optimization is derived to build such a distribution, defined as a trade-off between the average empirical risk and the weight distribution entropy. We develop a novel weight parameterization for the stochastic model, based on the singular value decomposition of the neural network's hidden representations, which enables a large increase of the weight entropy for a small empirical risk penalization. We provide both theoretical and numerical results to assess the efficiency of the approach. In particular, the proposed algorithm appears among the top three methods in all configurations of an extensive out-of-distribution detection benchmark including more than thirty competitors.

Submitted 27 September, 2023; originally announced September 2023.

Comments: 60 pages, 9 figures, 6 tables

arXiv:2304.04042 [pdf, other]

Deep Anti-Regularized Ensembles provide reliable out-of-distribution uncertainty quantification

Authors: Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: We consider the problem of uncertainty quantification in high dimensional regression and classification, for which deep ensembles have proven to be promising methods. Recent observations have shown that deep ensembles often return overconfident estimates outside the training domain, which is a major limitation because shifted distributions are often encountered in real-life scenarios. The principal challenge for this problem is to solve the trade-off between increasing the diversity of the ensemble outputs and making accurate in-distribution predictions. In this work, we show that an ensemble of networks with large weights fitting the training data is likely to meet these two objectives. We derive a simple and practical approach to produce such ensembles, based on an original anti-regularization term penalizing small weights and a control process of the weight increase which maintains the in-distribution loss under an acceptable threshold. The developed approach requires neither out-of-distribution training data nor any trade-off hyper-parameter calibration. We derive a theoretical framework for this approach and show that the proposed optimization can be seen as a "water-filling" problem. Several experiments in both regression and classification settings highlight that Deep Anti-Regularized Ensembles (DARE) significantly improve uncertainty quantification outside the training domain in comparison to recent deep ensembles and out-of-distribution detection methods. All the conducted experiments are reproducible and the source code is available at https://github.com/antoinedemathelin/DARE.
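
The idea of rewarding large weights only while the in-distribution loss stays under a threshold can be illustrated on a toy linear model. The sketch below is not the authors' implementation: the abstract does not give the form of the anti-regularization term or of the control process, so the penalty (a negative mean log of squared weights), the on/off switch, the threshold `tau`, and all names are assumptions chosen for illustration. The second input feature is always zero in training, so its weight is free to grow without hurting the training loss.

```python
import statistics

def train_member(xs, ys, w_init, tau=0.01, lr=0.05, steps=2000):
    """Gradient descent on MSE for y ~ w[0]*x[0] + w[1]*x[1], minus an
    anti-regularizer that is active only while the MSE is at most tau."""
    w = list(w_init)
    n = len(xs)
    loss = float("inf")
    for _ in range(steps):
        errs = [w[0] * x[0] + w[1] * x[1] - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errs) / n
        grad = [2.0 * sum(e * x[j] for e, x in zip(errs, xs)) / n
                for j in range(2)]
        active = loss <= tau  # control process (assumed on/off form)
        for j in range(2):
            # Assumed penalty: -(1/2) * sum_j log(w_j^2); its gradient is -1/w_j.
            anti = -1.0 / w[j] if active else 0.0
            w[j] -= lr * (grad[j] + anti)
    return w, loss

# Training inputs vary only along x[0]; x[1] is always 0 in-distribution.
xs = [(k / 10.0, 0.0) for k in range(-10, 11)]
ys = [2.0 * x0 for x0, _ in xs]

# Hypothetical initial weights, one pair per ensemble member.
inits = [(0.3, 0.2), (0.3, -0.2), (-0.3, 0.4), (0.5, -0.4)]
members = [train_member(xs, ys, w0) for w0 in inits]

def spread(x):
    """Standard deviation of the ensemble predictions at input x."""
    preds = [w[0] * x[0] + w[1] * x[1] for w, _ in members]
    return statistics.pstdev(preds)

in_dist_spread = spread((0.5, 0.0))  # inside the training domain
ood_spread = spread((0.5, 1.0))      # x[1] was never seen away from 0
```

In this toy setting the members agree closely on the in-distribution point and disagree strongly on the shifted one, because the unconstrained weight has grown with a sign inherited from its initialization.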

Submitted 8 April, 2023; originally announced April 2023.

Comments: 26 pages, 9 figures

arXiv:2209.04215 [pdf, other]

Fast and Accurate Importance Weighting for Correcting Sample Bias

Authors: Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: Bias in datasets can be very detrimental for appropriate statistical estimation. In response to this problem, importance weighting methods have been developed to match any biased distribution to its corresponding target unbiased distribution. The seminal Kernel Mean Matching (KMM) method is still considered state of the art in this research field. However, one of the main drawbacks of this method is the computational burden for large datasets. Building on previous works by Huang et al. (2007) and de Mathelin et al. (2021), we derive a novel importance weighting algorithm which scales to large datasets by using a neural network to predict the instance weights. We show, on multiple public datasets and under various sample biases, that our proposed approach drastically reduces the computational time on large datasets while maintaining similar sample bias correction performance compared to other importance weighting methods. The proposed approach appears to be the only one able to give relevant reweighting in a reasonable time for large datasets with up to two million data points.
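
As a reminder of what importance weighting does, the following minimal sketch reweights a biased sample so that its weighted mean matches the target distribution. It assumes the density ratio is known in closed form (biased sample from N(0, 1), target N(1, 1)), which is exactly the quantity that KMM or a weight-predicting network would have to estimate from data; the distributions and function names are illustrative assumptions, not taken from the paper.

```python
import math
import random

def importance_weight(x):
    """Density ratio target/biased for target N(1, 1) and biased N(0, 1).

    exp(-(x - 1)^2 / 2) / exp(-x^2 / 2) simplifies to exp(x - 0.5).
    """
    return math.exp(x - 0.5)

def reweighted_mean(sample):
    """Self-normalized importance-weighted estimate of the target mean."""
    weights = [importance_weight(x) for x in sample]
    return sum(w * x for w, x in zip(weights, sample)) / sum(weights)

def demo(n=50_000, seed=0):
    """Return (naive mean, reweighted mean) of a sample drawn from N(0, 1)."""
    rng = random.Random(seed)
    biased = [rng.gauss(0.0, 1.0) for _ in range(n)]
    naive = sum(biased) / n
    return naive, reweighted_mean(biased)
```

The naive mean stays near the biased distribution's mean of 0, while the reweighted mean moves toward the target mean of 1.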

Submitted 9 September, 2022; originally announced September 2022.

Comments: 16 pages, 3 figures

arXiv:2107.03049 [pdf, other]

ADAPT : Awesome Domain Adaptation Python Toolbox

Authors: Antoine de Mathelin, Mounir Atiq, Guillaume Richard, Alejandro de la Concha, Mouad Yachouti, François Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: In this paper, we introduce the ADAPT library, an open source Python API providing the implementation of the main transfer learning and domain adaptation methods. The library is designed with a user-friendly approach to facilitate access to domain adaptation for a wide audience. ADAPT is compatible with scikit-learn and TensorFlow, and full documentation is available online at https://adapt-python.github.io/adapt/ with a substantial gallery of examples.

Submitted 1 February, 2023; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: 11 pages, 6 figures

arXiv:2103.03757 [pdf, other]

Discrepancy-Based Active Learning for Domain Adaptation

Authors: Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: The goal of the paper is to design active learning strategies which lead to domain adaptation under an assumption of Lipschitz functions. Building on previous work by Mansour et al. (2009), we adapt the concept of discrepancy distance between source and target distributions to restrict the maximization over the hypothesis class to a localized class of functions which perform accurate labeling on the source domain. We derive generalization error bounds for such active learning strategies in terms of Rademacher average and localized discrepancy for general loss functions which satisfy a regularity condition. A practical K-medoids algorithm that can address the case of large data sets is inferred from the theoretical bounds. Our numerical experiments show that the proposed algorithm is competitive against other state-of-the-art active learning techniques in the context of domain adaptation, in particular on large data sets of around one hundred thousand images.

Submitted 14 September, 2022; v1 submitted 5 March, 2021; originally announced March 2021.

Comments: 32 pages, 15 figures

arXiv:2006.08251 [pdf, other]

Adversarial Weighting for Domain Adaptation in Regression

Authors: Antoine de Mathelin, Guillaume Richard, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: We present a novel instance-based approach to handle regression tasks in the context of supervised domain adaptation under an assumption of covariate shift. The approach developed in this paper is based on the assumption that the task on the target domain can be efficiently learned by adequately reweighting the source instances during the training phase. We introduce a novel formulation of the optimization objective for domain adaptation which relies on a discrepancy distance characterizing the difference between domains according to a specific task and a class of hypotheses. To solve this problem, we develop an adversarial network algorithm which learns both the source weighting scheme and the task in one feed-forward gradient descent. We provide numerical evidence of the relevance of the method on public data sets for regression domain adaptation through reproducible experiments.

Submitted 15 September, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 8 pages, 6 figures

arXiv:1105.0562 [pdf, other]

Metamodel-based importance sampling for structural reliability analysis

Authors: V. Dubourg, F. Deheeger, B. Sudret

Abstract: Structural reliability methods aim at computing the probability of failure of systems with respect to some prescribed performance functions. In modern engineering such functions usually resort to running an expensive-to-evaluate computational model (e.g. a finite element model). In this respect, simulation methods, which may require $10^{3-6}$ runs, cannot be used directly. Surrogate models such as quadratic response surfaces, polynomial chaos expansions or kriging (which are built from a limited number of runs of the original model) are then introduced as a substitute for the original model to cope with the computational cost. In practice, however, it is almost impossible to quantify the error made by this substitution. In this paper we propose to use a kriging surrogate of the performance function as a means to build a quasi-optimal importance sampling density. The probability of failure is eventually obtained as the product of an augmented probability computed by substituting the meta-model for the original performance function and a correction term which ensures that there is no bias in the estimation even if the meta-model is not fully accurate. The approach is applied to analytical and finite element reliability problems and proves efficient up to 100 random variables.
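
The product structure described in the abstract (augmented probability times correction term) can be sketched on a one-dimensional toy problem. Everything specific here is an assumption made for illustration: the input is standard normal, the "expensive" performance function is `g(x) = 3 - x`, and the kriging surrogate is replaced by a hand-written, deliberately biased predictor with a constant predictive standard deviation. The sampling density is taken proportional to the surrogate's failure probability `pi(x)` times the input density, and is sampled by simple rejection rather than by the scheme used in the paper.

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def g(x):
    """Toy stand-in for the expensive model: failure when g(x) <= 0."""
    return 3.0 - x

def pi(x, sigma=0.5):
    """Surrogate's probability that x fails, from a biased mean prediction
    2.8 - x and an assumed constant predictive standard deviation sigma."""
    return phi(-(2.8 - x) / sigma)

def estimate(n=400_000, seed=0):
    """Return (augmented probability, correction term, their product)."""
    rng = random.Random(seed)
    pi_sum = 0.0     # sum of pi(x) over draws from the input density
    corr_sum = 0.0   # sum of 1[g(x) <= 0] / pi(x) over accepted draws
    accepted = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = pi(x)
        pi_sum += p
        # Rejection step: accepted points follow a density proportional
        # to pi(x) times the input density.
        if rng.random() < p:
            accepted += 1
            # The expensive model is only called on accepted points.
            if g(x) <= 0.0:
                corr_sum += 1.0 / p
    p_aug = pi_sum / n
    alpha = corr_sum / accepted
    return p_aug, alpha, p_aug * alpha
```

In this toy case the exact failure probability is `phi(-3.0)`. The augmented probability alone overestimates it because the surrogate is biased, and multiplying by the correction term brings the estimate back close to the exact value, up to Monte Carlo error.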

Submitted 7 May, 2011; v1 submitted 3 May, 2011; originally announced May 2011.

Comments: 20 pages, 7 figures, 2 tables. Preprint submitted to Probabilistic Engineering Mechanics

arXiv:1104.3476 [pdf, other]

Metamodel-based importance sampling for the simulation of rare events

Authors: V. Dubourg, F. Deheeger, B. Sudret

Abstract: In the field of structural reliability, the Monte-Carlo estimator is considered the reference probability estimator. However, it is still intractable for real engineering cases since it requires a high number of runs of the model. In order to reduce the number of computer experiments, many other approaches known as reliability methods have been proposed. One approach consists in replacing the original experiment by a surrogate which is much faster to evaluate. Nevertheless, it is often difficult (or even impossible) to quantify the error made by this substitution. In this paper an alternative approach is developed. It takes advantage of kriging meta-modeling and importance sampling techniques. The proposed alternative estimator is finally applied to a finite element based structural reliability analysis.

Submitted 18 April, 2011; originally announced April 2011.

Comments: 8 pages, 3 figures, 1 table. Preprint submitted to ICASP11 Mini-symposia entitled "Meta-models/surrogate models for uncertainty propagation, sensitivity and reliability analysis"

Journal ref: Proceedings of the 11th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP11). Zurich, Switzerland, August 2011

Showing 1–8 of 8 results for author: Deheeger, F