
A Primer on Variational Inference for Physics-Informed Deep Generative Modelling

Authors: Alex Glyn-Davies, Arnaud Vadeboncoeur, O. Deniz Akyildiz, Ieva Kazlauskaite, Mark Girolami

Abstract: Variational inference (VI) is a computationally efficient and scalable methodology for approximate Bayesian inference. It strikes a balance between accuracy of uncertainty quantification and practical tractability. It excels at generative modelling and inversion tasks due to its built-in Bayesian regularisation and flexibility, essential qualities for physics-related problems. The derivation of the central learning objective for VI must often be tailored to new learning tasks, where the nature of the problem dictates the conditional dependence between variables of interest, as arises in physics problems. In this paper, we provide an accessible and thorough technical introduction to VI for forward and inverse problems, guiding the reader through standard derivations of the VI framework and how it can best be realized through deep learning. We then review and unify recent literature exemplifying the creative flexibility allowed by VI. This paper is designed for a general scientific audience looking to solve physics-based problems with an emphasis on uncertainty quantification.

Submitted 10 September, 2024; originally announced September 2024.
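
The VI primer above centres on derivations of the evidence lower bound (ELBO). As a hedged illustration that is not taken from the paper, the sketch below assumes a one-dimensional conjugate Gaussian model (z ~ N(0, 1), x | z ~ N(z, 1)) and a Gaussian variational family q(z) = N(m, s^2); all function names are hypothetical. It computes the ELBO in closed form and by reparameterised Monte Carlo, so one can check that the bound equals log p(x) exactly when q is the true posterior N(x/2, 1/2) and falls below it otherwise.

```python
import math
import random

# Toy conjugate model (illustrative assumption, not from the paper):
#   prior       z ~ N(0, 1)
#   likelihood  x | z ~ N(z, 1)
# Exact posterior z | x ~ N(x/2, 1/2); exact evidence x ~ N(0, 2).

LOG_2PI = math.log(2.0 * math.pi)


def log_normal(v, mean, var):
    """Log density of N(mean, var) evaluated at v."""
    return -0.5 * (LOG_2PI + math.log(var) + (v - mean) ** 2 / var)


def elbo_closed_form(x, m, s):
    """ELBO = E_q[log p(x|z)] + E_q[log p(z)] + H[q] for q(z) = N(m, s^2)."""
    expected_loglik = -0.5 * LOG_2PI - 0.5 * ((x - m) ** 2 + s ** 2)
    expected_logprior = -0.5 * LOG_2PI - 0.5 * (m ** 2 + s ** 2)
    entropy = 0.5 * (LOG_2PI + 1.0 + 2.0 * math.log(s))
    return expected_loglik + expected_logprior + entropy


def elbo_monte_carlo(x, m, s, n_samples=50_000, seed=0):
    """Reparameterised Monte Carlo ELBO estimate: z = m + s * eps, eps ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = m + s * rng.gauss(0.0, 1.0)
        total += (log_normal(x, z, 1.0) + log_normal(z, 0.0, 1.0)
                  - log_normal(z, m, s ** 2))
    return total / n_samples


def log_evidence(x):
    """Exact log marginal likelihood log p(x) = log N(x; 0, 2)."""
    return log_normal(x, 0.0, 2.0)
```

For example, with x = 1.3, elbo_closed_form(1.3, 0.65, math.sqrt(0.5)) matches log_evidence(1.3), while elbo_closed_form(1.3, 0.0, 1.0) is smaller; the gap is the KL divergence from q to the exact posterior.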

arXiv:2408.01362 [pdf, other]

Autoencoders in Function Space

Authors: Justin Bunker, Mark Girolami, Hefin Lambley, Andrew M. Stuart, T. J. Sullivan

Abstract: Autoencoders have found widespread application, in both their original deterministic form and in their variational formulation (VAEs). In scientific applications it is often of interest to consider data that are comprised of functions; the same perspective is useful in image processing. In practice, discretisation (of differential equations arising in the sciences) or pixellation (of images) renders problems finite dimensional, but conceiving first of algorithms that operate on functions, and only then discretising or pixellating, leads to better algorithms that smoothly operate between different levels of discretisation or pixellation. In this paper function-space versions of the autoencoder (FAE) and variational autoencoder (FVAE) are introduced, analysed, and deployed. Well-definedness of the objective function governing VAEs is a subtle issue, even in finite dimension, and more so on function space. The FVAE objective is well defined whenever the data distribution is compatible with the chosen generative model; this happens, for example, when the data arise from a stochastic differential equation. The FAE objective is valid much more broadly, and can be straightforwardly applied to data governed by differential equations. Pairing these objectives with neural operator architectures, which can thus be evaluated on any mesh, enables new applications of autoencoders to inpainting, superresolution, and generative modelling of scientific data.

Submitted 2 August, 2024; originally announced August 2024.

Comments: 56 pages, 25 figures

MSC Class: 62G07 (Primary) 65M99; 68T07 (Secondary)

ACM Class: I.2.6

arXiv:2405.17955 [pdf, other]

Efficient Prior Calibration From Indirect Data

Authors: O. Deniz Akyildiz, Mark Girolami, Andrew M. Stuart, Arnaud Vadeboncoeur

Abstract: Bayesian inversion is central to the quantification of uncertainty within problems arising from numerous applications in science and engineering. To formulate the approach, four ingredients are required: a forward model mapping the unknown parameter to an element of a solution space, often the solution space for a differential equation; an observation operator mapping an element of the solution space to the data space; a noise model describing how noise pollutes the observations; and a prior model describing knowledge about the unknown parameter before the data is acquired. This paper is concerned with learning the prior model from data; in particular, learning the prior from multiple realizations of indirect data obtained through the noisy observation process. The prior is represented, using a generative model, as the pushforward of a Gaussian in a latent space; the pushforward map is learned by minimizing an appropriate loss function. A metric that is well-defined under empirical approximation is used to define the loss function for the pushforward map to make an implementable methodology. Furthermore, an efficient residual-based neural operator approximation of the forward model is proposed and it is shown that this may be learned concurrently with the pushforward map, using a bilevel optimization formulation of the problem; this use of neural operator approximation has the potential to make prior learning from indirect data more computationally efficient, especially when the observation process is expensive, non-smooth or not known.
The ideas are illustrated with the Darcy flow inverse problem of finding permeability from piezometric head measurements.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2311.17598 [pdf, other]

Improving embedding of graphs with missing data by soft manifolds

Authors: Andrea Marinoni, Pietro Lio', Alessandro Barp, Christian Jutten, Mark Girolami

Abstract: Embedding graphs in continuous spaces is a key factor in designing and developing algorithms for automatic information extraction to be applied in diverse tasks (e.g., learning, inferring, predicting). The reliability of graph embeddings directly depends on how much the geometry of the continuous space matches the graph structure. Manifolds are mathematical structures that can incorporate the graph characteristics, and in particular node distances, into their topological spaces. State-of-the-art manifold-based graph embedding algorithms take advantage of the assumption that the projection onto a tangent space of each point in the manifold (corresponding to a node in the graph) would locally resemble a Euclidean space. Although this condition helps in achieving efficient analytical solutions to the embedding problem, it does not represent an adequate set-up to work with modern real-life graphs, which are characterized by weighted connections across nodes often computed over sparse datasets with missing records. In this work, we introduce a new class of manifold, named soft manifold, that can address this situation. In particular, soft manifolds are mathematical structures with spherical symmetry where the tangent spaces to each point are hypocycloids whose shape is defined according to the velocity of information propagation across the data points. Using soft manifolds for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
Experimental results on reconstruction tasks on synthetic and real datasets show that the proposed approach enables more accurate and reliable characterization of graphs in continuous spaces than the state of the art.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.02766 [pdf, other]

Riemannian Laplace Approximation with the Fisher Metric

Authors: Hanlin Yu, Marcelo Hartmann, Bernardo Williams, Mark Girolami, Arto Klami

Abstract: Laplace's method approximates a target density with a Gaussian distribution at its mode. It is computationally efficient and asymptotically exact for Bayesian inference due to the Bernstein-von Mises theorem, but for complex targets and finite-data posteriors it is often too crude an approximation. A recent generalization of the Laplace Approximation transforms the Gaussian approximation according to a chosen Riemannian geometry providing a richer approximation family, while still retaining computational efficiency. However, as shown here, its properties depend heavily on the chosen metric; indeed, the metric adopted in previous work results in approximations that are overly narrow as well as being biased even in the limit of infinite data. We correct this shortcoming by developing the approximation family further, deriving two alternative variants that are exact in the limit of infinite data, extending the theoretical analysis of the method, and demonstrating practical improvements in a range of experiments.

Submitted 7 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

Comments: AISTATS 2024, with additional fixes and improvements
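
Laplace's method, as summarised in the abstract above, fits a Gaussian at the mode whose variance is the inverse of the negative second derivative of the log density there. A minimal one-dimensional sketch follows; it covers only the standard Euclidean method, not the Riemannian variants the paper develops, and the function name and Gamma example are illustrative assumptions.

```python
import math


def laplace_1d(grad, hess, x0, tol=1e-12, max_iter=100):
    """Laplace approximation N(mode, -1 / hess(mode)) to a 1-D density.

    grad and hess are the first and second derivatives of the (possibly
    unnormalised) log density; the mode is located by Newton's method from x0.
    """
    x = x0
    for _ in range(max_iter):
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            break
    curvature = hess(x)
    if curvature >= 0:
        raise ValueError("log density is not locally concave at the located point")
    return x, -1.0 / curvature


# Illustrative target: unnormalised Gamma(shape a, rate b) log density
# (a - 1) * log(x) - b * x.
a, b = 5.0, 2.0
mode, var = laplace_1d(lambda x: (a - 1.0) / x - b,
                       lambda x: -(a - 1.0) / x ** 2,
                       x0=1.0)
```

Here the Laplace approximation has mode 2 and variance 1, whereas the exact Gamma(5, rate 2) mean and variance are 2.5 and 1.25: a small instance of the crudeness on skewed, finite-data targets that motivates the geometric corrections.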

arXiv:2308.08305 [pdf, other]

Warped geometric information on the optimisation of Euclidean functions

Authors: Marcelo Hartmann, Bernardo Williams, Hanlin Yu, Mark Girolami, Alessandro Barp, Arto Klami

Abstract: We consider the fundamental task of optimising a real-valued function defined in a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of the probability distribution in statistical inference. We use Riemannian geometry notions to redefine the optimisation problem of a function on the Euclidean space to a Riemannian manifold with a warped metric, and then find the function's optimum along this manifold. The warped metric chosen for the search domain induces a computationally friendly metric tensor for which optimal search directions associated with geodesic curves on the manifold become easier to compute. Performing optimisation along geodesics is known to be generally infeasible, yet we show that in this specific manifold we can analytically derive Taylor approximations up to third order. In general these approximations to the geodesic curve will not lie on the manifold; however, we construct suitable retraction maps to pull them back onto the manifold. Therefore, we can efficiently optimise along the approximate geodesic curves. We cover the related theory, describe a practical optimisation algorithm and empirically evaluate it on a collection of challenging optimisation benchmarks. Our proposed algorithm, using third-order approximation of geodesics, tends to outperform standard Euclidean gradient-based counterparts in terms of number of iterations until convergence.

Submitted 18 March, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

arXiv:2305.08657 [pdf, other]

Encoding Domain Expertise into Multilevel Models for Source Location

Authors: Lawrence A. Bull, Matthew R. Jones, Elizabeth J. Cross, Andrew Duncan, Mark Girolami

Abstract: Data from populations of systems are prevalent in many industrial applications. Machines and infrastructure are increasingly instrumented with sensing systems, emitting streams of telemetry data with complex interdependencies. In practice, data-centric monitoring procedures tend to consider these assets (and respective models) as distinct -- operating in isolation and associated with independent data. In contrast, this work captures the statistical correlations and interdependencies between models of a group of systems. Utilising a Bayesian multilevel approach, the value of data can be extended, since the population can be considered as a whole, rather than constituent parts. Most interestingly, domain expertise and knowledge of the underlying physics can be encoded in the model at the system, subgroup, or population level. We present an example of acoustic emission (time-of-arrival) mapping for source location, to illustrate how multilevel models naturally lend themselves to representing aggregate systems in engineering. In particular, we focus on constraining the combined models with domain knowledge to enhance transfer learning and enable further insights at the population level.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2303.18059 [pdf, other]

Inferring networks from time series: a neural approach

Authors: Thomas Gaskin, Grigorios A. Pavliotis, Mark Girolami

Abstract: Network structures underlie the dynamics of many complex phenomena, from gene regulation and food webs to power grids and social media. Yet, as they often cannot be observed directly, their connectivities must be inferred from observations of the dynamics to which they give rise. In this work we present a powerful computational method to infer large network adjacency matrices from time series data using a neural network, in order to provide uncertainty quantification on the prediction in a manner that reflects both the degree to which the inference problem is underdetermined as well as the noise on the data. This is a feature that other approaches have hitherto lacked. We demonstrate our method's capabilities by inferring line failure locations in the British power grid from its response to a power cut, providing probability densities on each edge and allowing the use of hypothesis testing to make meaningful probabilistic statements about the location of the cut. Our method is significantly more accurate than both Markov-chain Monte Carlo sampling and least squares regression on noisy data and when the problem is underdetermined, while naturally extending to the case of non-linear dynamics, which we demonstrate by learning an entire cost matrix for a non-linear model of economic activity in Greater London. Not having been specifically engineered for network inference, this method in fact represents a general parameter estimation scheme that is applicable to any high-dimensional parameter space.

Submitted 1 November, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

MSC Class: 68T07; 49M41; 65K05; 37A50

ACM Class: G.1.6; I.2.8; G.3; J.2

arXiv:2301.11040 [pdf, other]

Random Grid Neural Processes for Parametric Partial Differential Equations

Authors: Arnaud Vadeboncoeur, Ieva Kazlauskaite, Yanni Papandreou, Fehmi Cirak, Mark Girolami, Ömer Deniz Akyildiz

Abstract: We introduce a new class of spatially stochastic physics and data informed deep latent models for parametric partial differential equations (PDEs) which operate through scalable variational neural processes. We achieve this by assigning probability measures to the spatial domain, which allows us to treat collocation grids probabilistically as random variables to be marginalised out. Adopting this spatial statistics view, we solve forward and inverse problems for parametric PDEs in a way that leads to the construction of Gaussian process models of solution fields. The implementation of these random grids poses a unique set of challenges for inverse physics informed deep learning frameworks and we propose a new architecture called Grid Invariant Convolutional Networks (GICNets) to overcome these challenges. We further show how to incorporate noisy data in a principled manner into our physics informed model to improve predictions for problems where data may be available but whose measurement location does not coincide with any fixed mesh or grid. The proposed method is tested on a nonlinear Poisson problem, Burgers' equation, and Navier-Stokes equations, and we provide extensive numerical comparisons. We demonstrate significant computational advantages over current physics informed neural learning methods for parametric PDEs while improving the predictive capabilities and flexibility of these models.

Submitted 7 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.
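
The abstract above treats collocation grids as random variables to be marginalised out. As a rough illustration of that idea only (not the GICNet architecture or the neural-process model), the sketch below assumes a 1-D Poisson problem -u'' = π² sin(πx) on (0, 1), ignores boundary conditions, and estimates the expected squared PDE residual of a candidate solution by drawing collocation points uniformly at random; the function names are hypothetical.

```python
import math
import random


def expected_squared_residual(u_xx, f, n_points=20_000, seed=0):
    """Monte Carlo estimate of E_x[(-u''(x) - f(x))^2] with x ~ Uniform(0, 1).

    Collocation points are sampled rather than fixed on a grid, so the loss
    is an average over the spatial distribution of points.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_points):
        x = rng.random()
        r = -u_xx(x) - f(x)
        total += r * r
    return total / n_points


def source(x):
    """Right-hand side of -u'' = pi^2 sin(pi x); the exact solution is u(x) = sin(pi x)."""
    return math.pi ** 2 * math.sin(math.pi * x)
```

Passing the exact second derivative u''(x) = -π² sin(πx) gives a zero loss, while the candidate u(x) = 4x(1 - x), with u'' = -8, gives an estimate close to the exact integral 64 - 32π + π⁴/2 ≈ 12.17.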

arXiv:2211.09196 [pdf, ps, other]

Sobolev Spaces, Kernels and Discrepancies over Hyperspheres

Authors: Simon Hubbert, Emilio Porcu, Chris. J. Oates, Mark Girolami

Abstract: This work provides theoretical foundations for kernel methods in the hyperspherical context. Specifically, we characterise the native spaces (reproducing kernel Hilbert spaces) and the Sobolev spaces associated with kernels defined over hyperspheres. Our results have direct consequences for kernel cubature, determining the rate of convergence of the worst case error, and expanding the applicability of cubature algorithms based on Stein's method. We first introduce a suitable characterisation of Sobolev spaces on the $d$-dimensional hypersphere embedded in $(d+1)$-dimensional Euclidean space. Our characterisation is based on the Fourier--Schoenberg sequences associated with a given kernel. Such sequences are hard (if not impossible) to compute analytically on $d$-dimensional spheres, but often feasible over Hilbert spheres. We circumvent this problem by finding a projection operator that allows Fourier mapping from Hilbert spheres into finite-dimensional hyperspheres. We illustrate our findings through some parametric families of kernels.

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2209.15609 [pdf, other]

$Φふぁい$-DVAE: Physics-Informed Dynamical Variational Autoencoders for Unstructured Data Assimilation

Authors: Alex Glyn-Davies, Connor Duffin, Ö. Deniz Akyildiz, Mark Girolami

Abstract: Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the mapping from data-space to model-space is unknown. To address these shortcomings, in this paper we develop a physics-informed dynamical variational autoencoder ($Φふぁい$-DVAE) to embed diverse data streams into time-evolving physical systems described by differential equations. Our approach combines a standard, possibly nonlinear, filter for the latent state-space model and a VAE, to assimilate the unstructured data into the latent dynamical system. Unstructured data, in our example systems, comes in the form of video data and velocity field measurements; however, the methodology is suitably generic to allow for arbitrary unknown observation operators. A variational Bayesian framework is used for the joint estimation of the encoding, latent states, and unknown system parameters. To demonstrate the method, we provide case studies with the Lorenz-63 ordinary differential equation, and the advection and Korteweg-de Vries partial differential equations. Our results, with synthetic data, show that $Φふぁい$-DVAE provides a data efficient dynamics encoding methodology which is competitive with standard approaches. Unknown parameters are recovered with uncertainty quantification, and unseen data are accurately predicted.

Submitted 24 July, 2024; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: 29 pages, 9 figures, updated version
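
Φふぁい-DVAE, as described above, couples a standard filter for the latent state-space model with a VAE for the unstructured data. The simplest such filter is the linear Kalman filter; below is a hedged scalar sketch under assumed linear-Gaussian dynamics and observation model. It is not the paper's implementation: the names and parameters are hypothetical, and in the paper's setting the filter's inputs would come from the VAE rather than being given directly (our reading of the abstract).

```python
def kalman_step(mean, var, y, a, q, h, r):
    """One predict/update cycle of a scalar Kalman filter.

    Assumed latent dynamics:  z_t = a * z_{t-1} + N(0, q)
    Assumed observation:      y_t = h * z_t + N(0, r)
    """
    # Predict: propagate the Gaussian belief through the linear dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: condition on the new observation y.
    s = h * h * var_pred + r          # innovation variance
    k = var_pred * h / s              # Kalman gain
    mean_post = mean_pred + k * (y - h * mean_pred)
    var_post = (1.0 - k * h) * var_pred
    return mean_post, var_post


def kalman_filter(ys, mean0, var0, a, q, h, r):
    """Run kalman_step over a sequence and return the filtered (mean, var) pairs."""
    out = []
    mean, var = mean0, var0
    for y in ys:
        mean, var = kalman_step(mean, var, y, a, q, h, r)
        out.append((mean, var))
    return out
```

With a = h = 1 and q = 0 the filter reduces to sequential Bayesian estimation of a constant, so the filtered variance shrinks with every observation; nonlinear dynamics would call for an extended or ensemble variant instead.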

arXiv:2209.13565 [pdf, other]

doi 10.1073/pnas.2216415120

Neural parameter calibration for large-scale multi-agent models

Authors: Thomas Gaskin, Grigorios A. Pavliotis, Mark Girolami

Abstract: Computational models have become a powerful tool in the quantitative sciences to understand the behaviour of complex systems that evolve in time. However, they often contain a potentially large number of free parameters whose values cannot be obtained from theory but need to be inferred from data. This is especially the case for models in the social sciences, economics, or computational epidemiology. Yet many current parameter estimation methods are mathematically involved and computationally slow to run. In this paper we present a computationally simple and fast method to retrieve accurate probability densities for model parameters using neural differential equations. We present a pipeline comprising multi-agent models acting as forward solvers for systems of ordinary or stochastic differential equations, and a neural network to then extract parameters from the data generated by the model. The two combined create a powerful tool that can quickly estimate densities on model parameters, even for very large systems. We demonstrate the method on synthetic time series data of the SIR model of the spread of infection, and perform an in-depth analysis of the Harris-Wilson model of economic activity on a network, representing a non-convex problem. For the latter, we apply our method both to synthetic data and to data of economic activity across Greater London. We find that our method calibrates the model orders of magnitude more accurately than a previous study of the same dataset using classical techniques, while running between 195 and 390 times faster.

Submitted 31 January, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

Report number: 120, 7

MSC Class: 68T07; 49M41; 65K05; 37A50

ACM Class: G.1.6; I.2.8; G.3; J.2

Journal ref: PNAS 2023
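
The pipeline above calibrates parameters of forward models such as the SIR model of infection spread. A hedged sketch of a deterministic SIR forward solver (explicit Euler on population fractions) is shown; it is the kind of model whose parameters beta and gamma such a method would estimate, but the integrator, step size, and defaults are assumptions, and the neural calibration step itself is not shown.

```python
def simulate_sir(beta, gamma, s0=0.99, i0=0.01, dt=0.1, n_steps=1000):
    """Forward-Euler integration of the deterministic SIR model (population fractions).

    dS/dt = -beta * S * I
    dI/dt =  beta * S * I - gamma * I
    dR/dt =  gamma * I
    Returns the list of (S, I, R) states, including the initial one.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    trajectory = [(s, i, r)]
    for _ in range(n_steps):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        # Flows move mass between compartments, so S + I + R is conserved.
        s, i, r = s - new_infections, i + new_infections - recoveries, r + recoveries
        trajectory.append((s, i, r))
    return trajectory
```

A calibration method would compare trajectories such as simulate_sir(0.3, 0.1) against observed time series to infer beta and gamma; here beta / gamma = 3 produces an epidemic that eventually reaches most of the population.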

arXiv:2209.12835 [pdf, ps, other]

Targeted Separation and Convergence with Kernel Discrepancies

Authors: Alessandro Barp, Carl-Johann Simon-Gabriel, Mark Girolami, Lester Mackey

Abstract: Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.

Submitted 6 December, 2023; v1 submitted 26 September, 2022; originally announced September 2022.
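
Kernel discrepancies of the kind studied above can be estimated directly from samples. The sketch below computes the biased (V-statistic) estimate of the squared MMD between two one-dimensional samples with a Gaussian kernel; it is plain MMD, not the kernel Stein discrepancy, and the function names and lengthscale default are assumptions.

```python
import math
import random


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian (squared-exponential) kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of MMD^2 between two 1-D samples.

    MMD^2 = E k(X, X') + E k(Y, Y') - 2 E k(X, Y), with each expectation
    replaced by an average over all sample pairs.
    """
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy
```

Two samples from the same distribution give a value near zero (exactly zero when a sample is compared with itself), while a sample shifted in mean gives a clearly larger value.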

arXiv:2208.04856 [pdf, other]

doi 10.1016/j.jcp.2023.112369

Fully probabilistic deep models for forward and inverse problems in parametric PDEs

Authors: Arnaud Vadeboncoeur, Ömer Deniz Akyildiz, Ieva Kazlauskaite, Mark Girolami, Fehmi Cirak

Abstract: We introduce a physics-driven deep latent variable model (PDDLVM) to learn simultaneously parameter-to-solution (forward) and solution-to-parameter (inverse) maps of parametric partial differential equations (PDEs). Our formulation leverages conventional PDE discretization techniques, deep neural networks, probabilistic modelling, and variational inference to assemble a coherent, fully probabilistic framework. In the posited probabilistic model, both the forward and inverse maps are approximated as Gaussian distributions with a mean and covariance parameterized by deep neural networks. The PDE residual is assumed to be an observed random vector of value zero, hence we model it as a random vector with a zero mean and a user-prescribed covariance. The model is trained to maximize the probability of observing a residual of zero (the evidence, or marginal likelihood) by maximizing the evidence lower bound (ELBO). Consequently, the proposed methodology does not require any independent PDE solves and is physics-informed at training time, allowing the real-time solution of PDE forward and inverse problems after training. The proposed framework can be easily extended to seamlessly integrate observed data to solve inverse problems and to build generative models.
We demonstrate the efficiency and robustness of our method on finite element discretized parametric PDE problems such as linear and nonlinear Poisson problems, elastic shells with complex 3D geometries, and time-dependent nonlinear and inhomogeneous PDEs using a physics-informed neural network (PINN) discretization. We achieve up to three orders of magnitude speed-up after training compared to traditional finite element method (FEM), while outputting coherent uncertainty estimates.

Submitted 14 July, 2023; v1 submitted 9 August, 2022; originally announced August 2022.
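
In PDDLVM, the PDE residual is treated as an observed random vector equal to zero, with zero mean and user-prescribed covariance. The sketch below shows only that likelihood term, for an assumed isotropic covariance sigma^2 I and a finite-difference discretisation of -u'' = f with zero boundary values. The paper uses finite element and PINN discretisations together with neural networks and the ELBO, so the finite-difference scheme and the function names here are illustrative substitutions.

```python
import math


def poisson_residual(u_interior, f, h):
    """Finite-difference residual of -u'' = f on a uniform grid with u = 0 at both ends."""
    u = [0.0] + list(u_interior) + [0.0]
    return [
        (-u[i - 1] + 2.0 * u[i] - u[i + 1]) / h ** 2 - f[i - 1]
        for i in range(1, len(u) - 1)
    ]


def residual_log_likelihood(residual, sigma):
    """log N(residual; 0, sigma^2 I): the residual is treated as an observation of zero."""
    n = len(residual)
    sq = sum(r * r for r in residual)
    return -0.5 * sq / sigma ** 2 - 0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
```

On the grid x_i = i h with h = 1/10 and f = 2, the exact solution u(x) = x(1 - x) makes the finite-difference residual vanish up to rounding, so it attains the maximum log-likelihood -(n/2) log(2π sigma^2), while the all-zero candidate has residual -2 at every node and scores lower.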

arXiv:2204.12404 [pdf, other]

doi 10.1111/mice.12901

Hierarchical Bayesian Modelling for Knowledge Transfer Across Engineering Fleets via Multitask Learning

Authors: L. A. Bull, D. Di Francesco, M. Dhada, O. Steinert, T. Lindgren, A. K. Parlikad, A. B. Duncan, M. Girolami

Abstract: A population-level analysis is proposed to address data sparsity when building predictive models for engineering infrastructure. Utilising an interpretable hierarchical Bayesian approach and operational fleet data, domain expertise is naturally encoded (and appropriately shared) between different sub-groups, representing (i) use-type, (ii) component, or (iii) operating condition. Specifically, domain expertise is exploited to constrain the model via assumptions (and prior distributions) allowing the methodology to automatically share information between similar assets, improving the survival analysis of a truck fleet and power prediction in a wind farm. In each asset management example, a set of correlated functions is learnt over the fleet, in a combined inference, to learn a population model. Parameter estimation is improved when sub-fleets share correlated information at different levels of the hierarchy. In turn, groups with incomplete data automatically borrow statistical strength from those that are data-rich. The statistical correlations enable knowledge transfer via Bayesian transfer learning, and the correlations can be inspected to inform which assets share information for which effect (i.e. parameter). Both case studies demonstrate the wide applicability to practical infrastructure monitoring, since the approach is naturally adapted between interpretable fleet models of different in situ examples.

Submitted 12 May, 2023; v1 submitted 26 April, 2022; originally announced April 2022.

Journal ref: Hierarchical Bayesian modeling for knowledge transfer across engineering fleets via multitask learning (2022) Computer-Aided Civil and Infrastructure Engineering 1-28
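The idea that "groups with incomplete data automatically borrow statistical strength from those that are data-rich" can be seen in the simplest hierarchical model. The sketch below is not the paper's model (which learns correlated functions across a fleet); it assumes a normal-normal hierarchy with known population mean `mu0`, between-group spread `tau` and noise `sigma`, all hypothetical, and shows how each group's posterior mean is a precision-weighted compromise between its own data and the population.

```python
import numpy as np


def partial_pooling_means(groups, mu0, tau, sigma):
    """Posterior means of group-level parameters theta_j in a normal-normal model.

    Hypothetical simplified model (not the paper's):
        y_ij ~ N(theta_j, sigma^2),  theta_j ~ N(mu0, tau^2),
    with mu0, tau and sigma treated as known.
    """
    means = []
    for y in groups:
        y = np.asarray(y, dtype=float)
        precision = y.size / sigma**2 + 1.0 / tau**2  # data precision + prior precision
        # Precision-weighted compromise between the group's data and the population prior.
        means.append((y.sum() / sigma**2 + mu0 / tau**2) / precision)
    return np.array(means)


# A data-rich group, a data-poor group, and a group with no data at all.
rich, sparse, empty = [1.0] * 50, [1.0] * 2, []
post = partial_pooling_means([rich, sparse, empty], mu0=0.0, tau=1.0, sigma=1.0)
print(post)  # rich stays near its sample mean; sparse shrinks toward mu0; empty equals mu0
```

Both observed groups have the same sample mean, yet the sparse group is pulled much further toward the population value; in a full hierarchical model `mu0` and `tau` would themselves be inferred from all groups, which is where the sharing comes from.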

arXiv:2203.10592 [pdf, other]

doi 10.1016/bs.host.2022.03.005

Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents

Authors: Alessandro Barp, Lancelot Da Costa, Guilherme França, Karl Friston, Mark Girolami, Michael I. Jordan, Grigorios A. Pavliotis

Abstract: In this chapter, we identify fundamental geometric structures that underlie the problems of sampling, optimisation, inference and adaptive decision-making. Based on this identification, we derive algorithms that exploit these geometric structures to solve these problems efficiently. We show that a wide range of geometric theories emerge naturally in these fields, including measure-preserving processes, information divergences, Poisson geometry, and geometric integration. Specifically, we explain how (i) leveraging the symplectic geometry of Hamiltonian systems enables us to construct (accelerated) sampling and optimisation methods, (ii) the theory of Hilbertian subspaces and Stein operators provides a general methodology to obtain robust estimators, and (iii) preserving the information geometry of decision-making yields adaptive agents that perform active inference. Throughout, we emphasise the rich connections between these fields; e.g., inference draws on sampling and optimisation, and adaptive decision-making assesses decisions by inferring their counterfactual consequences. Our exposition provides a conceptual overview of underlying ideas, rather than a technical discussion, which can be found in the references herein.

Submitted 25 July, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: 30 pages, 4 figures; 42 pages including table of contents and references

Journal ref: Handbook of Statistics, vol. 46, pp. 21--78 (2022)

arXiv:2202.00755 [pdf, other]

Lagrangian Manifold Monte Carlo on Monge Patches

Authors: Marcelo Hartmann, Mark Girolami, Arto Klami

Abstract: The efficiency of Markov Chain Monte Carlo (MCMC) depends on how the underlying geometry of the problem is taken into account. For distributions with strongly varying curvature, Riemannian metrics help in efficient exploration of the target distribution. Unfortunately, they have significant computational overhead due to, e.g., repeated inversion of the metric tensor, and current geometric MCMC methods using the Fisher information matrix to induce the manifold are in practice slow. We propose a new alternative Riemannian metric for MCMC, by embedding the target distribution into a higher-dimensional Euclidean space as a Monge patch and using the induced metric determined by direct geometric reasoning. Our metric only requires first-order gradient information, has a fast inverse and determinant, and allows reducing the computational complexity of individual iterations from cubic to quadratic in the problem dimensionality. We demonstrate how Lagrangian Monte Carlo in this metric efficiently explores the target distributions.

Submitted 1 February, 2022; originally announced February 2022.
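Why the Monge patch metric has a fast inverse and determinant can be shown directly. Embedding theta into (theta, alpha * l(theta)), with l the log target, gives the induced metric G = I + alpha^2 g g^T where g is the gradient of l; a rank-one update of the identity. The sketch below assumes this form (the scaling `alpha` is a tuning constant we introduce for illustration) and uses the Sherman-Morrison formula and the matrix determinant lemma, so no cubic-cost factorisation is needed.

```python
import numpy as np


def monge_metric(grad, alpha=1.0):
    """Metric G = I + alpha^2 g g^T induced by embedding theta -> (theta, alpha * l(theta)).

    grad is g = gradient of the log target at the current point; alpha is an assumed scaling.
    Returns G, its inverse (Sherman-Morrison) and log det G (matrix determinant lemma).
    """
    g = np.asarray(grad, dtype=float)
    eye = np.eye(g.size)
    denom = 1.0 + alpha**2 * (g @ g)
    G = eye + alpha**2 * np.outer(g, g)
    G_inv = eye - alpha**2 * np.outer(g, g) / denom  # O(d^2): no O(d^3) matrix inversion
    return G, G_inv, np.log(denom)


def apply_inverse(grad, v, alpha=1.0):
    """Compute G^{-1} v in O(d) without forming G."""
    g = np.asarray(grad, dtype=float)
    return v - alpha**2 * g * (g @ v) / (1.0 + alpha**2 * (g @ g))


rng = np.random.default_rng(0)
g, v = rng.normal(size=5), rng.normal(size=5)
G, G_inv, logdet = monge_metric(g, alpha=0.7)
print(np.allclose(G @ G_inv, np.eye(5)), np.isclose(logdet, np.linalg.slogdet(G)[1]))
```

Forming the dense inverse is quadratic in the dimension, and applying it to a vector (as a sampler's momentum updates would) can be done in linear time with `apply_inverse`.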

arXiv:2112.04388 [pdf, other]

A graph representation based on fluid diffusion model for data analysis: theoretical aspects and enhanced community detection

Authors: Andrea Marinoni, Christian Jutten, Mark Girolami

Abstract: Representing data by means of graph structures is one of the most effective approaches to extracting information in several data analysis applications. This is especially true when multimodal datasets are investigated, as records collected by means of diverse sensing strategies are taken into account and explored. Nevertheless, classic graph signal processing is based on a model for information propagation that is configured according to the heat diffusion mechanism. This system imposes several constraints and assumptions on the data properties that might not be valid for multimodal data analysis, especially when large-scale datasets collected from heterogeneous sources are considered, so that the accuracy and robustness of the outcomes might be severely jeopardized. In this paper, we introduce a novel model for graph definition based on fluid diffusion. The proposed approach improves the ability of graph-based data analysis to take into account several issues of modern data analysis in operational scenarios, so as to provide a platform for precise, versatile, and efficient understanding of the phenomena underlying the records under examination, and to fully exploit the potential provided by the diversity of the records in obtaining a thorough characterization of the data and their significance. In this work, we focus our attention on using this fluid diffusion model to drive a community detection scheme, i.e., to divide multimodal datasets into many groups according to similarity among nodes in an unsupervised fashion.
Experimental results achieved by testing real multimodal datasets in diverse application scenarios show that our method is able to strongly outperform state-of-the-art schemes for community detection in multimodal data analysis.

Submitted 17 October, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: 30 pages, 25 figures

arXiv:2111.10510 [pdf, other]

Bayesian Learning via Neural Schrödinger-Föllmer Flows

Authors: Francisco Vargas, Andrius Ovsianas, David Fernandes, Mark Girolami, Neil D. Lawrence, Nikolas Nüsken

Abstract: In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control (i.e. Schrödinger bridges). We advocate stochastic control as a finite-time and low-variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to existing VI routines in SDE-based models.

Submitted 25 October, 2022; v1 submitted 19 November, 2021; originally announced November 2021.
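The "finite time" property can be illustrated with the Schrödinger-Föllmer diffusion: started at zero and run on t in [0, 1], dX = b(X, t) dt + dW reaches the target exactly at t = 1, where b(x, t) = grad log E[f(x + sqrt(1 - t) Z)], Z ~ N(0, 1) and f is the target density divided by the standard Gaussian density. The paper parametrises this drift with a neural network; the 1-D sketch below instead swaps in a plain self-normalised Monte Carlo estimate of the drift (a different, simpler technique) and an Euler-Maruyama discretisation. The Gaussian target N(1, 0.7^2) and all step counts are illustrative assumptions.

```python
import numpy as np


def sf_sample(log_f, grad_log_f, n_particles=2000, n_steps=100, n_inner=128, seed=0):
    """Euler-Maruyama simulation of dX = b(X, t) dt + dW on [0, 1] from X_0 = 0.

    f = target density / N(0, 1) density. The drift b(x, t) = grad log E[f(x + sqrt(1-t) Z)]
    equals E[f(y) grad log f(y)] / E[f(y)] with y = x + sqrt(1-t) Z, estimated here by
    self-normalised Monte Carlo over n_inner draws of Z (1-D sketch).
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.zeros(n_particles)
    for k in range(n_steps):
        sigma = np.sqrt(1.0 - k * dt)
        y = x[:, None] + sigma * rng.standard_normal((n_particles, n_inner))
        logw = log_f(y)
        w = np.exp(logw - logw.max(axis=1, keepdims=True))  # f(y) up to a per-row constant
        drift = (w * grad_log_f(y)).sum(axis=1) / w.sum(axis=1)
        x = x + drift * dt + np.sqrt(dt) * rng.standard_normal(n_particles)
    return x


m, s = 1.0, 0.7  # illustrative Gaussian target N(m, s^2)
log_f = lambda y: -(y - m) ** 2 / (2 * s**2) + y**2 / 2  # log target - log N(0, 1), up to a constant
grad_log_f = lambda y: -(y - m) / s**2 + y
samples = sf_sample(log_f, grad_log_f)
print(samples.mean(), samples.std())  # should be close to m and s
```

Unlike SGLD, which only approaches the posterior as it runs towards stationarity, the process is stopped at t = 1; the remaining error here comes from the time discretisation and the Monte Carlo drift estimate.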

arXiv:2105.04150 [pdf, other]

doi 10.1016/j.cma.2021.114085

PeriPy -- A High Performance OpenCL Peridynamics Package

Authors: B. Boys, T. J. Dodwell, M. Hobbs, M. Girolami

Abstract: This paper presents a lightweight, open-source and high-performance Python package for solving peridynamics problems in solid mechanics. The development of this solver is motivated by the need for fast analysis tools to achieve the large number of simulations required for 'outer-loop' applications, including sensitivity analysis, uncertainty quantification and optimisation. Our Python software toolbox utilises the heterogeneous nature of OpenCL so that it can be executed on any platform with CPU or GPU cores. We illustrate the package use through a range of industrially motivated examples, which should enable other researchers to build on and extend the solver for use in their own applications. Step improvements in execution speed and functionality over existing techniques are presented. A comparison between this solver and an existing OpenCL implementation in the literature is presented, tested on benchmarks with hundreds of thousands to tens of millions of nodes. We demonstrate the scalability of the solver on the NVIDIA GeForce RTX 2080 Ti GPU, and the memory-bound limitations are analysed. In all test cases, the implementation is between 1.4 and 10.0 times faster than a similar existing GPU implementation in the literature. In particular, this improvement has been achieved by utilising local memory on the GPU.

Submitted 23 August, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: peripy.readthedocs.org
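The core computation a peridynamics solver repeats at every time step is the bond force sum over each node's horizon. The NumPy sketch below is not PeriPy's API and does not use OpenCL; it is a minimal illustration, assuming a 1-D bond-based model with constant micromodulus `c`, uniform nodal `volume`, and a dense O(N^2) neighbour search chosen for clarity rather than speed.

```python
import numpy as np


def bond_forces_1d(x, u, horizon, c, volume):
    """Net bond force density on each node of a 1-D bond-based peridynamic bar.

    x: reference node positions, u: displacements, c: micromodulus (assumed constant),
    volume: nodal volume (assumed uniform). Dense O(N^2) neighbour search for clarity.
    """
    xi = x[None, :] - x[:, None]               # reference bond x_j - x_i
    eta = (x + u)[None, :] - (x + u)[:, None]  # deformed bond y_j - y_i
    length0 = np.abs(xi)
    bonded = (length0 > 0) & (length0 <= horizon)
    safe0 = np.where(bonded, length0, 1.0)     # avoid 0/0 on the diagonal
    stretch = np.where(bonded, (np.abs(eta) - length0) / safe0, 0.0)
    # A stretched bond (s > 0) pulls node i toward node j along the deformed bond.
    pair_force = c * stretch * np.sign(eta) * volume
    return pair_force.sum(axis=1)


dx = 0.1
x = np.arange(20) * dx
horizon = 3.01 * dx  # three neighbours on each side
f0 = bond_forces_1d(x, np.zeros_like(x), horizon, c=1.0, volume=dx)
f1 = bond_forces_1d(x, 0.01 * x, horizon, c=1.0, volume=dx)  # uniform 1% stretch
print(f1)  # interior nodes balance; only nodes near the free ends feel a net force
```

In an outer-loop study this kernel is evaluated many millions of times, which is why the per-node sum is the natural unit to parallelise on a GPU.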

arXiv:2105.03682 [pdf, other]

Enhancing ensemble learning and transfer learning in multimodal data analysis by adaptive dimensionality reduction

Authors: Andrea Marinoni, Saloua Chlaily, Eduard Khachatrian, Torbjørn Eltoft, Sivasakthy Selvakumaran, Mark Girolami, Christian Jutten

Abstract: Modern data analytics take advantage of ensemble learning and transfer learning approaches to tackle some of the most relevant issues in data analysis, such as lack of labeled data to train the analysis models, sparsity of the information, and unbalanced distributions of the records. Nonetheless, when applied to multimodal datasets (i.e., datasets acquired by means of multiple sensing techniques or strategies), the state-of-the-art methods for ensemble learning and transfer learning might show some limitations. In fact, in multimodal data analysis, not all observations would show the same level of reliability or information quality, nor a homogeneous distribution of errors and uncertainties. This condition might undermine the classic assumptions that ensemble learning and transfer learning methods rely on. In this work, we propose an adaptive approach for dimensionality reduction to overcome this issue. By means of a graph theory-based approach, the most relevant features across variable-size subsets of the considered datasets are identified. This information is then used to set up ensemble learning and transfer learning architectures. We test our approach on multimodal datasets acquired in diverse research fields (remote sensing, brain-computer interfaces, photovoltaic energy). Experimental results show the validity and the robustness of our approach, which is able to outperform state-of-the-art techniques.

Submitted 8 May, 2021; originally announced May 2021.

Comments: 18 pages, 10 figures, submitted to Pattern Recognition

arXiv:2012.07751 [pdf, other]

Near Real-Time Social Distance Estimation in London

Authors: James Walsh, Oluwafunmilola Kesa, Andrew Wang, Mihai Ilas, Patrick O'Hara, Oscar Giles, Neil Dhir, Mark Girolami, Theodoros Damoulas

Abstract: During the COVID-19 pandemic, policy makers at the Greater London Authority, the regional governance body of London, UK, are reliant upon prompt and accurate data sources. Large well-defined heterogeneous compositions of activity throughout the city are sometimes difficult to acquire, yet are a necessity in order to learn 'busyness' and consequently make safe policy decisions. One component of our project within this space is to utilise existing infrastructure to estimate social distancing adherence by the general public. Our method enables near immediate sampling and contextualisation of activity and physical distancing on the streets of London via live traffic camera feeds. We introduce a framework for inspecting and improving upon existing methods, whilst also describing its active deployment on over 900 real-time feeds.

Submitted 14 August, 2022; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: Version accepted by The Computer Journal

arXiv:2011.14698 [pdf, other]

Bayesian Assessments of Aeroengine Performance with Transfer Learning

Authors: Pranay Seshadri, Andrew Duncan, George Thorne, Geoffrey Parks, Raul Vazquez Diaz, Mark Girolami

Abstract: Aeroengine performance is determined by temperature and pressure profiles along various axial stations within an engine. Given limited sensor measurements both along and between axial stations, we require a statistically principled approach to inferring these profiles. In this paper we detail a Bayesian methodology for interpolating the spatial temperature or pressure profile at axial stations within an aeroengine. The profile at any given axial station is represented as a spatial Gaussian random field on an annulus, with circumferential variations modelled using a Fourier basis and radial variations modelled with a squared exponential kernel. This Gaussian random field is extended to ingest data from multiple axial measurement planes, with the aim of transferring information across the planes. To facilitate this type of transfer learning, a novel planar covariance kernel is proposed, with hyperparameters that characterise the correlation between any two measurement planes. In the scenario where precise frequencies comprising the temperature field are unknown, we utilise a sparsity-promoting prior on the frequencies to encourage sparse representations. This easily extends to cases with multiple engine planes whilst accommodating frequency variations between the planes. The main quantity of interest, the spatial area average, is readily obtained in closed form. We term this the Bayesian area average and demonstrate how this metric offers far more precise averages than a sector area average, a widely used area-averaging approach.
Furthermore, the Bayesian area average naturally decomposes the posterior uncertainty into terms characterising insufficient sampling and sensor measurement error respectively. This too provides a significant improvement over prior standard-deviation-based uncertainty breakdowns.

Submitted 18 December, 2021; v1 submitted 30 November, 2020; originally announced November 2020.
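Why the average comes out in closed form can be seen in a reduced setting. The sketch below drops the radial squared-exponential kernel, the cross-plane transfer and the sparsity prior, and keeps only a circumferential Fourier series at a single radius with independent Gaussian priors on its coefficients (a hypothetical simplification with assumed `prior_var` and `noise_var`). Because every cos/sin term averages to zero over a full revolution, the circumferential average is just the constant coefficient, whose Gaussian posterior is available directly.

```python
import numpy as np


def fourier_design(theta, n_harmonics):
    """Columns [1, cos(k theta), sin(k theta)] for k = 1..n_harmonics."""
    cols = [np.ones_like(theta)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(k * theta), np.sin(k * theta)]
    return np.column_stack(cols)


def bayesian_circumferential_average(theta, y, n_harmonics, prior_var=100.0, noise_var=1e-4):
    """Posterior mean and variance of the circumferential average (Bayesian linear regression).

    Every cos/sin term averages to zero over [0, 2*pi), so the average is the
    coefficient of the constant column, a0.
    """
    Phi = fourier_design(theta, n_harmonics)
    precision = Phi.T @ Phi / noise_var + np.eye(Phi.shape[1]) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ (Phi.T @ y) / noise_var
    return mean[0], cov[0, 0]


theta = np.array([0.1, 0.4, 0.9, 1.3, 2.0, 2.2, 2.9, 3.5, 4.4, 5.0, 5.9])  # uneven sensor angles
y = 5.0 + 2.0 * np.cos(theta) + 0.5 * np.sin(3 * theta)  # synthetic field with true average 5.0
avg, var = bayesian_circumferential_average(theta, y, n_harmonics=3)
print(avg, var, y.mean())  # y.mean() is the naive sensor average, for comparison
```

The naive sensor average weights every probe equally regardless of where it sits, whereas the model-based estimate accounts for the uneven angular coverage and also returns a variance.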

arXiv:2011.09810 [pdf, other]

Continuous calibration of a digital twin: comparison of particle filter and Bayesian calibration approaches

Authors: Rebecca Ward, Ruchi Choudhary, Alastair Gregory, Melanie Jans-Singh, Mark Girolami

Abstract: Assimilation of continuously streamed monitored data is an essential component of a digital twin; the assimilated data are used to ensure the digital twin is a true representation of the monitored system. One way this is achieved is by calibration of simulation models, whether data-derived or physics-based, or a combination of both. Traditional manual calibration is not possible in this context; hence, new methods are required for continuous calibration. In this paper, a particle filter methodology for continuous calibration of the physics-based model element of a digital twin is presented and applied to an example of an underground farm. The methodology is applied to a synthetic problem with known calibration parameter values prior to being used in conjunction with monitored data. The proposed methodology is compared against static and sequential Bayesian calibration approaches and compares favourably in terms of determination of the distribution of parameter values and analysis run-times, both essential requirements. The methodology is shown to be potentially useful as a means to ensure continuing model fidelity.

Submitted 10 May, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

Comments: 23 pages, 19 figures

ACM Class: J.2
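Continuous calibration with a particle filter amounts to reweighting a cloud of candidate parameter values each time a new observation streams in. The sketch below is a generic bootstrap-style filter for a static parameter, not the paper's method: a hypothetical linear model `y = theta * x + noise` stands in for the physics-based simulator, and the Gaussian jitter applied after resampling (`jitter_sd`) is a common heuristic to keep particles diverse, assumed here for illustration.

```python
import numpy as np


def particle_filter_calibrate(xs, ys, n_particles=1000, noise_sd=0.5,
                              prior_mean=0.0, prior_sd=5.0, jitter_sd=0.02, seed=0):
    """Sequentially calibrate theta in y = theta * x + N(0, noise_sd^2) from streamed (x, y)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(prior_mean, prior_sd, n_particles)
    logw = np.zeros(n_particles)
    for x, y in zip(xs, ys):
        # Weight update: Gaussian log-likelihood of the new observation under each particle.
        logw += -0.5 * ((y - theta * x) / noise_sd) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / np.sum(w**2) < n_particles / 2:  # effective sample size too low
            idx = rng.choice(n_particles, size=n_particles, p=w)  # multinomial resampling
            theta = theta[idx] + rng.normal(0.0, jitter_sd, n_particles)
            logw = np.zeros(n_particles)
    w = np.exp(logw - logw.max())
    return theta, w / w.sum()


rng = np.random.default_rng(1)
xs = rng.uniform(0.0, 1.0, 200)
ys = 2.0 * xs + rng.normal(0.0, 0.5, 200)  # synthetic stream with known theta = 2.0
theta, w = particle_filter_calibrate(xs, ys)
print(np.sum(w * theta))  # posterior-mean estimate of theta
```

Each new observation costs one model evaluation per particle, which is what makes the approach suited to streamed data compared with re-running a full static calibration.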

arXiv:2001.10818 [pdf, ps, other]

Convergence Guarantees for Gaussian Process Means With Misspecified Likelihoods and Smoothness

Authors: George Wynne, François-Xavier Briol, Mark Girolami

Abstract: Gaussian processes are ubiquitous in machine learning, statistics, and applied mathematics. They provide a flexible modelling framework for approximating functions, whilst simultaneously quantifying uncertainty. However, this is only true when the model is well-specified, which is often not the case in practice. In this paper, we study the properties of Gaussian process means when the smoothness of the model and the likelihood function are misspecified. In this setting, an important theoretical question of practical relevance is how accurate the Gaussian process approximations will be given the difficulty of the problem, our model and the extent of the misspecification. The answer to this problem is particularly useful since it can inform our choice of model and experimental design. In particular, we describe how the experimental design and choice of kernel and kernel hyperparameters can be adapted to alleviate model misspecification.

Submitted 18 May, 2021; v1 submitted 29 January, 2020; originally announced January 2020.

Comments: Accepted to JMLR
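A small experiment makes the setting concrete: a GP mean computed with a kernel that assumes more smoothness than the target has, plus a small nugget `noise_var` even though the data are noise-free (a mild likelihood misspecification). The Matérn-5/2 kernel, lengthscale, nugget and kinked target below are illustrative assumptions, not choices from the paper; the sketch only shows that the misspecified GP mean can still improve as the design is refined.

```python
import numpy as np


def matern52(a, b, lengthscale=0.2):
    """Matern-5/2 kernel matrix between 1-D point sets a and b."""
    r = np.abs(a[:, None] - b[None, :]) / lengthscale
    return (1.0 + np.sqrt(5.0) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5.0) * r)


def gp_posterior_mean(x_train, y_train, x_test, kernel, noise_var=1e-6):
    """GP posterior mean k(x_test, X) (K + noise_var I)^{-1} y."""
    K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    return kernel(x_test, x_train) @ np.linalg.solve(K, y_train)


f = lambda x: np.abs(x - 0.5)  # kinked target: rougher than the Matern-5/2 prior assumes
x_test = np.linspace(0.0, 1.0, 501)
errors = {}
for n in (10, 40):
    x_train = np.linspace(0.0, 1.0, n)
    m = gp_posterior_mean(x_train, f(x_train), x_test, matern52)
    errors[n] = np.max(np.abs(m - f(x_test)))
print(errors)  # maximum error for each design size
```

Repeating this for a range of `n`, kernels and lengthscales gives an empirical view of the rates the paper analyses, and of how the design and kernel hyperparameters interact with the misspecification.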

arXiv:1906.08344 [pdf, other]

Multi-resolution Multi-task Gaussian Processes

Authors: Oliver Hamelijnck, Theodoros Damoulas, Kangrui Wang, Mark Girolami

Abstract: We consider evidence integration from potentially dependent observation processes under varying spatio-temporal sampling resolutions and noise levels. We develop a multi-resolution multi-task (MRGP) framework while allowing for both inter-task and intra-task multi-resolution and multi-fidelity. We develop shallow Gaussian Process (GP) mixtures that approximate the difficult-to-estimate joint likelihood with a composite one and deep GP constructions that naturally handle biases in the mean. By doing so, we generalize and outperform state-of-the-art GP compositions and offer information-theoretic corrections and efficient variational approximations. We demonstrate the competitiveness of MRGPs on synthetic settings and on the challenging problem of hyper-local estimation of air pollution levels across London from multiple sensing modalities operating at disparate spatio-temporal resolutions.

Submitted 5 November, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

arXiv:1906.08283 [pdf, other]

Minimum Stein Discrepancy Estimators

Authors: Alessandro Barp, Francois-Xavier Briol, Andrew B. Duncan, Mark Girolami, Lester Mackey

Abstract: When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, then derive stochastic Riemannian gradient descent algorithms for their efficient optimisation. The main strength of our methodology is its flexibility, which allows us to design estimators with desirable properties for specific models at hand by carefully selecting a Stein discrepancy. We illustrate this advantage for several challenging problems for score matching, such as non-smooth, heavy-tailed or light-tailed densities.

Submitted 5 October, 2022; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: Accepted for publication at NeurIPS 2019
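A kernel Stein discrepancy needs only the score (the gradient of the log density) of the model, so it never touches the normalising constant, which is what makes it usable when maximum likelihood is not. The sketch below computes the standard KSD, i.e. the special case of DKSD with the identity diffusion matrix, using a Gaussian RBF kernel with an assumed bandwidth `h` and a U-statistic estimator. Writing r = x - y and k = exp(-|r|^2 / (2 h^2)), the Stein kernel is u(x, y) = k [s(x).s(y) + (s(x) - s(y)).r / h^2 + d / h^2 - |r|^2 / h^4].

```python
import numpy as np


def ksd_squared(x, score, h=1.0):
    """U-statistic estimate of the squared kernel Stein discrepancy (RBF kernel, bandwidth h).

    x: (n, d) samples; score: function returning grad log p at each row of x.
    """
    n, d = x.shape
    s = score(x)                                      # (n, d)
    diff = x[:, None, :] - x[None, :, :]              # r_ij = x_i - x_j
    r2 = np.sum(diff**2, axis=-1)
    k = np.exp(-r2 / (2 * h**2))
    term1 = s @ s.T                                   # s(x_i) . s(x_j)
    term2 = np.einsum("id,ijd->ij", s, diff) / h**2   # s(x_i) . grad_y k / k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h**2  # s(x_j) . grad_x k / k
    term4 = d / h**2 - r2 / h**4                      # trace(grad_x grad_y k) / k
    u = k * (term1 + term2 + term3 + term4)
    np.fill_diagonal(u, 0.0)                          # drop i == j terms (U-statistic)
    return u.sum() / (n * (n - 1))


rng = np.random.default_rng(0)
score = lambda x: -x  # score of the standard Gaussian target N(0, I)
good = rng.standard_normal((500, 2))
bad = good + 1.0      # samples shifted away from the target
print(ksd_squared(good, score), ksd_squared(bad, score))
```

A minimum Stein discrepancy estimator would minimise this quantity over the parameters inside `score`; the paper's diffusion variants additionally insert a matrix-valued function into the Stein operator to tailor the estimator to the model.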

arXiv:1906.05944 [pdf, other]

Statistical Inference for Generative Models with Maximum Mean Discrepancy

Authors: Francois-Xavier Briol, Alessandro Barp, Andrew B. Duncan, Mark Girolami

Abstract: While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models.

Submitted 13 June, 2019; originally announced June 2019.
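A minimum-MMD estimator needs only the ability to simulate from the model. The sketch below uses the standard unbiased MMD^2 estimator with a Gaussian kernel and a hypothetical location-model simulator, and replaces the paper's natural-gradient optimiser with a plain grid search for transparency. Contaminating the data with a few gross outliers illustrates the robustness claim: the sample mean is dragged away, while the MMD estimate stays near the bulk of the data.

```python
import numpy as np


def gaussian_gram(a, b, lengthscale=1.0):
    """Gaussian kernel matrix between 1-D samples a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * lengthscale**2))


def mmd2_unbiased(x, y, lengthscale=1.0):
    """Unbiased estimate of MMD^2 between 1-D samples x and y (Gaussian kernel)."""
    kxx = gaussian_gram(x, x, lengthscale)
    kyy = gaussian_gram(y, y, lengthscale)
    kxy = gaussian_gram(x, y, lengthscale)
    n, m = len(x), len(y)
    np.fill_diagonal(kxx, 0.0)
    np.fill_diagonal(kyy, 0.0)
    return kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1)) - 2.0 * kxy.mean()


rng = np.random.default_rng(0)
# Observed data: N(1, 1) with 5% gross outliers at 20.
data = np.concatenate([rng.normal(1.0, 1.0, 380), np.full(20, 20.0)])
z = rng.standard_normal(400)        # common random numbers, reused for every theta
simulate = lambda theta: theta + z  # hypothetical location-model simulator
grid = np.linspace(-1.0, 3.0, 81)
losses = [mmd2_unbiased(data, simulate(t)) for t in grid]
theta_hat = grid[int(np.argmin(losses))]
print(theta_hat, data.mean())  # MMD estimate vs. the outlier-sensitive sample mean
```

Reusing the same noise `z` across parameter values keeps the objective smooth in theta; the kernel lengthscale is the knob the abstract refers to for trading off efficiency against robustness.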

arXiv:1812.00318 [pdf, ps, other]

Efficiency and robustness in Monte Carlo sampling of 3-D geophysical inversions with Obsidian v0.1.2: Setting up for success

Authors: Richard Scalzo, David Kohn, Hugo Olierook, Gregory Houseman, Rohitash Chandra, Mark Girolami, Sally Cripps

Abstract: The rigorous quantification of uncertainty in geophysical inversions is a challenging problem. Inversions are often ill-posed and the likelihood surface may be multi-modal; properties of any single mode become inadequate uncertainty measures, and sampling methods become inefficient for irregular posteriors or high-dimensional parameter spaces. We explore the influences of different choices made by the practitioner on the efficiency and accuracy of Bayesian geophysical inversion methods that rely on Markov chain Monte Carlo sampling to assess uncertainty, using a multi-sensor inversion of the three-dimensional structure and composition of a region in the Cooper Basin of South Australia as a case study. The inversion is performed using an updated version of the Obsidian distributed inversion software. We find that the posterior for this inversion has complex local covariance structure, hindering the efficiency of adaptive sampling methods that adjust the proposal based on the chain history. Within the context of a parallel-tempered Markov chain Monte Carlo scheme for exploring high-dimensional multi-modal posteriors, a preconditioned Crank-Nicolson proposal outperforms more conventional forms of random walk. Aspects of the problem setup, such as priors on petrophysics or on 3-D geological structure, affect the shape and separation of posterior modes, influencing sampling performance as well as the inversion results.
Use of uninformative priors on sensor noise can improve inversion results by enabling optimal weighting among multiple sensors even if noise levels are uncertain. Efficiency could be further increased by using posterior gradient information within proposals, which Obsidian does not currently support, but which could be emulated using posterior surrogates.

Submitted 1 December, 2018; originally announced December 2018.

Comments: 17 pages, 5 figures, 1 table; submitted to Geoscientific Model Development
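The preconditioned Crank-Nicolson (pCN) proposal mentioned above is short enough to state in code. Assuming a zero-mean Gaussian prior, the proposal x' = sqrt(1 - beta^2) x + beta xi with xi drawn from the prior leaves the prior invariant, so the acceptance ratio involves only the negative log-likelihood. The sketch below is a single generic chain, not Obsidian's implementation and without parallel tempering; the 1-D conjugate example is chosen so the exact posterior (mean 1.0, variance 1/3) is known.

```python
import numpy as np


def pcn_mcmc(neg_log_lik, prior_sample, x0, beta=0.5, n_iter=20000, seed=0):
    """Preconditioned Crank-Nicolson MCMC for a zero-mean Gaussian prior.

    neg_log_lik: Phi(x) = -log likelihood; prior_sample(rng): one draw from the prior.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    phi_x = neg_log_lik(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        prop = np.sqrt(1.0 - beta**2) * x + beta * prior_sample(rng)  # prior-reversible move
        phi_p = neg_log_lik(prop)
        if np.log(rng.uniform()) < phi_x - phi_p:  # accept with prob min(1, exp(Phi(x) - Phi(x')))
            x, phi_x = prop, phi_p
            accepted += 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_iter


y_obs, noise_var = 1.5, 0.5
neg_log_lik = lambda x: 0.5 * np.sum((y_obs - x) ** 2) / noise_var
prior_sample = lambda rng: rng.standard_normal(1)  # prior N(0, 1)
samples, acc = pcn_mcmc(neg_log_lik, prior_sample, x0=np.zeros(1))
post = samples[2000:, 0]  # discard burn-in
print(post.mean(), post.var(), acc)  # exact posterior: mean 1.0, variance 1/3
```

Because the prior part of the target is handled exactly by the proposal, pCN's acceptance rate does not collapse as the discretisation of the unknown field is refined, which is one reason it can beat plain random-walk proposals in high-dimensional inversions.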

arXiv:1811.10275 [pdf, ps, other]

Rejoinder for "Probabilistic Integration: A Role in Statistical Computation?"

Authors: Francois-Xavier Briol, Chris J. Oates, Mark Girolami, Michael A. Osborne, Dino Sejdinovic

Abstract: This article is the rejoinder for the paper "Probabilistic Integration: A Role in Statistical Computation?" to appear in Statistical Science with discussion. We would first like to thank the reviewers and many of our colleagues who helped shape this paper, the editor for selecting our paper for discussion, and of course all of the discussants for their thoughtful, insightful and constructive comments. In this rejoinder, we respond to some of the points raised by the discussants and comment further on the fundamental questions underlying the paper: (i) Should Bayesian ideas be used in numerical analysis?, and (ii) If so, what role should such approaches have in statistical computation?

Submitted 26 November, 2018; originally announced November 2018.

Comments: Accepted to Statistical Science

arXiv:1706.03369 [pdf, other]

On the Sampling Problem for Kernel Quadrature

Authors: Francois-Xavier Briol, Chris J. Oates, Jon Cockayne, Wilson Ye Chen, Mark Girolami

Abstract: The standard Kernel Quadrature method for numerical integration with random point sets (also called Bayesian Monte Carlo) is known to converge in root mean square error at a rate determined by the ratio $s/d$, where $s$ and $d$ encode the smoothness and dimension of the integrand. However, an empirical investigation reveals that the rate constant $C$ is highly sensitive to the distribution of the random points. In contrast to standard Monte Carlo integration, for which optimal importance sampling is well-understood, the sampling distribution that minimises $C$ for Kernel Quadrature does not admit a closed form. This paper argues that the practical choice of sampling distribution is an important open problem. One solution is considered: a novel automatic approach based on adaptive tempering and sequential Monte Carlo. Empirical results demonstrate that a dramatic reduction in integration error, of up to 4 orders of magnitude, can be achieved with the proposed method.

Submitted 11 June, 2017; originally announced June 2017.

Comments: To appear at Thirty-fourth International Conference on Machine Learning (ICML 2017)

Journal ref: Proceedings of the 34th International Conference on Machine Learning, PMLR 70:586-595, 2017
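Kernel quadrature replaces equal Monte Carlo weights with weights w = K^{-1} z, where K is the kernel matrix of the points and z_i is the integral of k(., x_i) against the target measure. The sketch below assumes a Gaussian kernel and a standard normal measure, for which z and the double integral of k are available in closed form, and also returns the posterior variance of the integral. It draws the points i.i.d. from N(0, 1) for simplicity and does not implement the paper's tempering/SMC scheme for choosing their distribution.

```python
import numpy as np


def kernel_quadrature_gauss(x, lengthscale=1.0, jitter=1e-8):
    """Kernel-quadrature weights and posterior variance for integrals against N(0, 1).

    Assumes a Gaussian (squared-exponential) kernel so the kernel mean is closed form.
    """
    l2 = lengthscale**2
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * l2)) + jitter * np.eye(len(x))
    z = np.sqrt(l2 / (l2 + 1.0)) * np.exp(-x**2 / (2.0 * (l2 + 1.0)))  # kernel mean at each x_i
    w = np.linalg.solve(K, z)                                           # quadrature weights
    var = np.sqrt(l2 / (l2 + 2.0)) - z @ w                              # posterior variance of the integral
    return w, var


rng = np.random.default_rng(0)
x = rng.standard_normal(20)                  # random point set; its distribution affects the error constant
f = lambda t: np.exp(-(t - 0.5) ** 2 / 2.0)  # test integrand with a known integral under N(0, 1)
w, var = kernel_quadrature_gauss(x)
estimate = w @ f(x)
exact = np.sqrt(0.5) * np.exp(-0.25 / 4.0)
print(estimate, exact, var)
```

Re-running with points drawn from a different distribution (for example a wider Gaussian) changes the weights and the error, which is the sensitivity the abstract describes.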

arXiv:1705.02891 [pdf, other]

Geometry and Dynamics for Markov Chain Monte Carlo

Authors: Alessandro Barp, Francois-Xavier Briol, Anthony D. Kennedy, Mark Girolami

Abstract: Markov Chain Monte Carlo methods have revolutionised mathematical computation and enabled statistical inference within many previously intractable models. In this context, Hamiltonian dynamics have been proposed as a way of building chains that explore probability densities efficiently. The method emerges from physics and geometry and these links have been extensively studied by a series of authors through the last thirty years. However, there is currently a gap between the intuitions and knowledge of users of the methodology and our deep understanding of these theoretical foundations. The aim of this review is to provide a comprehensive introduction to the geometric tools used in Hamiltonian Monte Carlo at a level accessible to statisticians, machine learners and other users of the methodology with only a basic understanding of Monte Carlo methods. This will be complemented with some discussion of the most recent advances in the field which we believe will become increasingly relevant to applied scientists.

Submitted 8 May, 2017; originally announced May 2017.

Comments: Submitted to "Annual Review of Statistics and Its Applications"
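For readers new to the method, here is a minimal sketch of Hamiltonian Monte Carlo with the standard leapfrog integrator and an identity mass matrix, i.e. the Euclidean special case rather than the geometric variants the review develops. The correlated Gaussian target, `step_size` and `n_steps` are illustrative assumptions.

```python
import numpy as np

def leapfrog(q, p, grad_log_target, step_size, n_steps):
    """Leapfrog integration of Hamiltonian dynamics for H(q, p) = -log pi(q) + |p|^2 / 2."""
    q, p = q.copy(), p.copy()
    p += 0.5 * step_size * grad_log_target(q)   # initial half step in momentum
    for _ in range(n_steps - 1):
        q += step_size * p                      # full step in position
        p += step_size * grad_log_target(q)     # full step in momentum
    q += step_size * p
    p += 0.5 * step_size * grad_log_target(q)   # final half step in momentum
    return q, p

def hmc(log_target, grad_log_target, q0, n_samples, step_size=0.2, n_steps=10, seed=0):
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples = np.empty((n_samples, q.size))
    accepted = 0
    for i in range(n_samples):
        p = rng.standard_normal(q.size)         # resample momentum from N(0, I)
        q_new, p_new = leapfrog(q, p, grad_log_target, step_size, n_steps)
        # Metropolis correction for the energy error introduced by the discretisation
        log_accept = (log_target(q_new) - 0.5 * p_new @ p_new) - (log_target(q) - 0.5 * p @ p)
        if np.log(rng.uniform()) < log_accept:
            q = q_new
            accepted += 1
        samples[i] = q
    return samples, accepted / n_samples

cov = np.array([[1.0, 0.8], [0.8, 1.0]])        # illustrative correlated Gaussian target
precision = np.linalg.inv(cov)

def log_target(q):
    return -0.5 * q @ precision @ q

def grad_log_target(q):
    return -precision @ q

samples, acceptance_rate = hmc(log_target, grad_log_target, q0=[0.0, 0.0], n_samples=5000)
print(acceptance_rate, samples.mean(axis=0), np.cov(samples.T))
```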

arXiv:1506.01326 [pdf, other]

doi 10.1098/rspa.2015.0142

Probabilistic Numerics and Uncertainty in Computations

Authors: Philipp Hennig, Michael A Osborne, Mark Girolami

Abstract: We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data has led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimisers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.

Submitted 3 June, 2015; originally announced June 2015.

Comments: Author Generated Postprint. 17 pages, 4 Figures, 1 Table
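One small worked instance of the claim that classic numerical methods can be read as probabilistic inference, under assumptions chosen for the example: Bayesian quadrature on $[0, 1]$ with a Brownian-motion prior $f \sim \mathcal{GP}(0, \min(s, t))$ (which pins $f(0) = 0$) and equispaced nodes $0.25, 0.5, 0.75, 1$. For this setup the posterior-mean weights reduce to the trapezoidal rule (0.25 at interior nodes, 0.125 at $t = 1$), and the method additionally returns a variance for the integral, here $1/192$.

```python
import numpy as np

def wiener_bq(nodes):
    """Bayesian quadrature on [0, 1] under a Brownian-motion prior f ~ GP(0, min(s, t))."""
    K = np.minimum(nodes[:, None], nodes[None, :])  # Gram matrix k(t_i, t_j) = min(t_i, t_j)
    z = nodes - nodes ** 2 / 2.0                    # kernel mean: integral_0^1 min(s, t_i) ds
    weights = np.linalg.solve(K, z)                 # posterior mean of the integral is weights @ f(nodes)
    variance = 1.0 / 3.0 - weights @ z              # prior variance of the integral is 1/3
    return weights, variance

def f(t):
    return np.sin(np.pi * t / 2.0)                  # an integrand with f(0) = 0, as the prior assumes

nodes = np.array([0.25, 0.5, 0.75, 1.0])
weights, variance = wiener_bq(nodes)
estimate = weights @ f(nodes)
print(weights, variance, estimate)
```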

arXiv:1501.03326 [pdf, ps, other]

Unbiased Bayes for Big Data: Paths of Partial Posteriors

Authors: Heiko Strathmann, Dino Sejdinovic, Mark Girolami

Abstract: Key quantities of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov Chain Monte Carlo is a fundamental tool to consistently compute these expectations via averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so-called Big Data, as all data needs to be processed in every iteration. Realising that such simulation is an unnecessarily hard problem if the goal is estimation, we construct a computationally scalable methodology that allows unbiased estimation of the required expectations without explicit simulation from the full posterior. The scheme's variance is finite by construction and straightforward to control, leading to algorithms that are provably unbiased and naturally arrive at a desired error tolerance. This is achieved at an average computational complexity that is sub-linear in the size of the dataset, and its free parameters are easy to tune. We demonstrate the utility and generality of the methodology on a range of common statistical models applied to large-scale benchmark and real-world datasets.

Submitted 9 February, 2015; v1 submitted 14 January, 2015; originally announced January 2015.

Comments: 18 pages, 10 figures
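The abstract does not spell out the construction, so the following is only a generic sketch of randomised truncation (debiasing), one standard way to turn a convergent sequence of biased approximations $\phi_0, \phi_1, \dots$ into an unbiased estimate of its limit: draw a random level $T$ and sum the increments $\phi_t - \phi_{t-1}$ up to $T$, each divided by $\Pr(T \ge t)$. In the paper's setting the $\phi_t$ would be posterior expectations computed on growing subsets of the data; here the hypothetical `phi` is simply the partial sums of $\sum_k 1/k!$, whose limit $e$ is known, and the geometric stopping probability `p_stop` is an arbitrary choice.

```python
import math
import numpy as np

def phi(t):
    """Hypothetical increasingly accurate approximations: partial sums of 1/k!, converging to e."""
    return sum(1.0 / math.factorial(k) for k in range(t + 1))

def debiased_estimate(phi, rng, p_stop=0.3):
    """One unbiased estimate of lim phi(t) via a randomly truncated telescoping sum."""
    T = rng.geometric(p_stop) - 1        # T in {0, 1, ...} with P(T >= t) = (1 - p_stop)^t
    total, prev = 0.0, 0.0
    for t in range(T + 1):
        current = phi(t)
        total += (current - prev) / (1.0 - p_stop) ** t   # reweight each increment by 1 / P(T >= t)
        prev = current
    return total

rng = np.random.default_rng(0)
estimates = np.array([debiased_estimate(phi, rng) for _ in range(20000)])
print(estimates.mean(), math.e)
```

Each individual estimate only evaluates `phi` up to a random, usually small, level; unbiasedness comes from the reweighting, and the variance stays finite here because the increments shrink faster than $\Pr(T \ge t)$.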

arXiv:1403.6678 [pdf, other]

Probabilistic Model Checking of DTMC Models of User Activity Patterns

Authors: Oana Andrei, Muffy Calder, Matthew Higgs, Mark Girolami

Abstract: Software developers cannot always anticipate how users will actually use their software as it may vary from user to user, and even from use to use for an individual user. In order to address questions raised by system developers and evaluators about software usage, we define new probabilistic models that characterise user behaviour, based on activity patterns inferred from actual logged user traces. We encode these new models in a probabilistic model checker and use probabilistic temporal logics to gain insight into software usage. We motivate and illustrate our approach by application to the logged user traces of an iOS app.

Submitted 20 March, 2014; originally announced March 2014.
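A sketch of the kind of query a probabilistic model checker answers on a discrete-time Markov chain: the probability of eventually reaching a target state, obtained by a graph search followed by a linear solve. The five-state usage model and its transition probabilities are invented for illustration; they are not the activity patterns inferred in the paper, and no model-checker API is used.

```python
import numpy as np

# Hypothetical usage model: state names and probabilities are invented for illustration
STATES = ["start", "browse", "settings", "purchase", "quit"]
P = np.array([
    [0.0, 0.7, 0.2, 0.0, 0.1],   # start
    [0.0, 0.3, 0.1, 0.2, 0.4],   # browse
    [0.0, 0.5, 0.0, 0.0, 0.5],   # settings
    [0.0, 0.0, 0.0, 1.0, 0.0],   # purchase (absorbing)
    [0.0, 0.0, 0.0, 0.0, 1.0],   # quit (absorbing)
])

def can_reach(P, target):
    """Backward graph search: states with a positive-probability path into `target`."""
    reach = target.copy()
    while True:
        new = reach | (P[:, reach].sum(axis=1) > 0)
        if (new == reach).all():
            return reach
        reach = new

def prob_eventually(P, target_index):
    """Probability, from each state, of eventually reaching the target state."""
    target = np.zeros(P.shape[0], dtype=bool)
    target[target_index] = True
    maybe = can_reach(P, target) & ~target       # can reach the target but are not in it
    x = target.astype(float)                     # 1 on the target, 0 where it is unreachable
    A = np.eye(maybe.sum()) - P[np.ix_(maybe, maybe)]
    b = P[np.ix_(maybe, target)].sum(axis=1)     # one-step probability of entering the target
    x[maybe] = np.linalg.solve(A, b)             # solves x = P_maybe x + b on the "maybe" states
    return x

probs = prob_eventually(P, STATES.index("purchase"))
print(dict(zip(STATES, probs.round(4).tolist())))
```

For this made-up chain, the probability of eventually reaching `purchase` from `start` is $16/65 \approx 0.246$.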

arXiv:1310.0740 [pdf, ps, other]

Pseudo-Marginal Bayesian Inference for Gaussian Processes

Authors: Maurizio Filippone, Mark Girolami

Abstract: The main challenges that arise when adopting Gaussian Process priors in probabilistic modeling are how to carry out exact Bayesian inference and how to account for uncertainty on model parameters when making model-based predictions on out-of-sample data. Using probit regression as an illustrative working example, this paper presents a general and effective methodology based on the pseudo-marginal approach to Markov chain Monte Carlo that efficiently addresses both of these issues. The results presented in this paper show improvements over existing sampling methods to simulate from the posterior distribution over the parameters defining the covariance function of the Gaussian Process prior. This is particularly important as it offers a powerful tool to carry out full Bayesian inference of Gaussian Process based hierarchical statistical models in general. The results also demonstrate that Monte Carlo based integration of all model parameters is actually feasible in this class of models, providing a superior quantification of uncertainty in predictions. Extensive comparisons with respect to state-of-the-art probabilistic classifiers confirm this assertion.

Submitted 7 April, 2014; v1 submitted 2 October, 2013; originally announced October 2013.

Comments: 14 pages double column
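The core of the pseudo-marginal approach is that Metropolis-Hastings still targets the exact posterior when the intractable likelihood is replaced by a non-negative unbiased estimate, provided the estimate attached to the current state is recycled rather than recomputed. The sketch below shows this on a deliberately simple toy model, not the paper's Gaussian Process probit model: $\theta \sim N(0, 10^2)$, latent $z_i \sim N(\theta, 1)$ and $y_i \mid z_i \sim N(z_i, 1)$, with the likelihood estimated by plain Monte Carlo over $z$. Because this toy is Gaussian, the exact posterior is available for checking; the number of particles, step size and chain length are illustrative assumptions.

```python
import numpy as np

def log_norm_pdf(x, mean, var):
    """Log density of N(mean, var) at x (broadcasts over arrays)."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def log_lik_estimate(theta, y, rng, n_particles=100):
    """Log of an unbiased Monte Carlo estimate of p(y | theta), integrating out the latent z."""
    z = theta + rng.standard_normal((y.size, n_particles))        # z_im ~ N(theta, 1)
    log_w = log_norm_pdf(y[:, None], z, 1.0)                      # log N(y_i; z_im, 1)
    m = log_w.max(axis=1, keepdims=True)
    per_datum = m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))  # stable log-mean-exp
    return per_datum.sum()

def log_prior(theta):
    return log_norm_pdf(theta, 0.0, 100.0)                        # theta ~ N(0, 10^2)

def pseudo_marginal_mh(y, n_iter=5000, step=0.5, seed=1):
    rng = np.random.default_rng(seed)
    theta = 0.0
    log_lik = log_lik_estimate(theta, y, rng)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal()
        log_lik_prop = log_lik_estimate(proposal, y, rng)
        log_ratio = (log_lik_prop + log_prior(proposal)) - (log_lik + log_prior(theta))
        if np.log(rng.uniform()) < log_ratio:
            # The likelihood estimate travels with the accepted state and is never refreshed
            theta, log_lik = proposal, log_lik_prop
        chain[i] = theta
    return chain

# Synthetic data: y_i = z_i + noise, so marginally y_i ~ N(theta, 2)
data_rng = np.random.default_rng(0)
y = 1.5 + data_rng.standard_normal(20) + data_rng.standard_normal(20)
samples = pseudo_marginal_mh(y)[500:]                             # discard burn-in

# Conjugate answer for comparison, available only because this toy model is Gaussian
post_prec = y.size / 2.0 + 1.0 / 100.0
post_mean = (y.sum() / 2.0) / post_prec
post_sd = post_prec ** -0.5
print(samples.mean(), post_mean, samples.std(), post_sd)
```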

arXiv:1206.4666 [pdf]

A Bayesian Approach to Approximate Joint Diagonalization of Square Matrices

Authors: Mingjun Zhong, Mark Girolami

Abstract: We present a Bayesian scheme for the approximate diagonalisation of several square matrices which are not necessarily symmetric. A Gibbs sampler is derived to simulate samples of the common eigenvectors and the eigenvalues for these matrices. Several synthetic examples are used to illustrate the performance of the proposed Gibbs sampler, and we then provide comparisons to several other joint diagonalization algorithms, which show that the Gibbs sampler achieves state-of-the-art performance on the examples considered. As a byproduct, the output of the Gibbs sampler could be used to estimate the log marginal likelihood; however, we employ the approximation based on the Bayesian information criterion (BIC), which in the synthetic examples considered correctly located the number of common eigenvectors. We then successfully applied the sampler to the source separation problem as well as the common principal component analysis and the common spatial pattern analysis problems.

Submitted 18 June, 2012; originally announced June 2012.

Comments: ICML2012
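To make the goal concrete, the sketch below shows what joint diagonalisation means in the simplest, noise-free case of two matrices that share a (not necessarily orthogonal) eigenvector matrix $V$: the eigenvectors of $A_1 A_2^{-1} = V D_1 D_2^{-1} V^{-1}$ diagonalise both. This direct eigendecomposition is a swapped-in textbook device, not the paper's Gibbs sampler, and it relies on the eigenvalue ratios of $D_1 D_2^{-1}$ being distinct; noisy matrices, or more than two of them, call for an approximate method. The matrices are randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
V = rng.standard_normal((n, n))                  # hypothetical common eigenvectors (not orthogonal)
V_inv = np.linalg.inv(V)
A1 = V @ np.diag([1.0, 2.0, 3.0, 4.0]) @ V_inv
A2 = V @ np.diag([4.0, 3.0, 2.0, 1.0]) @ V_inv   # eigenvalue ratios 0.25, 2/3, 1.5, 4 are distinct

def exact_joint_diagonaliser(A1, A2):
    """Eigenvectors of A1 A2^{-1} recover V up to column scaling and ordering."""
    _, W = np.linalg.eig(A1 @ np.linalg.inv(A2))
    return W

def off_diagonal_ratio(A, W):
    """Relative size of the off-diagonal part of W^{-1} A W (0 means exactly diagonalised)."""
    B = np.linalg.inv(W) @ A @ W
    return np.linalg.norm(B - np.diag(np.diag(B))) / np.linalg.norm(B)

W = exact_joint_diagonaliser(A1, A2)
print(off_diagonal_ratio(A1, W), off_diagonal_ratio(A2, W))
```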

Showing 1–37 of 37 results for author: Girolami, M