
RotRNN: Modelling Long Sequences with Rotations

Authors: Rares Dolga, Kai Biegun, Jake Cunningham, David Barber

Abstract: Linear recurrent models, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks. Despite their success, they come with a number of drawbacks, most notably their complex initialisation and normalisation schemes. In this work, we address some of these issues by proposing RotRNN -- a linear recurrent model which utilises the convenient properties of rotation matrices. We show that RotRNN provides a simple model with fewer theoretical assumptions than prior works, with a practical implementation that remains faithful to its theoretical derivation, achieving comparable scores to the LRU and SSMs on several long sequence modelling datasets.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Next Generation of Sequence Modeling Architectures Workshop at ICML 2024

arXiv:2406.07457 [pdf, other]

Estimating the Hallucination Rate of Generative AI

Authors: Andrew Jesson, Nicolas Beltran-Velez, Quentin Chu, Sweta Karlekar, Jannik Kossen, Yarin Gal, John P. Cunningham, David Blei

Abstract: This work is about estimating the hallucination rate for in-context learning (ICL) with Generative AI. In ICL, a conditional generative model (CGM) is prompted with a dataset and asked to make a prediction based on that dataset. The Bayesian interpretation of ICL assumes that the CGM is calculating a posterior predictive distribution over an unknown Bayesian model of a latent parameter and data. With this perspective, we define a \textit{hallucination} as a generated prediction that has low-probability under the true latent parameter. We develop a new method that takes an ICL problem -- that is, a CGM, a dataset, and a prediction question -- and estimates the probability that a CGM will generate a hallucination. Our method only requires generating queries and responses from the model and evaluating its response log probability. We empirically evaluate our method on synthetic regression and natural language ICL tasks using large language models.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07169 [pdf, other]

RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation

Authors: Mirgahney Mohamed, Harry Jake Cunningham, Marc P. Deisenroth, Lourdes Agapito

Abstract: Human motion generation has paramount importance in computer animation. It is a challenging generative temporal modelling task due to the vast possibilities of human motion, high human sensitivity to motion coherence and the difficulty of accurately generating fine-grained motions. Recently, diffusion methods have been proposed for human motion generation due to their high sample quality and expressiveness. However, generated sequences still suffer from motion incoherence, are limited to short durations and simpler motions, and take considerable time during inference. To address these limitations, we propose \textit{RecMoDiffuse: Recurrent Flow Diffusion}, a new recurrent diffusion formulation for temporal modelling. Unlike previous work, which applies diffusion to the whole sequence without any temporal dependency, an approach that inherently makes temporal consistency hard to achieve, our method explicitly enforces temporal constraints by means of normalizing flow models in the diffusion process and thereby extends diffusion to the temporal dimension. We demonstrate the effectiveness of RecMoDiffuse in the temporal modelling of human motion. Our experiments show that RecMoDiffuse achieves comparable results with state-of-the-art methods while generating coherent motion sequences and reducing the computational overhead in the inference stage.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 20 pages, 6 figures

arXiv:2406.04308 [pdf, other]

Approximation-Aware Bayesian Optimization

Authors: Natalie Maus, Kyurae Kim, Geoff Pleiss, David Eriksson, John P. Cunningham, Jacob R. Gardner

Abstract: High-dimensional Bayesian optimization (BO) tasks such as molecular design often require 10,000 function evaluations before obtaining meaningful results. While methods like sparse variational Gaussian processes (SVGPs) reduce computational requirements in these settings, the underlying approximations result in suboptimal data acquisitions that slow the progress of optimization. In this paper we modify SVGPs to better align with the goals of BO: targeting informed data acquisition rather than global posterior fidelity. Using the framework of utility-calibrated variational inference, we unify GP approximation and data acquisition into a joint optimization problem, thereby ensuring optimal decisions under a limited computational budget. Our approach can be used with any decision-theoretic acquisition function and is compatible with trust region methods like TuRBO. We derive efficient joint objectives for the expected improvement and knowledge gradient acquisition functions in both the standard and batch BO settings. Our approach outperforms standard SVGPs on high-dimensional benchmark tasks in control and molecular design.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03972 [pdf, ps, other]

Eigenpath traversal by Poisson-distributed phase randomisation

Authors: Joseph Cunningham, Jérémie Roland

Abstract: We present a framework for quantum computation, similar to Adiabatic Quantum Computation (AQC), that is based on the quantum Zeno effect. By performing randomised dephasing operations at intervals determined by a Poisson process, we are able to track the eigenspace associated to a particular eigenvalue. We derive a simple differential equation for the fidelity, leading to general theorems bounding the time complexity of a whole class of algorithms. We also use eigenstate filtering to optimise the scaling of the complexity in the error tolerance $\epsilon$. In many cases the bounds given by our general theorems are optimal, giving a time complexity of $O(1/\Delta_m)$ with $\Delta_m$ the minimum of the gap. This allows us to prove optimal results using very general features of problems, minimising the problem-specific insight necessary. As two applications of our framework, we obtain optimal scaling for the Grover problem (i.e.\ $O(\sqrt{N})$ where $N$ is the database size) and the Quantum Linear System Problem (i.e.\ $O(\kappa\log(1/\epsilon))$ where $\kappa$ is the condition number and $\epsilon$ the error tolerance) by direct applications of our theorems.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 19 pages

arXiv:2405.09673 [pdf, other]

LoRA Learns Less and Forgets Less

Authors: Dan Biderman, Jose Gonzalez Ortiz, Jacob Portes, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John P. Cunningham

Abstract: Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low rank perturbations to selected weight matrices. In this work, we compare the performance of LoRA and full finetuning on two target domains, programming and mathematics. We consider both the instruction finetuning ($\approx$100K prompt-response pairs) and continued pretraining ($\approx$10B unstructured tokens) data regimes. Our results show that, in most settings, LoRA substantially underperforms full finetuning. Nevertheless, LoRA exhibits a desirable form of regularization: it better maintains the base model's performance on tasks outside the target domain. We show that LoRA provides stronger regularization compared to common techniques such as weight decay and dropout; it also helps maintain more diverse generations. We show that full finetuning learns perturbations with a rank that is 10-100X greater than typical LoRA configurations, possibly explaining some of the reported gaps. We conclude by proposing best practices for finetuning with LoRA.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.01606 [pdf, other]

Improving Trainability of Variational Quantum Circuits via Regularization Strategies

Authors: Jun Zhuang, Jack Cunningham, Chaowen Guan

Abstract: In the era of noisy intermediate-scale quantum (NISQ), variational quantum circuits (VQCs) have been widely applied in various domains, advancing the superiority of quantum circuits against classic models. Similar to classic models, regular VQCs can be optimized by various gradient-based methods. However, the optimization may be initially trapped in barren plateaus or eventually entangled in saddle points during training. These gradient issues can significantly undermine the trainability of VQC. In this work, we propose a strategy that regularizes model parameters with prior knowledge of the train data and Gaussian noise diffusion. We conduct ablation studies to verify the effectiveness of our strategy across four public datasets and demonstrate that our method can improve the trainability of VQCs against the above-mentioned gradient issues.

Submitted 1 May, 2024; originally announced May 2024.

Comments: preprint, under review. TL;DR: we propose a regularization strategy to improve the trainability of VQCs

arXiv:2404.14418 [pdf, other]

Mitigating Cascading Effects in Large Adversarial Graph Environments

Authors: James D. Cunningham, Conrad S. Tucker

Abstract: A significant amount of society's infrastructure can be modeled using graph structures, from electric and communication grids, to traffic networks, to social networks. Each of these domains is also susceptible to the cascading spread of negative impacts, whether this be overloaded devices in the power grid or the reach of a social media post containing misinformation. The potential harm of a cascade is compounded when considering a malicious attack by an adversary that is intended to maximize the cascading impact. However, by exploiting knowledge of the cascading dynamics, targets with the largest cascading impact can be preemptively prioritized for defense, and the damage an adversary can inflict can be mitigated. While game theory provides tools for finding an optimal preemptive defense strategy, existing methods struggle to scale to the context of large graph environments because of the combinatorial explosion of possible actions that occurs when the attacker and defender can each choose multiple targets in the graph simultaneously. The proposed method enables a data-driven deep learning approach that uses multi-node representation learning and counterfactual data augmentation to generalize to the full combinatorial action space by training on a variety of small restricted subsets of the action space. We demonstrate through experiments that the proposed method is capable of identifying defense strategies that are less exploitable than SOTA methods for large graphs, while still being able to produce strategies near the Nash equilibrium for small-scale scenarios for which it can be computed. Moreover, the proposed method demonstrates superior prediction accuracy on a validation set of unseen cascades compared to other deep learning approaches.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 10 pages, 7 figures

arXiv:2403.02508 [pdf, other]

Collision Avoidance and Geofencing for Fixed-wing Aircraft with Control Barrier Functions

Authors: Tamas G. Molnar, Suresh K. Kannan, James Cunningham, Kyle Dunlap, Kerianne L. Hobbs, Aaron D. Ames

Abstract: Safety-critical failures often have fatal consequences in aerospace control. Control systems on aircraft, therefore, must ensure the strict satisfaction of safety constraints, preferably with formal guarantees of safe behavior. This paper establishes the safety-critical control of fixed-wing aircraft in collision avoidance and geofencing tasks. A control framework is developed wherein a run-time assurance (RTA) system modulates the nominal flight controller of the aircraft whenever necessary to prevent it from colliding with other aircraft or crossing a boundary (geofence) in space. The RTA is formulated as a safety filter using control barrier functions (CBFs) with formal guarantees of safe behavior. CBFs are constructed and compared for a nonlinear kinematic fixed-wing aircraft model. The proposed CBF-based controllers showcase the capability of safely executing simultaneous collision avoidance and geofencing, as demonstrated by simulations on the kinematic model and a high-fidelity dynamical model.

Submitted 6 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Submitted to the IEEE Transactions on Control System Technology. 13 pages, 7 figures

arXiv:2402.06169 [pdf]

Development of an updated, comprehensive food composition database for Australian-grown horticultural commodities

Authors: Eleanor Dunlop, Judy Cunningham, Paul Adorno, Shari Fatupaito, Stuart K Johnson, Lucinda J Black

Abstract: Australian agriculture supplies many horticultural commodities to domestic and international markets; however, food composition data for many commodities are outdated or unavailable. We produced an up-to-date, nationally representative dataset of up to 148 nutrients and related components in 92 Australian-grown fruit (fresh n=39, dried n=6), vegetables (n=43) and nuts (n=4) by replacing outdated data (pre-2000), confirming concentrations of important nutrients and retaining relevant existing data. Primary samples (n = 902) were purchased during peak growing season in Sydney, Melbourne and Perth between June 2021 and May 2022. While new data reflect current growing practices, varieties, climate and analytical methods, few notable differences were found between old and new data where methods of analysis are comparable. The new data will be incorporated into the Australian Food Composition Database, allowing free online access to stakeholders. The approach used could serve as a model for cost-effective updates of national food composition databases worldwide.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 34 pages, 4 tables

arXiv:2401.14581 [pdf, other]

AVELA -- A Vision for Engineering Literacy & Access: Understanding Why Technology Alone Is Not Enough

Authors: Kyle Johnson, Vicente Arroyos, Celeste Garcia, Liban Hussein, Aisha Cora, Tsewone Melaku, Jay L. Cunningham, R. Benjamin Shapiro, Vikram Iyer

Abstract: Unequal technology access for Black and Latine communities has been a persistent economic, social justice, and human rights issue despite increased technology accessibility due to advancements in consumer electronics like phones, tablets, and computers. We contextualize socio-technical access inequalities for Black and Latine urban communities and find that many students are hesitant to engage with available technologies due to a lack of engaging support systems. We present a holistic student-led STEM engagement model through AVELA - A Vision for Engineering Literacy and Access leveraging culturally responsive lessons, mentor embodied community representation, and service learning. To evaluate the model's impact after 4 years of mentoring 200+ university student instructors in teaching to 2,500+ secondary school students in 100+ classrooms, we conducted 24 semi-structured interviews with college AnonymizedOrganization members. We identify access barriers and provide principled recommendations for designing future STEM education programs.

Submitted 29 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: This is the author's version of the work. It is posted here for personal use, not for redistribution

arXiv:2401.07473 [pdf]

Vitamin K content of Australian-grown horticultural commodities

Authors: Eleanor Dunlop, Judy Cunningham, Paul Adorno, Georgios Dabos, Stuart K Johnson, Lucinda J Black

Abstract: Vitamin K is emerging as a multi-function vitamin that plays a role in bone, brain and vascular health. Vitamin K composition data remain limited globally and Australia has lacked nationally representative data for vitamin K1 (phylloquinone, PK) in horticultural commodities. Primary samples (n = 927) of 90 different Australian-grown fruit, vegetable and nut commodities were purchased in three Australian cities. We measured PK in duplicate in 95 composite samples using liquid chromatography with electrospray ionisation-tandem mass spectrometry. The greatest mean concentrations of PK were found in kale (565 ug/100 g), baby spinach (255 ug/100 g) and Brussels sprouts (195 ug/100 g). The data contribute to the global collection of vitamin K food composition data. They add to the evidence that PK concentrations vary markedly between geographic regions, supporting development of region-specific datasets for national food composition databases that do not yet contain data for vitamin K.

Submitted 15 January, 2024; originally announced January 2024.

Comments: 22 pages, 2 tables

arXiv:2306.17775 [pdf, other]

Practical and Asymptotically Exact Conditional Sampling in Diffusion Models

Authors: Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, John P. Cunningham

Abstract: Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requiring task-specific training. To this end, we introduce the Twisted Diffusion Sampler, or TDS. TDS is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. The main idea is to use twisting, an SMC technique that enjoys good computational efficiency, to incorporate heuristic approximations without compromising asymptotic exactness. We first find in simulation and on MNIST image inpainting and class-conditional generation tasks that TDS provides a computational statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. We then turn to motif-scaffolding, a core task in protein design, using a TDS extension to Riemannian diffusion models. On benchmark test cases, TDS allows flexible conditioning criteria and often outperforms the state of the art.

Submitted 30 June, 2023; originally announced June 2023.

Comments: Code: https://github.com/blt2114/twisted_diffusion_sampler

arXiv:2305.16006 [pdf, other]

doi 10.1063/5.0207929

Transport of skyrmions by surface acoustic waves

Authors: Jintao Shuai, Luis Lopez-Diaz, John E. Cunningham, Thomas A. Moore

Abstract: Magnetic skyrmions in thin films with perpendicular magnetic anisotropy are promising candidates for magnetic memory and logic devices, making the development of ways to transport skyrmions efficiently and precisely of significant interest. Here, we investigate the transport of skyrmions by surface acoustic waves (SAWs) via several modalities using micromagnetic simulations. We show skyrmion pinning sites created by standing SAWs at anti-nodes and skyrmion Hall-like motion without pinning driven by travelling SAWs. We also show how orthogonal SAWs formed by combining a longitudinal travelling SAW and a transverse standing SAW can be used for the 2D positioning of skyrmions. Our results also suggest SAWs offer a viable approach to the transport of multiple skyrmions along a multichannel racetrack.

Submitted 8 May, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Journal ref: Appl. Phys. Lett. 124, 202407 (2024)

arXiv:2304.09150 [pdf, other]

Constraints on Europa's water group torus from HST/COS observations

Authors: Lorenz Roth, H. Todd Smith, Kazuo Yoshioka, Tracy M. Becker, Aljona Blöcker, Nathaniel J. Cunningham, Nickolay Ivchenko, Kurt D. Retherford, Joachim Saur, Michael Velez, Fuminori Tsuchiya

Abstract: In-situ plasma measurements as well as remote mapping of energetic neutral atoms around Jupiter provide indirect evidence that an enhancement of neutral gas is present near the orbit of the moon Europa. Simulations suggest that such a neutral gas torus can be sustained by escape from Europa's atmosphere and consists primarily of molecular hydrogen, but the neutral gas torus has not yet been measured directly through emissions or in-situ. Here we present observations by the Cosmic Origins Spectrograph of the Hubble Space Telescope (HST/COS) from 2020 and 2021, which scanned the equatorial plane between 8 and 10 planetary radii west of Jupiter. No neutral gas emissions are detected. We derive upper limits on the emissions and compare these to modelled emissions from electron impact and resonant scattering using a Europa torus Monte Carlo model for the neutral gases. The comparison supports the previous findings that the torus is dilute and primarily consists of molecular hydrogen. A detection of sulfur ion emissions radially inward of the Europa orbit is consistent with emissions from the extended Io torus and with sulfur ion fractional abundances as previously detected.

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2304.05091 [pdf, other]

Actually Sparse Variational Gaussian Processes

Authors: Harry Jake Cunningham, Daniel Augusto de Souza, So Takao, Mark van der Wilk, Marc Peter Deisenroth

Abstract: Gaussian processes (GPs) are typically criticised for their unfavourable scaling in both computational and memory requirements. For large datasets, sparse GPs reduce these demands by conditioning on a small set of inducing variables designed to summarise the data. In practice however, for large datasets requiring many inducing variables, such as low-lengthscale spatial data, even sparse GPs can become computationally expensive, limited by the number of inducing variables one can use. In this work, we propose a new class of inter-domain variational GP, constructed by projecting a GP onto a set of compactly supported B-spline basis functions. The key benefit of our approach is that the compact support of the B-spline basis functions admits the use of sparse linear algebra to significantly speed up matrix operations and drastically reduce the memory footprint. This allows us to very efficiently model fast-varying spatial phenomena with tens of thousands of inducing variables, where previous approaches failed.

Submitted 11 April, 2023; originally announced April 2023.

Comments: 14 pages, 5 figures, published in AISTATS 2023

arXiv:2302.00704 [pdf, other]

Pathologies of Predictive Diversity in Deep Ensembles

Authors: Taiga Abe, E. Kelly Buchanan, Geoff Pleiss, John P. Cunningham

Abstract: Classic results establish that encouraging predictive diversity improves performance in ensembles of low-capacity models, e.g. through bagging or boosting. Here we demonstrate that these intuitions do not apply to high-capacity neural network ensembles (deep ensembles), and in fact the opposite is often true. In a large scale study of nearly 600 neural network classification ensembles, we examine a variety of interventions that trade off component model performance for predictive diversity. While such interventions can improve the performance of small neural network ensembles (in line with standard intuitions), they harm the performance of the large neural network ensembles most often used in practice. Surprisingly, we also find that discouraging predictive diversity is often benign in large-network ensembles, fully inverting standard intuitions. Even when diversity-promoting interventions do not sacrifice component model performance (e.g. using heterogeneous architectures and training paradigms), we observe an opportunity cost associated with pursuing increased predictive diversity. Examining over 1000 ensembles, we observe that the performance benefits of diverse architectures/training procedures are easily dwarfed by the benefits of simply using higher-capacity models, despite the fact that such higher capacity models often yield significantly less predictive diversity. Overall, our findings demonstrate that standard intuitions around predictive diversity, originally developed for low-capacity ensembles, do not directly apply to modern high-capacity deep ensembles. This work clarifies fundamental challenges to the goal of improving deep ensembles by making them more diverse, while suggesting an alternative path: simply forming ensembles from ever more powerful (and less diverse) component models.

Submitted 9 January, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

Comments: now published in Transactions on Machine Learning Research

arXiv:2301.00537 [pdf, other]

Posterior Collapse and Latent Variable Non-identifiability

Authors: Yixin Wang, David M. Blei, John P. Cunningham

Abstract: Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks.
Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.

Submitted 2 January, 2023; originally announced January 2023.

Comments: 19 pages, 4 figures; NeurIPS 2021

arXiv:2212.01265 [pdf, other]

Denoising Deep Generative Models

Authors: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Luhuan Wu, John P. Cunningham, Jesse C. Cresswell, Anthony L. Caterini

Abstract: Likelihood-based deep generative models have recently been shown to exhibit pathological behaviour under the manifold hypothesis as a consequence of using high-dimensional densities to model data with low-dimensional structure. In this paper we propose two methodologies aimed at addressing this problem. Both are based on adding Gaussian noise to the data to remove the dimensionality mismatch during training, and both provide a denoising mechanism whose goal is to sample from the model as though no noise had been added to the data. Our first approach is based on Tweedie's formula, and the second on models which take the variance of added noise as a conditional input. We show that surprisingly, while well motivated, these approaches only sporadically improve performance over not adding noise, and that other methods of addressing the dimensionality mismatch are more empirically adequate.

Submitted 4 January, 2023; v1 submitted 30 November, 2022; originally announced December 2022.

Comments: NeurIPS 2022 ICBINB workshop (spotlight)

arXiv:2209.06066 [pdf]

doi 10.1364/OE.477664

Efficient free-space to chip coupling of ultrafast sub-ps THz pulse for biomolecule fingerprint sensing

Authors: Yanbing Qiu, Kun Meng, Wanling Wang, Jing Chen, John Cunningham, Ian Robertson, Binbin Hong, Guo Ping Wang

Abstract: Ultrafast sub-ps THz pulse conveys rich distinctive spectral fingerprints related to the vibrational or rotational modes of biomolecules and can be used to resolve the time-dependent dynamics of the motions. Thus, an efficient platform for enhancing the THz light-matter interaction is strongly demanded. Waveguides, owing to their tight spatial confinement of the electromagnetic fields and the longer interaction distance, are promising platforms. However, the efficient feeding of the sub-ps THz pulse to the waveguides remains challenging due to the ultra-wide bandwidth property of the ultrafast signal. We propose a sensing chip comprised of a pair of back-to-back Vivaldi antennas and a 90° bent slotline waveguide to overcome the challenge. The effective operating bandwidth of the sensing chip ranges from 0.2 to 1.15 THz, with the free-space to chip coupling efficiency up to 50%. Over the entire band, the THz signal is 42.44 dBでしべる above the noise level with a peak of 73.40 dBでしべる. To take advantage of the efficient sensing chip, we have measured the characteristic fingerprint of αあるふぁ-lactose monohydrate, and a sharp absorption dip near 0.53 THz has been successfully observed, demonstrating the accuracy of the proposed solution. The proposed sensing chip has the advantages of efficient in-plane coupling, ultra-wide bandwidth, easy integration and fabrication, large-scale manufacturing capability, and cost-effectiveness, and can be a strong candidate for a THz light-matter interaction platform.

Submitted 13 September, 2022; originally announced September 2022.

Comments: Corresponding authors: Binbin Hong's E-mail: b.hong@szu.edu.cn; Guo Ping Wang's E-mail: gpwang@szu.edu.cn

arXiv:2209.02580 [pdf, other]

Design of the ECCE Detector for the Electron Ion Collider

Authors: J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, M. H. S. Bukhari, A. Bylinkin, R. Capobianco , et al. (259 additional authors not shown)

Abstract: The EIC Comprehensive Chromodynamics Experiment (ECCE) detector has been designed to address the full scope of the proposed Electron Ion Collider (EIC) physics program as presented by the National Academy of Science and provide a deeper understanding of the quark-gluon structure of matter. To accomplish this, the ECCE detector offers nearly acceptance and energy coverage along with excellent tracking and particle identification. The ECCE detector was designed to be built within the budget envelope set out by the EIC project while simultaneously managing cost and schedule risks. This detector concept has been selected to be the basis for the EIC project detector.

Submitted 11 May, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: 32 pages, 29 figures, 9 tables

arXiv:2208.14575 [pdf, other]

doi 10.1016/j.nima.2023.168238

Detector Requirements and Simulation Results for the EIC Exclusive, Diffractive and Tagging Physics Program using the ECCE Detector Concept

Authors: A. Bylinkin, C. T. Dean, S. Fegan, D. Gangadharan, K. Gates, S. J. D. Kay, I. Korover, W. B. Li, X. Li, R. Montgomery, D. Nguyen, G. Penman, J. R. Pybus, N. Santiesteban, R. Trotta, A. Usman, M. D. Baker, J. Frantz, D. I. Glazier, D. W. Higinbotham, T. Horn, J. Huang, G. Huber, R. Reed, J. Roche , et al. (258 additional authors not shown)

Abstract: This article presents a collection of simulation studies using the ECCE detector concept in the context of the EIC's exclusive, diffractive, and tagging physics program, which aims to further explore the rich quark-gluon structure of nucleons and nuclei. To successfully execute the program, ECCE proposed to utilize the detector system close to the beamline to ensure exclusivity and tag ion beam/fragments for a particular reaction of interest. Preliminary studies confirmed the proposed technology and design satisfy the requirements. The projected physics impact results are based on the projected detector performance from the simulation at 10 or 100 fb$^{-1}$ of integrated luminosity. Additionally, a few insights on the potential 2nd Interaction Region (IR) were also documented, which could serve as a guidepost for the future development of a second EIC detector.

Submitted 6 March, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

arXiv:2207.10893 [pdf, other]

doi 10.1016/j.nima.2023.168458

ECCE unpolarized TMD measurements

Authors: R. Seidl, A. Vladimirov, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, M. H. S. Bukhari , et al. (258 additional authors not shown)

Abstract: We performed feasibility studies for various measurements that are related to unpolarized TMD distribution and fragmentation functions. The processes studied include semi-inclusive deep inelastic scattering (SIDIS) where single hadrons (pions and kaons) were detected in addition to the scattered DIS lepton. The single hadron cross sections and multiplicities were extracted as a function of the DIS variables $x$ and $Q^2$, as well as the semi-inclusive variables $z$, which corresponds to the momentum fraction the detected hadron carries relative to the struck parton, and $P_T$, which corresponds to the transverse momentum of the detected hadron relative to the virtual photon. The expected statistical precision of such measurements is extrapolated to accumulated luminosities of 10 fb$^{-1}$ and potential systematic uncertainties are approximated given the deviations between true and reconstructed yields.

Submitted 22 July, 2022; originally announced July 2022.

Comments: 12 pages, 9 figures, to be submitted in joint ECCE proposal NIM-A volume

Report number: ecce-paper-phys-2022-09

arXiv:2207.10890 [pdf, other]

doi 10.1016/j.nima.2023.168017

ECCE Sensitivity Studies for Single Hadron Transverse Single Spin Asymmetry Measurements

Authors: R. Seidl, A. Vladimirov, D. Pitonyak, A. Prokudin, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks , et al. (260 additional authors not shown)

Abstract: We performed feasibility studies for various single transverse spin measurements that are related to the Sivers effect, transversity and the tensor charge, and the Collins fragmentation function. The processes studied include semi-inclusive deep inelastic scattering (SIDIS) where single hadrons (pions and kaons) were detected in addition to the scattered DIS lepton. The data were obtained in PYTHIA6 and GEANT4 simulated e+p collisions at 18 GeV on 275 GeV, 18 on 100, 10 on 100, and 5 on 41 that use the ECCE detector configuration. Typical DIS kinematics were selected, most notably $Q^2 > 1$ GeV$^2$, and cover the $x$ range from $10^{-4}$ to $1$. The single spin asymmetries were extracted as a function of $x$ and $Q^2$, as well as the semi-inclusive variables $z$ and $P_T$. They are obtained in azimuthal moments in combinations of the azimuthal angles of the hadron transverse momentum and transverse spin of the nucleon relative to the lepton scattering plane. The initially unpolarized Monte Carlo was re-weighted in the true kinematic variables, hadron types and parton flavors based on global fits of fixed target SIDIS experiments and $e^+e^-$ annihilation data. The expected statistical precision of such measurements is extrapolated to 10 fb$^{-1}$ and potential systematic uncertainties are approximated given the deviations between true and reconstructed yields.
The impact on the knowledge of the Sivers functions, transversity and tensor charges, and the Collins function has then been evaluated in the same phenomenological extractions as in the Yellow Report. The impact is found to be comparable to that obtained with the parameterized Yellow Report detector and shows that the ECCE detector configuration can fulfill the physics goals on these quantities.

Submitted 22 July, 2022; originally announced July 2022.

Comments: 22 pages, 22 figures, to be submitted to joint ECCE proposal NIM-A volume

Report number: ecce-paper-phys-2022-08

arXiv:2207.10632 [pdf, other]

Open Heavy Flavor Studies for the ECCE Detector at the Electron Ion Collider

Authors: X. Li, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, M. H. S. Bukhari, A. Bylinkin , et al. (262 additional authors not shown)

Abstract: The ECCE detector has been recommended as the selected reference detector for the future Electron-Ion Collider (EIC). A series of simulation studies have been carried out to validate the physics feasibility of the ECCE detector. In this paper, detailed studies of heavy flavor hadron and jet reconstruction and physics projections with the ECCE detector performance and different magnet options will be presented. The ECCE detector has enabled precise EIC heavy flavor hadron and jet measurements with a broad kinematic coverage. These proposed heavy flavor measurements will help systematically study the hadronization process in vacuum and nuclear medium especially in the underexplored kinematic region.

Submitted 23 July, 2022; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: Open heavy flavor studies with the EIC reference detector design by the ECCE consortium. 11 pages, 11 figures, to be submitted to the Nuclear Instruments and Methods A

Report number: LANL report number: LA-UR-22-27181

arXiv:2207.10356 [pdf, other]

doi 10.1016/j.nima.2022.167956

Exclusive J/$ψぷさい$ Detection and Physics with ECCE

Authors: X. Li, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, M. H. S. Bukhari, A. Bylinkin , et al. (262 additional authors not shown)

Abstract: Exclusive heavy quarkonium photoproduction is one of the most popular processes in EIC, which has a large cross section and a simple final state. Due to the gluonic nature of the exchange Pomeron, this process can be related to the gluon distributions in the nucleus. The momentum transfer dependence of this process is sensitive to the interaction sites, which provides a powerful tool to probe the spatial distribution of gluons in the nucleus. Recently the problem of the origin of hadron mass has received lots of attention in determining the anomaly contribution $M_{a}$. The trace anomaly is sensitive to the gluon condensate, and exclusive production of quarkonia such as J/$ψぷさい$ and $Υうぷしろん$ can serve as a sensitive probe to constrain it. In this paper, we present the performance of the ECCE detector for exclusive J/$ψぷさい$ detection and the capability of this process to investigate the above physics opportunities with ECCE.

Submitted 21 July, 2022; originally announced July 2022.

Comments: 11 pages, 14 figures, 1 table

arXiv:2207.10261 [pdf, other]

doi 10.1016/j.nima.2023.168276

Search for $e\toτたう$ Charged Lepton Flavor Violation at the EIC with the ECCE Detector

Authors: J. -L. Zhang, S. Mantry, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, M. H. S. Bukhari , et al. (262 additional authors not shown)

Abstract: The recently approved Electron-Ion Collider (EIC) will provide a unique new opportunity for searches of charged lepton flavor violation (CLFV) and other new physics scenarios. In contrast to the $e \leftrightarrow μみゅー$ CLFV transition for which very stringent limits exist, there is still a relatively large discovery space for the $e \to τたう$ CLFV transition, potentially to be explored by the EIC. With the latest detector design of ECCE (EIC Comprehensive Chromodynamics Experiment) and projected integral luminosity of the EIC, we find the $τたう$-leptons created in the DIS process $ep\to τたうX$ are expected to be identified with high efficiency. A first ECCE simulation study, restricted to the 3-prong $τたう$-decay mode and with limited statistics for the Standard Model backgrounds, estimates that the EIC will be able to improve the current exclusion limit on $e\to τたう$ CLFV by an order of magnitude.

Submitted 20 July, 2022; originally announced July 2022.

Comments: 11 pages, 8 figures, to be submitted to NIM

arXiv:2207.09437 [pdf, other]

doi 10.1016/j.nima.2023.168464

Design and Simulated Performance of Calorimetry Systems for the ECCE Detector at the Electron Ion Collider

Authors: F. Bock, N. Schmidt, P. K. Wang, N. Santiesteban, T. Horn, J. Huang, J. Lajoie, C. Munoz Camacho, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, W. Boeglin, M. Borysova, E. Brash , et al. (263 additional authors not shown)

Abstract: We describe the design and performance of the calorimeter systems used in the ECCE detector design to achieve the overall performance specifications cost-effectively with careful consideration of appropriate technical and schedule risks. The calorimeter systems consist of three electromagnetic calorimeters, covering the combined pseudorapidity range from -3.7 to 3.8, and two hadronic calorimeters. Key calorimeter performances which include energy and position resolutions, reconstruction efficiency, and particle identification will be presented.

Submitted 19 July, 2022; originally announced July 2022.

Comments: 19 pages, 22 figures, 5 tables

arXiv:2205.15449 [pdf, other]

Posterior and Computational Uncertainty in Gaussian Processes

Authors: Jonathan Wenger, Geoff Pleiss, Marvin Pförtner, Philipp Hennig, John P. Cunningham

Abstract: Gaussian processes scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended. The most common GP approximations map to an instance in this class, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method's posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.

Submitted 9 October, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: Advances in Neural Information Processing Systems (NeurIPS 2022)

arXiv:2205.09906 [pdf, other]

Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome

Authors: Elliott Gordon-Rodriguez, Thomas P. Quinn, John P. Cunningham

Abstract: Data augmentation plays a key role in modern machine learning pipelines. While numerous augmentation strategies have been studied in the context of computer vision and natural language processing, less is known for other data modalities. Our work extends the success of data augmentation to compositional data, i.e., simplex-valued data, which is of particular interest in the context of the human microbiome. Drawing on key principles from compositional data analysis, such as the Aitchison geometry of the simplex and subcompositions, we define novel augmentation strategies for this data modality. Incorporating our data augmentations into standard supervised learning pipelines results in consistent performance gains across a wide range of standard benchmark datasets. In particular, we set a new state-of-the-art for key disease prediction tasks including colorectal cancer, type 2 diabetes, and Crohn's disease. In addition, our data augmentations enable us to define a novel contrastive learning model, which improves on previous representation learning approaches for microbiome compositional data. Our code is available at https://github.com/cunningham-lab/AugCoDa.

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2205.09185 [pdf, other]

doi 10.1016/j.nima.2022.167748

AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann , et al. (258 additional authors not shown)

Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector.

Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: 16 pages, 18 figures, 2 appendices, 3 tables

arXiv:2205.08607 [pdf, other]

doi 10.1016/j.nima.2022.167859

Scientific Computing Plan for the ECCE Detector at the Electron Ion Collider

Authors: J. C. Bernauer, C. T. Dean, C. Fanelli, J. Huang, K. Kauder, D. Lawrence, J. D. Osborn, C. Paus, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, F. Bock, W. Boeglin, M. Borysova, E. Brash , et al. (256 additional authors not shown)

Abstract: The Electron Ion Collider (EIC) is the next generation of precision QCD facility to be built at Brookhaven National Laboratory in conjunction with Thomas Jefferson National Laboratory. There are a significant number of software and computing challenges that need to be overcome at the EIC. During the EIC detector proposal development period, the ECCE consortium began identifying and addressing these challenges in the process of producing a complete detector proposal based upon detailed detector and physics simulations. In this document, the software and computing efforts to produce this proposal are discussed; furthermore, the computing and software model and resources required for the future of ECCE are described.

Submitted 17 May, 2022; originally announced May 2022.

Journal ref: NIMA 1047, 167859 (2023)

arXiv:2204.13290 [pdf, other]

On the Normalizing Constant of the Continuous Categorical Distribution

Authors: Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, Andres Potapczynski, John P. Cunningham

Abstract: Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in closed form using elementary functions only. In spite of this mathematical simplicity, our understanding of the normalizing constant remains far from complete. In this work, we characterize the numerical behavior of the normalizing constant and we present theoretical and methodological advances that can, in turn, help to enable broader applications of the continuous categorical distribution. Our code is available at https://github.com/cunningham-lab/cb_and_cc/.

Submitted 28 April, 2022; originally announced April 2022.
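
A minimal sketch of why the numerical behavior of this normalizing constant matters, restricted to the two-component case. It assumes the unnormalized density exp(eta1*x + eta2*(1-x)) on [0, 1]; this parameterization and all function names are illustrative assumptions, not taken from the paper's code. Direct integration gives (e^eta1 - e^eta2) / (eta1 - eta2), which is exact but loses precision through cancellation when eta1 is close to eta2, so the sketch also shows an algebraically equivalent form based on `math.expm1`.

```python
import math

def inv_norm_const_k2_naive(eta1, eta2):
    """Closed-form integral of exp(eta1*x + eta2*(1-x)) over [0, 1].

    Direct transcription of the formula; undefined for eta1 == eta2 and
    prone to cancellation when the two parameters are nearly equal.
    """
    return (math.exp(eta1) - math.exp(eta2)) / (eta1 - eta2)

def inv_norm_const_k2(eta1, eta2):
    """Same integral, rewritten as exp(eta2) * expm1(d) / d with d = eta1 - eta2."""
    d = eta1 - eta2
    if d == 0.0:
        # Limit of the closed form as d -> 0: the integrand is constant.
        return math.exp(eta1)
    return math.exp(eta2) * math.expm1(d) / d

def midpoint_integral(eta1, eta2, n=100_000):
    """Midpoint-rule check of the same integral (hypothetical helper)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.exp(eta1 * x + eta2 * (1.0 - x))
    return h * total
```

The normalizing constant itself is the reciprocal of these values. The general K-component case treated in the paper is not reproduced here.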

arXiv:2203.11402 [pdf]

Vitamin K content of cheese, yoghurt and meat products in Australia

Authors: Eleanor Dunlop, Jette Jakobsen, Marie Bagge Jensen, Jayashree Arcot, Liang Qiao, Judy Cunningham, Lucinda J Black

Abstract: Vitamin K is vital for normal blood coagulation, and may influence bone, neurological and vascular health. Data on the vitamin K content of Australian foods are limited, preventing estimation of vitamin K intakes in the Australian population. We measured phylloquinone (PK) and menaquinone (MK) -4 to -10 in cheese, yoghurt and meat products (48 composite samples from 288 primary samples) by liquid chromatography with electrospray ionisation-tandem mass spectrometry. At least one K vitamer was found in every sample. The greatest mean concentrations of PK, MK-4 and MK-9 were found in lamb liver, chicken leg meat and Cheddar cheese, respectively. Cheddar cheese and cream cheese contained MK-5. MK-8 was found in Cheddar cheese only. As the K vitamer profile and concentrations appear to vary considerably by geographical location, Australia needs a vitamin K food composition dataset that is representative of foods consumed in Australia.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 23 pages, 2 tables

arXiv:2202.06985 [pdf, other]

Deep Ensembles Work, But Are They Necessary?

Authors: Taiga Abe, E. Kelly Buchanan, Geoff Pleiss, Richard Zemel, John P. Cunningham

Abstract: Ensembling neural networks is an effective way to increase accuracy, and can often match the performance of individual larger models. This observation poses a natural question: given the choice between a deep ensemble and a single neural network with similar accuracy, is one preferable over the other? Recent work suggests that deep ensembles may offer distinct benefits beyond predictive power: namely, uncertainty quantification and robustness to dataset shift. In this work, we demonstrate limitations to these purported benefits, and show that a single (but larger) neural network can replicate these qualities. First, we show that ensemble diversity, by any metric, does not meaningfully contribute to an ensemble's uncertainty quantification on out-of-distribution (OOD) data, but is instead highly correlated with the relative improvement of a single larger model. Second, we show that the OOD performance afforded by ensembles is strongly determined by their in-distribution (InD) performance, and -- in this sense -- is not indicative of any "effective robustness". While deep ensembles are a practical way to achieve improvements to predictive power, uncertainty quantification, and robustness, our results show that these improvements can be replicated by a (larger) single model.

Submitted 13 October, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

arXiv:2202.06797 [pdf, other]

Mapping Interstellar Dust with Gaussian Processes

Authors: Andrew C. Miller, Lauren Anderson, Boris Leistedt, John P. Cunningham, David W. Hogg, David M. Blei

Abstract: Interstellar dust corrupts nearly every stellar observation, and accounting for it is crucial to measuring physical properties of stars. We model the dust distribution as a spatially varying latent field with a Gaussian process (GP) and develop a likelihood model and inference method that scales to millions of astronomical observations. Modeling interstellar dust is complicated by two factors. The first is integrated observations. The data come from a vantage point on Earth and each observation is an integral of the unobserved function along our line of sight, resulting in a complex likelihood and a more difficult inference problem than in classical GP inference. The second complication is scale; stellar catalogs have millions of observations. To address these challenges we develop ziggy, a scalable approach to GP inference with integrated observations based on stochastic variational inference. We study ziggy on synthetic data and the Ananke dataset, a high-fidelity mechanistic model of the Milky Way with millions of stars. ziggy reliably infers the spatial dust map with well-calibrated posterior uncertainties.

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2202.01694 [pdf, other]

Variational Nearest Neighbor Gaussian Process

Authors: Luhuan Wu, Geoff Pleiss, John Cunningham

Abstract: Variational approximations to Gaussian processes (GPs) typically use a small set of inducing points to form a low-rank approximation to the covariance matrix. In this work, we instead exploit a sparse approximation of the precision matrix. We propose variational nearest neighbor Gaussian process (VNNGP), which introduces a prior that only retains correlations within K nearest-neighboring observations, thereby inducing sparse precision structure. Using the variational framework, VNNGP's objective can be factorized over both observations and inducing points, enabling stochastic optimization with a time complexity of O($K^3$). Hence, we can arbitrarily scale the inducing point size, even to the point of putting inducing points at every observed location. We compare VNNGP to other scalable GPs through various experiments, and demonstrate that VNNGP (1) can dramatically outperform low-rank methods, and (2) is less prone to overfitting than other nearest neighbor methods.

Submitted 7 July, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

arXiv:2112.03638 [pdf, other]

Scaling Structured Inference with Randomization

Authors: Yao Fu, John P. Cunningham, Mirella Lapata

Abstract: Deep discrete structured models have seen considerable progress recently, but traditional inference using dynamic programming (DP) typically works with a small number of states (less than hundreds), which severely limits model capacity. At the same time, across machine learning, there is a recent trend of using randomized truncation techniques to accelerate computations involving large sums. Here, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference (partition, marginal, reparameterization, entropy) and different graph structures (chains, trees, and more general hypergraphs). It is also compatible with automatic differentiation: it can be integrated with neural networks seamlessly and learned with gradient-based optimizers. Our core technique approximates the sum-product by restricting and reweighting DP on a small subset of nodes, which reduces computation by orders of magnitude. We further achieve low bias and variance via Rao-Blackwellization and importance sampling. Experiments over different graphs demonstrate the accuracy and efficiency of our approach. Furthermore, when using RDP for training a structured variational autoencoder with a scaled inference network, we achieve better test likelihood than baselines and successfully prevent posterior collapse. Code is available at https://github.com/FranxYao/RDP.

Submitted 24 July, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: ICML 2022 camera ready

arXiv:2112.03333 [pdf, other]

doi 10.1214/22-BA1313

The Posterior Predictive Null

Authors: Gemma E. Moran, John P. Cunningham, David M. Blei

Abstract: Bayesian model criticism is an important part of the practice of Bayesian statistics. Traditionally, model criticism methods have been based on the predictive check, an adaptation of goodness-of-fit testing to Bayesian modeling and an effective method to understand how well a model captures the distribution of the data. In modern practice, however, researchers iteratively build and develop many models, exploring a space of models to help solve the problem at hand. While classical predictive checks can help assess each one, they cannot help the researcher understand how the models relate to each other. This paper introduces the posterior predictive null check (PPN), a method for Bayesian model criticism that helps characterize the relationships between models. The idea behind the PPN is to check whether data from one model's predictive distribution can pass a predictive check designed for another model. This form of criticism complements the classical predictive check by providing a comparative tool. A collection of PPNs, which we call a PPN study, can help us understand which models are equivalent and which models provide different perspectives on the data. With mixture models, we demonstrate how a PPN study, along with traditional predictive checks, can help select the number of components by the principle of parsimony. With probabilistic factor models, we demonstrate how a PPN study can help understand relationships between different classes of models, such as linear models and models based on neural networks. Finally, we analyze data from the literature on predictive checks to show how a PPN study can improve the practice of Bayesian model criticism. Code to replicate the results in this paper is available at https://github.com/gemoran/ppn-code.

Submitted 6 July, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

Comments: To appear in Bayesian Analysis

arXiv:2107.00243 [pdf, other]

Preconditioning for Scalable Gaussian Process Hyperparameter Optimization

Authors: Jonathan Wenger, Geoff Pleiss, Philipp Hennig, John P. Cunningham, Jacob R. Gardner

Abstract: Gaussian process hyperparameter optimization requires linear solves with, and log-determinants of, large kernel matrices. Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce variance of the estimates for the log-determinant and its derivative. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, log-marginal likelihood and its derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There our approach accelerates training by up to an order of magnitude.

Submitted 18 June, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

Comments: International Conference on Machine Learning (ICML)
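
The stochastic trace estimation mentioned in the abstract above can be illustrated with a small sketch. It shows only the plain Hutchinson-style estimator of tr(A) using Rademacher probe vectors on a toy 3x3 matrix; the probe distribution, the matrix, and the function names are assumptions for illustration, and the log-determinant estimation and preconditioning studied in the paper are not reproduced.

```python
import random

def hutchinson_trace(matvec, dim, num_probes, rng):
    """Estimate tr(A) as the average of z^T A z over random +/-1 probes z.

    Only matrix-vector products are needed, which is what makes the
    estimator attractive for large kernel matrices.
    """
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)
        total += sum(zi * azi for zi, azi in zip(z, az))
    return total / num_probes

# Toy symmetric matrix (hypothetical); its exact trace is 2 + 3 + 4 = 9.
A = [
    [2.0, 1.0, 0.0],
    [1.0, 3.0, 1.0],
    [0.0, 1.0, 4.0],
]

def matvec(v):
    """Dense matrix-vector product with the toy matrix A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

estimate = hutchinson_trace(matvec, dim=3, num_probes=20_000, rng=random.Random(0))
```

The estimate is unbiased for the trace, and its variance depends on the off-diagonal mass of the matrix, which is one reason variance reduction is of interest in this setting.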

arXiv:2106.06529 [pdf, other]

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

Authors: Geoff Pleiss, John P. Cunningham

Abstract: Large width limits have been a recent focus of deep learning research: modulo computational practicalities, do wider networks outperform narrower ones? Answering this question has been challenging, as conventional networks gain representational power with width, potentially masking any negative effects. Our analysis in this paper decouples capacity and width via the generalization of neural networks to Deep Gaussian Processes (Deep GP), a class of nonparametric hierarchical models that subsume neural nets. In doing so, we aim to understand how width affects (standard) neural networks once they have sufficient capacity for a given modeling task. Our theoretical and empirical results on Deep GP suggest that large width can be detrimental to hierarchical models. Surprisingly, we prove that even nonparametric Deep GP converge to Gaussian processes, effectively becoming shallower without any increase in representational power. The posterior, which corresponds to a mixture of data-adaptable basis functions, becomes less data-dependent with width. Our tail analysis demonstrates that width and depth have opposite effects: depth accentuates a model's non-Gaussianity, while width makes models increasingly Gaussian. We find there is a "sweet spot" that maximizes test performance before the limiting GP behavior prevents adaptability, occurring at width = 1 or width = 2 for nonparametric Deep GP. These results make strong predictions about the same phenomenon in conventional neural networks trained with L2 regularization (analogous to a Gaussian prior on parameters): we show that such neural networks may need up to 500 - 1000 hidden units for sufficient capacity - depending on the dataset - but further width degrades performance.

Submitted 8 November, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021

arXiv:2106.01413 [pdf, other]

Rectangular Flows for Manifold Learning

Authors: Anthony L. Caterini, Gabriel Loaiza-Ganem, Geoff Pleiss, John P. Cunningham

Abstract: Normalizing flows are invertible neural networks with tractable change-of-volume terms, which allow optimization of their parameters to be efficiently performed via maximum likelihood. However, data of interest are typically assumed to live in some (often unknown) low-dimensional manifold embedded in a high-dimensional ambient space. The result is a modelling mismatch since -- by construction -- the invertibility requirement implies high-dimensional support of the learned distribution. Injective flows, mappings from low- to high-dimensional spaces, aim to fix this discrepancy by learning distributions on manifolds, but the resulting volume-change term becomes more challenging to evaluate. Current approaches either avoid computing this term entirely using various heuristics, or assume the manifold is known beforehand and therefore are not widely applicable. Instead, we propose two methods to tractably calculate the gradient of this term with respect to the parameters of the model, relying on careful use of automatic differentiation and techniques from numerical linear algebra. Both approaches perform end-to-end nonlinear manifold learning and density estimation for data projected onto this manifold. We study the trade-offs between our proposed methods, empirically verify that we outperform approaches ignoring the volume-change term by more accurately learning manifolds and the corresponding distributions on them, and show promising results on out-of-distribution detection. Our code is available at https://github.com/layer6ai-labs/rectangular-flows.

Submitted 2 November, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 Camera Ready. Code available at https://github.com/layer6ai-labs/rectangular-flows

arXiv:2104.03902 [pdf, other]

The Autodidactic Universe

Authors: Stephon Alexander, William J. Cunningham, Jaron Lanier, Lee Smolin, Stefan Stanojevic, Michael W. Toomey, Dave Wecker

Abstract: We present an approach to cosmology in which the Universe learns its own physical laws. It does so by exploring a landscape of possible laws, which we express as a certain class of matrix models. We discover maps that put each of these matrix models in correspondence with both a gauge/gravity theory and a mathematical model of a learning machine, such as a deep recurrent, cyclic neural network. This establishes a correspondence between each solution of the physical theory and a run of a neural network. This correspondence is not an equivalence, partly because gauge theories emerge from $N \rightarrow \infty $ limits of the matrix models, whereas the same limits of the neural networks used here are not well-defined. We discuss in detail what it means to say that learning takes place in autodidactic systems, where there is no supervision. We propose that if the neural network model can be said to learn without supervision, the same can be said for the corresponding physical theory. We consider other protocols for autodidactic physical systems, such as optimization of graph variety, subset-replication using self-attention and look-ahead, geometrogenesis guided by reinforcement learning, structural learning using renormalization group techniques, and extensions. These protocols together provide a number of directions in which to explore the origin of physical laws based on putting machine learning architectures in correspondence with physical theories.

Submitted 2 September, 2021; v1 submitted 28 March, 2021; originally announced April 2021.

Comments: 79 pages, 11 figures

arXiv:2104.00369 [pdf, other]

FeTaQA: Free-form Table Question Answering

Authors: Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Nick Schoelkopf, Riley Kong, Xiangru Tang, Murori Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev

Abstract: Existing table question answering datasets contain abundant factual questions that primarily evaluate the query and schema comprehension capability of a system, but they fail to include questions that require complex reasoning and integration of information due to the constraint of the associated short-form answers. To address these issues and to demonstrate the full challenge of table question answering, we introduce FeTaQA, a new dataset with 10K Wikipedia-based {table, question, free-form answer, supporting table cells} pairs. FeTaQA yields a more challenging table question answering setting because it requires generating free-form text answers after retrieval, inference, and integration of multiple discontinuous facts from a structured knowledge source. Unlike datasets of generative QA over text in which answers are prevalent with copies of short text spans from the source, answers in our dataset are human-generated explanations involving entities and their high-level relations. We provide two benchmark methods for the proposed task: a pipeline method based on semantic-parsing-based QA systems and an end-to-end method based on large pretrained text generation models, and show that FeTaQA poses a challenge for both methods.

Submitted 1 April, 2021; originally announced April 2021.

arXiv:2103.02583 [pdf]

Simulating time to event prediction with spatiotemporal echocardiography deep learning

Authors: Rohan Shad, Nicolas Quach, Robyn Fong, Patpilai Kasinpila, Cayley Bowles, Kate M. Callon, Michelle C. Li, Jeffrey Teuteberg, John P. Cunningham, Curtis P. Langlotz, William Hiesinger

Abstract: Integrating methods for time-to-event prediction with diagnostic imaging modalities is of considerable interest, as accurate estimates of survival require accounting for censoring of individuals within the observation period. New methods for time-to-event prediction have been developed by extending the Cox proportional hazards model with neural networks. In this paper, to explore the feasibility of these methods when applied to deep learning with echocardiography videos, we utilize the Stanford EchoNet-Dynamic dataset with over 10,000 echocardiograms, and generate simulated survival datasets based on the expert annotated ejection fraction readings. By training on just the simulated survival outcomes, we show that spatiotemporal convolutional neural networks yield accurate survival estimates.

Submitted 3 March, 2021; originally announced March 2021.

Comments: 9 pages, 5 figures

arXiv:2103.01938 [pdf]

doi 10.1038/s42256-021-00399-8

Medical Imaging and Machine Learning

Authors: Rohan Shad, John P. Cunningham, Euan A. Ashley, Curtis P. Langlotz, William Hiesinger

Abstract: Advances in computing power, deep learning architectures, and expert labelled datasets have spurred the development of medical imaging artificial intelligence systems that rival clinical experts in a variety of scenarios. The National Institutes of Health in 2018 identified key focus areas for the future of artificial intelligence in medical imaging, creating a foundational roadmap for research in image acquisition, algorithms, data standardization, and translatable clinical decision support systems. Several key issues raised in the report, namely data availability, the need for novel computing architectures, and explainable AI algorithms, are still relevant despite the tremendous progress made over the past few years alone. Furthermore, translational goals of data sharing, validation of performance for regulatory approval, generalizability and mitigation of unintended bias must be accounted for early in the development process. In this perspective paper we explore challenges unique to high dimensional clinical imaging data, in addition to highlighting some of the technical and ethical considerations in developing high-dimensional, multi-modality, machine learning systems for clinical decision support.

Submitted 2 March, 2021; originally announced March 2021.

Comments: 9 pages, 4 figures

Journal ref: Nat Mach Intell 3, 929 - 935 (2021)

arXiv:2103.00393 [pdf, other]

Hierarchical Inducing Point Gaussian Process for Inter-domain Observations

Authors: Luhuan Wu, Andrew Miller, Lauren Anderson, Geoff Pleiss, David Blei, John Cunningham

Abstract: We examine the general problem of inter-domain Gaussian Processes (GPs): problems where the GP realization and the noisy observations of that realization lie on different domains. When the mapping between those domains is linear, such as integration or differentiation, inference is still closed form. However, many of the scaling and approximation techniques that our community has developed do not apply to this setting. In this work, we introduce the hierarchical inducing point GP (HIP-GP), a scalable inter-domain GP inference method that enables us to improve the approximation accuracy by increasing the number of inducing points to the millions. HIP-GP, which relies on inducing points with grid structure and a stationary kernel assumption, is suitable for low-dimensional problems. In developing HIP-GP, we introduce (1) a fast whitening strategy, and (2) a novel preconditioner for conjugate gradients which can be helpful in general GP settings. Our code is available at https://github.com/cunningham-lab/hipgp.

Submitted 24 June, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

arXiv:2103.00364 [pdf]

doi 10.1038/s41467-021-25503-9

Predicting post-operative right ventricular failure using video-based deep learning

Authors: Rohan Shad, Nicolas Quach, Robyn Fong, Patpilai Kasinpila, Cayley Bowles, Miguel Castro, Ashrith Guha, Eddie Suarez, Stefan Jovinge, Sangjin Lee, Theodore Boeve, Myriam Amsallem, Xiu Tang, Francois Haddad, Yasuhiro Shudo, Y. Joseph Woo, Jeffrey Teuteberg, John P. Cunningham, Curt P. Langlotz, William Hiesinger

Abstract: Non-invasive and cost effective in nature, the echocardiogram allows for a comprehensive assessment of the cardiac musculature and valves. Despite progressive improvements over the decades, the rich temporally resolved data in echocardiography videos remain underutilized. Human reads of echocardiograms reduce the complex patterns of cardiac wall motion to a small list of measurements of heart function. Furthermore, all modern echocardiography artificial intelligence (AI) systems are similarly limited by design - automating measurements of the same reductionist metrics rather than utilizing the wealth of data embedded within each echo study. This underutilization is most evident in situations where clinical decision making is guided by subjective assessments of disease acuity, and tools that predict disease onset within clinically actionable timeframes are unavailable. Predicting the likelihood of developing post-operative right ventricular failure (RV failure) in the setting of mechanical circulatory support is one such clinical example. To address this, we developed a novel video AI system trained to predict post-operative right ventricular failure (RV failure), using the full spatiotemporal density of information from pre-operative echocardiography scans. We achieve an AUC of 0.729, specificity of 52% at 80% sensitivity and 46% sensitivity at 80% specificity. Furthermore, we show that our ML system significantly outperforms a team of human experts tasked with predicting RV failure on independent clinical evaluation. Finally, the methods we describe are generalizable to any cardiac clinical decision support application where treatment or patient selection is guided by qualitative echocardiography assessments.

Submitted 27 February, 2021; originally announced March 2021.

Comments: 12 pages, 3 figures

Journal ref: Nat Commun 12, 5192 (2021)

arXiv:2102.06695 [pdf, other]

Bias-Free Scalable Gaussian Processes via Randomized Truncations

Authors: Andres Potapczynski, Luhuan Wu, Dan Biderman, Geoff Pleiss, John P. Cunningham

Abstract: Scalable Gaussian Process methods are computationally attractive, yet introduce modeling biases that require rigorous study. This paper analyzes two common techniques: early truncated conjugate gradients (CG) and random Fourier features (RFF). We find that both methods introduce a systematic bias on the learned hyperparameters: CG tends to underfit while RFF tends to overfit. We address these issues using randomized truncation estimators that eliminate bias in exchange for increased variance. In the case of RFF, we show that the bias-to-variance conversion is indeed a trade-off: the additional variance proves detrimental to optimization. However, in the case of CG, our unbiased learning procedure meaningfully outperforms its biased counterpart with minimal additional computation.

Submitted 28 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Journal ref: 38th International Conference on Machine Learning (ICML 2021)
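
The idea of trading truncation bias for variance, as described in the abstract above, can be sketched on a toy series. The example below uses a single-sample "Russian roulette" estimator, one common randomized-truncation scheme; whether it matches the estimator used in the paper is an assumption, and the geometric series, the continuation probability, and the function names are purely illustrative. Each retained term is divided by the probability of reaching it, so the expectation equals the full sum even though every sample stops after finitely many terms.

```python
import random

def russian_roulette_estimate(term, continue_prob, rng):
    """Unbiased single-sample estimate of sum_{j>=0} term(j).

    After adding term j (reweighted by 1 / P(reach j)), the sum continues
    to term j + 1 with probability continue_prob and stops otherwise.
    """
    estimate = 0.0
    survival = 1.0  # P(reach term j) = continue_prob ** j
    j = 0
    while True:
        estimate += term(j) / survival
        if rng.random() >= continue_prob:
            return estimate
        survival *= continue_prob
        j += 1

def geometric_term(j):
    """Toy series 0.5 ** j, whose exact infinite sum is 2."""
    return 0.5 ** j

rng = random.Random(0)
samples = [russian_roulette_estimate(geometric_term, 0.7, rng) for _ in range(20_000)]
mean_estimate = sum(samples) / len(samples)

# Deterministic truncation after 3 terms, for comparison: biased low.
truncated = sum(geometric_term(j) for j in range(3))  # 1.75
```

Individual samples vary between 1 and 3.5 here, which is the added variance; their average approaches 2, whereas the fixed truncation stays at 1.75 no matter how often it is repeated.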

arXiv:2011.05231 [pdf, other]

Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning

Authors: Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, Geoff Pleiss, John P. Cunningham

Abstract: Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example; namely the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of the failure modes thereof.

Submitted 10 November, 2020; originally announced November 2020.

Showing 1–50 of 132 results for author: Cunningham, J