Search | arXiv e-print repository

Potion: Towards Poison Unlearning

Authors: Stefan Schoepf, Jack Foster, Alexandra Brintrup

Abstract: Adversarial attacks by malicious actors on machine learning systems, such as introducing poison triggers into training datasets, pose significant risks. The challenge in resolving such an attack arises in practice when only a subset of the poisoned data can be identified. This necessitates the development of methods to remove, i.e. unlearn, poison triggers from already trained models with only a subset of the poison data available. The requirements for this task significantly deviate from privacy-focused unlearning where all of the data to be forgotten by the model is known. Previous work has shown that the undiscovered poisoned samples lead to a failure of established unlearning methods, with only one method, Selective Synaptic Dampening (SSD), showing limited success. Even full retraining, after the removal of the identified poison, cannot address this challenge as the undiscovered poison samples lead to a reintroduction of the poison trigger in the model. Our work addresses two key challenges to advance the state of the art in poison unlearning. First, we introduce a novel outlier-resistant method, based on SSD, that significantly improves model protection and unlearning performance. Second, we introduce Poison Trigger Neutralisation (PTN) search, a fast, parallelisable, hyperparameter search that utilises the characteristic "unlearning versus model protection" trade-off to find suitable hyperparameters in settings where the forget set size is unknown and the retain set is contaminated. We benchmark our contributions using ResNet-9 on CIFAR10 and WideResNet-28x10 on CIFAR100. Experimental results show that our method heals 93.72% of poison compared to SSD with 83.41% and full retraining with 40.68%. We achieve this while also lowering the average model accuracy drop caused by unlearning from 5.68% (SSD) to 1.41% (ours).

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2402.19308 [pdf, other]

Loss-Free Machine Unlearning

Authors: Jack Foster, Stefan Schoepf, Alexandra Brintrup

Abstract: We present a machine unlearning approach that is both retraining- and label-free. Most existing machine unlearning approaches require a model to be fine-tuned to remove information while preserving performance. This is computationally expensive and necessitates the storage of the whole dataset for the lifetime of the model. Retraining-free approaches often utilise Fisher information, which is derived from the loss and requires labelled data which may not be available. Thus, we present an extension to the Selective Synaptic Dampening algorithm, replacing the diagonal of the Fisher information matrix with the gradient of the l2 norm of the model output to approximate sensitivity. We evaluate our method in a range of experiments using ResNet18 and Vision Transformer. Results show our label-free method is competitive with existing state-of-the-art approaches.

Submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted as a Tiny Paper at ICLR 2024

arXiv:2402.10098 [pdf, other]

Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening

Authors: Stefan Schoepf, Jack Foster, Alexandra Brintrup

Abstract: Data entry constitutes a fundamental component of the machine learning pipeline, yet it frequently results in the introduction of labelling errors. When a model has been trained on a dataset containing such errors its performance is reduced. This leads to the challenge of efficiently unlearning the influence of the erroneous data to improve the model performance without needing to completely retrain the model. While model editing methods exist for cases in which the correct label for a wrong entry is known, we focus on the case of data entry errors where we do not know the correct labels for the erroneous data. Our contribution is twofold. First, we introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning, making unlearning accessible to practitioners. We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks. Second, we demonstrate the performance of ASSD in a supply chain delay prediction problem with labelling errors using real-world data where we randomly introduce various levels of labelling errors. The application of this approach is particularly compelling in industrial settings, such as supply chain management, where a significant portion of data entry occurs manually through Excel sheets, rendering it error-prone. ASSD shows strong performance on general unlearning benchmarks and on the error correction problem where it outperforms fine-tuning for error correction.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.01401 [pdf, other]

An Information Theoretic Approach to Machine Unlearning

Authors: Jack Foster, Kyle Fogarty, Stefan Schoepf, Cengiz Öztireli, Alexandra Brintrup

Abstract: To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important. The key challenge in unlearning is forgetting the necessary data in a timely manner, while preserving model performance. In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten. We explore unlearning from an information theoretic perspective, connecting the influence of a sample to the information gain a model receives by observing it. From this, we derive a simple but principled zero-shot unlearning method based on the geometry of the model. Our approach takes the form of minimising the gradient of a learned function with respect to a small neighbourhood around a target forget point. This induces a smoothing effect, causing forgetting by moving the boundary of the classifier. We explore the intuition behind why this approach can jointly unlearn forget samples while preserving general model performance through a series of low-dimensional experiments. We perform extensive empirical evaluation of our method over a range of contemporary benchmarks, verifying that our method is competitive with state-of-the-art performance under the strict constraints of zero-shot unlearning.

Submitted 5 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Updated, new low-dimensional experiments and updated perspective on unlearning from an information theoretic view

arXiv:2401.14183 [pdf, other]

Towards Autonomous Supply Chains: Definition, Characteristics, Conceptual Framework, and Autonomy Levels

Authors: Liming Xu, Stephen Mak, Yaniv Proselkov, Alexandra Brintrup

Abstract: Recent global disruptions, such as the pandemic and geopolitical conflicts, have profoundly exposed vulnerabilities in traditional supply chains, requiring exploration of more resilient alternatives. Autonomous supply chains (ASCs) have emerged as a potential solution, offering increased visibility, flexibility, and resilience in turbulent trade environments. Despite discussions in industry and academia over several years, ASCs lack well-established theoretical foundations. This paper addresses this research gap by presenting a formal definition of ASC along with its defining characteristics and auxiliary concepts. We propose a layered conceptual framework called the MIISI model. An illustrative case study focusing on the meat supply chain demonstrates an initial ASC implementation based on this conceptual model. Additionally, we introduce a seven-level supply chain autonomy reference model, delineating a trajectory towards achieving full supply chain autonomy. Recognising that this work represents an initial endeavour, we emphasise the need for continued exploration in this emerging domain. We anticipate that this work will stimulate further research, both theoretical and technical, and contribute to the continual evolution of ASCs.

Submitted 13 October, 2023; originally announced January 2024.

Comments: This paper includes 20 pages and 8 figures

arXiv:2310.17485 [pdf, other]

doi 10.1016/j.trc.2023.104376

Fair collaborative vehicle routing: A deep multi-agent reinforcement learning approach

Authors: Stephen Mak, Liming Xu, Tim Pearce, Michael Ostroumov, Alexandra Brintrup

Abstract: Collaborative vehicle routing occurs when carriers collaborate through sharing their transportation requests and performing transportation requests on behalf of each other. This achieves economies of scale, thus reducing cost, greenhouse gas emissions and road congestion. But which carrier should partner with whom, and how much should each carrier be compensated? Traditional game theoretic solution concepts are expensive to calculate as the characteristic function scales exponentially with the number of agents. This would require solving the vehicle routing problem (NP-hard) an exponential number of times. We therefore propose to model this problem as a coalitional bargaining game solved using deep multi-agent reinforcement learning, where - crucially - agents are not given access to the characteristic function. Instead, we implicitly reason about the characteristic function; thus, when deployed in production, we only need to evaluate the expensive post-collaboration vehicle routing problem once. Our contribution is that we are the first to consider both the route allocation problem and gain sharing problem simultaneously - without access to the expensive characteristic function. Through decentralised machine learning, our agents bargain with each other and agree to outcomes that correlate well with the Shapley value - a fair profit allocation mechanism. Importantly, we are able to achieve a reduction in run-time of 88%.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Final, published version can be found here: https://www.sciencedirect.com/science/article/pii/S0968090X23003662

Journal ref: Volume 157, December 2023, 104376
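
The abstract above notes that classical solution concepts such as the Shapley value are expensive because the characteristic function must be evaluated for every coalition. The following minimal sketch illustrates that cost on a toy game; it is not the paper's method (which avoids these evaluations), and the function name `shapley_values` and the savings figures in `toy_savings` are hypothetical, chosen only for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    `v` maps a frozenset of players to that coalition's value. The loop visits
    n! orderings, which is why this only works for very small games.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += v(with_p) - v(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical cost savings for three carriers (illustrative numbers only).
toy_savings = {
    frozenset(): 0.0,
    frozenset("A"): 0.0,
    frozenset("B"): 0.0,
    frozenset("C"): 0.0,
    frozenset("AB"): 4.0,
    frozenset("AC"): 2.0,
    frozenset("BC"): 2.0,
    frozenset("ABC"): 6.0,
}

values = shapley_values(["A", "B", "C"], lambda s: toy_savings[s])
print(values)
```

In a real collaborative routing setting, each lookup in `toy_savings` would correspond to solving a vehicle routing problem for that coalition, so the number of such solves grows exponentially with the number of carriers.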

arXiv:2310.17458 [pdf, other]

Coalitional Bargaining via Reinforcement Learning: An Application to Collaborative Vehicle Routing

Authors: Stephen Mak, Liming Xu, Tim Pearce, Michael Ostroumov, Alexandra Brintrup

Abstract: Collaborative Vehicle Routing is where delivery companies cooperate by sharing their delivery information and performing delivery requests on behalf of each other. This achieves economies of scale and thus reduces cost, greenhouse gas emissions, and road congestion. But which company should partner with whom, and how much should each company be compensated? Traditional game theoretic solution concepts, such as the Shapley value or nucleolus, are difficult to calculate for the real-world problem of Collaborative Vehicle Routing due to the characteristic function scaling exponentially with the number of agents. This would require solving the Vehicle Routing Problem (an NP-Hard problem) an exponential number of times. We therefore propose to model this problem as a coalitional bargaining game where - crucially - agents are not given access to the characteristic function. Instead, we implicitly reason about the characteristic function, and thus eliminate the need to evaluate the VRP an exponential number of times - we only need to evaluate it once. Our contribution is that our decentralised approach is both scalable and considers the self-interested nature of companies. The agents learn using a modified Independent Proximal Policy Optimisation. Our RL agents outperform a strong heuristic bot. The agents correctly identify the optimal coalitions 79% of the time with an average optimality gap of 4.2% and reduction in run-time of 62%.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2021 Workshop on Cooperative AI

arXiv:2310.09435 [pdf, other]

On Implementing Autonomous Supply Chains: a Multi-Agent System Approach

Authors: Liming Xu, Stephen Mak, Maria Minaricova, Alexandra Brintrup

Abstract: Trade restrictions, the COVID-19 pandemic, and geopolitical conflicts have significantly exposed vulnerabilities within traditional global supply chains. These events underscore the need for organisations to establish more resilient and flexible supply chains. To address these challenges, the concept of the autonomous supply chain (ASC), characterised by predictive and self-decision-making capabilities, has recently emerged as a promising solution. However, research on ASCs is relatively limited, with no existing studies specifically focusing on their implementations. This paper aims to address this gap by presenting an implementation of ASC using a multi-agent approach. It presents a methodology for the analysis and design of such an agent-based ASC system (A2SC). This paper provides a concrete case study, the autonomous meat supply chain, which showcases the practical implementation of the A2SC system using the proposed methodology. Additionally, a system architecture and a toolkit for developing such A2SC systems are presented. Despite limitations, this work demonstrates a promising approach for implementing an effective ASC system.

Submitted 14 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: This paper includes 32 pages and 14 figures and has been accepted to Computers in Industry for publication (in process)

arXiv:2309.12781 [pdf, other]

Multi-Agent Digital Twinning for Collaborative Logistics: Framework and Implementation

Authors: Liming Xu, Stephen Mak, Stefan Schoepf, Michael Ostroumov, Alexandra Brintrup

Abstract: Collaborative logistics has been widely recognised as an effective avenue to reduce carbon emissions by enhanced truck utilisation and reduced travel distance. However, stakeholders' participation in collaborations is hindered by information-sharing barriers and the absence of integrated systems. In this paper, we therefore address these barriers by investigating an integrated platform that fosters collaboration through the integration of agents with digital twins. Specifically, we employ a multi-agent system approach to integrate stakeholders and physical mobile assets in collaborative logistics, representing them as agents. We introduce a loosely-coupled system architecture that facilitates the connection between physical and digital systems, enabling the integration of agents with digital twins. Using this architecture, we implement the platform (or testbed). The resulting testbed, comprising a physical environment and a digital replica, is a digital twin that integrates distributed entities involved in collaborative logistics. The effectiveness of the testbed is demonstrated through a carrier collaboration scenario. This paper is among the earliest few efforts to investigate the integration of agents and digital twin concepts and goes beyond the conceptual discussion of existing studies to the technical implementation of such integration.

Submitted 10 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: This paper includes 14 pages, 14 figures, and has been submitted to Elsevier for possible publication

arXiv:2309.08546 [pdf, other]

Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization

Authors: Jack Foster, Alexandra Brintrup

Abstract: The pursuit of long-term autonomy mandates that robotic agents must continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing for robotic applications as they are space efficient and typically do not increase in computational complexity as the number of tasks grows. Despite these desirable properties, prior-based approaches typically fail on important benchmarks and consequently are limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, leading to lower catastrophic forgetting. Our method boasts a range of desirable properties for robotic applications such as being lightweight and task label-free, converging quickly, and offering calibrated uncertainty that is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.

Submitted 4 April, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.04785 [pdf, other]

Implementation of Autonomous Supply Chains for Digital Twinning: a Multi-Agent Approach

Authors: Liming Xu, Yaniv Proselkov, Stefan Schoepf, David Minarsch, Maria Minaricova, Alexandra Brintrup

Abstract: Trade disruptions, the pandemic, and the Ukraine war over the past years have adversely affected global supply chains, revealing their vulnerability. Autonomous supply chains are an emerging topic that has gained attention in industry and academia as a means of increasing their monitoring and robustness. While many theoretical frameworks exist, there is only sparse work to facilitate generalisable technical implementation. We address this gap by investigating multi-agent system approaches for implementing autonomous supply chains, presenting an autonomous economic agent-based technical framework. We illustrate this framework with a prototype, studied in a perishable food supply chain scenario, and discuss possible extensions.

Submitted 9 September, 2023; originally announced September 2023.

Comments: This paper includes 7 Pages, 4 Figures, and has been accepted by the IFAC World Congress 2023, 9 July - 14 July, 2023, Yokohama, Japan and will be published in IFAC-PapersOnLine

arXiv:2308.07707 [pdf, other]

Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening

Authors: Jack Foster, Stefan Schoepf, Alexandra Brintrup

Abstract: Machine unlearning, the ability for a machine learning model to forget, is becoming increasingly important to comply with data privacy regulations, as well as to remove harmful, manipulated, or outdated information. The key challenge lies in forgetting specific information while protecting model performance on the remaining data. While current state-of-the-art methods perform well, they typically require some level of retraining over the retained data, in order to protect or restore model performance. This adds computational overhead and mandates that the training data remain available and accessible, which may not be feasible. In contrast, other methods employ a retrain-free paradigm; however, these approaches are prohibitively computationally expensive and do not perform on par with their retrain-based counterparts. We present Selective Synaptic Dampening (SSD), a novel two-step, post hoc, retrain-free approach to machine unlearning which is fast, performant, and does not require long-term storage of the training data. First, SSD uses the Fisher information matrix of the training and forgetting data to select parameters that are disproportionately important to the forget set. Second, SSD induces forgetting by dampening these parameters proportional to their relative importance to the forget set with respect to the wider training data. We evaluate our method against several existing unlearning methods in a range of experiments using ResNet18 and Vision Transformer. Results show that the performance of SSD is competitive with retrain-based post hoc methods, demonstrating the viability of retrain-free post hoc unlearning approaches.

Submitted 13 December, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: Accepted as a main track paper at AAAI 2024
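
The two steps described in the SSD abstract above (select parameters that matter disproportionately to the forget set, then dampen them in proportion to their relative importance) can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes per-parameter importance estimates (for example, Fisher information diagonals) are already computed, and the function name `dampen`, the selection threshold `alpha`, and the dampening constant `lam` are hypothetical names that do not appear in the listing.

```python
import numpy as np


def dampen(params, imp_full, imp_forget, alpha=10.0, lam=1.0):
    """Scale down parameters that are disproportionately important to the forget set.

    params     : flat array of model parameters
    imp_full   : non-negative importance of each parameter on the full training data
    imp_forget : non-negative importance of each parameter on the forget set
    alpha, lam : hypothetical selection threshold and dampening constant
    """
    params = np.asarray(params, dtype=float)
    imp_full = np.asarray(imp_full, dtype=float)
    imp_forget = np.asarray(imp_forget, dtype=float)

    # Step 1: select parameters whose forget-set importance clearly exceeds
    # their importance on the wider training data.
    selected = imp_forget > alpha * imp_full

    # Step 2: shrink selected parameters by the ratio of the two importances,
    # capped at 1 so that no parameter is ever amplified.
    scale = np.ones_like(params)
    scale[selected] = np.minimum(lam * imp_full[selected] / imp_forget[selected], 1.0)
    return params * scale


# Toy example with made-up importance values.
new_params = dampen(
    params=[1.0, 2.0, -4.0],
    imp_full=[1.0, 1.0, 0.5],
    imp_forget=[0.5, 20.0, 10.0],
)
print(new_params)
```

In this toy run the first parameter is left unchanged because it is not disproportionately important to the forget set, while the other two are scaled towards zero.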

arXiv:2307.12157 [pdf, other]

Identifying contributors to supply chain outcomes in a multi-echelon setting: a decentralised approach

Authors: Stefan Schoepf, Jack Foster, Alexandra Brintrup

Abstract: Organisations often struggle to identify the causes of change in metrics such as product quality and delivery duration. This task becomes increasingly challenging when the cause lies outside of company borders in multi-echelon supply chains that are only partially observable. Although traditional supply chain management has advocated for data sharing to gain better insights, this does not take place in practice due to data privacy concerns. We propose the use of explainable artificial intelligence for decentralised computing of estimated contributions to a metric of interest in a multi-stage production process. This approach mitigates the need to convince supply chain actors to share data, as all computations occur in a decentralised manner. Our method is empirically validated using data collected from a real multi-stage manufacturing process. The results demonstrate the effectiveness of our approach in detecting the source of quality variations compared to a centralised approach using Shapley additive explanations.

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2307.12136 [pdf, other]

Using Reinforcement Learning for the Three-Dimensional Loading Capacitated Vehicle Routing Problem

Authors: Stefan Schoepf, Stephen Mak, Julian Senoner, Liming Xu, Netland Torbjörn, Alexandra Brintrup

Abstract: Heavy goods vehicles are vital backbones of the supply chain delivery system but also contribute significantly to carbon emissions with only 60% loading efficiency in the United Kingdom. Collaborative vehicle routing has been proposed as a solution to increase efficiency, but challenges remain to make this a possibility. One key challenge is the efficient computation of viable solutions for co-loading and routing. Current operations research methods suffer from non-linear scaling with increasing problem size and are therefore bound to limited geographic areas to compute results in time for day-to-day operations. This only allows for local optima in routing and leaves global optimisation potential untouched. We develop a reinforcement learning model to solve the three-dimensional loading capacitated vehicle routing problem in approximately linear time. While this problem has been studied extensively in operations research, no publications on solving it with reinforcement learning exist. We demonstrate the favourable scaling of our reinforcement learning model and benchmark our routing performance against state-of-the-art methods. The model performs within an average gap of 3.83% to 8.10% compared to established methods. Our model not only represents a promising first step towards large-scale logistics optimisation with reinforcement learning but also lays the foundation for this research stream. GitHub: https://github.com/if-loops/3L-CVRP

Submitted 11 June, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

Comments: Presented at the IJCAI 2023 Workshop on Search and Planning with Complex Objectives (WoSePCO)

arXiv:2305.11581 [pdf]

Trustworthy, responsible, ethical AI in manufacturing and supply chains: synthesis and emerging research questions

Authors: Alexandra Brintrup, George Baryannis, Ashutosh Tiwari, Svetan Ratchev, Giovanna Martinez-Arellano, Jatinder Singh

Abstract: While the increased use of AI in the manufacturing sector has been widely noted, there is little understanding of the risks that it may raise in a manufacturing organisation. Although various high level frameworks and definitions have been proposed to consolidate potential risks, practitioners struggle with understanding and implementing them. This lack of understanding exposes manufacturing to a multitude of risks, affecting the organisation, its workers, as well as suppliers and clients. In this paper, we explore and interpret the applicability of responsible, ethical, and trustworthy AI within the context of manufacturing. We then use a broadened adaptation of a machine learning lifecycle to discuss, through the use of illustrative examples, how each step may result in a given AI trustworthiness concern. We additionally propose a number of research questions to the manufacturing research community, in order to help guide future research so that the economic and societal benefits envisaged by AI in manufacturing are delivered safely and responsibly.

Submitted 19 May, 2023; originally announced May 2023.

Comments: Pre-print under peer-review

arXiv:2211.08140 [pdf, other]

doi 10.1016/j.dsm.2023.04.001

Network science approach for identifying disruptive elements of an airline

Authors: Vinod Kumar Chauhan, Anna Ledwoch, Alexandra Brintrup, Manuel Herrera, Vaggelis Giannikas, Goran Stojkovic, Duncan Mcfarlane

Abstract: Currently, flight delays are common and they propagate from an originating flight to connecting flights, leading to large disruptions in the overall schedule. These disruptions cause massive economic losses, affect airlines' reputations, waste passengers' time and money, and directly impact the environment. This study adopts a network science approach for solving the delay propagation problem by modeling and analyzing the flight schedules and historical operational data of an airline. We aim to determine the most disruptive airports, flights, flight-connections, and connection types in an airline network. Disruptive elements are influential or critical entities in an airline network. They are the elements that can either cause (airline schedules) or have caused (historical data) the largest disturbances in the network. An airline can improve its operations by avoiding delays caused by the most disruptive elements. The proposed network science approach for disruptive element analysis was validated using a case study of an operating airline. The analysis indicates that potential disruptive elements in a schedule of an airline are also actual disruptive elements in the historical data and they should be considered to improve operations. The airline network exhibits small-world effects and delays can propagate to any part of the network with a minimum of four delayed flights. Finally, we observed that passenger connections between flights are the most disruptive connection type. Therefore, the proposed methodology provides a tool for airlines to build robust flight schedules that reduce delays and propagation.

Submitted 14 April, 2023; v1 submitted 19 October, 2022; originally announced November 2022.

Comments: accepted to Data Science and Management

arXiv:2210.11953 [pdf, other]

doi 10.1016/j.cie.2022.108928

Real-time large-scale supplier order assignments across two-tiers of a supply chain with penalty and dual-sourcing

Authors: Vinod Kumar Chauhan, Stephen Mak, Ajith Kumar Parlikad, Muhannad Alomari, Linus Casassa, Alexandra Brintrup

Abstract: Supplier selection and order allocation (SSOA) are key strategic decisions in supply chain management which greatly impact the performance of the supply chain. Although the SSOA problem has been studied extensively, less attention has been paid to scalability, which presents a significant gap preventing adoption of SSOA algorithms by industrial practitioners. This paper presents a novel multi-item, multi-supplier double order allocations with dual-sourcing and penalty constraints across two-tiers of a supply chain, resulting in cooperation and in facilitating supplier preferences to work with other suppliers through bidding. We propose Mixed-Integer Programming models for allocations at individual tiers as well as for integrated allocations. An application to a real-time large-scale case study of a manufacturing company is presented, which is the largest scale studied in terms of supply chain size and number of variables so far in the literature. The use case allows us to highlight how problem formulation and implementation can help reduce computational complexity using Mathematical Programming (MP) and Genetic Algorithm (GA) approaches. Interestingly, the results show that MP outperforms GA in solving SSOA. Sensitivity analysis is presented for sourcing strategy, penalty threshold and penalty factor. The developed model was successfully deployed in a large international sourcing conference with multiple bidding rounds, which helped achieve procurement cost reductions of more than 10% for the manufacturing company.

Submitted 30 December, 2022; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: accepted at Computers & Industrial Engineering (2022)

arXiv:2210.11479 [pdf, other]

doi 10.1016/j.sca.2023.100050

Exploitation of material consolidation trade-offs in multi-tier complex supply networks

Authors: Vinod Kumar Chauhan, Muhannad Alomari, James Arney, Ajith Kumar Parlikad, Alexandra Brintrup

Abstract: While consolidation strategies form the backbone of many supply chain optimisation problems, exploitation of multi-tier material relationships through consolidation remains an understudied area, despite being a prominent feature of industries that produce complex made-to-order products. In this paper, we propose an optimisation framework for exploiting multi-to-multi relationship between tiers of a supply chain. The resulting formulation is flexible such that quantity discounts, inventory holding, and transport costs can be included. The framework introduces a new trade-off between tiers, leading to cost reductions in one tier but increased costs in the other, which helps to reduce the overall procurement cost in the supply chain. A mixed integer linear programming model is developed and tested with a range of small to large-scale test problems from aerospace manufacturing. Our comparison to benchmark results shows that there is indeed a cost trade-off between two tiers, and that its reduction can be achieved using a holistic approach to reconfiguration. Costs are decreased when second tier fixed ordering costs and the number of machining options increase. Consolidation results in reduced inventory holding costs in all scenarios. Several secondary effects such as simplified supplier selection may also be observed.

Submitted 19 November, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: accepted to Supply Chain Analytics

arXiv:2209.09116 [pdf, other]

Trolley optimisation: An extension of bin packing to load PCB components

Authors: Vinod Kumar Chauhan, Mark Bass, Ajith Kumar Parlikad, Alexandra Brintrup

Abstract: A trolley is a container for loading printed circuit board (PCB) components and a trolley optimisation problem (TOP) is an assignment of PCB components to trolleys for use in the production of a set of PCBs in an assembly line. In this paper, we introduce the TOP, a novel operations research application. To formulate the TOP, we derive a novel extension of the bin packing problem. We exploit the problem structure to decompose the TOP into two smaller, identical and independent problems. Further, we develop a mixed integer linear programming model to solve the TOP and prove that the TOP is an NP-complete problem. A case study of an aerospace manufacturing company is used to illustrate the TOP, which successfully automated the manual process in the company and resulted in significant cost reductions and flexibility in the building process.

Submitted 31 October, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

arXiv:2202.12653 [pdf, other]

Bayesian autoencoders with uncertainty quantification: Towards trustworthy anomaly detection

Authors: Bang Xiang Yong, Alexandra Brintrup

Abstract: Despite numerous studies of deep autoencoders (AEs) for unsupervised anomaly detection, AEs still lack a way to express uncertainty in their predictions, crucial for ensuring safe and trustworthy machine learning systems in high-stakes applications. Therefore, in this work, the formulation of Bayesian autoencoders (BAEs) is adopted to quantify the total anomaly uncertainty, comprising epistemic and aleatoric uncertainties. To evaluate the quality of uncertainty, we consider the task of classifying anomalies with the additional option of rejecting predictions of high uncertainty. In addition, we use the accuracy-rejection curve and propose the weighted average accuracy as a performance metric. Our experiments demonstrate the effectiveness of the BAE and total anomaly uncertainty on a set of benchmark datasets and two real datasets for manufacturing: one for condition monitoring, the other for quality inspection.

Submitted 25 February, 2022; originally announced February 2022.

arXiv:2202.12637 [pdf, other]

Do autoencoders need a bottleneck for anomaly detection?

Authors: Bang Xiang Yong, Alexandra Brintrup

Abstract: A common belief in designing deep autoencoders (AEs), a type of unsupervised neural network, is that a bottleneck is required to prevent learning the identity function. Learning the identity function renders the AEs useless for anomaly detection. In this work, we challenge this limiting belief and investigate the value of non-bottlenecked AEs. The bottleneck can be removed in two ways: (1) overparameterising the latent layer, and (2) introducing skip connections. However, limited works have reported on the use of one of the ways. For the first time, we carry out extensive experiments covering various combinations of bottleneck removal schemes, types of AEs and datasets. In addition, we propose the infinitely-wide AEs as an extreme example of non-bottlenecked AEs. Their improvement over the baseline implies learning the identity function is not as trivial as previously assumed. Moreover, we find that non-bottlenecked architectures (highest AUROC=0.857) can outperform their bottlenecked counterparts (highest AUROC=0.696) on the popular task of CIFAR (inliers) vs SVHN (anomalies), among other tasks, shedding light on the potential of developing non-bottlenecked AEs for improving anomaly detection.

Submitted 25 February, 2022; originally announced February 2022.

arXiv:2110.10038 [pdf, other]

Coalitional Bayesian Autoencoders -- Towards explainable unsupervised deep learning

Authors: Bang Xiang Yong, Alexandra Brintrup

Abstract: This paper aims to improve the explainability of autoencoder (AE) predictions by proposing two explanation methods based on the mean and epistemic uncertainty of the log-likelihood estimate, which naturally arise from the probabilistic formulation of the AE called Bayesian Autoencoders (BAE). To quantitatively evaluate the performance of explanation methods, we test them in sensor network applications, and propose three metrics based on covariate shift of sensors: (1) G-mean of Spearman drift coefficients, (2) G-mean of sensitivity-specificity of explanation ranking and (3) sensor explanation quality index (SEQI) which combines the two aforementioned metrics. Surprisingly, we find that explanations of BAE's predictions suffer from high correlation resulting in misleading explanations. To alleviate this, a "Coalitional BAE" is proposed, which is inspired by agent-based system theory. Our comprehensive experiments on publicly available condition monitoring datasets demonstrate the improved quality of explanations using the Coalitional BAE.

Submitted 19 October, 2021; originally announced October 2021.

Comments: Preprint submitted to Journal of Applied Soft Computing

arXiv:2109.01703 [pdf]

doi 10.1016/j.ijpe.2021.108279

Will bots take over the supply chain? Revisiting Agent-based supply chain automation

Authors: Liming Xu, Stephen Mak, Alexandra Brintrup

Abstract: Agent-based systems have the capability to fuse information from many distributed sources and create better plans faster. This feature makes agent-based systems naturally suitable to address the challenges in Supply Chain Management (SCM). Although agent-based supply chain systems have been proposed since early 2000, industrial uptake of them has been lagging. The reasons quoted include the immaturity of the technology, a lack of interoperability with supply chain information systems, and a lack of trust in Artificial Intelligence (AI). In this paper, we revisit the agent-based supply chain and review the state of the art. We find that agent-based technology has matured, and other supporting technologies that are penetrating supply chains are filling in gaps, leaving the concept applicable to a wider range of functions. For example, the ubiquity of IoT technology helps agents "sense" the state of affairs in a supply chain and opens up new possibilities for automation. Digital ledgers help securely transfer data between third parties, making agent-based information sharing possible, without the need to integrate Enterprise Resource Planning (ERP) systems. Learning functionality in agents enables agents to move beyond automation and towards autonomy. We note this convergence effect through conceptualising an agent-based supply chain framework, reviewing its components, and highlighting research challenges that need to be addressed in moving forward.

Submitted 3 September, 2021; originally announced September 2021.

Comments: 38 pages, 5 figures

Journal ref: International Journal of Production Economics, Volume 241, 2021

arXiv:2107.13304 [pdf, other]

Bayesian Autoencoders: Analysing and Fixing the Bernoulli likelihood for Out-of-Distribution Detection

Authors: Bang Xiang Yong, Tim Pearce, Alexandra Brintrup

Abstract: After an autoencoder (AE) has learnt to reconstruct one dataset, it might be expected that the likelihood on an out-of-distribution (OOD) input would be low. This has been studied as an approach to detect OOD inputs. Recent work showed this intuitive approach can fail for the dataset pair FashionMNIST vs MNIST. This paper suggests this is due to the use of Bernoulli likelihood and analyses why this is the case, proposing two fixes: 1) Compute the uncertainty of the likelihood estimate by using a Bayesian version of the AE. 2) Use alternative distributions to model the likelihood.

Submitted 28 July, 2021; originally announced July 2021.

Comments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning

arXiv:2107.13252 [pdf, other]

doi 10.17863/CAM.51696

Multi Agent System for Machine Learning Under Uncertainty in Cyber Physical Manufacturing System

Authors: Bang Xiang Yong, Alexandra Brintrup

Abstract: Recent advancements in predictive machine learning have led to its application in various use cases in manufacturing. Most research has focused on maximising predictive accuracy without addressing the uncertainty associated with it. While accuracy is important, focusing primarily on it poses an overfitting danger, exposing manufacturers to risk, ultimately hindering the adoption of these techniques. In this paper, we determine the sources of uncertainty in machine learning and establish the success criteria of a machine learning system to function well under uncertainty in a cyber-physical manufacturing system (CPMS) scenario. Then, we propose a multi-agent system architecture which leverages probabilistic machine learning as a means of achieving such criteria. We propose possible scenarios for which our proposed architecture is useful and discuss future work. Experimentally, we implement Bayesian Neural Networks for multi-task classification on a public dataset for the real-time condition monitoring of a hydraulic system and demonstrate the usefulness of the system by evaluating the probability of a prediction being accurate given its uncertainty. We deploy these models using our proposed agent-based framework and integrate web visualisation to demonstrate its real-time feasibility.

Submitted 28 July, 2021; originally announced July 2021.

Comments: International Workshop on Service Orientation in Holonic and Multi-Agent Manufacturing

arXiv:2107.13249 [pdf, other]

doi 10.1109/MetroInd4.0IoT48571.2020.9138306

Bayesian Autoencoders for Drift Detection in Industrial Environments

Authors: Bang Xiang Yong, Yasmin Fathy, Alexandra Brintrup

Abstract: Autoencoders are unsupervised models which have been used for detecting anomalies in multi-sensor environments. A typical use includes training a predictive model with data from sensors operating under normal conditions and using the model to detect anomalies. Anomalies can come either from real changes in the environment (real drift) or from faulty sensory devices (virtual drift); however, the use of Autoencoders to distinguish between different anomalies has not yet been considered. To this end, we first propose the development of Bayesian Autoencoders to quantify epistemic and aleatoric uncertainties. We then test the Bayesian Autoencoder using a real-world industrial dataset for hydraulic condition monitoring. The system is injected with noise and drifts, and we have found the epistemic uncertainty to be less sensitive to sensor perturbations as compared to the reconstruction loss. By observing the reconstructed signals with the uncertainties, we gain interpretable insights, and these uncertainties offer a potential avenue for distinguishing real and virtual drifts.

Submitted 28 July, 2021; originally announced July 2021.

Comments: Published in 2020 IEEE International Workshop on Metrology for Industry 4.0 & IoT

arXiv:2107.10609 [pdf, other]

Data Considerations in Graph Representation Learning for Supply Chain Networks

Authors: Ajmal Aziz, Edward Elson Kosasih, Ryan-Rhys Griffiths, Alexandra Brintrup

Abstract: Supply chain network data is a valuable asset for businesses wishing to understand their ethical profile, security of supply, and efficiency. Possession of a dataset alone however is not a sufficient enabler of actionable decisions due to incomplete information. In this paper, we present a graph representation learning approach to uncover hidden dependency links that focal companies may not be aware of. To the best of our knowledge, our work is the first to represent a supply chain as a heterogeneous knowledge graph with learnable embeddings. We demonstrate that our representation facilitates state-of-the-art performance on link prediction of a global automotive supply chain network using a relational graph convolutional network. It is anticipated that our method will be directly applicable to businesses wishing to sever links with nefarious entities and mitigate risk of supply failure. More abstractly, it is anticipated that our method will be useful to inform representation learning of supply chain networks for downstream tasks beyond link prediction.

Submitted 22 July, 2021; originally announced July 2021.

Comments: ICML 2021 Workshop on Machine Learning for Data

arXiv:2107.00913 [pdf, other]

Reinforcement Learning Provides a Flexible Approach for Realistic Supply Chain Safety Stock Optimisation

Authors: Edward Elson Kosasih, Alexandra Brintrup

Abstract: Although safety stock optimisation has been studied for more than 60 years, most companies still use simplistic means to calculate necessary safety stock levels, partly due to the mismatch between existing analytical methods' emphases on deriving provably optimal solutions and companies' preferences to sacrifice optimal results in favour of more realistic problem settings. A newly emerging method from the field of Artificial Intelligence (AI), namely Reinforcement Learning (RL), offers promise in finding optimal solutions while accommodating more realistic problem features. Unlike analytical-based models, RL treats the problem as a black-box simulation environment mitigating against the problem of oversimplifying reality. As such, assumptions on stock keeping policy can be relaxed and a higher number of problem variables can be accommodated. While RL has been popular in other domains, its applications in safety stock optimisation remain scarce. In this paper, we investigate three RL methods, namely, Q-Learning, Temporal Difference Advantage Actor-Critic and Multi-agent Temporal Difference Advantage Actor-Critic for optimising safety stock in a linear chain of independent agents. We find that RL can simultaneously optimise both safety stock level and order quantity parameters of an inventory policy, unlike classical safety stock optimisation models where only safety stock level is optimised while order quantity is predetermined based on simple rules. This allows RL to model more complex supply chain procurement behaviour. However, RL takes longer to arrive at solutions, necessitating future research on identifying and improving trade-offs between the use of AI and mathematical models.

Submitted 2 July, 2021; originally announced July 2021.

Comments: 12 pages; 6 figures

arXiv:2106.04972 [pdf, other]

Understanding Softmax Confidence and Uncertainty

Authors: Tim Pearce, Alexandra Brintrup, Jun Zhu

Abstract: It is often remarked that neural networks fail to increase their uncertainty when predicting on data far from the training distribution. Yet naively using softmax confidence as a proxy for uncertainty achieves modest success in tasks exclusively testing for this, e.g., out-of-distribution (OOD) detection. This paper investigates this contradiction, identifying two implicit biases that do encourage softmax confidence to correlate with epistemic uncertainty: 1) Approximately optimal decision boundary structure, and 2) Filtering effects of deep networks. It describes why low-dimensional intuitions about softmax confidence are misleading. Diagnostic experiments quantify reasons softmax confidence can fail, finding that extrapolations are less to blame than overlap between training and OOD data in final-layer representations. Pre-trained/fine-tuned networks reduce this overlap.

Submitted 9 June, 2021; originally announced June 2021.

arXiv:2011.02833 [pdf, other]

Digital Twins: State of the Art Theory and Practice, Challenges, and Open Research Questions

Authors: Angira Sharma, Edward Kosasih, Jie Zhang, Alexandra Brintrup, Anisoara Calinescu

Abstract: Digital Twin was introduced over a decade ago, as an innovative all-encompassing tool, with perceived benefits including real-time monitoring, simulation and forecasting. However, the theoretical framework and practical implementations of digital twins (DT) are still far from this vision. Although successful implementations exist, sufficient implementation details are not publicly available; it is therefore difficult to assess their effectiveness, draw comparisons and jointly advance the DT methodology. This work explores the various DT features and current approaches, the shortcomings and reasons behind the delay in the implementation and adoption of digital twin. Advancements in machine learning, internet of things and big data have contributed hugely to the improvements in DT with regard to its real-time monitoring and forecasting properties. Despite this progress and individual company-based efforts, certain research gaps exist in the field, which have caused delay in the widespread adoption of this concept. We reviewed relevant works and identified that the major reasons for this delay are the lack of a universal reference framework, domain dependence, security concerns of shared data, reliance of digital twin on other technologies, and lack of quantitative metrics. We define the necessary components of a digital twin required for a universal reference framework, which also validate its uniqueness as a concept compared to similar concepts like simulation, autonomous systems, etc. This work further assesses the digital twin applications in different domains and the current state of machine learning and big data in it. It thus answers and identifies novel research questions, both of which will help to better understand and advance the theory and practice of digital twins.

Submitted 4 December, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

arXiv:2007.14235 [pdf, other]

Structured Weight Priors for Convolutional Neural Networks

Authors: Tim Pearce, Andrew Y. K. Foong, Alexandra Brintrup

Abstract: Selection of an architectural prior well suited to a task (e.g. convolutions for image data) is crucial to the success of deep neural networks (NNs). Conversely, the weight priors within these architectures are typically left vague, e.g. independent Gaussian distributions, which has led to debate over the utility of Bayesian deep learning. This paper explores the benefits of adding structure to weight priors. It initially considers first-layer filters of a convolutional NN, designing a prior based on random Gabor filters. Second, it considers adding structure to the prior of final-layer weights by estimating how each hidden feature relates to each class. Empirical results suggest that these structured weight priors lead to more meaningful functional priors for image data. This contributes to the ongoing discussion on the importance of weight priors.

Submitted 12 July, 2020; originally announced July 2020.

Comments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning

arXiv:1905.06076 [pdf, other]

Expressive Priors in Bayesian Neural Networks: Kernel Combinations and Periodic Functions

Authors: Tim Pearce, Russell Tsuchida, Mohamed Zaki, Alexandra Brintrup, Andy Neely

Abstract: A simple, flexible approach to creating expressive priors in Gaussian process (GP) models makes new kernels from a combination of basic kernels, e.g. summing a periodic and linear kernel can capture seasonal variation with a long term trend. Despite a well-studied link between GPs and Bayesian neural networks (BNNs), the BNN analogue of this has not yet been explored. This paper derives BNN architectures mirroring such kernel combinations. Furthermore, it shows how BNNs can produce periodic kernels, which are often useful in this context. These ideas provide a principled approach to designing BNNs that incorporate prior knowledge about a function. We showcase the practical value of these ideas with illustrative experiments in supervised and reinforcement learning settings.

Submitted 28 June, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

Journal ref: The 35th Conference on Uncertainty in Artificial Intelligence (UAI 2019)

arXiv:1810.05546 [pdf, other]

Uncertainty in Neural Networks: Approximately Bayesian Ensembling

Authors: Tim Pearce, Felix Leibfried, Alexandra Brintrup, Mohamed Zaki, Andy Neely

Abstract: Understanding the uncertainty of a neural network's (NN) predictions is essential for many purposes. The Bayesian framework provides a principled approach to this, however applying it to NNs is challenging due to large numbers of parameters and data. Ensembling NNs provides an easily implementable, scalable method for uncertainty quantification, however, it has been criticised for not being Bayesian. This work proposes one modification to the usual process that we argue does result in approximate Bayesian inference; regularising parameters about values drawn from a distribution which can be set equal to the prior. A theoretical analysis of the procedure in a simplified setting suggests the recovered posterior is centred correctly but tends to have an underestimated marginal variance, and overestimated correlation. However, two conditions can lead to exact recovery. We argue that these conditions are partially present in NNs. Empirical evaluations demonstrate it has an advantage over standard ensembling, and is competitive with variational methods.

Submitted 26 February, 2020; v1 submitted 12 October, 2018; originally announced October 2018.

Comments: Please cite as published in AISTATS 2020

Journal ref: The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020

Showing 1–33 of 33 results for author: Brintrup, A