Causal Concept Embedding Models: Beyond Causal Opacity in Deep Learning

Authors: Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga, Alberto Termine, Martin Gjoreski, Giuseppe Marra, Marc Langheinrich

Abstract: Causal opacity denotes the difficulty in understanding the "hidden" causal structure underlying a deep neural network's (DNN) reasoning. This leads to the inability to rely on and verify state-of-the-art DNN-based systems especially in high-stakes scenarios. For this reason, causal opacity represents a key open challenge at the intersection of deep learning, interpretability, and causality. This work addresses this gap by introducing Causal Concept Embedding Models (Causal CEMs), a class of interpretable models whose decision-making process is causally transparent by design. The results of our experiments show that Causal CEMs can: (i) match the generalization performance of causally-opaque models, (ii) support the analysis of interventional and counterfactual scenarios, thereby improving the model's causal interpretability and supporting the effective verification of its reliability and fairness, and (iii) enable human-in-the-loop corrections to mispredicted intermediate reasoning steps, boosting not just downstream accuracy after corrections but also accuracy of the explanation provided for a specific instance.

Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.09521 [pdf, other]

Towards a fully declarative neuro-symbolic language

Authors: Tilman Hinnerichs, Robin Manhaeve, Giuseppe Marra, Sebastijan Dumancic

Abstract: Neuro-symbolic systems (NeSy), which claim to combine the best of both learning and reasoning capabilities of artificial intelligence, are missing a core property of reasoning systems: Declarativeness. The lack of declarativeness is caused by the functional nature of neural predicates inherited from neural networks. We propose and implement a general framework for fully declarative neural predicates, which hence extends to fully declarative NeSy frameworks. We first show that the declarative extension preserves the learning and reasoning capabilities while being able to answer arbitrary queries while only being trained on a single query type.

Submitted 1 July, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.04127 [pdf, other]

doi 10.14722/spacesec.2024.23033

On the Feasibility of CubeSats Application Sandboxing for Space Missions

Authors: Gabriele Marra, Ulysse Planta, Philipp Wüstenberg, Ali Abbasi

Abstract: This paper details our journey in designing and selecting a suitable application sandboxing mechanism for a satellite under development, with a focus on small satellites. Central to our study is the development of selection criteria for sandboxing and assessing its appropriateness for our satellite payload. We also test our approach on two already operational satellites, Suchai and SALSAT, to validate its effectiveness. These experiments highlight the practicality and efficiency of our chosen sandboxing method for real-world space systems. Our results provide insights and highlight the challenges involved in integrating application sandboxing in the space sector.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 8 pages, 5 figures, accepted to SpaceSec Workshop 2024 and to be published as post-conference proceedings with NDSS 2024

arXiv:2403.18756 [pdf]

Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray

Authors: Guglielmo Gallone, Francesco Iodice, Alberto Presta, Davide Tore, Ovidio de Filippo, Michele Visciano, Carlo Alberto Barbano, Alessandro Serafini, Paola Gorrini, Alessandro Bruno, Walter Grosso Marra, James Hughes, Mario Iannaccone, Paolo Fonio, Attilio Fiandrotti, Alessandro Depaoli, Marco Grangetto, Gaetano Maria de Ferrari, Fabrizio D'Ascenzo

Abstract: Aims. To develop a deep-learning based system for recognition of subclinical atherosclerosis on a plain frontal chest x-ray. Methods and Results. A deep-learning algorithm to predict coronary artery calcium (CAC) score (the AI-CAC model) was developed on 460 chest x-ray (80% training cohort, 20% internal validation cohort) of primary prevention patients (58.4% male, median age 63 [51-74] years) with available paired chest x-ray and chest computed tomography (CT) indicated for any clinical reason and performed within 3 months. The CAC score calculated on chest CT was used as ground truth. The model was validated on a temporally-independent cohort of 90 patients from the same institution (external validation). The diagnostic accuracy of the AI-CAC model assessed by the area under the curve (AUC) was the primary outcome. Overall, median AI-CAC score was 35 (0-388) and 28.9% patients had no AI-CAC. AUC of the AI-CAC model to identify a CAC>0 was 0.90 in the internal validation cohort and 0.77 in the external validation cohort. Sensitivity was consistently above 92% in both cohorts. In the overall cohort (n=540), among patients with AI-CAC=0, a single ASCVD event occurred, after 4.3 years. Patients with AI-CAC>0 had significantly higher Kaplan Meier estimates for ASCVD events (13.5% vs. 3.4%, log-rank=0.013). Conclusion. The AI-CAC model seems to accurately detect subclinical atherosclerosis on chest x-ray with elevated sensitivity, and to predict ASCVD events with elevated negative predictive value. Adoption of the AI-CAC model to refine CV risk stratification or as an opportunistic screening tool requires prospective evaluation.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Submitted to European Heart Journal - Cardiovascular Imaging. Additional material also added. 44 pages (30 main paper, 14 additional material), 14 figures (5 main manuscript, 9 additional material)

arXiv:2402.01408 [pdf, other]

Climbing the Ladder of Interpretability with Counterfactual Concept Bottleneck Models

Authors: Gabriele Dominici, Pietro Barbiero, Francesco Giannini, Martin Gjoreski, Giuseppe Marra, Marc Langheinrich

Abstract: Current deep learning models are not designed to simultaneously address three fundamental questions: predict class labels to solve a given classification task (the "What?"), explain task predictions (the "Why?"), and imagine alternative scenarios that could result in different predictions (the "What if?"). The inability to answer these questions represents a crucial gap in deploying reliable AI agents, calibrating human trust, and deepening human-machine interaction. To bridge this gap, we introduce CounterFactual Concept Bottleneck Models (CF-CBMs), a class of models designed to efficiently address the above queries all at once without the need to run post-hoc searches. Our results show that CF-CBMs produce: accurate predictions (the "What?"), simple explanations for task predictions (the "Why?"), and interpretable counterfactuals (the "What if?"). CF-CBMs can also sample or estimate the most probable counterfactual to: (i) explain the effect of concept interventions on tasks, (ii) show users how to get a desired class label, and (iii) propose concept interventions via "task-driven" interventions.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2308.11991 [pdf, other]

Relational Concept Based Models

Authors: Pietro Barbiero, Francesco Giannini, Gabriele Ciravegna, Michelangelo Diligenti, Giuseppe Marra

Abstract: The design of interpretable deep learning models working in relational domains poses an open challenge: interpretable deep learning methods, such as Concept-Based Models (CBMs), are not designed to solve relational problems, while relational models are not as interpretable as CBMs. To address this problem, we propose Relational Concept-Based Models, a family of relational deep learning methods providing interpretable task predictions. Our experiments, ranging from image classification to link prediction in knowledge graphs, show that relational CBMs (i) match generalization performance of existing relational black-boxes (as opposed to non-relational CBMs), (ii) support the generation of quantified concept-based explanations, (iii) effectively respond to test-time interventions, and (iv) withstand demanding settings including out-of-distribution scenarios, limited training data regimes, and scarce concept supervisions.

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2304.14068 [pdf, other]

doi 10.5555/3618408.3618484

Interpretable Neural-Symbolic Concept Reasoning

Authors: Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Mateo Espinosa Zarlenga, Lucie Charlotte Magister, Alberto Tonda, Pietro Lio', Frederic Precioso, Mateja Jamnik, Giuseppe Marra

Abstract: Deep learning methods are highly accurate, yet their opaque decision process prevents them from earning full human trust. Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts. However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, thus questioning the interpretability of their decision process. To overcome this limitation, we propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings. In DCR, neural networks do not make task predictions directly, but they build syntactic rule structures using concept embeddings. DCR then executes these rules on meaningful concept truth degrees to provide a final interpretable and semantically-consistent prediction in a differentiable manner. Our experiments show that DCR: (i) improves up to +25% w.r.t. state-of-the-art interpretable concept-based models on challenging benchmarks (ii) discovers meaningful logic rules matching known ground truths even in the absence of concept supervision during training, and (iii), facilitates the generation of counterfactual examples providing the learnt rules as guidance.

Submitted 22 May, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:1801-1825, 2023

arXiv:2303.13566 [pdf, other]

Enhancing Embedding Representations of Biomedical Data using Logic Knowledge

Authors: Michelangelo Diligenti, Francesco Giannini, Stefano Fioravanti, Caterina Graziani, Moreno Falaschi, Giuseppe Marra

Abstract: Knowledge Graph Embeddings (KGE) have become a quite popular class of models specifically devised to deal with ontologies and graph structure data, as they can implicitly encode statistical dependencies between entities and relations in a latent space. KGE techniques are particularly effective for the biomedical domain, where it is quite common to deal with large knowledge graphs underlying complex interactions between biological and chemical objects. Recently in the literature, the PharmKG dataset has been proposed as one of the most challenging knowledge graph biomedical benchmarks, with hundreds of thousands of relational facts between genes, diseases and chemicals. Although KGEs can scale to very large relational domains, they generally fail at representing more complex relational dependencies between facts, like logic rules, which may be fundamental in complex experimental settings. In this paper, we exploit logic rules to enhance the embedding representations of KGEs on the PharmKG dataset. To this end, we adopt Relational Reasoning Network (R2N), a recently proposed neural-symbolic approach showing promising results on knowledge graph completion tasks. An R2N uses the available logic rules to build a neural architecture that reasons over KGE latent representations. In the experiments, we show that our approach is able to significantly improve the current state-of-the-art on the PharmKG dataset. Finally, we provide an ablation study to experimentally compare the effect of alternative sets of rules according to different selection criteria and varying the number of considered rules.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.04660 [pdf, other]

Neural Probabilistic Logic Programming in Discrete-Continuous Domains

Authors: Lennert De Smet, Pedro Zuidberg Dos Martires, Robin Manhaeve, Giuseppe Marra, Angelika Kimmig, Luc De Raedt

Abstract: Neural-symbolic AI (NeSy) allows neural networks to exploit symbolic background knowledge in the form of logic. It has been shown to aid learning in the limited data regime and to facilitate inference on out-of-distribution data. Probabilistic NeSy focuses on integrating neural networks with both logic and probability theory, which additionally allows learning under uncertainty. A major limitation of current probabilistic NeSy systems, such as DeepProbLog, is their restriction to finite probability distributions, i.e., discrete random variables. In contrast, deep probabilistic programming (DPP) excels in modelling and optimising continuous probability distributions. Hence, we introduce DeepSeaProbLog, a neural probabilistic logic programming language that incorporates DPP techniques into NeSy. Doing so results in the support of inference and learning of both discrete and continuous probability distributions under logical constraints. Our main contributions are 1) the semantics of DeepSeaProbLog and its corresponding inference algorithm, 2) a proven asymptotically unbiased learning algorithm, and 3) a series of experiments that illustrate the versatility of our approach.

Submitted 14 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: 27 pages, 9 figures

ACM Class: D.3.1; I.2.4; I.2.6

arXiv:2303.03226 [pdf, other]

Safe Reinforcement Learning via Probabilistic Logic Shields

Authors: Wen-Chi Yang, Giuseppe Marra, Gavin Rens, Luc De Raedt

Abstract: Safe Reinforcement learning (Safe RL) aims at learning optimal policies while staying safe. A popular solution to Safe RL is shielding, which uses a logical safety specification to prevent an RL agent from taking unsafe actions. However, traditional shielding techniques are difficult to integrate with continuous, end-to-end deep RL methods. To this end, we introduce Probabilistic Logic Policy Gradient (PLPG). PLPG is a model-based Safe RL technique that uses probabilistic logic programming to model logical safety constraints as differentiable functions. Therefore, PLPG can be seamlessly applied to any policy gradient algorithm while still providing the same convergence guarantees. In our experiments, we show that PLPG learns safer and more rewarding policies compared to other state-of-the-art shielding techniques.

Submitted 6 March, 2023; originally announced March 2023.

arXiv:2209.09056 [pdf, other]

Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off

Authors: Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Zohreh Shams, Frederic Precioso, Stefano Melacci, Adrian Weller, Pietro Lio, Mateja Jamnik

Abstract: Deploying AI-powered systems requires trustworthy models supporting effective human interactions, going beyond raw prediction accuracy. Concept bottleneck models promote trustworthiness by conditioning classification tasks on an intermediate level of human-like concepts. This enables human interventions which can correct mispredicted concepts to improve the model's performance. However, existing concept bottleneck models are unable to find optimal compromises between high task accuracy, robust concept-based explanations, and effective interventions on concepts -- particularly in real-world conditions where complete and accurate concept supervisions are scarce. To address this, we propose Concept Embedding Models, a novel family of concept bottleneck models which goes beyond the current accuracy-vs-interpretability trade-off by learning interpretable high-dimensional concept representations. Our experiments demonstrate that Concept Embedding Models (1) attain better or competitive task accuracy w.r.t. standard neural models without concepts, (2) provide concept representations capturing meaningful semantics including and beyond their ground truth labels, (3) support test-time concept interventions whose effect in test accuracy surpasses that in standard concept bottleneck models, and (4) scale to real-world conditions where complete concept supervisions are scarce.

Submitted 5 December, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: To appear at NeurIPS 2022

Report number: 35 MSC Class: 68T07 ACM Class: I.2.6

Journal ref: https://proceedings.neurips.cc/paper_files/paper/2022/hash/867c06823281e506e8059f5c13a57f75-Abstract-Conference.html

arXiv:2202.04178 [pdf, other]

VAEL: Bridging Variational Autoencoders and Probabilistic Logic Programming

Authors: Eleonora Misino, Giuseppe Marra, Emanuele Sansone

Abstract: We present VAEL, a neuro-symbolic generative model integrating variational autoencoders (VAE) with the reasoning capabilities of probabilistic logic (L) programming. Besides standard latent subsymbolic variables, our model exploits a probabilistic logic program to define a further structured representation, which is used for logical reasoning. The entire process is end-to-end differentiable. Once trained, VAEL can solve new unseen generation tasks by (i) leveraging the previously acquired knowledge encoded in the neural component and (ii) exploiting new logical programs on the structured latent space. Our experiments provide support on the benefits of this neuro-symbolic integration both in terms of task generalization and data efficiency. To the best of our knowledge, this work is the first to propose a general-purpose end-to-end framework integrating probabilistic logic programming into a deep generative model.

Submitted 25 May, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

arXiv:2108.11451 [pdf, other]

From Statistical Relational to Neurosymbolic Artificial Intelligence: a Survey

Authors: Giuseppe Marra, Sebastijan Dumančić, Robin Manhaeve, Luc De Raedt

Abstract: This survey explores the integration of learning and reasoning in two different fields of artificial intelligence: neurosymbolic and statistical relational artificial intelligence. Neurosymbolic artificial intelligence (NeSy) studies the integration of symbolic reasoning and neural networks, while statistical relational artificial intelligence (StarAI) focuses on integrating logic with probabilistic graphical models. This survey identifies seven shared dimensions between these two subfields of AI. These dimensions can be used to characterize different NeSy and StarAI systems. They are concerned with (1) the approach to logical inference, whether model or proof-based; (2) the syntax of the used logical theories; (3) the logical semantics of the systems and their extensions to facilitate learning; (4) the scope of learning, encompassing either parameter or structure learning; (5) the presence of symbolic and subsymbolic representations; (6) the degree to which systems capture the original logic, probabilistic, and neural paradigms; and (7) the classes of learning tasks the systems are applied to. By positioning various NeSy and StarAI systems along these dimensions and pointing out similarities and differences between them, this survey contributes fundamental concepts for understanding the integration of learning and reasoning.

Submitted 2 January, 2024; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: To appear in Artificial Intelligence. Shorter version at IJCAI 2020 survey track, https://www.ijcai.org/proceedings/2020/0688.pdf

arXiv:2106.12574 [pdf, other]

DeepStochLog: Neural Stochastic Logic Programming

Authors: Thomas Winters, Giuseppe Marra, Robin Manhaeve, Luc De Raedt

Abstract: Recent advances in neural symbolic learning, such as DeepProbLog, extend probabilistic logic programs with neural predicates. Like graphical models, these probabilistic logic programs define a probability distribution over possible worlds, for which inference is computationally hard. We propose DeepStochLog, an alternative neural symbolic framework based on stochastic definite clause grammars, a type of stochastic logic program, which defines a probability distribution over possible derivations. More specifically, we introduce neural grammar rules into stochastic definite clause grammars to create a framework that can be trained end-to-end. We show that inference and learning in neural stochastic logic programming scale much better than for neural probabilistic logic programs. Furthermore, the experimental evaluation shows that DeepStochLog achieves state-of-the-art results on challenging neural symbolic learning tasks.

Submitted 23 June, 2021; originally announced June 2021.

Comments: Thomas Winters and Giuseppe Marra contributed equally to this work

MSC Class: 68T27; 68T37; 68T07 ACM Class: I.2.6; I.2.5; I.2.3

arXiv:2106.00393 [pdf, other]

Relational Reasoning Networks

Authors: Giuseppe Marra, Michelangelo Diligenti, Francesco Giannini

Abstract: Neuro-symbolic methods integrate neural architectures, knowledge representation and reasoning. However, they have been struggling at both dealing with the intrinsic uncertainty of the observations and scaling to real-world applications. This paper presents Relational Reasoning Networks (R2N), a novel end-to-end model that performs relational reasoning in the latent space of a deep learner architecture, where the representations of constants, ground atoms and their manipulations are learned in an integrated fashion. Unlike flat architectures like Knowledge Graph Embedders, which can only represent relations between entities, R2Ns define an additional computational structure, accounting for higher-level relations among the ground atoms. The considered relations can be explicitly known, like the ones defined by logic formulas, or defined as unconstrained correlations among groups of ground atoms. R2Ns can be applied to purely symbolic tasks or as a neuro-symbolic platform to integrate learning and reasoning in heterogeneous problems with both symbolic and feature-based represented entities. The proposed model overtakes the limitations of previous neuro-symbolic methods that have been either limited in terms of scalability or expressivity. The proposed methodology is shown to achieve state-of-the-art results in different experimental settings.

Submitted 30 January, 2023; v1 submitted 1 June, 2021; originally announced June 2021.

arXiv:2009.12600 [pdf, other]

Online Learning of Non-Markovian Reward Models

Authors: Gavin Rens, Jean-François Raskin, Raphaël Reynouad, Giuseppe Marra

Abstract: There are situations in which an agent should receive rewards only after having accomplished a series of previous tasks, that is, rewards are non-Markovian. One natural and quite general way to represent history-dependent rewards is via a Mealy machine, a finite state automaton that produces output sequences from input sequences. In our formal setting, we consider a Markov decision process (MDP) that models the dynamics of the environment in which the agent evolves and a Mealy machine synchronized with this MDP to formalize the non-Markovian reward function. While the MDP is known by the agent, the reward function is unknown to the agent and must be learned. Our approach to overcome this challenge is to use Angluin's $L^*$ active learning algorithm to learn a Mealy machine representing the underlying non-Markovian reward machine (MRM). Formal methods are used to determine the optimal strategy for answering so-called membership queries posed by $L^*$. Moreover, we prove that the expected reward achieved will eventually be at least as much as a given, reasonable value provided by a domain expert. We evaluate our framework on three problems. The results show that using $L^*$ to learn an MRM in a non-Markovian reward decision process is effective.

Submitted 30 September, 2020; v1 submitted 26 September, 2020; originally announced September 2020.

Comments: 24 pages, single column, 7 figures. arXiv admin note: substantial text overlap with arXiv:2001.09293

arXiv:2005.02392 [pdf, other]

doi 10.1109/TPAMI.2021.3073504

Deep Constraint-based Propagation in Graph Neural Networks

Authors: Matteo Tiezzi, Giuseppe Marra, Stefano Melacci, Marco Maggini

Abstract: The popularity of deep learning techniques renewed the interest in neural architectures able to process complex structures that can be represented using graphs, inspired by Graph Neural Networks (GNNs). We focus our attention on the originally proposed GNN model of Scarselli et al. 2009, which encodes the state of the nodes of the graph by means of an iterative diffusion procedure that, during the learning stage, must be computed at every epoch, until the fixed point of a learnable state transition function is reached, propagating the information among the neighbouring nodes. We propose a novel approach to learning in GNNs, based on constrained optimization in the Lagrangian framework. Learning both the transition function and the node states is the outcome of a joint process, in which the state convergence procedure is implicitly expressed by a constraint satisfaction mechanism, avoiding iterative epoch-wise procedures and the network unfolding. Our computational structure searches for saddle points of the Lagrangian in the adjoint space composed of weights, nodes state variables and Lagrange multipliers. This process is further enhanced by multiple layers of constraints that accelerate the diffusion process. An experimental analysis shows that the proposed approach compares favourably with popular models on several benchmarks.

Submitted 1 September, 2021; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: text overlap with arXiv:2002.07684

arXiv:2003.08316 [pdf, ps, other]

From Statistical Relational to Neuro-Symbolic Artificial Intelligence

Authors: Luc De Raedt, Sebastijan Dumančić, Robin Manhaeve, Giuseppe Marra

Abstract: Neuro-symbolic and statistical relational artificial intelligence both integrate frameworks for learning with logical reasoning. This survey identifies several parallels across seven different dimensions between these two fields. These can not only be used to characterize and position neuro-symbolic artificial intelligence approaches but also to identify a number of directions for further research.

Submitted 24 March, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

arXiv:2002.07720 [pdf, other]

Local Propagation in Constraint-based Neural Network

Authors: Giuseppe Marra, Matteo Tiezzi, Stefano Melacci, Alessandro Betti, Marco Maggini, Marco Gori

Abstract: In this paper we study a constraint-based representation of neural network architectures. We cast the learning problem in the Lagrangian framework and we investigate a simple optimization procedure that is well suited to fulfil the so-called architectural constraints, learning from the available supervisions. The computational structure of the proposed Local Propagation (LP) algorithm is based on the search for saddle points in the adjoint space composed of weights, neural outputs, and Lagrange multipliers. All the updates of the model variables are locally performed, so that LP is fully parallelizable over the neural units, circumventing the classic problem of gradient vanishing in deep networks. The implementation of popular neural models is described in the context of LP, together with those conditions that trace a natural connection with Backpropagation. We also investigate the setting in which we tolerate bounded violations of the architectural constraints, and we provide experimental evidence that LP is a feasible approach to train shallow and deep networks, opening the road to further investigations on more complex architectures, easily describable by constraints.

Submitted 17 April, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

arXiv:2002.07684 [pdf, ps, other]

A Lagrangian Approach to Information Propagation in Graph Neural Networks

Authors: Matteo Tiezzi, Giuseppe Marra, Stefano Melacci, Marco Maggini, Marco Gori

Abstract: In many real world applications, data are characterized by a complex structure, that can be naturally encoded as a graph. In the last years, the popularity of deep learning techniques has renewed the interest in neural models able to process complex patterns. In particular, inspired by the Graph Neural Network (GNN) model, different architectures have been proposed to extend the original GNN scheme. GNNs exploit a set of state variables, each assigned to a graph node, and a diffusion mechanism of the states among neighbor nodes, to implement an iterative procedure to compute the fixed point of the (learnable) state transition function. In this paper, we propose a novel approach to the state computation and the learning algorithm for GNNs, based on a constraint optimisation task solved in the Lagrangian framework. The state convergence procedure is implicitly expressed by the constraint satisfaction mechanism and does not require a separate iterative phase for each epoch of the learning procedure. In fact, the computational structure is based on the search for saddle points of the Lagrangian in the adjoint space composed of weights, neural outputs (node states), and Lagrange multipliers. The proposed approach is compared experimentally with other popular models for processing graphs.

Submitted 17 April, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

arXiv:2002.02193 [pdf, other]

Relational Neural Machines

Authors: Giuseppe Marra, Michelangelo Diligenti, Francesco Giannini, Marco Gori, Marco Maggini

Abstract: Deep learning has been shown to achieve impressive results in several tasks where a large amount of training data is available. However, deep learning solely focuses on the accuracy of the predictions, neglecting the reasoning process leading to a decision, which is a major issue in life-critical applications. Probabilistic logic reasoning allows to exploit both statistical regularities and specific domain expertise to perform reasoning under uncertainty, but its scalability and brittle integration with the layers processing the sensory data have greatly limited its applications. For these reasons, combining deep architectures and probabilistic logic reasoning is a fundamental goal towards the development of intelligent agents operating in complex environments. This paper presents Relational Neural Machines, a novel framework allowing to jointly train the parameters of the learners and of a First-Order Logic based reasoner. A Relational Neural Machine is able to recover both classical learning from supervised data in case of pure sub-symbolic learning, and Markov Logic Networks in case of pure symbolic reasoning, while allowing to jointly train and perform inference in hybrid learning tasks. Proper algorithmic solutions are devised to make learning and inference tractable in large-scale problems. The experiments show promising results in different relational tasks.

Submitted 6 February, 2020; originally announced February 2020.

arXiv:1909.05367 [pdf, other]

doi 10.1109/TNNLS.2019.2955597

Learning in Text Streams: Discovery and Disambiguation of Entity and Relation Instances

Authors: Marco Maggini, Giuseppe Marra, Stefano Melacci, Andrea Zugarini

Abstract: We consider a scenario where an artificial agent is reading a stream of text composed of a set of narrations, and it is informed about the identity of some of the individuals that are mentioned in the text portion that is currently being read. The agent is expected to learn to follow the narrations, thus disambiguating mentions and discovering new individuals. We focus on the case in which individuals are entities and relations, and we propose an end-to-end trainable memory network that learns to discover and disambiguate them in an online manner, performing one-shot learning, and dealing with a small number of sparse supervisions. Our system builds a not-given-in-advance knowledge base, and it improves its skills while reading unsupervised text. The model deals with abrupt changes in the narration, taking into account their effects when resolving co-references. We showcase the strong disambiguation and discovery skills of our model on a corpus of Wikipedia documents and on a newly introduced dataset, that we make publicly available.

Submitted 27 April, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

arXiv:1908.01819 [pdf, other]

doi 10.1007/978-3-030-01424-7_13

An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning

Authors: Giuseppe Marra, Andrea Zugarini, Stefano Melacci, Marco Maggini

Abstract: In the last few years, neural networks have been intensively used to develop meaningful distributed representations of words and contexts around them. When these representations, also known as "embeddings", are learned from unsupervised large corpora, they can be transferred to different tasks with positive effects in terms of performances, especially when only a few supervisions are available. In this work, we further extend this concept, and we present an unsupervised neural architecture that jointly learns word and context embeddings, processing words as sequences of characters. This allows our model to spot the regularities that are due to the word morphology, and to avoid the need of a fixed-sized input vocabulary of words. We show that we can learn compact encoders that, despite the relatively small number of parameters, reach high-level performances in downstream tasks, comparing them with related state-of-the-art approaches or with fully supervised methods.

Submitted 19 July, 2019; originally announced August 2019.

Journal ref: Lecture Notes in Computer Science, vol 11141. Springer, Cham 2018

arXiv:1907.11468 [pdf, other]

doi 10.1007/s10489-022-04383-6

T-Norms Driven Loss Functions for Machine Learning

Authors: Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Marco Maggini, Marco Gori

Abstract: Neural-symbolic approaches have recently gained popularity to inject prior knowledge into a learner without requiring it to induce this knowledge from data. These approaches can potentially learn competitive solutions with a significant reduction of the amount of supervised data. A large class of neural-symbolic approaches is based on First-Order Logic to represent prior knowledge, relaxed to a differentiable form using fuzzy logic. This paper shows that the loss function expressing these neural-symbolic learning tasks can be unambiguously determined given the selection of a t-norm generator. When restricted to supervised learning, the presented theoretical apparatus provides a clean justification to the popular cross-entropy loss, which has been shown to provide faster convergence and to reduce the vanishing gradient problem in very deep structures. However, the proposed learning formulation extends the advantages of the cross-entropy loss to the general knowledge that can be represented by a neural-symbolic method. Therefore, the methodology allows the development of a novel class of loss functions, which are shown in the experimental results to lead to faster convergence rates than the approaches previously proposed in the literature.

Submitted 15 February, 2023; v1 submitted 26 July, 2019; originally announced July 2019.

Journal ref: Applied Intelligence 2023, Springer

arXiv:1907.07904 [pdf, other]

On the relation between Loss Functions and T-Norms

Authors: Francesco Giannini, Giuseppe Marra, Michelangelo Diligenti, Marco Maggini, Marco Gori

Abstract: Deep learning has been shown to achieve impressive results in several domains like computer vision and natural language processing. A key element of this success has been the development of new loss functions, like the popular cross-entropy loss, which has been shown to provide faster convergence and to reduce the vanishing gradient problem in very deep structures. While the cross-entropy loss is usually justified from a probabilistic perspective, this paper shows an alternative and more direct interpretation of this loss in terms of t-norms and their associated generator functions, and derives a general relation between loss functions and t-norms. In particular, the presented work shows intriguing results leading to the development of a novel class of loss functions. These losses can be exploited in any supervised learning task and could lead to faster convergence rates than the commonly employed cross-entropy loss.

Submitted 18 July, 2019; originally announced July 2019.

arXiv:1905.13462 [pdf, other]

Neural Markov Logic Networks

Authors: Giuseppe Marra, Ondřej Kuželka

Abstract: We introduce neural Markov logic networks (NMLNs), a statistical relational learning system that borrows ideas from Markov logic. Like Markov logic networks (MLNs), NMLNs are an exponential-family model for modelling distributions over possible worlds, but unlike MLNs, they do not rely on explicitly specified first-order logic rules. Instead, NMLNs learn an implicit representation of such rules as a neural network that acts as a potential function on fragments of the relational structure. Similarly to many neural symbolic methods, NMLNs can exploit embeddings of constants but, unlike them, NMLNs work well also in their absence. This is extremely important for predicting in settings other than the transductive one. We showcase the potential of NMLNs on knowledge-base completion, triple classification and on generation of molecular (graph) data.

Submitted 22 October, 2020; v1 submitted 31 May, 2019; originally announced May 2019.

arXiv:1903.07534 [pdf, other]

LYRICS: a General Interface Layer to Integrate Logic Inference and Deep Learning

Authors: Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Marco Gori

Abstract: In spite of the amazing results obtained by deep learning in many applications, a real intelligent behavior of an agent acting in a complex environment is likely to require some kind of higher-level symbolic inference. Therefore, there is a clear need for the definition of a general and tight integration between low-level tasks, processing sensorial data that can be effectively elaborated using deep learning techniques, and the logic reasoning that allows humans to take decisions in complex environments. This paper presents LYRICS, a generic interface layer for AI, which is implemented in TensorFlow (TF). LYRICS provides an input language that allows to define arbitrary First Order Logic (FOL) background knowledge. The predicates and functions of the FOL knowledge can be bound to any TF computational graph, and the formulas are converted into a set of real-valued constraints, which participate to the overall optimization problem. This allows to learn the weights of the learners, under the constraints imposed by the prior knowledge. The framework is extremely general as it imposes no restrictions in terms of which models or knowledge can be integrated. In this paper, we show the generality of the approach showing some use cases of the presented language, including model checking, supervised learning and collective classification.

Submitted 12 September, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

Comments: To appear in proceedings of ECML PKDD 2019

arXiv:1901.04195 [pdf, other]

Integrating Learning and Reasoning with Deep Logic Models

Authors: Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Marco Gori

Abstract: Deep learning is very effective at jointly learning feature representations and classification models, especially when dealing with high dimensional input patterns. Probabilistic logic reasoning, on the other hand, is capable to take consistent and robust decisions in complex environments. The integration of deep learning and logic reasoning is still an open-research problem and it is considered to be the key for the development of real intelligent agents. This paper presents Deep Logic Models, which are deep graphical models integrating deep learning and logic reasoning both for learning and inference. Deep Logic Models create an end-to-end differentiable architecture, where deep learners are embedded into a network implementing a continuous relaxation of the logic knowledge. The learning process allows to jointly learn the weights of the deep learners and the meta-parameters controlling the high-level reasoning. The experimental results show that the proposed methodology overtakes the limitations of the other approaches that have been proposed to bridge deep learning and reasoning.

Submitted 14 January, 2019; originally announced January 2019.

arXiv:1808.06934 [pdf, ps, other]

Backpropagation and Biological Plausibility

Authors: Alessandro Betti, Marco Gori, Giuseppe Marra

Abstract: By and large, Backpropagation (BP) is regarded as one of the most important neural computation algorithms at the basis of the progress in machine learning, including the recent advances in deep learning. However, its computational structure has been the source of many debates on its arguable biological plausibility. In this paper, it is shown that when framing supervised learning in the Lagrangian framework, while one can see a natural emergence of Backpropagation, biologically plausible local algorithms can also be devised that are based on the search for saddle points in the learning adjoint space composed of weights, neural outputs, and Lagrangian multipliers. This might open the doors to a truly novel class of learning algorithms where, because of the introduction of the notion of support neurons, the optimization scheme also plays a fundamental role in the construction of the architecture.

Submitted 21 August, 2018; originally announced August 2018.

arXiv:1807.09202 [pdf, other]

doi 10.1007/978-3-030-30508-6_45

Constraint-Based Visual Generation

Authors: Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Marco Gori

Abstract: In the last few years the systematic adoption of deep learning to visual generation has produced impressive results that, amongst others, definitely benefit from the massive exploration of convolutional architectures. In this paper, we propose a general approach to visual generation that combines learning capabilities with logic descriptions of the target to be generated. The process of generation is regarded as a constrained satisfaction problem, where the constraints describe a set of properties that characterize the target. Interestingly, the constraints can also involve logic variables, while all of them are converted into real-valued functions by means of the t-norm theory. We use deep architectures to model the involved variables, and propose a computational scheme where the learning process carries out a satisfaction of the constraints. We propose some examples in which the theory can naturally be used, including the modeling of GAN and auto-encoders, and report promising results in problems with the generation of handwritten characters and face transformations.

Submitted 24 September, 2019; v1 submitted 16 July, 2018; originally announced July 2018.

arXiv:1807.06302 [pdf, other]

Learning Neuron Non-Linearities with Kernel-Based Deep Neural Networks

Authors: Giuseppe Marra, Dario Zanca, Alessandro Betti, Marco Gori

Abstract: The effectiveness of deep neural architectures has been widely supported in terms of both experimental and foundational principles. There is also clear evidence that the activation function (e.g. the rectifier and the LSTM units) plays a crucial role in the complexity of learning. Based on this remark, this paper discusses an optimal selection of the neuron non-linearity in a functional framework that is inspired from classic regularization arguments. It is shown that the best activation function is represented by a kernel expansion in the training set, that can be effectively approximated over an opportune set of points modeling 1-D clusters. The idea can be naturally extended to recurrent networks, where the expressiveness of kernel-based activation functions turns out to be a crucial ingredient to capture long-term dependencies. We give experimental evidence of this property by a set of challenging experiments, where we compare the results with neural architectures based on state of the art LSTM cells.

Submitted 5 October, 2018; v1 submitted 17 July, 2018; originally announced July 2018.

arXiv:1611.08981 [pdf]

Online tools for public engagement: case studies from Reykjavik

Authors: Iva Bojic, Giulia Marra, Vera Naydenova

Abstract: With the ubiquity of Internet technologies and growing demands for transparency and open data policies, the role of social networking and online deliberation tools for public engagement in decision-making has increased substantially in the last decades. In this paper, we present the analysis of how social media are used by different public bodies to enhance public participation in deliberative democracy. We collected and reviewed published information on the subject and carried out a field base assessment, involving structured interviews with different government representatives and urban policymakers. In order to compare collected data, we used a framework for systematic analysis and comparison of e-participation platforms called the participatory cube. The results we got were the following. Participatory decision-making on matters of public concern justly consumes time and resources, therefore online tools should be applied with consideration of scale and efficiency, i.e. on burning issues for a majority of citizens or small-scale local platforms, and in combination with meetings in real time and space. The budget and workforce allocated to managing online engagement tools should be proportionate to other political and administrative efforts to bring to execution proposed ideas and act on collected feedback in order to satisfy the needs expressed by the communities and not undermine their beliefs about their power to influence decisions.

Submitted 27 November, 2016; originally announced November 2016.

Showing 1–32 of 32 results for author: Marra, G