Search | arXiv e-print repository

Nash Equilibrium in Games on Graphs with Incomplete Preferences

Authors: Abhishek N. Kulkarni, Jie Fu, Ufuk Topcu

Abstract: Games with incomplete preferences are an important model for studying rational decision-making in scenarios where players face incomplete information about their preferences and must contend with incomparable outcomes. We study the problem of computing Nash equilibrium in a subclass of two-player games played on graphs where each player seeks to maximally satisfy their (possibly incomplete) preferences over a set of temporal goals. We characterize the Nash equilibrium and prove its existence in scenarios where player preferences are fully aligned, partially aligned, and completely opposite, in terms of the well-known solution concepts of sure winning and Pareto efficiency. When preferences are partially aligned, we derive conditions under which a player needs cooperation and demonstrate that the Nash equilibria depend not only on the preference alignment but also on whether the players need cooperation to achieve a better outcome and whether they are willing to cooperate. We illustrate the theoretical results by solving a mechanism design problem for a drone delivery scenario.

Submitted 11 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

Comments: 14 pages, 6 figures, under development

arXiv:2406.07556 [pdf]

Community Driven Approaches to Research in Technology & Society CCC Workshop Report

Authors: Suresh Venkatasubramanian, Timnit Gebru, Ufuk Topcu, Haley Griffin, Leah Namisa Rosenbloom, Nasim Sonboli

Abstract: Based on our workshop activities, we outlined three ways in which research can support community needs: (1) Mapping the ecosystem of both the players and ecosystem and harm landscapes, (2) Counter-Programming, which entails using the same surveillance tools that communities are subjected to in order to observe the entities doing the surveilling, effectively protecting people from surveillance, and conducting ethical data collection to measure the impact of these technologies, and (3) Engaging in positive visions and tools for empowerment so that technology can bring good instead of harm. In order to effectively collaborate on the aforementioned directions, we outlined seven important mechanisms for effective collaboration: (1) Never expect free labor of community members, (2) Ensure goals are aligned between all collaborators, (3) Elevate community members to leadership positions, (4) Understand no group is a monolith, (5) Establish a common language, (6) Discuss organization roles and goals of the project transparently from the start, and (7) Enable a recourse for harm. We recommend that anyone engaging in community-based research (1) starts with community-defined solutions, (2) provides alternatives to digital services/information collecting mechanisms, (3) prohibits harmful automated systems, (4) transparently states any system's impact, (5) minimizes and protects data, (6) proactively demonstrates a system is safe and beneficial prior to deployment, and (7) provides resources directly to community partners. Throughout the recommendation section of the report, we also provide specific recommendations for funding agencies, academic institutions, and individual researchers.

Submitted 21 March, 2024; originally announced June 2024.

arXiv:2406.03565 [pdf, other]

Second-Order Algorithms for Finding Local Nash Equilibria in Zero-Sum Games

Authors: Kushagra Gupta, Xinjie Liu, Ufuk Topcu, David Fridovich-Keil

Abstract: Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors. To overcome this challenge, algorithms must account for subtleties involving the curvatures of players' costs. To this end, we leverage dynamical system theory and develop a second-order algorithm for finding a local Nash equilibrium in the smooth, possibly nonconvex-nonconcave, zero-sum game setting. First, we prove that this novel method guarantees convergence to only local Nash equilibria with a local linear convergence rate. We then interpret a version of this method as a modified Gauss-Newton algorithm with local superlinear convergence to the neighborhood of a point that satisfies first-order local Nash equilibrium conditions. In comparison, current related state-of-the-art methods do not offer convergence rate guarantees. Furthermore, we show that this approach naturally generalizes to settings with convex and potentially coupled constraints while retaining earlier guarantees of convergence to only local (generalized) Nash equilibria.

Submitted 5 June, 2024; originally announced June 2024.
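
The abstract above distinguishes local Nash equilibria from other stationary points by the curvature of the players' costs. The following minimal sketch illustrates only that distinction; it is not the authors' second-order algorithm. It assumes scalar decision variables, a player 1 who minimizes f over x and a player 2 who maximizes f over y, and finite-difference derivatives. The function names and tolerances are hypothetical.

```python
def grad_and_curvature(f, x, y, h=1e-4):
    """Central finite differences for a scalar-valued f(x, y) with scalar x, y."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    return fx, fy, fxx, fyy


def is_strict_local_nash(f, x, y, tol=1e-6):
    """Second-order sufficient check: player 1 minimizes f in x, player 2 maximizes f in y."""
    fx, fy, fxx, fyy = grad_and_curvature(f, x, y)
    stationary = abs(fx) < tol and abs(fy) < tol
    # Strictly convex in x and strictly concave in y at the candidate point.
    return stationary and fxx > tol and fyy < -tol


def saddle(x, y):
    # Convex in x, concave in y: the origin is a local Nash equilibrium.
    return x**2 - y**2


def reversed_saddle(x, y):
    # Concave in x, convex in y: the origin is stationary but not Nash.
    return -x**2 + y**2


print(is_strict_local_nash(saddle, 0.0, 0.0))
print(is_strict_local_nash(reversed_saddle, 0.0, 0.0))
```

Both test functions have zero gradient at the origin, so a purely first-order method cannot tell them apart; only the curvature signs separate the Nash point from the non-Nash one.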

arXiv:2405.14173 [pdf, other]

Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication

Authors: Shenghui Chen, Daniel Fried, Ufuk Topcu

Abstract: Developing autonomous agents that can strategize and cooperate with humans under information asymmetry is challenging without effective communication in natural language. We introduce a shared-control game, where two players collectively control a token in alternating turns to achieve a common objective under incomplete information. We formulate a policy synthesis problem for an autonomous agent in this game with a human as the other player. To solve this problem, we propose a communication-based approach comprising a language module and a planning module. The language module translates natural language messages into and from a finite set of flags, a compact representation defined to capture player intents. The planning module leverages these flags to compute a policy using an asymmetric information-set Monte Carlo tree search with flag exchange algorithm we present. We evaluate the effectiveness of this approach in a testbed based on Gnomes at Night, a search-and-find maze board game. Results of human subject experiments show that communication narrows the information gap between players and enhances human-agent cooperation efficiency with fewer turns.

Submitted 1 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: with appendix

arXiv:2405.08954 [pdf, other]

Zero-Shot Transfer of Neural ODEs

Authors: Tyler Ingebrand, Adam J. Thorpe, Ufuk Topcu

Abstract: Autonomous systems often encounter environments and scenarios beyond the scope of their training data, which underscores a critical challenge: the need to generalize and adapt to unseen scenarios in real time. This challenge necessitates new mathematical and algorithmic tools that enable adaptation and zero-shot transfer. To this end, we leverage the theory of function encoders, which enables zero-shot transfer by combining the flexibility of neural networks with the mathematical principles of Hilbert spaces. Using this theory, we first present a method for learning a space of dynamics spanned by a set of neural ODE basis functions. After training, the proposed approach can rapidly identify dynamics in the learned space using an efficient inner product calculation. Critically, this calculation requires no gradient calculations or retraining during the online phase. This method enables zero-shot transfer for autonomous systems at runtime and opens the door for a new class of adaptable control algorithms. We demonstrate state-of-the-art system modeling accuracy for two MuJoCo robot environments and show that the learned models can be used for more efficient MPC control of a quadrotor.

Submitted 14 May, 2024; originally announced May 2024.
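
The abstract above describes identifying a function in a space spanned by basis functions through an inner product calculation, with no gradient steps. The sketch below illustrates that idea under strong simplifying assumptions: a fixed orthonormal trigonometric basis stands in for the learned neural ODE basis functions, the inner product is a Riemann sum on a uniform grid, and all names (`BASIS`, `encode`, `decode`, `target`) are hypothetical rather than taken from the paper.

```python
import math

N = 256  # number of uniform grid points on [0, 2*pi)
XS = [2 * math.pi * j / N for j in range(N)]

# Fixed orthonormal basis on [0, 2*pi); a stand-in for learned basis functions.
BASIS = [
    lambda x: math.sin(x) / math.sqrt(math.pi),
    lambda x: math.cos(x) / math.sqrt(math.pi),
    lambda x: math.sin(2 * x) / math.sqrt(math.pi),
    lambda x: math.cos(2 * x) / math.sqrt(math.pi),
]


def inner_product(f, g):
    """Riemann-sum approximation of the L2 inner product on [0, 2*pi)."""
    return (2 * math.pi / N) * sum(f(x) * g(x) for x in XS)


def encode(f):
    """Coefficients of f in the basis, one inner product per basis function."""
    return [inner_product(f, g) for g in BASIS]


def decode(coeffs, x):
    """Evaluate the weighted combination of basis functions at x."""
    return sum(c * g(x) for c, g in zip(coeffs, BASIS))


def target(x):
    # An unseen function that happens to lie in the span of the basis.
    return 2.0 * math.sin(x) + 0.5 * math.cos(2 * x)


coeffs = encode(target)
max_error = max(abs(decode(coeffs, x) - target(x)) for x in XS)
print(coeffs, max_error)
```

Because the basis here is orthonormal, the coefficients follow directly from inner products. A learned basis need not be orthonormal, in which case a least-squares solve would be the natural substitute.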

arXiv:2404.00923 [pdf, other]

MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements

Authors: Lisong C. Sun, Neel P. Bhatt, Jonathan C. Liu, Zhiwen Fan, Zhangyang Wang, Todd E. Humphreys, Ufuk Topcu

Abstract: Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. Project Webpage: https://vita-group.github.io/MM3DGS-SLAM

Submitted 1 April, 2024; originally announced April 2024.

Comments: Project Webpage: https://vita-group.github.io/MM3DGS-SLAM

arXiv:2403.17233 [pdf, other]

Active Learning of Dynamics Using Prior Domain Knowledge in the Sampling Process

Authors: Kevin S. Miller, Adam J. Thorpe, Ufuk Topcu

Abstract: We present an active learning algorithm for learning dynamics that leverages side information by explicitly incorporating prior domain knowledge into the sampling process. Our proposed algorithm guides the exploration toward regions that demonstrate high empirical discrepancy between the observed data and an imperfect prior model of the dynamics derived from side information. Through numerical experiments, we demonstrate that this strategy explores regions of high discrepancy and accelerates learning while simultaneously reducing model uncertainty. We rigorously prove that our active learning algorithm yields a consistent estimate of the underlying dynamics by providing an explicit rate of convergence for the maximum predictive variance. We demonstrate the efficacy of our approach on an under-actuated pendulum system and on the half-cheetah MuJoCo environment.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.12279 [pdf, other]

Scalable Networked Feature Selection with Randomized Algorithm for Robot Navigation

Authors: Vivek Pandey, Arash Amini, Guangyi Liu, Ufuk Topcu, Qiyu Sun, Kostas Daniilidis, Nader Motee

Abstract: We address the problem of sparse selection of visual features for localizing a team of robots navigating an unknown environment, where robots can exchange relative position measurements with neighbors. We select a set of the most informative features by anticipating their importance in robot localization by simulating trajectories of robots over a prediction horizon. Through theoretical proofs, we establish a crucial connection between the graph Laplacian and the importance of features. We show that strong network connectivity translates to uniformity in feature importance, which enables uniform random sampling of features and reduces the overall computational complexity. We leverage a scalable randomized algorithm for sparse sums of positive semidefinite matrices to efficiently select the set of the most informative features and significantly improve the probabilistic performance bounds. Finally, we support our findings with extensive simulations.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10705 [pdf, other]

Susceptibility of Communities against Low-Credibility Content in Social News Websites

Authors: Yigit Ege Bayiz, Arash Amini, Radu Marculescu, Ufuk Topcu

Abstract: Social news websites, such as Reddit, have evolved into prominent platforms for sharing and discussing news. A key issue on social news websites is the formation of echo chambers, which often lead to the spread of highly biased or uncredible news. We develop a method to identify communities within a social news website that are prone to uncredible or highly biased news. We employ a user embedding pipeline that detects user communities based on their stances towards posts and news sources. We then project each community onto a credibility-bias space and analyze the distributional characteristics of each projected community to identify those that have a high risk of adopting beliefs with low credibility or high bias. This approach also enables the prediction of individual users' susceptibility to low-credibility content, based on their community affiliation. Our experiments show that latent space clusters effectively indicate the credibility and bias levels of their users, with significant differences observed across clusters: a 34% difference in the users' susceptibility to low-credibility content and an 8.3% difference in the users' susceptibility to high political bias.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 11 pages, 2 figures, Under review in ICWSM 2024

arXiv:2403.10384 [pdf, other]

Coordination in Noncooperative Multiplayer Matrix Games via Reduced Rank Correlated Equilibria

Authors: Jaehan Im, Yue Yu, David Fridovich-Keil, Ufuk Topcu

Abstract: Coordination in multiplayer games enables players to avoid the lose-lose outcome that often arises at Nash equilibria. However, designing a coordination mechanism typically requires the consideration of the joint actions of all players, which becomes intractable in large-scale games. We develop a novel coordination mechanism, termed reduced rank correlated equilibria, which reduces the number of joint actions to be considered and thereby mitigates computational complexity. The idea is to approximate the set of all joint actions with the actions used in a set of pre-computed Nash equilibria via a convex hull operation. In a game with n players and each player having m actions, the proposed mechanism reduces the number of joint actions considered from O(m^n) to O(mn). We demonstrate the application of the proposed mechanism to an air traffic queue management problem. Compared with the correlated equilibrium, a popular benchmark coordination mechanism, the proposed approach is capable of solving a problem involving four thousand times more joint actions while yielding similar or better performance in terms of a fairness indicator and showing a maximum optimality gap of 0.066% in terms of the average delay cost. In the meantime, it yields a solution that shows up to 99.5% improvement in a fairness indicator and up to 50.4% reduction in average delay cost compared to the Nash solution, which does not involve coordination.

Submitted 12 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.
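
To make the O(m^n) versus O(mn) comparison in the abstract above concrete, the following small sketch counts joint actions for a few game sizes. It is only arithmetic on the stated orders of growth: the product m * n is used as a proxy for O(mn) with the constant factor assumed to be 1, which the abstract does not specify, and the function names are hypothetical.

```python
def joint_action_count(num_players, num_actions):
    """Number of joint actions when each of n players picks one of m actions."""
    return num_actions ** num_players


def reduced_count_proxy(num_players, num_actions):
    """Proxy for the O(mn) count; the constant factor 1 is an assumption."""
    return num_actions * num_players


for n, m in [(2, 4), (6, 4), (10, 4)]:
    full = joint_action_count(n, m)
    reduced = reduced_count_proxy(n, m)
    print(f"n={n}, m={m}: full={full}, reduced proxy={reduced}")
```

The full count grows exponentially in the number of players while the proxy grows linearly, which is why considering every joint action becomes intractable in large games.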

arXiv:2402.10938 [pdf, other]

News Source Credibility Assessment: A Reddit Case Study

Authors: Arash Amini, Yigit Ege Bayiz, Ashwin Ram, Radu Marculescu, Ufuk Topcu

Abstract: In the era of social media platforms, identifying the credibility of online content is crucial to combat misinformation. As our main contribution, we present CREDiBERT (CREDibility assessment using Bi-directional Encoder Representations from Transformers), a source credibility assessment model fine-tuned for Reddit submissions focusing on political discourse. We adopt a semi-supervised training approach for CREDiBERT, leveraging Reddit's community-based structure. By encoding submission content using CREDiBERT and integrating it into a Siamese neural network, we significantly improve the binary classification of submission credibility, achieving a 9% increase in F1 score compared to existing methods. Additionally, we introduce a new version of the post-to-post network in Reddit that efficiently encodes user interactions to enhance the binary classification task by nearly 8% in F1 score. Finally, we employ CREDiBERT to evaluate the susceptibility of subreddits with respect to different topics.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 12 pages; 3 figures

arXiv:2402.08902 [pdf, other]

Auto-Encoding Bayesian Inverse Games

Authors: Xinjie Liu, Lasse Peters, Javier Alonso-Mora, Ufuk Topcu, David Fridovich-Keil

Abstract: When multiple agents interact in a common environment, each agent's actions impact others' future decisions, and noncooperative dynamic games naturally capture this coupling. In interactive motion planning, however, agents typically do not have access to a complete model of the game, e.g., due to unknown objectives of other players. Therefore, we consider the inverse game problem, in which some properties of the game are unknown a priori and must be inferred from observations. Existing maximum likelihood estimation (MLE) approaches to solve inverse games provide only point estimates of unknown parameters without quantifying uncertainty, and perform poorly when many parameter values explain the observed behavior. To address these limitations, we take a Bayesian perspective and construct posterior distributions of game parameters. To render inference tractable, we employ a variational autoencoder (VAE) with an embedded differentiable game solver. This structured VAE can be trained from an unlabeled dataset of observed interactions, naturally handles continuous, multi-modal distributions, and supports efficient sampling from the inferred posteriors without computing game solutions at runtime. Extensive evaluations in simulated driving scenarios demonstrate that the proposed approach successfully learns the prior and posterior game parameter distributions, provides more accurate objective estimates than MLE baselines, and facilitates safer and more efficient game-theoretic motion planning.

Submitted 15 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.08570 [pdf, other]

Online Foundation Model Selection in Robotics

Authors: Po-han Li, Oyku Selin Toprak, Aditya Narayanan, Ufuk Topcu, Sandeep Chinchali

Abstract: Foundation models have recently expanded into robotics after excelling in computer vision and natural language processing. The models are accessible in two ways: open-source or paid, closed-source options. Users with access to both face a problem when deciding between effective yet costly closed-source models and free but less powerful open-source alternatives. We call it the model selection problem. Existing supervised-learning methods are impractical due to the high cost of collecting extensive training data from closed-source models. Hence, we focus on the online learning setting where algorithms learn while collecting data, eliminating the need for large pre-collected datasets. We thus formulate a user-centric online model selection problem and propose a novel solution that combines an open-source encoder to output context and an online learning algorithm that processes this context. The encoder distills vast data distributions into low-dimensional features, i.e., the context, without additional training. The online learning algorithm aims to maximize a composite reward that includes model performance, execution time, and costs based on the context extracted from the data. It results in an improved trade-off between selecting open-source and closed-source models compared to non-contextual methods, as validated by our theoretical analysis. Experiments across language-based robotic tasks such as Waymo Open Dataset, ALFRED, and Open X-Embodiment demonstrate real-world applications of the solution. The results show that the solution significantly improves the task success rate by up to 14%.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07069 [pdf, other]

Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine

Authors: Shayan Meshkat Alsadat, Jean-Raphael Gaglione, Daniel Neider, Ufuk Topcu, Zhe Xu

Abstract: We present the LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine) algorithm, which encodes high-level knowledge into reinforcement learning using an automaton in order to expedite learning. Our method uses Large Language Models (LLM) to obtain high-level domain-specific knowledge using prompt engineering instead of providing the reinforcement learning algorithm directly with the high-level knowledge, which requires an expert to encode the automaton. We use chain-of-thought and few-shot methods for prompt engineering and demonstrate that our method works using these approaches. Additionally, LARL-RM allows for fully closed-loop reinforcement learning without the need for an expert to guide and supervise the learning, since LARL-RM can use the LLM directly to generate the required high-level knowledge for the task at hand. We also show the theoretical guarantee of our algorithm to converge to an optimal policy. We demonstrate that LARL-RM speeds up the convergence by 30% by implementing our method in two case studies.

Submitted 10 February, 2024; originally announced February 2024.

arXiv:2401.17173 [pdf, other]

Zero-Shot Reinforcement Learning via Function Encoders

Authors: Tyler Ingebrand, Amy Zhang, Ufuk Topcu

Abstract: Although reinforcement learning (RL) can solve many challenging sequential decision making problems, achieving zero-shot transfer across related tasks remains a challenge. The difficulty lies in finding a good representation for the current task so that the agent understands how it relates to previously seen tasks. To achieve zero-shot transfer, we introduce the function encoder, a representation learning algorithm which represents a function as a weighted combination of learned, non-linear basis functions. By using a function encoder to represent the reward function or the transition function, the agent has information on how the current task relates to previously seen tasks via a coherent vector representation. Thus, the agent is able to achieve transfer between related tasks at run time with no additional training. We demonstrate state-of-the-art data efficiency, asymptotic performance, and training stability in three RL fields by augmenting basic RL algorithms with a function encoder task representation.

Submitted 11 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2312.13132 [pdf, other]

On the complexity of sabotage games for network security

Authors: Dhananjay Raju, Georgios Bakirtzis, Ufuk Topcu

Abstract: Securing dynamic networks against adversarial actions is challenging because of the need to anticipate and counter strategic disruptions by adversarial entities within complex network structures. Traditional game-theoretic models, while insightful, often fail to model the unpredictability and constraints of real-world threat assessment scenarios. We refine sabotage games to reflect the realistic limitations of the saboteur and the network operator. By transforming sabotage games into reachability problems, our approach allows applying existing computational solutions to model realistic restrictions on attackers and defenders within the game. Modifying sabotage games into dynamic network security problems successfully captures the nuanced interplay of strategy and uncertainty in dynamic network security. Theoretically, we extend sabotage games to model network security contexts and thoroughly explore if the additional restrictions raise their computational complexity, often the bottleneck of game theory in practical contexts. Practically, this research sets the stage for actionable insights for developing robust defense mechanisms by understanding what risks to mitigate in dynamically changing networks under threat.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.01249 [pdf, other]

A Multifidelity Sim-to-Real Pipeline for Verifiable and Compositional Reinforcement Learning

Authors: Cyrus Neary, Christian Ellis, Aryaman Singh Samyal, Craig Lennon, Ufuk Topcu

Abstract: We propose and demonstrate a compositional framework for training and verifying reinforcement learning (RL) systems within a multifidelity sim-to-real pipeline, in order to deploy reliable and adaptable RL policies on physical hardware. By decomposing complex robotic tasks into component subtasks and defining mathematical interfaces between them, the framework allows for the independent training and testing of the corresponding subtask policies, while simultaneously providing guarantees on the overall behavior that results from their composition. By verifying the performance of these subtask policies using a multifidelity simulation pipeline, the framework not only allows for efficient RL training, but also for a refinement of the subtasks and their interfaces in response to challenges arising from discrepancies between simulation and reality. In an experimental case study we apply the framework to train and deploy a compositional RL system that successfully pilots a Warthog unmanned ground robot.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.14200 [pdf, other]

Prebunking Design as a Defense Mechanism Against Misinformation Propagation on Social Networks

Authors: Yigit Ege Bayiz, Ufuk Topcu

Abstract: The growing reliance on social media for news consumption necessitates effective countermeasures to mitigate the rapid spread of misinformation. Prebunking, a proactive method that arms users with accurate information before they come across false content, has garnered support from journalism and psychology experts. We formalize the problem of optimal prebunking as optimizing the timing of delivering accurate information, ensuring users encounter it before receiving misinformation while minimizing the disruption to user experience. Utilizing a susceptible-infected epidemiological process to model the propagation of misinformation, we frame optimal prebunking as a policy synthesis problem with safety constraints. We then propose a policy that approximates the optimal solution to a relaxed problem. The experiments show that this policy cuts the user experience cost of repeated information delivery in half, compared to delivering accurate information immediately after identifying a misinformation propagation.

Submitted 23 November, 2023; originally announced November 2023.

Comments: 10 pages, 3 figures, Submitted to PERCOM 2024

arXiv:2311.06275 [pdf]

Algorithmic Robustness

Authors: David Jensen, Brian LaMacchia, Ufuk Topcu, Pamela Wisniewski

Abstract: Algorithmic robustness refers to the sustained performance of a computational system in the face of change in the nature of the environment in which that system operates or in the task that the system is meant to perform. Below, we motivate the importance of algorithmic robustness, present a conceptual framework, and highlight the relevant areas of research for which algorithmic robustness is relevant. Why robustness? Robustness is an important enabler of other goals that are frequently cited in the context of public policy decisions about computational systems, including trustworthiness, accountability, fairness, and safety. Despite this dependence, it tends to be under-recognized compared to these other concepts. This is unfortunate, because robustness is often more immediately achievable than these other ultimate goals, which can be more subjective and exacting. Thus, we highlight robustness as an important goal for researchers, engineers, regulators, and policymakers when considering the design, implementation, and deployment of computational systems. We urge researchers and practitioners to elevate the attention paid to robustness when designing and evaluating computational systems. For many key systems, the immediate question after any demonstration of high performance should be: "How robust is that performance to realistic changes in the task or environment?" Greater robustness will set the stage for systems that are more trustworthy, accountable, fair, and safe.
Toward that end, this document provides a brief roadmap to some of the concepts and existing research around the idea of algorithmic robustness.

Submitted 17 October, 2023; originally announced November 2023.

arXiv:2311.06255 [pdf, ps, other]

Privacy-Engineered Value Decomposition Networks for Cooperative Multi-Agent Reinforcement Learning

Authors: Parham Gohari, Matthew Hale, Ufuk Topcu

Abstract: In cooperative multi-agent reinforcement learning (Co-MARL), a team of agents must jointly optimize the team's long-term rewards to learn a designated task. Optimizing rewards as a team often requires inter-agent communication and data sharing, leading to potential privacy implications. We assume privacy considerations prohibit the agents from sharing their environment interaction data. Accordingly, we propose Privacy-Engineered Value Decomposition Networks (PE-VDN), a Co-MARL algorithm that models multi-agent coordination while provably safeguarding the confidentiality of the agents' environment interaction data. We integrate three privacy-engineering techniques to redesign the data flows of the VDN algorithm, an existing Co-MARL algorithm that consolidates the agents' environment interaction data to train a central controller that models multi-agent coordination, and develop PE-VDN. In the first technique, we design a distributed computation scheme that eliminates Vanilla VDN's dependency on sharing environment interaction data. Then, we utilize a privacy-preserving multi-party computation protocol to guarantee that the data flows of the distributed computation scheme do not pose new privacy risks. Finally, we enforce differential privacy to preempt inference threats against the agents' training data, past environment interactions, when they take actions based on their neural network predictions.
We implement PE-VDN in StarCraft Multi-Agent Competition (SMAC) and show that it achieves 80% of Vanilla VDN's win rate while maintaining differential privacy levels that provide meaningful privacy guarantees. The results demonstrate that PE-VDN can safeguard the confidentiality of agents' environment interaction data without sacrificing multi-agent coordination.

Submitted 12 September, 2023; originally announced November 2023.

Comments: Paper accepted at 62nd IEEE Conference on Decision and Control

arXiv:2311.01258 [pdf, other]

doi 10.1561/2600000029

Formal Methods for Autonomous Systems

Authors: Tichakorn Wongpiromsarn, Mahsa Ghasemi, Murat Cubuktepe, Georgios Bakirtzis, Steven Carr, Mustafa O. Karabag, Cyrus Neary, Parham Gohari, Ufuk Topcu

Abstract: Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. This monograph provides a survey of the current state of the art on applications of formal methods in the autonomous systems domain. We consider correct-by-construction synthesis under various formulations, including closed systems, reactive, and probabilistic settings. Beyond synthesizing systems in known environments, we address the concept of uncertainty and bound the behavior of systems that employ learning using formal methods. Further, we examine the synthesis of systems with monitoring, a mitigation technique for ensuring that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how to overcome some limitations of formal methods themselves with learning. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.18239 [pdf, other]

Fine-Tuning Language Models Using Formal Methods Feedback

Authors: Yunhao Yang, Neel P. Bhatt, Tyler Ingebrand, William Ward, Steven Carr, Zhangyang Wang, Ufuk Topcu

Abstract: Although pre-trained language models encode generic knowledge beneficial for planning and control, they may fail to generate appropriate control policies for domain-specific tasks. Existing fine-tuning methods use human feedback to address this limitation; however, sourcing human feedback is labor-intensive and costly. We present a fully automated approach to fine-tune pre-trained language models for applications in autonomous systems, bridging the gap between generic knowledge and domain-specific requirements while reducing cost. The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions. These controllers are verifiable against independently provided specifications within a world model, which can be abstract or obtained from a high-fidelity simulator. Controllers with high compliance with the desired specifications receive higher ranks, guiding the iterative fine-tuning process. We provide quantitative evidence, primarily in autonomous driving, to demonstrate the method's effectiveness across multiple tasks. The results indicate an improvement in the percentage of specifications satisfied by the controller from 60% to 90%.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.00468 [pdf, ps, other]

Encouraging Inferable Behavior for Autonomy: Repeated Bimatrix Stackelberg Games with Observations

Authors: Mustafa O. Karabag, Sophia Smith, David Fridovich-Keil, Ufuk Topcu

Abstract: When interacting with other non-competitive decision-making agents, it is critical for an autonomous agent to have inferable behavior: Its actions must convey its intention and strategy. For example, an autonomous car's strategy must be inferable by the pedestrians interacting with the car. We model the inferability problem using a repeated bimatrix Stackelberg game with observations where a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed, potentially mixed strategy. The follower, on the other hand, does not know the leader's strategy and dynamically reacts based on observations that are the leader's previous actions. In the setting with observations, the leader may suffer from an inferability loss, i.e., the performance compared to the setting where the follower has perfect information of the leader's strategy. We show that the inferability loss is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. As a converse result, we also provide a game where the required number of interactions is lower bounded by a function of the desired inferability loss.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.10171 [pdf, other]

Specification-Driven Video Search via Foundation Models and Formal Verification

Authors: Yunhao Yang, Jean-Raphaël Gaglione, Sandeep Chinchali, Ufuk Topcu

Abstract: The increasing abundance of video data enables users to search for events of interest, e.g., emergency incidents. Meanwhile, it raises new concerns, such as the need for preserving privacy. Existing approaches to video search require either manual inspection or a deep learning model with massive training. We develop a method that uses recent advances in vision and language models, as well as formal methods, to search for events of interest in video clips automatically and efficiently. The method consists of an algorithm to map text-based event descriptions into linear temporal logic over finite traces (LTL$_f$) and an algorithm to construct an automaton encoding the video information. Then, the method formally verifies the automaton representing the video against the LTL$_f$ specifications and adds the pertinent video clips to the search result if the automaton satisfies the specifications. We provide qualitative and quantitative analysis to demonstrate the video-searching capability of the proposed method. It achieves over 90 percent precision in searching over privacy-sensitive videos and a state-of-the-art autonomous driving dataset.

Submitted 18 September, 2023; originally announced September 2023.

Comments: 12 pages, 18 figures

arXiv:2309.06420 [pdf, other]

Verifiable Reinforcement Learning Systems via Compositionality

Authors: Cyrus Neary, Aryaman Singh Samyal, Christos Verginis, Murat Cubuktepe, Ufuk Topcu

Abstract: We propose a framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL subsystems, each of which learns to accomplish a separate subtask, are composed to achieve an overall task. The framework consists of a high-level model, represented as a parametric Markov decision process, which is used to plan and analyze compositions of subsystems, and of the collection of low-level subsystems themselves. The subsystems are implemented as deep RL agents operating under partial observability. By defining interfaces between the subsystems, the framework enables automatic decompositions of task specifications, e.g., reach a target set of states with a probability of at least 0.95, into individual subtask specifications, i.e. achieve the subsystem's exit conditions with at least some minimum probability, given that its entry conditions are met. This in turn allows for the independent training and testing of the subsystems. We present theoretical results guaranteeing that if each subsystem learns a policy satisfying its subtask specification, then their composition is guaranteed to satisfy the overall task specification. Conversely, if the subtask specifications cannot all be satisfied by the learned policies, we present a method, formulated as the problem of finding an optimal set of parameters in the high-level model, to automatically update the subtask specifications to account for the observed shortcomings. The result is an iterative procedure for defining subtask specifications, and for training the subsystems to meet them.
Experimental results demonstrate the presented framework's novel capabilities in environments with both full and partial observability, discrete and continuous state and action spaces, as well as deterministic and stochastic dynamics.

Submitted 9 September, 2023; originally announced September 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2106.05864

arXiv:2308.08017 [pdf, other]

Active Inverse Learning in Stackelberg Trajectory Games

Authors: Yue Yu, Jacob Levy, Negar Mehr, David Fridovich-Keil, Ufuk Topcu

Abstract: Game-theoretic inverse learning is the problem of inferring the players' objectives from their actions. We formulate an inverse learning problem in a Stackelberg game between a leader and a follower, where each player's action is the trajectory of a dynamical system. We propose an active inverse learning method for the leader to infer which hypothesis among a finite set of candidates describes the follower's objective function. Instead of using passively observed trajectories like existing methods, the proposed method actively maximizes the differences in the follower's trajectories under different hypotheses to accelerate the leader's inference. We demonstrate the proposed method in a receding-horizon repeated trajectory game. Compared with uniformly random inputs, the leader inputs provided by the proposed method accelerate the convergence of the probability of different hypotheses conditioned on the follower's trajectory by orders of magnitude.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2308.05295 [pdf, other]

doi 10.5555/3635637.3663065

Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception

Authors: Yunhao Yang, Cyrus Neary, Ufuk Topcu

Abstract: Recently developed pretrained models can encode rich world knowledge expressed in multiple modalities, such as text and images. However, the outputs of these models cannot be integrated into algorithms to solve sequential decision-making tasks. We develop an algorithm that utilizes the knowledge from pretrained models to construct and verify controllers for sequential decision-making tasks, and to ground these controllers to task environments through visual observations with formal guarantees. In particular, the algorithm queries a pretrained model with a user-provided, text-based task description and uses the model's output to construct an automaton-based controller that encodes the model's task-relevant knowledge. It allows formal verification of whether the knowledge encoded in the controller is consistent with other independently available knowledge, which may include abstract information on the environment or user-provided specifications. Next, the algorithm leverages the vision and language capabilities of pretrained models to link the observations from the task environment to the text-based control logic from the controller (e.g., actions and conditions that trigger the actions). We propose a mechanism to provide probabilistic guarantees on whether the controller satisfies the user-provided specifications under perceptual uncertainties. We demonstrate the algorithm's ability to construct, verify, and ground automaton-based controllers through a suite of real-world tasks, including daily life and robot manipulation tasks.

Submitted 17 June, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted as full paper in AAMAS 2024

arXiv:2306.13732 [pdf, other]

Reinforcement Learning with Temporal-Logic-Based Causal Diagrams

Authors: Yash Paliwal, Rajarshi Roy, Jean-Raphaël Gaglione, Nasim Baharisangari, Daniel Neider, Xiaoming Duan, Ufuk Topcu, Zhe Xu

Abstract: We study a class of reinforcement learning (RL) tasks where the objective of the agent is to accomplish temporally extended goals. In this setting, a common approach is to represent the tasks as deterministic finite automata (DFA) and integrate them into the state-space for RL algorithms. However, while these machines model the reward function, they often overlook the causal knowledge about the environment. To address this limitation, we propose the Temporal-Logic-based Causal Diagram (TL-CD) in RL, which captures the temporal causal relationships between different properties of the environment. We exploit the TL-CD to devise an RL algorithm in which an agent requires significantly less exploration of the environment. To this end, based on a TL-CD and a task DFA, we identify configurations where the agent can determine the expected rewards early during an exploration. Through a series of case studies, we demonstrate the benefits of using TL-CDs, particularly the faster convergence of the algorithm to an optimal policy due to reduced exploration of the environment.

Submitted 23 June, 2023; originally announced June 2023.

arXiv:2306.06335 [pdf, other]

How to Learn and Generalize From Three Minutes of Data: Physics-Constrained and Uncertainty-Aware Neural Stochastic Differential Equations

Authors: Franck Djeumou, Cyrus Neary, Ufuk Topcu

Abstract: We present a framework and algorithms to learn controlled dynamics models using neural stochastic differential equations (SDEs) -- SDEs whose drift and diffusion terms are both parametrized by neural networks. We construct the drift term to leverage a priori physics knowledge as inductive bias, and we design the diffusion term to represent a distance-aware estimate of the uncertainty in the learned model's predictions -- it matches the system's underlying stochasticity when evaluated on states near those from the training dataset, and it predicts highly stochastic dynamics when evaluated on states beyond the training regime. The proposed neural SDEs can be evaluated quickly enough for use in model predictive control algorithms, or they can be used as simulators for model-based reinforcement learning. Furthermore, they make accurate predictions over long time horizons, even when trained on small datasets that cover limited regions of the state space. We demonstrate these capabilities through experiments on simulated robotic systems, as well as by using them to model and control a hexacopter's flight dynamics: A neural SDE trained using only three minutes of manually collected flight data results in a model-based control policy that accurately tracks aggressive trajectories that push the hexacopter's velocity and Euler angles to nearly double the maximum values observed in the training dataset.

Submitted 15 October, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: Final submission to CoRL 2023

arXiv:2306.06330 [pdf, other]

Autonomous Drifting with 3 Minutes of Data via Learned Tire Models

Authors: Franck Djeumou, Jonathan Y. M. Goh, Ufuk Topcu, Avinash Balachandran

Abstract: Near the limits of adhesion, the forces generated by a tire are nonlinear and intricately coupled. Efficient and accurate modelling in this region could improve safety, especially in emergency situations where high forces are required. To this end, we propose a novel family of tire force models based on neural ordinary differential equations and a neural-ExpTanh parameterization. These models are designed to satisfy physically insightful assumptions while also having sufficient fidelity to capture higher-order effects directly from vehicle state measurements. They are used as drop-in replacements for an analytical brush tire model in an existing nonlinear model predictive control framework. Experiments with a customized Toyota Supra show that scarce amounts of driving data -- less than three minutes -- are sufficient to achieve high-performance autonomous drifting on various trajectories with speeds up to 45 mph. Comparisons with the benchmark model show a $4 \times$ improvement in tracking performance, smoother control inputs, and faster and more consistent computation time.

Submitted 16 October, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: Final Submission at ICRA 2023

arXiv:2305.17372 [pdf, other]

Reinforcement Learning With Reward Machines in Stochastic Games

Authors: Jueming Hu, Jean-Raphael Gaglione, Yanze Wang, Zhe Xu, Ufuk Topcu, Yongming Liu

Abstract: We investigate multi-agent reinforcement learning for stochastic games with complex tasks, where the reward functions are non-Markovian. We utilize reward machines to incorporate high-level knowledge of complex tasks. We develop an algorithm called Q-learning with reward machines for stochastic games (QRM-SG), to learn the best-response strategy at Nash equilibrium for each agent. In QRM-SG, we define the Q-function at a Nash equilibrium in augmented state space. The augmented state space integrates the state of the stochastic game and the state of reward machines. Each agent learns the Q-functions of all agents in the system. We prove that Q-functions learned in QRM-SG converge to the Q-functions at a Nash equilibrium if the stage game at each time step during learning has a global optimum point or a saddle point, and the agents update Q-functions based on the best-response strategy at this point. We use the Lemke-Howson method to derive the best-response strategy given current Q-functions. The three case studies show that QRM-SG can learn the best-response strategies effectively. QRM-SG learns the best-response strategies after around 7500 episodes in Case Study I, 1000 episodes in Case Study II, and 1500 episodes in Case Study III, while baseline methods such as Nash Q-learning and MADDPG fail to converge to the Nash equilibrium in all three case studies.

Submitted 28 August, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

arXiv:2305.16505 [pdf, other]

Reward-Machine-Guided, Self-Paced Reinforcement Learning

Authors: Cevahir Koprulu, Ufuk Topcu

Abstract: Self-paced reinforcement learning (RL) aims to improve the data efficiency of learning by automatically creating sequences, namely curricula, of probability distributions over contexts. However, existing techniques for self-paced RL fail in long-horizon planning tasks that involve temporally extended behaviors. We hypothesize that taking advantage of prior knowledge about the underlying task structure can improve the effectiveness of self-paced RL. We develop a self-paced RL algorithm guided by reward machines, i.e., a type of finite-state machine that encodes the underlying task structure. The algorithm integrates reward machines in 1) the update of the policy and value functions obtained by any RL algorithm of choice, and 2) the update of the automated curriculum that generates context distributions. Our empirical results show that the proposed algorithm achieves optimal behavior reliably even in cases in which existing baselines cannot make any meaningful progress. It also decreases the curriculum length and reduces the variance in the curriculum generation process by up to one-fourth and four orders of magnitude, respectively.

Submitted 25 May, 2023; originally announced May 2023.

Comments: 9 pages, 11 figures. Accepted for UAI 2023

arXiv:2305.15523 [pdf, other]

Task-aware Distributed Source Coding under Dynamic Bandwidth

Authors: Po-han Li, Sravan Kumar Ankireddy, Ruihan Zhao, Hossein Nourkhiz Mahjoub, Ehsan Moradi-Pari, Ufuk Topcu, Sandeep Chinchali, Hyeji Kim

Abstract: Efficient compression of correlated data is essential to minimize communication overload in multi-sensor networks. In such networks, each sensor independently compresses the data and transmits them to a central node due to limited communication bandwidth. A decoder at the central node decompresses and passes the data to a pre-trained machine learning-based task to generate the final output. Thus, it is important to compress the features that are relevant to the task. Additionally, the final performance depends heavily on the total available bandwidth. In practice, it is common to encounter varying availability in bandwidth, and higher bandwidth results in better performance of the task. We design a novel distributed compression framework composed of independent encoders and a joint decoder, which we call neural distributed principal component analysis (NDPCA). NDPCA flexibly compresses data from multiple sources to any available bandwidth with a single model, reducing computing and storage overhead. NDPCA achieves this by learning low-rank task representations and efficiently distributing bandwidth among sensors, thus providing a graceful trade-off between performance and bandwidth. Experiments show that NDPCA improves the success rate of multi-view robotic arm manipulation by 9% and the accuracy of object detection tasks on satellite imagery by 14% compared to an autoencoder with uniform bandwidth allocation.

Submitted 13 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Journal ref: NeurIPS 2023

arXiv:2305.01473 [pdf, other]

Efficient Sensitivity Analysis for Parametric Robust Markov Chains

Authors: Thom Badings, Sebastian Junges, Ahmadreza Marandi, Ufuk Topcu, Nils Jansen

Abstract: We provide a novel method for sensitivity analysis of parametric robust Markov chains. These models incorporate parameters and sets of probability distributions to alleviate the often unrealistic assumption that precise probabilities are available. We measure sensitivity in terms of partial derivatives with respect to the uncertain transition probabilities regarding measures such as the expected reward. As our main contribution, we present an efficient method to compute these partial derivatives. To scale our approach to models with thousands of parameters, we present an extension of this method that selects the subset of $k$ parameters with the highest partial derivative. Our methods are based on linear programming and differentiating these programs around a given value for the parameters. The experiments show the applicability of our approach on models with over a million states and thousands of parameters. Moreover, we embed the results within an iterative learning scheme that profits from having access to a dedicated sensitivity analysis.

Submitted 1 May, 2023; originally announced May 2023.

Comments: To be presented at CAV 2023

arXiv:2304.00163 [pdf, other]

Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning

Authors: Shenghui Chen, Yue Yu, David Fridovich-Keil, Ufuk Topcu

Abstract: Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players' reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outperform those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude.

Submitted 8 September, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

arXiv:2303.04268 [pdf, ps, other]

On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples

Authors: Mustafa O. Karabag, Ufuk Topcu

Abstract: Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples and is helpful for the settings in which collecting new data is costly or risky. In model-based offline RL, the learner performs estimation (or optimization) using a model constructed according to the empirical transition frequencies. We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting. In our setting, the samples obey the dynamics of the Markov decision process and, consequently, may have interdependencies. Under no assumption of independent samples, we provide a high-probability, polynomial sample complexity bound for vanilla model-based off-policy evaluation that requires partial or uniform coverage. We extend this result to the off-policy optimization under uniform coverage. As a comparison to the model-based approach, we analyze the sample complexity of off-policy evaluation with vanilla importance sampling in the infinite-horizon setting. Finally, we provide an estimator that outperforms the sample-mean estimator for almost deterministic dynamics that are prevalent in reinforcement learning.

Submitted 7 March, 2023; originally announced March 2023.

Comments: Accepted to AAAI-23

arXiv:2302.14242 [pdf, other]

Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations

Authors: Ruihan Zhao, Ufuk Topcu, Sandeep Chinchali, Mariano Phielipp

Abstract: Recent progress in deep reinforcement learning (RL) and computer vision enables artificial agents to solve complex tasks, including locomotion, manipulation and video games from high-dimensional pixel observations. However, domain specific reward functions are often engineered to provide sufficient learning signals, requiring expert knowledge. While it is possible to train vision-based RL agents using only sparse rewards, additional challenges in exploration arise. We present a novel and efficient method to solve sparse-reward robot manipulation tasks from only image observations by utilizing a few demonstrations. First, we learn an embedded neural dynamics model from demonstration transitions and further fine-tune it with the replay buffer. Next, we reward the agents for staying close to the demonstrated trajectories using a distance metric defined in the embedding space. Finally, we use an off-policy, model-free vision RL algorithm to update the control policies. Our method achieves state-of-the-art sample efficiency in simulation and enables efficient training of a real Franka Emika Panda manipulator.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2301.08811 [pdf, ps, other]

Differential Privacy in Cooperative Multiagent Planning

Authors: Bo Chen, Calvin Hawkins, Mustafa O. Karabag, Cyrus Neary, Matthew Hale, Ufuk Topcu

Abstract: Privacy-aware multiagent systems must protect agents' sensitive data while simultaneously ensuring that agents accomplish their shared objectives. Towards this goal, we propose a framework to privatize inter-agent communications in cooperative multiagent decision-making problems. We study sequential decision-making problems formulated as cooperative Markov games with reach-avoid objectives. We apply a differential privacy mechanism to privatize agents' communicated symbolic state trajectories, and then we analyze tradeoffs between the strength of privacy and the team's performance. For a given level of privacy, this tradeoff is shown to depend critically upon the total correlation among agents' state-action processes. We synthesize policies that are robust to privacy by reducing the value of the total correlation. Numerical experiments demonstrate that the team's performance under these policies decreases by only 3 percent when comparing private versus non-private implementations of communication. By contrast, the team's performance decreases by roughly 86 percent when using baseline policies that ignore total correlation and only optimize team performance.

Submitted 20 January, 2023; originally announced January 2023.

arXiv:2301.03565 [pdf, other]

Physics-Informed Kernel Embeddings: Integrating Prior System Knowledge with Data-Driven Control

Authors: Adam J. Thorpe, Cyrus Neary, Franck Djeumou, Meeko M. K. Oishi, Ufuk Topcu

Abstract: Data-driven control algorithms use observations of system dynamics to construct an implicit model for the purpose of control. However, in practice, data-driven techniques often require excessive sample sizes, which may be infeasible in real-world scenarios where only limited observations of the system are available. Furthermore, purely data-driven methods often neglect useful a priori knowledge, such as approximate models of the system dynamics. We present a method to incorporate such prior knowledge into data-driven control algorithms using kernel embeddings, a nonparametric machine learning technique based in the theory of reproducing kernel Hilbert spaces. Our proposed approach incorporates prior knowledge of the system dynamics as a bias term in the kernel learning problem. We formulate the biased learning problem as a least-squares problem with a regularization term that is informed by the dynamics, that has an efficiently computable, closed-form solution. Through numerical experiments, we empirically demonstrate the improved sample efficiency and out-of-sample generalization of our approach over a purely data-driven baseline. We demonstrate an application of our method to control through a target tracking problem with nonholonomic dynamics, and on spring-mass-damper and F-16 aircraft state prediction tasks.

Submitted 9 January, 2023; originally announced January 2023.

arXiv:2301.01219 [pdf, other]

Task-Guided IRL in POMDPs that Scales

Authors: Franck Djeumou, Christian Ellis, Murat Cubuktepe, Craig Lennon, Ufuk Topcu

Abstract: In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts. However, many existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). We address two limitations of existing IRL techniques. First, they require an excessive amount of data due to the information asymmetry between the expert and the learner. Second, most of these IRL techniques require solving the computationally intractable forward problem -- computing an optimal policy given a reward function -- in POMDPs. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations. Further, the algorithm avoids a common source of algorithmic complexity by building on causal entropy as the measure of the likelihood of the demonstrations as opposed to entropy. Nevertheless, the resulting problem is nonconvex due to the so-called forward problem. We solve the intrinsic nonconvexity of the forward problem in a scalable manner through a sequential linear programming scheme that guarantees to converge to a locally optimal policy. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the task while inducing similar behavior to the expert by leveraging the provided side information.

Submitted 30 December, 2022; originally announced January 2023.

Comments: Final submission to the Artificial Intelligence journal (Elsevier). arXiv admin note: substantial text overlap with arXiv:2105.14073

arXiv:2212.01944 [pdf, other]

Automaton-Based Representations of Task Knowledge from Generative Language Models

Authors: Yunhao Yang, Jean-Raphaël Gaglione, Cyrus Neary, Ufuk Topcu

Abstract: Automaton-based representations of task knowledge play an important role in control and planning for sequential decision-making problems. However, obtaining the high-level task knowledge required to build such automata is often difficult. Meanwhile, large-scale generative language models (GLMs) can automatically generate relevant task knowledge. However, the textual outputs from GLMs cannot be formally verified or used for sequential decision-making. We propose a novel algorithm named GLM2FSA, which constructs a finite state automaton (FSA) encoding high-level task knowledge from a brief natural-language description of the task goal. GLM2FSA first sends queries to a GLM to extract task knowledge in textual form, and then it builds an FSA to represent this text-based knowledge. The proposed algorithm thus fills the gap between natural-language task descriptions and automaton-based representations, and the constructed FSA can be formally verified against user-defined specifications. We accordingly propose a method to iteratively refine the queries to the GLM based on the outcomes, e.g., counter-examples, from verification. We demonstrate GLM2FSA's ability to build and refine automaton-based representations of everyday tasks (e.g., crossing a road), and also of tasks that require highly-specialized knowledge (e.g., executing secure multi-party computation).

Submitted 9 August, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

Comments: Submitted to JAIR

arXiv:2212.00916 [pdf, other]

Learning Temporal Logic Properties: an Overview of Two Recent Methods

Authors: Jean-Raphaël Gaglione, Rajarshi Roy, Nasim Baharisangari, Daniel Neider, Zhe Xu, Ufuk Topcu

Abstract: Learning linear temporal logic (LTL) formulas from examples labeled as positive or negative has found applications in inferring descriptions of system behavior. We summarize two methods to learn LTL formulas from examples in two different problem settings. The first method assumes noise in the labeling of the examples. For that, they define the problem of inferring an LTL formula that must be consistent with most but not all of the examples. The second method considers the other problem of inferring meaningful LTL formulas in the case where only positive examples are given. Hence, the first method addresses the robustness to noise, and the second method addresses the balance between conciseness and specificity (i.e., language minimality) of the inferred formula. The summarized methods propose different algorithms to solve the aforementioned problems, as well as to infer other descriptions of temporal properties, such as signal temporal logic or deterministic finite automata.

Submitted 1 December, 2022; originally announced December 2022.

Comments: Appears in Proceedings of AAAI FSS-22 Symposium "Lessons Learned for Autonomous Assessment of Machine Abilities (LLAAMA)"

ACM Class: I.2; F.4.3

arXiv:2212.00893 [pdf, other]

Compositional Learning of Dynamical System Models Using Port-Hamiltonian Neural Networks

Authors: Cyrus Neary, Ufuk Topcu

Abstract: Many dynamical systems -- from robots interacting with their surroundings to large-scale multiphysics systems -- involve a number of interacting subsystems. Toward the objective of learning composite models of such systems from data, we present i) a framework for compositional neural networks, ii) algorithms to train these models, iii) a method to compose the learned models, iv) theoretical results that bound the error of the resulting composite models, and v) a method to learn the composition itself, when it is not known a priori. The end result is a modular approach to learning: neural network submodels are trained on trajectory data generated by relatively simple subsystems, and the dynamics of more complex composite systems are then predicted without requiring additional data generated by the composite systems themselves. We achieve this compositionality by representing the system of interest, as well as each of its subsystems, as a port-Hamiltonian neural network (PHNN) -- a class of neural ordinary differential equations that uses the port-Hamiltonian systems formulation as inductive bias. We compose collections of PHNNs by using the system's physics-informed interconnection structure, which may be known a priori, or may itself be learned from data. We demonstrate the novel capabilities of the proposed framework through numerical examples involving interacting spring-mass-damper systems. Models of these systems, which include nonlinear energy dissipation and control inputs, are learned independently. Accurate compositions are learned using an amount of training data that is negligible in comparison with that required to train a new model from scratch. Finally, we observe that the composite PHNNs enjoy properties of port-Hamiltonian systems, such as cyclo-passivity -- a property that is useful for control purposes.

Submitted 13 May, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: Paper accepted for publication at L4DC 2023

arXiv:2211.11741 [pdf, other]

Sensor Placement for Online Fault Diagnosis

Authors: Dhananjay Raju, Georgios Bakirtzis, Ufuk Topcu

Abstract: Fault diagnosis is the problem of determining a set of faulty system components that explain discrepancies between observed and expected behavior. Due to the intrinsic relation between observations and sensors placed on a system, sensors' fault diagnosis and placement are mutually dependent. Consequently, it is imperative to solve the fault diagnosis and sensor placement problems jointly. One approach to modeling systems for fault diagnosis uses answer set programming (ASP). We present a model-based approach to sensor placement for active diagnosis using ASP, where the secondary objective is to reduce the number of sensors used. The proposed method finds locations for system sensors with around 500 components in a few minutes. To address larger systems, we propose a notion of modularity such that it is possible to treat each module as a separate system and solve the sensor placement problem for each module independently. Additionally, we provide a fixpoint algorithm for determining the modules of a system.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.04617 [pdf, other]

Countering Misinformation on Social Networks Using Graph Alterations

Authors: Yigit E. Bayiz, Ufuk Topcu

Abstract: We restrict the propagation of misinformation in a social-media-like environment while preserving the spread of correct information. We model the environment as a random network of users in which each news item propagates in the network in consecutive cascades. Existing studies suggest that the cascade behaviors of misinformation and correct information are affected differently by user polarization and reflexivity. We show that this difference can be used to alter network dynamics in a way that selectively hinders the spread of misinformation content. To implement these alterations, we introduce an optimization-based probabilistic dropout method that randomly removes connections between users to achieve minimal propagation of misinformation. We use disciplined convex programming to optimize these removal probabilities over a reduced space of possible network alterations. We test the algorithm's effectiveness using simulated social networks. In our tests, we use both synthetic network structures based on stochastic block models, and natural network structures that are generated using random sampling of a dataset collected from Twitter. The results show that on average the algorithm decreases the cascade size of misinformation content by up to $70\%$ in synthetic network tests and up to $45\%$ in natural network tests while maintaining a branching ratio of at least $1.5$ for correct information.

Submitted 8 November, 2022; originally announced November 2022.

Comments: 10 pages, 6 figures

arXiv:2210.01221 [pdf, other]

Cost Design in Atomic Routing Games

Authors: Yue Yu, Shenghui Chen, David Fridovich-Keil, Ufuk Topcu

Abstract: An atomic routing game is a multiplayer game on a directed graph. Each player in the game chooses a path -- a sequence of links that connect its origin node to its destination node -- with the lowest cost, where the cost of each link is a function of all players' choices. We develop a novel numerical method to design the link cost function in atomic routing games such that the players' choices at the Nash equilibrium minimize a given smooth performance function. This method first approximates the nonsmooth Nash equilibrium conditions with smooth ones, then iteratively improves the link cost function via implicit differentiation. We demonstrate the application of this method to atomic routing games that model noncooperative agents navigating in grid worlds.

Submitted 17 May, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2209.04536 [pdf, ps, other]

doi 10.1109/Allerton49937.2022.9929349

Alternating Direction Method of Multipliers for Decomposable Saddle-Point Problems

Authors: Mustafa O. Karabag, David Fridovich-Keil, Ufuk Topcu

Abstract: Saddle-point problems appear in various settings including machine learning, zero-sum stochastic games, and regression problems. We consider decomposable saddle-point problems and study an extension of the alternating direction method of multipliers to such saddle-point problems. Instead of solving the original saddle-point problem directly, this algorithm solves smaller saddle-point problems by exploiting the decomposable structure. We show the convergence of this algorithm for convex-concave saddle-point problems under a mild assumption. We also provide a sufficient condition for which the assumption holds. We demonstrate the convergence properties of the saddle-point alternating direction method of multipliers with numerical examples on a power allocation problem in communication channels and a network routing problem with adversarial costs.

Submitted 27 December, 2022; v1 submitted 9 September, 2022; originally announced September 2022.

Comments: Accepted to 58th Annual Allerton Conference on Communication, Control, and Computing

arXiv:2209.02650 [pdf, other]

Learning Interpretable Temporal Properties from Positive Examples Only

Authors: Rajarshi Roy, Jean-Raphaël Gaglione, Nasim Baharisangari, Daniel Neider, Zhe Xu, Ufuk Topcu

Abstract: We consider the problem of explaining the temporal behavior of black-box systems using human-interpretable models. To this end, based on recent research trends, we rely on the fundamental yet interpretable models of deterministic finite automata (DFAs) and linear temporal logic (LTL) formulas. In contrast to most existing works for learning DFAs and LTL formulas, we rely on only positive examples. Our motivation is that negative examples are generally difficult to observe, in particular, from black-box systems. To learn meaningful models from positive examples only, we design algorithms that rely on conciseness and language minimality of models as regularizers. To this end, our algorithms adopt two approaches: a symbolic and a counterexample-guided one. While the symbolic approach exploits an efficient encoding of language minimality as a constraint satisfaction problem, the counterexample-guided one relies on generating suitable negative examples to prune the search. Both the approaches provide us with effective algorithms with theoretical guarantees on the learned models. To assess the effectiveness of our algorithms, we evaluate all of them on synthetic data.

Submitted 2 March, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: Full version of the paper that appeared in AAAI23

ACM Class: F.4.1; I.2.6

arXiv:2208.13687 [pdf, other]

Categorical semantics of compositional reinforcement learning

Authors: Georgios Bakirtzis, Michail Savvas, Ufuk Topcu

Abstract: Reinforcement learning (RL) often requires decomposing a problem into subtasks and composing learned behaviors on these tasks. Compositionality in RL has the potential to create modular subtask units that interface with other system capabilities. However, generating compositional models requires the characterization of minimal assumptions for the robustness of the compositional feature. We develop a framework for a \emph{compositional theory} of RL using a categorical point of view. Given the categorical representation of compositionality, we investigate sufficient conditions under which learning-by-parts results in the same optimal policy as learning on the whole. In particular, our approach introduces a category $\mathsf{MDP}$, whose objects are Markov decision processes (MDPs) acting as models of tasks. We show that $\mathsf{MDP}$ admits natural compositional operations, such as certain fiber products and pushouts. These operations make explicit compositional phenomena in RL and unify existing constructions, such as puncturing hazardous states in composite MDPs and incorporating state-action symmetry. We also model sequential task completion by introducing the language of zig-zag diagrams that is an immediate application of the pushout operation in $\mathsf{MDP}$.

Submitted 29 August, 2022; originally announced August 2022.

arXiv:2207.08275 [pdf, other]

Inverse Matrix Games with Unique Quantal Response Equilibrium

Authors: Yue Yu, Jonathan Salfity, David Fridovich-Keil, Ufuk Topcu

Abstract: In an inverse game problem, one needs to infer the cost function of the players in a game such that a desired joint strategy is a Nash equilibrium. We study the inverse game problem for a class of multiplayer matrix games, where the cost perceived by each player is corrupted by random noise. We provide sufficient conditions for the players' quantal response equilibrium -- a generalization of the Nash equilibrium to games with perception noise -- to be unique. We develop efficient optimization algorithms for inferring the cost matrix based on semidefinite programs and bilevel optimization. We demonstrate the application of these methods in encouraging collision avoidance and fair resource allocation.

Submitted 13 October, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

Showing 1–50 of 169 results for author: Topcu, U