Search | arXiv e-print repository

Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation

Authors: Sara Pohland, Alvin Tan, Prabal Dutta, Claire Tomlin

Abstract: Reinforcement learning (RL) methods for social robot navigation show great success navigating robots through large crowds of people, but the performance of these learning-based methods tends to degrade in particularly challenging or unfamiliar situations due to the models' dependency on representative training data. To ensure human safety and comfort, it is critical that these algorithms handle un… ▽ More Reinforcement learning (RL) methods for social robot navigation show great success navigating robots through large crowds of people, but the performance of these learning-based methods tends to degrade in particularly challenging or unfamiliar situations due to the models' dependency on representative training data. To ensure human safety and comfort, it is critical that these algorithms handle uncommon cases appropriately, but the low frequency and wide diversity of such situations present a significant challenge for these data-driven methods. To overcome this challenge, we propose modifications to the learning process that encourage these RL policies to maintain additional caution in unfamiliar situations. Specifically, we improve the Socially Attentive Reinforcement Learning (SARL) policy by (1) modifying the training process to systematically introduce deviations into a pedestrian model, (2) updating the value network to estimate and utilize pedestrian-unpredictability features, and (3) implementing a reward function to learn an effective response to pedestrian unpredictability. Compared to the original SARL policy, our modified policy maintains similar navigation times and path lengths, while reducing the number of collisions by 82% and reducing the proportion of time spent in the pedestrians' personal space by up to 19 percentage points for the most difficult cases. We also describe how to apply these modifications to other RL policies and demonstrate that some key high-level behaviors of our approach transfer to a physical robot. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2403.05612 [pdf, other]

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Authors: Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine

Abstract: Large language models are known to hallucinate when faced with unfamiliar queries, but the underlying mechanism that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models' finetuning data -- those that introduce concepts beyond the base model's scope of knowledge -- are crucial in shaping these errors. In particular, we find that a… ▽ More Large language models are known to hallucinate when faced with unfamiliar queries, but the underlying mechanism that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models' finetuning data -- those that introduce concepts beyond the base model's scope of knowledge -- are crucial in shaping these errors. In particular, we find that an LLM's hallucinated predictions tend to mirror the responses associated with its unfamiliar finetuning examples. This suggests that by modifying how unfamiliar finetuning examples are supervised, we can influence a model's responses to unfamiliar queries (e.g., say ``I don't know''). We empirically validate this observation in a series of controlled experiments involving SFT, RL, and reward model finetuning on TriviaQA and MMLU. Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations. We find that, while hallucinations from the reward model can significantly undermine the effectiveness of RL factuality finetuning, strategically controlling how reward models hallucinate can minimize these negative effects. Leveraging our previous observations on controlling hallucinations, we propose an approach for learning more reliable reward models, and show that they improve the efficacy of RL factuality finetuning in long-form biography and book/movie plot generation tasks. △ Less

Submitted 28 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.10182 [pdf, other]

Intent Demonstration in General-Sum Dynamic Games via Iterative Linear-Quadratic Approximations

Authors: Jingqi Li, Anand Siththaranjan, Somayeh Sojoudi, Claire Tomlin, Andrea Bajcsy

Abstract: Autonomous agents should be able to coordinate with other agents without knowing their intents ahead of time. While prior work has studied how agents can gather information about the intent of others, in this work we study the inverse problem: how agents can demonstrate their intent to others, within the framework of general-sum dynamic games. We first present a model of this intent demonstration… ▽ More Autonomous agents should be able to coordinate with other agents without knowing their intents ahead of time. While prior work has studied how agents can gather information about the intent of others, in this work we study the inverse problem: how agents can demonstrate their intent to others, within the framework of general-sum dynamic games. We first present a model of this intent demonstration problem and then propose an algorithm that enables an agent to trade off their task performance and intent demonstration to improve the overall system's performance. To scale to continuous states and action spaces as well as to nonlinear dynamics and costs, our algorithm leverages linear-quadratic approximations with an efficient intent teaching guarantee. Our empirical results show that intent demonstration accelerates other agents' learning and enables the demonstrating agent to balance task performance with intent expression. △ Less

Submitted 15 February, 2024; originally announced February 2024.

Comments: Under review by L4DC 2024

arXiv:2402.05279 [pdf, other]

Safety Filters for Black-Box Dynamical Systems by Learning Discriminating Hyperplanes

Authors: Will Lavanakul, Jason J. Choi, Koushil Sreenath, Claire J. Tomlin

Abstract: Learning-based approaches are emerging as an effective approach for safety filters for black-box dynamical systems. Existing methods have relied on certificate functions like Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) reachability value functions. The primary motivation for our work is the recognition that ultimately, enforcing the safety constraint as a control input constraint at… ▽ More Learning-based approaches are emerging as an effective approach for safety filters for black-box dynamical systems. Existing methods have relied on certificate functions like Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) reachability value functions. The primary motivation for our work is the recognition that ultimately, enforcing the safety constraint as a control input constraint at each state is what matters. By focusing on this constraint, we can eliminate dependence on any specific certificate function-based design. To achieve this, we define a discriminating hyperplane that shapes the half-space constraint on control input at each state, serving as a sufficient condition for safety. This concept not only generalizes over traditional safety methods but also simplifies safety filter design by eliminating dependence on specific certificate functions. We present two strategies to learn the discriminating hyperplane: (a) a supervised learning approach, using pre-verified control invariant sets for labeling, and (b) a reinforcement learning (RL) approach, which does not require such labels. The main advantage of our method, unlike conventional safe RL approaches, is the separation of performance and safety. This offers a reusable safety filter for learning new tasks, avoiding the need to retrain from scratch. As such, we believe that the new notion of the discriminating hyperplane offers a more generalizable direction towards designing safety filters, encompassing and extending existing certificate-function-based or safe RL methodologies. △ Less

Submitted 21 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: * Indicate co-first authors. This is an extended version of the paper presented at L4DC 2024

arXiv:2401.15745 [pdf, other]

The computation of approximate feedback Stackelberg equilibria in multi-player nonlinear constrained dynamic games

Authors: Jingqi Li, Somayeh Sojoudi, Claire Tomlin, David Fridovich-Keil

Abstract: Solving feedback Stackelberg games with nonlinear dynamics and coupled constraints, a common scenario in practice, presents significant challenges. This work introduces an efficient method for computing local feedback Stackelberg policies in multi-player general-sum dynamic games, with continuous state and action spaces. Different from existing (approximate) dynamic programming solutions that are… ▽ More Solving feedback Stackelberg games with nonlinear dynamics and coupled constraints, a common scenario in practice, presents significant challenges. This work introduces an efficient method for computing local feedback Stackelberg policies in multi-player general-sum dynamic games, with continuous state and action spaces. Different from existing (approximate) dynamic programming solutions that are primarily designed for unconstrained problems, our approach involves reformulating a feedback Stackelberg dynamic game into a sequence of nested optimization problems, enabling the derivation of Karush-Kuhn-Tucker (KKT) conditions and the establishment of a second-order sufficient condition for local feedback Stackelberg policies. We propose a Newton-style primal-dual interior point method for solving constrained linear quadratic (LQ) feedback Stackelberg games, offering provable convergence guarantees. Our method is further extended to compute local feedback Stackelberg policies for more general nonlinear games by iteratively approximating them using LQ games, ensuring that their KKT conditions are locally aligned with those of the original nonlinear games. We prove the exponential convergence of our algorithm in constrained nonlinear games. In a feedback Stackelberg game with nonlinear dynamics and (nonconvex) coupled costs and constraints, our experimental results reveal the algorithm's ability to handle infeasible initial conditions and achieve exponential convergence towards an approximate local feedback Stackelberg equilibrium. △ Less

Submitted 15 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: This manuscript is currently under review by SIAM Journal on Optimization

arXiv:2401.10313 [pdf, other]

Hacking Predictors Means Hacking Cars: Using Sensitivity Analysis to Identify Trajectory Prediction Vulnerabilities for Autonomous Driving Security

Authors: Marsalis Gibson, David Babazadeh, Claire Tomlin, Shankar Sastry

Abstract: Adversarial attacks on learning-based multi-modal trajectory predictors have already been demonstrated. However, there are still open questions about the effects of perturbations on inputs other than state histories, and how these attacks impact downstream planning and control. In this paper, we conduct a sensitivity analysis on two trajectory prediction models, Trajectron++ and AgentFormer. The a… ▽ More Adversarial attacks on learning-based multi-modal trajectory predictors have already been demonstrated. However, there are still open questions about the effects of perturbations on inputs other than state histories, and how these attacks impact downstream planning and control. In this paper, we conduct a sensitivity analysis on two trajectory prediction models, Trajectron++ and AgentFormer. The analysis reveals that between all inputs, almost all of the perturbation sensitivities for both models lie only within the most recent position and velocity states. We additionally demonstrate that, despite dominant sensitivity on state history perturbations, an undetectable image map perturbation made with the Fast Gradient Sign Method can induce large prediction error increases in both models, revealing that these trajectory predictors are, in fact, susceptible to image-based attacks. Using an optimization-based planner and example perturbations crafted from sensitivity results, we show how these attacks can cause a vehicle to come to a sudden stop from moderate driving speeds. △ Less

Submitted 20 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 10 pages, 5 figures, 1 tables

arXiv:2311.13824 [pdf, other]

Constraint-Guided Online Data Selection for Scalable Data-Driven Safety Filters in Uncertain Robotic Systems

Authors: Jason J. Choi, Fernando Castañeda, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: As the use of autonomous robotic systems expands in tasks that are complex and challenging to model, the demand for robust data-driven control methods that can certify safety and stability in uncertain conditions is increasing. However, the practical implementation of these methods often faces scalability issues due to the growing amount of data points with system complexity, and a significant rel… ▽ More As the use of autonomous robotic systems expands in tasks that are complex and challenging to model, the demand for robust data-driven control methods that can certify safety and stability in uncertain conditions is increasing. However, the practical implementation of these methods often faces scalability issues due to the growing amount of data points with system complexity, and a significant reliance on high-quality training data. In response to these challenges, this study presents a scalable data-driven controller that efficiently identifies and infers from the most informative data points for implementing data-driven safety filters. Our approach is grounded in the integration of a model-based certificate function-based method and Gaussian Process (GP) regression, reinforced by a novel online data selection algorithm that reduces time complexity from quadratic to linear relative to dataset size. Empirical evidence, gathered from successful real-world cart-pole swing-up experiments and simulated locomotion of a five-link bipedal robot, demonstrates the efficacy of our approach. Our findings reveal that our efficient online data selection algorithm, which strategically selects key data points, enhances the practicality and efficiency of data-driven certifying filters in complex robotic systems, significantly mitigating scalability concerns inherent in nonparametric learning-based control methods. △ Less

Submitted 23 November, 2023; originally announced November 2023.

Comments: The first three authors contributed equally to the work. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2310.17180 [pdf, other]

A Forward Reachability Perspective on Robust Control Invariance and Discount Factors in Reachability Analysis

Authors: Jason J. Choi, Donggun Lee, Boyang Li, Jonathan P. How, Koushil Sreenath, Sylvia L. Herbert, Claire J. Tomlin

Abstract: Control invariant sets are crucial for various methods that aim to design safe control policies for systems whose state constraints must be satisfied over an indefinite time horizon. In this article, we explore the connections among reachability, control invariance, and Control Barrier Functions (CBFs) by examining the forward reachability problem associated with control invariant sets. We present… ▽ More Control invariant sets are crucial for various methods that aim to design safe control policies for systems whose state constraints must be satisfied over an indefinite time horizon. In this article, we explore the connections among reachability, control invariance, and Control Barrier Functions (CBFs) by examining the forward reachability problem associated with control invariant sets. We present the notion of an "inevitable Forward Reachable Tube" (FRT) as a tool for analyzing control invariant sets. Our findings show that the inevitable FRT of a robust control invariant set with a differentiable boundary is the set itself. We highlight the role of the differentiability of the boundary in shaping the FRTs of the sets through numerical examples. We also formulate a zero-sum differential game between the control and disturbance, where the inevitable FRT is characterized by the zero-superlevel set of the value function. By incorporating a discount factor in the cost function of the game, the barrier constraint of the CBF naturally arises as the constraint that is imposed on the optimal control policy. As a result, the value function of our FRT formulation serves as a CBF-like function, which has not been previously realized in reachability studies. Conversely, any valid CBF is also a forward reachability value function inside the control invariant set, thereby revealing the inverse optimality of the CBF. As such, our work establishes a strong link between reachability, control invariance, and CBFs, filling a gap that prior formulations based on backward reachability were unable to bridge. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: The first two authors contributed equally to this work

arXiv:2310.00873 [pdf, other]

Deep Neural Networks Tend To Extrapolate Predictably

Authors: Katie Kang, Amrith Setlur, Claire Tomlin, Sergey Levine

Abstract: Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. Our work reassesses this assumption for neural networks with high-dimensional inputs. Rather than extrapolating in arbitrary ways, we observe that neural network predictions often tend towards a constant value as input data becomes increasingly O… ▽ More Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. Our work reassesses this assumption for neural networks with high-dimensional inputs. Rather than extrapolating in arbitrary ways, we observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD. Moreover, we find that this value often closely approximates the optimal constant solution (OCS), i.e., the prediction that minimizes the average loss over the training data without observing the input. We present results showing this phenomenon across 8 datasets with different distributional shifts (including CIFAR10-C and ImageNet-R, S), different loss functions (cross entropy, MSE, and Gaussian NLL), and different architectures (CNNs and transformers). Furthermore, we present an explanation for this behavior, which we first validate empirically and then study theoretically in a simplified setting involving deep homogeneous networks with ReLU activations. Finally, we show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs. △ Less

Submitted 15 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2309.06655 [pdf, other]

Out of Distribution Detection via Domain-Informed Gaussian Process State Space Models

Authors: Alonso Marco, Elias Morley, Claire J. Tomlin

Abstract: In order for robots to safely navigate in unseen scenarios using learning-based methods, it is important to accurately detect out-of-training-distribution (OoD) situations online. Recently, Gaussian process state-space models (GPSSMs) have proven useful to discriminate unexpected observations by comparing them against probabilistic predictions. However, the capability for the model to correctly di… ▽ More In order for robots to safely navigate in unseen scenarios using learning-based methods, it is important to accurately detect out-of-training-distribution (OoD) situations online. Recently, Gaussian process state-space models (GPSSMs) have proven useful to discriminate unexpected observations by comparing them against probabilistic predictions. However, the capability for the model to correctly distinguish between in- and out-of-training distribution observations hinges on the accuracy of these predictions, primarily affected by the class of functions the GPSSM kernel can represent. In this paper, we propose (i) a novel approach to embed existing domain knowledge in the kernel and (ii) an OoD online runtime monitor, based on receding-horizon predictions. Domain knowledge is provided in the form of a dataset, collected either in simulation or by using a nominal model. Numerical results show that the informed kernel yields better regression quality with smaller datasets, as compared to standard kernel choices. We demonstrate the effectiveness of the OoD monitor on a real quadruped navigating an indoor setting, which reliably classifies previously unseen terrains. △ Less

Submitted 15 September, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 7 pages, 4 figures

arXiv:2307.07147 [pdf, other]

Linking vision and motion for self-supervised object-centric perception

Authors: Kaylene C. Stocking, Zak Murez, Vijay Badrinarayanan, Jamie Shotton, Alex Kendall, Claire Tomlin, Christopher P. Burgess

Abstract: Object-centric representations enable autonomous driving algorithms to reason about interactions between many independent agents and scene features. Traditionally these representations have been obtained via supervised learning, but this decouples perception from the downstream driving task and could harm generalization. In this work we adapt a self-supervised object-centric vision model to perfor… ▽ More Object-centric representations enable autonomous driving algorithms to reason about interactions between many independent agents and scene features. Traditionally these representations have been obtained via supervised learning, but this decouples perception from the downstream driving task and could harm generalization. In this work we adapt a self-supervised object-centric vision model to perform object decomposition using only RGB video and the pose of the vehicle as inputs. We demonstrate that our method obtains promising results on the Waymo Open perception dataset. While object mask quality lags behind supervised methods or alternatives that use more privileged information, we find that our model is capable of learning a representation that fuses multiple camera viewpoints over time and successfully tracks many vehicles and pedestrians in the dataset. Code for our model is available at https://github.com/wayveai/SOCS. △ Less

Submitted 14 July, 2023; originally announced July 2023.

Comments: Presented at the CVPR 2023 Vision-Centric Autonomous Driving workshop

arXiv:2307.01927 [pdf, other]

Safe Connectivity Maintenance of Underactuated Multi-Agent Networks in Dynamic Oceanic Environments

Authors: Nicolas Hoischen, Marius Wiggert, Claire J. Tomlin

Abstract: Autonomous multi-agent systems are increasingly being deployed in environments where winds and ocean currents have a significant influence. Recent work has developed control policies for single agents that leverage flows to achieve their objectives in dynamic environments. However, in multi-agent systems, these flows can cause agents to collide or drift apart and lose direct inter-agent communicat… ▽ More Autonomous multi-agent systems are increasingly being deployed in environments where winds and ocean currents have a significant influence. Recent work has developed control policies for single agents that leverage flows to achieve their objectives in dynamic environments. However, in multi-agent systems, these flows can cause agents to collide or drift apart and lose direct inter-agent communications, especially when agents have low propulsion capabilities. To address these challenges, we propose a hierarchical multi-agent control approach that allows arbitrary single-agent performance policies that are unaware of other agents to be used in multi-agent systems while ensuring safe operation. We first develop a safety controller using potential functions, solely dedicated to avoiding collisions and maintaining inter-agent communication. Next, we design a low-interference safe interaction (LISIC) policy that trades off the performance policy and the safety control to ensure safe and performant operation. Specifically, when the agents are at an appropriate distance, LISIC prioritizes the performance policy while smoothly increasing the safety controller when necessary. We prove that under mild assumptions on the flows experienced by the agents, our approach can guarantee safety. Additionally, we demonstrate the effectiveness of our method in realistic settings through an extensive empirical analysis with simulations of fleets of underactuated autonomous surface vehicles operating in dynamic ocean currents where these assumptions do not always hold. △ Less

Submitted 20 June, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: 8 pages, Published at European Control Conference 2024 (ECC 2024) Nicolas Hoischen and Marius Wiggert contributed equally to this work

arXiv:2307.01917 [pdf, other]

Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers

Authors: Andreas Doering, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F. J. Lermusiaux, Claire J. Tomlin

Abstract: Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider the critical aspect of safety of such vessels: because of their low propulsion which is much smaller than the magnitude of currents, they mig… ▽ More Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider the critical aspect of safety of such vessels: because of their low propulsion which is much smaller than the magnitude of currents, they might end up in currents that inevitably push them into unsafe areas such as shallow areas, garbage patches, and shipping lanes. In this work, we first investigate the risk of stranding for free-floating vessels in the Northeast Pacific. We find that at least 5.04% would strand within 90 days. Next, we encode the unsafe sets as hard constraints into Hamilton-Jacobi Multi-Time Reachability (HJ-MTR) to synthesize a feedback policy that is equivalent to re-planning at each time step at low computational cost. While applying this policy closed-loop guarantees safe operation when the currents are known, in realistic situations only imperfect forecasts are available. We demonstrate the safety of our approach in such realistic situations empirically with large-scale simulations of a vessel navigating in high-risk regions in the Northeast Pacific. We find that applying our policy closed-loop with daily re-planning on new forecasts can ensure safety with high probability even under forecast errors that exceed the maximal propulsion. Our method significantly improves safety over the baselines and still achieves a timely arrival of the vessel at the destination. △ Less

Submitted 4 July, 2023; originally announced July 2023.

Comments: 6 pages, 3 figures, submitted to 2023 IEEE 62th Annual Conference on Decision and Control (CDC) Andreas Doering and Marius Wiggert contributed equally to this work

arXiv:2307.01916 [pdf, other]

Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underactuated Systems Navigating on Uncertain Ocean Currents

Authors: Matthias Killer, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F. J. Lermusiaux, Claire J. Tomlin

Abstract: Seaweed biomass offers significant potential for climate mitigation, but large-scale, autonomous open-ocean farms are required to fully exploit it. Such farms typically have low propulsion and are heavily influenced by ocean currents. We want to design a controller that maximizes seaweed growth over months by taking advantage of the non-linear time-varying ocean currents for reaching high-growth r… ▽ More Seaweed biomass offers significant potential for climate mitigation, but large-scale, autonomous open-ocean farms are required to fully exploit it. Such farms typically have low propulsion and are heavily influenced by ocean currents. We want to design a controller that maximizes seaweed growth over months by taking advantage of the non-linear time-varying ocean currents for reaching high-growth regions. The complex dynamics and underactuation make this challenging even when the currents are known. This is even harder when only short-term imperfect forecasts with increasing uncertainty are available. We propose a dynamic programming-based method to efficiently solve for the optimal growth value function when true currents are known. We additionally present three extensions when as in reality only forecasts are known: (1) our methods resulting value function can be used as feedback policy to obtain the growth-optimal control for all states and times, allowing closed-loop control equivalent to re-planning at every time step hence mitigating forecast errors, (2) a feedback policy for long-term optimal growth beyond forecast horizons using seasonal average current data as terminal reward, and (3) a discounted finite-time Dynamic Programming (DP) formulation to account for increasing ocean current estimate uncertainty. We evaluate our approach through 30-day simulations of floating seaweed farms in realistic Pacific Ocean current scenarios. Our method demonstrates an achievement of 95.8% of the best possible growth using only 5-day forecasts. This confirms the feasibility of using low-power propulsion and optimal control for enhanced seaweed growth on floating farms under real-world conditions. △ Less

Submitted 29 August, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: 8 pages, submitted to 2023 IEEE 62th Annual Conference on Decision and Control (CDC) Matthias Killer and Marius Wiggert contributed equally to this work

arXiv:2304.01945 [pdf, other]

Scenario-Game ADMM: A Parallelized Scenario-Based Solver for Stochastic Noncooperative Games

Authors: Jingqi Li, Chih-Yuan Chiu, Lasse Peters, Fernando Palafox, Mustafa Karabag, Javier Alonso-Mora, Somayeh Sojoudi, Claire Tomlin, David Fridovich-Keil

Abstract: Decision-making in multi-player games can be extremely challenging, particularly under uncertainty. In this work, we propose a new sample-based approximation to a class of stochastic, general-sum, pure Nash games, where each player has an expected-value objective and a set of chance constraints. This new approximation scheme inherits the accuracy of objective approximation from the established sam… ▽ More Decision-making in multi-player games can be extremely challenging, particularly under uncertainty. In this work, we propose a new sample-based approximation to a class of stochastic, general-sum, pure Nash games, where each player has an expected-value objective and a set of chance constraints. This new approximation scheme inherits the accuracy of objective approximation from the established sample average approximation (SAA) method and enjoys a feasibility guarantee derived from the scenario optimization literature. We characterize the sample complexity of this new game-theoretic approximation scheme, and observe that high accuracy usually requires a large number of samples, which results in a large number of sampled constraints. To accommodate this, we decompose the approximated game into a set of smaller games with few constraints for each sampled scenario, and propose a decentralized, consensus-based ADMM algorithm to efficiently compute a generalized Nash equilibrium (GNE) of the approximated game. We prove the convergence of our algorithm to a GNE and empirically demonstrate superior performance relative to a recent baseline algorithm based on ADMM and interior point method. △ Less

Submitted 13 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2304.00432 [pdf, other]

Multi-Agent Reachability Calibration with Conformal Prediction

Authors: Anish Muthali, Haotian Shen, Sampada Deglurkar, Michael H. Lim, Rebecca Roelofs, Aleksandra Faust, Claire Tomlin

Abstract: We investigate methods to provide safety assurances for autonomous agents that incorporate predictions of other, uncontrolled agents' behavior into their own trajectory planning. Given a learning-based forecasting model that predicts agents' trajectories, we introduce a method for providing probabilistic assurances on the model's prediction error with calibrated confidence intervals. Through quant… ▽ More We investigate methods to provide safety assurances for autonomous agents that incorporate predictions of other, uncontrolled agents' behavior into their own trajectory planning. Given a learning-based forecasting model that predicts agents' trajectories, we introduce a method for providing probabilistic assurances on the model's prediction error with calibrated confidence intervals. Through quantile regression, conformal prediction, and reachability analysis, our method generates probabilistically safe and dynamically feasible prediction sets. We showcase their utility in certifying the safety of planning algorithms, both in simulations using actual autonomous driving data and in an experiment with Boeing vehicles. △ Less

Submitted 13 December, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

arXiv:2302.01999 [pdf, other]

Online and Offline Learning of Player Objectives from Partial Observations in Dynamic Games

Authors: Lasse Peters, Vicenç Rubies-Royo, Claire J. Tomlin, Laura Ferranti, Javier Alonso-Mora, Cyrill Stachniss, David Fridovich-Keil

Abstract: Robots deployed to the real world must be able to interact with other agents in their environment. Dynamic game theory provides a powerful mathematical framework for modeling scenarios in which agents have individual objectives and interactions evolve over time. However, a key limitation of such techniques is that they require a-priori knowledge of all players' objectives. In this work, we address… ▽ More Robots deployed to the real world must be able to interact with other agents in their environment. Dynamic game theory provides a powerful mathematical framework for modeling scenarios in which agents have individual objectives and interactions evolve over time. However, a key limitation of such techniques is that they require a-priori knowledge of all players' objectives. In this work, we address this issue by proposing a novel method for learning players' objectives in continuous dynamic games from noise-corrupted, partial state observations. Our approach learns objectives by coupling the estimation of unknown cost parameters of each player with inference of unobserved states and inputs through Nash equilibrium constraints. By coupling past state estimates with future state predictions, our approach is amenable to simultaneous online learning and prediction in receding horizon fashion. We demonstrate our method in several simulated traffic scenarios in which we recover players' preferences for, e.g., desired travel speed and collision-avoidance behavior. Results show that our method reliably estimates game-theoretic models from noise-corrupted data that closely matches ground-truth objectives, consistently outperforming state-of-the-art approaches. △ Less

Submitted 14 May, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2106.03611

arXiv:2301.01398 [pdf, other]

Cost Inference for Feedback Dynamic Games from Noisy Partial State Observations and Incomplete Trajectories

Authors: Jingqi Li, Chih-Yuan Chiu, Lasse Peters, Somayeh Sojoudi, Claire Tomlin, David Fridovich-Keil

Abstract: In multi-agent dynamic games, the Nash equilibrium state trajectory of each agent is determined by its cost function and the information pattern of the game. However, the cost and trajectory of each agent may be unavailable to the other agents. Prior work on using partial observations to infer the costs in dynamic games assumes an open-loop information pattern. In this work, we demonstrate that th… ▽ More In multi-agent dynamic games, the Nash equilibrium state trajectory of each agent is determined by its cost function and the information pattern of the game. However, the cost and trajectory of each agent may be unavailable to the other agents. Prior work on using partial observations to infer the costs in dynamic games assumes an open-loop information pattern. In this work, we demonstrate that the feedback Nash equilibrium concept is more expressive and encodes more complex behavior. It is desirable to develop specific tools for inferring players' objectives in feedback games. Therefore, we consider the dynamic game cost inference problem under the feedback information pattern, using only partial state observations and incomplete trajectory data. To this end, we first propose an inverse feedback game loss function, whose minimizer yields a feedback Nash equilibrium state trajectory closest to the observation data. We characterize the landscape and differentiability of the loss function. Given the difficulty of obtaining the exact gradient, our main contribution is an efficient gradient approximator, which enables a novel inverse feedback game solver that minimizes the loss using first-order optimization. In thorough empirical evaluations, we demonstrate that our algorithm converges reliably and has better robustness and generalization performance than the open-loop baseline method when the observation data reflects a group of players acting in a feedback Nash game. △ Less

Submitted 3 January, 2023; originally announced January 2023.

Comments: Accepted by AAMAS 2023. This is a preprint version

arXiv:2212.00186 [pdf, other]

Multi-Task Imitation Learning for Linear Dynamical Systems

Authors: Thomas T. Zhang, Katie Kang, Bruce D. Lee, Claire Tomlin, Sergey Levine, Stephen Tu, Nikolai Matni

Abstract: We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that… ▽ More We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x > k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation. △ Less

Submitted 9 November, 2023; v1 submitted 30 November, 2022; originally announced December 2022.

Comments: Appeared in L4DC 2023. V3: corrected typo in assumptions

arXiv:2210.05015 [pdf, other]

doi 10.1613/jair.1.14525

Optimality Guarantees for Particle Belief Approximation of POMDPs

Authors: Michael H. Lim, Tyler J. Becker, Mykel J. Kochenderfer, Claire J. Tomlin, Zachary N. Sunberg

Abstract: Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood w… ▽ More Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers. △ Less

Submitted 19 October, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Journal ref: Journal of Artificial Intelligence Research, 77, 1591-1636 (2023)

arXiv:2208.10733 [pdf, other]

Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions

Authors: Fernando Castañeda, Jason J. Choi, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: Learning-based control schemes have recently shown great efficacy performing complex tasks for a wide variety of applications. However, in order to deploy them in real systems, it is of vital importance to guarantee that the system will remain safe during online training and execution. Among the currently most popular methods to tackle this challenge, Control Barrier Functions (CBFs) serve as math… ▽ More Learning-based control schemes have recently shown great efficacy performing complex tasks for a wide variety of applications. However, in order to deploy them in real systems, it is of vital importance to guarantee that the system will remain safe during online training and execution. Among the currently most popular methods to tackle this challenge, Control Barrier Functions (CBFs) serve as mathematical tools that provide a formal safety-preserving control synthesis procedure for systems with known dynamics. In this paper, we first introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers using Gaussian Process (GP) regression to bridge the gap between an approximate mathematical model and the real system. Compared to previous approaches, we study the feasibility of the resulting robust safety-critical controller. This feasibility analysis results in a set of richness conditions that the available information about the system should satisfy to guarantee that a safe control action can be found at all times. We then use these conditions to devise an event-triggered online data collection strategy that ensures the recursive feasibility of the learned safety-critical controller. Our proposed methodology endows the system with the ability to reason at all times about whether the current information at its disposal is enough to ensure safety or if new measurements are required. This, in turn, allows us to provide formal results of forward invariance of a safe set with high probability, even in a priori unexplored regions. Finally, we validate the proposed framework in numerical simulations of an adaptive cruise control system and a kinematic vehicle. △ Less

Submitted 26 September, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Journal article. Includes the results of the 2021 CDC paper titled "Pointwise feasibility of gaussian process-based safety-critical control under model uncertainty" and proposes a recursively feasible safe online learning algorithm as new contribution

arXiv:2206.10524 [pdf, other]

Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control

Authors: Katie Kang, Paula Gradu, Jason Choi, Michael Janner, Claire Tomlin, Sergey Levine

Abstract: Learned models and policies can generalize effectively when evaluated within the distribution of the training data, but can produce unpredictable and erroneous outputs on out-of-distribution inputs. In order to avoid distribution shift when deploying learning-based control algorithms, we seek a mechanism to constrain the agent to states and actions that resemble those that it was trained on. In co… ▽ More Learned models and policies can generalize effectively when evaluated within the distribution of the training data, but can produce unpredictable and erroneous outputs on out-of-distribution inputs. In order to avoid distribution shift when deploying learning-based control algorithms, we seek a mechanism to constrain the agent to states and actions that resemble those that it was trained on. In control theory, Lyapunov stability and control-invariant sets allow us to make guarantees about controllers that stabilize the system around specific states, while in machine learning, density models allow us to estimate the training data distribution. Can we combine these two concepts, producing learning-based control algorithms that constrain the system to in-distribution states using only in-distribution actions? In this work, we propose to do this by combining concepts from Lyapunov stability and density estimation, introducing Lyapunov density models: a generalization of control Lyapunov functions and density models that provides guarantees on an agent's ability to stay in-distribution over its entire trajectory. △ Less

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2204.07629 [pdf]

Navigation between initial and desired community states using shortcuts

Authors: Benjamin W. Blonder, Michael H. Lim, Zachary Sunberg, Claire Tomlin

Abstract: Ecological management problems often involve navigating from an initial to a desired community state. We ask whether navigation is possible without brute-force additions and deletions of species, using actions of varying costs: adding/deleting a small number of individuals of a species, changing the environment, and waiting. Navigation can yield direct paths (single sequence of actions) or shortcu… ▽ More Ecological management problems often involve navigating from an initial to a desired community state. We ask whether navigation is possible without brute-force additions and deletions of species, using actions of varying costs: adding/deleting a small number of individuals of a species, changing the environment, and waiting. Navigation can yield direct paths (single sequence of actions) or shortcut paths (multiple sequences of actions with lower cost than a direct path). We ask (1) when is non-brute-force navigation possible?; (2) do shortcuts exist and what are their properties?; and (3) what heuristics predict shortcut existence? Using several empirical datasets, we show that (1) non-brute-force navigation is only possible between some state pairs, (2) shortcuts exist between many state pairs; and (3) changes in abundance and richness are the strongest predictors of shortcut existence, independent of dataset and algorithm choices. State diagrams thus unveil hidden strategies for efficiently shifting between states. △ Less

Submitted 2 December, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

arXiv:2204.07539 [pdf, other]

Stability and Robustness of a Hybrid Control Law for the Half-bridge Inverter

Authors: Gabriel E. Colón-Reyes, Kaylene C. Stocking, Duncan S. Callaway, Claire J. Tomlin

Abstract: Hybrid systems combine both discrete and continuous state dynamics. Power electronic inverters are inherently hybrid systems: they are controlled via discrete-valued switching inputs which determine the evolution of the continuous-valued current and voltage state dynamics. Hybrid systems analysis could prove increasingly useful as large numbers of renewable energy sources are incorporated to the… ▽ More Hybrid systems combine both discrete and continuous state dynamics. Power electronic inverters are inherently hybrid systems: they are controlled via discrete-valued switching inputs which determine the evolution of the continuous-valued current and voltage state dynamics. Hybrid systems analysis could prove increasingly useful as large numbers of renewable energy sources are incorporated to the grid with inverters as their interface. In this work, we explore a hybrid systems approach for the stability analysis of power and power electronic systems. We provide an analytical proof showing that the use of a hybrid model for the half-bridge inverter allows the derivation of a control law that drives the system states to desired sinusoidal voltage and current references. We derive an analytical expression for a global Lyapunov function for the dynamical system in terms of the system parameters, which proves uniform, global, and asymptotic stability of the origin in error coordinates. Moreover, we demonstrate robustness to parameter changes through this Lyapunov function. We validate these results via simulation. Finally, we show empirically the incorporation of droop control with this hybrid systems approach. In the low-inertia grid community, the juxtaposition of droop control with the hybrid switching control can be considered a grid-forming control strategy using a switched inverter model. △ Less

Submitted 18 May, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

arXiv:2204.01986 [pdf, other]

On the Computational Consequences of Cost Function Design in Nonlinear Optimal Control

Authors: Tyler Westenbroek, Anand Siththaranjan, Mohsin Sarwari, Claire J. Tomlin, Shankar S. Sastry

Abstract: Optimal control is an essential tool for stabilizing complex nonlinear systems. However, despite the extensive impacts of methods such as receding horizon control, dynamic programming and reinforcement learning, the design of cost functions for a particular system often remains a heuristic-driven process of trial and error. In this paper we seek to gain insights into how the choice of cost functio… ▽ More Optimal control is an essential tool for stabilizing complex nonlinear systems. However, despite the extensive impacts of methods such as receding horizon control, dynamic programming and reinforcement learning, the design of cost functions for a particular system often remains a heuristic-driven process of trial and error. In this paper we seek to gain insights into how the choice of cost function interacts with the underlying structure of the control system and impacts the amount of computation required to obtain a stabilizing controller. We treat the cost design problem as a two-step process where the designer specifies outputs for the system that are to be penalized and then modulates the relative weighting of the inputs and the outputs in the cost. To characterize the computational burden associated to obtaining a stabilizing controller with a particular cost, we bound the prediction horizon required by receding horizon methods and the number of iterations required by dynamic programming methods to meet this requirement. Our theoretical results highlight a qualitative separation between what is possible, from a design perspective, when the chosen outputs induce either minimum-phase or non-minimum-phase behavior. Simulation studies indicate that this separation also holds for modern reinforcement learning methods. △ Less

Submitted 17 November, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

arXiv:2203.12303 [pdf, other]

Koopman-Based Neural Lyapunov Functions for General Attractors

Authors: Shankar A. Deka, Alonso M. Valle, Claire J. Tomlin

Abstract: Koopman spectral theory has grown in the past decade as a powerful tool for dynamical systems analysis and control. In this paper, we show how recent data-driven techniques for estimating Koopman-Invariant subspaces with neural networks can be leveraged to extract Lyapunov certificates for the underlying system. In our work, we specifically focus on systems with a limit-cycle, beyond just an isola… ▽ More Koopman spectral theory has grown in the past decade as a powerful tool for dynamical systems analysis and control. In this paper, we show how recent data-driven techniques for estimating Koopman-Invariant subspaces with neural networks can be leveraged to extract Lyapunov certificates for the underlying system. In our work, we specifically focus on systems with a limit-cycle, beyond just an isolated equilibrium point, and use Koopman eigenfunctions to efficiently parameterize candidate Lyapunov functions to construct forward-invariant sets under some (unknown) attractor dynamics. Additionally, when the dynamics are polynomial and when neural networks are replaced by polynomials as a choice of function approximators in our approach, one can further leverage Sum-of-Squares programs and/or nonlinear programs to yield provably correct Lyapunov certificates. In such a polynomial case, our Koopman-based approach for constructing Lyapunov functions uses significantly fewer decision variables compared to directly formulating and solving a Sum-of-Squares optimization problem. △ Less

Submitted 23 March, 2022; originally announced March 2022.

Comments: Submitted to CDC 2022

arXiv:2203.10142 [pdf, other]

Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning

Authors: Jingqi Li, Donggun Lee, Somayeh Sojoudi, Claire J. Tomlin

Abstract: In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that the system starting at a state therein could be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a con… ▽ More In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that the system starting at a state therein could be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a contracting Bellman backup, where the super-zero level set, i.e., the set of states where the value function is evaluated to be non-negative, recovers the reach-avoid set. Building upon this, we prove that the proposed method can be adapted to compute the viability kernel, or the set of states which could be controlled to satisfy given constraints, and the backward reachable set, or the set of states that could be driven towards a given target set. Finally, we propose to alleviate the curse of dimensionality issue in high-dimensional problems by extending Conservative Q-Learning, a deep reinforcement learning technique, to learn a value function such that the super-zero level set of the learned value function serves as a (conservative) approximation to the reach-avoid set. Our theoretical and empirical results suggest that the proposed method could learn reliably the reach-avoid set and the optimal control policy even with neural network approximation. △ Less

Submitted 18 March, 2022; originally announced March 2022.

arXiv:2201.08538 [pdf, other]

Computation of Regions of Attraction for Hybrid Limit Cycles Using Reachability: An Application to Walking Robots

Authors: Jason J. Choi, Ayush Agrawal, Koushil Sreenath, Claire J. Tomlin, Somil Bansal

Abstract: Contact-rich robotic systems, such as legged robots and manipulators, are often represented as hybrid systems. However, the stability analysis and region-of-attraction computation for these systems are often challenging because of the discontinuous state changes upon contact (also referred to as state resets). In this work, we cast the computation of region-ofattraction as a Hamilton-Jacobi (HJ) r… ▽ More Contact-rich robotic systems, such as legged robots and manipulators, are often represented as hybrid systems. However, the stability analysis and region-of-attraction computation for these systems are often challenging because of the discontinuous state changes upon contact (also referred to as state resets). In this work, we cast the computation of region-ofattraction as a Hamilton-Jacobi (HJ) reachability problem. This enables us to leverage HJ reachability tools that are compatible with general nonlinear system dynamics, and can formally deal with state and input constraints as well as bounded disturbances. Our main contribution is the generalization of HJ reachability framework to account for the discontinuous state changes originating from state resets, which has remained a challenge until now. We apply our approach for computing region-of-attractions for several underactuated walking robots and demonstrate that the proposed approach can (a) recover a bigger region-of-attraction than state-of-the-art approaches, (b) handle state resets, nonlinear dynamics, external disturbances, and input constraints, and (c) also provides a stabilizing controller for the system that can leverage the state resets for enhancing system stability. △ Less

Submitted 8 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Accepted to IEEE RA-L & ICRA, 2022

arXiv:2201.07082 [pdf, other]

Inducing Structure in Reward Learning by Learning Features

Authors: Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan

Abstract: Reward learning enables robots to learn adaptable behaviors from human input. Traditional methods model the reward as a linear function of hand-crafted features, but that requires specifying all the relevant features a priori, which is impossible for real-world tasks. To get around this issue, recent deep Inverse Reinforcement Learning (IRL) methods learn rewards directly from the raw state but th… ▽ More Reward learning enables robots to learn adaptable behaviors from human input. Traditional methods model the reward as a linear function of hand-crafted features, but that requires specifying all the relevant features a priori, which is impossible for real-world tasks. To get around this issue, recent deep Inverse Reinforcement Learning (IRL) methods learn rewards directly from the raw state but this is challenging because the robot has to implicitly learn the features that are important and how to combine them, simultaneously. Instead, we propose a divide and conquer approach: focus human input specifically on learning the features separately, and only then learn how to combine them into a reward. We introduce a novel type of human input for teaching features and an algorithm that utilizes it to learn complex features from the raw state space. The robot can then learn how to combine them into a reward using demonstrations, corrections, or other reward learning frameworks. We demonstrate our method in settings where all features have to be learned from scratch, as well as where some of the features are known. By first focusing human input specifically on the feature(s), our method decreases sample complexity and improves generalization of the learned reward over a deepIRL baseline. We show this in experiments with a physical 7DOF robot manipulator, as well as in a user study conducted in a simulated environment. △ Less

Submitted 18 January, 2022; originally announced January 2022.

Comments: 24 pages, 22 figures, accepted to the International Journal of Robotics Research. arXiv admin note: text overlap with arXiv:2006.13208

arXiv:2112.12288 [pdf, other]

doi 10.15607/RSS.2021.XVII.077

Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning

Authors: Kai-Chieh Hsu, Vicenç Rubies-Royo, Claire J. Tomlin, Jaime F. Fisac

Abstract: Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in reinforcement learning methods to approximately solve optimal control problems… ▽ More Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective used in reinforcement learning is not suitable to encode temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under analogous conditions to the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, retaining zero-violation guarantees by treating the approximate solutions as untrusted oracles in a model-predictive supervisory control framework. We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation. See https://github.com/SafeRoboticsLab/safety_rl for code and supplementary material. △ Less

Submitted 22 December, 2021; originally announced December 2021.

Comments: Accepted in Robotics: Science and Systems (RSS), 2021

arXiv:2112.09456 [pdf, other]

Compositional Learning-based Planning for Vision POMDPs

Authors: Sampada Deglurkar, Michael H. Lim, Johnathan Tucker, Zachary N. Sunberg, Aleksandra Faust, Claire J. Tomlin

Abstract: The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle high-dimensional image observations prevalent in real world applications, and often require lengthy online training that requires interaction with the environment. In thi… ▽ More The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle high-dimensional image observations prevalent in real world applications, and often require lengthy online training that requires interaction with the environment. In this work, we propose Visual Tree Search (VTS), a compositional learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. The deep generative observation models evaluate the likelihood of and predict future image observations in a Monte Carlo tree search planner. We show that VTS is robust to different types of image noises that were not present during training and can adapt to different reward structures without the need to re-train. This new approach significantly and stably outperforms several baseline state-of-the-art vision POMDP algorithms while using a fraction of the training time. △ Less

Submitted 2 December, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

arXiv:2111.13786 [pdf, other]

Learning from learning machines: a new generation of AI technology to meet the needs of science

Authors: Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin, Sean Peisert, W. Bradley Holtz, Anil Aswani, Dipankar Dwivedi, Haruko Wainwright, Ghanshyam Pilania, Benjamin Nachman, Babetta L. Marrone, Nicola Falco, Prabhat, Daniel Arnold, Alejandro Wolf-Yadlin, Sarah Powers, Sharlee Climer, Quinn Jackson, Ty Carlson, Michael Sohn, Petrus Zwart, Neeraj Kumar, Amy Justice, Claire Tomlin, Daniel Jacobson , et al. (11 additional authors not shown)

Abstract: We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery. The distinct goals of AI for industry versus the goals of AI for science create tension between identifying patterns in data versus discovering patterns in the world from data. If we address the fundamental challenges associated with "bridging the gap" between domain-driven scientific models and… ▽ More We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery. The distinct goals of AI for industry versus the goals of AI for science create tension between identifying patterns in data versus discovering patterns in the world from data. If we address the fundamental challenges associated with "bridging the gap" between domain-driven scientific models and data-driven AI learning machines, then we expect that these AI models can transform hypothesis generation, scientific discovery, and the scientific process itself. △ Less

Submitted 26 November, 2021; originally announced November 2021.

arXiv:2109.10521 [pdf, other]

Incorporating Data Uncertainty in Object Tracking Algorithms

Authors: Anish Muthali, Forrest Laine, Claire Tomlin

Abstract: Methodologies for incorporating the uncertainties characteristic of data-driven object detectors into object tracking algorithms are explored. Object tracking methods rely on measurement error models, typically in the form of measurement noise, false positive rates, and missed detection rates. Each of these quantities, in general, can be dependent on object or measurement location. However, for de… ▽ More Methodologies for incorporating the uncertainties characteristic of data-driven object detectors into object tracking algorithms are explored. Object tracking methods rely on measurement error models, typically in the form of measurement noise, false positive rates, and missed detection rates. Each of these quantities, in general, can be dependent on object or measurement location. However, for detections generated from neural-network processed camera inputs, these measurement error statistics are not sufficient to represent the primary source of errors, namely a dissimilarity between run-time sensor input and the training data upon which the detector was trained. To this end, we investigate incorporating data uncertainty into object tracking methods such as to improve the ability to track objects, and particularly those which out-of-distribution w.r.t. training data. The proposed methodologies are validated on an object tracking benchmark as well on experiments with a real autonomous aircraft. △ Less

Submitted 2 November, 2021; v1 submitted 22 September, 2021; originally announced September 2021.

Comments: For associated video, see https://youtu.be/S21EvaAynRg

arXiv:2109.10450 [pdf, other]

Towards cyber-physical systems robust to communication delays: A differential game approach

Authors: Shankar A. Deka, Donggun Lee, Claire J. Tomlin

Abstract: Collaboration between interconnected cyber-physical systems is becoming increasingly pervasive. Time-delays in communication channels between such systems are known to induce catastrophic failure modes, like high frequency oscillations in robotic manipulators in bilateral teleoperation or string instability in platoons of autonomous vehicles. This paper considers nonlinear time-delay systems repre… ▽ More Collaboration between interconnected cyber-physical systems is becoming increasingly pervasive. Time-delays in communication channels between such systems are known to induce catastrophic failure modes, like high frequency oscillations in robotic manipulators in bilateral teleoperation or string instability in platoons of autonomous vehicles. This paper considers nonlinear time-delay systems representing coupled robotic agents, and proposes controllers that are robust to time-varying communication delays. We introduce approximations that allow the delays to be considered as implicit control inputs themselves, and formulate the problem as a zero-sum differential game between the stabilizing controllers and the delays acting adversarially. The ensuing optimal control law is finally compared to known results from Lyapunov-Krasovskii based approaches via numerical experiments. △ Less

Submitted 21 September, 2021; originally announced September 2021.

Comments: 7 pages, 5 figures, Submitted to IEEE Control Systems Letters

MSC Class: 34K35; 49L12; 93D21

arXiv:2109.07578 [pdf, other]

Multi-Task Learning with Sequence-Conditioned Transporter Networks

Authors: Michael H. Lim, Andy Zeng, Brian Ichter, Maryam Bandari, Erwin Coumans, Claire Tomlin, Stefan Schaal, Aleksandra Faust

Abstract: Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new suite of be… ▽ More Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new suite of benchmark specifically aimed at compositional tasks, MultiRavens, which allows defining custom task combinations through task modules that are inspired by industrial tasks and exemplify the difficulties in vision-based learning and planning methods. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling and can efficiently learn to solve multi-task long horizon problems. Our analysis suggests that not only the new framework significantly improves pick-and-place performance on novel 10 multi-task benchmark problems, but also the multi-task learning with weighted sampling can vastly improve learning and agent performances on individual tasks. △ Less

Submitted 15 September, 2021; originally announced September 2021.

arXiv:2109.04874 [pdf, other]

Discretizing Dynamics for Maximum Likelihood Constraint Inference

Authors: Kaylene C. Stocking, David L. McPherson, Robert P. Matthew, Claire J. Tomlin

Abstract: Maximum likelihood constraint inference is a powerful technique for identifying unmodeled constraints that affect the behavior of a demonstrator acting under a known objective function. However, it was originally formulated only for discrete state-action spaces. Continuous dynamics are more useful for modeling many real-world systems of interest, including the movements of humans and robots. We pr… ▽ More Maximum likelihood constraint inference is a powerful technique for identifying unmodeled constraints that affect the behavior of a demonstrator acting under a known objective function. However, it was originally formulated only for discrete state-action spaces. Continuous dynamics are more useful for modeling many real-world systems of interest, including the movements of humans and robots. We present a method to generate a tabular state-action space that approximates continuous dynamics and can be used for constraint inference on demonstrations that obey the true system dynamics. We then demonstrate accurate constraint inference on nonlinear pendulum systems with 2- and 4-dimensional state spaces, and show that performance is robust to a range of hyperparameters. The demonstrations are not required to be fully optimal with respect to the objective, and the most likely constraints can be identified even when demonstrations cover only a small portion of the state space. For these reasons, the proposed approach may be especially useful for inferring constraints on human demonstrators, which has important applications in human-robot interaction and biomechanical medicine. △ Less

Submitted 10 September, 2021; originally announced September 2021.

Comments: 10 pages, 7 figures

arXiv:2109.00140 [pdf, other]

Lax Formulae for Efficiently Solving Two Classes of State-Constrained Optimal Control Problems

Authors: Donggun Lee, Claire J. Tomlin

Abstract: This paper presents Lax formulae for solving the following optimal control problems: minimize the maximum (or the minimum) cost over a time horizon, while satisfying a state constraint. We present a viscosity theory, and by applying the theory to the Hamilton-Jacobi (HJ) equations, these Lax formulae are derived. A numerical algorithm for the Lax formulae is presented: under certain conditions, th… ▽ More This paper presents Lax formulae for solving the following optimal control problems: minimize the maximum (or the minimum) cost over a time horizon, while satisfying a state constraint. We present a viscosity theory, and by applying the theory to the Hamilton-Jacobi (HJ) equations, these Lax formulae are derived. A numerical algorithm for the Lax formulae is presented: under certain conditions, this algorithm's computational complexity is polynomial in the dimension of the state. For each class of optimal control problem, an example demonstrates the use and performance of the Lax formulae. △ Less

Submitted 31 August, 2021; originally announced September 2021.

arXiv:2106.15006 [pdf, other]

Hamilton-Jacobi Equations for Two Classes of State-Constrained Zero-Sum Games

Authors: Donggun Lee, Claire J. Tomlin

Abstract: This paper presents Hamilton-Jacobi (HJ) formulations for two classes of two-player zero-sum games: one with a maximum cost value over time, and one with a minimum cost value over time. In the zero-sum game setting, player A minimizes the given cost while satisfying state constraints, and player B wants to prevent player A's success. For each class of problems, this paper presents two HJ equations… ▽ More This paper presents Hamilton-Jacobi (HJ) formulations for two classes of two-player zero-sum games: one with a maximum cost value over time, and one with a minimum cost value over time. In the zero-sum game setting, player A minimizes the given cost while satisfying state constraints, and player B wants to prevent player A's success. For each class of problems, this paper presents two HJ equations: one for time-varying dynamics, cost, and state constraint; the other for time-invariant dynamics, cost, and state constraint. Utilizing the HJ equations, the optimal control for each player is analyzed, and a numerical algorithm is presented to compute the solution to the HJ equations. A two-dimensional water system is introduced as an example to demonstrate the proposed HJ framework. △ Less

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2106.13440 [pdf, other]

A Computationally Efficient Hamilton-Jacobi-based Formula for State-Constrained Optimal Control Problems

Authors: Donggun Lee, Claire J. Tomlin

Abstract: This paper investigates a Hamilton-Jacobi (HJ) analysis to solve finite-horizon optimal control problems for high-dimensional systems. Although grid-based methods, such as the level-set method [1], numerically solve a general class of HJ partial differential equations, the computational complexity is exponential in the dimension of the continuous state. To manage this computational complexity, met… ▽ More This paper investigates a Hamilton-Jacobi (HJ) analysis to solve finite-horizon optimal control problems for high-dimensional systems. Although grid-based methods, such as the level-set method [1], numerically solve a general class of HJ partial differential equations, the computational complexity is exponential in the dimension of the continuous state. To manage this computational complexity, methods based on Lax-Hopf theory have been developed for the state-unconstrained optimal control problem under certain assumptions, such as affine dynamics and state-independent stage cost. Based on the Lax formula [2], this paper proposes an HJ formula for the state-constrained optimal control problem for nonlinear systems. We call this formula \textit{the generalized Lax formula} for the optimal control problem. The HJ formula provides both the optimal cost and an optimal control signal. We also provide an efficient computational method for a class of problems for which the dynamics is affine in the state, and for which the stage and terminal cost, as well as the state constraints, are convex in the state. This class of problems does not require affine dynamics and convex stage cost in the control. This paper also provides three practical examples. △ Less

Submitted 25 June, 2021; originally announced June 2021.

arXiv:2106.07108 [pdf, other]

Pointwise Feasibility of Gaussian Process-based Safety-Critical Control under Model Uncertainty

Authors: Fernando Castañeda, Jason J. Choi, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively. They are commonly utilized to build constraints that can be incorporated in a min-norm quadratic program (CBF-CLF-QP) which solves for a safety-critical control input. However, since these constraints rely on a model of the system, when t… ▽ More Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively. They are commonly utilized to build constraints that can be incorporated in a min-norm quadratic program (CBF-CLF-QP) which solves for a safety-critical control input. However, since these constraints rely on a model of the system, when this model is inaccurate the guarantees of safety and stability can be easily lost. In this paper, we present a Gaussian Process (GP)-based approach to tackle the problem of model uncertainty in safety-critical controllers that use CBFs and CLFs. The considered model uncertainty is affected by both state and control input. We derive probabilistic bounds on the effects that such model uncertainty has on the dynamics of the CBF and CLF. We then use these bounds to build safety and stability chance constraints that can be incorporated in a min-norm convex optimization-based controller, called GP-CBF-CLF-SOCP. As the main theoretical result of the paper, we present necessary and sufficient conditions for pointwise feasibility of the proposed optimization problem. We believe that these conditions could serve as a starting point towards understanding what are the minimal requirements on the distribution of data collected from the real system in order to guarantee safety. Finally, we validate the proposed framework with numerical simulations of an adaptive cruise controller for an automotive system. △ Less

Submitted 1 October, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

Comments: The first two authors contributed equally. Accepted for publication in IEEE 60th Conference on Decision and Control (CDC 2021)

arXiv:2106.03611 [pdf, other]

Inferring Objectives in Continuous Dynamic Games from Noise-Corrupted Partial State Observations

Authors: Lasse Peters, David Fridovich-Keil, Vicenç Rubies-Royo, Claire J. Tomlin, Cyrill Stachniss

Abstract: Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desi… ▽ More Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desired behavior. In this paper, we propose a method for inferring parametric objective models of multiple agents based on observed interactions. Our inverse game solver jointly optimizes player objectives and continuous-state estimates by coupling them through Nash equilibrium constraints. Hence, our method is able to directly maximize the observation likelihood rather than other non-probabilistic surrogate criteria. Our method does not require full observations of game states or player strategies to identify player objectives. Instead, it robustly recovers this information from noisy, partial state observations. As a byproduct of estimating player objectives, our method computes a Nash equilibrium trajectory corresponding to those objectives. Thus, it is suitable for downstream trajectory forecasting tasks. We demonstrate our method in several simulated traffic scenarios. Results show that it reliably estimates player objectives from a short sequence of noise-corrupted partial state observations. Furthermore, using the estimated objectives, our method makes accurate predictions of each player's trajectory. △ Less

Submitted 7 August, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: Submitted to RSS2021

arXiv:2104.02808 [pdf, other]

Robust Control Barrier-Value Functions for Safety-Critical Control

Authors: Jason J. Choi, Donggun Lee, Koushil Sreenath, Claire J. Tomlin, Sylvia L. Herbert

Abstract: This paper works towards unifying two popular approaches in the safety control community: Hamilton-Jacobi (HJ) reachability and Control Barrier Functions (CBFs). HJ Reachability has methods for direct construction of value functions that provide safety guarantees and safe controllers, however the online implementation can be overly conservative and/or rely on chattering bang-bang control. The CBF… ▽ More This paper works towards unifying two popular approaches in the safety control community: Hamilton-Jacobi (HJ) reachability and Control Barrier Functions (CBFs). HJ Reachability has methods for direct construction of value functions that provide safety guarantees and safe controllers, however the online implementation can be overly conservative and/or rely on chattering bang-bang control. The CBF community has methods for safe-guarding controllers in the form of point-wise optimization using quadratic programs (CBF-QP), where the CBF-based safety certificate is used as a constraint. However, finding a valid CBF for a general dynamical system is challenging. This paper unifies these two methods by introducing a new reachability formulation inspired by the structure of CBFs to construct a Control Barrier-Value Function (CBVF). We verify that CBVF is a viscosity solution to a novel Hamilton-Jacobi-Isaacs Variational Inequality and preserves the same safety guarantee as the original reachability formulation. Finally, inspired by the CBF-QP, we propose a QP-based online control synthesis for systems affine in control and disturbance, whose solution is always the CBVF's optimal control signal robust to bounded disturbance. We demonstrate the benefit of using the CBVFs for double-integrator and Dubins car systems by comparing it to previous methods. △ Less

Submitted 25 October, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: IEEE CDC 2021

arXiv:2103.05746 [pdf, other]

Analyzing Human Models that Adapt Online

Authors: Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin, Anca D. Dragan

Abstract: Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models such as what the model could learn online and how quickly could it learn it. For instance, when will the robot have a confident estimate in a nearby human's goal? Or, what parameter initializations guarantee that the robot c… ▽ More Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models such as what the model could learn online and how quickly could it learn it. For instance, when will the robot have a confident estimate in a nearby human's goal? Or, what parameter initializations guarantee that the robot can learn the human's preferences in a finite number of observations? To answer such analysis questions, our key idea is to model the robot's learning algorithm as a dynamical system where the state is the current model parameter estimate and the control is the human data the robot observes. This enables us to leverage tools from reachability analysis and optimal control to compute the set of hypotheses the robot could learn in finite time, as well as the worst and best-case time it takes to learn them. We demonstrate the utility of our analysis tool in four human-robot domains, including autonomous driving and indoor navigation. △ Less

Submitted 30 September, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: ICRA 2021

arXiv:2102.07039 [pdf, other]

doi 10.1109/TAC.2021.3059838

FaSTrack: a Modular Framework for Real-Time Motion Planning and Guaranteed Safe Tracking

Authors: Mo Chen, Sylvia L. Herbert, Haimin Hu, Ye Pu, Jaime F. Fisac, Somil Bansal, SooJean Han, Claire J. Tomlin

Abstract: Real-time, guaranteed safe trajectory planning is vital for navigation in unknown environments. However, real-time navigation algorithms typically sacrifice robustness for computation speed. Alternatively, provably safe trajectory planning tends to be too computationally intensive for real-time replanning. We propose FaSTrack, Fast and Safe Tracking, a framework that achieves both real-time replan… ▽ More Real-time, guaranteed safe trajectory planning is vital for navigation in unknown environments. However, real-time navigation algorithms typically sacrifice robustness for computation speed. Alternatively, provably safe trajectory planning tends to be too computationally intensive for real-time replanning. We propose FaSTrack, Fast and Safe Tracking, a framework that achieves both real-time replanning and guaranteed safety. In this framework, real-time computation is achieved by allowing any trajectory planner to use a simplified \textit{planning model} of the system. The plan is tracked by the system, represented by a more realistic, higher-dimensional \textit{tracking model}. We precompute the tracking error bound (TEB) due to mismatch between the two models and due to external disturbances. We also obtain the corresponding tracking controller used to stay within the TEB. The precomputation does not require prior knowledge of the environment. We demonstrate FaSTrack using Hamilton-Jacobi reachability for precomputation and three different real-time trajectory planners with three different tracking-planning model pairs. △ Less

Submitted 13 March, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

Comments: Published in the IEEE Transactions on Automatic Control

arXiv:2101.12086 [pdf, other]

Risk-sensitive safety analysis using Conditional Value-at-Risk

Authors: Margaret P. Chapman, Riccardo Bonalli, Kevin M. Smith, Insoon Yang, Marco Pavone, Claire J. Tomlin

Abstract: This paper develops a safety analysis method for stochastic systems that is sensitive to the possibility and severity of rare harmful outcomes. We define risk-sensitive safe sets as sub-level sets of the solution to a non-standard optimal control problem, where a random maximum cost is assessed via Conditional Value-at-Risk (CVaR). The objective function represents the maximum extent of constraint… ▽ More This paper develops a safety analysis method for stochastic systems that is sensitive to the possibility and severity of rare harmful outcomes. We define risk-sensitive safe sets as sub-level sets of the solution to a non-standard optimal control problem, where a random maximum cost is assessed via Conditional Value-at-Risk (CVaR). The objective function represents the maximum extent of constraint violation of the state trajectory, averaged over a given percentage of worst cases. This problem is well-motivated but difficult to solve tractably because the temporal decomposition for CVaR is history-dependent. Our primary theoretical contribution is to derive computationally tractable under-approximations to risk-sensitive safe sets. Our method provides a novel, theoretically guaranteed, parameter-dependent upper bound to the CVaR of a maximum cost without the need to augment the state space. For a fixed parameter value, the solution to only one Markov decision process problem is required to obtain the under-approximations for any family of risk-sensitivity levels. In addition, we propose a second definition for risk-sensitive safe sets and provide a tractable method for their estimation without using a parameter-dependent upper bound. The second definition is expressed in terms of a new coherent risk functional, which is inspired by CVaR. We demonstrate our primary theoretical contribution via numerical examples. △ Less

Submitted 25 June, 2022; v1 submitted 28 January, 2021; originally announced January 2021.

Journal ref: IEEE Transactions on Automatic Control, 2022

arXiv:2101.05916 [pdf, other]

Scalable Learning of Safety Guarantees for Autonomous Systems using Hamilton-Jacobi Reachability

Authors: Sylvia Herbert, Jason J. Choi, Suvansh Sanjeev, Marsalis Gibson, Koushil Sreenath, Claire J. Tomlin

Abstract: Autonomous systems like aircraft and assistive robots often operate in scenarios where guaranteeing safety is critical. Methods like Hamilton-Jacobi reachability can provide guaranteed safe sets and controllers for such systems. However, often these same scenarios have unknown or uncertain environments, system dynamics, or predictions of other agents. As the system is operating, it may learn new k… ▽ More Autonomous systems like aircraft and assistive robots often operate in scenarios where guaranteeing safety is critical. Methods like Hamilton-Jacobi reachability can provide guaranteed safe sets and controllers for such systems. However, often these same scenarios have unknown or uncertain environments, system dynamics, or predictions of other agents. As the system is operating, it may learn new knowledge about these uncertainties and should therefore update its safety analysis accordingly. However, work to learn and update safety analysis is limited to small systems of about two dimensions due to the computational complexity of the analysis. In this paper we synthesize several techniques to speed up computation: decomposition, warm-starting, and adaptive grids. Using this new framework we can update safe sets by one or more orders of magnitude faster than prior work, making this technique practical for many realistic systems. We demonstrate our results on simulated 2D and 10D near-hover quadcopters operating in a windy environment. △ Less

Submitted 2 April, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

Comments: The first two authors are co-first authors. ICRA 2021

arXiv:2101.02900 [pdf, other]

doi 10.1137/21M142530X

The Computation of Approximate Generalized Feedback Nash Equilibria

Authors: Forrest Laine, David Fridovich-Keil, Chih-Yuan Chiu, Claire Tomlin

Abstract: We present the concept of a Generalized Feedback Nash Equilibrium (GFNE) in dynamic games, extending the Feedback Nash Equilibrium concept to games in which players are subject to state and input constraints. We formalize necessary and sufficient conditions for (local) GFNE solutions at the trajectory level, which enable the development of efficient numerical methods for their computation. Specifi… ▽ More We present the concept of a Generalized Feedback Nash Equilibrium (GFNE) in dynamic games, extending the Feedback Nash Equilibrium concept to games in which players are subject to state and input constraints. We formalize necessary and sufficient conditions for (local) GFNE solutions at the trajectory level, which enable the development of efficient numerical methods for their computation. Specifically, we propose a Newton-style method for finding game trajectories which satisfy necessary conditions for an equilibrium, which can then be checked against sufficiency conditions. We show that the evaluation of the necessary conditions in general requires computing a series of nested, implicitly-defined derivatives, which quickly becomes intractable. To this end, we introduce an approximation to the necessary conditions which is amenable to efficient evaluation, and in turn, computation of solutions. We term the solutions to the approximate necessary conditions Generalized Feedback Quasi-Nash Equilibria (GFQNE), and we introduce numerical methods for their computation. In particular, we develop a Sequential Linear-Quadratic Game approach, in which a LQ local approximation of the game is solved at each iteration. The development of this method relies on the ability to compute a GFNE to inequality- and equality-constrained LQ games, and therefore specific methods for the solution of these special cases are developed in detail. We demonstrate the effectiveness of the proposed solution approach on a dynamic game arising in an autonomous driving application. △ Less

Submitted 21 November, 2023; v1 submitted 8 January, 2021; originally announced January 2021.

arXiv:2012.10140 [pdf, other]

Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

Authors: Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg

Abstract: This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algori… ▽ More This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments. △ Less

Submitted 1 April, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

arXiv:2011.07183 [pdf, other]

Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics

Authors: Fernando Castañeda, Jason J. Choi, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: This paper presents a method to design a min-norm Control Lyapunov Function (CLF)-based stabilizing controller for a control-affine system with uncertain dynamics using Gaussian Process (GP) regression. In order to estimate both state and input-dependent model uncertainty, we propose a novel compound kernel that captures the control-affine nature of the problem. Furthermore, by the use of GP Upper… ▽ More This paper presents a method to design a min-norm Control Lyapunov Function (CLF)-based stabilizing controller for a control-affine system with uncertain dynamics using Gaussian Process (GP) regression. In order to estimate both state and input-dependent model uncertainty, we propose a novel compound kernel that captures the control-affine nature of the problem. Furthermore, by the use of GP Upper Confidence Bound analysis, we provide probabilistic bounds of the regression error, leading to the formulation of a CLF-based stability chance constraint which can be incorporated in a min-norm optimization problem. We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP). The data-collection process and the training of the GP regression model are carried out in an episodic learning fashion. We validate the proposed algorithm and controller in numerical simulations of an inverted pendulum and a kinematic bicycle model, resulting in stable trajectories which are very similar to the ones obtained if we actually knew the true plant dynamics. △ Less

Submitted 23 March, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

Comments: The first two authors contributed equally. To appear at the 2021 American Control Conference (ACC)

arXiv:2011.06047 [pdf, other]

Multi-Hypothesis Interactions in Game-Theoretic Motion Planning

Authors: Forrest Laine, David Fridovich-Keil, Chih-Yuan Chiu, Claire Tomlin

Abstract: We present a novel method for handling uncertainty about the intentions of non-ego players in dynamic games, with application to motion planning for autonomous vehicles. Equilibria in these games explicitly account for interaction among other agents in the environment, such as drivers and pedestrians. Our method models the uncertainty about the intention of other agents by constructing multiple hy… ▽ More We present a novel method for handling uncertainty about the intentions of non-ego players in dynamic games, with application to motion planning for autonomous vehicles. Equilibria in these games explicitly account for interaction among other agents in the environment, such as drivers and pedestrians. Our method models the uncertainty about the intention of other agents by constructing multiple hypotheses about the objectives and constraints of other agents in the scene. For each candidate hypothesis, we associate a Bernoulli random variable representing the probability of that hypothesis, which may or may not be independent of the probability of other hypotheses. We leverage constraint asymmetries and feedback information patterns to incorporate the probabilities of hypotheses in a natural way. Specifically, increasing the probability associated with a given hypothesis from $0$ to $1$ shifts the responsibility of collision avoidance from the hypothesized agent to the ego agent. This method allows the generation of interactive trajectories for the ego agent, where the level of assertiveness or caution that the ego exhibits is directly related to the easy-to-model uncertainty it maintains about the scene. △ Less

Submitted 11 November, 2020; originally announced November 2020.

Comments: For associated mp4 file, see https://youtu.be/x7VtYDrWTWM

Showing 1–50 of 167 results for author: Tomlin, C