Search | arXiv e-print repository

Showing 1–46 of 46 results for author: Parker-Holder, J

  1. arXiv:2406.04268

    cs.LG cs.AI

    Open-Endedness is Essential for Artificial Superhuman Intelligence

    Authors: Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktäschel

    Abstract: In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internet-scale data. Nevertheless, the creation of open-ended, ever self-improving AI remains elusive. In this position paper, we argue that the ingredients are now in place to achieve open-endedness in AI systems with respect to a human observer. Furthermore, w…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2405.20835

    cs.LG cs.AI cs.CL

    Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs

    Authors: Davide Paglieri, Saurabh Dash, Tim Rocktäschel, Jack Parker-Holder

    Abstract: Post-Training Quantization (PTQ) enhances the efficiency of Large Language Models (LLMs) by enabling faster operation and compatibility with more accessible hardware through reduced memory usage, at the cost of small performance drops. We explore the role of calibration sets in PTQ, specifically their effect on hidden activations in various notable open-source LLMs. Calibration sets are crucial fo…

    Submitted 5 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  3. arXiv:2402.17139

    cs.CV cs.AI

    Video as the New Language for Real-World Decision Making

    Authors: Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

    Abstract: Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that…

    Submitted 26 February, 2024; originally announced February 2024.

  4. arXiv:2402.16822

    cs.CL cs.AI cs.LG

    Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

    Authors: Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

    Abstract: As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel app…

    Submitted 26 February, 2024; originally announced February 2024.

  5. arXiv:2402.15391

    cs.LG cs.AI cs.CV

    Genie: Generative Interactive Environments

    Authors: Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel

    Abstract: We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotem…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: https://sites.google.com/corp/view/genie-2024/

  6. arXiv:2401.13460

    cs.LG cs.AI cs.MA

    Multi-Agent Diagnostics for Robustness via Illuminated Diversity

    Authors: Mikayel Samvelyan, Davide Paglieri, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel

    Abstract: In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Notwithstanding their outstanding performance in familiar environments, these systems often falter in new situations due to overfitting during the training phase. This is especially pronounced in settings where both cooperative and competitive behaviours are present, encaps…

    Submitted 28 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  7. arXiv:2312.09187

    cs.LG

    Vision-Language Models as a Source of Rewards

    Authors: Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, Luyu Wang, et al. (1 additional author not shown)

    Abstract: Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of…

    Submitted 21 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures

  8. arXiv:2310.02782

    cs.LG cs.AI

    Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design

    Authors: Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: The past decade has seen vast progress in deep reinforcement learning (RL) on the back of algorithms manually designed by human researchers. Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks. Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), th…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Published at NeurIPS 2023

  9. arXiv:2308.10797

    cs.LG cs.AI

    Stabilizing Unsupervised Environment Design with a Learned Adversary

    Authors: Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel

    Abstract: A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses…

    Submitted 22 August, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: CoLLAs 2023 - Oral; Second and third authors contributed equally

  10. arXiv:2303.06614

    cs.LG cs.AI stat.ML

    Synthetic Experience Replay

    Authors: Cong Lu, Philip J. Ball, Yee Whye Teh, Jack Parker-Holder

    Abstract: A key theme in the past decade has been that when large neural networks and large datasets combine they can produce remarkable results. In deep reinforcement learning (RL), this paradigm is commonly made possible through experience replay, whereby a dataset of past experiences is used to train a policy or value function. However, unlike in supervised or self-supervised learning, an RL agent has to…

    Submitted 26 October, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: Published at NeurIPS, 2023

  11. arXiv:2303.03376

    cs.LG cs.MA

    MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

    Authors: Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel

    Abstract: Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents. Existing methods adapt curricula independently over either environment parameters (in single-agent settings) or co-player policies (in multi-agent settings). However, the strengths and weaknesses of co-players can…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: International Conference on Learning Representations (ICLR) 2023

  12. arXiv:2301.07608

    cs.LG cs.AI cs.NE

    Human-Timescale Adaptation in an Open-Ended Task Space

    Authors: Adaptive Agent Team, Jakob Bauer, Kate Baumli, Satinder Baveja, Feryal Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang, Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw, Jack Parker-Holder, Shreya Pathak, Nicolas Perez-Nieves, Nemanja Rakicevic, Tim Rocktäschel, Yannick Schroecker, Jakub Sygnowski, Karl Tuyls, et al. (3 additional authors not shown)

    Abstract: Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a…

    Submitted 18 January, 2023; originally announced January 2023.

  13. arXiv:2211.15944

    cs.LG cs.AI

    The Effectiveness of World Models for Continual Reinforcement Learning

    Authors: Samuel Kessler, Mateusz Ostaszewski, Michał Bortkiewicz, Mateusz Żarski, Maciej Wołczyk, Jack Parker-Holder, Stephen J. Roberts, Piotr Miłoś

    Abstract: World models power some of the most efficient reinforcement learning algorithms. In this work, we showcase that they can be harnessed for continual learning - a situation when the agent faces changing environments. World models typically employ a replay buffer for training, which can be naturally extended to continual learning. We systematically study how different selective experience replay meth…

    Submitted 12 July, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted at CoLLAs 2023, 21 pages, 15 figures

  14. arXiv:2210.12719

    cs.LG cs.AI

    Learning General World Models in a Handful of Reward-Free Deployments

    Authors: Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette

    Abstract: Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we i…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: To be published at NeurIPS 2022. Code and videos available at https://ycxuyingchen.github.io/cascade/

  15. arXiv:2207.11584

    cs.LG cs.AI

    Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning

    Authors: Michael Matthews, Mikayel Samvelyan, Jack Parker-Holder, Edward Grefenstette, Tim Rocktäschel

    Abstract: Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely specifically trained to perform them. Instead, they are usually trained end-to-end, with the hope being that useful skills will be implicitly learned in order to maximise discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated in…

    Submitted 15 August, 2022; v1 submitted 23 July, 2022; originally announced July 2022.

    Comments: 19 pages, 12 figures, to be published in the Conference on Lifelong Learning Agents 2022

  16. arXiv:2207.09405

    cs.LG cs.AI

    Bayesian Generational Population-Based Training

    Authors: Xingchen Wan, Cong Lu, Jack Parker-Holder, Philip J. Ball, Vu Nguyen, Binxin Ru, Michael A. Osborne

    Abstract: Reinforcement learning (RL) offers the potential for training generally capable agents that can interact autonomously in the real world. However, one key limitation is the brittleness of RL algorithms to core hyperparameters and network architecture choice. Furthermore, non-stationarities such as evolving training data and increased agent complexity mean that different hyperparameters and architec…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: AutoML Conference 2022. 10 pages, 4 figures, 3 tables (28 pages, 10 figures, 7 tables including references and appendices)

  17. arXiv:2207.05219

    cs.LG cs.AI stat.ML

    Grounding Aleatoric Uncertainty for Unsupervised Environment Design

    Authors: Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

    Abstract: Adaptive curricula in reinforcement learning (RL) have proven effective for producing policies robust to discrepancies between the train and test environment. Recently, the Unsupervised Environment Design (UED) framework generalized RL curricula to generating sequences of entire environments, leading to new methods with robust minimax regret properties. Problematically, in partially-observable or…

    Submitted 24 October, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2022

  18. arXiv:2206.04779

    cs.LG cs.AI cs.CV stat.ML

    Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

    Authors: Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh

    Abstract: Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we esta…

    Submitted 6 July, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Published at TMLR, 2023

  19. arXiv:2203.11889

    cs.LG cs.AI cs.NE cs.SC stat.ML

    Insights From the NeurIPS 2021 NetHack Challenge

    Authors: Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, Daejin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, et al. (4 additional authors not shown)

    Abstract: In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challeng…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: Under review at PMLR for the NeurIPS 2021 Competition Workshop Track, 10 pages + 10 in appendices

  20. arXiv:2203.08015

    cs.LG cs.AI cs.GT stat.ML

    On-the-fly Strategy Adaptation for ad-hoc Agent Coordination

    Authors: Jaleh Zand, Jack Parker-Holder, Stephen J. Roberts

    Abstract: Training agents in cooperative settings offers the promise of AI agents able to interact effectively with humans (and other agents) in the real world. Multi-agent reinforcement learning (MARL) has the potential to achieve this goal, demonstrating success in a series of challenging problems. However, whilst these advances are significant, the vast majority of focus has been on the self-play paradig…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Extended abstract published in AAMAS 2022

  21. arXiv:2203.01302

    cs.LG

    Evolving Curricula with Regret-Based Environment Design

    Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

    Abstract: It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the stude…

    Submitted 30 September, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: First two authors contributed equally

  22. Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

    Authors: Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer

    Abstract: The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems,…

    Submitted 2 June, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

    Comments: Published in JAIR. Co-first authors and co-last authors are listed in alphabetical order

    MSC Class: 68T01 ACM Class: I.2.6

    Journal ref: Journal of Artificial Intelligence Research 74 (2022) 517-568

  23. arXiv:2112.14570

    cs.GT cs.LG cs.MA

    Lyapunov Exponents for Diversity in Differentiable Games

    Authors: Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, Jakob Foerster

    Abstract: Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian ("ridges"). RR is designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles - easy-to-find bifurcation points. We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method…

    Submitted 24 December, 2021; originally announced December 2021.

    Comments: AAMAS2022, 24 pages

  24. arXiv:2111.02994

    cs.LG

    Towards an Understanding of Default Policies in Multitask Policy Optimization

    Authors: Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano

    Abstract: Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical founda…

    Submitted 23 March, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

  25. arXiv:2110.04135

    cs.LG cs.AI

    Revisiting Design Choices in Offline Model-Based Reinforcement Learning

    Authors: Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne, Stephen J. Roberts

    Abstract: Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies, circumventing the need for potentially expensive or unsafe online data collection. Significant progress has been made recently in offline model-based reinforcement learning, approaches which leverage a learned dynamics model. This typically involves construct…

    Submitted 16 March, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Spotlight @ ICLR 2022; Spotlight @ RL4RealLife Workshop ICML2021

  26. arXiv:2110.02439

    cs.LG cs.AI

    Replay-Guided Adversarial Environment Design

    Authors: Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

    Abstract: Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emer…

    Submitted 13 January, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  27. arXiv:2109.13202

    cs.LG stat.ML

    MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

    Authors: Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel

    Abstract: Progress in deep reinforcement learning (RL) is heavily driven by the availability of challenging benchmarks used for training agents. However, benchmarks that are widely adopted by the community are not explicitly designed for evaluating specific capabilities of RL methods. While there exist environments for assessing particular open problems in RL (such as exploration, transfer learning, unsuper…

    Submitted 16 November, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: NeurIPS 2021: Datasets and Benchmarks Track

  28. arXiv:2107.07999

    cs.LG cs.AI

    From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

    Authors: Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten

    Abstract: In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformers architectures in a scalable way. We show that recent results on linear causal attention (Choromanski et al., 2021) and log-linear RPE-attention (Luo et al., 2021) are special cases of this general mechanism. However by casting the problem as a topo…

    Submitted 27 March, 2023; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: 20 pages, 12 figures

  29. arXiv:2106.15883

    cs.LG

    Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL

    Authors: Jack Parker-Holder, Vu Nguyen, Shaan Desai, Stephen Roberts

    Abstract: Despite a series of recent successes in reinforcement learning (RL), many RL algorithms remain sensitive to hyperparameters. As such, there has recently been interest in the field of AutoRL, which seeks to automate design decisions to create more general algorithms. Recent work suggests that population based approaches may be effective AutoRL algorithms, by learning hyperparameter schedules on the…

    Submitted 30 June, 2021; originally announced June 2021.

  30. arXiv:2106.02940

    cs.LG cs.AI

    Same State, Different Task: Continual Reinforcement Learning without Interference

    Authors: Samuel Kessler, Jack Parker-Holder, Philip Ball, Stefan Zohren, Stephen J. Roberts

    Abstract: Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when performance on a previously mastered task is reduced when learning a new task. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally inc…

    Submitted 15 March, 2022; v1 submitted 5 June, 2021; originally announced June 2021.

    Comments: Accepted as an oral at AAAI 2022. 17 pages and 12 figures

  31. arXiv:2104.05632

    cs.LG cs.AI

    Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment

    Authors: Philip J. Ball, Cong Lu, Jack Parker-Holder, Stephen Roberts

    Abstract: Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration. Significant progress has been made in the past few years in dealing with the challenge of correcting for differing behavior between the data collection and learned policies. However, little attention has been paid to potentially changing dyn…

    Submitted 3 August, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Accepted @ ICML 2021; Spotlight @ ICLR 2021 "Self-Supervision for Reinforcement Learning Workshop"

  32. arXiv:2102.04353

    cs.LG cs.AI cs.CV cs.RO

    Unlocking Pixels for Reinforcement Learning via Implicit Attention

    Authors: Krzysztof Marcin Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, Tingnan Zhang, Valerii Likhosherstov, Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller

    Abstract: There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to solve both of these problems is an attention bottleneck, which provides a simple and effective framework for learning h…

    Submitted 1 October, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

  33. arXiv:2102.03765

    cs.LG

    Tactical Optimism and Pessimism for Deep Reinforcement Learning

    Authors: Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael I. Jordan

    Abstract: In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. One of the primary drivers of this improved performance is the use of pessimistic value updates to address function approximation errors, which previously led to disappointing performance. However, a direct consequence of pessimism is reduced exploration, runni…

    Submitted 6 April, 2022; v1 submitted 7 February, 2021; originally announced February 2021.

  34. arXiv:2101.07415

    cs.LG cs.NE cs.RO

    ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces

    Authors: Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang

    Abstract: In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters. We demonstrate that previous evolutionary algorithms which rely on mutation-based approaches, while flexible over combinatorial spaces, suffer from a curse of dimensionality in high dimensional continuous spaces both theoretically and e…

    Submitted 15 March, 2023; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: Previously published at ICLR 2020 NAS Workshop. See https://github.com/google-research/google-research/tree/master/es_enas for associated code

  35. arXiv:2011.06505

    cs.LG

    Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

    Authors: Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alex Peysakhovich, Aldo Pacchiano, Jakob Foerster

    Abstract: Over the last decade, a single algorithm has changed many facets of our lives - Stochastic Gradient Descent (SGD). In the era of ever decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs). While SGD is guaranteed to converge to a local optimum (under loose assumption…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: Camera-ready version, NeurIPS 2020

  36. arXiv:2006.11911

    cs.LG stat.ML

    Towards Tractable Optimism in Model-Based Reinforcement Learning

    Authors: Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

    Abstract: The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate the true value function (optimism) but not by so much that it is inaccurate (estimation error). In the tabular setting, many state-of-the-art methods produce the…

    Submitted 3 December, 2021; v1 submitted 21 June, 2020; originally announced June 2020.

    Comments: Presented as a conference paper at UAI 2021

  37. arXiv:2003.13563

    cs.LG stat.ML

    Stochastic Flows and Geometric Optimization on the Orthogonal Group

    Authors: Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

    Abstract: We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinf…

    Submitted 30 March, 2020; originally announced March 2020.

  38. arXiv:2002.02693

    cs.LG stat.ML

    Ready Policy One: World Building Through Active Learning

    Authors: Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

    Abstract: Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample efficient learning, often achieving state of the art results for continuous control tasks. However, many existing MBRL methods rely on combining greedy policies with exploration heuristics, and even those which utilize principled exploration bonuses construct dual objectives in an ad hoc fashion. In this paper we int…

    Submitted 7 February, 2020; originally announced February 2020.

  39. arXiv:2002.02518  [pdf, other]

    cs.LG stat.ML

    Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits

    Authors: Jack Parker-Holder, Vu Nguyen, Stephen Roberts

    Abstract: Many of the recent triumphs in machine learning are dependent on well-tuned hyperparameters. This is particularly prominent in reinforcement learning (RL) where a small change in the configuration can lead to failure. Despite the importance of tuning hyperparameters, it remains expensive and is often done in a naive and laborious way. A recent solution to this problem is Population Based Training…

    Submitted 4 June, 2021; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: Camera-ready version, NeurIPS 2020

  40. arXiv:2002.00632  [pdf, other]

    cs.LG stat.ML

    Effective Diversity in Population Based Reinforcement Learning

    Authors: Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

    Abstract: Exploration is a key problem in reinforcement learning, since agents can only learn from data they acquire in the environment. With that in mind, maintaining a population of agents is an attractive method, as it allows data to be collected with a diverse set of behaviors. This behavioral diversity is often boosted via multi-objective loss functions. However, those approaches typically leverage mean f…

    Submitted 7 October, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: Camera-ready version, NeurIPS 2020

  41. arXiv:1907.06511  [pdf, other]

    cs.NE cs.AI cs.LG cs.RO

    Reinforcement Learning with Chromatic Networks for Compact Architecture Search

    Authors: Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

    Abstract: We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way. By defining the combinatorial search space of NAS to be the set of different edge-partitionings (colorings) into same-weight classes, we represent compact architectures via efficient learned edge-partitionings. For several RL…

    Submitted 6 April, 2021; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: Published at ICLR 2020 Neural Architecture Search Workshop. This paper is deprecated; please see arXiv:2101.07415 for the newer version

  42. arXiv:1906.04349  [pdf, other]

    cs.LG stat.ML

    Learning to Score Behaviors for Guided Policy Optimization

    Authors: Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

    Abstract: We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space. We show that by utilizing the dual formulation of the WD, we can learn score functions over policy behaviors that can in turn be used to lead policy optimization towards (or away from) (un)desired behaviors. Combined with smoothed WDs, the dual fo…

    Submitted 4 March, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

  43. arXiv:1905.12667  [pdf, other]

    cs.LG stat.ML

    Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes

    Authors: Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

    Abstract: We propose a new class of structured methods for Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions where samples are correlated to reduce the variance of the estimator via determinantal point processes. We successfully apply DPPMCs to problems involving nonisotropic distributions arising in guided evolution strategy (GES) methods for RL, CMA-ES techn…

    Submitted 29 May, 2019; originally announced May 2019.

  44. arXiv:1903.04268  [pdf, other]

    math.OC cs.LG stat.ML

    From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization

    Authors: Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

    Abstract: We present a new algorithm ASEBO for optimizing high-dimensional blackbox functions. ASEBO adapts to the geometry of the function and learns optimal sets of sensing directions, which are used to probe it, on-the-fly. It addresses the exploration-exploitation trade-off of blackbox optimization with expensive blackbox queries by continuously learning the bias of the lower-dimensional model used to a…

    Submitted 4 June, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

  45. arXiv:1903.02993  [pdf, other]

    cs.LG stat.ML

    Provably Robust Blackbox Optimization for Reinforcement Learning

    Authors: Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani

    Abstract: Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state of the art methods for policy optimization problems in Robotics. However, it is well known that DFO methods suffer from prohibitively high sampling complexity. They can also be very sensitive to noisy rew…

    Submitted 8 July, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

  46. arXiv:1801.02764  [pdf, other]

    cs.LG

    Compressing Deep Neural Networks: A New Hashing Pipeline Using Kac's Random Walk Matrices

    Authors: Jack Parker-Holder, Sam Gass

    Abstract: The popularity of deep learning is increasing by the day. However, despite the recent advancements in hardware, deep neural networks remain computationally intensive. Recent work has shown that by preserving the angular distance between vectors, random feature maps are able to reduce dimensionality without introducing bias to the estimator. We test a variety of established hashing pipelines as wel…

    Submitted 25 September, 2018; v1 submitted 8 January, 2018; originally announced January 2018.