Search | arXiv e-print repository

Showing 1–15 of 15 results for author: Modayil, J

Searching in archive cs.
  1. arXiv:2408.04242  [pdf, other]

    cs.LG cs.AI cs.NE

    The Ungrounded Alignment Problem

    Authors: Marc Pickett, Aakash Kumar Nain, Joseph Modayil, Llion Jones

    Abstract: Modern machine learning systems have demonstrated substantial abilities with methods that either embrace or ignore human-provided knowledge, but combining benefits of both styles remains a challenge. One particular challenge involves designing learning systems that exhibit built-in responses to specific abstract stimulus patterns, yet are still plastic enough to be agnostic about the modality and…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 7 pages, plus references and appendix

  2. arXiv:2311.02215  [pdf, other]

    cs.LG cs.AI

    Towards model-free RL algorithms that scale well with unstructured data

    Authors: Joseph Modayil, Zaheer Abbas

    Abstract: Conventional reinforcement learning (RL) algorithms exhibit broad generality in their theoretical formulation and high performance on several challenging domains when combined with powerful function approximation. However, developing RL algorithms that perform well across problems with unstructured observations at scale remains challenging because most function approximation methods rely on extern…

    Submitted 3 November, 2023; originally announced November 2023.

  3. arXiv:2303.07507  [pdf, other]

    cs.LG cs.AI

    Loss of Plasticity in Continual Deep Reinforcement Learning

    Authors: Zaheer Abbas, Rosie Zhao, Joseph Modayil, Adam White, Marlos C. Machado

    Abstract: The ability to learn continually is essential in a complex and changing world. In this paper, we characterize the behavior of canonical value-based deep reinforcement learning (RL) approaches under varying degrees of non-stationarity. In particular, we demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games. This phenomenon i…

    Submitted 13 March, 2023; originally announced March 2023.

  4. arXiv:2203.09498  [pdf, other]

    cs.AI cs.CL cs.LG cs.MA

    The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents

    Authors: Patrick M. Pilarski, Andrew Butcher, Elnaz Davoodi, Michael Bradley Johanson, Dylan J. A. Brenneis, Adam S. R. Parker, Leslie Acker, Matthew M. Botvinick, Joseph Modayil, Adam White

    Abstract: Learned communication between agents is a powerful tool when approaching decision-making problems that are hard to overcome by any single agent in isolation. However, continual coordination and communication learning between machine agents or human-machine partnerships remains a challenging open problem. As a stepping stone toward solving the continual communication learning problem, in this paper…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: 54 pages, 29 figures, 4 tables

  5. arXiv:2201.03709  [pdf, other]

    cs.AI cs.LG cs.MA

    Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making

    Authors: Andrew Butcher, Michael Bradley Johanson, Elnaz Davoodi, Dylan J. A. Brenneis, Leslie Acker, Adam S. R. Parker, Adam White, Joseph Modayil, Patrick M. Pilarski

    Abstract: In this paper, we contribute a multi-faceted study into Pavlovian signalling -- a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent. Signalling is intimately connected to time and timing. In service of generating and receiving signals, humans and other animals are known to represent time, determine time since past events, predict th…

    Submitted 10 January, 2022; originally announced January 2022.

    Comments: 9 pages, 7 figures

  6. arXiv:2112.07774  [pdf, other]

    cs.AI cs.HC cs.MA

    Assessing Human Interaction in Virtual Reality With Continually Learning Prediction Agents Based on Reinforcement Learning Algorithms: A Pilot Study

    Authors: Dylan J. A. Brenneis, Adam S. Parker, Michael Bradley Johanson, Andrew Butcher, Elnaz Davoodi, Leslie Acker, Matthew M. Botvinick, Joseph Modayil, Adam White, Patrick M. Pilarski

    Abstract: Artificial intelligence systems increasingly involve continual learning to enable flexibility in general situations that are not encountered during system training. Human interaction with autonomous systems is broadly studied, but research has hitherto under-explored interactions that occur while the system is actively learning, and can noticeably change its behaviour in minutes. In this pilot stu…

    Submitted 22 April, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

  7. arXiv:2106.09776  [pdf, other]

    cs.LG cs.AI

    Adapting the Function Approximation Architecture in Online Reinforcement Learning

    Authors: John D. Martin, Joseph Modayil

    Abstract: The performance of a reinforcement learning (RL) system depends on the computational architecture used to approximate a value function. Deep learning methods provide both optimization techniques and architectures for approximating nonlinear functions from noisy, high-dimensional observations. However, prevailing optimization techniques are not designed for strictly-incremental online updates. Nor…

    Submitted 17 June, 2021; originally announced June 2021.

  8. arXiv:1907.02908  [pdf, other]

    cs.LG cs.AI stat.ML

    On Inductive Biases in Deep Reinforcement Learning

    Authors: Matteo Hessel, Hado van Hasselt, Joseph Modayil, David Silver

    Abstract: Many deep reinforcement learning algorithms contain inductive biases that sculpt the agent's objective and its interface to the environment. These inductive biases can take many forms, including domain knowledge and pretuned hyper-parameters. In general, there is a trade-off between generality and performance when algorithms use such biases. Stronger biases can lead to faster learning, but weaker…

    Submitted 5 July, 2019; originally announced July 2019.

  9. arXiv:1904.11455  [pdf, other]

    cs.LG cs.AI stat.ML

    Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

    Authors: Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu

    Abstract: Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms. We study the learning dynamics of reinforcement learning (RL), specifically a characteristic coupling between learning and data generation that arises because RL agents control their future data distribution. In the presence of function approximation, this coupling can lead to a problemati…

    Submitted 25 April, 2019; originally announced April 2019.

    Comments: Full version of RLDM abstract

  10. arXiv:1812.02648  [pdf, other]

    cs.AI cs.LG

    Deep Reinforcement Learning and the Deadly Triad

    Authors: Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil

    Abstract: We know from reinforcement learning theory that temporal difference learning can fail in certain cases. Sutton and Barto (2018) identify a deadly triad of function approximation, bootstrapping, and off-policy learning. When these three properties are combined, learning can diverge with the value estimates becoming unbounded. However, several algorithms successfully combine these three properties,…

    Submitted 6 December, 2018; originally announced December 2018.

  11. arXiv:1811.07004  [pdf, ps, other]

    cs.AI cs.LG

    The Barbados 2018 List of Open Issues in Continual Learning

    Authors: Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

    Abstract: We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most important open issues we are facing, and stimulate further discussion in the community. The content is based on some of our discussions during a week-…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: NIPS Continual Learning Workshop 2018

  12. arXiv:1711.08378  [pdf]

    cs.AI

    Building Machines that Learn and Think for Themselves: Commentary on Lake et al., Behavioral and Brain Sciences, 2017

    Authors: M. Botvinick, D. G. T. Barrett, P. Battaglia, N. de Freitas, D. Kumaran, J. Z. Leibo, T. Lillicrap, J. Modayil, S. Mohamed, N. C. Rabinowitz, D. J. Rezende, A. Santoro, T. Schaul, C. Summerfield, G. Wayne, T. Weber, D. Wierstra, S. Legg, D. Hassabis

    Abstract: We agree with Lake and colleagues on their list of key ingredients for building humanlike intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand-engineering. We believe an approac…

    Submitted 22 November, 2017; originally announced November 2017.

  13. arXiv:1710.02298  [pdf, other]

    cs.AI cs.LG

    Rainbow: Combining Improvements in Deep Reinforcement Learning

    Authors: Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

    Abstract: The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 260…

    Submitted 6 October, 2017; originally announced October 2017.

    Comments: Under review as a conference paper at AAAI 2018

  14. arXiv:1206.6262  [pdf, other]

    cs.AI cs.LG

    Scaling Life-long Off-policy Learning

    Authors: Adam White, Joseph Modayil, Richard S. Sutton

    Abstract: We pursue a life-long learning approach to artificial intelligence that makes extensive use of reinforcement learning algorithms. We build on our prior work with general value functions (GVFs) and the Horde architecture. GVFs have been shown able to represent a wide variety of facts about the world's dynamics that may be useful to a long-lived agent (Sutton et al. 2011). We have also previously sh…

    Submitted 27 June, 2012; originally announced June 2012.

  15. arXiv:1112.1133  [pdf, other]

    cs.LG cs.RO

    Multi-timescale Nexting in a Reinforcement Learning Robot

    Authors: Joseph Modayil, Adam White, Richard S. Sutton

    Abstract: The term "nexting" has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to "next" constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, predicting thousands of f…

    Submitted 8 June, 2012; v1 submitted 5 December, 2011; originally announced December 2011.

    Comments: 11 pages, 5 figures. This version to appear in the Proceedings of the Conference on the Simulation of Adaptive Behavior, 2012.