Search | arXiv e-print repository

Showing 1–21 of 21 results for author: Lever, G

  1. arXiv:2405.02425  [pdf, other]

    cs.RO cs.AI

    Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning

    Authors: Dhruva Tirumala, Markus Wulfmeier, Ben Moran, Sandy Huang, Jan Humplik, Guy Lever, Tuomas Haarnoja, Leonard Hasenclever, Arunkumar Byravan, Nathan Batchelor, Neil Sreendra, Kushal Patel, Marlon Gwira, Francesco Nori, Martin Riedmiller, Nicolas Heess

    Abstract: We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially-observable, multi-agent domain. We rely on large-scale, simulation-b…

    Submitted 3 May, 2024; originally announced May 2024.

  2. arXiv:2311.15951  [pdf, other]

    cs.LG cs.AI cs.RO

    Replay across Experiments: A Natural Extension of Off-Policy RL

    Authors: Dhruva Tirumala, Thomas Lampe, Jose Enrique Chen, Tuomas Haarnoja, Sandy Huang, Guy Lever, Ben Moran, Tim Hertweck, Leonard Hasenclever, Martin Riedmiller, Nicolas Heess, Markus Wulfmeier

    Abstract: Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present an effective yet simple framework to extend the use of replays across multiple experiments, minimally adapting the RL workflow for sizeable improvements in controller performance and research iteration times. At its core, Replay Across Experiments (RaE) involve…

    Submitted 28 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

  3. Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

    Authors: Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Jan Humplik, Markus Wulfmeier, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, et al. (3 additional authors not shown)

    Abstract: We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust…

    Submitted 11 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: Project website: https://sites.google.com/view/op3-soccer

  4. arXiv:2302.06607  [pdf, other]

    cs.GT

    Generative Adversarial Equilibrium Solvers

    Authors: Denizalp Goktas, David C. Parkes, Ian Gemp, Luke Marris, Georgios Piliouras, Romuald Elie, Guy Lever, Andrea Tacchetti

    Abstract: We introduce the use of generative adversarial learning to compute equilibria in general game-theoretic settings, specifically the generalized Nash equilibrium (GNE) in pseudo-games, and its specific instantiation as the competitive equilibrium (CE) in Arrow-Debreu competitive economies. Pseudo-games are a generalization of games in which players' actions affect not only the payoffs of other playe…

    Submitted 20 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: 41 pages, 13 figures

  5. arXiv:2209.10958  [pdf, ps, other]

    cs.MA cs.AI

    Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

    Authors: Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, Siqi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov, et al. (2 additional authors not shown)

    Abstract: The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in d…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: Published in AI Communications 2022

  6. arXiv:2105.12196  [pdf, other]

    cs.AI cs.MA cs.NE cs.RO

    From Motor Control to Team Play in Simulated Humanoid Football

    Authors: Siqi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess

    Abstract: Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents…

    Submitted 25 May, 2021; originally announced May 2021.

  7. arXiv:1912.05676  [pdf, other]

    cs.MA cs.CL cs.LG

    Biases for Emergent Communication in Multi-agent Reinforcement Learning

    Authors: Tom Eccles, Yoram Bachrach, Guy Lever, Angeliki Lazaridou, Thore Graepel

    Abstract: We study the problem of emergent communication, in which language arises because speakers and listeners must communicate information in order to solve tasks. In temporally extended reinforcement learning domains, it has proved hard to learn such communication without centralized training of agents, due in part to a difficult joint exploration problem. We introduce inductive biases for positive sig…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: Accepted at NeurIPS 2019

  8. arXiv:1911.13232  [pdf, other]

    cs.LG cs.CL

    CONAN: Complementary Pattern Augmentation for Rare Disease Detection

    Authors: Limeng Cui, Siddharth Biswal, Lucas M. Glass, Greg Lever, Jimeng Sun, Cao Xiao

    Abstract: Rare diseases affect hundreds of millions of people worldwide but are hard to detect since they have extremely low prevalence rates (varying from 1/1,000 to 1/200,000 patients) and are massively underdiagnosed. How do we reliably detect rare diseases with such low prevalence rates? How to further leverage patients with possibly uncertain diagnosis to improve detection? In this paper, we propose a…

    Submitted 26 November, 2019; originally announced November 2019.

  9. arXiv:1909.12823  [pdf, other]

    cs.MA cs.AI cs.LG

    A Generalized Training Approach for Multiagent Learning

    Authors: Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos

    Abstract: This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-…

    Submitted 14 February, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

  10. arXiv:1902.07151  [pdf, other]

    cs.AI

    Emergent Coordination Through Competition

    Authors: Siqi Liu, Guy Lever, Josh Merel, Saran Tunyasuvunakool, Nicolas Heess, Thore Graepel

    Abstract: We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. We demonstrate that decentralized, population-based training with co-play can lead to a progression in agents' behaviors: from random, to simple ball chasing, and finally showing evidence of cooperation. Our stud…

    Submitted 21 February, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

    Journal ref: ICLR (2019)

  11. arXiv:1807.01281  [pdf, other]

    cs.LG cs.AI stat.ML

    Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

    Authors: Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel

    Abstract: Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. I…

    Submitted 3 July, 2018; originally announced July 2018.

  12. arXiv:1706.05296  [pdf, other]

    cs.AI

    Value-Decomposition Networks For Cooperative Multi-Agent Learning

    Authors: Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel

    Abstract: We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observab…

    Submitted 16 June, 2017; originally announced June 2017.

    ACM Class: I.2.11

  13. arXiv:1607.01981  [pdf, other]

    stat.ML cs.LG

    Nesterov's Accelerated Gradient and Momentum as approximations to Regularised Update Descent

    Authors: Aleksandar Botev, Guy Lever, David Barber

    Abstract: We present a unifying framework for adapting the update direction in gradient-based iterative optimization methods. As natural special cases we re-derive classical momentum and Nesterov's accelerated gradient method, lending a new intuitive interpretation to the latter algorithm. We show that a new algorithm, which we term Regularised Gradient Descent, can converge more quickly than either Nestero…

    Submitted 11 July, 2016; v1 submitted 7 July, 2016; originally announced July 2016.

  14. arXiv:1507.08271  [pdf, other]

    cs.AI cs.LG stat.ML

    A Gauss-Newton Method for Markov Decision Processes

    Authors: Thomas Furmston, Guy Lever

    Abstract: Approximate Newton methods are a standard optimization tool which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, whilst alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov Decision Processes (MDPs). We first analys…

    Submitted 6 August, 2015; v1 submitted 29 July, 2015; originally announced July 2015.

  15. arXiv:1310.7906   

    math.OC

    Convergence Analysis of the Approximate Newton Method for Markov Decision Processes

    Authors: Thomas Furmston, Guy Lever

    Abstract: Recently two approximate Newton methods were proposed for the optimisation of Markov Decision Processes. While these methods were shown to have desirable properties, such as a guarantee that the preconditioner is negative-semidefinite when the policy is $\log$-concave with respect to the policy parameters, and were demonstrated to have strong empirical performance in challenging domains, such as t…

    Submitted 4 August, 2015; v1 submitted 29 October, 2013; originally announced October 2013.

    Comments: This work has been removed because a more recent piece (A Gauss-Newton method for Markov Decision Processes, T. Furmston & G. Lever) of work has subsumed it

  16. arXiv:1302.4696  [pdf, other]

    physics.chem-ph cond-mat.soft q-bio.BM

    Electrostatic considerations affecting the calculated HOMO-LUMO gap in protein molecules

    Authors: Greg Lever, Daniel J Cole, Nicholas D M Hine, Peter D Haynes, Mike C Payne

    Abstract: A detailed study of energy differences between the highest occupied and lowest unoccupied molecular orbitals (HOMO-LUMO gaps) in protein systems and water clusters is presented. Recent work questioning the applicability of Kohn-Sham density-functional theory to proteins and large water clusters (E. Rudberg, J. Phys.: Condens. Mat. 2012, 24, 072202) has demonstrated vanishing HOMO-LUMO gaps for the…

    Submitted 19 February, 2013; originally announced February 2013.

    Comments: 13 pages, 4 figures

  17. arXiv:1206.4655  [pdf]

    cs.LG

    Modelling transition dynamics in MDPs with RKHS embeddings

    Authors: Steffen Grunewalder, Guy Lever, Luca Baldassarre, Massi Pontil, Arthur Gretton

    Abstract: We propose a new, nonparametric approach to learning and representing transition dynamics in Markov decision processes (MDPs), which can be combined easily with dynamic programming methods for policy optimisation and value estimation. This approach makes use of a recently developed representation of conditional distributions as embeddings in a reproducing kernel Hilbert space (RKHS). Such r…

    Submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML2012

  18. arXiv:1205.4656  [pdf, other]

    cs.LG stat.ML

    Conditional mean embeddings as regressors - supplementary

    Authors: Steffen Grünewälder, Guy Lever, Luca Baldassarre, Sam Patterson, Arthur Gretton, Massimiliano Pontil

    Abstract: We demonstrate an equivalence between reproducing kernel Hilbert space (RKHS) embeddings of conditional distributions and vector-valued regressors. This connection introduces a natural regularized loss function which the RKHS embeddings minimise, providing an intuitive understanding of the embeddings and a justification for their use. Furthermore, the equivalence allows the application of vector-v…

    Submitted 24 July, 2012; v1 submitted 21 May, 2012; originally announced May 2012.

  19. A Uniformly Derived Catalogue of Exoplanets from Radial Velocities

    Authors: Morgan D. J. Hollis, Sreekumar T. Balan, Greg Lever, Ofer Lahav

    Abstract: A new catalogue of extrasolar planets is presented by re-analysing a selection of published radial velocity data sets using EXOFIT (Balan & Lahav 2009). All objects are treated on an equal footing within a Bayesian framework, to give orbital parameters for 94 exoplanetary systems. Model selection (between one- and two-planet solutions) is then performed, using both a visual flagging method and a s…

    Submitted 3 January, 2012; originally announced January 2012.

    Comments: 16 pages, 6 figures, 6 tables

    Journal ref: MNRAS 423, Iss.3, pp.2800-2814 (2012)

  20. arXiv:1112.4722   

    cs.LG

    Modeling transition dynamics in MDPs with RKHS embeddings of conditional distributions

    Authors: Steffen Grünewälder, Luca Baldassarre, Massimiliano Pontil, Arthur Gretton, Guy Lever

    Abstract: We propose a new, nonparametric approach to estimating the value function in reinforcement learning. This approach makes use of a recently developed representation of conditional distributions as functions in a reproducing kernel Hilbert space. Such representations bypass the need for estimating transition probabilities, and apply to any domain on which kernels can be defined. Our approach avoids…

    Submitted 18 October, 2012; v1 submitted 20 December, 2011; originally announced December 2011.

    Comments: The article can now be found under arXiv:1206.4655. We combined both versions and are withdrawing this version because of the resulting redundancy

  21. arXiv:1110.4416  [pdf, other]

    cs.LG

    Data-dependent kernels in nearly-linear time

    Authors: Guy Lever, Tom Diethe, John Shawe-Taylor

    Abstract: We propose a method to efficiently construct data-dependent kernels which can make use of large quantities of (unlabeled) data. Our construction makes an approximation in the standard construction of semi-supervised kernels in Sindhwani et al. 2005. In typical cases these kernels can be computed in nearly-linear time (in the amount of data), improving on the cubic time of the standard construction…

    Submitted 19 October, 2011; originally announced October 2011.