Search | arXiv e-print repository

Showing 1–5 of 5 results for author: Bortkiewicz, M

  1. arXiv:2403.00514  [pdf, other]

    cs.LG

    Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning

    Authors: Michal Nauman, Michał Bortkiewicz, Piotr Miłoś, Tomasz Trzciński, Mateusz Ostaszewski, Marek Cygan

    Abstract: Recent advancements in off-policy Reinforcement Learning (RL) have significantly improved sample efficiency, primarily due to the incorporation of various forms of regularization that enable more gradient update steps than traditional agents. However, many of these techniques have been tested in limited settings, often on tasks from single simulation benchmarks and against well-known algorithms ra…

    Submitted 19 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: ICML 2024

  2. arXiv:2402.02868  [pdf, other]

    cs.LG

    Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

    Authors: Maciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski, Michał Bortkiewicz, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś

    Abstract: Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained capabilities, as recently showcased by the successful applications of foundation models. However, fine-tuning reinforcement learning (RL) models remains a challenge. This work conceptualizes one specific cause of poor transfer, accentuated in the RL setting by the interplay between actions and observations: for…

    Submitted 17 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ICML 2024 Spotlight

  3. arXiv:2211.15944  [pdf, other]

    cs.LG cs.AI

    The Effectiveness of World Models for Continual Reinforcement Learning

    Authors: Samuel Kessler, Mateusz Ostaszewski, Michał Bortkiewicz, Mateusz Żarski, Maciej Wołczyk, Jack Parker-Holder, Stephen J. Roberts, Piotr Miłoś

    Abstract: World models power some of the most efficient reinforcement learning algorithms. In this work, we showcase that they can be harnessed for continual learning - a situation when the agent faces changing environments. World models typically employ a replay buffer for training, which can be naturally extended to continual learning. We systematically study how different selective experience replay meth…

    Submitted 12 July, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted at CoLLAs 2023, 21 pages, 15 figures

  4. arXiv:2211.06351  [pdf, other]

    cs.LG cs.AI cs.MA

    Emergency action termination for immediate reaction in hierarchical reinforcement learning

    Authors: Michał Bortkiewicz, Jakub Łyskawa, Paweł Wawrzyński, Mateusz Ostaszewski, Artur Grudkowski, Tomasz Trzciński

    Abstract: Hierarchical decomposition of control is unavoidable in large dynamical systems. In reinforcement learning (RL), it is usually solved with subgoals defined at higher policy levels and achieved at lower policy levels. Reaching these goals can take a substantial amount of time, during which it is not verified whether they are still worth pursuing. However, due to the randomness of the environment, t…

    Submitted 11 November, 2022; originally announced November 2022.

  5. arXiv:2207.01562  [pdf, other]

    cs.CV cs.LG

    Progressive Latent Replay for efficient Generative Rehearsal

    Authors: Stanisław Pawlak, Filip Szatkowski, Michał Bortkiewicz, Jan Dubiński, Tomasz Trzciński

    Abstract: We introduce a new method for internal replay that modulates the frequency of rehearsal based on the depth of the network. While replay strategies mitigate the effects of catastrophic forgetting in neural networks, recent works on generative replay show that performing the rehearsal only on the deeper layers of the network improves the performance in continual learning. However, the generative app…

    Submitted 5 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.