
On the Limited Representational Power of Value Functions and its Links to Statistical (In)Efficiency

Abstract: Identifying the trade-offs between model-based and model-free methods is a central question in reinforcement learning. Value-based methods offer substantial computational advantages and are sometimes just as statistically efficient as model-based methods. However, focusing on the core problem of policy evaluation, we show that information about the transition dynamics may be impossible to represent in the space of value functions. We explore this through a series of case studies focused on structures that arise in many important problems. In several, there is no information loss and value-based methods are as statistically efficient as model-based ones. In other closely related examples, information loss is severe and value-based methods are severely outperformed. A deeper investigation points to the limitations of the representational power as the driver of the inefficiency, as opposed to failure in algorithm design.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2301.13289

On the Statistical Benefits of Temporal Difference Learning

Authors: David Cheikhi, Daniel Russo

Abstract: Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data. Temporal difference learning (TD) methods instead fit value functions by minimizing the degree of temporal inconsistency between estimates made at successive time-steps. Focusing on finite state Markov chains, we provide a crisp asymptotic theory of the statistical advantages of this approach. First, we show that an intuitive inverse trajectory pooling coefficient completely characterizes the percent reduction in mean-squared error of value estimates. Depending on problem structure, the reduction could be enormous or nonexistent. Next, we prove that there can be dramatic improvements in estimates of the difference in value-to-go for two states: TD's errors are bounded in terms of a novel measure - the problem's trajectory crossing time - which can be much smaller than the problem's time horizon.

Submitted 14 February, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: 26 pages, 7 figures, submitted to ICML 2023

arXiv:2105.05761

From Average Embeddings To Nearest Neighbor Search

Authors: Alexandr Andoni, David Cheikhi

Abstract: In this note, we show that one can use average embeddings, introduced recently in [Naor'20, arXiv:1905.01280], to obtain efficient algorithms for approximate nearest neighbor search. In particular, a metric $X$ embeds into $\ell_2$ on average, with distortion $D$, if, for any distribution $\mu$ on $X$, the embedding is $D$-Lipschitz and the (square of) distance does not decrease on average (wrt $\mu$). In particular, the existence of such an embedding (assuming it is efficient) implies an $O(D^3)$ approximate nearest neighbor search under $X$. This can be seen as a strengthening of the classic (bi-Lipschitz) embedding approach to nearest neighbor search, and is another application of the data-dependent hashing paradigm.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2003.13563

Stochastic Flows and Geometric Optimization on the Orthogonal Group

Authors: Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

Abstract: We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, normalizing flows and metric learning. We show an intriguing connection between efficient stochastic optimization on the orthogonal group and graph theory (e.g. matching problem, partition functions over graphs, graph-coloring). We leverage the theory of Lie groups and provide theoretical results for the designed class of algorithms. We demonstrate broad applicability of our methods by showing strong performance on the seemingly unrelated tasks of learning world models to obtain stable policies for the most difficult $\mathrm{Humanoid}$ agent from $\mathrm{OpenAI}$ $\mathrm{Gym}$ and improving convolutional neural networks.

Submitted 30 March, 2020; originally announced March 2020.

Showing 1–4 of 4 results for author: Cheikhi, D