Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

Zhong, Han; Hu, Jiachen; Xue, Yecheng; Li, Tongyang; Wang, Liwei

Quantum Physics

arXiv:2302.10796v2 (quant-ph)

[Submitted on 21 Feb 2023 (v1), last revised 13 Jun 2024 (this version, v2)]

Abstract: While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that address the exploration-exploitation trade-off. To this end, we propose a novel UCRL-style algorithm that takes advantage of quantum computing for tabular Markov decision processes (MDPs) with $S$ states, $A$ actions, and horizon $H$, and establish an $\mathcal{O}(\mathrm{poly}(S, A, H, \log T))$ worst-case regret bound for it, where $T$ is the number of episodes. Furthermore, we extend our results to quantum RL with linear function approximation, which can handle problems with large state spaces. Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with $d$-dimensional linear representation and prove that it enjoys $\mathcal{O}(\mathrm{poly}(d, H, \log T))$ regret. Our algorithms are variants of the UCRL/UCRL-VTR algorithms in classical RL and leverage a novel combination of lazy updating mechanisms and quantum estimation subroutines. This combination is the key to breaking the $\Omega(\sqrt{T})$-regret barrier in classical RL. To the best of our knowledge, this is the first work to study online exploration in quantum RL with provable logarithmic worst-case regret.
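The abstract credits the logarithmic regret to combining lazy updating with quantum estimation. The Python sketch below illustrates one plausible reading of that interplay; it is not the authors' algorithm. It assumes (i) a doubling-count rule, familiar from classical UCRL2, as the lazy-update trigger, and (ii) the standard heuristic that a quantum mean-estimation subroutine reaches error of order 1/n with n samples (up to log factors), versus 1/√n classically. The names `LazyUpdateTrigger` and `heuristic_error_sum` are hypothetical.

```python
import math
from collections import defaultdict


class LazyUpdateTrigger:
    """Hypothetical doubling-count trigger for lazy (infrequent) model updates.

    Assumption: a new estimation/planning phase starts only when the visit
    count of some (state, action) pair has doubled since the last phase began.
    Between phases the estimates stay frozen, so fresh samples can be gathered
    in a batch before an estimation subroutine is run on them.
    """

    def __init__(self):
        self.counts = defaultdict(int)  # total visits per (s, a)
        self.counts_at_update = {}      # snapshot of counts at the last update
        self.num_updates = 0

    def record(self, s, a):
        """Record one visit to (s, a); return True if an update fires now."""
        key = (s, a)
        self.counts[key] += 1
        threshold = max(1, 2 * self.counts_at_update.get(key, 0))
        if self.counts[key] >= threshold:
            # Start a new phase: freeze a snapshot of every pair's count.
            self.counts_at_update = dict(self.counts)
            self.num_updates += 1
            return True
        return False


def heuristic_error_sum(num_epochs, quantum):
    """Back-of-envelope cumulative estimation error under doubling epochs.

    Assumption: epoch k starts with n_k = 2**k samples, lasts n_k steps with a
    frozen estimate, and each step pays error ~ 1/n_k (quantum mean estimation)
    or ~ 1/sqrt(n_k) (classical). Constants and poly(S, A, H) factors are ignored.
    """
    total = 0.0
    for k in range(num_epochs):
        n_k = 2 ** k
        per_step = 1.0 / n_k if quantum else 1.0 / math.sqrt(n_k)
        total += n_k * per_step
    return total


# A single pair visited 1000 times triggers at visits 1, 2, 4, ..., 512.
trigger = LazyUpdateTrigger()
fired = sum(trigger.record(0, 0) for _ in range(1000))
print(fired)  # 10

print(heuristic_error_sum(10, quantum=True))   # 10.0: one unit per epoch
print(heuristic_error_sum(10, quantum=False))  # about 74.8: grows like sqrt(T)
```

With 10 doubling epochs (1023 steps in total), the 1/n rate makes every epoch contribute about one unit, so the sum tracks the number of epochs, which is logarithmic in the number of steps; the 1/√n rate instead gives a sum of order √T. Because each pair's update snapshot at least doubles whenever that pair triggers, the doubling rule fires at most SA(⌊log₂ T⌋ + 1) times, which is what keeps the number of frozen phases small.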

Comments:	ICML 2024
Subjects:	Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2302.10796 [quant-ph]
	(or arXiv:2302.10796v2 [quant-ph] for this version)
	https://doi.org/10.48550/arXiv.2302.10796

Submission history

From: Han Zhong
[v1] Tue, 21 Feb 2023 16:23:11 UTC (408 KB)
[v2] Thu, 13 Jun 2024 17:00:41 UTC (80 KB)