POMRL: No-Regret Learning-to-Plan with Increasing Horizons

Khetarpal, Khimya; Vernade, Claire; O'Donoghue, Brendan; Singh, Satinder; Zahavy, Tom

arXiv:2212.14530 (cs)

[Submitted on 30 Dec 2022]

Abstract: We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks become more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within a task. We generalize this finding to meta-RL and study the dependence of planning horizons on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate their significance empirically.

Comments:	24 pages, 6 figures
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2212.14530 [cs.AI]
	(or arXiv:2212.14530v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2212.14530

Submission history

From: Khimya Khetarpal
[v1] Fri, 30 Dec 2022 03:09:45 UTC (16,064 KB)
