Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

Yan, Kai; Schwing, Alexander G.; Wang, Yu-Xiong

Computer Science > Machine Learning

arXiv:2311.01331 (cs)

[Submitted on 2 Nov 2023 (v1), last revised 9 Jun 2024 (this version, v3)]

Abstract: In real-world scenarios, arbitrary interactions with the environment can often be costly, and actions of expert demonstrations are not always available. To reduce the need for both, offline Learning from Observations (LfO) is extensively studied: the agent learns to solve a task given only expert states and task-agnostic non-expert state-action pairs. The state-of-the-art DIstribution Correction Estimation (DICE) methods, as exemplified by SMODICE, minimize the state occupancy divergence between the learner's and empirical expert policies. However, such methods are limited to either $f$-divergences (KL and $\chi^2$) or Wasserstein distance with Rubinstein duality, the latter of which constrains the underlying distance metric crucial to the performance of Wasserstein-based solutions. To enable more flexible distance metrics, we propose Primal Wasserstein DICE (PW-DICE). It minimizes the primal Wasserstein distance between the learner and expert state occupancies and leverages a contrastively learned distance metric. Theoretically, our framework is a generalization of SMODICE, and is the first work that unifies $f$-divergence and Wasserstein minimization. Empirically, we find that PW-DICE improves upon several state-of-the-art methods. The code is available at this https URL.
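
To make the "primal Wasserstein distance" in the abstract concrete, here is a minimal sketch of the primal optimal-transport cost between two small sets of states. It is not the PW-DICE algorithm and is not taken from the authors' code. It assumes two uniform empirical distributions of equal size, in which case an optimal transport plan can be taken to be a one-to-one matching, and it finds that matching by brute force. The names `primal_wasserstein`, `euclidean`, `squared_euclidean` and the toy states are hypothetical. The point of the sketch is that the primal form accepts any pairwise cost function, which is where a learned distance could be plugged in.

```python
import math
from itertools import permutations
from typing import Callable, Sequence

State = Sequence[float]


def primal_wasserstein(
    xs: Sequence[State],
    ys: Sequence[State],
    cost: Callable[[State, State], float],
) -> float:
    """Primal transport cost between two uniform empirical distributions.

    Assumption for this sketch: both sets have the same size n, so an
    optimal plan is a permutation (each point carries mass 1/n).
    Brute force over permutations, so only suitable for tiny n.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty state sets of equal size")
    # Pairwise cost matrix c[i][j] = cost(xs[i], ys[j]); any cost is allowed.
    c = [[cost(x, y) for y in ys] for x in xs]
    best_total = min(
        sum(c[i][perm[i]] for i in range(n)) for perm in permutations(range(n))
    )
    return best_total / n


def euclidean(a: State, b: State) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def squared_euclidean(a: State, b: State) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


# Toy data (hypothetical): learner states on y = 0, expert states on y = 2.
learner_states = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
expert_states = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)]

w_euclid = primal_wasserstein(learner_states, expert_states, euclidean)
w_squared = primal_wasserstein(learner_states, expert_states, squared_euclidean)
print(w_euclid, w_squared)
```

Swapping `euclidean` for `squared_euclidean` changes only the `cost` argument. A formulation based on Kantorovich-Rubinstein duality, by contrast, optimizes over functions that are 1-Lipschitz with respect to a fixed metric, which is the restriction the abstract refers to.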

Comments:	25 pages. Accepted to ICML 2024
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2311.01331 [cs.LG]
	(or arXiv:2311.01331v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.01331

Submission history

From: Kai Yan
[v1] Thu, 2 Nov 2023 15:41:57 UTC (6,658 KB)
[v2] Tue, 21 Nov 2023 18:50:49 UTC (6,657 KB)
[v3] Sun, 9 Jun 2024 18:43:27 UTC (20,868 KB)
