SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents

Rathbun, Ethan; Amato, Christopher; Oprea, Alina

arXiv:2405.20539 (cs)

[Submitted on 30 May 2024]

Abstract: Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications -- making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work we explore a particularly stealthy form of training-time attacks against RL -- backdoor poisoning. Here the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a pre-determined trigger at inference time. We uncover theoretical limitations of prior work by proving their inability to generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy -- guaranteeing attack success in the limit. Using insights from our theoretical analysis we develop "SleeperNets" as a universal backdoor attack which exploits a newly proposed threat model and leverages dynamic reward poisoning techniques. We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods, while preserving benign episodic return.
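To make the general idea of backdoor poisoning concrete, the following is a minimal sketch of a simple, static form of reward poisoning applied to a batch of collected transitions. It is not the SleeperNets algorithm, whose dynamic reward poisoning is not specified in the abstract. The `Transition` type, the `apply_trigger` pattern (overwriting the first observation feature), and the parameters `poison_rate` and `bonus` are all hypothetical names chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Transition:
    """One (observation, action, reward) step collected during training."""
    obs: Tuple[float, ...]
    action: int
    reward: float


def apply_trigger(obs: Tuple[float, ...], trigger_value: float = 1.0) -> Tuple[float, ...]:
    # Hypothetical trigger: overwrite the first observation feature.
    return (trigger_value,) + tuple(obs[1:])


def poison_batch(
    batch: List[Transition],
    target_action: int,
    poison_rate: float,
    bonus: float = 1.0,
    seed: int = 0,
) -> Tuple[List[Transition], Set[int]]:
    """Poison a fraction of the batch (illustrative assumption, static rewards).

    For each selected step the trigger is stamped onto the observation and
    the reward is replaced: +bonus if the agent took the target action,
    -bonus otherwise. Unselected steps are returned unchanged.
    """
    rng = random.Random(seed)
    n_poison = int(len(batch) * poison_rate)
    chosen = set(rng.sample(range(len(batch)), n_poison))

    poisoned = []
    for i, step in enumerate(batch):
        if i in chosen:
            reward = bonus if step.action == target_action else -bonus
            step = Transition(apply_trigger(step.obs), step.action, reward)
        poisoned.append(step)
    return poisoned, chosen


if __name__ == "__main__":
    clean = [Transition((0.0, float(i)), i % 3, 0.5) for i in range(10)]
    dirty, idx = poison_batch(clean, target_action=2, poison_rate=0.2)
    for i in sorted(idx):
        print(i, dirty[i])
```

An agent trained on such a batch is rewarded for choosing the target action whenever the trigger is present, which is the behavior the adversary wants at inference time. Keeping `poison_rate` small is what leaves most of the training signal, and hence benign return, intact in this simplified picture.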

Comments:	23 pages, 14 figures, NeurIPS
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Cite as:	arXiv:2405.20539 [cs.LG]
	(or arXiv:2405.20539v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.20539

Submission history

From: Ethan Rathbun
[v1] Thu, 30 May 2024 23:31:25 UTC (6,262 KB)
