
Recurrent Policies for Handling Partially Observable Environments with ReLAx

This repository contains an implementation of the PPO-GAE algorithm with a lagged LSTM policy (and critic) and compares it with a 0-lag MLP PPO-GAE baseline. A sketch of such a recurrent actor is shown below.
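The following is a minimal, illustrative sketch of a Gaussian LSTM policy that consumes a window of current and past (lagged) observations. The class, argument, and layer names here are assumptions for illustration only, not the actual ReLAx components used in this repository:

```python
import torch
import torch.nn as nn


class LaggedLSTMPolicy(nn.Module):
    """Gaussian policy head on top of an LSTM over the last `n_lags` observations.

    Illustrative sketch only; the repository builds its actors with ReLAx.
    """

    def __init__(self, obs_dim, act_dim, hidden_size=64, n_lags=8):
        super().__init__()
        self.n_lags = n_lags
        self.lstm = nn.LSTM(obs_dim, hidden_size, batch_first=True)
        self.mean_head = nn.Linear(hidden_size, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs_seq):
        # obs_seq: (batch, n_lags, obs_dim) window of current + past observations
        out, _ = self.lstm(obs_seq)
        mean = self.mean_head(out[:, -1])  # act from the last hidden state
        return torch.distributions.Normal(mean, self.log_std.exp())
```

A 0-lag MLP policy, by contrast, maps only the current observation to an action distribution and therefore cannot compensate for missing information in a single masked observation.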

To simulate partial observability in a controlled manner, a gym.Wrapper was created that masks elements of the observation array with zeros, each with probability eps. In our experiments, the degree of partial observability was controlled by altering the eps value.
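A minimal sketch of such a wrapper is given below, written as a gym.ObservationWrapper (a subclass of gym.Wrapper). The class name, default eps, and the environment in the usage comment are assumptions for illustration, not code taken from the repository:

```python
import numpy as np
import gym


class MaskObservation(gym.ObservationWrapper):
    """Zero out each element of the observation independently with probability eps."""

    def __init__(self, env, eps=0.5):
        super().__init__(env)
        self.eps = eps

    def observation(self, obs):
        # Keep each element with probability (1 - eps), zero it out otherwise
        mask = np.random.random(size=obs.shape) >= self.eps
        return (obs * mask).astype(obs.dtype)


# Usage (hypothetical environment): wrap a standard gym env with the desired eps
# env = MaskObservation(gym.make("LunarLanderContinuous-v2"), eps=0.25)
```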

Experiment results are shown below:

[Figure: pomdp_comparison — MLP vs LSTM PPO-GAE learning curves for eps = 0, 0.25, 0.5, 0.75]

As we can see, in the fully observable case (eps=0) the MLP and LSTM policies show roughly the same performance. For a moderate degree of partial observability (eps=0.25), the LSTM policy learns slightly faster in the early stages. For a considerable degree of partial observability (eps=0.5), the LSTM policy performs significantly better than the MLP policy, although both actors struggle to converge to the asymptotic performance of the fully observable case. For a severe degree of partial observability (eps=0.75), both policies fail to learn.