GitHub - medipixel/rl_algorithms: Structural implementation of RL key algorithms

Welcome!

This repository contains reinforcement learning algorithms used for research at Medipixel. The source code is updated frequently. External contributors are warmly welcome! :)


BC agent on LunarLanderContinuous-v2	RainbowIQN agent on PongNoFrameskip-v4	SAC agent on Reacher-v2

Contributors

Thanks go to these wonderful people (emoji key):

Jinwoo Park (Curt)

Kyunghwan Kim

darthegg

Mincheol Kim

김민섭

Leejin Jung

This project follows the all-contributors specification.

Algorithms

Performance

We have tested each algorithm on some of the following environments.

Performance was measured at commit 4248057. Please note that these results won't be updated frequently.

Reacher-v2

We reproduced the performance of DDPG, TD3, and SAC on Reacher-v2 (MuJoCo). They reach scores of around -3.5 to -4.5. See the W&B log for more details.

PongNoFrameskip-v4

RainbowIQN learns the game incredibly fast! It reaches the perfect score (21) within 100 episodes. The idea of RainbowIQN was roughly suggested by W. Dabney et al. See the W&B log for more details.

LunarLander-v2 / LunarLanderContinuous-v2

We used these environments only for quick verification of each algorithm, so some of the experiments may not show the best performance.

LunarLander-v2: RainbowDQN, RainbowDQfD

See W&B log for more details.

LunarLanderContinuous-v2: A2C, PPO, DDPG, TD3, SAC

See W&B log for more details.

LunarLanderContinuous-v2: DDPG, PER-DDPG, DDPGfD, BC-DDPG

See W&B log for more details.

LunarLanderContinuous-v2: SAC, SACfD, BC-SAC

See W&B log for more details.

Getting started

Prerequisites

This repository is tested in an Anaconda virtual environment with Python 3.6.1+.

$ conda create -n rl_algorithms python=3.6.1
$ conda activate rl_algorithms

To run MuJoCo environments (e.g. Reacher-v2), you need to acquire a MuJoCo license.

Installation

First, clone the repository.

git clone https://github.com/medipixel/rl_algorithms.git
cd rl_algorithms

For users

Install the packages required to execute the code. This step includes python setup.py install. Just type:

make dep

For developers

If you want to modify the code, you should configure the formatting and linting settings, which then run automatically whenever you commit. Unlike make dep, this command includes python setup.py develop. Just type:

make dev

After running make dev, you can validate the code with the following commands.

make format  # for formatting
make test  # for linting

Usages

You can train or test an algorithm on env_name if configs/env_name/algorithm.py exists. (configs/env_name/algorithm.py contains the hyperparameters.)

python run_env_name.py --cfg-path <config-path>

For example, to run Soft Actor-Critic on LunarLanderContinuous-v2:

python run_lunarlander_continuous_v2.py --cfg-path ./configs/lunarlander_continuous_v2/sac.py <other-options>

For example, to run a custom agent after writing your own config, configs/lunarlander_continuous_v2/ddpg-custom.py:

python run_lunarlander_continuous_v2.py --cfg-path ./configs/lunarlander_continuous_v2/ddpg-custom.py

You will see the agent run with the hyperparameters and model settings you configured.
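
The naming convention described in this section can be sketched in a few lines of Python. This is only an illustration: the helper names config_path and run_command are hypothetical and do not exist in the repository, and the sketch assumes that every environment has a run file named run_<env_name>.py next to the configs directory.

```python
from pathlib import PurePosixPath


def config_path(env_name: str, algorithm: str) -> PurePosixPath:
    """Return configs/<env_name>/<algorithm>.py (hypothetical helper)."""
    return PurePosixPath("configs") / env_name / f"{algorithm}.py"


def run_command(env_name: str, algorithm: str) -> str:
    """Build the command line for one environment/algorithm pair.

    Assumes the run file is named run_<env_name>.py, as in the examples above.
    """
    return f"python run_{env_name}.py --cfg-path ./{config_path(env_name, algorithm)}"


if __name__ == "__main__":
    print(run_command("lunarlander_continuous_v2", "sac"))
```

Keeping the environment name in both the run file and the config directory means a new config such as ddpg-custom.py only needs to be dropped into the matching configs/<env_name>/ folder.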

Arguments for run-files

In addition, there are various arguments for running algorithms. To check the options of a run file, use the following command:

python <run-file> -h

--test
- Start test mode (no training).
--off-render
- Turn off rendering.
--log
- Turn on logging using W&B.
--seed <int>
- Set random seed.
--save-period <int>
- Set the saving period for model and optimizer parameters.
--max-episode-steps <int>
- Set the maximum number of steps per episode. If the value is less than or equal to 0, the environment's default maximum is used.
--episode-num <int>
- Set the number of episodes for training.
--render-after <int>
- Start rendering after the given number of episodes.
--load-from <save-file-path>
- Load the saved models and optimizers at the beginning.
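
To make the options above concrete, here is a minimal argparse sketch that accepts the same flags. It is not the repository's actual parser: the function names build_parser and resolve_max_episode_steps are hypothetical, and every default value below is a placeholder assumption, so run python <run-file> -h for the real defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented run-file flags."""
    parser = argparse.ArgumentParser(description="Sketch of run-file options")
    parser.add_argument("--cfg-path", type=str, required=True, help="config file path")
    parser.add_argument("--test", action="store_true", help="test mode (no training)")
    parser.add_argument("--off-render", action="store_true", help="turn off rendering")
    parser.add_argument("--log", action="store_true", help="turn on logging using W&B")
    # All numeric defaults are placeholders chosen for this sketch only.
    parser.add_argument("--seed", type=int, default=0, help="random seed")
    parser.add_argument("--save-period", type=int, default=100, help="saving period")
    parser.add_argument("--max-episode-steps", type=int, default=0, help="max steps per episode")
    parser.add_argument("--episode-num", type=int, default=1000, help="training episodes")
    parser.add_argument("--render-after", type=int, default=0, help="episodes before rendering")
    parser.add_argument("--load-from", type=str, default=None, help="saved file to load")
    return parser


def resolve_max_episode_steps(requested: int, env_default: int) -> int:
    """Fall back to the environment default when the requested value is <= 0."""
    return requested if requested > 0 else env_default


if __name__ == "__main__":
    args = build_parser().parse_args(["--cfg-path", "./configs/example.py", "--test"])
    print(args.test, resolve_max_episode_steps(args.max_episode_steps, 1000))
```

The small resolve_max_episode_steps helper isolates the rule for --max-episode-steps, so the fallback to the environment default can be checked without creating an environment.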

Show feature map with Grad-CAM

You can visualize the feature maps that the trained agent extracts using Grad-CAM (Gradient-weighted Class Activation Mapping). Grad-CAM combines feature maps using the gradient signal and produces a coarse localization map of the important regions in the image. To use it, add a Grad-CAM config and pass the --grad-cam flag when you run. For example:

python run_env_name.py --cfg-path <config-path> --test --grad-cam

It can only be used with agents that use convolutional layers, such as DQN for the Pong environment. You can see the feature maps of all the configured convolutional layers.
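
As a rough illustration of how Grad-CAM combines feature maps, the sketch below works on plain nested lists instead of real network tensors. The function name grad_cam and the toy inputs are made up for this example and are not taken from the repository. It assumes the feature maps of one convolutional layer, and the gradients of the target score with respect to them, have already been computed.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(feature_maps: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Combine K feature maps (each H x W) into one coarse localization map.

    gradients[k] holds the gradient of the target score with respect to
    feature_maps[k] and must have the same H x W shape.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weight: global average of that channel's gradient.
    weights = [sum(sum(row) for row in grad) / (height * width) for grad in gradients]
    cam = [[0.0] * width for _ in range(height)]
    # Weighted sum of the feature maps over all channels.
    for weight, fmap in zip(weights, feature_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    # ReLU keeps only regions with a positive influence on the score.
    return [[max(0.0, value) for value in row] for row in cam]


if __name__ == "__main__":
    toy_maps = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
    toy_grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(toy_maps, toy_grads))
```

In practice the resulting map is usually upsampled to the input image size and overlaid on the frame. That step is omitted here.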

W&B for logging

We use W&B to log network parameters and other metrics. To enable logging, follow the steps below after installing the requirements:

Create a wandb account

Check your API key in settings, and log in to wandb on your terminal: $ wandb login API_KEY

Initialize wandb: $ wandb init

For more details, read the W&B tutorial.

Class Diagram

The class diagram is available at #135. It won't be updated frequently.
