
a3c

Here are 131 public repositories matching this topic...

tensorlayer
0xtyls
0xtyls commented Jan 3, 2020

I understand that these two Python files show two different methods of constructing a model. The original n_epoch is 500, which works perfectly for both files. But if I change n_epoch to 20, only tutorial_mnist_mlp_static.py achieves a high test accuracy (~0.97); the other file, tutorial_mnist_mlp_static_2.py, only reaches ~0.47.

The models built from these two files look the same to me (the s

fredcallaway
fredcallaway commented Jun 29, 2017

I was surprised to see this loss function, because it is generally used when the target is a distribution (i.e., one that sums to 1). That is not the case for the advantage estimate. However, I worked out the math, and it does appear to be doing the right thing, which is neat!

I think this trick should be mentioned in the code.
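The trick can be checked numerically. Below is a small sketch (with made-up logits and an assumed advantage value, not taken from the repo) showing that plugging an unnormalised target t = advantage × one_hot(action) into the cross-entropy formula reproduces the standard policy-gradient objective −A · log π(a|s), and that the gradients with respect to the logits agree as well:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Hypothetical setup: 3 actions, action 1 was taken, advantage A = 2.5.
logits = [0.5, 1.2, -0.3]
action = 1
advantage = 2.5

pi = softmax(logits)
one_hot = [1.0 if i == action else 0.0 for i in range(3)]

# "Cross-entropy" with the unnormalised target t = advantage * one_hot(action).
ce_loss = -sum(advantage * one_hot[i] * math.log(pi[i]) for i in range(3))
# Standard policy-gradient objective: -A * log pi(a|s).
pg_loss = -advantage * math.log(pi[action])
assert abs(ce_loss - pg_loss) < 1e-12

# Gradients w.r.t. the logits also agree:
# d/dz_j of -sum_i t_i * log softmax(z)_i  =  (sum_i t_i) * pi_j - t_j,
# and with t = A * one_hot that is exactly A * (pi_j - one_hot_j).
ce_grad = [advantage * pi[j] - advantage * one_hot[j] for j in range(3)]
pg_grad = [advantage * (pi[j] - one_hot[j]) for j in range(3)]
assert all(abs(a - b) < 1e-12 for a, b in zip(ce_grad, pg_grad))
```

So the cross-entropy call is just a convenient way to get the ∇log π term weighted by the advantage; nothing requires the target to sum to 1.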

JacobHanouna
JacobHanouna commented Mar 7, 2020

BTgym has two main parts: the Gym framework and the RL algorithm framework.
The RL part is tailored to BTgym's unique Gym requirements, but as new research in the field emerges, there would be a benefit in exploring new algorithms that this project does not implement.

The following tutorial is my own attempt at testing the integration between the Gym part of BTgym and an externa

MogicianWu
MogicianWu commented Sep 6, 2017

The TensorFlow documentation says:

use_locking: If True, updating of the var, ms, and mom tensors is protected by a lock; otherwise the behavior is undefined, but may exhibit less contention.

However, in the code this flag is set to False. Could this cause a problem due to a race condition?

Also, I don't understand why the original paper states it's better to share g across different threads.
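The race in question is a lost update on a read-modify-write sequence. Below is a toy sketch (plain Python threads, not TensorFlow; the class and numbers are made up) of what `use_locking` guards: a shared accumulator, like the RMSProp `ms` slot, updated by several workers. With the lock, every update is applied; without it, two workers can read the same old value and one increment is silently lost:

```python
import threading

class SharedVar:
    """Toy stand-in for a shared optimizer slot updated by many workers."""
    def __init__(self):
        self.value = 0.0
        self.lock = threading.Lock()

    def update(self, delta, use_locking):
        if use_locking:
            with self.lock:
                self.value += delta          # read-modify-write is atomic
        else:
            tmp = self.value                  # another thread may also read
            self.value = tmp + delta          # this old value: update lost

def run_workers(use_locking, n_threads=4, n_updates=10_000):
    var = SharedVar()
    threads = [
        threading.Thread(
            target=lambda: [var.update(1.0, use_locking) for _ in range(n_updates)]
        )
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return var.value

# With locking the result is exact; without it the result is undefined
# and may fall short of n_threads * n_updates.
locked_total = run_workers(use_locking=True)
assert locked_total == 4 * 10_000
unlocked_total = run_workers(use_locking=False)  # may or may not equal 40000
```

In practice, A3C-style implementations often accept this race deliberately (Hogwild!-style lock-free updates) because occasional lost updates matter less than the contention a lock would add.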

Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, Q-Learning), Function Approximation, Policy Gradient, DQN, Imitation Learning, Meta-Learning, Papers, Courses, etc.

  • Updated Jan 22, 2019
  • Jupyter Notebook
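To make the TD-learning entry above concrete, here is a minimal sketch (not taken from the repo; the chain MDP and all constants are made up) of the tabular Q-learning update, Q(s,a) ← Q(s,a) + α · (r + γ · maxₐ′ Q(s′,a′) − Q(s,a)), on a 5-state chain where action 1 moves right toward a rewarding terminal state:

```python
import random

# Toy chain MDP: states 0..4, action 0 = left, action 1 = right,
# reward 1.0 on reaching state 4 (terminal).
N, ALPHA, GAMMA, EPS = 5, 0.5, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N)]
rng = random.Random(0)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

for _ in range(2000):                 # episodes
    s = rng.randrange(N - 1)          # random start state for coverage
    for _ in range(50):               # cap episode length
        if rng.random() < EPS:
            a = rng.randrange(2)      # explore
        else:
            a = 1 if Q[s][1] >= Q[s][0] else 0   # exploit (ties -> right)
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])    # the TD update
        if done:
            break
        s = s2

# The learned greedy policy moves right toward the reward from every state.
assert all(Q[s][1] > Q[s][0] for s in range(N - 1))
```

SARSA differs only in the target: it bootstraps from the action actually taken next rather than the greedy maximum.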

PyTorch LSTM RNN for reinforcement learning, playing Atari games from OpenAI Universe. It also uses Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm, which is far more efficient than DQN and largely supersedes it. It can play many games.

  • Updated Feb 27, 2019
  • Jupyter Notebook
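The "advantage" in A3C's name is the quantity each worker computes from a short rollout before applying its gradient. A minimal sketch (toy numbers, no real network; the function name and inputs are illustrative assumptions) of the n-step advantage estimate A_t = R_t − V(s_t), where the return is bootstrapped from the critic's value of the last state:

```python
GAMMA = 0.99

def n_step_advantages(rewards, values, bootstrap_value):
    """rewards[t] and values[t] = V(s_t) for an n-step rollout;
    bootstrap_value = V(s_n), the critic's estimate for the state after it."""
    R = bootstrap_value
    returns = []
    for r in reversed(rewards):       # accumulate discounted returns backwards
        R = r + GAMMA * R
        returns.append(R)
    returns.reverse()
    return [R_t - v for R_t, v in zip(returns, values)]

# Example: a 3-step rollout with made-up rewards and critic values.
adv = n_step_advantages([0.0, 0.0, 1.0], [0.2, 0.3, 0.5], bootstrap_value=0.4)
# Last step: R_2 = 1.0 + 0.99 * 0.4 = 1.396, so A_2 = 1.396 - 0.5 = 0.896.
```

Each worker then scales ∇log π(a_t|s_t) by A_t for the actor and regresses V(s_t) toward R_t for the critic, applying the combined gradient asynchronously to the shared parameters.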
