Policy Gradients is one of the Reinforcement Learning Algorithm. In this experiment we consider a full RL problem, which means there are several states and each of our actions are in such a way that it not only considers current reward but also the rewards in the long run. Thus to make an optimal policy we should consider the temporal dynamics of the environment.
- Jupyter Notebook
- Numpy
- OpenAI Gym (https://gym.openai.com/docs/)
- Tensorflow (https://www.tensorflow.org/install/)
Most of the conceptual and programmatic understanding is borrowed from the Reinforcement Learning Series by Arthur Juliani here.