
Implementation of various RL algorithms in Python using Gym and PyTorch.


sobhanshukueian/Reinforcement-Learning-Playground


Reinforcement Learning Playground 🚀

Hey there! I'm super stoked to kick off a new project in this repository. My plan is to implement a bunch of awesome Reinforcement Learning (RL) algorithms using Python, OpenAI Gym environments, and PyTorch.



Introduction

Reinforcement Learning (RL) is a fascinating field of artificial intelligence where agents learn to make decisions by interacting with their environment. This playground provides an organized collection of popular RL algorithms to help you understand, implement, and compare their performance on CartPole, a classic OpenAI Gym environment.

OpenAI Gym

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It provides a variety of environments ranging from classic control tasks to Atari 2600 games. Here are some of the environments we'll be working with:

CartPole-v1, Pendulum-v0, MountainCar-v0
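
For readers new to the Gym interface, the agent-environment loop can be sketched without installing anything. The `ToyEnv` class below is a hypothetical stand-in (it is not part of Gym) that mimics the classic `reset()` / `step(action)` shape, where `step` returns `(observation, reward, done, info)`. Newer Gym and Gymnasium releases return a five-element tuple from `step`, so treat this as an illustration of the loop rather than of the exact API.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym environment (not a real Gym class)."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.t = 0
        return 0.0

    def step(self, action):
        # The action is ignored here; a real environment would use it.
        self.t += 1
        observation = float(self.t)
        reward = 1.0                     # CartPole-style: +1 per step survived
        done = self.t >= self.max_steps  # episode ends after max_steps
        return observation, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return the total (undiscounted) reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _info = env.step(action)
        total += reward
    return total


def random_policy(obs):
    # Pick one of two discrete actions uniformly at random.
    return random.choice([0, 1])


episode_return = run_episode(ToyEnv(max_steps=10), random_policy)
```

With a real environment, `ToyEnv(...)` would be replaced by the object returned from `gym.make`, and `random_policy` by the learned policy.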

Algorithms

Policy Gradient

Policy Gradient methods learn the policy directly, that is, the agent's strategy for choosing actions, instead of deriving it from a value function. Implement and experiment with policy gradient algorithms like REINFORCE, A2C (Advantage Actor-Critic), and more.
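
As a rough illustration of REINFORCE, the sketch below computes discounted returns for one episode and the corresponding loss, using plain Python floats. The function names and the `gamma` default are assumptions made for this example, not taken from the repository. In PyTorch the log-probabilities would be tensors and the loss would be backpropagated.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = []
    g = 0.0
    # Walk backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE loss: minus the sum of log pi(a_t | s_t) * G_t."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

Minimizing this loss raises the log-probability of actions that were followed by high returns. Subtracting a baseline from the returns is a common variance-reduction step that this sketch leaves out.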

Results

[Figure: CartPole result]

DQN (Deep Q-Network)

DQN is a fundamental RL algorithm that uses a deep neural network to approximate the Q-value function, i.e. the expected return of taking each action in a given state. Experience the power of Q-learning and deep neural networks in training agents to balance the CartPole.
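
To make the Q-learning part concrete, here is a minimal sketch of two pieces a DQN agent typically needs: the one-step TD target that the network is regressed towards, and epsilon-greedy action selection. The Q-values are plain lists standing in for network outputs, and all names and defaults are assumptions for illustration. In a full DQN, `next_q_values` would usually come from a separate target network.

```python
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
greedy_action = epsilon_greedy([0.1, 0.9], epsilon=0.0, rng=rng)
```

The training loss is then some distance (for example squared error) between `Q(s, a)` and `td_target(...)` over a batch sampled from a replay buffer.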

Actor-Critic (check here for my Actor-Critic implementation)

Actor-critic algorithms combine the benefits of value-based and policy-based methods by maintaining both a policy (the actor) and a value function (the critic). Explore algorithms such as A3C (Asynchronous Advantage Actor-Critic) and A2C to enhance your understanding of actor-critic approaches.
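
One way to see how the actor and critic interact is through the one-step advantage estimate, sketched below with plain floats. The function names are hypothetical and this is not necessarily how the repository's implementation is organized. The critic supplies `value` and `next_value`, and the resulting advantage scales the actor's update.

```python
def one_step_advantage(reward, value, next_value, done, gamma=0.99):
    """TD error used as an advantage estimate: r + gamma * V(s') - V(s)."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def actor_critic_losses(log_prob, advantage):
    """Return (actor_loss, critic_loss) for a single transition."""
    # Actor: increase log pi(a|s) when the advantage is positive.
    # In an autograd framework the advantage is treated as a constant here.
    actor_loss = -log_prob * advantage
    # Critic: squared TD error pulls V(s) towards the bootstrapped target.
    critic_loss = advantage ** 2
    return actor_loss, critic_loss
```

A2C typically averages these losses over a batch of transitions collected in parallel environments, and often adds an entropy bonus to encourage exploration.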

PPO (Proximal Policy Optimization)

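PPO is a widely used policy optimization algorithm known for its stability and sample efficiency. It limits how far each update can move the new policy away from the old one, which lets the same batch of experience be reused for several gradient steps. Dive into the world of PPO and see how it compares with other policy gradient methods.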
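
The stability of PPO is usually attributed to its clipped surrogate objective. The sketch below evaluates that objective for a single sample using plain floats. The name `ppo_clip_objective` and the `clip_eps=0.2` default are assumptions for illustration. A PyTorch version would apply the same formula to batched tensors and minimize its negative mean.

```python
import math


def ppo_clip_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    # Probability ratio between the new and old policy for the taken action.
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes the incentive to push the ratio
    # outside [1 - eps, 1 + eps] in the direction that helps the objective.
    return min(ratio * advantage, clipped_ratio * advantage)


# A ratio of 1.5 with a positive advantage is capped at 1 + clip_eps.
capped = ppo_clip_objective(math.log(1.5), 0.0, advantage=1.0)
```

With a negative advantage the same ratio is not capped, so the objective still penalizes raising the probability of a bad action.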
