This is the final project for the Reinforcement Learning course at Skoltech, devoted to stabilizing a double inverted pendulum, by Kovalev V.V. and Maximilian P.
- Stabilizing the cart-pole from a near-horizontal initial state, initstate = $[0,\ \pi/2 + \sigma,\ \pi/2 + \sigma,\ 0,\ 0,\ 0]$, which forces the agent to swing the pendulum up.
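From the initial state above and the "90 ± 3 degrees" setting mentioned later, the state vector appears to be $[x,\ \theta_1,\ \theta_2,\ \dot x,\ \dot\theta_1,\ \dot\theta_2]$ (cart position, two link angles, and their velocities). A minimal sketch of sampling such a noisy initial state, assuming $\sigma$ is uniform noise of up to 3 degrees:

```python
import numpy as np

def sample_init_state(sigma_deg=3.0, rng=None):
    """Sample [x, theta1, theta2, x_dot, theta1_dot, theta2_dot].

    Assumed layout: cart position, two link angles, then velocities.
    Both links start near horizontal (pi/2) with +-sigma_deg of noise;
    the cart and all velocities start at zero.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.deg2rad(sigma_deg)
    noise = rng.uniform(-sigma, sigma, size=2)  # independent noise per link
    return np.array([0.0, np.pi / 2 + noise[0], np.pi / 2 + noise[1],
                     0.0, 0.0, 0.0])
```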
Algorithms used:
- Proximal policy optimization (PPO)
- Model predictive control (MPC)
- Deep double Q-network (DDQN)
- Soft actor-critic (SAC)
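For context, the double Q-learning target that DDQN is built on can be sketched as follows. This is a generic illustration with assumed array shapes, not the repository's exact code: the online network selects the greedy next action, and the target network evaluates it.

```python
import numpy as np

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double-DQN bootstrap targets from precomputed Q-value batches.

    q_online_next, q_target_next: arrays of shape [batch, n_actions] with
    next-state Q-values from the online and target networks respectively.
    Selecting actions with the online net but evaluating them with the
    target net reduces the overestimation bias of vanilla DQN.
    """
    best = q_online_next.argmax(axis=1)                 # action selection
    next_q = q_target_next[np.arange(len(best)), best]  # action evaluation
    return rewards + gamma * (1.0 - dones) * next_q     # zero bootstrap at done
```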
Folder name | Description |
---|---|
1_pole_inverted_pendulum | Source code for the simple case of a single inverted pendulum |
DDQN | Results of training the deep double Q-network |
MPC | Model predictive control using CasADi optimization |
PPO | Results of training proximal policy optimization |
SAC | Results of training soft actor-critic |
File name | Description |
---|---|
Dynamics.py | Double inverted pendulum dynamics, written with CasADi |
Environment.py | Contains the SAC agent class |
networks.py | Networks used by the agents (actor, critic, and value networks) |
utils.py | General utility functions |
buffer.py | A replay buffer class, used for off-policy training |
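To illustrate what a replay buffer like the one in buffer.py typically looks like, here is a minimal circular-buffer sketch. The class and method names are hypothetical, not the repository's exact interface:

```python
import numpy as np

class ReplayBuffer:
    """Fixed-size circular buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity, state_dim, action_dim):
        self.capacity, self.ptr, self.size = capacity, 0, 0
        self.states = np.zeros((capacity, state_dim))
        self.actions = np.zeros((capacity, action_dim))
        self.rewards = np.zeros(capacity)
        self.next_states = np.zeros((capacity, state_dim))
        self.dones = np.zeros(capacity)

    def store(self, s, a, r, s2, done):
        i = self.ptr
        self.states[i], self.actions[i] = s, a
        self.rewards[i], self.next_states[i], self.dones[i] = r, s2, done
        self.ptr = (self.ptr + 1) % self.capacity  # overwrite the oldest entry
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        idx = rng.integers(0, self.size, size=batch_size)  # sample with replacement
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])
```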
Below you can see the learning curves, along with GIFs of agents playing the Inverted Double Pendulum and Inverted Pendulum Swing-up environments.
Episode reward curves:
- Random initial state (90 ± 3 degrees)
- Fixed initial state (90 degrees)

- Agent after 100k timesteps of training
- Agent after 500k timesteps of training
- Agent after 1000k timesteps of training
For each algorithm, you can launch and test the provided Jupyter notebooks. All trained PPO models are stored in `PPO/PPO3_Trained`; to check the results, launch `Visualization.ipynb`.