Reinforcement learning for stabilising double inverted pendulum

This is the final project for the Reinforcement Learning course at Skoltech, devoted to stabilising a double inverted pendulum, by Kovalev.V.V and Maximilian.P.

We pursued several problems:

Stabilising the cart-pole relative to the horizontal axis, with the initial state initstate = $[0, \pi / 2 + \sigma, \pi / 2 + \sigma, 0, 0, 0]$
Forcing the swing-up of the pendulum
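
The perturbed initial state for the stabilisation task can be sketched as follows. This is only an illustration, not the project's code: the function name `sample_initial_state`, the state ordering (cart position, two angles, then the three velocities), and the uniform distribution for $\sigma$ are assumptions. The default bound of 3 degrees is taken from the "90 ± 3 degrees" setting reported for PPO.

```python
import math
import random


def sample_initial_state(max_deviation_deg=3.0, rng=random):
    """Return a state [0, pi/2 + sigma, pi/2 + sigma, 0, 0, 0].

    Hypothetical helper: sigma is drawn uniformly from
    [-max_deviation_deg, +max_deviation_deg] (converted to radians),
    and the same sigma is applied to both angles, as in the formula above.
    """
    bound = math.radians(max_deviation_deg)
    sigma = rng.uniform(-bound, bound)
    angle = math.pi / 2 + sigma
    # Assumed ordering: position, angle 1, angle 2, then the velocities.
    return [0.0, angle, angle, 0.0, 0.0, 0.0]


if __name__ == "__main__":
    print(sample_initial_state(rng=random.Random(0)))
```

Passing `max_deviation_deg=0.0` gives the fixed 90 degree initial state.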

Algorithms that were used:

Proximal policy optimization (PPO)
Model predictive control (MPC)
Deep double Q-network (DDQN)
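
As a reminder of what distinguishes the last algorithm in this list, here is a minimal sketch of the standard Double DQN bootstrap target. It assumes that "Deep double Q-network" refers to that standard method; the function name and arguments are hypothetical and are not taken from the `DDQN` folder.

```python
def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Compute the Double DQN target for a single transition.

    q_online_next / q_target_next are lists of Q-values for the next state,
    produced by the online and the target network respectively.
    """
    if done:
        # No bootstrapping from terminal states.
        return reward
    # The online network selects the action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates it.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(double_dqn_target(1.0, False, 0.9, [0.2, 0.8], [0.5, 0.1]))
```

Separating action selection from action evaluation is what reduces the overestimation bias of plain Q-learning targets.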

Folders and Files Description

Folders and files

| Folder name | Description |
| --- | --- |
| `1_pole_inverted_pendulum` | Source code for the simpler single-pole inverted pendulum case |
| `DDQN` | Results of training the deep double Q-network |
| `MPC` | Model predictive control using CasADi optimization |
| `PPO` | Results of training proximal policy optimization |
| `SAC` | Results of training Soft Actor-Critic |

Files

| File name | Description |
| --- | --- |
| `Dynamics.py` | Double inverted pendulum dynamics, written in CasADi |
| `Environment.py` | Contains the SAC agent class |
| `networks.py` | Networks used by the agents (Actor, Critic and Value networks) |
| `utils.py` | General utility functions |
| `buffer.py` | A replay buffer class, used for offline training |
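
To illustrate the kind of class the `buffer.py` entry describes, here is a minimal replay buffer sketch. It is not the code from `buffer.py`: the class name `ReplayBuffer`, its methods, and the tuple layout are assumptions chosen for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Sample a batch uniformly without replacement, grouped by field."""
        batch = self._rng.sample(list(self._storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self._storage)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3, seed=0)
    for step in range(5):
        buffer.add(step, 0, 1.0, step + 1, False)
    print(len(buffer), buffer.sample(2))
```

Uniform sampling breaks the temporal correlation between consecutive transitions, which is the usual reason for training from a buffer rather than from the latest step.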

Below are the learning curves, along with GIFs of the agents playing the Inverted Double Pendulum and Inverted Pendulum Swing environments.

Proximal policy optimization

Episode rewards curves:

Random initial state (90 ± 3 degrees)

Fixed initial state (90 degrees)

Agent after 100k timesteps of training

Agent after 500k timesteps of training

Agent after 1000k timesteps of training
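
For readers unfamiliar with PPO, the clipped surrogate objective that the algorithm maximises can be sketched as below. This is a generic illustration, not the training code behind the curves above: the function name is hypothetical, and the clipping parameter `clip_eps=0.2` is a commonly used value assumed here, not one confirmed by this project.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch.

    ratios: probability ratios pi_new(a|s) / pi_old(a|s) per sample.
    advantages: advantage estimates per sample.
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate values.
        terms.append(min(ratio * advantage, clipped * advantage))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

Taking the minimum removes the incentive to move the new policy far from the old one within a single update.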

Model predictive control

Balancing task

Swing up

How to use

For each algorithm, you can launch and test the provided Jupyter notebooks.

All trained PPO models are stored in PPO/PPO3_Trained. To check the results, launch 'Visualization.ipynb'.

