Reinforcement learning for stabilising double inverted pendulum

This is the final project for the Reinforcement Learning course at Skoltech, devoted to stabilising a double inverted pendulum, by Kovalev.V.V and Maximilian.P.

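We addressed two problems:

  • stabilizing the cart-pole starting from the horizontal axis, with initial state initstate = $[0,\pi / 2 + \sigma,\pi / 2 + \sigma,0,0,0]$, where $\sigma$ is a small deviation (the experiments below use 90 ± 3 degrees)
  • forcing the swing-up of the pendulum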
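
The initial state for the stabilization task could be sampled as in the minimal sketch below. It assumes that $\sigma$ is drawn uniformly within ±3 degrees and that the same $\sigma$ is applied to both angles, as the formula suggests. `sample_init_state` is a hypothetical helper for illustration, not a function from this repository.

```python
import math
import random


def sample_init_state(max_dev_deg=3.0, rng=random):
    """Return [0, pi/2 + sigma, pi/2 + sigma, 0, 0, 0] with a random sigma.

    Assumption: sigma is uniform in [-max_dev_deg, +max_dev_deg] degrees and
    the same sigma is applied to both angles.
    """
    sigma = math.radians(rng.uniform(-max_dev_deg, max_dev_deg))
    angle = math.pi / 2 + sigma
    return [0.0, angle, angle, 0.0, 0.0, 0.0]


# Fixed initial state (sigma = 0) versus a randomly perturbed one.
fixed_state = sample_init_state(max_dev_deg=0.0)
random_state = sample_init_state()
```

Setting `max_dev_deg=0.0` gives the fixed 90 degree start, while the default gives the randomized start.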

Algorithms used

  • Proximal policy optimization (PPO)
  • Model predictive control (MPC)
  • Double deep Q-network (DDQN)
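
As a reminder of what distinguishes the double Q-network from plain DQN, here is a minimal sketch of the double DQN target for a single transition. The function name, the list-based Q-values and the discount factor are illustrative assumptions, not code from the DDQN folder of this repository.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Compute the double DQN bootstrap target for one transition.

    q_online_next / q_target_next: Q-values of the next state for every
    discrete action, from the online and the target network respectively.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # The online network selects the action ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the target network evaluates it.
    return reward + gamma * q_target_next[best_action]


target = double_dqn_target(
    reward=1.0,
    done=False,
    q_online_next=[0.2, 0.9, 0.1],
    q_target_next=[0.5, 0.3, 0.8],
)
```

Plain DQN would instead bootstrap from `max(q_target_next)`, which tends to overestimate action values. Decoupling action selection from evaluation is meant to reduce that bias.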

Folders and Files Description

Folders

| Folder name | Description |
| --- | --- |
| 1_pole_inverted_pendulum | Source code for the simpler single inverted pendulum case |
| DDQN | Results of training the double deep Q-network |
| MPC | Model predictive control using CasADi optimization |
| PPO | Results of training proximal policy optimization |
| SAC | Results of training Soft Actor-Critic |

Files

| File name | Description |
| --- | --- |
| Dynamics.py | Double inverted pendulum dynamics, written with CasADi |
| Environment.py | Contains the SAC agent class |
| networks.py | Networks used by the agents (actor, critic and value networks) |
| utils.py | General utility functions |
| buffer.py | A replay buffer class, used for offline training |
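
To illustrate what a replay buffer of the kind listed above does, here is a minimal sketch. The class name, method names and capacity are hypothetical and are not taken from `buffer.py`.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity=100_000):
        self._storage = deque(maxlen=capacity)  # oldest entries drop out first

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, returned column-wise.
        batch = random.sample(list(self._storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self._storage)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(batch_size=2)
```

With a capacity of 3, only the three most recent transitions remain after five insertions, so sampled states come from the latest steps only.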

Below are the learning curves, along with GIFs of the agents playing the Inverted Double Pendulum and Inverted Pendulum Swing environments.

Proximal policy optimization

Episode reward curves:

Random initial state (90 ± 3 degrees)

Fixed initial state (90 degrees)

Agent after 100k timesteps of training (GIF: PPO_100K)

Agent after 500k timesteps of training (GIF: PPO_500K)

Agent after 1000k timesteps of training (GIF: 2022-10-24-19-20-36)

Model predictive control

Balancing task (GIF: balancing)

Swing up (GIF: swingup)

How to use

For each algorithm, you can launch and test the provided Jupyter notebooks.

All trained PPO models are stored in PPO/PPO3_Trained. To check the results, launch 'Visualization.ipynb'.