Reinforcement learning for stabilising double inverted pendulum

Slavoch/RL_double_pendulum

This is the final project for the Reinforcement Learning course at Skoltech, devoted to stabilising a double inverted pendulum, by Kovalev V.V. and Maximilian P.

We addressed two problems:

  • stabilising the cart-pole with respect to the horizontal axis, starting from the initial state $[0,\pi / 2 + \sigma,\pi / 2 + \sigma,0,0,0]$, where $\sigma$ is a small angular perturbation
  • forcing the pendulum to swing up
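
As an illustration of the perturbed initial state, here is a minimal sketch of how such a state could be sampled. The function name `sample_init_state`, the state ordering (cart position, two angles, then the three velocities), and the uniform distribution are assumptions made for this example and are not taken from the repository; the 3 degree bound mirrors the "90 ± 3 degrees" case reported in the PPO results.

```python
import math
import random


def sample_init_state(max_dev_deg=3.0, rng=None):
    """Return a hypothetical initial state [x, th1, th2, dx, dth1, dth2].

    Assumption: both angles share the same perturbation sigma, drawn
    uniformly from [-max_dev_deg, +max_dev_deg] degrees.
    """
    rng = rng or random.Random()
    sigma = math.radians(rng.uniform(-max_dev_deg, max_dev_deg))
    angle = math.pi / 2 + sigma
    # Cart at the origin, both links at pi/2 + sigma, everything at rest.
    return [0.0, angle, angle, 0.0, 0.0, 0.0]


if __name__ == "__main__":
    print(sample_init_state(rng=random.Random(0)))
```

Passing `max_dev_deg=0.0` gives the fixed 90 degree initial state.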

Algorithms used

  • Proximal policy optimization (PPO)
  • Model predictive control (MPC)
  • Double deep Q-network (DDQN)
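
To make the "double" in DDQN concrete, the sketch below computes the bootstrap target for a single transition: the online network selects the next action and the target network evaluates it. The function name and the plain-list inputs are illustrative assumptions; the repository's own implementation may be organised differently.

```python
def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target for one transition.

    q_online_next / q_target_next: Q-values of the next state from the
    online and target networks (one entry per discrete action).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... while its value is read from the target network.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(double_dqn_target(1.0, False, 0.9, [0.2, 0.8], [0.5, 0.1]))
```

Decoupling selection from evaluation in this way is what reduces the overestimation bias of the plain DQN target, which takes the maximum over the target network's own values.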

Folders and Files Description

Folders

| Folder name | Description |
| --- | --- |
| 1_pole_inverted_pendulum | Source code for the simple case of a single inverted pendulum |
| DDQN | Results of training the double deep Q-network |
| MPC | Model predictive control using CasADi optimization |
| PPO | Results of training proximal policy optimization |
| SAC | Results of training soft actor-critic |

Files

| File name | Description |
| --- | --- |
| Dynamics.py | Double inverted pendulum dynamics, written with CasADi |
| Environment.py | Contains the SAC agent class |
| networks.py | Networks used by the agents (Actor, Critic and Value networks) |
| utils.py | General utility functions |
| buffer.py | A replay buffer class, used for offline training |
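
For readers unfamiliar with replay buffers, the following is a minimal sketch of the idea behind a class like the one in buffer.py. It is not the repository's code: the class name, method names, and the tuple layout of a transition are assumptions chosen for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample without replacement, then regroup field by field.
        batch = self._rng.sample(list(self._storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self._storage)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3, seed=0)
    for i in range(5):
        buf.add(i, 0, float(i), i + 1, False)
    print(len(buf), buf.sample(2))
```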

Below are the learning curves, along with GIFs of the agents playing the Inverted Double Pendulum and Inverted Pendulum Swing environments.

Proximal policy optimization

Episode reward curves:

Random initial state (90 ± 3 degrees)

Fixed initial state (90 degrees)

Agent after 100k timesteps of training (animation: PPO_100K)

Agent after 500k timesteps of training (animation: PPO_500K)

Agent after 1000k timesteps of training (animation: 2022-10-24-19-20-36)
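
For reference, the quantity PPO maximises during training is the clipped surrogate objective. The sketch below evaluates it for a small batch in plain Python; the function name and the default `clip_eps=0.2` are illustrative assumptions and are not read from the repository's training configuration.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch.

    ratios: new_policy_prob / old_policy_prob for each sampled action.
    advantages: advantage estimates for the same samples.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum makes the objective a pessimistic bound,
        # removing the incentive to move the ratio outside the clip range.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    print(ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```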

Model predictive control

Balancing task (animation: balancing)

Swing up (animation: swingup)
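
The receding-horizon idea behind MPC can be sketched without any solver. The example below is a deliberately simplified stand-in: it controls a toy double integrator (not the double pendulum) and replaces the CasADi-based optimization used in the MPC folder with an exhaustive search over a small discrete action set. All names, the cost weights, and the dynamics are assumptions made for this illustration.

```python
import itertools


def step(state, u, dt=0.1):
    """Toy double-integrator dynamics: state = (position, velocity)."""
    pos, vel = state
    return (pos + dt * vel, vel + dt * u)


def plan(state, horizon=4, actions=(-1.0, 0.0, 1.0)):
    """Return the first action of the cheapest action sequence."""
    best_cost, best_first = float("inf"), 0.0
    for seq in itertools.product(actions, repeat=horizon):
        s, cost = state, 0.0
        for u in seq:
            s = step(s, u)
            # Quadratic cost on position, velocity and control effort.
            cost += s[0] ** 2 + 0.1 * s[1] ** 2 + 0.01 * u ** 2
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def run_mpc(state, n_steps=50):
    """Re-plan at every step and apply only the first planned action."""
    for _ in range(n_steps):
        state = step(state, plan(state))
    return state


if __name__ == "__main__":
    print(run_mpc((1.0, 0.0)))
```

Only the first action of each plan is applied before re-planning from the new state, which is the defining feature of MPC regardless of how the inner optimization is solved.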

How to use

For each algorithm, you can launch and test the provided Jupyter notebooks.

All trained PPO models are stored in PPO/PPO3_Trained. To check the results, launch 'Visualization.ipynb'.
