This is the final project for the Reinforcement Learning course at Skoltech, devoted to stabilizing a double inverted pendulum, by Kovalev V.V. and Maximilian P.
- Stabilizing the cart-pole from a near-horizontal initial state, initstate = $[0,\ \pi/2 + \sigma,\ \pi/2 + \sigma,\ 0,\ 0,\ 0]$, which forces the agent to swing the pendulum up.
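From the initial state above and the "90 ± 3 degrees" setting mentioned later, the state vector appears to be $[x,\ \theta_1,\ \theta_2,\ \dot x,\ \dot\theta_1,\ \dot\theta_2]$ (cart position, two link angles, and their velocities). A minimal sketch of sampling such a noisy initial state, assuming $\sigma$ is uniform noise of up to 3 degrees:

```python
import numpy as np

def sample_init_state(sigma_deg=3.0, rng=None):
    """Sample [x, theta1, theta2, x_dot, theta1_dot, theta2_dot].

    Assumed layout: cart position, two link angles, then velocities.
    Both links start near horizontal (pi/2) with +-sigma_deg of noise;
    the cart and all velocities start at zero.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.deg2rad(sigma_deg)
    noise = rng.uniform(-sigma, sigma, size=2)  # independent noise per link
    return np.array([0.0, np.pi / 2 + noise[0], np.pi / 2 + noise[1],
                     0.0, 0.0, 0.0])
```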
Algorithms used:
- Proximal policy optimization (PPO)
- Model predictive control (MPC)
- Deep double Q-network (DDQN)
- Soft actor-critic (SAC)
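For context, the double Q-learning target that DDQN is built on can be sketched as follows. This is a generic illustration with assumed array shapes, not the repository's exact code: the online network selects the greedy next action, and the target network evaluates it.

```python
import numpy as np

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double-DQN bootstrap targets from precomputed Q-value batches.

    q_online_next, q_target_next: arrays of shape [batch, n_actions] with
    next-state Q-values from the online and target networks respectively.
    Selecting actions with the online net but evaluating them with the
    target net reduces the overestimation bias of vanilla DQN.
    """
    best = q_online_next.argmax(axis=1)                 # action selection
    next_q = q_target_next[np.arange(len(best)), best]  # action evaluation
    return rewards + gamma * (1.0 - dones) * next_q     # zero bootstrap at done
```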
Folder name | Description |
---|---|
1_pole_inverted_pendulum | Source code for the simple case of a single inverted pendulum |
DDQN | Results of training the deep double Q-network |
MPC | Model predictive control using CasADi optimization |
PPO | Results of training proximal policy optimization |
SAC | Results of training soft actor-critic |
File name | Description |
---|---|
Dynamics.py | Double inverted pendulum dynamics, written with CasADi |
Environment.py | Contains the SAC agent class |
networks.py | Networks used by the agents (actor, critic, and value networks) |
utils.py | General utility functions |
buffer.py | A replay buffer class, used for off-policy training |
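To illustrate what a replay buffer like the one in buffer.py typically looks like, here is a minimal circular-buffer sketch. The class and method names are hypothetical, not the repository's exact interface:

```python
import numpy as np

class ReplayBuffer:
    """Fixed-size circular buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity, state_dim, action_dim):
        self.capacity, self.ptr, self.size = capacity, 0, 0
        self.states = np.zeros((capacity, state_dim))
        self.actions = np.zeros((capacity, action_dim))
        self.rewards = np.zeros(capacity)
        self.next_states = np.zeros((capacity, state_dim))
        self.dones = np.zeros(capacity)

    def store(self, s, a, r, s2, done):
        i = self.ptr
        self.states[i], self.actions[i] = s, a
        self.rewards[i], self.next_states[i], self.dones[i] = r, s2, done
        self.ptr = (self.ptr + 1) % self.capacity  # overwrite the oldest entry
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        idx = rng.integers(0, self.size, size=batch_size)  # sample with replacement
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])
```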
Below you can see the learning curves, along with GIFs of agents playing the Inverted Double Pendulum and Inverted Pendulum Swing-up environments.
Episode reward curves:
- Random initial state (90 ± 3 degrees)
- Fixed initial state (90 degrees)

- Agent after 100k timesteps of training
- Agent after 500k timesteps of training
- Agent after 1000k timesteps of training
For each algorithm, you can launch and test the provided Jupyter notebooks. All trained PPO models are stored in `PPO/PPO3_Trained`; to check the results, launch `Visualization.ipynb`.