This repository contains the source code used to produce the results reported in Reinforcement Learning with Model Predictive Control for Highway Ramp Metering, submitted to IEEE Transactions on Intelligent Transportation Systems.
In this work, we propose to formulate the ramp metering control problem as a Markov Decision Process (MDP) and solve it using Reinforcement Learning (RL), where Model Predictive Control (MPC) acts as the function approximator. This combination allows us to leverage both the flexible, data-driven nature of RL and the structured, model-based approach of MPC, yielding a learning-based control scheme that automatically tunes its parametrisation to enhance closed-loop performance.
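As a rough sketch of how MPC can serve as the function approximator in RL (generic MPC-as-function-approximator formulation; the stage cost, prediction model, terminal cost, and exact update rule used in the paper may differ), a parametric MPC problem provides the action-value estimate, and a Q-learning-style update adjusts its parameters from the observed stage cost:

```latex
% Generic sketch only; the paper's actual MPC parametrisation and update rule may differ.
\begin{align*}
  Q_\theta(s,a) \;=\; \min_{x,\,u}\;& \sum_{k=0}^{N-1} \gamma^k \ell_\theta(x_k,u_k) \;+\; \gamma^N V^{\mathrm f}_\theta(x_N)\\
  \text{s.t.}\;\;& x_0 = s, \quad u_0 = a, \quad x_{k+1} = f_\theta(x_k,u_k),
\end{align*}
\begin{equation*}
  \theta \;\leftarrow\; \theta + \alpha\,\delta\,\nabla_\theta Q_\theta(s,a),
  \qquad
  \delta = L(s,a) + \gamma \min_{a'} Q_\theta(s',a') - Q_\theta(s,a).
\end{equation*}
```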
If you find the paper or this repository helpful in your publications, please consider citing it.
@article{airaldi2023reinforcement,
  title = {Reinforcement Learning with Model Predictive Control for Highway Ramp Metering},
  author = {Airaldi, Filippo and De Schutter, Bart and Dabiri, Azita},
  journal = {arXiv preprint arXiv:2311.08820},
  year = {2023},
  doi = {10.48550/ARXIV.2311.08820},
  url = {https://arxiv.org/abs/2311.08820}
}
The code was created with Python 3.11.4. To access it, clone the repository
git clone https://github.com/FilippoAiraldi/mpcrl-for-ramp-metering.git
cd mpcrl-for-ramp-metering
and then install the required packages by, e.g., running
pip install -r requirements.txt
If you also want to simulate the additional agents contained in other_agents (explained below), install the corresponding requirements as well, e.g.,
pip install -r other_agents/requirements-pi-alinea.txt
The repository code is structured as follows (in alphabetical order):
- metanet: contains the implementation of the training environment, which represents the traffic network benchmark and is based on the METANET modelling framework as implemented in sym-metanet. The API follows the standard OpenAI gym style (a minimal interaction sketch is given after this list)
- mpc: contains the implementation of the MPC optimization scheme, which is based on csnlp
- other_agents: contains implementations of the other agents compared against in the paper (non-learning MPC, PI-ALINEA, and DDPG)
- resources: contains media and other miscellaneous resources
- rl: contains the implementation of the MPC-based RL agents, which are based on mpcrl
- sims: contains lzma-compressed simulation results of different variants of the proposed approach
- util: contains utility classes and functions for, e.g., constants, plotting, I/O, etc.
- launch.py: launches simulations for the different agents
- visualization.py: visualizes the simulation results.
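The sketch below illustrates the gym-style interaction loop mentioned in the metanet entry above. The class and method bodies are illustrative placeholders, not the actual API exposed by the metanet package.

```python
# Minimal, self-contained sketch of a gym-style interaction loop with a stand-in
# environment; names and dynamics are purely illustrative.
import numpy as np


class HypotheticalRampMeteringEnv:
    """Stand-in for the METANET-based traffic environment (illustrative only)."""

    def reset(self, *, seed=None):
        self.rng = np.random.default_rng(seed)
        return self.rng.uniform(size=4), {}  # observation, info

    def step(self, action):
        obs = self.rng.uniform(size=4)                 # next traffic state (densities, speeds, ...)
        cost = float(obs.sum()) + 0.1 * float(action)  # e.g., time spent + control penalty
        return obs, cost, False, False, {}             # obs, cost, terminated, truncated, info


env = HypotheticalRampMeteringEnv()
obs, info = env.reset(seed=0)
for _ in range(10):
    action = 0.5  # ramp metering rate decided by some policy (e.g., the MPC)
    obs, cost, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```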
Training and evaluation simulations can easily be launched via the commands below. The provided arguments are set to reproduce the main results found in the paper, assuming there are no discrepancies due to OS, CPU, etc. For help on the effect of each argument, run
python launch.py --help
In what follows, we provide the commands to reproduce the main results in the paper for each type of agent. Note that the runname variable is used to name the output file, which will be saved under the filename ${runname}.xz.
Train the MPC-based RL (LSTD Q-learning) agents with
python launch.py --agent-type=lstdq --gamma=0.98 --update-freq=240 --lr=1.0 --lr-decay=0.925 --max-update=0.3 --replaymem-size=2400 --replaymem-sample=0.5 --replaymem-sample-latest=0.5 --exp-chance=0.5 --exp-strength=0.025 --exp-decay=0.5 --agents=15 --episodes=80 --scenarios=2 --demands-type=random --sym-type=SX --seed=0 --verbose=1 --n-jobs=15 --runname=${runname}
Evaluate the non-learning MPC agents with
python launch.py --agent-type=nonlearning-mpc --gamma=0.98 --agents=15 --episodes=80 --scenarios=2 --demands-type=random --sym-type=SX --seed=0 --verbose=1 --n-jobs=15 --runname=${runname}
Evaluate the PI-ALINEA agents with
python launch.py --agent-type=pi-alinea --Kp=32.07353865774536 --Ki=0.5419114131900662 --queue-management --agents=15 --episodes=80 --scenarios=2 --demands-type=random --sym-type=SX --seed=0 --verbose=1 --n-jobs=15 --runname=${runname}
The proportional and integral gains in PI-ALINEA can be fine-tuned by running
python other_agents/pi_alinea --tuned --n-trials=100 --agent=8
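For context, the sketch below shows a generic PI-ALINEA update law; it is not the repository's actual implementation in other_agents, and the measured quantity, units, saturation limits, and queue-management override may differ. Kp and Ki correspond to the --Kp and --Ki arguments above; all numeric values are purely illustrative.

```python
# Generic PI-ALINEA step (illustrative only):
#   r(k) = r(k-1) - Kp * (y(k) - y(k-1)) + Ki * (y_ref - y(k)),
# where y is the downstream occupancy/density measurement and r the metering rate.
def pi_alinea(r_prev, meas, meas_prev, setpoint, Kp, Ki, r_min=0.0, r_max=2000.0):
    r = r_prev - Kp * (meas - meas_prev) + Ki * (setpoint - meas)
    # The --queue-management flag typically adds an override that raises the rate
    # when the on-ramp queue grows too long; omitted in this sketch.
    return min(max(r, r_min), r_max)


# Example step with the gains reported above (states and units purely illustrative).
r_next = pi_alinea(r_prev=1500.0, meas=32.0, meas_prev=31.0, setpoint=33.5,
                   Kp=32.07353865774536, Ki=0.5419114131900662)
```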
Train the DDPG agents with
python launch.py --agent-type=ddpg --lr=1e-3 --gamma=0.98 --tau=1e-2 --batch-size=512 --buffer-size=200_000 --noise-std=0.3 --noise-decay-rate=4.75e-5 --devices=${your_devices} --agents=15 --episodes=80 --scenarios=2 --demands-type=random --sym-type=SX --seed=0 --verbose=1 --n-jobs=4 --runname=${runname}
To visualize simulation results, simply run
python visualization.py ${runname1}.xz ... ${runnameN}.xz --all
You can additionally pass --paper, which will cause the paper's figures (or precursors thereof) to be created. In this case, some of the simulation results' filepaths have been hardcoded for simplicity. For example, run the following to reproduce some of the main figures in the paper
python visualization.py sims/sim_15_dynamics_a_rho_wo_track_higher_var.xz --all --paper
Please note that not all simulations support the --all flag, as some of them do not contain all the information necessary to create every figure (depending on the simulated agent type).
Here we clarify the naming convention used for the saved simulation results, which can be found in the sims folder. Note that in each of the saved files, after decompression, you can find the arguments that were used to launch the simulation, as well as the simulation results themselves (which may differ from agent type to agent type).
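For instance, a saved result could be inspected along the lines of the sketch below. That the payload is a pickled Python object is our assumption; the actual serialization used by this repository may differ.

```python
# Rough sketch: decompress one of the lzma-compressed result files and inspect it.
import lzma
import pickle

with lzma.open("sims/sim_15_dynamics_a_rho_wo_track_higher_var.xz", "rb") as f:
    data = pickle.load(f)  # assumed pickled payload: launch arguments + results
print(type(data))
```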
Filenames always start with the name of the algorithm used, followed by the number of agents that were simulated. Then, additional information on each simulation can follow:
- MPC-based RL: for these simulations (a.k.a. lstdq), we also report whether and which of the dynamics parameters (among a, rho_crit, and v_free) were allowed to be learnt, and whether these were also used as tracking setpoints in the MPC objective (more details in the paper)
- PI-ALINEA: we also report whether the queue management strategy was enabled or not
The repository is provided under the GNU General Public License. See the LICENSE file included with this repository.
Filippo Airaldi, PhD Candidate [f.airaldi@tudelft.nl | filippoairaldi@gmail.com]
Delft Center for Systems and Control, Delft University of Technology
This research is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 101018826 - CLariNet).
Copyright (c) 2023 Filippo Airaldi.
Copyright notice: Technische Universiteit Delft hereby disclaims all copyright interest in the program “mpcrl-for-ramp-metering” (Reinforcement Learning with Model Predictive Control for Highway Ramp Metering) written by the Author(s). Prof. Dr. Ir. Fred van Keulen, Dean of 3mE.