This repository is part of a group project for the module COMP0124 Multi-agent Artificial Intelligence (2023/24) at University College London. The project expands upon the value-based tie-breaking mechanism introduced in the paper "SCRIMP: Scalable Communication for Reinforcement- and Imitation-Learning-Based Multi-Agent Pathfinding" (Wang et al., 2023). This repository was forked from the authors' SCRIMP repository and contains modifications to the original code to support our analysis.
- Add a conda environment file for Python 3.9.
- Rename `driver.py` to `train_model.py`.
- Add `multi_train.py` for training multiple models consecutively.
- Add an argument parser to the model training and evaluation scripts.
- Upload the final model to wandb automatically.
- Log model evaluation results to a wandb table.
- Add a block factor and a congestion factor to the probability construction in the tie-breaking mechanism.
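The last change can be illustrated with a small sketch. Everything below (function and parameter names, the softmax weighting) is an illustrative assumption, not the repository's actual implementation:

```python
import math

def tie_break_probs(values, block_factors, congestion_factors,
                    w_block=1.0, w_congest=1.0):
    """Hypothetical sketch: turn per-agent state values into tie-breaking
    probabilities, mixing in a block factor (how much an agent blocks
    others) and a congestion factor (local crowding around the agent)."""
    scores = [v + w_block * b + w_congest * c
              for v, b, c in zip(values, block_factors, congestion_factors)]
    # Softmax over the combined scores yields one probability per agent.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Under this sketch, an agent with a higher combined score wins ties more often, while the softmax keeps the choice stochastic.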
- Install `python==3.9` and the project dependencies
  - Using conda:
    ```shell
    $ conda env create -f environment.yml
    $ conda activate maai
    ```
  - Or using pip:
    ```shell
    $ pip install -r requirement.txt
    ```
- Set up the OdrM* package
  - Build the package:
    ```shell
    $ cd od_mstar3
    $ python setup.py build_ext --inplace
    $ cd ..
    ```
  - Test the build:
    ```shell
    $ python
    >>> import od_mstar3.cpp_mstar  # should import without any error
    ```
- Set up wandb for real-time training monitoring and evaluation results
  - Register an account at https://wandb.ai/site
  - Log in to wandb on the machine:
    ```shell
    $ conda activate maai  # make sure the environment is active
    $ wandb login          # then follow the instructions
    ```
- Train a single model
  - Set the training parameters in `alg_parameters.py`.
  - Run the single-model training script:
    ```shell
    $ python train_model.py
    ```
  - Trained models are stored in the corresponding experiment directory under `models/MAPF/` as `net_checkpoint.pkl`, and are uploaded to wandb if `RecordingParameters.wandb` is set to `True`.
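As a rough illustration, the settings in `alg_parameters.py` are grouped into parameter classes. The sketch below is an assumption about their shape: only `RecordingParameters.wandb` is named in this README, and every other field and value is invented for illustration:

```python
# Hypothetical sketch of the parameter classes in alg_parameters.py.
# Only RecordingParameters.wandb is referenced by this README; all
# other names and values here are illustrative assumptions.
class TrainingParameters:
    lr = 1e-5      # learning rate for PPO updates
    gamma = 0.95   # discount factor
    n_envs = 16    # number of parallel runner processes

class RecordingParameters:
    wandb = True   # upload checkpoints and metrics to wandb
    experiment_name = "expt1"
```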
- Train multiple models
  - Set multiple training configurations via `CONFIG_SETS` in `multi_train.py`.
  - Run the multi-training script to train the models one by one:
    ```shell
    $ python multi_train.py
    ```
  - Trained models are stored in the corresponding experiment directories under `models/MAPF/` as `net_checkpoint.pkl`, and are uploaded to wandb if `RecordingParameters.wandb` is set to `True`.
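The exact shape of `CONFIG_SETS` is not documented here; a plausible sketch, in which each entry is a set of parameter overrides and models are trained back to back (all names below are assumptions):

```python
# Hypothetical sketch of CONFIG_SETS in multi_train.py: one dict of
# parameter overrides per model, applied consecutively.
CONFIG_SETS = [
    {"experiment_name": "expt1", "lr": 1e-5},
    {"experiment_name": "expt2", "lr": 5e-6},
]

def train_all(config_sets, train_fn):
    """Train one model per configuration, one after another."""
    results = []
    for cfg in config_sets:
        results.append(train_fn(cfg))
    return results
```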
- Evaluate a single model
  - Locate the model's path, e.g. `models/MAPF/expt1/final/net_checkpoint.pkl`.
  - Run the evaluation script:
    ```shell
    $ python eval_model.py models/MAPF/expt1/final/ -n expt --gpu
    ```
    Notes:
    - The model's directory is passed, not the path to `net_checkpoint.pkl` itself.
    - The argument after `-n` specifies the name of the experiment.
    - The `--gpu` flag enables evaluation on the GPU.
  - Evaluation results are printed in the terminal and uploaded to wandb.
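The command line above could be handled by an `argparse` setup along these lines (a sketch only; the actual option handling in `eval_model.py` may differ):

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the eval_model.py invocation above.
    parser = argparse.ArgumentParser(description="Evaluate a trained model")
    parser.add_argument("model_dir",
                        help="experiment directory containing net_checkpoint.pkl")
    parser.add_argument("-n", dest="name", help="name of the experiment")
    parser.add_argument("--gpu", action="store_true",
                        help="use the GPU for evaluation")
    return parser
```

For example, `build_parser().parse_args(["models/MAPF/expt1/final/", "-n", "expt", "--gpu"])` yields the directory, experiment name, and GPU flag shown in the command above.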
- `alg_parameters.py` - Training parameters.
- `train_model.py` - Single-model training program; holds the global training network for PPO.
- `multi_train.py` - Multi-model training program; allows setting multiple sets of training parameters.
- `runner.py` - A single process for collecting training data.
- `eval_model.py` - Single-model evaluation program.
- `mapf_gym.py` - Defines the classical reinforcement learning environment for multi-agent pathfinding.
- `episodic_buffer.py` - Defines the episodic buffer used to generate intrinsic rewards.
- `model.py` - Defines the neural-network-based operation model.
- `net.py` - Defines the network architecture.
- Tian Ruen Woon (tianruen)
- Ruibo Zhang (RuiboZhang1)
- Yuen Chung Chan (chan-yc)