DistrGNN is a research project supported by a scholarship from the University of Pisa. It investigates and implements pipeline parallelism for Graph Neural Networks (GNNs). The case study follows the methodology of the paper Vision GNN: An Image is Worth a Graph of Nodes by Han et al. (2022), presented at Advances in Neural Information Processing Systems (NeurIPS) [1]. The project explores the efficient distribution of GNN computations across multiple computing nodes, with a particular emphasis on pipeline parallelism [2], and additionally examines combining pipeline parallelism with data parallelism to improve performance and scalability in large-scale GNN training.
More details on this study, including technical insights and experimental results, are available in the project's report (report.pdf).
The project code is located in the src directory. Here's a breakdown of the key files:

- `model.py`: Implementation of the model, in both the sequential and the "pipeline" version (a minimal sketch of the stage split follows this list).
- `seq.py`: Script for running the sequential GNN model.
- `pipe.py`: Script for running the GNN model with pipeline parallelism.
- `data_pipe.py`: Script for running the GNN model with combined data and pipeline parallelism.
- `report.pdf`: Project report with detailed technical insights and experimental results.
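As a rough illustration of the idea behind the two versions in `model.py` (the real model follows the ViG architecture of [1], so the block contents, dimensions, and helper names below are assumptions), here is a minimal sketch of exposing the same layers either as one sequential module or as two pipeline stages:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a ViG/Grapher block; the real blocks in
# model.py (graph construction + GNN layers) are more involved.
def block(dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(dim, dim), nn.GELU())

def build_sequential(dim: int = 192, depth: int = 4) -> nn.Sequential:
    # Single-process version: every block lives in one module on one device.
    return nn.Sequential(*[block(dim) for _ in range(depth)], nn.Linear(dim, 10))

def build_pipeline_stages(dim: int = 192, depth: int = 4):
    # Two-stage split: first half of the blocks on stage 0, the remaining
    # blocks plus the classification head on stage 1.
    half = depth // 2
    stage0 = nn.Sequential(*[block(dim) for _ in range(half)])
    stage1 = nn.Sequential(*[block(dim) for _ in range(depth - half)], nn.Linear(dim, 10))
    return stage0, stage1

if __name__ == "__main__":
    x = torch.randn(8, 192)
    stage0, stage1 = build_pipeline_stages()
    # Running the two stages back to back reproduces the full forward pass.
    print(stage1(stage0(x)).shape)  # torch.Size([8, 10])
```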
- Clone the repository to your local machine:
git clone https://github.com/JacopoRaffi/DistributedGNN.git
cd DistributedGNN
- Install all the dependencies:
pip install -r requirements.txt
Execution of the sequential model
cd src
python3 seq.py --filename log_file.csv
Example of execution of a 2-stage pipeline
cd src
torchrun --nproc_per_node=1 --nnodes=2 pipe.py --filename log_file.csv
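Conceptually, a 2-stage run like the one above assigns one torchrun process per stage, streams micro-batches through the stages, and passes activations between ranks. The sketch below is illustrative only (forward pass only, gloo backend, toy layers); pipe.py may instead rely on PyTorch's built-in pipeline-parallelism utilities [2], and all names and shapes here are assumptions:

```python
import torch
import torch.distributed as dist

def two_stage_forward(batch: torch.Tensor, microbatches: int = 4, dim: int = 192):
    # One process per stage, as launched by torchrun (world size 2 expected).
    dist.init_process_group(backend="gloo")            # "nccl" on GPU clusters
    rank = dist.get_rank()
    stage0 = torch.nn.Sequential(torch.nn.Linear(dim, dim), torch.nn.GELU())
    stage1 = torch.nn.Sequential(torch.nn.Linear(dim, dim), torch.nn.Linear(dim, 10))

    micro_bs = batch.size(0) // microbatches
    outputs = []
    for i in range(microbatches):
        if rank == 0:
            act = stage0(batch[i * micro_bs:(i + 1) * micro_bs])
            dist.send(act.detach(), dst=1)              # ship activations to the next stage
        else:
            act = torch.empty(micro_bs, dim)
            dist.recv(act, src=0)
            outputs.append(stage1(act))
    dist.destroy_process_group()
    return torch.cat(outputs) if rank == 1 else None

if __name__ == "__main__":
    out = two_stage_forward(torch.randn(32, 192))
    if out is not None:
        print(out.shape)                                # printed by the last stage only
```

A real training run additionally needs a backward schedule across micro-batches, which is exactly what GPipe-style pipelining [2] provides.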
Example of executing the combined version with two model copies, each split into a 2-stage pipeline
cd src
torchrun --nproc_per_node=1 --nnodes=4 data_pipe.py --filename log_file.csv
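In this configuration the 4 processes can be thought of as a 2 x 2 grid: two data-parallel replicas, each split into two pipeline stages. The sketch below only illustrates that layout with standard torch.distributed process groups; the actual rank-to-replica mapping and group handling in data_pipe.py may differ:

```python
import torch.distributed as dist

def build_groups(pipeline_size: int = 2):
    # Assumed layout: consecutive ranks form one replica's pipeline,
    # e.g. with 4 ranks -> replica 0 = {0, 1}, replica 1 = {2, 3}.
    dist.init_process_group(backend="gloo")             # "nccl" on GPU clusters
    rank, world = dist.get_rank(), dist.get_world_size()
    replica, stage = rank // pipeline_size, rank % pipeline_size

    # Groups of ranks that exchange activations (one per model copy)...
    pipe_groups = [dist.new_group(list(range(r * pipeline_size, (r + 1) * pipeline_size)))
                   for r in range(world // pipeline_size)]
    # ...and groups of ranks that hold the same stage and synchronise its gradients.
    dp_groups = [dist.new_group(list(range(s, world, pipeline_size)))
                 for s in range(pipeline_size)]
    return replica, stage, pipe_groups[replica], dp_groups[stage]

if __name__ == "__main__":
    replica, stage, pipe_group, dp_group = build_groups()
    # After each backward step the replicas would synchronise, e.g.:
    #   dist.all_reduce(param.grad, group=dp_group)  # then divide by the number of replicas
    print(f"rank {dist.get_rank()} -> replica {replica}, stage {stage}")
    dist.destroy_process_group()
```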
I would like to thank Prof. Patrizio Dazzi and the University of Pisa for this opportunity.
[1] Han, K., Wang, Y., Guo, J., Tang, Y., & Wu, E. (2022). Vision GNN: An image is worth a graph of nodes. In Advances in Neural Information Processing Systems (Vol. 35, pp. 8291-8303) [Curran Associates, Inc.]. https://proceedings.neurips.cc/paper_files/paper/2022/hash/3743e69c8e47eb2e6d3afaea80e439fb-Abstract-Conference.html
[2] Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, D., Chen, M., Lee, H., Ngiam, J., Le, Q. V., Wu, Y., & Chen, Z. (2019). GPipe: Efficient training of giant neural networks using pipeline parallelism. In Advances in Neural Information Processing Systems (Vol. 32) [Curran Associates, Inc.]. https://arxiv.org/abs/1811.06965