saving VAE graph NN
jbris committed Sep 26, 2023
1 parent 901f5cd commit 94fd098
Showing 8 changed files with 580 additions and 7 deletions.
37 changes: 33 additions & 4 deletions README.md
@@ -11,17 +11,46 @@ Website: [Nextflow Graph Machine Learning](https://jbris.github.io/nextflow-grap
- [Nextflow Graph Machine Learning](#nextflow-graph-machine-learning)
- [Table of contents](#table-of-contents)
- [Introduction](#introduction)
- [The pipeline](#the-pipeline)
- [The Nextflow pipeline](#the-nextflow-pipeline)
- [Python Environment](#python-environment)
- [MLOps](#mlops)
- [ArangoDB](#arangodb)

# Introduction

The purpose of this project is to provide a simple demonstration of how to construct a Nextflow pipeline, with MLOps integration, for performing gene regulatory network (GRN) reconstruction using graph neural networks (GNNs).
The purpose of this project is to provide a simple demonstration of how to construct a Nextflow pipeline, with MLOps integration, for performing gene regulatory network (GRN) reconstruction using graph neural networks (GNNs). In practice, GRN reconstruction is an unsupervised link prediction problem.
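
To make the link prediction framing concrete, here is a minimal sketch in plain Python. It assumes each gene already has a learned embedding vector and scores a candidate edge as the sigmoid of the dot product of its two endpoint embeddings, which is a common decoder for graph autoencoders. The function names, gene names, and embedding values are hypothetical and are not taken from this repository.

```python
import math


def edge_probability(z_u, z_v):
    """Score a candidate edge as the sigmoid of the embedding dot product."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_candidate_edges(embeddings, known_edges):
    """Rank all node pairs that are not already known edges, best first."""
    # Treat edges as undirected for this illustration (an assumption).
    known = {frozenset(edge) for edge in known_edges}
    nodes = sorted(embeddings)
    scored = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if frozenset((u, v)) not in known:
                score = edge_probability(embeddings[u], embeddings[v])
                scored.append((u, v, score))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical embeddings, for illustration only.
embeddings = {
    "geneA": [1.0, 0.5],
    "geneB": [0.9, 0.4],
    "geneC": [-1.0, 0.2],
}
ranked = rank_candidate_edges(embeddings, known_edges=[("geneA", "geneB")])
```

Pairs with high scores are the candidate regulatory links; no labels beyond the observed graph are needed, which is why the task can be treated as unsupervised.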

# The pipeline
[For developing GNNs, we use PyTorch Geometric.](https://pytorch-geometric.readthedocs.io/en/latest/)

# The Nextflow pipeline

[Nextflow has been included to orchestrate the GRN reconstruction pipeline.](https://www.nextflow.io/)

The pipeline is composed of the following steps:

1. Exploratory data analysis: View the GRN and calculate some summary statistics.
2. Processing: Process the graph feature matrix and edge list. Remove the disconnected subgraph.
3. ArangoDB Importing: Import the graph into ArangoDB.
4. Train a graph neural network using SAGE convolutional layers.
4. GNN training: Train a GNN using SAGE convolutional layers.
5. GNN training: Train a variational autoencoder GNN, and save the neural embeddings.
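
As an illustration of the processing step above, the following sketch keeps only the largest connected component of an edge list, which is one way to read "remove the disconnected subgraph". It uses only the standard library and treats edges as undirected when checking connectivity; both choices are assumptions, and the pipeline itself may do this differently (for example with NetworkX). The function name and gene names are hypothetical.

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return the edges whose endpoints lie in the largest connected component."""
    # Build an undirected adjacency map from the edge list.
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    seen = set()
    largest = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(largest):
            largest = component

    # Keep the original edge order and direction.
    return [(s, t) for s, t in edges if s in largest and t in largest]


# Hypothetical gene names, for illustration only.
edges = [("geneA", "geneB"), ("geneB", "geneC"), ("geneX", "geneY")]
kept = largest_connected_component(edges)
```

Here the isolated `geneX`/`geneY` pair is dropped, and the feature matrix would then need to be filtered to the same set of nodes.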

# Python Environment

[Python dependencies are specified in this requirements.txt file.](services/python/requirements.txt)

These dependencies are installed during the build process for the following Docker image: ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0

Execute the following command to pull the image: *docker pull ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0*

## MLOps

* [A Docker compose file has been provided to launch an MLOps stack.](docker-compose.yml)
* [See the .env file for Docker environment variables.](.env)
* [The docker_up.sh script can be executed to launch the Docker services.](scripts/docker_up.sh)
* [DVC is included for data version control.](https://dvc.org/)
* [MLflow is available for experiment tracking.](https://mlflow.org/)
* [MinIO is available for storing experiment artifacts.](https://min.io/)

# ArangoDB

[This pipeline provides a simple demonstration for saving and retrieving graph data to ArangoDB, combined with NetworkX usage and integration.](https://docs.arangodb.com/3.11/data-science/adapters/arangodb-networkx-adapter/)