GNN ON FAMILY PEDIGREE

Repository containing the code used for evaluating a Graph Neural Network (GNN) model for family pedigree data.

RESEARCH QUESTION:
Can we “impute” a phenotype for a target individual without using any information about that individual, relying only on the information available for the other nodes in their family pedigree?
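
To make the question concrete, here is a minimal sketch of the idea, not the model implemented in this repository. It hides the target individual's own features and builds a representation for them from their parents only, using a single round of mean aggregation. The pedigree, the individual IDs, and the feature values are all hypothetical.

```python
# Toy pedigree: each individual maps to the list of their parents.
# All IDs and feature values are made up for illustration.
parents = {
    "child": ["mother", "father"],
    "mother": ["grandma"],
    "father": [],
    "grandma": [],
}

# Hypothetical per-individual feature vectors.
features = {
    "child": [0.2, 1.0],
    "mother": [0.5, 1.0],
    "father": [0.25, 0.0],
    "grandma": [0.9, 1.0],
}


def mask_target(features, target):
    """Return a copy of the features with the target's own vector zeroed out."""
    masked = {node: list(vec) for node, vec in features.items()}
    masked[target] = [0.0] * len(features[target])
    return masked


def aggregate_relatives(features, parents, node):
    """Average the parents' feature vectors (one round of neighbour aggregation)."""
    relatives = parents.get(node, [])
    dim = len(features[node])
    if not relatives:
        return [0.0] * dim
    return [sum(features[r][i] for r in relatives) / len(relatives) for i in range(dim)]


masked = mask_target(features, "child")
target_repr = aggregate_relatives(masked, parents, "child")
print(target_repr)
```

A GNN replaces the plain average with learned message-passing layers, but the constraint is the same: the target's prediction is built from the relatives' information only.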

FILES

FILE STRUCTURE

|--- / 

    pipeline_data.sh 
    pipeline_models.sh 
    run_explainability.sh 
    run_tuning.sh

    |--- data/
        statfile.csv
        maskfile.csv
        edgefile_onlyparents.csv
        featfile_chd.csv
        |--- extended_data/
            statfile_Drug.csv
            statfile_EndPt.csv
            statfile_SES.csv
            statfile_all.csv
            featfile_chd_Drug.csv
            featfile_chd_EndPt.csv
            featfile_chd_SES.csv
            featfile_chd_all.csv
        |--- scripts/
            extract_study_population.py
            extract_edge_onlyparents.py
            add_extra_features.py

    |--- src/
        main.py
        data.py
        model.py
        utils.py
        explainability.py
        my_explainability.py

    |--- logs/
    |--- output/

FILE CONTENT

data/
the folder contains all input files used for the GNN models:

statfile : contains basic information available for every patient
maskfile : specifies whether a patient is used in the project, whether they are a target patient, and whether they are used to train/validate or to test the model
edgefile : specifies all the graph edges, i.e. all the connections between patients
featfile : must be generated manually; specifies the features used to train the model

plus the scripts used to create them:

extract_study_population.py : creates statfile.csv and maskfile.csv
extract_edge_onlyparents.py : creates edgefile_onlyparents.csv
add_extra_features.py : extends the main statfile with extra registry information (see the extended_data folder)

NB:
extended_data/ can be replaced with another folder containing a different extension of the statfiles, e.g. one using a subsample of the available covariates
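
As an illustration of how the statfile and maskfile might be combined, the sketch below groups target patients by the split they are assigned to. The real column names are not documented here, so every column name in it (patient_id, birth_year, sex, target, split) is an assumption, and the inline CSV text stands in for the actual files.

```python
import csv
import io

# Inline stand-ins for statfile.csv and maskfile.csv.
# All column names and values are hypothetical.
STATFILE = """patient_id,birth_year,sex
p1,1950,F
p2,1948,M
p3,1980,F
p4,1982,M
"""

MASKFILE = """patient_id,target,split
p1,0,train
p2,0,train
p3,1,train
p4,1,test
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def targets_by_split(mask_rows):
    """Group the IDs of target patients by the split they belong to."""
    groups = {}
    for row in mask_rows:
        if row["target"] == "1":
            groups.setdefault(row["split"], []).append(row["patient_id"])
    return groups


# Index the statfile by patient ID so mask rows can look up basic information.
stat = {row["patient_id"]: row for row in read_rows(STATFILE)}
splits = targets_by_split(read_rows(MASKFILE))

for split, ids in splits.items():
    print(split, [(pid, stat[pid]["birth_year"]) for pid in ids])
```

Non-target patients (here p1 and p2) stay in the graph as relatives; only target patients are assigned to a train/validate or test split.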

src/
the folder contains all the scripts used for the GNN:

utils.py : utility functions
model.py : GNN model architecture
data.py : constructs the pytorch_geometric objects
main.py : trains and tests the model

plus the shell pipelines (located in the repository root):

pipeline_data.sh : extracts the study population and creates the GNN input files
pipeline_models.sh : trains and tests the desired models
run_explainability.sh : extracts the GNNExplainer results for the desired model
run_tuning.sh : performs hyperparameter tuning (using the Optuna package)
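
To give an idea of the kind of conversion data.py has to perform, the sketch below turns a parent-child edge list into the 2 x E edge layout that pytorch_geometric expects for edge_index. It is not the code in data.py: the column names (parent_id, child_id) and the helper build_edge_index are hypothetical, and plain Python lists are used instead of tensors so the example runs without extra dependencies.

```python
import csv
import io

# Inline stand-in for edgefile_onlyparents.csv.
# The column names are hypothetical.
EDGEFILE = """parent_id,child_id
p1,p3
p2,p3
p1,p4
"""


def build_edge_index(edge_text):
    """Map patient IDs to integer node indices and return a 2 x E edge list."""
    node_index = {}
    sources, targets = [], []
    for row in csv.DictReader(io.StringIO(edge_text)):
        # Assign the next free integer to any ID seen for the first time.
        for pid in (row["parent_id"], row["child_id"]):
            node_index.setdefault(pid, len(node_index))
        sources.append(node_index[row["parent_id"]])
        targets.append(node_index[row["child_id"]])
    return node_index, [sources, targets]


node_index, edge_index = build_edge_index(EDGEFILE)
print(node_index)
print(edge_index)
```

With PyTorch installed, the nested list could be wrapped as torch.tensor(edge_index, dtype=torch.long) and passed to a torch_geometric Data object together with the node feature matrix. The edges here point from parent to child; whether the repository uses directed or undirected edges is not stated in this README.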

REFERENCES

Project inspired by Sophie Wharrie's paper on a similar analysis in FinRegistry
PREPRINT: https://arxiv.org/abs/2304.05010

PEOPLE

CODE AUTHOR

Matteo Ferro matteo.ferro@helsinki.fi

COLLABORATORS

Zhiyu Yang zhiyu.yang@helsinki.fi
Sophie Wharrie sophie.wharrie@aalto.fi
