All publication material relevant for the manuscript describing the flexynesis software package

BIMSBbioinfo/flexynesis_manuscript


flexynesis_manuscript

Project Folder

Accessible from Hulk/Beast/Max: /fast/AG_Akalin/buyar/flexynesis_manuscript_work/

The ./raw folder contains the original datasets downloaded from sources such as cBioPortal, TCGA, PharmacoGx, and DepMap. The ./prepared folder contains the data prepared as input to flexynesis.

Datasets used in the manuscript

Below is a description of the datasets used in the manuscript and of how to prepare them for analysis with flexynesis.

Downloaded Datasets

Go to /fast/AG_Akalin/buyar/flexynesis_manuscript_work/datasets:

The ./raw folder contains:

  • CCLE.rds: downloaded from Zenodo.
  • GDSC2.rds: downloaded from Zenodo.
  • lgggbm_tcga_pub.tar.gz: downloaded from cBioPortal.
  • brca_metabric.tar.gz: downloaded from cBioPortal.
  • depmap: downloaded from the DepMap portal.
  • nbl_target_2018_pub.tar.gz: downloaded from cBioPortal.
  • GDCdata: TCGA cohort datasets for 33 cancer types, downloaded using the TCGAbiolinks package (see GitHub).
  • prot-trans: protein sequence embeddings obtained by applying the prot-trans-xl-uniref50 model to UniProt sequences.
  • describePROT: protein-level sequence, structure, and function features from the DescribePROT database (Download here).
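Before running the preparation scripts, it can help to confirm that all of these raw inputs are in place. The snippet is a minimal sketch, not part of the repository: the function name check_raw_inputs is hypothetical, and the expected entry names are taken from the list above, using the spellings that appear in the preparation commands (GDCdata, describePROT).

```shell
# Hypothetical helper (not part of the repository): report which of the raw
# inputs listed above are missing from a given raw/ directory.
check_raw_inputs() {
    raw_dir="$1"
    missing=0
    for item in CCLE.rds GDSC2.rds lgggbm_tcga_pub.tar.gz brca_metabric.tar.gz \
                depmap nbl_target_2018_pub.tar.gz GDCdata prot-trans describePROT; do
        # -e is true for both regular files and directories
        if [ ! -e "$raw_dir/$item" ]; then
            echo "missing: $item"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero if at least one entry is missing
    [ "$missing" -eq 0 ]
}
```

For example, check_raw_inputs ./raw prints one line per missing entry and returns a non-zero status if anything is absent, so it can guard a longer preparation run.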

Prepared datasets used as input to flexynesis

The ./prepared folder contains:

  • ccle_vs_gdsc: Drug response data for cell lines from the CCLE and GDSC2 datasets. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.gdsc_vs_ccle.R raw/
  • lgggbm_tcga_pub_processed: Merged LGG and GBM cohorts. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.LGG_GBM.R ./src/get_cbioportal_data.R
  • brca_metabric_processed: Processed METABRIC breast cancer dataset. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.metabric.R ./src/get_cbioportal_data.R
  • single_cell_bonemarrow: Bone marrow CITE-seq dataset from Seurat. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.cite_seq.R
  • neuroblastoma_target_vs_depmap: Neuroblastoma patient samples (TARGET study) and cell lines (DepMap). Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.neuroblastoma_finetuning.R ./src/get_cbioportal_data.R ./raw/depmap/ ./src/utils.R
  • tcga_cancertype: TCGA cohorts for ~21 cancer types, with 100 samples per cohort. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.tcga_cancertype.R ./src/utils.R ./raw/GDCdata
  • depmap_gene_dependency: Dataset for gene-dependency prediction in cell lines, combining DepMap gene expression, ProtTrans embeddings, and describePROT features. Command:
/opt/R/4.2/bin/Rscript ./src/prepare_data.depmap.R ./src/utils.R ./raw/depmap/ ./raw/prot-trans/embeddings.protein_level.csv ./raw/uniprot2hgnc.RDS ./raw/describePROT/9606_value.csv
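The seven commands above can also be chained so that a failed step stops the run instead of letting later steps continue on incomplete inputs. This is a hedged sketch rather than a script from the repository: prepare_all, run_step, and the RSCRIPT override are hypothetical names, and it assumes it is launched from the directory the commands are written for (one containing src/ and raw/).

```shell
# Hypothetical wrapper (not shipped with the repository): run the preparation
# commands listed above in order, stopping at the first failure.
# Override RSCRIPT (e.g. RSCRIPT=echo) to print the commands instead.
RSCRIPT="${RSCRIPT:-/opt/R/4.2/bin/Rscript}"

run_step() {
    echo ">> $*" >&2          # log each step to stderr
    "$RSCRIPT" "$@"
}

prepare_all() {
    run_step ./src/prepare_data.gdsc_vs_ccle.R raw/ &&
    run_step ./src/prepare_data.LGG_GBM.R ./src/get_cbioportal_data.R &&
    run_step ./src/prepare_data.metabric.R ./src/get_cbioportal_data.R &&
    run_step ./src/prepare_data.cite_seq.R &&
    run_step ./src/prepare_data.neuroblastoma_finetuning.R ./src/get_cbioportal_data.R ./raw/depmap/ ./src/utils.R &&
    run_step ./src/prepare_data.tcga_cancertype.R ./src/utils.R ./raw/GDCdata &&
    run_step ./src/prepare_data.depmap.R ./src/utils.R ./raw/depmap/ \
        ./raw/prot-trans/embeddings.protein_level.csv ./raw/uniprot2hgnc.RDS \
        ./raw/describePROT/9606_value.csv
}
```

Setting RSCRIPT=echo before calling prepare_all prints the commands without running them, which is a cheap way to review the sequence before a long run.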

Figures

To reproduce the figures:

Go to /fast/AG_Akalin/buyar/flexynesis_manuscript_work/analyses:

Activate the Guix environment:

source ../flexynesis_manuscript/manuscript/etc/profile

Figure 1: single-task figures

Rscript ../flexynesis_manuscript/src/figures_single_task.R ../flexynesis_manuscript/src/utils.R ./output2

Figures 2 and 3: multi-task figures

Rscript ../flexynesis_manuscript/src/figures_multitask.R ../flexynesis_manuscript/src/utils.R ./output2

Figure 4: unsupervised clustering (TCGA cancer types)

Rscript ../flexynesis_manuscript/src/figures_tcga_unsupervised.R ../flexynesis_manuscript/src/utils.R ./unsupervised_cancertype/

Figure 5: cross-modality prediction of cell line dependency probabilities

Rscript ../flexynesis_manuscript/src/figures_depmap.R ../datasets/prepared/depmap_gene_dependency/ depmap_analysis/output/

Figure 6: demonstration of fine-tuning

Rscript ../flexynesis_manuscript/src/figures_finetuning.R ../flexynesis_manuscript/src/utils.R finetuning/

Figure 7: marker analysis

Rscript ../flexynesis_manuscript/src/figures_marker_analysis.R ../flexynesis_manuscript/src/utils.R marker_analysis/output/

Figure 8: benchmark summary

Rscript ../flexynesis_manuscript/src/figures_benchmarks.R benchmarks/output2
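Because each figure maps to a single Rscript call, the list above can be wrapped in a small lookup so that one figure is regenerated by number. This is an illustrative sketch, not part of the repository: reproduce_figure, SRC, and RSCRIPT are hypothetical names, and it assumes it is run from the analyses folder with the Guix environment active. Figures 2 and 3 share one script, as listed above.

```shell
# Hypothetical helper: regenerate one figure by number using the commands
# listed above. Run from the analyses/ folder. RSCRIPT defaults to the
# Rscript found on PATH.
RSCRIPT="${RSCRIPT:-Rscript}"
SRC=../flexynesis_manuscript/src

reproduce_figure() {
    case "$1" in
        1)   "$RSCRIPT" "$SRC/figures_single_task.R" "$SRC/utils.R" ./output2 ;;
        2|3) "$RSCRIPT" "$SRC/figures_multitask.R" "$SRC/utils.R" ./output2 ;;
        4)   "$RSCRIPT" "$SRC/figures_tcga_unsupervised.R" "$SRC/utils.R" ./unsupervised_cancertype/ ;;
        5)   "$RSCRIPT" "$SRC/figures_depmap.R" ../datasets/prepared/depmap_gene_dependency/ depmap_analysis/output/ ;;
        6)   "$RSCRIPT" "$SRC/figures_finetuning.R" "$SRC/utils.R" finetuning/ ;;
        7)   "$RSCRIPT" "$SRC/figures_marker_analysis.R" "$SRC/utils.R" marker_analysis/output/ ;;
        8)   "$RSCRIPT" "$SRC/figures_benchmarks.R" benchmarks/output2 ;;
        *)   echo "unknown figure: $1" >&2; return 2 ;;
    esac
}
```

For example, reproduce_figure 4 regenerates the unsupervised clustering figure, and for n in 1 2 4 5 6 7 8; do reproduce_figure "$n" || break; done runs every figure script once.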

Environment

Install flexynesis

git clone https://github.com/BIMSBbioinfo/flexynesis.git
cd flexynesis
conda create -n flexynesis --file spec-file.txt
conda activate flexynesis
pip install -e .

Install other packages

guix package --manifest=guix.scm --profile=./manuscript

Activate environment

source ./manuscript/etc/profile
conda activate flexynesis
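After activation, a quick check can catch a missing tool or an inactive conda environment before a long run. The sketch is hypothetical (require_tools and in_flexynesis_env are not part of the repository), and it assumes that conda exports the active environment's name in CONDA_DEFAULT_ENV, which is conda's usual behavior after conda activate.

```shell
# Hypothetical sanity check (not part of the repository): confirm that each
# named tool is on PATH, reporting every missing one before failing.
require_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Assumes conda sets CONDA_DEFAULT_ENV to the active environment's name
in_flexynesis_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "flexynesis" ]
}
```

For example: require_tools Rscript conda guix mkdocs && in_flexynesis_env || echo "environment incomplete".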

Documentation

Flexynesis documentation is built and served on bimsbstatic.

  1. Navigate to /data/bimsbstatic/public/akalin/buyar/flexynesis
  2. Run mkdocs build, which generates the website in ./site.
  3. The documentation is served at https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis/site/
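Steps 1 and 2 can be combined into one command that also confirms the site was generated. This is a hedged sketch, not an existing script: build_docs is a hypothetical name, and the index.html check assumes the MkDocs project has an index page, which mkdocs build writes to ./site by default.

```shell
# Hypothetical wrapper for steps 1 and 2: build the MkDocs site inside the
# given directory (default: the bimsbstatic path above) and check that an
# index page was produced in ./site.
build_docs() {
    docs_dir="${1:-/data/bimsbstatic/public/akalin/buyar/flexynesis}"
    # Subshell so the caller's working directory is left unchanged
    ( cd "$docs_dir" && mkdocs build ) || return 1
    [ -f "$docs_dir/site/index.html" ]
}
```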

Manuscript
