This repository contains all the scripts needed to reproduce the data analysis and results of the manuscript "Remarkable diversity of alkaloid scaffolds in Piper fimbriulatum" (https://doi.org/10.1101/2024.12.10.627739).
The following software is required:
- mzmine software (v4.2.0)
- SIRIUS software (v5.8.5)
- GNPS2 online platform
- Miniconda/Anaconda
- Python 3.11.0 or higher
- To install mzmine and SIRIUS, follow the instructions in their respective online documentation (see the mzmine and SIRIUS docs).
- Clone this GitHub repository by running the following command in your terminal:
git clone https://github.com/pluskal-lab/PiperFIM.git
- Create a new conda environment and install the packages and dependencies listed in requirements.txt:
conda create -y --name piperfim
conda activate piperfim
conda install --file requirements.txt -y
Alternatively, you can run the activate.sh script:
source activate.sh
- Download the data and results folders from Zenodo into the main repository directory.
Note: paths and names of all input and output files are listed in the config/config.yaml file and can be changed directly there.
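Because every path lives in config/config.yaml, a script only needs to look entries up by key. The sketch below assumes the YAML has already been parsed into a nested dict (for example with PyYAML's `yaml.safe_load`); the helper `get_path` and the key names `lcms` and `ftable_clean` are hypothetical and not taken from the repository's actual config.

```python
from functools import reduce
from pathlib import Path
from typing import Any


def get_path(config: dict[str, Any], dotted_key: str, root: Path = Path(".")) -> Path:
    """Look up a nested config entry such as "lcms.ftable_clean" and
    resolve it relative to the repository root (hypothetical helper)."""
    # Walk down one nesting level per dot-separated key (KeyError if absent)
    value = reduce(lambda section, key: section[key], dotted_key.split("."), config)
    if not isinstance(value, str):
        raise TypeError(f"{dotted_key!r} does not point to a path string")
    return root / value


# Hypothetical keys and values; in practice the dict could come from
# yaml.safe_load() applied to config/config.yaml.
config = {"lcms": {"ftable_clean": "results/lcms/ftable_clean.csv"}}
print(get_path(config, "lcms.ftable_clean"))
```

Editing a value in the YAML file then changes where every script reads or writes, without touching the code.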
Feature detection with mzmine can be reproduced using the provided batch file (mzmine_featdetect.mzbatch in the scripts folder) as described in Heuckeroth et al. 2024. Feature-based molecular networking (FBMN) on the GNPS2 platform and in silico chemical structure and compound class predictions with the SIRIUS software can be reproduced as described in the original publication.
The 01_lcms_dataprep.py script integrates the output files from these software tools to facilitate downstream data analysis:
python scripts/01_lcms_dataprep.py
This produces two output files: ftable_clean.csv (an mzmine-like feature table) and ntable_clean.csv (a GNPS2-like node table). The first can be used for statistical analysis, while the second can be imported into Cytoscape for further exploration of the FBMN results.
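Integrating outputs from different tools essentially means joining tables on a shared feature identifier. The following is a minimal sketch of such a left join using only the standard library, not the logic of 01_lcms_dataprep.py itself. The `row ID` column follows mzmine's feature table naming, while the annotation ID column `featureId` is an assumption and may differ in real SIRIUS exports; the input values are made-up toy data.

```python
import csv
import io
from typing import Iterable


def merge_annotations(
    feature_rows: Iterable[dict[str, str]],
    annotation_rows: Iterable[dict[str, str]],
    feature_key: str = "row ID",
    annotation_key: str = "featureId",  # assumed column name
) -> list[dict[str, str]]:
    """Left-join annotation columns onto a feature table by feature ID.

    Features without an annotation are kept; their annotation columns are
    filled with empty strings."""
    # Index annotations by feature ID for constant-time lookup
    annotations = {row[annotation_key]: row for row in annotation_rows}
    extra_cols = sorted(
        {col for row in annotations.values() for col in row if col != annotation_key}
    )
    merged = []
    for row in feature_rows:
        match = annotations.get(row[feature_key], {})
        merged.append({**row, **{col: match.get(col, "") for col in extra_cols}})
    return merged


# Toy inputs (made-up values) standing in for the real exported tables
features = io.StringIO("row ID,row m/z\n1,286.1438\n2,328.1543\n")
sirius = io.StringIO("featureId,molecularFormula\n1,C17H19NO3\n")
table = merge_annotations(csv.DictReader(features), csv.DictReader(sirius))
for row in table:
    print(row)
```

A left join keeps unannotated features in the table, which matters for statistics run on the complete feature set.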
The 02_run_sparql_queries.py script runs the SPARQL queries stored in the scripts/sparql_queries folder, cleans the results (e.g., removes duplicates), and saves the output in the data/wikidata folder. The queries retrieve from Wikidata all natural products that contain a specific substructure (defined by a SMILES string), together with the plant genera each compound was isolated from and the corresponding literature references.
python scripts/02_run_sparql_queries.py
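SPARQL endpoints such as https://query.wikidata.org/sparql can return results in the standard SPARQL 1.1 JSON format, in which each row is a dict of variable bindings. The sketch below shows one way to flatten that structure and drop duplicate compound and genus pairs (for instance, a compound reported several times through different references). It is an illustration under these assumptions, not the repository's implementation; the variable names `compound`, `genus` and `reference` and the example.org identifiers are placeholders.

```python
import json
from typing import Any, Optional


def bindings_to_rows(result: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON result into one plain dict per solution.
    Variables left unbound in a solution become empty strings."""
    variables = result["head"]["vars"]
    return [
        {var: binding.get(var, {}).get("value", "") for var in variables}
        for binding in result["results"]["bindings"]
    ]


def deduplicate(
    rows: list[dict[str, str]], keys: Optional[list[str]] = None
) -> list[dict[str, str]]:
    """Keep the first row for each combination of `keys` (all columns if None)."""
    seen: set[tuple[str, ...]] = set()
    unique = []
    for row in rows:
        signature = tuple(row[k] for k in (keys or sorted(row)))
        if signature not in seen:
            seen.add(signature)
            unique.append(row)
    return unique


# Toy response in the standard SPARQL JSON layout (identifiers are placeholders)
raw = json.loads("""
{"head": {"vars": ["compound", "genus", "reference"]},
 "results": {"bindings": [
   {"compound": {"type": "uri", "value": "http://example.org/compound/1"},
    "genus": {"type": "literal", "value": "Piper"},
    "reference": {"type": "uri", "value": "http://example.org/ref/A"}},
   {"compound": {"type": "uri", "value": "http://example.org/compound/1"},
    "genus": {"type": "literal", "value": "Piper"},
    "reference": {"type": "uri", "value": "http://example.org/ref/B"}},
   {"compound": {"type": "uri", "value": "http://example.org/compound/2"},
    "genus": {"type": "literal", "value": "Peperomia"}}
 ]}}
""")
rows = bindings_to_rows(raw)
pairs = deduplicate(rows, keys=["compound", "genus"])
print(len(rows), len(pairs))
```

Deduplicating on a subset of columns keeps one record per compound and genus pair, while calling `deduplicate(rows)` without `keys` removes only fully identical rows.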
The 03_clean_wikidata.py script cleans the raw SPARQL query outputs by filtering out erroneous Wikidata reports, based on the "unwanted substructures" defined in the config.yaml file. Cleaned results are saved in the results/phylo_tree/wikidata_clean folder.
python scripts/03_clean_wikidata.py
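How 03_clean_wikidata.py matches unwanted substructures is not described here; it may, for instance, use a cheminformatics toolkit such as RDKit. The sketch below makes a simpler, loudly stated assumption: the compounds carrying an unwanted substructure are already known as a set of identifiers, so cleaning reduces to splitting the records into kept and removed rows. The helper `drop_unwanted` and the toy IDs are hypothetical.

```python
from collections import Counter


def drop_unwanted(
    rows: list[dict[str, str]],
    unwanted_compounds: set[str],
    compound_key: str = "compound",
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Split rows into (kept, removed): a row is removed when its compound
    is in the set flagged as carrying an unwanted substructure."""
    kept: list[dict[str, str]] = []
    removed: list[dict[str, str]] = []
    for row in rows:
        (removed if row[compound_key] in unwanted_compounds else kept).append(row)
    return kept, removed


# Toy rows; "c2" stands for a hit that also matches an unwanted substructure
rows = [
    {"compound": "c1", "genus": "Piper"},
    {"compound": "c2", "genus": "Piper"},
    {"compound": "c3", "genus": "Peperomia"},
]
kept, removed = drop_unwanted(rows, unwanted_compounds={"c2"})
# Number of retained (compound, genus) records per genus
print(Counter(row["genus"] for row in kept))
```

Returning the removed rows as well makes it easy to audit what the filter discarded before the per-genus counts are used downstream.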
The 04_create_itol_annotation.py script creates an annotation file (iTOL_scaffolds.txt) for iTOL that maps literature reports of each alkaloid scaffold (i.e., benzylisoquinoline, aporphine, piperolactam, piperidine, and seco-benzylisoquinoline) onto each genus covered in the angiosperm tree of life published by Zuntini et al. 2024 (global_tree_brlen_pruned_renamed.tre file). The resulting tree can be accessed at the following link.
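iTOL reads plain-text annotation files that start with a dataset type keyword. As an illustration, the sketch below renders presence or absence of each scaffold per genus in iTOL's DATASET_BINARY format. It is an assumption that iTOL_scaffolds.txt uses this particular dataset type, and the helper `itol_binary_dataset` is hypothetical; the field values follow the iTOL template convention (1 = filled shape, -1 = shape not shown).

```python
SCAFFOLDS = [
    "benzylisoquinoline",
    "aporphine",
    "piperolactam",
    "piperidine",
    "seco-benzylisoquinoline",
]


def itol_binary_dataset(
    reports: dict[str, set[str]],
    scaffolds: list[str],
    label: str = "Alkaloid scaffolds",
) -> str:
    """Render genus -> reported scaffolds as an iTOL DATASET_BINARY file.

    Genus names must match the leaf labels of the tree exactly."""
    header = [
        "DATASET_BINARY",
        "SEPARATOR COMMA",
        f"DATASET_LABEL,{label}",
        "COLOR,#000000",
        # One shape per field; 2 = circle in the iTOL template
        "FIELD_SHAPES," + ",".join("2" for _ in scaffolds),
        "FIELD_LABELS," + ",".join(scaffolds),
        "DATA",
    ]
    data = [
        # 1 = filled shape (scaffold reported); -1 = shape not drawn
        ",".join([genus] + ["1" if s in found else "-1" for s in scaffolds])
        for genus, found in sorted(reports.items())
    ]
    return "\n".join(header + data) + "\n"


# Toy input, not actual literature records
text = itol_binary_dataset({"Piper": {"aporphine", "piperidine"}}, SCAFFOLDS)
print(text)
```

The resulting text file can then be dragged onto a tree displayed in iTOL; labels must not contain the chosen separator (a comma here).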
The 05_create_small_tree.py script creates a smaller version of the global_tree_brlen_pruned_renamed.tre file by keeping only the orders in which at least one alkaloid scaffold was reported. The resulting tree can be accessed at the following link.
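The selection rule (keep every order with at least one report) can be sketched independently of any tree library. The example below assumes a hypothetical genus-to-order mapping and returns the tip labels to retain; the function name and toy data are illustrative, not taken from 05_create_small_tree.py.

```python
def tips_to_keep(genus_to_order: dict[str, str], reported_genera: set[str]) -> set[str]:
    """Return all genera (tree tips) whose order contains at least one
    genus with a reported alkaloid scaffold. Reported genera missing from
    the mapping are ignored."""
    orders = {genus_to_order[g] for g in reported_genera if g in genus_to_order}
    return {g for g, order in genus_to_order.items() if order in orders}


# Toy genus -> order mapping
genus_to_order = {
    "Piper": "Piperales",
    "Peperomia": "Piperales",
    "Aristolochia": "Piperales",
    "Magnolia": "Magnoliales",
    "Annona": "Magnoliales",
}
keep = tips_to_keep(genus_to_order, reported_genera={"Piper"})
print(sorted(keep))
```

The resulting names could then be passed to a tree library; with ete3, for example, `tree.prune(sorted(keep), preserve_branch_length=True)` removes all other tips while keeping branch lengths. Whether the repository uses ete3 is not stated here.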