This repository contains the source code needed to analyze DArTseqMet data. DArTseqMet is a restriction-enzyme-based genome reduction technique that identifies DNA methylation in a sample on a genome-wide scale.
The method shown here is described in the paper "A cost-effective approach to DNA methylation detection by Methyl Sensitive DArT sequencing". The same approach was later used to investigate DNA methylation in clones of Eucalyptus grandis grown in contrasting environments, as described in the paper "Patterns of DNA methylation changes in elite Eucalyptus clones across contrasting environments".
This computational protocol is designed to be executed using the Snakemake workflow management system.
A step-by-step installation of the major software components is given below.
The recommended way to install Snakemake is through Conda/Mamba. First, install Mamba into the base Conda environment:
conda install -n base -c conda-forge mamba
Conda/Mamba lets you create separate environments, each holding its own files, packages, and dependencies, so that one environment does not interfere with another. It is therefore advantageous to create a dedicated environment for the dependencies of this workflow. For more information about Conda and Conda environments, please visit: https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html
Here, we create an environment named DArTseqMet and install Snakemake within it. Since the workflow uses bowtie2, which depends on Python 2.7, we also create a second environment that Snakemake can use, avoiding conflicts between software that relies on different versions of Python.
conda activate base
mamba create -c conda-forge -c bioconda -n DArTseqMet snakemake r-base -y
# Here, we create a new environment for bowtie2, then export it as a YAML file that the workflow rules depending on it can reference.
mamba create -c conda-forge -c bioconda -n bowtie2_env python=2.7 bowtie2
conda activate bowtie2_env
conda env export > bowtie2.yaml
conda deactivate
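Before relying on the exported file, it can help to confirm that it really lists bowtie2. The snippet below is an optional sanity check and not part of the original workflow. The helper name `env_lists_package` is hypothetical, and the check assumes the layout written by `conda env export`, in which each dependency appears as a `- name=version=build` list item.

```shell
# Hypothetical helper: succeed if a conda environment file lists a package.
# Assumes the "  - name=version=build" layout written by `conda env export`.
env_lists_package() {
    env_file=$1
    package=$2
    grep -Eq "^[[:space:]]*-[[:space:]]*${package}(=|[[:space:]]*$)" "$env_file"
}

# Usage: report whether bowtie2.yaml mentions bowtie2, if the file exists.
if [ -f bowtie2.yaml ]; then
    if env_lists_package bowtie2.yaml bowtie2; then
        echo "bowtie2.yaml lists bowtie2"
    else
        echo "bowtie2.yaml does not list bowtie2" >&2
    fi
fi
```

If the check fails, re-running the export with bowtie2_env activated is the first thing to try.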
Next, we activate the DArTseqMet environment and install the other software required by the workflow.
conda activate DArTseqMet
## Installing Trimmomatic from the bioconda channel
conda install -c bioconda trimmomatic
## Installing samtools from the bioconda channel
conda install -c bioconda samtools
## Installing bedtools from the bioconda channel
conda install -c bioconda bedtools
## Installing subread from the bioconda channel
mamba install -c bioconda subread
## Installing fastqc from the bioconda channel
conda install -c bioconda fastqc
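Once these installs finish, a quick check that the expected executables are reachable can catch a failed install early. This is a minimal sketch, assuming the DArTseqMet environment is active. The helper name `missing_tools` is hypothetical, and the executable names are the ones these packages are expected to provide (for example, `featureCounts` from subread).

```shell
# Hypothetical helper: print every command from the argument list that is
# not found on PATH; prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names assumed to be provided by the packages installed above.
missing=$(missing_tools snakemake trimmomatic samtools bedtools featureCounts fastqc)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing >&2
else
    echo "All expected tools were found on PATH"
fi
```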
We also need to install some R packages.
mamba install -c conda-forge r-docopt r-tidyverse r-data.table r-gdata r-gridextra r-essentials
Unfortunately, mamba does not work when installing Bioconductor packages. Therefore, we install the necessary Bioconductor packages directly in R using the command line below.
R -e "install.packages('BiocManager', repos='http://cran.us.r-project.org'); library('BiocManager'); BiocManager::install('DESeq2'); BiocManager::install('Biostrings'); BiocManager::install('edgeR'); BiocManager::install('VennDiagram')"
Users need to edit the config.yaml file to specify sample names and the other parameters required for the analyses. Note that some files must be in a specific format. File formats and other restrictions are listed in config.yaml.
After adjusting config.yaml, the whole workflow runs with a single command.
snakemake -p -c 7 --use-conda all
Note that some parameters are required:
- "-c" controls the number of cores to be used. This value must be the same as the one specified in the configuration file (config.yaml).
- "--use-conda" allows the workflow to use conda to build the environment for bowtie2.
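One way to keep the "-c" value and config.yaml in agreement is to read the core count from the file rather than typing it twice. The snippet below is a sketch under stated assumptions: the helper `yaml_value` is hypothetical and only handles flat, unquoted `key: value` lines, and `threads` is a placeholder key name, so substitute the key that config.yaml actually uses.

```shell
# Hypothetical helper: print the value of a top-level "key: value" entry
# from a simple YAML file (no nesting, no quoting, inline comments stripped).
yaml_value() {
    sed -n "s/^$2:[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$1" | head -n 1
}

# "threads" is a placeholder key name; use the one that config.yaml really uses.
if [ -f config.yaml ]; then
    cores=$(yaml_value config.yaml threads)
    echo "Cores declared in config.yaml: ${cores:-not found}"
    # If a value was found, the matching commands would be:
    #   snakemake -n -p -c "$cores" --use-conda all   (-n: dry run, lists jobs only)
    #   snakemake -p -c "$cores" --use-conda all
fi
```

Running the dry-run form first shows which jobs Snakemake plans to execute without starting any of them.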
Distributed under the GNU General Public License v3.0. See LICENSE for more information.