This repository contains the source code necessary to analyze DArTseqMet data, identifying the DNA methylations present in a sample on a genome-wide scale.

wendelljpereira/DArTseqMet

License: GPL v3

Computational protocol for the analysis of DArTseqMet data.

This repository contains the source code necessary to analyze DArTseqMet data, a restriction enzyme genome reduction technique capable of identifying the DNA methylations in a sample on a genome-wide scale.

The method shown here is described in the paper "A cost-effective approach to DNA methylation detection by Methyl Sensitive DArT sequencing". The same approach was later used to investigate DNA methylation in clones of Eucalyptus grandis grown in contrasting environments, as described in the paper "Patterns of DNA methylation changes in elite Eucalyptus clones across contrasting environments".

Installation

This computational protocol is designed to be executed using the Snakemake workflow management system.

Stepwise installation

A step-by-step installation of the major software components is given below.

The recommended way to install Snakemake is through Conda/Mamba. First, install Mamba into the base Conda environment:

conda install -n base -c conda-forge mamba

Conda/Mamba lets you create isolated environments, each holding its own files, packages, and dependencies, so that software in one environment does not interfere with another. Creating a dedicated environment for this workflow's dependencies is therefore advantageous. For more information about Conda and Conda environments, please visit: https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html

Here, we create an environment named DArTseqMet and install Snakemake within it. Since the workflow uses bowtie2, which depends on Python 2.7, we also create a second environment that Snakemake can use, avoiding conflicts between software that relies on different versions of Python.

conda activate base
mamba create -c conda-forge -c bioconda -n DArTseqMet snakemake r-base -y

# Here, we create a new environment for bowtie2, then export it as a YAML file that the workflow rules depending on it can reference.
mamba create -c conda-forge -c bioconda -n bowtie2_env python=2.7 bowtie2

conda activate bowtie2_env
conda env export > bowtie2.yaml
conda deactivate
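
Before moving on, it can be worth confirming that the exported file actually lists bowtie2. The sketch below is only an illustration: `env_lists_package` is a hypothetical helper, not part of this repository, and it assumes the `- name=version=build` dependency lines that `conda env export` normally writes.

```shell
# Hypothetical helper: succeeds if an exported environment file lists a package.
# Assumes dependency lines of the form "  - name=version=build".
env_lists_package() {
    file="$1"
    pkg="$2"
    # Match "- pkg=" or a bare "- pkg" at the end of the line.
    grep -Eq "^[[:space:]]*-[[:space:]]*${pkg}(=|\$)" "$file"
}

# Example call (not executed here):
#   env_lists_package bowtie2.yaml bowtie2 && echo "bowtie2 is listed"
```

Note that the exported file usually ends with a machine-specific `prefix:` line, which may need to be removed if the file is shared between computers.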

Next, we activate the DArTseqMet environment and install the other software the workflow needs.

conda activate DArTseqMet

## Installing Trimmomatic from the bioconda channel
conda install -c bioconda trimmomatic
## Installing samtools from the bioconda channel
conda install -c bioconda samtools
## Installing bedtools from the bioconda channel
conda install -c bioconda bedtools
## Installing subread from the bioconda channel
mamba install -c bioconda subread
## Installing fastqc from the bioconda channel
conda install -c bioconda fastqc
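
After these installations, a quick check that the expected executables are on the PATH can catch a failed install early. This is a minimal sketch: `check_tools` is a hypothetical helper, and the executable names in the example call are assumptions (for instance, `featureCounts` is one of the programs shipped with subread, but the workflow may call a different one).

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (run inside the DArTseqMet environment; names are assumptions):
#   check_tools snakemake trimmomatic samtools bedtools featureCounts fastqc
```

bowtie2 is left out of the example call on purpose, since it lives in its own environment.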

We also need to install some R packages.

mamba install -c conda-forge r-docopt r-tidyverse r-data.table r-gdata r-gridextra r-essentials

Unfortunately, mamba does not work when installing Bioconductor packages. Therefore, we install the necessary Bioconductor packages directly in R using the command line below.

R -e "install.packages('BiocManager', repos='http://cran.us.r-project.org'); library('BiocManager'); BiocManager::install('DESeq2'); BiocManager::install('Biostrings'); BiocManager::install('edgeR'); BiocManager::install('VennDiagram')"

Executing the analysis

Adjusting the config.yaml file

Users need to edit the config.yaml file to specify sample names and the other parameters required for the analyses. Note that some input files must be in a specific format. File formats and other restrictions are described in config.yaml itself.

Executing the workflow

After adjusting the config.yaml file, the whole workflow runs with a single command:

snakemake -p -c 7 --use-conda all

Note that some parameters are required:

  • "-c" controls the number of cores to be used. This value must match the one specified in the configuration file (config.yaml).
  • "--use-conda" allows the workflow to use conda to build the environment for bowtie2.
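
Because the value given to "-c" has to match the configuration file, one option is to read it from config.yaml instead of typing it twice. The sketch below is illustrative only: `read_yaml_int` is a hypothetical helper, and the key name `threads` is an assumption, so check config.yaml for the key this workflow actually uses. It only handles simple top-level `key: number` lines.

```shell
# Hypothetical helper: print the integer value of a top-level key in a flat YAML file.
# Prints nothing if the key is absent or its value is not a plain integer.
read_yaml_int() {
    key="$1"
    file="$2"
    # Match lines such as "threads: 7" and print only the number.
    sed -n "s/^${key}:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*\$/\1/p" "$file" | head -n 1
}

# Example (not executed here; "threads" is an assumed key name):
#   cores=$(read_yaml_int threads config.yaml)
#   snakemake -n -p -c "$cores" --use-conda all   # -n is a dry run: lists jobs without running them
#   snakemake -p -c "$cores" --use-conda all
```

Running the dry run first shows which jobs Snakemake plans to execute without starting any of them.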

License

Distributed under the GNU General Public License v3.0. See LICENSE for more information.
