GitHub - Bio-protocol/unmethylated-regions_UMR-extractor-WGBS

Identifying unmethylated regions from methylation data

This workflow identifies stably unmethylated regions in plant genomes using methylation data.

Installation

Running environment:
- The workflow was developed on macOS Catalina 10.15.7 running the Oracle Java Runtime Environment (JRE) v1.8. It should also run on your preferred Linux distribution.
Required software and versions:
- Trim Galore! v0.6.4_dev
- Cutadapt v1.8.1
- FastQC v0.11.5
- BSMAP v2.74
- samtools v1.3 (samtools v0.1.18 is also required to run BSMAP)
- bamtools v2.4.0
- Java v1.8.0_45
- Picard v2.9.0
- bamUtil v1.0.13
- Python v2.7.5
- bedGraphToBigWig
- perl v5.26.2
- IGV v2.5.3
- R v4.1
  - tidyverse v2.0.0
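
Before running the workflow, it can help to confirm that the command-line tools are on your PATH. The following is a minimal Python 3 sketch using only the standard library. The executable names are assumptions based on common installations and are not taken from the workflow scripts (for example, Picard is usually run as a Java jar and is not checked here), so adjust them to your setup.

```python
import shutil

# Assumed executable names; these are not taken from the workflow scripts.
TOOLS = ["trim_galore", "cutadapt", "fastqc", "bsmap", "samtools",
         "bamtools", "java", "python", "bedGraphToBigWig", "perl", "Rscript"]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

This check only reports whether an executable exists. It does not verify the versions listed above.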

Input Data

The example data are paired-end FASTQ files generated on the Illumina platform.

R1 FASTQ file: input/B73_chr1_subset_reads_1.fastq
R2 FASTQ file: input/B73_chr1_subset_reads_2.fastq

Each entry in a FASTQ file consists of 4 lines:

1) A sequence identifier with information about the sequencing run and the cluster. The exact contents of this line vary based on the BCL-to-FASTQ conversion software used.
2) The sequence (the base calls: A, C, T, G and N).
3) A separator, which is simply a plus (+) sign.
4) The base call quality scores. These are Phred+33 encoded, using ASCII characters to represent the numerical quality scores.

The first entry of the input data:

@SRR8738272.153232
TGATTTGAAATTAAACGAATATGGAAATCGGTTTGAAGGTTTTGGAATCGAGTATAATTGGATTTACAAATGTGGTTTATGGGAATTTTTTTATGTGAAAGTTTTGATTCTGATGTATAATATTGA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@
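
The 4-line layout and the Phred+33 encoding described above can be illustrated with a short Python 3 sketch. The function name `read_fastq` is hypothetical and is not part of this workflow. The sketch assumes an uncompressed FASTQ file in which each sequence and each quality string occupies exactly one line.

```python
def read_fastq(lines):
    """Yield (identifier, sequence, quality_scores) for each 4-line FASTQ entry."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between entries
        # A truncated entry yields empty strings and fails the check below.
        sequence = next(it, "").rstrip("\r\n")
        separator = next(it, "").rstrip("\r\n")
        quality = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ entry: " + header)
        # Phred+33: the quality score is the ASCII code of the character minus 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores
```

To inspect the example data, open `input/B73_chr1_subset_reads_1.fastq` and pass the file handle to `read_fastq`. Under this encoding, the character `G` corresponds to a quality score of 38 and `C` to 34.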

Other input files are also required, such as:

A reference genome.
A file containing chromosome sizes. Each entry consists of two columns: the chromosome and the size of the chromosome.

Here is the example file:

maize_chr1_reference	20000
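
If a chromosome sizes file is not at hand, one can be derived from the reference FASTA. The following Python 3 sketch is one possible way to do this and is not part of the workflow. The function name `chrom_sizes` is hypothetical, and the sketch assumes the chromosome name is the first whitespace-delimited word of each FASTA header.

```python
def chrom_sizes(fasta_lines):
    """Return {chromosome: length} from the lines of a FASTA file, in file order."""
    sizes = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first word after '>'.
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def format_sizes(sizes):
    """Format the sizes as tab-separated 'chromosome<TAB>size' lines."""
    return "".join("{}\t{}\n".format(name, size) for name, size in sizes.items())
```

Writing the output of `format_sizes` to a file gives the two-column layout shown in the example above.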

A reference genome cytosine tile file.

The file contains 6 columns:

1) The chromosome
2) Start of the 100 bp tile
3) End of the 100 bp tile
4) Number of CG sites in the 100 bp tile
5) Number of CHG sites in the 100 bp tile
6) Number of CHH sites in the 100 bp tile

Here are the first 5 lines of the example tile file:

chr	start	end	cg_sites	chg_sites	chh_sites
maize_chr1_reference	1	100	4	6	29
maize_chr1_reference	101	200	6	7	25
maize_chr1_reference	201	300	6	4	36
maize_chr1_reference	301	400	2	10	28
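
To make the tile columns concrete, here is a minimal Python 3 sketch that counts CG, CHG and CHH sites per tile for a single sequence. It is not necessarily how the provided tile files were generated. It assumes that cytosines on both strands are counted, that each site is assigned to the tile containing the cytosine itself, that coordinates are 1-based, and that the sequence contains only A, C, G, T and N. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def context(seq, i):
    """Classify the cytosine at index i of seq as 'CG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not C, or if the downstream bases needed
    to decide the context are missing or N.
    """
    if seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def tile_counts(seq, tile=100):
    """Return (start, end, cg, chg, chh) tuples for consecutive tiles of seq."""
    seq = seq.upper()
    n = len(seq)
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    n_tiles = (n + tile - 1) // tile
    counts = [{"CG": 0, "CHG": 0, "CHH": 0} for _ in range(n_tiles)]
    for i in range(n):
        fwd = context(seq, i)
        if fwd:
            counts[i // tile][fwd] += 1
        rc = context(rev, i)
        if rc:
            # Index i on the reverse strand is index n - 1 - i on the forward strand.
            counts[(n - 1 - i) // tile][rc] += 1
    return [(t * tile + 1, min((t + 1) * tile, n), c["CG"], c["CHG"], c["CHH"])
            for t, c in enumerate(counts)]
```

Each returned tuple maps onto columns 2 to 6 of the tile file. Prepending the chromosome name gives a full row.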

More example tile files can be found in the example_genomes folder inside the input folder. They will be provided through UQeSpace, and the DOI will be available when the article is published.

Major steps

Step 1: Trimming the reads and running FastQC for quality checking

Note that you need to adjust the paths in the shell script to match your system before running it.

sh workflow/1_trim_reads.sh

Step 2: Mapping reads using BSMAP

sh workflow/2_map_reads.sh <samtools 0.1.18 path>

Step 3: View the results

Results can be converted to bigWig format, which can be visualized in IGV.

sh 3_visualize_results.sh <bedGraphToBigWig path>

Step 4: Identify unmethylated regions

4_find_UMRs.sh

Expected results

License

This is free and open-source software, licensed under GPLv3.
