Home
Welcome to the sRNA-workflow wiki!
The pipeline was built around a core of several modules from the publicly available University of East Anglia small RNA workbench (UEA sRNA WB, [2]). The modules were deployed to a dedicated Linux server and are used via the command-line interface, with UNIX shell scripts performing basic data input and output operations.
The pipeline is subdivided into 5 principal steps:
Preprocessing
Filtering and identification of conserved miRNAs
TASI identification
miRCat (Identification of miRNA precursors)
PAREsnip (Identification of targets)
Preprocessing encompasses extraction of the compressed data (.tar.gz format), adapter removal, conversion to FASTA format, and read filtering.
Fasteris data
The processed data from FASTERIS comes in a compressed archive where reads are separated by size. The first step performs a sequential extraction of each given library, and the resulting files are concatenated into one libXX.fastq file per library. For each library, the read counts are stored in a tab-separated file (.tsb) in the count directory of the project, identified by the libraries it represents. Afterwards, the FASTQ files are converted sequentially to FASTA. During the conversion, quality plots of the FASTQ reads are generated and stored in the data>FASTQ directory of the project.
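The concatenation, counting and conversion steps described above can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the pipeline uses shell scripts and external tools, and the function names `parse_fastq` and `concatenate_and_convert` are hypothetical. It assumes plain 4-line FASTQ records and does not produce quality plots.

```python
from typing import List, Tuple


def parse_fastq(fastq_text: str) -> List[Tuple[str, str]]:
    """Return (read_id, sequence) pairs from 4-line FASTQ records."""
    # Blank lines are dropped so that concatenated files parse cleanly.
    lines = [ln for ln in fastq_text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("incomplete FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, sep = lines[i], lines[i + 1], lines[i + 2]
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        records.append((header[1:], seq))  # the quality line is discarded
    return records


def concatenate_and_convert(size_files: List[str]) -> Tuple[str, int]:
    """Concatenate per-size FASTQ contents and convert them to FASTA.

    Returns the FASTA text and the number of reads in the library.
    """
    # Make sure every part ends with a newline before joining.
    fastq = "".join(p if p.endswith("\n") else p + "\n" for p in size_files)
    records = parse_fastq(fastq)
    fasta = "".join(f">{read_id}\n{seq}\n" for read_id, seq in records)
    return fasta, len(records)
```

Each element of `size_files` stands for the content of one size-separated file from the archive; the returned count corresponds to the per-library read count that the pipeline writes to the count directory.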
LCSciences
The process is similar to that for Fasteris data; however, since the files are not separated by size, no concatenation is needed. The compressed file with the raw reads is extracted and converted to FASTA. The quality plots are also stored in the data>quality directory of the respective project. Since the FASTA reads still carry adapters, they are trimmed using the adapter stored in the config file workdirs.cfg. Trimming is currently done on the FASTA file because of a bug in the FASTQ trimming process. That bug has since been resolved, and in the future the adapters will be trimmed from the FASTQ file instead.
The final process in the preprocessing step filters reads by low complexity, t/rRNA matches, abundance and size. This is done with the UEA wbench filter, using the respective config file settings.
Default settings:
Min abundance: 5
Min size: 18
Max size: 26
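The effect of the abundance and size defaults can be sketched as below. This is only an illustration of those two criteria, assuming abundance means the number of times an identical sequence occurs in the library; the low-complexity and t/rRNA filters are not shown, and `filter_reads` is a hypothetical name, not part of the UEA workbench.

```python
from collections import Counter
from typing import Dict, Iterable


def filter_reads(
    reads: Iterable[str],
    min_abundance: int = 5,
    min_size: int = 18,
    max_size: int = 26,
) -> Dict[str, int]:
    """Keep sequences that pass the abundance and size thresholds.

    Returns a mapping from each surviving sequence to its read count.
    """
    counts = Counter(reads)  # collapse identical reads and count them
    return {
        seq: n
        for seq, n in counts.items()
        if n >= min_abundance and min_size <= len(seq) <= max_size
    }
```

With the defaults, a 20 nt sequence seen 5 times is kept, while a 20 nt sequence seen 4 times, or a 17 nt or 27 nt sequence of any abundance, is discarded.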
Filtering and identification of conserved miRNAs is done with patman. The reads are first aligned against the genome supplied in the workdirs.cfg config file, with 0 mismatches allowed. The resulting reads are then aligned with patman against the mirbase database, again with 0 mismatches allowed. Reads that align perfectly are stored in libxx-cons.fa with the mirbase family in the header of the read, while all reads that don't align are sent to libxx-noncons.fa.
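The two-stage logic above (genome filter first, then the conserved/non-conserved split) can be sketched with plain substring matching standing in for patman. This is an assumption-laden illustration: `split_conserved` is a hypothetical function, only the forward strand is searched, and appending the family name to the read ID is just one possible header layout, not necessarily the one the pipeline writes.

```python
from typing import Dict, List, Tuple

Read = Tuple[str, str]  # (read_id, sequence)


def split_conserved(
    reads: List[Read], genome: str, mirbase: Dict[str, str]
) -> Tuple[List[Read], List[Read]]:
    """Split genome-matching reads into conserved and non-conserved sets.

    mirbase maps a family name to a miRNA sequence (DNA alphabet).
    """
    conserved, nonconserved = [], []
    for read_id, seq in reads:
        if seq not in genome:
            continue  # no perfect genome match: the read is dropped
        # First family whose sequence contains the read with 0 mismatches.
        family = next((name for name, mir in mirbase.items() if seq in mir), None)
        if family is not None:
            conserved.append((f"{read_id} {family}", seq))  # -> libxx-cons.fa
        else:
            nonconserved.append((read_id, seq))  # -> libxx-noncons.fa
    return conserved, nonconserved
```

Reads that fail the genome step appear in neither output, which mirrors the description that only the reads resulting from the genome alignment are passed on to the mirbase comparison.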
TASI
All non-conserved reads are used to search for TASI with the UEA workbench TA-SI prediction tool.
miRCat
Searches the genome for miRNA precursors of the non-conserved reads.
PAREsnip
Searches the transcriptome for targets of the conserved reads and of the non-conserved reads identified by miRCat.
How to start:
Make sure you have all the necessary software (check list):
  UEA Workbench, optimized for the Linux version (~3.2)
    Srna-tools | toolbench | perl
  Java, optimized for version (~1.7)
Set up the variables in the config dir.
You should also have the following software configured in your path:
  Patman
  Tar (sudo apt-get install tar)
  Fastx Toolkit
  bowtie1
Run sRNAworkFlow.sh
Analysing inserts from fasteris
config - Directory that holds all the variables for the workflow.
  workdirs.cfg - Sets variables with the directories and files necessary for the project.
    workdir - Path to the workdir (one will be created if it doesn't exist)
    genomes - Path to the genomes
    MEMORY - Amount of memory to be used by Java when running memory-intensive scripts. Ex: 10g, 2000m ...
    THREADS - Number of cores to be used during execution
    Inserts_DIR - Path to the inserts directory (Fasteris)
    mirbase - Path to the mirbase database
  software_dirs.cfg - Sets the directory paths to all major programs
  filter*.cfg - General parameters for wbench
  wbench_mircat.cfg - General parameters for mircat
  wbench_tasi.cfg - General parameters for TaSi
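The MEMORY examples (10g, 2000m) follow the size notation accepted by Java's -Xmx heap option. As a hedged sketch, a value could be validated before a run as shown below; `memory_to_mb` and `java_heap_flag` are hypothetical helpers, and it is an assumption that the workflow passes MEMORY to Java through -Xmx.

```python
import re

_UNITS_MB = {"m": 1, "g": 1024}  # megabytes per unit suffix


def memory_to_mb(value: str) -> int:
    """Convert a MEMORY setting such as '10g' or '2000m' to megabytes."""
    match = re.fullmatch(r"(\d+)([mMgG])", value.strip())
    if match is None:
        raise ValueError(f"unrecognised MEMORY value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS_MB[unit.lower()]


def java_heap_flag(value: str) -> str:
    """Build a maximum-heap flag for Java from a MEMORY setting."""
    memory_to_mb(value)  # validation only; raises on malformed input
    return f"-Xmx{value.strip()}"
```

Validating the value up front gives a clear error for a typo such as `10gb`, instead of a failure later when a memory-intensive step starts.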