2. Quick Start

Did you just get your hands on some short-read sequencing data and want to assess its quality? If so, you might not want to read the full documentation on how to use seQuoia and how it works! Chances are you just want to know:

a) is my data good quality? (sequencing issues)

b) is there any indication of contamination? (sample issues)

To answer these, you can just run the basic or fast-basic (uses subsampled reads) workflows. These workflows have two QC modules, FastQC and Centrifuge, which assess sequencing issues and sample contamination, respectively.

Here is how you would run these workflows:

# run sheppard using bash wrapper to load environments automatically
#
# --meta: see page describing how to create such a file!
# --illumina: see page describing how to create the file.
# --illumina_data_format: could be illumina-paired, illumina-single, or gp-directory
# --workflow: can switch to fast_basic.py
# --poolsize: limit to 30 samples at a time
# --cluster: assumes you have access to gscid privileges on UGES / UGER

bash /path/to/seQuoia/bin/sheppard.sh \
--meta full_meta_information.txt \
--illumina sequencing_data/ \
--illumina_data_format illumina-paired \
--workflow /cil/shed/apps/internal/seQuoia/workflows/basic.py \
--outdir seQuoia_repos/ \
--poolsize 30 \
--cluster UGES
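
If you want to catch a mistyped path before anything gets submitted to the cluster, a small pre-flight check can help. The sketch below is not part of seQuoia: the function name preflight and its three arguments are hypothetical, and it only checks that the meta file, the sequencing input, and the workflow file exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with seQuoia): verify that the inputs you
# are about to hand to sheppard.sh exist. Reports every missing path.
preflight() {
    local meta="$1" seq_input="$2" workflow="$3"
    local status=0
    [ -f "$meta" ]      || { echo "missing meta file: $meta" >&2; status=1; }
    [ -e "$seq_input" ] || { echo "missing sequencing input: $seq_input" >&2; status=1; }
    [ -f "$workflow" ]  || { echo "missing workflow file: $workflow" >&2; status=1; }
    return "$status"
}

# Example use (paths are placeholders):
#   preflight full_meta_information.txt sequencing_data/ /path/to/basic.py \
#     && echo "inputs look fine, go ahead and run sheppard.sh"
```

The check uses -e rather than -d for the sequencing input because, depending on the data format, that argument may be a directory or a file.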

Now you will likely be disappointed, as you have just generated an output directory with a bunch of per-sample subdirectories but no comprehensive information file. The second step of the seQc suite generates such easy-to-interpret overviews, largely leveraging the amazing MultiQC suite.
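
Before moving on, it can be worth confirming that the first step produced a subdirectory for every sample you expected. Here is a minimal sketch, assuming each immediate subdirectory of the output directory corresponds to one sample (the exact layout is not spelled out on this page); list_samples is a hypothetical helper, not a seQuoia command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of the immediate subdirectories of an
# output directory, one per line, sorted. Regular files are ignored.
list_samples() {
    local repo="$1"
    find "$repo" -mindepth 1 -maxdepth 1 -type d | sed 's|.*/||' | sort
}

# Example use (path is a placeholder):
#   list_samples seQuoia_repos/ | wc -l    # number of sample directories
```

Comparing that count against the number of samples in your meta file is a quick sanity check.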

WARNING: This step requires a decent amount of memory, depending on the number of samples you are looking at! Please ish / qrsh interactively into a node prior to running.

# run reporter using bash wrapper to load environments automatically
#
# --input: this is the output directory from the first step [sheppard.sh]!
# --analysis_type: could be basic, nano-assembly

bash /path/to/seQuoia/bin/reporter.sh \
--input /path/to/seQc_repos/ \
--analysis_type basic \
--outdir /path/to/seQuoia_reporter/

reporter will give you three types of output files. These are described in greater detail on the page:

MultiQC report - quick and beautiful visualization of sequencing stats
Excel spreadsheet - high-level stats, with outliers marked using the MAD (median absolute deviation) approach
Simple text files (high_level_stats.txt and centrifuge_stats.txt) - rev up the Shiny app on one of the nodes to visualize metadata in conjunction with the sequencing stats
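
To give a feel for what MAD-based outlier marking means, here is a rough sketch. It is not reporter's actual code: the function name mad_flag, the two-column "name, tab, value" input, the cutoff of 3, and the 1.4826 scaling factor (the usual constant that makes MAD comparable to a standard deviation for normal data) are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: flag values that lie more than `cutoff` scaled MADs
# from the median. Reads "name<TAB>value" lines on stdin and prints
# "name<TAB>value<TAB>flag", where flag is OUTLIER or ok.
mad_flag() {
    local cutoff="${1:-3}"
    awk -F'\t' -v cutoff="$cutoff" '
        # Median of arr[1..n]; works on a sorted copy (insertion sort).
        function median(arr, n,    i, j, tmp, sorted) {
            for (i = 1; i <= n; i++) sorted[i] = arr[i]
            for (i = 2; i <= n; i++) {
                tmp = sorted[i]
                for (j = i - 1; j >= 1 && sorted[j] > tmp; j--) sorted[j + 1] = sorted[j]
                sorted[j + 1] = tmp
            }
            return (n % 2) ? sorted[(n + 1) / 2] : (sorted[n / 2] + sorted[n / 2 + 1]) / 2
        }
        { name[NR] = $1; val[NR] = $2 + 0 }
        END {
            if (NR == 0) exit
            med = median(val, NR)
            # Absolute deviation of every value from the median.
            for (i = 1; i <= NR; i++) dev[i] = (val[i] > med) ? val[i] - med : med - val[i]
            mad = 1.4826 * median(dev, NR)
            for (i = 1; i <= NR; i++) {
                flag = (mad > 0 && dev[i] > cutoff * mad) ? "OUTLIER" : "ok"
                printf "%s\t%s\t%s\n", name[i], val[i], flag
            }
        }'
}

# Example use with made-up numbers:
#   printf 's1\t10\ns2\t11\ns3\t12\ns4\t100\n' | mad_flag 3
```

Because the median and MAD barely move when a single sample goes wild, this kind of rule tends to be more robust than a mean and standard deviation cutoff when only a handful of samples are being compared.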
