The workflows in this repository run analyses on next-generation sequencing (NGS) data, including ChIP-Seq, ATAC-Seq, CLIP-Seq, and RNA-Seq.
Workflows are written in the Common Workflow Language (CWL).
These workflows can be run:
- within SciDAP
- or standalone with cwltool
Workflows available on SciDAP are automatically updated and version-controlled based on pull requests (PRs) to this repository.
The repository is organized as follows, in order of importance:
```
├── workflows
├── tools
├── dockerfiles
│   └── scripts
└── tests
    └── data
```
- workflows: home to all of the CWL files that make up individual workflows. Workflows use tools to run individual steps.
- tools: home to the CWL files that handle general steps. Tools use Docker images to run analyses in containers, which keeps run conditions consistent.
- dockerfiles: home to the files used to build the images that tools use.
  - scripts: home to scripts that are included in the built Docker images. These scripts are usually written in R or bash.
- tests: home to JSON job files for each workflow/tool. These job files are used to test individual workflows. Running the tests requires data, which the child folder data provides through a reference to the Barski-Labs workflow_tests repository.
Four additional tags can be included in different parts of a workflow for added compatibility with SciDAP:
- Upstreams: designate which workflows generate outputs that a given workflow can use as inputs.
- Visual Plugins: add visualizations of output data within the SciDAP platform.
- Service Tags: differentiate what kind of samples a workflow creates.
- SQL query for input: let the user dynamically build an SQL query from selected options (saved as a string CWL input).
To extend the user interface (dynamic form) with extra input fields that the workflow itself does not require, the 'sd:metadata' field was introduced. It defines a list of workflow templates whose inputs object is used to construct and store the extra fields alongside the original workflow.
Example of a 'metadata' template for the user interface
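A minimal sketch of such a template follows. It assumes the template is a separate CWL file that the workflow references by relative path; the file name `../metadata/example-header.cwl` and the `cells` and `conditions` fields are hypothetical placeholders. Compare with an existing workflow in the workflows directory for the exact layout.

```yaml
# Hypothetical template file: metadata/example-header.cwl
# A workflow would reference it with (path is an assumption):
#   'sd:metadata':
#     - "../metadata/example-header.cwl"
# The inputs below only add fields to the dynamic form;
# the workflow itself never consumes them.
cwlVersion: v1.0
class: Workflow
inputs:
  cells:
    type: string
    label: "Cells"
  conditions:
    type: string?        # optional field
    label: "Conditions"
outputs: []
steps: []
```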
To allow already analyzed data to be selected as input for a workflow, we organize separate workflows into a graph. To link workflows we use 'sd:upstream', which defines a list of upstream workflows that this workflow can use for input data.
Example of a workflow with upstreams
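A minimal sketch of a workflow excerpt that declares one upstream workflow follows. The link label `genome_indices`, the file name `genome-indices.cwl`, the output name `bowtie_indices`, and the `'sd:upstreamSource'` key that maps an input to an upstream output are all assumptions. Check an existing workflow in the workflows directory before relying on them.

```yaml
# Hypothetical workflow excerpt (names and keys are assumptions)
'sd:upstream':
  genome_indices:              # label for this upstream link
    - "genome-indices.cwl"     # upstream workflow that can supply input data

inputs:
  indices_folder:
    type: Directory
    label: "Indexed genome folder"
    # Assumed key: which output of the upstream workflow fills this input
    'sd:upstreamSource': "genome_indices/bowtie_indices"
```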
Usually, workflow outputs (especially files) are provided as download links on the SciDAP platform.
With SciDAP's visualization plugins, output data can also be:
- shown as a plot
- shown in a genome browser (IGV)
- shown as a table
- opened in a new tab (for HTML outputs)
The keyword 'sd:visualPlugins' enables SciDAP visualization plugins. The line, pie, chart, igvbrowser, syncfusiongrid, and linkList plugin types are already available in the platform.
Example of visual plugins used on workflow outputs
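A minimal sketch of two outputs with visual plugins attached follows. It uses the igvbrowser and syncfusiongrid types named above. The output names, the step name `some_step`, and the fields under each plugin (`tab`, `id`, `type`, `name`, `Title`) are assumptions, not a confirmed schema.

```yaml
outputs:
  bigwig:
    type: File
    label: "BigWig file"
    outputSource: some_step/bigwig_file    # hypothetical step/output
    'sd:visualPlugins':
      - igvbrowser:                        # show as a track in the IGV browser
          tab: "IGV Genome Browser"        # assumed field names below
          id: "igvbrowser"
          type: "wig"
          name: "BigWig Track"
  stats_file:
    type: File
    label: "Mapping statistics"
    outputSource: some_step/stats_file     # hypothetical step/output
    'sd:visualPlugins':
      - syncfusiongrid:                    # show as a table
          tab: "Overview"                  # assumed field names below
          Title: "Mapping statistics"
```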
The 'sd:serviceTag' keyword specifies what a new workflow creates:
- samples: uses the keyword 'sample'
- analyses: uses the keyword 'analysis'
- gene lists: uses the keyword 'genelist'
The service tag on a workflow determines how its samples are listed when viewing a project on the SciDAP platform.
Workflows without a service tag (or with an unrecognized one) create samples in a tab called "not in use".
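A minimal sketch of where the tag might go follows. It assumes the tag sits at the top level of the workflow file. The value is taken from the keywords listed above, and the exact casing accepted by the platform is an assumption.

```yaml
# Top level of a hypothetical workflow file.
# Value is one of: 'sample', 'analysis', 'genelist'
'sd:serviceTag': "analysis"
```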
Example of an input that builds an SQL query with 'sd:filtering':

```yaml
inputs:
  # ...
  sql_query:
    type: string
    label: "Filtering parameters"
    doc: "Filtering parameters (WHERE parameters for SQL query)"
    'sd:filtering':
      params:
        columns: ["Refseq_id", "Gene_id", "txStart", "txEnd", "Strand", "Region", "Chr", "Start", "End", "Conc", "Conc1", "Conc2", "Fold", "p-value", "FDR", "Called1", "Called2"]
        types: ["string", "string", "number", "number", "string", "string", "string", "number", "number", "number", "number", "number", "number", "number", "number", "number", "number"]
```
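The 'sd:filtering' parameters create an SQL query (the WHERE parameters) from the values the user gives for any grouping and selection of the listed columns. The query is stored in the string input, here sql_query.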