Bioinformatics Workflows by Datirium LLC

how to contribute

The workflows in this repository run analyses on next-generation sequencing (NGS) data: ChIP-Seq, ATAC-Seq, CLIP-Seq, and RNA-Seq.

Workflows are written in the Common Workflow Language (CWL).

These workflows are compatible with:

Workflows available on SciDAP are automatically updated and version controlled based on pull requests into this repository.


Repo structure

In order of importance:

├── workflows
├── tools
├── dockerfiles
│   └── scripts
└── tests
    └── data
  • The workflows directory holds the CWL files that make up individual workflows. Workflows use tools to run individual steps.

  • The tools directory holds the CWL files that handle general steps. Tools use Docker images to run analyses in containers, which keeps run conditions consistent.

  • The dockerfiles directory holds the files used to build the images that tools use.

    • The scripts directory holds the scripts that are included in the built Docker images. These scripts are usually written in R or bash.

  • The tests directory holds the JSON job files for each workflow/tool. These job files are used to test individual workflows. Running the tests requires data, which the child folder data provides through a repository reference to Barski-Labs workflow_tests.
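
The relationship between these directories can be sketched as two small CWL files: a tool that runs in a container, and a workflow that uses that tool as a step. The file names, the `ubuntu:20.04` image, and the `wc -l` command are placeholders chosen for illustration, not files from this repository.

```yaml
# tools/count-lines.cwl (hypothetical tool)
cwlVersion: v1.0
class: CommandLineTool

requirements:
  - class: DockerRequirement
    dockerPull: ubuntu:20.04        # stand-in for an image built from dockerfiles/

baseCommand: ["wc", "-l"]

inputs:
  input_file:
    type: File
    inputBinding:
      position: 1

outputs:
  line_count:
    type: stdout                    # capture the command's standard output as a File

stdout: line_count.txt
---
# workflows/count-lines-workflow.cwl (hypothetical workflow)
cwlVersion: v1.0
class: Workflow

inputs:
  fastq_file:
    type: File

outputs:
  line_count:
    type: File
    outputSource: count_lines/line_count

steps:
  count_lines:
    run: ../tools/count-lines.cwl   # the workflow step delegates to the tool
    in:
      input_file: fastq_file
    out: [line_count]
```

The two documents are meant to live in separate files; they are shown together only to make the link between `run:` and the tool visible.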

Augmented CWL standard for SciDAP

There are 4 additional references/tags that can be included in different parts of a workflow for added compatibility with SciDAP.

  1. Upstreams: designate which workflows generate outputs that a workflow can use as inputs.
  2. Visual Plugins: add visualizations of output data within the SciDAP platform.
  3. Service Tags: differentiate what kind of samples a workflow creates.
  4. SQL query for input: lets the user dynamically build a SQL query from selected options (saved as a string for a CWL input).

Metadata

To extend the user interface (dynamic form) with extra input fields that are not required by a workflow, the 'sd:metadata' field was introduced. It defines a list of workflow templates whose inputs object is used for constructing the extra fields and storing them with the original workflow.

Example of 'metadata' template for user interface
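
A minimal sketch of such a template, and of how a workflow might reference it. The file names, the field names, and the relative path are hypothetical, and the exact layout SciDAP expects should be checked against an existing workflow in this repository.

```yaml
# metadata/example-header.cwl (hypothetical template file)
# Only the inputs object matters here: each input becomes an extra form field.
cwlVersion: v1.0
class: Workflow

inputs:
  alias:
    type: string
    label: "Experiment short name/Alias"
  notes:
    type: string?                   # optional field
    label: "Notes"

outputs: []
steps: []
---
# workflows/example-workflow.cwl (hypothetical), top-level fragment
# 'sd:metadata' lists the templates whose inputs extend the form.
'sd:metadata':
  - "../metadata/example-header.cwl"
```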


Upstreams

To allow already analysed data to be selected as input for a workflow, we organize separate workflows into a graph. To link workflows we use 'sd:upstream', which defines a list of upstream workflows that this workflow can use for input data.

Example of a workflow with upstreams
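
A minimal sketch, assuming that 'sd:upstream' maps a local alias to an upstream workflow file and that an input selects one of that workflow's outputs through an 'sd:upstreamSource' key. The file name, the alias, and the output id are hypothetical.

```yaml
cwlVersion: v1.0
class: Workflow

# Assumed form: <local alias>: <upstream workflow file>
'sd:upstream':
  genome_indices: "genome-indices.cwl"

inputs:
  indices_folder:
    type: Directory
    label: "Indexed genome folder"
    # Assumed form: "<alias>/<output id of the upstream workflow>"
    'sd:upstreamSource': "genome_indices/bowtie_indices"

outputs: []
steps: []
```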


Visual plugins for outputs of type File

Usually, a workflow's output results (especially files) are provided as download links on the SciDAP platform.

With SciDAP's visualization plugins, output data can be presented as:

  • a plot
  • a genome (IGV) browser
  • a table
  • or, in the case of HTML outputs, a page opened in a new tab.

The keyword 'sd:visualPlugins' enables SciDAP visualization plugins.

The line, pie, chart, igvbrowser, syncfusiongrid, and linkList types are already available in the platform.

Example of visual plugins used on workflow outputs
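
A minimal sketch of a `line` plugin attached to a workflow output. The output name, the step reference, and the option keys under `line` (`Title`, `xAxisTitle`, `yAxisTitle`, `data`) are assumptions made for illustration; confirm them against a workflow in this repository that already uses the plugin.

```yaml
outputs:
  base_frequency_stats:                           # hypothetical output name
    type: File
    label: "Base frequency statistics"
    outputSource: quality_stats/statistics_file   # hypothetical step/output
    'sd:visualPlugins':
      - line:                                     # one of the available plugin types
          Title: "Base frequency plot"
          xAxisTitle: "Nucleotide position"
          yAxisTitle: "Frequency"
          data: [$13, $14, $15, $16]              # assumed: column references into the file
```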


Service Tags for workflows

The 'sd:serviceTag' keyword enables new workflows to be added for the creation of:

  • samples: uses the keyword 'sample'
  • analyses: uses the keyword 'analysis'
  • gene lists: uses the keyword 'genelist'

The service tag on a workflow will determine how samples are listed when viewing a project on the SciDAP platform.

Workflows without a service tag (or with one that is not recognized) will create samples in a tab called "not in use".

SQL for Input

inputs: 
#...
  sql_query:
    type: string
    label: "Filtering parameters"
    doc: "Filtering parameters (WHERE parameters for SQL query)"
    'sd:filtering':
      params:
        columns: ["Refseq_id", "Gene_id", "txStart", "txEnd", "Strand", "Region", "Chr", "Start", "End", "Conc", "Conc1", "Conc2", "Fold", "p-value", "FDR", "Called1", "Called2"]
        types:   ["string", "string", "number", "number", "string", "string", "string", "number", "number", "number", "number", "number", "number", "number", "number","number", "number"]
       

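An input declared this way will create a SQL query (the WHERE parameters) based on the values the user gives for any grouping and selection of the listed columns. The result is passed to the workflow as a string.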
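
For illustration, the string that reaches the workflow might look like the job-file entry below. The condition is a made-up example over two of the declared columns; the exact text SciDAP generates is not specified here.

```yaml
# Hypothetical job-file entry: the WHERE parameters arrive as a plain string
sql_query: "Fold > 2 AND FDR < 0.05"
```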