Bioinformatics Workflows by Datirium LLC

Repository folder structure

CWL for SciDAP

The workflows in this repository run analyses on next-generation sequencing (NGS) data: ChIP-Seq, ATAC-Seq, CLIP-Seq, and RNA-Seq.

Workflows are written in the Common Workflow Language (CWL).

These workflows can be run:

- on the Scientific Data Analysis Platform (SciDAP)
- standalone with cwltool

Workflows available on SciDAP are automatically updated and version controlled based on pull requests into this repository.

Repo structure

In order of importance:

├── workflows
├── tools
├── dockerfiles
│   └── scripts
└── tests
    └── data

The workflows directory holds the CWL files that make up individual workflows. Workflows use tools to run individual steps.
The tools directory holds the CWL files that handle general steps. Tools use Docker images to run analyses in containers, which keeps run conditions consistent.
The dockerfiles directory holds the files used to build the images that tools use.
- The scripts directory holds scripts that are included in the built Docker images. These scripts are usually written in R or bash.
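
The relationship between workflows, tools, and Docker images can be sketched in plain CWL as follows. This is a minimal illustration, not a file from this repository: the file names, the step and input names, and the use of `ubuntu:20.04` with `wc -l` as a stand-in for a real analysis image are all hypothetical.

```yaml
# workflows/example-workflow.cwl (hypothetical file name)
cwlVersion: v1.0
class: Workflow

inputs:
  fastq_file:
    type: File

outputs:
  report:
    type: File
    outputSource: count_lines/report

steps:
  count_lines:
    # A workflow step delegates the actual work to a tool definition.
    run: ../tools/example-tool.cwl
    in:
      input_file: fastq_file
    out: [report]
---
# tools/example-tool.cwl (hypothetical file name)
cwlVersion: v1.0
class: CommandLineTool

hints:
# The tool runs inside a container so that run conditions stay the same.
- class: DockerRequirement
  dockerPull: ubuntu:20.04   # stand-in image, not one built from dockerfiles/

baseCommand: [wc, -l]

inputs:
  input_file:
    type: File
    inputBinding:
      position: 1

outputs:
  report:
    type: stdout

stdout: line_count.txt
```

The two documents would live in separate files; they are shown together here only to make the `run:` reference from the workflow to the tool visible.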

The tests directory holds JSON job files for each workflow and tool. These job files are used to test individual workflows. Running the tests requires data, which the child folder data provides through a repository reference to the Barski-Labs workflow_tests repository.

Augmented CWL standard for SciDAP

Four additional tags can be included in different parts of a workflow for added compatibility with SciDAP:

Upstreams: Designate which workflows generate outputs that a given workflow can use as inputs.
Visual Plugins: Add visualizations of output data within the SciDAP platform.
Service Tags: Differentiate what kind of samples a workflow creates.
SQL query for input: Lets the user build an SQL query dynamically from selected options (saved as a string for a CWL input).

Metadata

The 'sd:metadata' field extends the user interface (dynamic form) with extra input fields that the workflow itself does not require. It defines a list of workflow templates whose inputs objects are used to construct the extra fields and store them alongside the original workflow.

Example of 'metadata' template for user interface
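
A minimal sketch of how this could look, based only on the description above. The template path `../metadata/example-header.cwl` and the input names `cells` and `conditions` are hypothetical, and the exact shape of a template (here, a workflow that declares nothing but inputs) is an assumption rather than a confirmed schema.

```yaml
# In the analysis workflow: list the metadata templates to attach.
'sd:metadata':
- "../metadata/example-header.cwl"   # hypothetical path
---
# ../metadata/example-header.cwl: a template that only contributes form fields
cwlVersion: v1.0
class: Workflow

inputs:
  cells:
    type: string
    label: "Cells"
  conditions:
    type: string
    label: "Conditions"

# The template performs no computation of its own.
outputs: []
steps: []
```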

Upstreams

To allow already analysed data to be selected as input for a workflow, we organize separate workflows into a graph. To link workflows we use 'sd:upstream', which defines a list of upstream workflows that this workflow can use for input data.

Example of a workflow with upstreams
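
The following fragment is a sketch under stated assumptions, not a verified schema. It assumes that 'sd:upstream' maps a short alias to an upstream workflow file, and that an input points at one output of that upstream workflow through a companion key, written here as 'sd:upstreamSource'. The file name `genome-indices.cwl`, the alias, and the output id `bowtie_indices` are hypothetical.

```yaml
cwlVersion: v1.0
class: Workflow

# Assumed shape: alias -> upstream workflow file.
'sd:upstream':
  genome_indices: "genome-indices.cwl"

inputs:
  indices_folder:
    type: Directory
    label: "Indexed genome folder"
    # Assumed key and format: "<alias>/<output id of the upstream workflow>"
    'sd:upstreamSource': "genome_indices/bowtie_indices"

# ... outputs and steps omitted
```

Under these assumptions, the platform could offer the results of earlier `genome-indices.cwl` runs as choices for `indices_folder` instead of asking for a fresh upload.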

VisualPlugins for an output type file

Usually, workflow output results (especially files) are provided as download links on the SciDAP platform.

With SciDAP's visualization plugins, output data can instead be presented as:

a plot
a genome (IGV) browser
a table
a page opened in a new tab (for HTML outputs)

The keyword 'sd:visualPlugins' enables SciDAP visualization plugins.

The line, pie, chart, igvbrowser, syncfusiongrid, and linkList types are already available in the platform.

Example of visual plugins used on workflow outputs
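
A sketch of a workflow output that carries a `line` plugin. Only the 'sd:visualPlugins' keyword and the plugin type names come from the text above. The option names under `line` (`Title`, `xAxisTitle`, `yAxisTitle`, `data`), the `$N` column references, and the step and output names are assumptions for illustration, so check an existing workflow in the repository for the exact schema.

```yaml
outputs:
  base_frequency_stats:            # hypothetical output name
    type: File
    label: "Base frequency statistics"
    outputSource: quality_stats/statistics_file   # hypothetical step/output
    'sd:visualPlugins':
    - line:
        # Assumed options: plot titles plus the table columns to draw.
        Title: "Base frequency plot"
        xAxisTitle: "Nucleotide position"
        yAxisTitle: "Frequency"
        data: [$12, $13, $14, $15, $16]
```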

Service Tags for workflows

The 'sd:serviceTag' keyword declares what a new workflow creates:

samples: uses keyword 'sample'
analyses: uses keyword 'analysis'
gene lists: uses keyword 'genelist'

The service tag on a workflow determines how samples are listed when viewing a project on the SciDAP platform.

Workflows without a service tag (or with an unrecognized one) create samples in a tab called "not in use".
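
As a minimal sketch, a workflow whose results should be listed as samples might carry the tag like this. Placing the keyword at the top level of the workflow is an assumption, and the empty inputs, outputs, and steps are placeholders.

```yaml
cwlVersion: v1.0
class: Workflow

# One of 'sample', 'analysis', or 'genelist'.
'sd:serviceTag': "sample"

inputs: {}
outputs: []
steps: []
```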

SQL for Input

inputs: 
#...
  sql_query:
    type: string
    label: "Filtering parameters"
    doc: "Filtering parameters (WHERE parameters for SQL query)"
    'sd:filtering':
      params:
        columns: ["Refseq_id", "Gene_id", "txStart", "txEnd", "Strand", "Region", "Chr", "Start", "End", "Conc", "Conc1", "Conc2", "Fold", "p-value", "FDR", "Called1", "Called2"]
        types:   ["string", "string", "number", "number", "string", "string", "string", "number", "number", "number", "number", "number", "number", "number", "number", "number", "number"]

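The 'sd:filtering' definition above lets the platform build an SQL query (the WHERE parameters, per the doc string) from the values the user gives for any grouping and selection of the listed columns. The result is passed to the workflow as the string input sql_query.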
