Fix old pages from 2018 / 2019 events
ewels committed Mar 10, 2024
1 parent b5ec9a8 commit 597d9ce
Showing 35 changed files with 493 additions and 451 deletions.
20 changes: 11 additions & 9 deletions src/pages/nfcamp/2019/alex.md
@@ -1,21 +1,23 @@
title=Migrating legacy workflows to Nextflow
date=2019-05-28
type=col8
tags=nextflow,nfcamp,2019,workshop
status=published
~~~~~~
---
title: Migrating legacy workflows to Nextflow
date: 2019-05-28
type: col8
tags: nextflow,nfcamp,2019,workshop
status: published
layout: "@layouts/Page.astro"
---

## Migrating legacy workflows to Nextflow

### Alexander Peltzer
*Bioinformatics Research Scientist, Quantitative Biology Center Tübingen, Germany*
*Bioinformatics Research Scientist, Quantitative Biology Center Tübingen, Germany*

Many researchers and institutions face similar issues: Having in-house legacy workflows written in bash or other formats and now facing issues with reproducibility, maintenance of these pipelines and the increasing computational demands poses severe threats to being able to address modern computational questions using such legacy workflows. In this talk, I intend to highlight the efforts we took at the QBIC to maintain compatibility between existing workflows but simultaneously porting all of our existing legacy pipelines to the Nextflow and nf-core frameworks to be able to answer these threats. With a specific application case on one of the most widely used ancient DNA pipelines, I intend to highlight the benefits of these migrated pipeline in comparison to the previously existing workflow in a 1:1 setting.

### Bio

Alexander Peltzer is a bioinformatics research scientist at the Quantitative Biology Center in Tübingen and is working in maintaining and developing modern solutions for data management and analysis for various omics technologies there. Before, he obtained a PhD in bioinformatics at the University of Tübingen and the Max Planck Institute for the Science of Human History where he worked on computational methods for ancient DNA reconstruction.

### Registration
### Registration

To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
30 changes: 16 additions & 14 deletions src/pages/nfcamp/2019/alper.md
@@ -1,33 +1,35 @@
title=DolphinNext: A graphical user interface for distributed data processing of high throughput genomics
date=2019-05-28
type=col8
tags=nextflow,nfcamp,2019,workshop
status=published
~~~~~~
---
title: DolphinNext: A graphical user interface for distributed data processing of high throughput genomics
date: 2019-05-28
type: col8
tags: nextflow,nfcamp,2019,workshop
status: published
layout: "@layouts/Page.astro"
---

## DolphinNext: A graphical user interface for distributed data processing of high throughput genomics

### Alper Kucukural
*Assistant Professor, UMass Medical School, USA*
*Assistant Professor, UMass Medical School, USA*

Emergence of new biomedical technologies, like next-generation sequencing (NGS) which is producing vast amounts of genomic data every day, is driving a big data revolution in biology. The dramatic increase in the volume, as well as the production rate of genomic data, has now made the data analysis new bottleneck for scientific discovery. Naturally, the need for highly-parallel data processing frameworks is greater than ever. It is also important for these frameworks to have certain design characteristics such as flexibility, portability, and reproducibility. Processing of sequencing data usually involves many different programs, each of which performs a specific step in the overall pipeline. Flexibility ensures that the pipelines can support a variety of use cases or data types without the need to modify existing pipelines or create new ones. Portability gives user the freedom to choose computational resources as he/she deems fit. Reproducibility across computing environments, which warrants credibility of the results, is a particularly important feature in the face of the sheer volume of data and complexity of the pipelines.

There exist several platforms that offer graphical user interfaces for designing and execution of complex pipelines (e.g. Galaxy, GenePattern, GeneProf). Unfortunately, none of these platforms supports parallelism or portability across computing environments. To address these and additional shortcomings discussed in this paper, we have created DolphinNext, an easy-to-use graphical user interface for creating and deploying complex workflows for parallel processing of high throughput genomic data. DolphinNext relies on Nextflow which is a framework enabling scalable and reproducible workflows using software containers. The central idea behind the creation of DolphinNext is to facilitate building and deployment of complex pipelines using a graphically-enabled modular approach. DolphinNext provides:

1. A drag and drop user interface that abstracts Nextflow pipelines and allows users to create pipelines without familiarity with Nextflow.
2. Reproducible pipelines by providing version tracking, and by creating a stand-alone version of any pipeline instance to run independently or to share in the publications.
3. Seamless portability to distributed computational environments such as high performance clusters or cloud based solutions.
1. A drag and drop user interface that abstracts Nextflow pipelines and allows users to create pipelines without familiarity with Nextflow.
2. Reproducible pipelines by providing version tracking, and by creating a stand-alone version of any pipeline instance to run independently or to share in the publications.
3. Seamless portability to distributed computational environments such as high performance clusters or cloud based solutions.
4. A graphical user interface to monitor pipeline execution that allows restarting of intermediate processes even after parameter changes.

### Bio

Dr. Kucukural designs and implements reusable, robust and production grade bioinformatics analysis pipelines and pipeline generation tools for processing next-generation sequencing data.
Dr. Kucukural designs and implements reusable, robust and production grade bioinformatics analysis pipelines and pipeline generation tools for processing next-generation sequencing data.
He mainly works on NGS data analysis; RNA-Seq, RIP-Seq, Chip-Seq, CLIP-Seq and derivatives. He implemented algorithms to reduce noise by calling the peaks caused by experimental and alignment biases especially for RIP and CLIP-Seq data.

Dr. Kucukural worked on analyzing deep sequencing data to discover key elements of splicing of pre-mRNAs to have better understanding of post-transcriptional regulations of RNAs. Moreover, he has deep knowledge of finding genome wide mRNA targets of RNA binding proteins (RBPs). Analyzing RNA targets of tdp43 RBP with deep sequencing on Rat and human was another focus of his research to understand the mechanisms of neuro-degenerative diseases such as Alzheimer and ALS.
Dr. Kucukural worked on analyzing deep sequencing data to discover key elements of splicing of pre-mRNAs to have better understanding of post-transcriptional regulations of RNAs. Moreover, he has deep knowledge of finding genome wide mRNA targets of RNA binding proteins (RBPs). Analyzing RNA targets of tdp43 RBP with deep sequencing on Rat and human was another focus of his research to understand the mechanisms of neuro-degenerative diseases such as Alzheimer and ALS.

He also worked for protein structure characterization and prediction. He applied techniques from graph theory on protein structure analysis and implemented the theories from computer sciences to biology to find solutions in drug design and small molecular docking fields. He developed applications using genetic algorithms to discover biomarkers and implemented feature detection methods using various clustering, classification and machine learning algorithms such as hidden markov models and support vector machines.

### Registration
### Registration

To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
30 changes: 16 additions & 14 deletions src/pages/nfcamp/2019/anna.md
@@ -1,9 +1,11 @@
title=FA-nf - A Bioinformatics pipeline for functional annotation implemented in Nextflow
date=2019-05-28
type=col8
tags=nextflow,nfcamp,2019,workshop
status=published
~~~~~~
---
title: FA-nf - A Bioinformatics pipeline for functional annotation implemented in Nextflow
date: 2019-05-28
type: col8
tags: nextflow,nfcamp,2019,workshop
status: published
layout: "@layouts/Page.astro"
---

## FA-nf - A Bioinformatics pipeline for functional annotation implemented in Nextflow

@@ -13,21 +15,21 @@ status=published
### Toni Hermoso
*Bioinformatician, Centre for Genomics Regulation, Spain*

With the advantages of NGS technologies it became possible to obtain a whole genome sequence and its genome assembly of any novel organism at a relatively low cost and short time. To be able to work with this novel genome assembly, scientists need to know positions of the genic elements, especially protein-coding genes, and their putative function. Therefore, function annotation (FA) is an important step in de-novo genome processing and can provide important information about putative role of concrete gene. Such annotation usually includes assigning functional domains, i.e. from Pfam or Panther, ontology terms (GO), and specific elements, i.e. cleavage sites.
With the advantages of NGS technologies it became possible to obtain a whole genome sequence and its genome assembly of any novel organism at a relatively low cost and short time. To be able to work with this novel genome assembly, scientists need to know positions of the genic elements, especially protein-coding genes, and their putative function. Therefore, function annotation (FA) is an important step in de-novo genome processing and can provide important information about putative role of concrete gene. Such annotation usually includes assigning functional domains, i.e. from Pfam or Panther, ontology terms (GO), and specific elements, i.e. cleavage sites.

There are two main outcomes from functional annotation: the first one is an annotation itself, which allows scientists to perform various analysis to understand better genome function. The second one, is an additional quality check for genome assembly and predicted genes. Therefore, it is possible to identify suspicious genes which may belong to another species due to contamination, or non-functional overpredicted genes, erroneously annotated.

Here we will present a pipeline for a functional annotation of novel proteins from non-model organisms implemented in Nextflow. The pipeline allows to put together different widely-used tools in the field of functional annotation, including some Java applications and REST API services scripts. This software diversity and complexity is handled thanks to software containers (Docker or Singularity), allowing an easier maintenance and versioning of bundled programs. Data exchange and resulting reports are stored in a database, which can be either sitting on the very filesystem as a single database file using SQLite, or through a preset MySQL DBMS server. For that latter case, we also managed to set up a HPC-compatible MySQL on-demand approach that enabled parallel and subsequent processes of the pipeline to query a single MySQL server instance from different cluster nodes.

The provided output is compatible with commonly-used standard resources for downstream analysis, such as UCSC genome browser or Bioconductor packages.
The provided output is compatible with commonly-used standard resources for downstream analysis, such as UCSC genome browser or Bioconductor packages.
This pipeline was used in many genomic projects we collaborated with, among others, those of melon, common bean, wasp, and Iberian lynx.

### Bio
### Bio

**Anna Vlasova**, Bioinformatician at Institute of Molecular Pathology, Vienna, Austria.
**Anna Vlasova**, Bioinformatician at Institute of Molecular Pathology, Vienna, Austria.

**Toni Hermoso Pulido**, Bioinformatics Core Facility, Centre for Genomic Regulation, Barcelona. Degree in Biochemistry and PhD in Biotechnology at Autonomous University of Barcelona. Since 2009 he joined CRG as a member of the just established Bioinformatics Core Facility where he has been supporting scientific web services and databases, research training and data analyses at the centre.

### Registration
### Registration

To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
30 changes: 16 additions & 14 deletions src/pages/nfcamp/2019/anthony.md
@@ -1,32 +1,34 @@
title=Pay As You Go Cloud Bioinformatics for Pathogens
date=2019-05-28
type=col8
tags=nextflow,nfcamp,2019,workshop
status=published
~~~~~~
---
title: Pay As You Go Cloud Bioinformatics for Pathogens
date: 2019-05-28
type: col8
tags: nextflow,nfcamp,2019,workshop
status: published
layout: "@layouts/Page.astro"
---

## Pay As You Go Cloud Bioinformatics for Pathogens

### Anthony Underwood
*Bioinformatics Implementation Manager, Centre for Genomic Pathogen Surveillance, Wellcome Trust Sanger Institute, UK*
*Bioinformatics Implementation Manager, Centre for Genomic Pathogen Surveillance, Wellcome Trust Sanger Institute, UK*

### Ben Taylor
*Senior Software Developer, Centre for Genomic Pathogen Surveillance, Wellcome Trust Sanger Institute, UK*
*Senior Software Developer, Centre for Genomic Pathogen Surveillance, Wellcome Trust Sanger Institute, UK*

Nextflow provides a mechanism to develop high throughput parallelizable pipelines on a small desktop machine and then scale out to larger infrastructures such as HPC clusters. If you do not have a cluster Nextflow allows the same pipeline to run in the cloud such as on AWS Batch where hundreds or thousands of jobs can run in parallel. However:
Nextflow provides a mechanism to develop high throughput parallelizable pipelines on a small desktop machine and then scale out to larger infrastructures such as HPC clusters. If you do not have a cluster Nextflow allows the same pipeline to run in the cloud such as on AWS Batch where hundreds or thousands of jobs can run in parallel. However:

1. it is not trivial to set up
2. users need to be comfortable on the command line
3. and there’s a risk you could rack up large bills

We have developed a CloudFormation template and web application to make scaling up Nextflow-based pipelines via deployment in AWS far simpler. Using this approach:

1. a new pipeline can be deployed with just a few clicks
2. end users can start and monitor the pipeline using a web page
3. limits can be enforced to prevent unexpected bills.
3. limits can be enforced to prevent unexpected bills.

By using AWS Lambda to provide the services that bind everything together, you only pay to store your data and for the EC2 compute used by batch when the pipeline is actually running.

We will talk about why and how we did this and ask for feedback on how our approach could be improved.

### Bio
@@ -35,6 +37,6 @@ We will talk about why and how we did this and ask for feedback on how our appro

**Ben Taylor** is a senior software developer in the Centre for Genomic Pathogen Surveillance developing software that optimises common bioinformatics processes and delivers them through user-friendly interfaces. He’s previously worked for the UK Government and private companies to make it easier to use Cloud Infrastructure.

### Registration
### Registration

To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
20 changes: 11 additions & 9 deletions src/pages/nfcamp/2019/anton.md
@@ -1,9 +1,11 @@
title=NF-web: a web interface for Nextflow
date=2019-05-28
type=col8
tags=nextflow,nfcamp,2019,workshop
status=published
~~~~~~
---
title: NF-web: a web interface for Nextflow
date: 2019-05-28
type: col8
tags: nextflow,nfcamp,2019,workshop
status: published
layout: "@layouts/Page.astro"
---

## NF-web: a web interface for Nextflow

@@ -14,8 +16,8 @@ We are presenting an open-source web application for running Nextflow pipelines

### Bio

Senior Software Developer in Cellular Genetics Informatics support group at Sanger Institute. Developing internal services (NF-web) and paper supplement websites, working with computational infrastructure (Openstack, Kubernetes), maintaining the department JupyterHub deployment. Contributed to the development of multiple workflow engines (primarily CWL - cwltool, REANA, Rabix). Bachelor of System Analysis and Master of Computer Science, Kyiv Polytechnic Institute.
Senior Software Developer in Cellular Genetics Informatics support group at Sanger Institute. Developing internal services (NF-web) and paper supplement websites, working with computational infrastructure (Openstack, Kubernetes), maintaining the department JupyterHub deployment. Contributed to the development of multiple workflow engines (primarily CWL - cwltool, REANA, Rabix). Bachelor of System Analysis and Master of Computer Science, Kyiv Polytechnic Institute.

### Registration
### Registration

To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).
24 changes: 13 additions & 11 deletions src/pages/nfcamp/2019/evan.md
@@ -1,25 +1,27 @@
title=Scientific workflows beyond the Ivory Tower
date=2019-05-28
type=col8
tags=nextflow,nfcamp,2019,workshop
status=published
~~~~~~
---
title: Scientific workflows beyond the Ivory Tower
date: 2019-05-28
type: col8
tags: nextflow,nfcamp,2019,workshop
status: published
layout: "@layouts/Page.astro"
---

## Scientific workflows beyond the Ivory Tower

### Evan Floden
*Seqera Labs & Centre for Genomic Regulation (CRG), Spain*
*Seqera Labs & Centre for Genomic Regulation (CRG), Spain*

Evan will discuss the progress of Seqera Labs over the past 12 months
as well unveil some exciting new developments for the Nextflow project.

### Bio

Evan is a workflow developer helping scientists and engineers to find solutions to their problems using Nextflow.
Evan is a workflow developer helping scientists and engineers to find solutions to their problems using Nextflow.
His background is in Biotechnology and Bioinformatics and he worked on the Nextflow project
during his PhD from 2014 before co-founding Seqera Labs. Evan's interests include genomics, scientific workflow
optimization, home automation and bicycle touring.
during his PhD from 2014 before co-founding Seqera Labs. Evan's interests include genomics, scientific workflow
optimization, home automation and bicycle touring.

### Registration
### Registration

To attend Nextflow Camp 2019 register at [this link](https://www.crg.eu/en/event/coursescrg-nextflow-2019).