Fixed an issue where TSEBRA failed because LIFTOFF lifted non-protein coding genes
GallVp committed Dec 5, 2024
1 parent 17673f7 commit fb9a0f4
Showing 14 changed files with 256 additions and 50 deletions.
16 changes: 8 additions & 8 deletions .github/workflows/branch.yml
@@ -1,15 +1,15 @@
name: nf-core branch protection
# This workflow is triggered on PRs to master branch on the repository
# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
# This workflow is triggered on PRs to main branch on the repository
# It fails when someone tries to make a PR against the Plant-Food-Research-Open `main` branch instead of `dev`
on:
pull_request_target:
branches: [master]
branches: [main]

jobs:
test:
runs-on: ubuntu-latest
steps:
# PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
# PRs to the nf-core repo main branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
- name: Check PRs
if: github.repository == 'Plant-Food-Research-Open/genepal'
run: |
@@ -22,7 +22,7 @@ jobs:
uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2
with:
message: |
## This PR is against the `master` branch :x:
## This PR is against the `main` branch :x:
* Do not close this PR
* Click _Edit_ and change the `base` to `dev`
@@ -32,9 +32,9 @@ jobs:
Hi @${{ github.event.pull_request.user.login }},
It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
The `master` branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `main` branch.
The `main` branch should always contain code from the latest release.
Because of this, PRs to `main` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.
6 changes: 3 additions & 3 deletions .github/workflows/download_pipeline.yml
@@ -2,7 +2,7 @@ name: Test successful pipeline download with 'nf-core pipelines download'

# Run the workflow when:
# - dispatched manually
# - when a PR is opened or reopened to master branch
# - when a PR is opened or reopened to main branch
# - the head branch of the pull request is updated, i.e. if fixes for a release are pushed last minute to dev.
on:
workflow_dispatch:
@@ -17,10 +17,10 @@ on:
- edited
- synchronize
branches:
- master
- main
pull_request_target:
branches:
- master
- main

env:
NXF_ANSI_LOG: false
14 changes: 13 additions & 1 deletion CHANGELOG.md
@@ -3,12 +3,24 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v0.6.0 - [4-Dec-2024]
## v0.6.0 - [6-Dec-2024]

### `Added`

1. Added cDNA and CDS outputs to <OUTPUT_DIR>/annotations/<SAMPLE> directory [#118](https://github.com/Plant-Food-Research-Open/genepal/issues/118)

### `Fixed`

1. Fixed an issue where TSEBRA failed because LIFTOFF lifted non-protein coding genes [#121](https://github.com/Plant-Food-Research-Open/genepal/issues/121)
2. Switched branch name from `master` to `main` in the GHA CIs

### `Dependencies`

1. Nextflow!>=24.04.2
2. nf-schema@2.1.1

### `Deprecated`

## v0.5.0 - [21-Nov-2024]

### `Added`
2 changes: 1 addition & 1 deletion README.md
@@ -35,7 +35,7 @@
- Merge multi-reference liftoffs
- Remove liftoff transcripts marked by _valid_ORF=False_
- Remove liftoff genes with any intron shorter than 10 bp
- Remove rRNA and tRNA from liftoff
- Remove rRNA, tRNA and other non-protein coding models from liftoff
- Optionally, allow or remove iso-forms
- Remove BRAKER models from Liftoff loci
- Merge Liftoff and BRAKER models
4 changes: 2 additions & 2 deletions conf/modules.config
@@ -199,7 +199,7 @@ process { // SUBWORKFLOW: FASTA_LIFTOFF
}

withName: '.*:FASTA_LIFTOFF:GFFREAD_BEFORE_LIFTOFF' {
ext.args = '--no-pseudo --keep-genes'
ext.args = '--no-pseudo --keep-genes -C'
}

withName: '.*:FASTA_LIFTOFF:MERGE_LIFTOFF_ANNOTATIONS' {
@@ -212,7 +212,7 @@ process { // SUBWORKFLOW: FASTA_LIFTOFF

withName: '.*:FASTA_LIFTOFF:GFFREAD_AFTER_LIFTOFF' {
ext.prefix = { "${meta.id}.liftoff" }
ext.args = '--keep-genes'
ext.args = '--no-pseudo --keep-genes -C'
}

withName: '.*:FASTA_LIFTOFF:GFF_TSEBRA_SPFILTERFEATUREFROMKILLLIST:AGAT_CONVERTSPGFF2GTF' {
@@ -1,30 +1,30 @@
include { GUNZIP as GUNZIP_FASTA } from '../../modules/nf-core/gunzip/main'
include { GUNZIP as GUNZIP_GFF } from '../../modules/nf-core/gunzip/main'
include { GFFREAD as GFFREAD_BEFORE_LIFTOFF } from '../../modules/nf-core/gffread/main'
include { LIFTOFF } from '../../modules/nf-core/liftoff/main'
include { AGAT_SPMERGEANNOTATIONS as MERGE_LIFTOFF_ANNOTATIONS } from '../../modules/nf-core/agat/spmergeannotations/main'
include { AGAT_SPFLAGSHORTINTRONS } from '../../modules/gallvp/agat/spflagshortintrons/main'
include { AGAT_SPFILTERFEATUREFROMKILLLIST } from '../../modules/nf-core/agat/spfilterfeaturefromkilllist/main'
include { GFFREAD as GFFREAD_AFTER_LIFTOFF } from '../../modules/nf-core/gffread/main'
include { GFF_TSEBRA_SPFILTERFEATUREFROMKILLLIST } from '../../subworkflows/local/gff_tsebra_spfilterfeaturefromkilllist'
include { GUNZIP as GUNZIP_FASTA } from '../../../modules/nf-core/gunzip/main'
include { GUNZIP as GUNZIP_GFF } from '../../../modules/nf-core/gunzip/main'
include { GFFREAD as GFFREAD_BEFORE_LIFTOFF } from '../../../modules/nf-core/gffread/main'
include { LIFTOFF } from '../../../modules/nf-core/liftoff/main'
include { AGAT_SPMERGEANNOTATIONS as MERGE_LIFTOFF_ANNOTATIONS } from '../../../modules/nf-core/agat/spmergeannotations/main'
include { AGAT_SPFLAGSHORTINTRONS } from '../../../modules/gallvp/agat/spflagshortintrons/main'
include { AGAT_SPFILTERFEATUREFROMKILLLIST } from '../../../modules/nf-core/agat/spfilterfeaturefromkilllist/main'
include { GFFREAD as GFFREAD_AFTER_LIFTOFF } from '../../../modules/nf-core/gffread/main'
include { GFF_TSEBRA_SPFILTERFEATUREFROMKILLLIST } from '../../../subworkflows/local/gff_tsebra_spfilterfeaturefromkilllist'

workflow FASTA_LIFTOFF {
take:
target_assemby // Channel: [ meta, fasta ]
xref_fasta // Channel: [ meta2, fasta ]
xref_gff // Channel: [ meta2, gff3 ]
target_assembly // Channel: [ meta, fasta ]
xref_fasta // Channel: [ meta2, fasta(.gz)? ]
xref_gff // Channel: [ meta2, gff3(.gz)? ]
val_filter_liftoff_by_hints // val(true|false)
braker_hints // [ meta, gff ]
tsebra_config // Channel: [ cfg ]
allow_isoforms // val(true|false)
val_allow_isoforms // val(true|false)


main:
ch_versions = Channel.empty()

// MODULE: GUNZIP as GUNZIP_FASTA
ch_xref_fasta_branch = xref_fasta
| branch { meta, file ->
| branch { _meta, file ->
gz: "$file".endsWith(".gz")
rest: !"$file".endsWith(".gz")
}
@@ -40,7 +40,7 @@ workflow FASTA_LIFTOFF {

// MODULE: GUNZIP as GUNZIP_GFF
ch_xref_gff_branch = xref_gff
| branch { meta, file ->
| branch { _meta, file ->
gz: "$file".endsWith(".gz")
rest: !"$file".endsWith(".gz")
}
@@ -61,7 +61,7 @@ workflow FASTA_LIFTOFF {
ch_versions = ch_versions.mix(GFFREAD_BEFORE_LIFTOFF.out.versions.first())

// MODULE: LIFTOFF
ch_liftoff_inputs = target_assemby
ch_liftoff_inputs = target_assembly
| combine(
ch_xref_gunzip_fasta
| join(
@@ -72,7 +72,7 @@ workflow FASTA_LIFTOFF {
[
[
id: "${meta.id}.from.${ref_meta.id}",
target_assemby: meta.id
target_assembly: meta.id
],
target_fa,
ref_fa,
@@ -81,21 +81,21 @@ workflow FASTA_LIFTOFF {
}

LIFTOFF(
ch_liftoff_inputs.map { meta, target_fa, ref_fa, ref_gff -> [ meta, target_fa ] },
ch_liftoff_inputs.map { meta, target_fa, ref_fa, ref_gff -> ref_fa },
ch_liftoff_inputs.map { meta, target_fa, ref_fa, ref_gff -> ref_gff },
ch_liftoff_inputs.map { meta, target_fa, _ref_fa, _ref_gff -> [ meta, target_fa ] },
ch_liftoff_inputs.map { _meta, _target_fa, ref_fa, _ref_gff -> ref_fa },
ch_liftoff_inputs.map { _meta, _target_fa, _ref_fa, ref_gff -> ref_gff },
[]
)

ch_liftoff_gff3 = LIFTOFF.out.polished_gff3
| map { meta, gff -> [ [ id: meta.target_assemby ], gff ] }
| map { meta, gff -> [ [ id: meta.target_assembly ], gff ] }
| groupTuple

ch_versions = ch_versions.mix(LIFTOFF.out.versions.first())

// MODULE: AGAT_SPMERGEANNOTATIONS as MERGE_LIFTOFF_ANNOTATIONS
ch_merge_inputs = ch_liftoff_gff3
| branch { meta, list_polished ->
| branch { _meta, list_polished ->
one: list_polished.size() == 1
many: list_polished.size() > 1
}
@@ -119,23 +119,29 @@ workflow FASTA_LIFTOFF {
ch_flagged_gff = AGAT_SPFLAGSHORTINTRONS.out.gff
ch_versions = ch_versions.mix(AGAT_SPFLAGSHORTINTRONS.out.versions.first())

// COLLECTFILE: Kill list for valid_ORF=False transcripts
// tRNA, rRNA
// gene with any intron marked as 'pseudo=' by AGAT/SPFLAGSHORTINTRONS
// collectFile: Kill list for valid_ORF=False transcripts
// tRNA, rRNA, gene with any intron marked as
// 'pseudo=' by AGAT/SPFLAGSHORTINTRONS
ch_kill_list = ch_flagged_gff
| map { meta, gff ->

def tx_from_gff = gff.readLines()
.findAll { it ->
// Can't add to kill list
if ( it.startsWith('#') ) { return false }

def cols = it.split('\t')
def feat = cols[2]

if ( feat in [ 'tRNA', 'rRNA' ] ) { return true }
if ( feat !in [ 'transcript', 'mRNA', 'gene' ] ) { return false }
// Add to kill list anything other than standard features
if ( feat !in [ 'gene', 'transcript', 'mRNA', 'exon', 'CDS', 'five_prime_UTR', 'three_prime_UTR' ] ) { return true }

// Ignore [ 'exon', 'CDS', 'five_prime_UTR', 'three_prime_UTR' ]
if ( feat !in [ 'gene', 'transcript', 'mRNA' ] ) { return false }

def attrs = cols[8]

// Add [ 'gene', 'transcript', 'mRNA' ] with 'valid_ORF=False' or 'pseudo=' attributes to kill list
( attrs.contains('valid_ORF=False') || attrs.contains('pseudo=') )
}
.collect {
@@ -160,8 +166,8 @@ workflow FASTA_LIFTOFF {


AGAT_SPFILTERFEATUREFROMKILLLIST(
ch_agat_kill_inputs.map { meta, gff, kill -> [ meta, gff ] },
ch_agat_kill_inputs.map { meta, gff, kill -> kill },
ch_agat_kill_inputs.map { meta, gff, _kill -> [ meta, gff ] },
ch_agat_kill_inputs.map { _meta, _gff, kill -> kill },
[] // default config
)

@@ -179,7 +185,7 @@ workflow FASTA_LIFTOFF {
val_filter_liftoff_by_hints ? ch_attr_trimmed_gff : Channel.empty(),
braker_hints,
tsebra_config,
allow_isoforms,
val_allow_isoforms,
'liftoff'
)

105 changes: 105 additions & 0 deletions subworkflows/local/fasta_liftoff/tests/main.nf.test
@@ -0,0 +1,105 @@
nextflow_workflow {

name "Test Subworkflow FASTA_LIFTOFF"
script "../main.nf"
workflow "FASTA_LIFTOFF"
config './nextflow.config'

tag "subworkflows"
tag "subworkflows_gallvp"
tag "subworkflows/fasta_liftoff"
tag "subworkflows/gff_tsebra_spfilterfeaturefromkilllist"

tag "gunzip"
tag "gffread"
tag "liftoff"
tag "agat"
tag "agat/spmergeannotations"
tag "agat/spflagshortintrons"
tag "agat/spfilterfeaturefromkilllist"

setup {
run('GUNZIP', alias: 'GUNZIP_GENOME_FASTA') {
script "../../../../modules/nf-core/gunzip"

process {
"""
input[0] = [
[ id:'test' ],
file(params.modules_testdata_base_path + 'genomics/eukaryotes/actinidia_chinensis/genome/chr1/genome.fasta.gz', checkIfExists: true)
]
"""
}
}

run('GUNZIP', alias: 'GUNZIP_BRAKER_HINTS') {
script "../../../../modules/nf-core/gunzip"

process {
"""
input[0] = [
[ id:'test' ],
file(params.modules_testdata_base_path + 'genomics/eukaryotes/actinidia_chinensis/genome/chr1/genome.hints.gff.gz', checkIfExists: true)
]
"""
}
}
}


test("liftoff - GCF_019202715 - to - actinidia_chinensis") {

when {
workflow {
"""
input[0] = GUNZIP_GENOME_FASTA.out.gunzip
input[1] = Channel.of([
[ id:'ref' ],
file ( "${baseDir}/subworkflows/local/fasta_liftoff/tests/testdata/GCF_019202715.1.fna.gz", checkIfExists: true )
])
input[2] = Channel.of([
[ id:'ref' ],
file ( "${baseDir}/subworkflows/local/fasta_liftoff/tests/testdata/GCF_019202715.1.gff.gz", checkIfExists: true )
])
input[3] = true // val_filter_liftoff_by_hints
input[4] = GUNZIP_BRAKER_HINTS.out.gunzip
input[5] = Channel.of ( file("${baseDir}/assets/tsebra-template.cfg", checkIfExists: true) )
| map { cfg ->
def enforce_full_intron_support = true
def param_intron_support = enforce_full_intron_support ? '1.0' : '0.0'
def param_e1 = params.allow_isoforms ? '0.1' : '0.0'
def param_e2 = params.allow_isoforms ? '0.5' : '0.0'
def param_e3 = params.allow_isoforms ? '0.05' : '0.0'
def param_e4 = params.allow_isoforms ? '0.2' : '0.0'
[
'tsebra-config.cfg',
cfg
.text
.replace('PARAM_INTRON_SUPPORT', param_intron_support)
.replace('PARAM_E1', param_e1)
.replace('PARAM_E2', param_e2)
.replace('PARAM_E3', param_e3)
.replace('PARAM_E4', param_e4)
]
}
| collectFile
input[6] = false // val_allow_isoforms
"""
}
}

then {
assertAll(
{ assert workflow.success},
{ assert snapshot(workflow.out).match()}
)
}
}
}
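
Note on `input[5]` in the test above: the TSEBRA config is produced by plain string substitution of the `PARAM_*` placeholders in `assets/tsebra-template.cfg`. The Python sketch below shows the same substitution under stated assumptions. The function name `render_tsebra_config` and the template text are hypothetical; the real template's contents are not shown in this commit.

```python
def render_tsebra_config(
    template_text: str,
    allow_isoforms: bool,
    enforce_full_intron_support: bool = True,
) -> str:
    """Fill the PARAM_* placeholders with the values used in the test."""
    values = {
        "PARAM_INTRON_SUPPORT": "1.0" if enforce_full_intron_support else "0.0",
        "PARAM_E1": "0.1" if allow_isoforms else "0.0",
        "PARAM_E2": "0.5" if allow_isoforms else "0.0",
        "PARAM_E3": "0.05" if allow_isoforms else "0.0",
        "PARAM_E4": "0.2" if allow_isoforms else "0.0",
    }
    rendered = template_text
    for placeholder, value in values.items():
        rendered = rendered.replace(placeholder, value)
    return rendered


# Hypothetical template; the key names are illustrative only.
EXAMPLE_TEMPLATE = (
    "intron_support PARAM_INTRON_SUPPORT\n"
    "e_1 PARAM_E1\n"
    "e_2 PARAM_E2\n"
    "e_3 PARAM_E3\n"
    "e_4 PARAM_E4\n"
)

if __name__ == "__main__":
    print(render_tsebra_config(EXAMPLE_TEMPLATE, allow_isoforms=False))
```

With isoforms disallowed, every `PARAM_E*` placeholder becomes `0.0`. In the test itself these values are read from `params.allow_isoforms`, while the subworkflow separately receives `false` as `val_allow_isoforms`.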