The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Added cDNA and CDS outputs to <OUTPUT_DIR>/annotations/ directory #118
- Added parameter `add_attrs_to_proteins_cds_fastas`
- Added parameter `filter_genes_by_aa_length` with default set to `24` which allows removal of genes with ORFs shorter than 24 #125
- Fixed an issue where TSEBRA failed because LIFTOFF lifted non-protein coding genes #121
- Switched branch name from `master` to `main` in the GHA CIs
- Fixed an issue in `genepal_report.Rmd` which caused the pangene matrix plot to fail when the number of clusters exceeded 65536 #124
- Fixed an issue where `GENEPALREPORT` process failed due to OOM kill signal from SLURM #123
- Fixed an issue where Gff merge after liftoff failed when one of the Gff files did not contain any genes
- Fixed an issue where `gxf_fasta_agat_spaddintrons_spextractsequences` crashed due to short introns #89
- Nextflow!>=24.04.2
- nf-schema@2.1.1
- Removed parameter `add_attrs_to_proteins_fasta`
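The `filter_genes_by_aa_length` entry above describes a length-based filter. A minimal sketch of the idea follows, under assumptions that are not stated in this changelog: the function name `filter_by_aa_length` is hypothetical, the threshold is treated as inclusive, a trailing `*` stop symbol is not counted, and proteins are keyed by gene ID. The pipeline's own implementation may differ.

```python
def filter_by_aa_length(proteins, min_aa_length=24):
    """Keep entries whose protein has at least `min_aa_length` residues.

    Assumption: a trailing '*' (stop symbol) does not count as a residue,
    and the threshold is inclusive.
    """
    kept = {}
    for gene_id, sequence in proteins.items():
        residues = sequence.rstrip("*")
        if len(residues) >= min_aa_length:
            kept[gene_id] = sequence
    return kept


# Hypothetical input: gene ID -> translated ORF
proteins = {
    "gene1": "M" + "A" * 30 + "*",  # 31 residues: kept
    "gene2": "MKT*",                # 3 residues: removed
    "gene3": "M" + "L" * 23,        # exactly 24 residues: kept
}

filtered = filter_by_aa_length(proteins)
print(sorted(filtered))
```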
- Added MultiQC #65
- Updated nf-core template to 3.0.2 #66
- Integrated nf-test into pipeline CI #68
- Updated the flowchart #87
- Added a large test dataset for the `test_full` profile #90
- Now `.gff.gz` and `.gff3.gz` inputs are also allowed for the `benchmark` column in `--input`
- Now removing liftoff genes with any intron shorter than 10bp #89
- Now also removing `rRNA` and `tRNA` after liftoff as the downstream logic in the pipeline cannot correctly handle these
- Now skipping FastQC by default #98
- Added an HTML report #44
- Added content type as text/html for the MultiQC and genepal reports
- Added sra-tools for RNASeq data download #102
- Now using `${meta.id}_trim` as prefix for `FASTQC` files
- Updated citations to include DOIs
- Fixed a bug where FASTQ versions were not correctly captured
- Now using the correct out channel from `STAR_ALIGN`. This bug was introduced by a module update during the development of this version #74
- Fixed OrthoFinder results copy failure on AWS #108
- Nextflow!>=24.04.2
- nf-schema@2.1.1
- Resource parameters have been removed: `max_memory`, `max_cpus`, `max_time`
- Removed a number of unnecessary parameters: `monochromeLogs`, `config_profile_contact`, `config_profile_url`, `validationFailUnrecognisedParams`, `validationLenientMode`, `validationSchemaIgnoreParams`, `validationShowHiddenParams`, `validate_params`
- Removed `extra_fastp_args` and replaced it with `fastp_extra_args`
- Removed and replaced `skip_fastp` and `skip_fastqc` with `fastp_skip` and `fastqc_skip` #82
- Added `orthofinder_annotations` param
- Added `FASTA_GFF_ORTHOFINDER` sub-workflow
- Added evaluation by BUSCO #41
- Included common tax ids for eggnog mapper #27
- Implemented hierarchical naming scheme: geneI.tJ, geneI.tJ.exonK, geneI.tJ.cdsK #19, #34
- Now sorting list of bam and list of fastq before cat to avoid resume cache misses
- Allowed BAM files for RNA evidence #3
- Added `GXF_FASTA_AGAT_SPADDINTRONS_SPEXTRACTSEQUENCES` sub-workflow for splice type statistics #11
- Changed `orthofinder_annotations` from FASTA/GFF to protein FASTA #43
- Added param `enforce_full_intron_support` to turn on/off strict model purging by TSEBRA #21
- Added param `filter_liftoff_by_hints` to evaluate liftoff models with TSEBRA to make sure they have the same level of evidence as BRAKER #28
- Added a script to automatically check module version updates
- Reduced `BRAKER3` threads to 8 #55
- Now the final annotations are stored in the `annotations` folder #53
- Now a single `fasta` file can be directly specified for `protein_evidence`
- `eggnogmapper_db_dir` is not a required parameter anymore
- `eggnogmapper_tax_scope` is now set to 1 (root div) by default
- Added a `test` profile based on public data
- Added parameter `add_attrs_to_proteins_fasta` to enable/disable addition of decoded gff attributes to proteins fasta #58
- Added a check for input assemblies. If an assembly is smaller than 1 MB (or 300KB in zipped format), the pipeline errors out before starting the downstream processes #47
- Now `REPEATMASKER` GFF output is saved via `CUSTOM_RMOUTTOGFF3` #54
- Added `benchmark` column to the input sheet and used `GFFCOMPARE` to perform benchmarking #63
- Added `SEQKIT_RMDUP` to detect duplicate sequence and wrap the fasta to 80 characters
- Updated parameter section labels for annotation and post-annotation filtering #64
- Updated modules and sub-workflows
- Fixed BRAKER spellings #36
- Fixed liftoff failure when lifting off from a single reference #40
- Added versions from GFF_STORE sub-workflows #33
- NextFlow!>=23.04.4
- nf-validation=1.1.3
- Renamed `external_protein_fastas` param to `protein_evidence`
- Renamed `fastq` param to `rna_evidence`
- Renamed `braker_allow_isoforms` param to `allow_isoforms`
- Moved liftoffID from gene level to mRNA/transcript level
- Moved `version_check.sh` to `.github/version_checks.sh`
- Removed dependency on https://github.com/kherronism/nf-modules.git for `BRAKER3` and `REPEATMASKER` modules which are now installed from https://github.com/GallVp/nxf-components.git
- Removed dependency on https://github.com/PlantandFoodResearch/nxf-modules.git
- Now the final annotations are not stored in the `final` folder
- Now BRAKER3 outputs are not saved by default #53 and saved under `etc` folder when enabled
- Removed `local` profile. Local executor is the default when no executor is specified. Therefore, the `local` profile was not needed.
- Removed `CUSTOM_DUMPSOFTWAREVERSIONS`
- `pipeline_info/software_versions.yml` has been replaced with `pipeline_info/genepal_software_mqc_versions.yml`
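The hierarchical naming scheme listed above (geneI.tJ, geneI.tJ.exonK, geneI.tJ.cdsK) can be illustrated with a short sketch. The helper name `hierarchical_ids`, the 1-based indices, and the `geneI` form of the gene-level ID are assumptions made for illustration and are not confirmed by this changelog.

```python
def hierarchical_ids(gene_index, transcript_index, n_exons, n_cds):
    """Build IDs following the geneI.tJ / geneI.tJ.exonK / geneI.tJ.cdsK pattern.

    Assumption: all indices are 1-based.
    """
    gene_id = f"gene{gene_index}"
    transcript_id = f"{gene_id}.t{transcript_index}"
    return {
        "gene": gene_id,
        "transcript": transcript_id,
        "exons": [f"{transcript_id}.exon{k}" for k in range(1, n_exons + 1)],
        "cds": [f"{transcript_id}.cds{k}" for k in range(1, n_cds + 1)],
    }


# Hypothetical example: second transcript of gene 7 with 3 exons and 2 CDS features
ids = hierarchical_ids(gene_index=7, transcript_index=2, n_exons=3, n_cds=2)
print(ids["transcript"])
print(ids["exons"])
print(ids["cds"])
```

Because each child ID embeds its parent ID, the parent of any feature can be recovered by dropping the last dot-separated component.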
- Added a stub test to evaluate the case where an assembly is soft masked but has no annotations
- Fixed a bug where `is_masked` was ignored by the pipeline
- Fixed a bug in param validation which allowed specification of `braker_hints` without `braker_gff3`
- NextFlow!>=23.04.4
- nf-validation=1.1.3
- Increased time limit for REPEATMODELER_REPEATMODELER to 5 days
- Now removing comments from fasta file before feeding it to BRAKER; added tests for the perl one-liner
- Fixed CHANGELOG version check failure in `version_check.sh`
- Increased the SLURM job time limit to 14 days
- NextFlow!>=23.04.4
- nf-validation=1.1.3
- Increased time limit for REPEATMODELER_REPEATMODELER to 3 days, REPEATMASKER to 2 days, EDTA_EDTA to 7 days, BRAKER3 to 7 days and EGGNOGMAPPER to 1 day
- NextFlow!>=23.04.4
- nf-validation=1.1.3
- Added changelog and semantic versioning
- Changed license to MIT
- Updated `.editorconfig`
- Moved .literature to test/ branch
- Renamed `genepal_local` to `local_genepal`
- Renamed `genepal_pfr` to `pfr_genepal`
- Added version checking
- Updated github workflow to use pre-commit instead of prettier and editorconfig check
- Added central singularity cache dir for pfr config
- Added `SORTMERNA_INDEX` before `SORTMERNA`
- Fixed sample contamination bug introduced by `file.simpleName`
- Now using empty files for stub testing in CI
- Now BRAKER can be skipped by including BRAKER outputs from previous runs in the `target_assemblies` param
- Added `gffcompare` to merge liftoff annotations
- Renamed `samplesheet` param to `fastq`
- Now using assemblysheet in combination with nf-validation for assembly input
- Added nextflow_schema.json
- Now using nf-validation to validate fastqsheet provided by params.fastq
- Moved `manifest.config` and `reporting_defaults.config` content to `nextflow.config`
- Now using a txt file for `params.external_protein_fastas`
- Now using nf-validation for `params.liftoff_annotations`
- Now using nf-validation for all the parameters
- Added `PURGE_BRAKER_MODELS` sub-workflow
- Added `GFF_EGGNOGMAPPER` sub-workflow
- Now using a custom version of `GFFREAD` which supports `meta` and `fasta`
- Now using TSEBRA to purge models which do not have full intron support from BRAKER hints
- Added params `eggnogmapper_evalue` and `eggnogmapper_pident`
- Added `PURGE_NOHIT_BRAKER_MODELS` sub-workflow
- Now merging BRAKER and liftoff models before running eggnogmapper
- Added `GFF_MERGE_CLEANUP` sub-workflow
- Now using `description` field to store notes and textual annotations in the gff files
- Now using `mRNA` in place of `transcript` in gff files
- Now `eggnogmapper_purge_nohits` is set to `false` by default
- Added `GFF_STORE` sub workflow
- `external_protein_fastas` and `eggnogmapper_db_dir` are not mandatory parameters
- Added contributors
- Added a document for the pipeline parameters
- Updated `pfr_genepal` and `pfr/profile.config`
- Now using local tests/stub files for GitHub CI
- Now removing isoforms left by TSEBRA using `AGAT_SPFILTERFEATUREFROMKILLLIST`
- Added `pyproject.toml`
- Now using PFAMs from eggnog if description is '-'
- Removed liftoff models with `valid_ORF=False`
- Updated license text to include 'Copyright (c) 2024 The New Zealand Institute for Plant and Food Research Limited'
- NextFlow!>=23.04.4
- nf-validation=1.1.3