All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
v3.6.3 - 2024-11-04
- Dynamic dispatch to NEON connection scorer on Aarch64 MacOS.
v3.6.2 - 2024-11-03
- pyrodigal console script not being installed.
v3.6.1 - 2024-11-03
- Compilation of the connection scoring code for AVX-512.
- Import issue on platforms without AVX2 runtime support.
- Missing metadata in pyproject.toml.
v3.6.0 - 2024-11-02
- Support for Python 3.13.
- Reorganize project to build with CMake and scikit-build-core.
- Build separate Python modules for various SIMD implementations to avoid potential linking issues.
- Pointer dereference issue when calling TrainingInfo.load in PyPI or with objects missing a readinto method.
- Support for Python 3.6.
v3.5.2 - 2024-09-04
- Warning in CLI when given sequences with empty identifiers.
- FASTA parser used in CLI crashing on empty header lines (#61).
v3.5.1 - 2024-07-17
- Outdated code in pyrodigal.cli breaking the CLI.
v3.5.0 - 2024-07-17 - YANKED
- Support for reading from stdin in CLI (#35).
- Flag for changing parallel computation to use Pool instead of ThreadPool (#57).
- Better documentation of command line interface (#56).
- Allow changing the formatter class in pyrodigal.cli.argument_parser.
- Migrate documentation to pydata-sphinx-theme.
- Cython warnings with unused except * statements in MetagenomicBins.
- Signatures of __init__ methods missing from all Cython types after the v3.0 update.
- Small typos in documentation.
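The Pool versus ThreadPool flag from v3.5.0 (#57) can be pictured with the standard library alone. The following is a minimal sketch and not the CLI's actual code: make_pool, use_processes and gc_fraction are hypothetical names, and the per-contig work is only a stand-in for gene finding.

```python
import multiprocessing
import multiprocessing.pool


def gc_fraction(contig):
    # Stand-in for per-contig work (hypothetical; the real CLI runs gene finding).
    return sum(base in "GC" for base in contig) / max(len(contig), 1)


def make_pool(jobs, use_processes=False):
    # Hypothetical helper: both classes expose the same map() API,
    # so a single flag can select worker processes or worker threads.
    if use_processes:
        return multiprocessing.Pool(jobs)
    return multiprocessing.pool.ThreadPool(jobs)


if __name__ == "__main__":
    contigs = ["ATGC", "GGCC", "ATAT"]
    with make_pool(2) as pool:
        print(pool.map(gc_fraction, contigs))
```

With processes, the mapped function and its arguments must be picklable, which is why gc_fraction is defined at module level.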
v3.4.1 - 2024-05-23
- Refactor SIMD code to reduce number of required registers, and improve SSE2 performance.
- Refactor Prodigal initialization functions into sparse initializer code to reduce library size.
v3.4.0 - 2024-05-19
- strict argument to Gene.translate to control translation of ambiguous codons with unambiguous translation (#54).
- strict_translation argument to Genes.write_genbank and Genes.write_translation.
- Support for translation tables 26 to 33 in Gene.translate.
- Support for translation tables 26, 29, 30, 32 and 33 in GeneFinder.train.
- Genes.score property to count the total score of all extracted genes.
- full_id parameter to Genes.write_gff, Genes.write_translation and Genes.write_genes to control the ID field written for each gene (#53).
- Gene.translate now raises a warning when called with a translation table incompatible with the training info.
- Bug in code for masking trailing nucleotides (#55).
v3.3.0 - 2024-01-24
- Scorer internal API to separate connection scoring and overlap disentangling.
- Bug with computation of minimum node in connection scoring loop (hyattpd/Prodigal#108).
- Out-of-bounds sequence access in _shine_dalgarno_exact and _shine_dalgarno_mm methods of Sequence.
- Memory leak in Nodes.__setstate__ caused by incorrect reallocation.
v3.2.2 - 2024-01-21
- Always mark SSE2 support on x86-64 CPUs independently of archspec-detected features (#49).
v3.2.1 - 2023-11-27
- Option to change argument parser in pyrodigal.cli.main.
v3.2.0 - 2023-11-27
- AVX-512 implementation of the SIMD pre-filter.
- Additional support for reading lz4, xz and zstd-compressed input in the CLI.
- Option to change gene finder type in pyrodigal.cli.main.
v3.1.1 - 2023-11-06
- Incorrect unpickling of GeneFinder causing crashes with multiprocessing (#46).
v3.1.0 - 2023-10-22
- Support for Python 3.12.
- min_mask argument to GeneFinder to control the minimum length of masked regions when mask=True.
v3.0.1 - 2023-09-27
- Genes.write_scores and Genes.write_gff crashing on empty Genes (#44).
v3.0.0 - 2023-09-17
- MetagenomicBins collection to store a dense array of MetagenomicBin objects.
- metagenomic_bins keyword argument to GeneFinder to control which models are used when running gene finding in meta mode (#24).
- metagenomic_bin attribute to Genes referencing the metagenomic model with which the genes were predicted, if in meta mode.
- Additional TrainingInfo properties (missing_motif_weight, coding_statistics).
- Setters for all remaining TrainingInfo properties.
- Proper TrainingInfo constructor with configuration option for all attributes.
- TrainingInfo.to_dict method to extract all parameters from a TrainingInfo.
- Genes.write_genbank method to write a GenBank record with all predicted genes from a sequence.
- include_stop flag to Gene.translate and Genes.write_translations to allow excluding the stop codon from the translated sequence.
- include_translation_table flag to Genes.write_gff to include the translation table in the GFF attributes of each gene.
- gbk output format to the Pyrodigal CLI.
- Sequence.unknown property exposing the number of unknown nucleotides in the sequence.
- Sequence.start_probability and Sequence.stop_probability to estimate the probability of encountering a start and a stop codon based on the GC%.
- Genes.write_gff not properly reporting the number of bytes written.
- Merge several nogil sections in Sequence constructor.
- Several Cython functions missing a noexcept qualifier.
- BREAKING: Rename OrfFinder to GeneFinder for consistency.
- BREAKING: Use memoryview to expose all TrainingInfo attributes instead of manually building lists or tuples.
- Reorganize memory management of the built-in metagenomic models.
- Make the internal Cython module public (pyrodigal.lib) to allow importing the underlying classes in other Cython projects.
- Use typing.Literal for allowed translation table values in pyrodigal.lib annotations.
- Cache intermediate log-odds in Nodes._raw_coding_score to reduce calls to pow and log functions.
- Inline connection scoring functions to reduce function call overhead.
- Reorganize struct _node fields to reduce size in memory.
- Make GeneFinder.find_genes and GeneFinder.train reserve memory for the Nodes based on the GC% of the input sequence.
- Avoid storing temporary results in the generic implementation of ConnectionScorer.compute_skippable.
- Use Cython freelist for allocating Node, Gene, MetagenomicBin and Mask.
- Increase minimum allocation for Genes and Nodes to reduce early reallocations.
- BREAKING: metagenomic_bin attribute of TrainingInfo.
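The GC%-based estimate behind Sequence.start_probability and Sequence.stop_probability (v3.0.0) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: nucleotides are assumed independent, the start codons are taken to be ATG, GTG and TTG, the stop codons TAA, TAG and TGA, and codon_probability is a hypothetical helper name.

```python
START_CODONS = ("ATG", "GTG", "TTG")
STOP_CODONS = ("TAA", "TAG", "TGA")


def codon_probability(codons, gc):
    # Assumption: independent nucleotides, with G and C each at gc / 2
    # and A and T each at (1 - gc) / 2.
    freq = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    total = 0.0
    for codon in codons:
        p = 1.0
        for base in codon:
            p *= freq[base]
        total += p
    return total


# At 50% GC every codon has probability 1/64, so three codons give 3/64.
print(codon_probability(START_CODONS, 0.5))
print(codon_probability(STOP_CODONS, 0.5))
```

Under this model, a GC-rich sequence makes stop codons rarer, since all three assumed stop codons are AT-rich.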
v2.3.0 - 2023-07-20
- Bump Cython to v3.0.0.
v2.2.0 - 2023-06-19
- Release GIL while masking sequence regions in Sequence.__init__.
- Use archspec instead of cpu_features for runtime feature detection.
- Support for reading gzip and bz2-compressed input in the CLI.
- CLI flag to run ORF detection in parallel when input contains several contigs.
- Support for Python 3.5.
v2.1.0 - 2023-02-20
- Update Prodigal to v2.6.3+c1e2d36 to fix a bug with Shine-Dalgarno detection on reverse contig edge (hyattpd/Prodigal#100).
- ArchLinux User Repository package generation in CI.
v2.0.4 - 2023-01-09
- GC% computation and RBS scoring for reverse strand nodes close to the contig edge (#27).
v2.0.3 - 2022-12-20
- OrfFinder(mask=True) ignoring the minimum mask size when masking regions (#26).
- Use cibuildwheel for building wheel distributions.
- Wheels for MacOS Aarch64 platforms.
v2.0.2 - 2022-11-01
- Syntax issue in Cython files failing build on Bioconda runner.
v2.0.1 - 2022-11-01
- Syntax issue in Cython files failing build on some environments.
v2.0.0 - 2022-11-01
- MMX implementation of the SIMD prefilter.
- Proper GFF headers and metadata section to GFF output.
- Sequence.gc_frame_plot method to compute the max GC frame profile from Python.
- metagenomic_bin property to TrainingInfo to support recovering the object corresponding to a pre-trained model.
- meta attribute to Genes to store whether genes were predicted in single or in meta mode.
- pyrodigal.PRODIGAL_VERSION constant storing the wrapped Prodigal version.
- pyrodigal.MIN_SINGLE_GENOME and pyrodigal.IDEAL_SINGLE_GENOME constants storing the minimum and recommended sequence sizes for training.
- Make all write methods of Genes objects require a sequence_id argument instead of using the internal sequence number.
- Rewrite SIMD prefilter using a generic template with C macros.
- Make Mask record coordinates in start-inclusive end-exclusive mode to follow Python conventions.
- Make connection scoring tests only score some randomly selected node pairs for faster runs.
- Rewrite tests to use importlib.resources for managing test data.
- from_bytes and from_string constructors of Sequence objects.
- Duplicate extraction of start codons located on contig edges inside Nodes._extract (#21).
- Pickling and unpickling of TrainingInfo objects corresponding to pre-trained models.
- Implementation of calc_most_gc_frame being inconsistent with the Prodigal implementation.
- Implementation of the maximum search in score_connection_forward_start not following the (weird?) behaviour from Prodigal (#21).
- Gene identifier being used instead of the sequence identifier in the GFF output (#18).
- Out of bound access to sequence data in Sequence._shine_dalgarno_mm and Sequence._shine_dalgarno_exact.
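The start-inclusive, end-exclusive convention adopted for Mask coordinates in v2.0.0 is the one Python slices use. A minimal sketch, using a hypothetical HalfOpenMask class with begin and end fields rather than the library's own Mask type:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HalfOpenMask:
    # Hypothetical stand-in for a mask; not the library's Mask class.
    begin: int  # first masked position (inclusive)
    end: int    # one past the last masked position (exclusive)

    def __len__(self):
        # With half-open coordinates the length is a plain difference.
        return self.end - self.begin


sequence = "ATGNNNNCGT"
mask = HalfOpenMask(begin=3, end=7)
# The coordinates can be passed to a slice without any +1/-1 adjustment.
print(sequence[mask.begin:mask.end], len(mask))
```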
v1.1.2 - 2022-08-31
- Use the vbicq Arm intrinsic in the NEON implementation to combine vandq and vmvnq.
- Prevent direct instantiation of Node and Gene objects from Python code.
- Configuration of platform-specific NEON flags in setup.py not being applied to the linker.
v1.1.1 - 2022-07-08
- Some cpu_features source files not being included in source distribution.
v1.1.0 - 2022-06-09
- OrfFinder.train can now be given more than one sequence argument to train on contigs from an unclosed genome.
- Updated cpu_features to v0.7.0 and added hardware detection of NEON features on Linux Aarch64 platforms.
v1.0.2 - 2022-05-13
- Detection of Arm64 platform in setup.py (#16).
v1.0.1 - 2022-04-28
- pyrodigal.cli now concatenates training sequences the same way as Prodigal does.
v1.0.0 - 2022-04-20
Stable version, to be published in the Journal of Open-Source Software.
- pickle protocol implementation for Nodes, TrainingInfo, OrfFinder, Sequence, Masks and Genes objects.
- Buffer protocol implementation for Sequence, allowing access to raw digits.
- __eq__ and __repr__ magic methods to Mask objects.
- Optimized code used for region masking to avoid searching for the same mask repeatedly.
- TRANSLATION_TABLES and METAGENOMIC_BINS are now exposed as constants in the top pyrodigal module.
- Refactored connection scoring into different functions based on the type (start/stop) and strand (direct/reverse) of the node being scored.
- Changed the growth factor for dynamic arrays to be the same as the one used in CPython list buffers.
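For reference, the over-allocation rule of CPython list buffers mentioned in v1.0.0 can be written out in a few lines. The formula below is the one found in CPython's list_resize since version 3.9. Whether the dynamic arrays here use exactly this variant is an assumption, and grown_capacity is a hypothetical name.

```python
def grown_capacity(new_size):
    # CPython (3.9+) list over-allocation: about 12.5% extra plus a small
    # constant, rounded down to a multiple of 4.
    return (new_size + (new_size >> 3) + 6) & ~3


# Simulate appending one item at a time and record each reallocation.
capacity, pattern = 0, [0]
for size in range(1, 66):
    if size > capacity:
        capacity = grown_capacity(size)
        pattern.append(capacity)
print(pattern)
```

Growing by a fraction of the current size, instead of a fixed step, keeps the number of reallocations logarithmic in the final length.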
v0.7.3 - 2022-04-06
- Gene.score property to get the gene score as reported in the score data string.
- OrfFinder.find_genes not producing consistent results across runs in meta mode (#13).
- OrfFinder.find_genes returning Nodes with incomplete score information.
v0.7.2 - 2022-03-15
- Improve performance of mer_ndx and score_connection using dedicated implementations with better branch prediction.
- Mark arguments as const in C code where possible.
- Signatures of Cython classes not displaying properly because of the embedsignature directive.
- _sequence.h functions not being inlined as expected.
v0.7.1 - 2022-03-14
- Rewrite internal Sequence code using inlined functions to increase performance when the strand is known.
- Nodes.copy potentially failing on empty collections after trying to allocate 0 bytes.
- TestGenes.test_write_scores failing on some machines because of float rounding issues.
- Gene.translate ignoring the unknown_residue argument value and always using "X".
- Memory leak in Pyrodigal.train caused by memory not being freed after building the GC frame plot.
v0.7.0 - 2022-03-12
- Support for setting a custom minimum gene length in pyrodigal.OrfFinder.
- Genes.write_scores method to write the node scores to a file.
- Gene.__repr__ and Node.__repr__ methods to display some useful attributes.
- Sequence.__str__ method to get back a nucleotide string from a Sequence object.
- Use a more compact data structure to store Gene data.
- Nodes._calc_orf_gc reading nucleotides after the sequence end when computing GC content for edge nodes.
- pyrodigal.Pyrodigal class (use pyrodigal.OrfFinder instead).
- pyrodigal.Predictions class (functionality merged into pyrodigal.Genes).
v0.6.4 - 2021-12-23
- load and dump methods to TrainingInfo for storing and loading a raw training info structure.
- Support for creating an OrfFinder pre-configured with a training info.
- -t and -n flags to the CLI.
v0.6.3 - 2021-12-23
- pyrodigal command line script exposing a CLI mimicking the original prodigal binary.
- write_gff, write_genes and write_translations methods to pyrodigal.Predictions to write the predictions results to a file in different formats.
- Implementation for masking regions of unknown nucleotides in input sequences.
- Renamed pyrodigal.Pyrodigal class to pyrodigal.OrfFinder.
- setup.py building different SIMD implementations with the same set of feature flags, causing compilers to re-optimize the SIMD implementations.
v0.6.2 - 2021-09-25
- Sphinx documentation with small install guide and API reference.
- setup.py not detecting SSE2 and AVX2 build support because of a linker error.
- Build OSX extension without AVX2 support since runtime detection of AVX2 to avoid the Illegal Instruction: 4 bug on older CPUs.
v0.6.1 - 2021-09-24
- Source distribution lacking C files necessary for building cpu_features.
v0.6.0 - 2021-09-23
- SIMD code to build an index of which connections can be skipped when scoring node connections in the dynamic programming routine (#6).
v0.5.4 - 2021-09-18
- Prediction.confidence method to compute the confidence for a prediction as reported in Prodigal's GFF output.
- Prediction.sequence method to get the nucleotide sequence of a predicted gene (#4).
- Replaced internal storage of input sequences to use a byte array instead of a bitmap.
- Extract Prediction.gc_cont number directly from the start node instead of the text representation to get full accuracy.
- Prodigal bug causing nodes on the reverse strand to always receive a penalty instead of penalizing only small ORFs (hyattpd/Prodigal#88).
v0.5.3 - 2021-09-12
- Prediction.translate not translating the last unknown codon properly for genes on the direct strand.
v0.5.2 - 2021-09-11
- Make Pyrodigal.train return a reference to the newly created TrainingInfo for inspection if needed.
- Reimplement add_nodes and add_genes to use a growable array instead of counting and pre-allocating the C arrays.
- Inconsistent handling of unknown nucleotides in input sequences and gene translations.
v0.5.1 - 2021-09-04
- Additional Gene properties to access the score
- Use more efficient PyUnicode macros when reading or creating a string containing a nucleotide or a protein sequence.
- Release the GIL when creating a bitmap for an str given as input to Pyrodigal.find_genes.
- Release the GIL when creating the protein sequence returned by Gene.translate.
- Pyrodigal.find_genes and Gene.translate not behaving like Prodigal when handling sequences with unknown nucleotides.
v0.5.0 - 2021-06-15
- pyrodigal.TrainingInfo class exposing variables obtained during training as an attribute to Pyrodigal, Gene and Genes instances.
- Support for passing objects implementing the buffer protocol to Pyrodigal.find_genes and Pyrodigal.train instead of requiring str sequences.
- Potential data race on training info in case a Gene.translate with a non-default translation table was being translated at the same time as a Pyrodigal.find_genes call.
- Spurious handling of Unicode strings causing potential issues on platforms using a different base encoding.
v0.4.7 - 2021-04-09
- Pyrodigal.find_genes segfaulting on some sequences when called in single mode (#2).
- MemoryError potentially not being properly raised on allocation issues for sequence bitmaps.
v0.4.6 - 2021-03-05
- Tests are now in the pyrodigal.tests module and can be run after a site install.
- Pyrodigal.find_genes stalling on sequences shorter than 3 nucleotides.
v0.4.5 - 2021-03-03
- Compilation of OSX and Windows wheels.
v0.4.4 - 2021-03-03
- Mark package as OS-independent.
- Support for Python 3.5.
- Compilation of PyPy wheels on OSX.
v0.4.3 - 2021-03-01
- Buffer overflow when running in meta mode on a sequence too small to have any dynamic programming nodes.
v0.4.2 - 2021-02-07
- Buffer overflow coming from the node array, caused by an incorrect estimation of the node count from the sequence length.
v0.4.1 - 2021-01-07
- Python 3.5 from the project metadata (the code was only compatible with Python 3.6+ already because of f-strings).
- Broken linking of static libprodigal against the _pyrodigal extension on some OSX environments (bioconda/bioconda-recipes#25568).
v0.4.0 - 2021-01-06
- trans_table keyword argument to Pyrodigal.train has been renamed to translation_table.
- Option to change the translation table to any allowed number in Gene.translate (#1).
v0.3.2 - 2020-11-27
- Broken compilation of PyPy wheels in Travis-CI.
v0.3.1 - 2020-11-27
- Link to Zenodo record in README.md.
- Typing :: Typed classifier to the PyPI metadata.
- Explicit support for Python 3.9.
- Streamlined compilation process when building from source distribution.
v0.3.0 - 2020-09-07
- Thread-safety for all Pyrodigal methods
- Reduced total amount of memory used to allocate dynamic programming nodes for a given sequence.
v0.2.4 - 2020-09-04
- Precompiled wheels for Windows x86-64 platform.
- Compilation of large Prodigal/training.c file is now done in chunks and uses static const to reduce build time.
v0.2.3 - 2020-08-09
- Buffer overflow issue with Pyrodigal in closed=False mode.
v0.2.2 - 2020-07-14
- Access to the translation table of a Gene object.
v0.2.1 - 2020-05-29
- Memory issues causing PyPy to crash when using Pyrodigal in single mode.
v0.2.0 - 2020-05-28
- Support for Prodigal's single mode.
v0.1.1 - 2020-04-30
- Distribution of CPython wheels for ManyLinux2010 and OSX platforms.
v0.1.0 - 2020-04-27
Initial release.