Merge pull request #4 from alejandrogzi/lex_algorithm
Pre-sorting using a lexicograph-based algorithm
alejandrogzi authored Oct 17, 2023
2 parents 57b3952 + 7d1141e commit 1c974ed
Showing 8 changed files with 675 additions and 187 deletions.
194 changes: 193 additions & 1 deletion Cargo.lock

5 changes: 4 additions & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "bed2gtf"
-version = "1.6.0"
+version = "1.7.0"
authors = ["alejandrogzi <jose.gonzalesdezavala1@unmsm.edu.pe>"]
edition = "2021"
license = "MIT"
@@ -18,6 +18,9 @@ peak_alloc = {version = "0.2.0"}
log = "0.4.14"
simple_logger = "4.0.0"
indoc = "1.0"
+natord = "1.0.9"
+thiserror = "1.0"
+chrono = "0.4.31"

[profile.release]
lto = true
8 changes: 6 additions & 2 deletions README.md
@@ -21,6 +21,11 @@ chr27 bed2gtf exon 17266470 17266572 . + . gene_id "ENSG00000151743"; transcript

in a few seconds.

+>[!IMPORTANT]
+>
+>bed2gtf now uses a lexicographic sorting algorithm, so it produces not just a .gtf file but a properly sorted one. The algorithm was originally implemented in [gtfsort](https://github.com/alejandrogzi/gtfsort), and parts of it have been integrated into the bed2gtf code.

## Usage
``` rust
Usage: bed2gtf[EXE] --bed <BED> --isoforms <ISOFORMS> --output <OUTPUT>
@@ -161,8 +166,7 @@ bed2gtf is basically the reimplementation of C binaries merged in 1 step. This t
### Limitations
At the time bed2gtf was made publicly available, some gaps had not yet been covered.
-1. As of the last release of bed2gtf (1.5.0), a new GTF sorting tool had already been developed, covering one of the past limitations of bed2gtf. The tool resides here: [gtfsort](https://github.com/alejandrogzi/gtfsort)
-2. Biotype. As you may (or may not) know, GTF files specify the gene_biotype of each entry (e.g. protein_coding, processed_pseudogene, snoRNA, etc.). This is probably the biggest limitation in this release. Currently, bed2gtf DOES NOT assume any biotype. Future releases will probably offer an option to specify the gene_biotype [-b/--biotype], or the biotype may be included in the isoforms file.
+1. Biotype. As you may (or may not) know, GTF files specify the gene_biotype of each entry (e.g. protein_coding, processed_pseudogene, snoRNA, etc.). This is probably the biggest limitation in this release. Currently, bed2gtf DOES NOT assume any biotype. Future releases will probably offer an option to specify the gene_biotype [-b/--biotype], or the biotype may be included in the isoforms file.
### Annex
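The README note says bed2gtf now writes a sorted GTF using ordering logic first implemented in gtfsort, and the Cargo.toml hunk adds `natord`, a natural-ordering crate. The snippet below is a simplified, std-only sketch of that kind of ordering, not the code from this commit. It assumes records are ordered by chromosome name in natural order (so `chr2` precedes `chr10`) and then by start coordinate. The names `GtfRecord`, `natural_cmp`, `parse`, and `sort_gtf` are hypothetical. The comparator is hand-rolled in place of `natord` so the example has no dependencies, and unlike gtfsort it does not group features by gene and transcript.

```rust
use std::cmp::Ordering;

/// Hypothetical minimal GTF record: only the fields this sketch sorts on.
struct GtfRecord {
    seqname: String,
    start: u64,
    end: u64,
    line: String,
}

/// Strips leading '0' bytes from a digit run.
fn trim_zeros(d: &[u8]) -> &[u8] {
    let i = d.iter().position(|&c| c != b'0').unwrap_or(d.len());
    &d[i..]
}

/// Natural-order comparison: digit runs are compared numerically,
/// everything else byte by byte, so "chr2" < "chr10".
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (mut a, mut b) = (a.as_bytes(), b.as_bytes());
    loop {
        match (a.first(), b.first()) {
            (None, None) => return Ordering::Equal,
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (Some(x), Some(y)) if x.is_ascii_digit() && y.is_ascii_digit() => {
                // Length of the leading digit run on each side.
                let na = a.iter().take_while(|c| c.is_ascii_digit()).count();
                let nb = b.iter().take_while(|c| c.is_ascii_digit()).count();
                let (ta, tb) = (trim_zeros(&a[..na]), trim_zeros(&b[..nb]));
                // Without leading zeros, a longer run is a larger number.
                let ord = ta.len().cmp(&tb.len()).then_with(|| ta.cmp(tb));
                if ord != Ordering::Equal {
                    return ord;
                }
                a = &a[na..];
                b = &b[nb..];
            }
            (Some(x), Some(y)) => {
                if x != y {
                    return x.cmp(y);
                }
                a = &a[1..];
                b = &b[1..];
            }
        }
    }
}

/// Parses a tab-separated GTF line (columns 1, 4 and 5 are seqname,
/// start and end). Returns None for comments and malformed lines.
fn parse(line: &str) -> Option<GtfRecord> {
    if line.starts_with('#') {
        return None;
    }
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() < 9 {
        return None;
    }
    Some(GtfRecord {
        seqname: cols[0].to_string(),
        start: cols[3].parse().ok()?,
        end: cols[4].parse().ok()?,
        line: line.to_string(),
    })
}

/// Sorts by chromosome (natural order), then start ascending, then end
/// descending. `sort_by` is stable, so full ties keep their input order.
fn sort_gtf(lines: &[&str]) -> Vec<String> {
    let mut recs: Vec<GtfRecord> = lines.iter().filter_map(|l| parse(l)).collect();
    recs.sort_by(|a, b| {
        natural_cmp(&a.seqname, &b.seqname)
            .then(a.start.cmp(&b.start))
            .then(b.end.cmp(&a.end))
    });
    recs.into_iter().map(|r| r.line).collect()
}

fn main() {
    let input = [
        "chr10\tbed2gtf\texon\t500\t600\t.\t+\t.\tgene_id \"B\";",
        "chr2\tbed2gtf\texon\t300\t400\t.\t+\t.\tgene_id \"A\";",
        "chr2\tbed2gtf\tgene\t100\t400\t.\t+\t.\tgene_id \"A\";",
    ];
    for line in sort_gtf(&input) {
        println!("{line}");
    }
}
```

Breaking ties on start by putting the longer feature first means a gene line precedes an exon that starts at the same coordinate. A full GTF sorter would additionally keep each transcript's features together under its gene, which a purely coordinate-based key cannot guarantee.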