Commit
Added documentation with Documenter.jl
Showing 7 changed files with 186 additions and 0 deletions.
@@ -0,0 +1,5 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"

[compat]
Documenter = "~0.23"
@@ -0,0 +1,15 @@
using Documenter, GenomicAnnotations

makedocs(sitename = "GenomicAnnotations.jl", authors = "Karl Dyrhage",
    pages = [
        "index.md",
        "I/O" => "io.md",
        "Accessing and modifying annotations" => "accessing.md",
        "Filtering: the @genes macro" => "genes.md"
    ],
    format = Documenter.HTML(prettyurls = get(ENV, "CI", nothing) == "true")
)

# deploydocs(
#     repo = "github.com/kdyrhage/GenomicAnnotations.jl.git",
# )
@@ -0,0 +1,57 @@
# Accessing and modifying annotations

# Features
Features (genes) can be added using `addgene!`. A feature must have a feature name and a locus (position), and can have any number of additional qualifiers associated with it (see the next section).
```@docs
addgene!
```

After adding a new feature, `sort!` can be used to ensure that the annotations are stored (and printed) in the order in which they occur on the chromosome:
```julia
sort!(chr)
```
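To illustrate why sorting is needed, here is a minimal sketch in plain Julia that does not use GenomicAnnotations. The `Feature` struct and its fields are hypothetical stand-ins, and the sketch assumes only that annotations are ordered by the start position of their locus: appending puts a new feature at the end of the vector, and sorting by `first(locus)` restores chromosome order.

```julia
# Hypothetical stand-in for an annotation; not the GenomicAnnotations `Gene` type
struct Feature
    feature::String          # feature key, e.g. "CDS"
    locus::UnitRange{Int}    # position on the chromosome
end

features = [Feature("gene", 1:300), Feature("CDS", 1:300), Feature("gene", 900:1200)]

# Appending places the new feature at the end, out of chromosome order
push!(features, Feature("regulatory", 670:677))

# Sort in place by the start position of each locus
sort!(features, by = f -> first(f.locus))

for f in features
    println(rpad(f.feature, 12), f.locus)
end
```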

Existing features can be removed using `delete!`:
```@docs
delete!(::Gene)
delete!(::AbstractVector{Gene})
```
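Deleting a filtered subset amounts to finding the matching positions and removing them in place. The plain-Julia sketch below uses hypothetical `NamedTuple` annotations rather than the package's types; a qualifier that was never set is modelled as `missing`, and `coalesce` treats it as `false` so that only genes explicitly marked as pseudogenes are removed.

```julia
# Hypothetical annotations: `pseudo` is `missing` where the qualifier was never set
genes = [
    (locus_tag = "g1", pseudo = missing),
    (locus_tag = "g2", pseudo = true),
    (locus_tag = "g3", pseudo = false),
    (locus_tag = "g4", pseudo = true),
]

# Indices of genes flagged as pseudogenes; `coalesce` maps `missing` to `false`
todelete = findall(g -> coalesce(g.pseudo, false), genes)

# Remove them in place; the remaining genes keep their original order
deleteat!(genes, todelete)

println([g.locus_tag for g in genes])
```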

# Qualifiers
Features can have multiple qualifiers, which can be modified using Julia's property syntax:
```julia
# Replace newlines in gene product descriptions with spaces
for gene in @genes(chr, iscds)
    ismissing(gene.product) && continue   # skip CDSs without a /product qualifier
    # Strings are immutable, so assign the new string returned by `replace`
    gene.product = replace(gene.product, '\n' => ' ')
end
```
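In Julia, property syntax such as `gene.product` is routed through `getproperty` and `setproperty!`, which a type can overload. The sketch below shows one way such an interface can sit on top of a qualifier store. The `MiniGene` type and its `Dict` storage are hypothetical and are not how GenomicAnnotations stores its data; the sketch only mimics the package's documented behaviour of returning `missing` for qualifiers that have not been stored.

```julia
# Hypothetical gene type that stores qualifiers in a Dict; for illustration only
struct MiniGene
    qualifiers::Dict{Symbol,Any}
end
MiniGene() = MiniGene(Dict{Symbol,Any}())

# `gene.name` looks up the qualifier and returns `missing` if it was never set
Base.getproperty(g::MiniGene, name::Symbol) =
    get(getfield(g, :qualifiers), name, missing)

# `gene.name = x` stores (or overwrites) the qualifier
Base.setproperty!(g::MiniGene, name::Symbol, x) =
    (getfield(g, :qualifiers)[name] = x)

gene = MiniGene()
gene.product = "DNA polymerase\nIII subunit beta"
# Strings are immutable, so assign a new string instead of mutating in place
gene.product = replace(gene.product, '\n' => ' ')
println(gene.product)
println(ismissing(gene.pseudo))
```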

Properties also work on views of genes, typically generated using `@genes`:
```julia
interestinggenes = readlines("/path/to/list/of/interesting/genes.txt")
@genes(chr, iscds, :locus_tag in interestinggenes).interesting .= true
```
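The `.=` in the example above is ordinary Julia broadcasting assignment: when the left-hand side is a view, the values are written through to the parent array. A minimal plain-Julia illustration, in which a `Bool` vector stands in for a hypothetical `interesting` column and a `Set` replaces the file of locus tags:

```julia
locus_tags  = ["g1", "g2", "g3", "g4"]
iscds       = [true, true, false, true]
interesting = falses(length(locus_tags))   # hypothetical qualifier column

# Hypothetical list of locus tags, e.g. as read with `readlines`
wanted = Set(["g2", "g3", "g4"])

# Indices of coding sequences whose locus tag is in the list
idx = [i for i in eachindex(locus_tags) if iscds[i] && locus_tags[i] in wanted]

# Broadcasting into a view updates the parent vector
view(interesting, idx) .= true

println(interesting)
```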

Sometimes features have multiple instances of the same qualifier, such as genes with several EC numbers. Assigning a qualifier with property syntax overwrites any data previously stored for that feature, and assigning a vector of values to a qualifier that currently stores scalars results in an error. To safely add qualifiers that may have more than one instance, use `pushproperty!`:
```@docs
pushproperty!
```
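The difference between overwriting and appending can be sketched in plain Julia. The hypothetical `pushqualifier!` below is not `pushproperty!` and does not describe its internals; it implements one possible policy for multi-valued qualifiers: an absent qualifier is created, an existing scalar is promoted to a vector, and further values are appended.

```julia
# Hypothetical helper; one possible policy for multi-valued qualifiers
function pushqualifier!(qualifiers::Dict{Symbol,Any}, name::Symbol, x)
    if !haskey(qualifiers, name)
        qualifiers[name] = x                         # first instance: store as-is
    elseif qualifiers[name] isa AbstractVector
        push!(qualifiers[name], x)                   # already multi-valued: append
    else
        qualifiers[name] = Any[qualifiers[name], x]  # promote scalar to vector
    end
    return qualifiers
end

q = Dict{Symbol,Any}(:locus_tag => "g1")
pushqualifier!(q, :EC_number, "2.7.7.7")
pushqualifier!(q, :EC_number, "3.1.11.1")
pushqualifier!(q, :EC_number, "3.6.4.12")
println(q[:EC_number])
```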

Accessing properties that have not been stored returns `missing`. For this reason, it often makes more sense to use `get()` than to access the property directly:
```julia
# If gene 2 has no /pseudo qualifier, chr.genes[2].pseudo returns missing,
# and using missing as a condition throws a TypeError
if chr.genes[2].pseudo
    println("Gene 2 is a pseudogene")
end

# ... but this works:
if get(chr.genes[2], :pseudo, false)
    println("Gene 2 is a pseudogene")
end
```
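The error in the first example comes from Julia itself: `missing` cannot be used as a condition. The plain-Julia snippet below demonstrates this, and shows `coalesce` from Base as an alternative way to supply a default once a possibly-missing value has already been retrieved.

```julia
value = missing   # what an unset qualifier such as `pseudo` evaluates to

# Using `missing` as a condition throws a TypeError
caught = try
    if value
        println("pseudogene")
    end
    nothing
catch err
    err
end
println(typeof(caught))

# `coalesce` returns its first non-missing argument, here the default `false`
if coalesce(value, false)
    println("pseudogene")
else
    println("not a pseudogene")
end
```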

# Sequences
The sequence of a `Chromosome` `chr` is stored in `chr.sequence`. Sequences of individual features can be read with `sequence`:
```@docs
sequence(::Gene)
```
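Conceptually, a feature's sequence is the slice of the chromosome sequence covered by its locus, reverse-complemented when the feature lies on the complementary strand. The sketch below illustrates this with plain `String`s and a hypothetical `featureseq` helper. It is not the package's implementation: it ignores compound loci and ambiguous nucleotides.

```julia
# Complement table for unambiguous nucleotides
const COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G')

# Hypothetical helper: extract the sequence of a simple locus `start:stop`
function featureseq(chrseq::String, locus::UnitRange{Int}; complement::Bool = false)
    s = chrseq[locus]                      # 1-based and inclusive, as in GenBank
    complement || return s
    return reverse(map(c -> COMPLEMENT[c], s))
end

chrseq = "ATGAAACCCGGGTTTTAG"
println(featureseq(chrseq, 1:6))                     # forward strand
println(featureseq(chrseq, 1:6; complement = true))  # complementary strand
```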
@@ -0,0 +1,13 @@
# Filtering: the @genes macro

A useful tool provided by GenomicAnnotations is the macro `@genes`. It is used to filter annotations, for example to look only at coding sequences or rRNAs, which can then be modified or iterated over:
```julia
# Print locus tags of all coding sequences longer than 1000 nt that are not pseudogenes
for gene in @genes(chr, iscds, length(gene) > 1000, ! :pseudo)
    println(gene.locus_tag)
end
```

```@docs
@genes
```
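Conceptually, each condition passed to `@genes` is a predicate on `gene`, and only features satisfying all of them are kept. The plain-Julia sketch below expresses the filter from the example above with `filter`, using hypothetical `NamedTuple` annotations and a hypothetical `iscds` function. It does not reproduce the macro's own handling of qualifier symbols such as `:pseudo`; it assumes a missing `pseudo` qualifier counts as "not a pseudogene".

```julia
# Hypothetical annotations; `pseudo` is `missing` when the qualifier is absent
genes = [
    (locus_tag = "g1", feature = "CDS",  locus = 1:1500,    pseudo = missing),
    (locus_tag = "g2", feature = "CDS",  locus = 1600:2200, pseudo = missing),
    (locus_tag = "g3", feature = "CDS",  locus = 2300:4000, pseudo = true),
    (locus_tag = "g4", feature = "rRNA", locus = 4100:5600, pseudo = missing),
]

iscds(g) = g.feature == "CDS"

# Every condition must hold; a missing `pseudo` is treated as `false` (an assumption)
keep(g) = iscds(g) && length(g.locus) > 1000 && !coalesce(g.pseudo, false)

for gene in filter(keep, genes)
    println(gene.locus_tag)
end
```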
@@ -0,0 +1,70 @@
# GenomicAnnotations.jl

## Description
GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank format.

## Installation
```julia
julia> ]
pkg> add GenomicAnnotations
```
or
```julia
using Pkg
Pkg.add("GenomicAnnotations")
```

## Examples
GenBank files are read with `readgbk(pathtofile)`, which returns a vector of `Chromosome`s. The file can be gzipped as long as the filename ends in ".gz". If we are only interested in the first chromosome in `example.gbk`, we only need to store the first element.
```julia
chr = readgbk("test/example.gbk")[1]
```

`Chromosome`s have five fields: `name`, `header`, `genes`, `genedata`, and `sequence`. The `name` is read from the `header`, which is stored as a string. The annotation data is stored in `genedata`, but you should generally use `genes` to access that data, for example to iterate over annotations and to modify them.
```julia
for gene in chr.genes
    gene.locus_tag = "$(chr.name)_$(gene.locus_tag)"
end

chr.genes[2].locus_tag = "test123"
```

The macro `@genes` can be used to filter the annotations. Inside its conditions, the name `gene` refers to each individual `Gene`. `@genes` can also be used to modify annotations.
```julia
@genes(chr, length(gene) > 300) # Returns all features longer than 300 nt
```

Gene sequences can be accessed with `sequence(gene)`. For example, the following code writes the translated sequences of all protein-coding genes to a file:
```julia
using BioSequences, FASTX   # `translate` comes from BioSequences
writer = FASTA.Writer(open("proteins.fasta", "w"))
for gene in @genes(chr, iscds)
    aaseq = translate(sequence(gene))
    write(writer, FASTA.Record(gene.locus_tag, get(gene, :product, ""), aaseq))
end
close(writer)
```

Genes can be added using `addgene!`, and `sort!` can be used to make sure that the resulting annotations are in the correct order for printing. `delete!` is used to remove genes.
```julia
newgene = addgene!(chr, "regulatory", 670:677)
newgene.locus_tag = "reg02"
sort!(chr.genes)

# Delete all genes where `:pseudo` is `true`; genes where it is `false` or `missing` are kept
delete!(@genes(chr, :pseudo))
# Delete all genes 60 nt or shorter
delete!(@genes(chr, length(gene) <= 60))
```

Individual genes and `Vector{Gene}`s are printed in GenBank format. To include the GenBank header and the nucleotide sequence, use `printgbk(io, chr)` to write them to a file.
```julia
println(chr.genes[1])
println(@genes(chr, iscds))

open("updated.gbk", "w") do f
    printgbk(f, chr)
end
```
@@ -0,0 +1,15 @@
# I/O

## Input
Annotation files are read with `readgbk(pathtofile)`. Currently this assumes that the file follows the standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2).

```@docs
readgbk(file)
```

## Output
Annotations can be printed with GenBank formatting using `printgbk`. In the REPL, instances of `Gene` are displayed as they would appear in the annotation file.

```@docs
printgbk
```
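GenBank feature tables use fixed columns: the feature key starts in column 6, and the location and each `/qualifier` line start in column 22. The hypothetical `formatfeature` function below sketches that layout in plain Julia. It is not `printgbk` and omits details such as wrapping long values and unquoted qualifier values.

```julia
# Hypothetical formatter for one feature in GenBank feature-table layout
function formatfeature(key::String, location::String,
                       qualifiers::Vector{Pair{Symbol,String}})
    io = IOBuffer()
    # Columns 1-5 blank, feature key from column 6, location from column 22
    println(io, " "^5, rpad(key, 16), location)
    for (name, value) in qualifiers
        # Qualifier lines also start in column 22
        println(io, " "^21, "/", name, "=\"", value, "\"")
    end
    return String(take!(io))
end

print(formatfeature("CDS", "complement(670..1200)",
                    [:locus_tag => "reg02", :product => "hypothetical protein"]))
```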