Commit

Added documentation with Documenter.jl
kdyrhage committed Sep 25, 2019
1 parent 74cfe19 commit e895a48
Showing 7 changed files with 186 additions and 0 deletions.
11 changes: 11 additions & 0 deletions .travis.yml
@@ -2,3 +2,14 @@ language: julia
julia:
  - 1.0
  - 1.1
  - 1.2

jobs:
  include:
    - stage: "Documentation"
      julia: 1.2
      os: linux
      script:
        - julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
        - julia --project=docs/ docs/make.jl
      after_success: skip
5 changes: 5 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,5 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"

[compat]
Documenter = "~0.23"
15 changes: 15 additions & 0 deletions docs/make.jl
@@ -0,0 +1,15 @@
using Documenter, GenomicAnnotations

makedocs(sitename = "GenomicAnnotations.jl", authors = "Karl Dyrhage",
    pages = [
        "index.md",
        "I/O" => "io.md",
        "Accessing and modifying annotations" => "accessing.md",
        "Filtering: the @genes macro" => "genes.md"
    ],
    format = Documenter.HTML(prettyurls = get(ENV, "CI", nothing) == "true")
)

# deploydocs(
#     repo = "github.com/kdyrhage/GenomicAnnotations.jl.git",
# )
57 changes: 57 additions & 0 deletions docs/src/accessing.md
@@ -0,0 +1,57 @@
# Accessing and modifying annotations

# Features
Features (genes) can be added using `addgene!`. A feature must have a feature name and a locus (position), and can have any number of additional qualifiers associated with it (see next section).
```@docs
addgene!
```

After adding a new feature, `sort!` can be used to make sure that the annotations are stored (and printed) in the order in which they occur on the chromosome:
```julia
sort!(chr)
```
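Conceptually, this orders features by where they begin on the chromosome. The sketch below shows that ordering in plain Julia on a hypothetical `Feature` struct (not a GenomicAnnotations type); the package's own `sort!` may apply additional tie-breaking rules.

```julia
# Hypothetical stand-in for an annotated feature; not a GenomicAnnotations type.
struct Feature
    name::String
    locus::UnitRange{Int}
end

features = [
    Feature("CDS", 1200:2100),
    Feature("regulatory", 670:677),
    Feature("rRNA", 3000:4500),
    Feature("tRNA", 150:225),
]

# Order features by their start position on the chromosome.
sort!(features, by = f -> first(f.locus))

println([f.name for f in features])   # ["tRNA", "regulatory", "CDS", "rRNA"]
```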

Existing features can be removed using `delete!`:
```@docs
delete!(::Gene)
delete!(::AbstractVector{Gene})
```

# Qualifiers
Features can have multiple qualifiers, which can be modified using Julia's property syntax:
```julia
# Replace newlines in gene product descriptions with spaces
for gene in @genes(chr, iscds)
    gene.product = replace(gene.product, '\n' => ' ')
end
```
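Julia strings are immutable, so a description cannot be edited in place: `replace` builds a new string, which then has to be assigned back to the qualifier. A minimal, package-independent illustration (the product text is made up):

```julia
product = "DNA-directed RNA polymerase\nsubunit beta"

# replace returns a new string; `product` itself is left unchanged.
cleaned = replace(product, '\n' => ' ')

println(cleaned)   # DNA-directed RNA polymerase subunit beta
```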

Properties also work on views of genes, typically generated using `@genes`:
```julia
interestinggenes = readlines("/path/to/list/of/interesting/genes.txt")
@genes(chr, iscds, :locus_tag in interestinggenes).interesting .= true
```
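If the list is long, storing it in a `Set` makes each membership test a hash lookup instead of a linear scan. The sketch below assumes the file holds one locus tag per line and reads from an in-memory buffer so that it runs on its own (the tags are made up). Since `in` works on any collection, the resulting set can presumably be used in the `@genes` condition in place of the vector.

```julia
# Stand-in for readlines("/path/to/list/of/interesting/genes.txt"),
# assuming one locus tag per line.
io = IOBuffer("ABC_0001\nABC_0042\n\nABC_0107 \n")

# Strip whitespace, skip blank lines, and collect the tags into a Set.
interestinggenes = Set(String(strip(line)) for line in readlines(io) if !isempty(strip(line)))

println("ABC_0042" in interestinggenes)   # true
println("ABC_0002" in interestinggenes)   # false
```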

Sometimes features have multiple instances of the same qualifier, such as genes with several EC numbers. Assigning a qualifier with property syntax overwrites any data previously stored for that feature, and assigning a vector of values to a qualifier that currently stores scalars results in an error. To safely add qualifiers that might have multiple instances, use `pushproperty!`:
```@docs
pushproperty!
```
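To make the difference between overwriting and appending concrete, here is a sketch on a plain `Dict` of qualifiers. The `pushqualifier!` helper is hypothetical and only mimics the behaviour described above; GenomicAnnotations stores qualifiers differently.

```julia
# Hypothetical model of one feature's qualifiers (not the package's storage).
qualifiers = Dict{Symbol,Any}(:locus_tag => "ABC_0042", :EC_number => "2.7.7.6")

# Hypothetical helper: append a value, promoting an existing scalar to a vector.
function pushqualifier!(q::Dict{Symbol,Any}, key::Symbol, value)
    if !haskey(q, key)
        q[key] = value
    elseif q[key] isa AbstractVector
        push!(q[key], value)
    else
        q[key] = [q[key], value]
    end
    return q
end

pushqualifier!(qualifiers, :EC_number, "2.7.7.7")
println(qualifiers[:EC_number])   # ["2.7.7.6", "2.7.7.7"]

# Plain assignment, by contrast, replaces whatever was stored before.
qualifiers[:EC_number] = "3.1.11.1"
println(qualifiers[:EC_number])   # 3.1.11.1
```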

Accessing properties that haven't been stored returns `missing`. For this reason, it often makes more sense to use `get()` than to access the property directly:
```julia
# chr.genes[2].pseudo returns missing, so this will throw an error
if chr.genes[2].pseudo
    println("Gene 2 is a pseudogene")
end

# ... but this works:
if get(chr.genes[2], :pseudo, false)
    println("Gene 2 is a pseudogene")
end
```
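The error in the first case comes from Julia itself: an `if` condition must be a `Bool`, and `missing` is not one. The same point in plain Julia, with `coalesce` from Base as another way to supply a default:

```julia
pseudo = missing   # what reading an unset qualifier yields

# Julia requires a Bool in an `if` condition, so `missing` raises a TypeError.
threw = try
    if pseudo
        println("pseudogene")
    end
    false
catch err
    err isa TypeError
end
println(threw)      # true

# coalesce replaces missing with an explicit default.
ispseudo = coalesce(pseudo, false)
println(ispseudo)   # false
```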

# Sequences
The sequence of a `Chromosome` `chr` is stored in `chr.sequence`. Sequences of individual features can be read with `sequence`:
```@docs
sequence(::Gene)
```
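By GenBank convention, the sequence of a feature on the complement strand is the reverse complement of the annotated region. The sketch below shows that idea on plain strings with a hypothetical `featureseq` helper; the package's `sequence` works on BioSequences types and may differ in detail, for example for joined loci.

```julia
# Hypothetical helpers on plain strings; assumes an uppercase A/C/G/T/N sequence.
const COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G', 'N' => 'N')

revcomp(s::AbstractString) = String([COMPLEMENT[c] for c in reverse(s)])

# Extract the 1-based, inclusive `locus` from `chrseq`, reverse-complementing
# features annotated on the complement strand.
function featureseq(chrseq::AbstractString, locus::UnitRange{Int}, complement::Bool)
    s = chrseq[locus]
    return complement ? revcomp(s) : String(s)
end

chrseq = "TTATGAAACCCTAGGG"
println(featureseq(chrseq, 3:14, false))   # ATGAAACCCTAG
println(featureseq(chrseq, 3:14, true))    # CTAGGGTTTCAT
```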
13 changes: 13 additions & 0 deletions docs/src/genes.md
@@ -0,0 +1,13 @@
# Filtering: the @genes macro

A useful tool provided by GenomicAnnotations is the macro `@genes`. It is used to filter annotations, for example to look only at coding sequences or rRNAs, which can then be modified or iterated over:
```julia
# Print locus tags of all coding sequences longer than 1000 nt that are not pseudogenes
for gene in @genes(chr, iscds, length(gene) > 1000, ! :pseudo)
    println(gene.locus_tag)
end
```
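Conceptually, the conditions passed to `@genes` act like a predicate given to `filter`. The sketch below expresses the example above with Base's `filter` over hypothetical named tuples; it is an analogy, not how the macro is implemented, and it treats a `missing` `:pseudo` qualifier as "not a pseudogene".

```julia
# Hypothetical records standing in for annotated features.
features = [
    (feature = "CDS",  locus = 1:1500,    locus_tag = "ABC_0001", pseudo = missing),
    (feature = "CDS",  locus = 1600:1900, locus_tag = "ABC_0002", pseudo = missing),
    (feature = "CDS",  locus = 2000:3800, locus_tag = "ABC_0003", pseudo = true),
    (feature = "rRNA", locus = 4000:5500, locus_tag = "ABC_0004", pseudo = missing),
]

# Roughly the conditions of @genes(chr, iscds, length(gene) > 1000, ! :pseudo):
# a CDS, longer than 1000 nt, and not flagged as a pseudogene.
selected = filter(features) do f
    f.feature == "CDS" && length(f.locus) > 1000 && f.pseudo !== true
end

for f in selected
    println(f.locus_tag)   # only ABC_0001 passes all three conditions
end
```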

```@docs
@genes
```
70 changes: 70 additions & 0 deletions docs/src/index.md
@@ -0,0 +1,70 @@
# GenomicAnnotations.jl

## Description
GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank format.


## Installation
```julia
julia> ]
pkg> add GenomicAnnotations
```
or
```julia
using Pkg
Pkg.add("GenomicAnnotations")
```


## Examples
GenBank files are read with `readgbk(pathtofile)`, which returns a vector of `Chromosome`s. The file can be gzipped as long as the filename ends in ".gz". If we're only interested in the first chromosome in `example.gbk`, we only need to store the first element.
```julia
chr = readgbk("test/example.gbk")[1]
```

`Chromosome`s have five fields: `name`, `header`, `genes`, `genedata`, and `sequence`. The `name` is read from the `header`, which is stored as a string. The annotation data is stored in `genedata`, but generally you should use `genes` to access it, for example to iterate over annotations and modify them:
```julia
for gene in chr.genes
    gene.locus_tag = "$(chr.name)_$(gene.locus_tag)"
end

chr.genes[2].locus_tag = "test123"
```

The macro `@genes` can be used to filter through the annotations. Within the macro, the name `gene` refers to each individual `Gene` being tested. `@genes` can also be used to modify annotations.
```julia
@genes(chr, length(gene) > 300) # Returns all features longer than 300 nt
```

Gene sequences can be accessed with `sequence(gene)`. For example, the following code will write the translated sequences of all protein-coding genes to a file:
```julia
using FASTX, BioSequences  # FASTX for FASTA output, BioSequences for `translate`
writer = FASTA.Writer(open("proteins.fasta", "w"))
for gene in @genes(chr, iscds)
    aaseq = translate(sequence(gene))
    write(writer, FASTA.Record(gene.locus_tag, get(gene, :product, ""), aaseq))
end
close(writer)
```
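To see what a FASTA record looks like, or to avoid the FASTX dependency in simple cases, a record can also be written by hand. The `writefasta` helper below is hypothetical: it writes a `>identifier description` header followed by the sequence wrapped at `width` characters, and it assumes the sequence is already an ASCII string.

```julia
# Hypothetical helper (not part of FASTX or GenomicAnnotations): writes one FASTA
# record as a ">identifier description" header plus the sequence wrapped at `width`.
function writefasta(io::IO, identifier::AbstractString, description::AbstractString,
                    seq::AbstractString; width::Int = 60)
    header = isempty(description) ? identifier : string(identifier, " ", description)
    println(io, ">", header)
    # Slicing by index is safe here because the sequence is assumed to be ASCII.
    for start in 1:width:length(seq)
        println(io, seq[start:min(start + width - 1, length(seq))])
    end
end

buf = IOBuffer()
writefasta(buf, "ABC_0001", "hypothetical protein", "MKTAYIAKQRQISFVKSHFSRQ"; width = 10)
out = String(take!(buf))
print(out)
```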

Genes can be added using `addgene!`, and `sort!` can be used to make sure that the resulting annotations are in the correct order for printing. `delete!` is used to remove genes.
```julia
newgene = addgene!(chr, "regulatory", 670:677)
newgene.locus_tag = "reg02"
sort!(chr.genes)

# Genes can be deleted. This works for all genes where `:pseudo` is `true`, and ignores genes where it is `false` or `missing`
delete!(@genes(chr, :pseudo))
# Delete all genes 60 nt or shorter
delete!(@genes(chr, length(gene) <= 60))
```

Individual genes and `Vector{Gene}`s are printed in GenBank format. To also include the GenBank header and the nucleotide sequence, use `printgbk(io, chr)`, for example to write the annotations to a file:
```julia
println(chr.genes[1])
println(@genes(chr, iscds))

open("updated.gbk", "w") do f
    printgbk(f, chr)
end
```
15 changes: 15 additions & 0 deletions docs/src/io.md
@@ -0,0 +1,15 @@
# I/O

## Input
Annotation files are read with `readgbk(pathtofile)`. Currently this assumes that the file follows standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2).

```@docs
readgbk(file)
```
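Locations in the feature table use a small notation such as `670..677` or `complement(3300..4037)`. As an illustration of that notation only (not of `readgbk`'s parser), a hypothetical `parselocation` covering the simplest forms might look like this:

```julia
# Hypothetical parser covering only "start..end", "complement(start..end)" and a
# single base "n". Joins, partial ends ("<", ">") and the other operators in the
# feature table specification are not handled.
function parselocation(loc::AbstractString)
    m = match(r"^complement\((.*)\)$", loc)
    complement = m !== nothing
    inner = complement ? m.captures[1] : loc
    parts = split(inner, "..")
    if length(parts) == 2
        r = parse(Int, parts[1]):parse(Int, parts[2])
    elseif length(parts) == 1
        n = parse(Int, parts[1])
        r = n:n
    else
        throw(ArgumentError("unsupported location: $loc"))
    end
    return (range = r, complement = complement)
end

println(parselocation("670..677"))
println(parselocation("complement(3300..4037)"))
```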

## Output
Annotations can be printed with GenBank formatting using `printgbk`. In the REPL, instances of `Gene` are displayed as they would appear in the annotation file.

```@docs
printgbk
```
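In the GenBank feature table, the feature key starts in column 6, and the location and each `/qualifier=` line start in column 22. The hypothetical `formatfeature` helper below reproduces that layout for simple cases; it is not `printgbk`'s implementation, does not wrap long values, and quotes every value (real files leave some, such as `/codon_start=1`, unquoted).

```julia
# Hypothetical formatter: feature key from column 6, location and qualifiers
# from column 22. Values are not wrapped and are always quoted.
function formatfeature(key::AbstractString, location::AbstractString, qualifiers)
    lines = [rpad(" "^5 * key, 21) * location]
    for (name, value) in qualifiers
        push!(lines, " "^21 * "/" * string(name) * "=\"" * string(value) * "\"")
    end
    return join(lines, "\n")
end

out = formatfeature("regulatory", "670..677", [:locus_tag => "reg02", :note => "example"])
println(out)
```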
