Added documentation with Documenter.jl

BioJulia · Sep 25, 2019 · e895a48
1 parent 74cfe19

Showing 7 changed files with 186 additions and 0 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -2,3 +2,14 @@ language: julia
 julia:
   - 1.0
   - 1.1
+  - 1.2
+
+jobs:
+  include:
+    - stage: "Documentation"
+      julia: 1.2
+      os: linux
+      script:
+        - julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
+        - julia --project=docs/ docs/make.jl
+      after_success: skip
diff --git a/docs/Project.toml b/docs/Project.toml
@@ -0,0 +1,5 @@
+[deps]
+Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
+
+[compat]
+Documenter = "~0.23"
diff --git a/docs/make.jl b/docs/make.jl
@@ -0,0 +1,15 @@
+using Documenter, GenomicAnnotations
+
+makedocs(sitename = "GenomicAnnotations.jl", authors = "Karl Dyrhage",
+    pages = [
+        "index.md",
+        "I/O" => "io.md",
+        "Accessing and modifying annotations" => "accessing.md",
+        "Filtering: the @genes macro" => "genes.md"
+    ],
+    format = Documenter.HTML(prettyurls = get(ENV, "CI", nothing) == "true")
+)
+
+# deploydocs(
+#     repo = "github.com/kdyrhage/GenomicAnnotations.jl.git",
+# )
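Review note on docs/make.jl: the `prettyurls` argument is derived from the `CI` environment variable, so it is only enabled when the build runs on a CI service that sets `CI=true` (as Travis does). The sketch below evaluates the same expression against two hypothetical environments. The helper name `use_pretty_urls` is an assumption made for illustration and is not part of the commit.

```julia
# Same expression as in docs/make.jl, wrapped in a hypothetical helper that
# takes the environment as an argument so it can be tried without touching ENV.
use_pretty_urls(env) = get(env, "CI", nothing) == "true"

# CI variable set to "true", as on a CI service
on_ci = use_pretty_urls(Dict("CI" => "true"))

# CI variable absent, as in a typical local build
local_build = use_pretty_urls(Dict{String,String}())

println("on CI: ", on_ci, ", local: ", local_build)
```

When the variable is absent, `get` returns `nothing`, and comparing `nothing` with a string yields `false`, so local builds fall back to plain `.html` links.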
diff --git a/docs/src/accessing.md b/docs/src/accessing.md
@@ -0,0 +1,57 @@
+# Accessing and modifying annotations
+
+## Features
+Features (genes) can be added using `addgene!`. A feature must have a feature name and a locus (position), and can have any number of additional qualifiers associated with it (see next section).
+```@docs
+addgene!
+```
+
+After adding a new feature, `sort!` can be used to make sure that the annotations are stored (and printed) in the order in which they occur on the chromosome:
+```julia
+sort!(chr)
+```
+
+Existing features can be removed using `delete!`:
+```@docs
+delete!(::Gene)
+delete!(::AbstractVector{Gene})
+```
+
+## Qualifiers
+Features can have multiple qualifiers, which can be modified using Julia's property syntax:
+```julia
+# Replace newlines in gene product descriptions with spaces
+for gene in @genes(chr, iscds)
+    gene.product = replace(gene.product, '\n' => ' ')
+end
+```
+
+Properties also work on views of genes, typically generated using `@genes`:
+```julia
+interestinggenes = readlines("/path/to/list/of/interesting/genes.txt")
+@genes(chr, iscds, :locus_tag in interestinggenes).interesting .= true
+```
+
+Sometimes features have multiple instances of the same qualifier, such as genes with several EC numbers. Assigning a qualifier with property syntax overwrites any data previously stored for that feature, and assigning a vector of values to a qualifier that currently stores scalars results in an error. To safely assign qualifiers that might have more than one instance, use `pushproperty!`:
+```@docs
+pushproperty!
+```
+
+Accessing properties that haven't been stored returns `missing`. For this reason, it often makes more sense to use `get()` with a default value than to access the property directly.
+```julia
+# chr.genes[2].pseudo returns missing, so this will throw an error
+if chr.genes[2].pseudo
+    println("Gene 2 is a pseudogene")
+end
+
+# ... but this works:
+if get(chr.genes[2], :pseudo, false)
+    println("Gene 2 is a pseudogene")
+end
+```
+
+## Sequences
+The sequence of a `Chromosome` `chr` is stored in `chr.sequence`. Sequences of individual features can be read with `sequence`:
+```@docs
+sequence(::Gene)
+```
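Review note on docs/src/accessing.md: the `missing` caveat in the Qualifiers section can be reproduced with plain Julia, without loading the package. The sketch below uses a hypothetical `Dict` as a stand-in for a gene's qualifiers; it only illustrates why `if` fails on `missing` and why a default value avoids the problem, and it says nothing about how `Gene` is implemented.

```julia
# Hypothetical stand-in for a gene's qualifiers; :pseudo has not been stored.
qualifiers = Dict{Symbol,Any}(:locus_tag => "tag01")

# Mimic direct property access, which the docs say returns `missing`
# for qualifiers that have not been stored.
pseudo = get(qualifiers, :pseudo, missing)

# Using `missing` as an `if` condition throws a TypeError,
# because `missing` is not a Bool.
threw = try
    if pseudo
        println("pseudogene")
    end
    false
catch err
    err isa TypeError
end

# Supplying a default keeps the condition a plain Bool.
is_pseudo = get(qualifiers, :pseudo, false)

println("direct access threw: ", threw, ", with default: ", is_pseudo)
```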
diff --git a/docs/src/genes.md b/docs/src/genes.md
@@ -0,0 +1,13 @@
+# Filtering: the @genes macro
+
+A useful tool provided by GenomicAnnotations is the macro `@genes`. It is used to filter through annotations, for example to look only at coding sequences or rRNAs, which can then be modified or iterated over:
+```julia
+# Print locus tags of all coding sequences longer than 1000 nt that are not pseudogenes
+for gene in @genes(chr, iscds, length(gene) > 1000, ! :pseudo)
+    println(gene.locus_tag)
+end
+```
+
+```@docs
+@genes
+```
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -0,0 +1,70 @@
+# GenomicAnnotations.jl
+
+## Description
+GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank format.
+
+
+## Installation
+```julia
+julia>]
+pkg> add GenomicAnnotations
+```
+or
+```julia
+using Pkg
+Pkg.add("GenomicAnnotations")
+```
+
+
+## Examples
+GenBank files are read with `readgbk(pathtofile)`, which returns a vector of `Chromosome`s. The file can be gzipped as long as the filename ends in ".gz". If we're only interested in the first chromosome in `example.gbk`, we only need to store the first element.
+```julia
+chr = readgbk("test/example.gbk")[1]
+```
+
+`Chromosome`s have five fields, `name`, `header`, `genes`, `genedata`, and `sequence`. The `name` is read from the `header`, which is stored as a string. The annotation data is stored in `genedata`, but generally you should use `genes` to access that data. For example, it can be used to iterate over annotations, and to modify them.
+```julia
+for gene in chr.genes
+    gene.locus_tag = "$(chr.name)_$(gene.locus_tag)"
+end
+
+chr.genes[2].locus_tag = "test123"
+```
+
+The macro `@genes` can be used to filter through the annotations. The keyword `gene` is used to refer to the individual `Gene`s. `@genes` can also be used to modify annotations.
+```julia
+@genes(chr, length(gene) > 300) # Returns all features longer than 300 nt
+```
+
+Gene sequences can be accessed with `sequence(gene)`. For example, the following code will write the translated sequences of all protein-coding genes to a file:
+```julia
+using BioSequences, FASTX  # `translate` is provided by BioSequences
+writer = FASTA.Writer(open("proteins.fasta", "w"))
+for gene in @genes(chr, iscds)
+    aaseq = translate(sequence(gene))
+    write(writer, FASTA.Record(gene.locus_tag, get(gene, :product, ""), aaseq))
+end
+close(writer)
+```
+
+Genes can be added using `addgene!`, and `sort!` can be used to make sure that the resulting annotations are in the correct order for printing. `delete!` is used to remove genes.
+```julia
+newgene = addgene!(chr, "regulatory", 670:677)
+newgene.locus_tag = "reg02"
+sort!(chr.genes)
+
+# Genes can be deleted. This works for all genes where `:pseudo` is `true`, and ignores genes where it is `false` or `missing`
+delete!(@genes(chr, :pseudo))
+# Delete all genes 60 nt or shorter
+delete!(@genes(chr, length(gene) <= 60))
+```
+
+Individual genes and `Vector{Gene}`s are printed in GBK format. To include the GBK header and the nucleotide sequence as well, use `printgbk(io, chr)`, for example to write a complete record to a file.
+```julia
+println(chr.genes[1])
+println(@genes(chr, iscds))
+
+open("updated.gbk", "w") do f
+    printgbk(f, chr)
+end
+```
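Review note on docs/src/index.md: the comment above `delete!(@genes(chr, :pseudo))` says that genes whose `:pseudo` qualifier is `false` or `missing` are ignored. The plain-Julia sketch below illustrates that selection rule on a hypothetical vector of named tuples standing in for `chr.genes`. It shows the semantics described in the comment only and is not how `@genes` or `delete!` are implemented.

```julia
# Hypothetical stand-in for chr.genes: a locus tag plus an optional pseudo flag.
genes = [
    (locus_tag = "tag01", pseudo = true),
    (locus_tag = "tag02", pseudo = false),
    (locus_tag = "tag03", pseudo = missing),
]

# Treat `missing` as `false`, so only genes explicitly flagged as pseudo are selected.
ispseudo(gene) = coalesce(gene.pseudo, false)

# Keep everything that is not selected, mirroring a deletion of the pseudogenes.
remaining = filter(!ispseudo, genes)
remaining_tags = [gene.locus_tag for gene in remaining]

println(remaining_tags)
```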
diff --git a/docs/src/io.md b/docs/src/io.md
@@ -0,0 +1,15 @@
+# I/O
+
+## Input
+Annotation files are read with `readgbk(pathtofile)`. Currently this assumes that the file follows standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2).
+
+```@docs
+readgbk(file)
+```
+
+## Output
+Annotations can be printed with GenBank formatting using `printgbk`. In the REPL, instances of `Gene` are displayed as they would be in the annotation file.
+
+```@docs
+printgbk
+```