diff --git a/.travis.yml b/.travis.yml
index 59e444f..7b9e788 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -2,3 +2,14 @@ language: julia
 julia:
   - 1.0
   - 1.1
+  - 1.2
+
+jobs:
+  include:
+    - stage: "Documentation"
+      julia: 1.2
+      os: linux
+      script:
+        - julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
+        - julia --project=docs/ docs/make.jl
+      after_success: skip
diff --git a/docs/Project.toml b/docs/Project.toml
new file mode 100644
index 0000000..3fcbfb7
--- /dev/null
+++ b/docs/Project.toml
@@ -0,0 +1,5 @@
+[deps]
+Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
+
+[compat]
+Documenter = "~0.23"
diff --git a/docs/make.jl b/docs/make.jl
new file mode 100644
index 0000000..3cc40dd
--- /dev/null
+++ b/docs/make.jl
@@ -0,0 +1,15 @@
+using Documenter, GenomicAnnotations
+
+makedocs(sitename = "GenomicAnnotations.jl", authors = "Karl Dyrhage",
+    pages = [
+        "index.md",
+        "I/O" => "io.md",
+        "Accessing and modifying annotations" => "accessing.md",
+        "Filtering: the @genes macro" => "genes.md"
+    ],
+    format = Documenter.HTML(prettyurls = get(ENV, "CI", nothing) == "true")
+)
+
+# deploydocs(
+#     repo = "github.com/kdyrhage/GenomicAnnotations.jl.git",
+# )
diff --git a/docs/src/accessing.md b/docs/src/accessing.md
new file mode 100644
index 0000000..c822d4e
--- /dev/null
+++ b/docs/src/accessing.md
@@ -0,0 +1,57 @@
+# Accessing and modifying annotations
+
+# Feature
+Features (genes) can be added using `addgene!`. A feature must have a feature name and a locus (position), and can have any number of additional qualifiers associated with it (see next section).
+```@docs
+addgene!
+```
+
+After adding a new feature, `sort!` can be used to make sure that the annotations are stored (and printed) in the order in which they occur on the chromosome:
+```julia
+sort!(chr)
+```
+
+Existing features can be removed using `delete!`:
+```@docs
+delete!(::Gene)
+delete!(::AbstractVector{Gene})
+```
+
+# Qualifiers
+Features can have multiple qualifiers, which can be modified using Julia's property syntax:
+```julia
+# Remove newlines from gene product descriptions
+for gene in @genes(chr, iscds)
+    gene.product = replace(gene.product, '\n' => ' ')
+end
+```
+
+Properties also work on views of genes, typically generated using `@genes`:
+```julia
+interestinggenes = readlines("/path/to/list/of/interesting/genes.txt")
+@genes(chr, iscds, :locus_tag in interestinggenes).interesting .= true
+```
+
+Sometimes features have multiple instances of the same qualifier, such as genes having several EC numbers. Assigning a qualifier with property syntax overwrites any data previously stored for that feature, and assigning a vector of values to a qualifier that currently stores scalars results in an error. To safely assign qualifiers that might have multiple instances, use `pushproperty!`:
+```@docs
+pushproperty!
+```
+
+Accessing properties that haven't been stored will return `missing`. For this reason, it often makes more sense to use `get()` than to access the property directly.
+```julia
+# chr.genes[2].pseudo returns missing, so this will throw an error
+if chr.genes[2].pseudo
+    println("Gene 2 is a pseudogene")
+end
+
+# ... but this works:
+if get(chr.genes[2], :pseudo, false)
+    println("Gene 2 is a pseudogene")
+end
+```
+
+# Sequences
+The sequence of a `Chromosome` `chr` is stored in `chr.sequence`.
Sequences of individual features can be read with `sequence`:
+```@docs
+sequence(::Gene)
+```
diff --git a/docs/src/genes.md b/docs/src/genes.md
new file mode 100644
index 0000000..78cd21d
--- /dev/null
+++ b/docs/src/genes.md
@@ -0,0 +1,13 @@
+# Filtering: the @genes macro
+
+A useful tool provided by GenomicAnnotations is the macro `@genes`. It is used to filter annotations, for example to look only at coding sequences or rRNAs, which can then be modified or iterated over:
+```julia
+# Print locus tags of all coding sequences longer than 1000 nt that are not pseudogenes
+for gene in @genes(chr, iscds, length(gene) > 1000, ! :pseudo)
+    println(gene.locus_tag)
+end
+```
+
+```@docs
+@genes
+```
diff --git a/docs/src/index.md b/docs/src/index.md
new file mode 100644
index 0000000..9849b7a
--- /dev/null
+++ b/docs/src/index.md
@@ -0,0 +1,70 @@
+# GenomicAnnotations.jl
+
+## Description
+GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank format.
+
+
+## Installation
+```julia
+julia>]
+pkg> add GenomicAnnotations
+```
+or
+```julia
+using Pkg
+Pkg.add("GenomicAnnotations")
+```
+
+
+## Examples
+GenBank files are read with `readgbk(pathtofile)`, which returns a vector of `Chromosome`s. The file can be gzipped as long as the filename ends in ".gz". If we're only interested in the first chromosome in `example.gbk`, we only need to store the first element.
+```julia
+chr = readgbk("test/example.gbk")[1]
+```
+
+`Chromosome`s have five fields: `name`, `header`, `genes`, `genedata`, and `sequence`. The `name` is read from the `header`, which is stored as a string. The annotation data is stored in `genedata`, but generally you should use `genes` to access that data. For example, it can be used to iterate over annotations and to modify them.
+```julia
+for gene in chr.genes
+    gene.locus_tag = "$(chr.name)_$(gene.locus_tag)"
+end
+
+chr.genes[2].locus_tag = "test123"
+```
+
+The macro `@genes` can be used to filter through the annotations. The keyword `gene` is used to refer to the individual `Gene`s. `@genes` can also be used to modify annotations.
+```julia
+@genes(chr, length(gene) > 300) # Returns all features longer than 300 nt
+```
+
+Gene sequences can be accessed with `sequence(gene)`. For example, the following code will write the translated sequences of all protein-coding genes to a file:
+```julia
+using FASTX
+writer = FASTA.Writer(open("proteins.fasta", "w"))
+for gene in @genes(chr, iscds)
+    aaseq = translate(sequence(gene))
+    write(writer, FASTA.Record(gene.locus_tag, get(gene, :product, ""), aaseq))
+end
+close(writer)
+```
+
+Genes can be added using `addgene!`, and `sort!` can be used to make sure that the resulting annotations are in the correct order for printing. `delete!` is used to remove genes.
+```julia
+newgene = addgene!(chr, "regulatory", 670:677)
+newgene.locus_tag = "reg02"
+sort!(chr.genes)
+
+# Delete all genes where `:pseudo` is `true`; genes where it is `false` or `missing` are ignored
+delete!(@genes(chr, :pseudo))
+# Delete all genes 60 nt or shorter
+delete!(@genes(chr, length(gene) <= 60))
+```
+
+Individual genes and `Vector{Gene}`s are printed in GBK format. To include the GBK header and the nucleotide sequence, use `printgbk(io, chr)`, for example to write them to a file.
+```julia
+println(chr.genes[1])
+println(@genes(chr, iscds))
+
+open("updated.gbk", "w") do f
+    printgbk(f, chr)
+end
+```
diff --git a/docs/src/io.md b/docs/src/io.md
new file mode 100644
index 0000000..aaaa386
--- /dev/null
+++ b/docs/src/io.md
@@ -0,0 +1,15 @@
+# I/O
+
+## Input
+Annotation files are read with `readgbk(pathtofile)`. Currently this assumes that the file follows standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2).
+
+```@docs
+readgbk(file)
+```
+
+## Output
+Annotations can be printed with GenBank formatting using `printgbk`. In the REPL, instances of `Gene` are displayed as they would be in the annotation file.
+
+```@docs
+printgbk
+```