Updated docs
kdyrhage committed Mar 25, 2021
1 parent 9125b33 commit 7f2ae00
Showing 3 changed files with 39 additions and 14 deletions.
11 changes: 10 additions & 1 deletion docs/src/examples.md
@@ -28,7 +28,11 @@ end

addphobius!(chrs, "phobius.txt")

printgbk("updated_genome.gbk", chrs)
open(GenBank.Writer, "updated_genome.gbk") do w
    for chr in chrs
        write(w, chr)
    end
end
```


@@ -38,4 +42,9 @@ Note that GenBank and GFF3 headers do not contain the same information, thus all
using GenomicAnnotations
chrs = readgbk("genome.gbk")
printgff("genome.gff", chrs)
open(GFF.Writer, "genome.gff") do w
    for chr in chrs
        write(w, chr)
    end
end
```
22 changes: 15 additions & 7 deletions docs/src/index.md
@@ -12,12 +12,20 @@ pkg> add GenomicAnnotations


## Examples
GenBank and GFF3 files are read with `readgbk(input)` and `readgff(input)`, which return vectors of `Chromosome`s. `input` can be an `IOStream` or a file path. GZipped data can be read by setting the keyword `gunzip` to true, which is done automatically if a filename ending in ".gz" is passed as `input`. If we're only interested in the first chromosome in `example.gbk` we only need to store the first element.
GenBank and GFF3 files are read with `readgbk(input)` and `readgff(input)`, which return vectors of `Record`s. `input` can be an `IOStream` or a file path. GZipped data can be read by setting the keyword `gunzip` to true, which is done automatically if a filename ending in ".gz" is passed as `input`. If we're only interested in the first chromosome in `example.gbk` we only need to store the first record.
```julia
chr = readgbk("test/example.gbk")[1]
```
Another way to read files is to use the corresponding `Reader` directly:
```julia
open(GenBank.Reader, "test/example.gbk") do reader
    for record in reader
        println(record.name)
    end
end
```
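The `gunzip` behaviour described above can be sketched as follows. This is only an illustration based on the description in this paragraph; the helper names `read_gz_path` and `read_gz_stream` are hypothetical and not part of the package.

```julia
using GenomicAnnotations

# Hypothetical helper: a path ending in ".gz" is decompressed automatically,
# according to the paragraph above.
read_gz_path(path::AbstractString) = readgbk(path)

# Hypothetical helper: for an already-open stream there is no filename to
# inspect, so decompression is requested explicitly via the `gunzip` keyword.
function read_gz_stream(path::AbstractString)
    open(path) do io
        readgbk(io; gunzip = true)
    end
end
```

Both helpers would return a vector of records, so `read_gz_path("example.gbk.gz")[1]` (file name assumed) would give the first one.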

`Chromosome`s have five fields, `name`, `header`, `genes`, `genedata`, and `sequence`. The `name` is read from the `header`, which is stored as a string. The annotation data is stored in `genedata`, but generally you should use `genes` to access that data. For example, it can be used to iterate over annotations, and to modify them.
`Record`s have five fields, `name`, `header`, `genes`, `genedata`, and `sequence`. The `name` is read from the `header`, which is stored as a string. The annotation data is stored in `genedata`, but generally you should use `genes` to access that data. For example, it can be used to iterate over annotations, and to modify them.
```julia
for gene in chr.genes
    gene.locus_tag = "$(chr.name)_$(gene.locus_tag)"
@@ -37,8 +45,8 @@ using BioSequences
using FASTX
open(FASTA.Writer, "proteins.fasta") do w
    for gene in @genes(chr, CDS)
        aaseq = sequence(gene; translate = true)
        write(w, FASTA.record(gene.locus_tag, get(:product, ""), aaseq))
        aaseq = GenomicAnnotations.sequence(gene; translate = true)
        write(w, FASTA.Record(gene.locus_tag, get(:product, ""), aaseq))
    end
end
```
@@ -55,12 +63,12 @@ delete!(@genes(chr, :pseudo))
delete!(@genes(chr, length(gene) <= 60))
```

Individual genes, and `Vector{Gene}`s are printed in GBK format. To include the GBK header and the nucleotide sequence, `printgbk(io, chr)` can be used to write them to a file. `printgff(io, chr)` prints the annotations as GFF3, in which case the GenBank header is lost.
Individual genes, and `Vector{Gene}`s are printed in GBK format. To include the GBK header and the nucleotide sequence, `write(::GenBank.Writer, chr)` can be used to write them to a file. Use `GFF.Writer` instead to print the annotations as GFF3, in which case the GenBank header is lost.
```julia
println(chr.genes[1])
println(@genes(chr, CDS))

open("updated.gbk", "w") do f
    printgbk(f, chr)
open(GenBank.Writer, "updated.gbk") do w
    write(w, chr)
end
```
20 changes: 14 additions & 6 deletions docs/src/io.md
@@ -1,19 +1,27 @@
# I/O

## Input
Annotation files are read with `readgbk(input)` or `readgff(input)`. Currently these assume that the file follows either standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2), or [GFF3](https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/). Any metadata in GFF3 files, apart from the header, is ignored.
Annotation files are read with `GenBank.Reader` and `GFF.Reader`. Currently these assume that the file follows either standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2), or [GFF3](https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/). Any metadata in GFF3 files, apart from the header, is ignored.
```julia
open(GenBank.Reader, "example.gbk") do reader
    for record in reader
        do_something()
    end
end
```
`readgbk(input)` and `readgff(input)` are aliases for `collect(open(GenBank.Reader, input))` and `collect(open(GFF.Reader, input))`, respectively.
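To make the difference between the two styles concrete, here is a minimal sketch. `load_all` and `record_names` are hypothetical helper names introduced only for illustration; the calls inside them are the ones shown in this section.

```julia
using GenomicAnnotations

# Hypothetical helper: eager reading, every record is collected into a vector.
load_all(path::AbstractString) = readgbk(path)

# Hypothetical helper: streaming, records are visited one at a time and only
# their names are kept.
function record_names(path::AbstractString)
    names = String[]
    open(GenBank.Reader, path) do reader
        for record in reader
            push!(names, record.name)
        end
    end
    return names
end
```

The streaming form may be preferable when only a summary of each record is needed, since the full vector of records never has to be held at once.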

```@docs
readgbk
readgff
GenBank.Reader
GFF.Reader
```

## Output
Annotations can be printed with GenBank formatting using `printgbk`, and as GFF3 with `printgff`. Headers are not automatically converted between formats; `printgff` only prints the header of the first `Chromosome`, and only if it starts with a `#`, while `printgbk` prints a default header if the stored one starts with `#`.
Annotations can be printed with GenBank formatting using `GenBank.Writer`, and as GFF3 with `GFF.Writer`. Headers are not automatically converted between formats; `GFF.Writer` only prints the header of the first `Record`, and only if it starts with a `#`, while `GenBank.Writer` prints a default header if the stored one starts with `#`.
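As a rough sketch of what this means for format conversion, the helper below reads a GenBank file and writes its records as GFF3. The name `gbk_to_gff` is hypothetical; given the header rules described above, the GenBank header is not expected to carry over into the output.

```julia
using GenomicAnnotations

# Hypothetical helper: convert a GenBank file to GFF3.
# Returns the number of records written.
function gbk_to_gff(inpath::AbstractString, outpath::AbstractString)
    records = readgbk(inpath)
    open(GFF.Writer, outpath) do w
        for record in records
            write(w, record)
        end
    end
    return length(records)
end
```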

```@docs
printgbk
printgff
GenBank.Writer
GFF.Writer
```

In the REPL, instances of `Gene` are displayed as they would be in the annotation file.

2 comments on commit 7f2ae00

@kdyrhage

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/32820

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.2.0 -m "<description of version>" 7f2ae009a9478277481a8c575be33342c4a0bc73
git push origin v0.2.0
