Updated docs

BioJulia · Mar 25, 2021 · 7f2ae00 · kdyrhage
1 parent 9125b33

Showing 3 changed files with 39 additions and 14 deletions.
diff --git a/docs/src/examples.md b/docs/src/examples.md
@@ -28,7 +28,11 @@ end
 
 addphobius!(chrs, "phobius.txt")
 
-printgbk("updated_genome.gbk", chrs)
+open(GenBank.Writer, "updated_genome.gbk") do w
+    for chr in chrs
+        write(w, chr)
+    end
+end
 ```
 
 
@@ -38,4 +42,9 @@ Note that GenBank and GFF3 headers do not contain the same information, thus all
 using GenomicAnnotations
 chrs = readgbk("genome.gbk")
 printgff("genome.gff", chrs)
+open(GFF.Writer, "genome.gff") do w
+    for chr in chrs
+        write(w, chr)
+    end
+end
 ```
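The new examples use Julia's do-block form of `open`. Base Julia defines a generic method `open(f::Function, args...)` that calls `open(args...)`, passes the result to `f`, and closes it in a `finally` clause, so a type only needs `Base.open(::Type{T}, path)` and `Base.close` to support this style. The sketch below shows that mechanism with a hypothetical `ToyWriter` type that is not part of GenomicAnnotations; it assumes, rather than confirms, that `GenBank.Writer` and `GFF.Writer` plug into `open` in a comparable way.

```julia
# Hypothetical writer used only to illustrate the do-block pattern;
# it is NOT part of GenomicAnnotations.
mutable struct ToyWriter
    io::IOStream
    nrecords::Int
end

# Opening a ToyWriter opens the underlying file for writing.
Base.open(::Type{ToyWriter}, path::AbstractString) = ToyWriter(open(path, "w"), 0)

# Each "record" is written as a single line.
function Base.write(w::ToyWriter, record)
    println(w.io, record)
    w.nrecords += 1
    return w
end

Base.close(w::ToyWriter) = close(w.io)
Base.isopen(w::ToyWriter) = isopen(w.io)

path = tempname()

# Base's generic open(f, args...) calls open(ToyWriter, path), runs the block,
# and closes the writer afterwards, even if the block throws.
writer = open(ToyWriter, path) do w
    for chr in ["chr1", "chr2"]
        write(w, chr)
    end
    w  # return the writer so it can be inspected after the block
end
```

Because the writer is closed in a `finally` clause, the file is flushed and closed even if `write` throws partway through the loop.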
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -12,12 +12,20 @@ pkg> add GenomicAnnotations
 
 
 ## Examples
-GenBank and GFF3 files are read with `readgbk(input)` and `readgff(input)`, which return vectors of `Chromosome`s. `input` can be an `IOStream` or a file path. GZipped data can be read by setting the keyword `gunzip` to true, which is done automatically if a filename ending in ".gz" is passed as `input`. If we're only interested in the first chromosome in `example.gbk` we only need to store the first element.
+GenBank and GFF3 files are read with `readgbk(input)` and `readgff(input)`, which return vectors of `Record`s. `input` can be an `IOStream` or a file path. Gzipped data can be read by setting the keyword `gunzip=true`, which is done automatically if a filename ending in ".gz" is passed as `input`. If we are only interested in the first chromosome in `example.gbk`, we only need to keep the first record.
 ```julia
 chr = readgbk("test/example.gbk")[1]
 ```
+Another way to read files is to use the corresponding `Reader` directly:
+```julia
+open(GenBank.Reader, "test/example.gbk") do reader
+    for record in reader
+        println(record.name)
+    end
+end
+```
 
-`Chromosome`s have five fields, `name`, `header`, `genes`, `genedata`, and `sequence`. The `name` is read from the `header`, which is stored as a string. The annotation data is stored in `genedata`, but generally you should use `genes` to access that data. For example, it can be used to iterate over annotations, and to modify them.
+`Record`s have five fields: `name`, `header`, `genes`, `genedata`, and `sequence`. The `name` is read from the `header`, which is stored as a string. The annotation data is stored in `genedata`, but generally you should use `genes` to access that data; for example, `genes` can be used to iterate over annotations and to modify them.
 ```julia
 for gene in chr.genes
     gene.locus_tag = "$(chr.name)_$(gene.locus_tag)"
@@ -37,8 +45,8 @@ using BioSequences
 using FASTX
 open(FASTA.Writer, "proteins.fasta") do w
     for gene in @genes(chr, CDS)
-        aaseq = sequence(gene; translate = true)
-        write(w, FASTA.record(gene.locus_tag, get(:product, ""), aaseq))
+        aaseq = GenomicAnnotations.sequence(gene; translate = true)
+        write(w, FASTA.Record(gene.locus_tag, get(gene, :product, ""), aaseq))
     end
 end
 ```
@@ -55,12 +63,12 @@ delete!(@genes(chr, :pseudo))
 delete!(@genes(chr, length(gene) <= 60))
 ```
 
-Individual genes, and `Vector{Gene}`s are printed in GBK format. To include the GBK header and the nucleotide sequence, `printgbk(io, chr)` can be used to write them to a file. `printgff(io, chr)` prints the annotations as GFF3, in which case the GenBank header is lost.
+Individual genes and `Vector{Gene}`s are printed in GBK format. To write them to a file together with the GBK header and the nucleotide sequence, use `write(::GenBank.Writer, chr)`. Use `GFF.Writer` instead to write the annotations as GFF3, in which case the GenBank header is lost.
 ```julia
 println(chr.genes[1])
 println(@genes(chr, CDS))
 
-open("updated.gbk", "w") do f
-    printgbk(f, chr)
+open(GenBank.Writer, "updated.gbk") do w
+    write(w, chr)
 end
 ```
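Both the `for record in reader` loop and the vector returned by `readgbk` rest on Julia's iteration protocol. The sketch below uses a hypothetical line-based `ToyReader` (not part of GenomicAnnotations; every line of the file stands in for a record) to show the two access styles: streaming one record at a time, and collecting everything into a `Vector`, as `readgbk` does.

```julia
# Hypothetical reader used only to illustrate iteration; it is NOT part of
# GenomicAnnotations. Each line of the file is treated as one "record".
struct ToyReader
    io::IOStream
end

Base.open(::Type{ToyReader}, path::AbstractString) = ToyReader(open(path, "r"))
Base.close(r::ToyReader) = close(r.io)

# `for record in reader` calls `iterate` until it returns `nothing`.
function Base.iterate(r::ToyReader, state = nothing)
    eof(r.io) && return nothing
    return (readline(r.io), nothing)
end

# The record count is unknown until the file has been read; records are Strings.
Base.IteratorSize(::Type{ToyReader}) = Base.SizeUnknown()
Base.eltype(::Type{ToyReader}) = String

path = tempname()
write(path, "chr1\nchr2\n")

# Streaming: handle one record at a time.
record_names = String[]
open(ToyReader, path) do reader
    for record in reader
        push!(record_names, record)
    end
end

# Eager: read every record into a Vector; the do-block form also closes the file.
records = open(collect, ToyReader, path)
```

Declaring `Base.SizeUnknown()` matters here: with the default `HasLength()` trait, `collect` would call `length(reader)`, which a streaming reader cannot answer without reading the whole file.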
diff --git a/docs/src/io.md b/docs/src/io.md
@@ -1,19 +1,27 @@
 # I/O
 
 ## Input
-Annotation files are read with `readgbk(input)` or `readgff(input)`. Currently these assume that the file follows either standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2), or [GFF3](https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/). Any metadata in GFF3 files, apart from the header, is ignored.
+Annotation files are read with `GenBank.Reader` and `GFF.Reader`. Currently these assume that the file follows either the standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2) or [GFF3](https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/). Any metadata in GFF3 files, apart from the header, is ignored.
+```julia
+open(GenBank.Reader, "example.gbk") do reader
+    for record in reader
+        println(record.name)  # process each record here
+    end
+end
+```
+`readgbk(input)` and `readgff(input)` are aliases for `collect(open(GenBank.Reader, input))` and `collect(open(GFF.Reader, input))`, respectively.
 
 ```@docs
-readgbk
-readgff
+GenBank.Reader
+GFF.Reader
 ```
 
 ## Output
-Annotations can be printed with GenBank formatting using `printgbk`, and as GFF3 with `printgff`. Headers are not automatically converted between formats; `printgff` only prints the header of the first `Chromosome`, and only if it starts with a `#`, while `printgbk` prints a default header if the stored one starts with `#`.
+Annotations can be printed with GenBank formatting using `GenBank.Writer`, and as GFF3 with `GFF.Writer`. Headers are not automatically converted between formats. `GFF.Writer` prints only the header of the first `Record`, and only if that header starts with `#`; `GenBank.Writer` prints a default header if the stored one starts with `#`.
 
 ```@docs
-printgbk
-printgff
+GenBank.Writer
+GFF.Writer
 ```
 
 In the REPL, instances of `Gene` are displayed as they would be in the annotation file.
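The header rules described under Output are easy to misread, so here they are restated as two small functions. `gff_header` and `genbank_header` are hypothetical helpers written for this illustration, not GenomicAnnotations functions, and headers are modeled as plain strings; the sketch only encodes the behavior described in the text, not how the writers actually implement it.

```julia
# Hypothetical helpers restating the documented header rules;
# they are NOT part of GenomicAnnotations.

# GFF3 output: only the first record's header is considered, and it is
# printed only if it starts with '#'. Returns `nothing` when no header is printed.
function gff_header(headers::Vector{String})
    isempty(headers) && return nothing
    first_header = headers[1]
    return startswith(first_header, "#") ? first_header : nothing
end

# GenBank output: a stored header starting with '#' (as GFF3 directives do)
# is replaced by a default header; any other stored header is kept.
genbank_header(header::String, default::String) =
    startswith(header, "#") ? default : header

gff_header(["##gff-version 3", "LOCUS       chr2"])     # returns "##gff-version 3"
gff_header(["LOCUS       chr1"])                        # returns nothing
genbank_header("##gff-version 3", "LOCUS       default") # returns "LOCUS       default"
```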