Allow readgbk to take an IOStream
kdyrhage committed Dec 12, 2019
1 parent 7ca1c04 commit fa47b8a
Showing 4 changed files with 34 additions and 32 deletions.
9 changes: 3 additions & 6 deletions README.md
@@ -5,19 +5,16 @@
GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank format.

## Installation
GenomicAnnotations depends on [BioSequences](https://github.com/BioJulia/BioSequences.jl), which is registered in [BioJuliaRegistry](https://github.com/BioJulia/BioJuliaRegistry). To install it you must first add the registry to Julia's package manager:
```julia
julia>]
pkg> registry add https://github.com/BioJulia/BioJuliaRegistry.git
pkg> add GenomicAnnotations
```
or
```julia
using Pkg
Pkg.add("GenomicAnnotations")
```


## Usage
-GenBank files are read with `readgbk(gbkfile)`, which returns a vector of `Chromosome`s. `gbkfile` can be gzipped as long as the filename ends in ".gz". If we're only interested in the first chromosome in `example.gbk` we only need to store the first element.
+GenBank files are read with `readgbk(input)`, which returns a vector of `Chromosome`s. `input` can be an `IOStream` or a file path. GZipped data can be read by setting the keyword `gunzip` to true, which is done automatically if a filename ending in ".gz" is passed as `input`. If we're only interested in the first chromosome in `example.gbk` we only need to store the first element.
```julia
chr = readgbk("test/example.gbk")[1]
```
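Because `readgbk` now has one method for file paths and one for streams, anything that yields an `IO` can be parsed. The sketch below shows that two-method pattern using only Base Julia. `countloci` is a hypothetical stand-in for `readgbk` that merely counts `LOCUS` lines (each GenBank record starts with one); it does not parse features and ignores gzip entirely.

```julia
# Hypothetical stand-in for readgbk: counts GenBank records by their LOCUS lines.
# The IO method does the actual work on any stream (file, IOBuffer, pipe, ...).
function countloci(input::IO)
    count(line -> startswith(line, "LOCUS"), eachline(input))
end

# The path method only opens the file and forwards the stream; the do-block
# form of `open` closes the file even if parsing throws.
function countloci(path::AbstractString)
    open(path) do io
        countloci(io)
    end
end

# Usage: the same function works on an in-memory buffer and on a file on disk.
text = """
LOCUS       seq1   10 bp    DNA
//
LOCUS       seq2   12 bp    DNA
//
"""
n_buffer = countloci(IOBuffer(text))

path, io = mktemp()
write(io, text)
close(io)
n_file = countloci(path)
println((n_buffer, n_file))  # prints (2, 2)
```

Forwarding from the path method to the stream method keeps all parsing in one place, which mirrors the split between `readgbk(filename::AbstractString, ...)` and `readgbk(input::IO, ...)` in this commit.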
9 changes: 3 additions & 6 deletions docs/src/index.md
Expand Up @@ -5,19 +5,16 @@ GenomicAnnotations is a package for reading, modifying, and writing genomic anno


## Installation
GenomicAnnotations depends on [BioSequences](https://github.com/BioJulia/BioSequences.jl), which is registered in [BioJuliaRegistry](https://github.com/BioJulia/BioJuliaRegistry). To install it you must first add the registry to Julia's package manager:
```julia
julia>]
pkg> registry add https://github.com/BioJulia/BioJuliaRegistry.git
pkg> add GenomicAnnotations
```
or
```julia
using Pkg
Pkg.add("GenomicAnnotations")
```


## Examples
-GenBank files are read with `readgbk(pathtofile)`, which returns a vector of `Chromosome`s. `gbkfile` can be gzipped as long as the filename ends in ".gz". If we're only interested in the first chromosome in `example.gbk` we only need to store the first element.
+GenBank files are read with `readgbk(input)`, which returns a vector of `Chromosome`s. `input` can be an `IOStream` or a file path. GZipped data can be read by setting the keyword `gunzip` to true, which is done automatically if a filename ending in ".gz" is passed as `input`. If we're only interested in the first chromosome in `example.gbk` we only need to store the first element.
```julia
chr = readgbk("test/example.gbk")[1]
```
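The decompression rule stated above (decompress when `gunzip = true`, and automatically for a file name ending in ".gz") can be expressed as a small predicate. `shouldgunzip` is a hypothetical helper, not part of GenomicAnnotations; it only shows the decision, not the decompression itself. Since a stream carries no file name, an `IO` argument is only decompressed when the caller passes `gunzip = true`.

```julia
# Hypothetical helper (not part of GenomicAnnotations) expressing the rule:
# decompress if the caller asks for it, or if a file name ends in ".gz".
shouldgunzip(path::AbstractString; gunzip::Bool = false) =
    gunzip || endswith(path, ".gz")

# A stream has no file name, so only the keyword can request decompression.
shouldgunzip(::IO; gunzip::Bool = false) = gunzip

println(shouldgunzip("test/example.gbk"))                  # false
println(shouldgunzip("test/example.gbk.gz"))               # true
println(shouldgunzip("test/example.gbk"; gunzip = true))   # true
println(shouldgunzip(IOBuffer("LOCUS"); gunzip = false))   # false
```

Using `endswith` is a deliberate choice here: a check written as `filename[end-2:end] == ".gz"` throws a `BoundsError` for names shorter than three characters, whereas `endswith` simply returns `false`.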
2 changes: 1 addition & 1 deletion docs/src/io.md
@@ -1,7 +1,7 @@
# I/O

## Input
-Annotation files are read with `readgbk(pathtofile)`. Currently this assumes that the file follows standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2).
+Annotation files are read with `readgbk(input)`. Currently this assumes that the file follows standard [GenBank format](http://www.insdc.org/files/feature_table.html#7.1.2).

```@docs
readgbk(file)
46 changes: 27 additions & 19 deletions src/readgbk.jl
@@ -175,28 +175,36 @@ end
 
 
 """
-    readgbk(filename)
+    readgbk(input, G::Type = Gene; gunzip = false)
-Parse GenBank-formatted file `filename`, returning a `Vector{Chromosome}`.
+Parse GenBank-formatted file, returning a `Vector{Chromosome}`. `input` can be a file path or an `IOStream`. File names ending in ".gz" are assumed to be gzipped and are decompressed. Setting `gunzip` to `true` forces this behaviour.
 The type of `AbstractGene` to be used can be specified with `G`, though currently the only option is `Gene`.
 """
-function readgbk(filename, G::Type = Gene)
+function readgbk(filename::AbstractString, G::Type = Gene; gunzip = false)
     gz = filename[end-2:end] == ".gz"
-    finished = false
-    chrs = Chromosome{G}[]
-    if gz
-        f = GZip.open(filename)
+    if gz || gunzip
+        GZip.open(f -> readgbk(f, G), filename)
     else
-        f = open(filename)
+        open(f -> readgbk(f, G), filename)
     end
-    lines = readlines(f)
-    currentline = 1
-    while !finished
-        if currentline >= length(lines)
-            break
-        end
-        i, chr = parsechromosome(lines[currentline:end], G)
-        currentline += i
-        push!(chrs, chr)
-    end
-    return chrs
 end
+
+function readgbk(input::IO, G::Type = Gene; gunzip = false)
+    finished = false
+    chrs = Chromosome{G}[]
+    if gunzip
+        lines = readlines(gzdopen(input))
+    else
+        lines = readlines(input)
+    end
+    currentline = 1
+    while !finished
+        if currentline >= length(lines)
+            break
+        end
+        i, chr = parsechromosome(lines[currentline:end], G)
+        currentline += i
+        push!(chrs, chr)
+    end
+    chrs
+end
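The new `IO` method keeps the original parsing loop: parse one chromosome from the remaining lines, advance `currentline` by the number of lines consumed, and repeat. A self-contained sketch of that loop follows. `parserecord` is a hypothetical stand-in for `parsechromosome` that treats everything up to and including a `//` line as one record (GenBank records end with `//`) and returns only the `LOCUS` name. Unlike the committed loop, which stops once `currentline >= length(lines)`, this sketch runs until every line is consumed and skips blank lines; that handling of trailing lines is an assumption.

```julia
# Hypothetical stand-in for parsechromosome: consumes lines up to and including
# the "//" record terminator and returns (lines consumed, LOCUS name).
function parserecord(lines::AbstractVector{<:AbstractString})
    stop = something(findfirst(==("//"), lines), length(lines))
    locus = findfirst(l -> startswith(l, "LOCUS"), lines[1:stop])
    parts = locus === nothing ? String[] : split(lines[locus])
    name = length(parts) >= 2 ? String(parts[2]) : ""
    return stop, name
end

# Same loop shape as readgbk: parse from the current line, advance by the
# number of lines consumed, and collect one result per record.
function readrecords(input::IO)
    lines = readlines(input)
    records = String[]
    currentline = 1
    while currentline <= length(lines)
        if isempty(strip(lines[currentline]))
            currentline += 1          # skip blank lines between or after records
            continue
        end
        n, name = parserecord(view(lines, currentline:length(lines)))
        currentline += n
        push!(records, name)
    end
    return records
end

text = """
LOCUS       seq1   10 bp    DNA
ORIGIN
//
LOCUS       seq2   12 bp    DNA
//
"""
println(readrecords(IOBuffer(text)))  # prints ["seq1", "seq2"]
```

Passing a `view` instead of `lines[currentline:end]` avoids copying the remaining lines on every iteration; the committed code slices, which copies.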
