Use GTFs in TnSeq analysis #9

Open

jsa-aerial opened this issue May 16, 2018 · 0 comments

Owner

jsa-aerial commented May 16, 2018

calc_fitness and aggregate both use gbk (GenBank) parsing to determine annotations, but this has two problems:

- It introduces a dependency on bioperl and biopython, and nothing else in these scripts uses them.
- Neither of them properly parses gbks with multiple locus entries, for example a whole genome plus some associated plasmids.

Using GTFs:

- eliminates these dependencies, making installation simpler
- simplifies the 'parse': it is basically just a csv read that picks fields
- makes it easy to create GTFs with multiple locus entries (the 'chromosome' field) from multiple gbks
- lets gbks be kept simple, with a single locus per gbk
- makes runs involving a strain with a whole genome and associated plasmids simple to accommodate
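
As a rough illustration of the "csv read and pick fields" point, here is a minimal Python sketch that uses only the standard library. It is not the code in calc_fitness or aggregate. The function names (`read_gtf`, `parse_attributes`), the choice to key on the `gene_id` attribute, and the sample records are all assumptions made for the example. It relies on the standard GTF layout of nine tab-delimited columns, with the first column (seqname) acting as the 'chromosome' field.

```python
import csv
import io
from collections import defaultdict

# The nine tab-delimited GTF columns, in order.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]


def parse_attributes(field):
    """Split a GTF attribute column such as 'gene_id "x"; gene_name "y";'.

    Simplification: assumes no ';' occurs inside a quoted value.
    """
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def read_gtf(handle, feature="gene"):
    """Group records of one feature type by seqname (the 'chromosome' field).

    Returns {seqname: [(start, end, strand, gene_id), ...]}.
    """
    by_locus = defaultdict(list)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, comment lines, and malformed rows.
        if not row or row[0].startswith("#") or len(row) != len(GTF_COLUMNS):
            continue
        rec = dict(zip(GTF_COLUMNS, row))
        if rec["feature"] != feature:
            continue
        attrs = parse_attributes(rec["attributes"])
        by_locus[rec["seqname"]].append(
            (int(rec["start"]), int(rec["end"]), rec["strand"],
             attrs.get("gene_id", ""))
        )
    return dict(by_locus)


# Hypothetical two-locus input: one chromosome and one plasmid.
SAMPLE = (
    "# hypothetical example records\n"
    'chromosome\texample\tgene\t100\t900\t.\t+\t.\tgene_id "geneA"; gene_name "abcA";\n'
    'chromosome\texample\tCDS\t100\t900\t.\t+\t0\tgene_id "geneA";\n'
    'plasmid1\texample\tgene\t50\t400\t.\t-\t.\tgene_id "geneB";\n'
)

genes = read_gtf(io.StringIO(SAMPLE))
for locus, records in genes.items():
    print(locus, records)
```

Because every record carries its own seqname, a genome plus its plasmids needs no special handling: each locus simply becomes another key in the returned dictionary.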
