Addition of a new sub-module 'status' to assembly module. #334

Merged
merged 46 commits into from
Apr 10, 2024

Conversation

ens-LCampbell
Member

This submodule allows users to track the assembly 'status' of INSDC accessions:

  • Status results are obtained from assembly summary reports (JSON).
  • Users can request status checks by providing either A) a list of core database(s); or B) a list of GCA/GCF accessions.
  • Status tracking is achieved via a containerised NCBI 'datasets' tool, with Singularity integrated into Python (via spython).
  • NCBI datasets is set up to perform batched searches; the batch size can be customised and is currently set to n=100.
  • The results of the status check are stored as a TSV file for user inspection.
  • Where it exists, the associated 'paired' assembly of each given INSDC accession is also retrieved.
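
For readers unfamiliar with the batching and TSV steps listed above, here is a minimal, illustrative sketch. It is not the code from this PR: the function names, the accession pattern check and the TSV column names (`accession`, `status`, `paired_accession`) are assumptions made for the example only.

```python
import csv
import io
import re
from typing import Dict, Iterator, List

# Assembly accessions look like GCA_000001405.29 or GCF_000001405.40.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def batch_accessions(accessions: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches of well-formed accessions (hypothetical helper)."""
    valid = [acc for acc in accessions if ACCESSION_RE.match(acc)]
    for start in range(0, len(valid), batch_size):
        yield valid[start : start + batch_size]


def status_to_tsv(status: Dict[str, Dict[str, str]]) -> str:
    """Render {accession: {field: value}} as TSV text (column names are assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "status", "paired_accession"])
    for accession, fields in sorted(status.items()):
        # Missing fields become empty cells, so a partial report still writes cleanly.
        writer.writerow(
            [accession, fields.get("status", ""), fields.get("paired_accession", "")]
        )
    return buffer.getvalue()
```

Each batch would then be passed to a single 'datasets' query, which keeps individual requests small while still covering long accession lists.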

This PR comprises the first iteration of assembly status tracking (i.e. version 1.0), with expanded functionality planned in later versions.

For other details on functionality see Jira ticket ENSMETAZOA-167

Contributor

@MatBarba MatBarba left a comment

Works well and fast, nice one!

Lots of comments but mostly nitpicking and suggestions.

The main issue I had is that you need singularity installed, is there a way to run the system datasets instead?
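
One possible shape for such a fallback, sketched here purely as an illustration and not taken from the PR: prefer a `datasets` executable found on `PATH` and only wrap the call in `singularity exec` when it is missing. The function name, the `image_path` argument and the injectable `which` parameter are hypothetical, and the sketch builds a plain command list rather than using spython as the PR does.

```python
import shutil
from typing import Callable, List, Optional


def build_datasets_command(
    accessions: List[str],
    image_path: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the command to run, preferring a system-wide 'datasets' binary."""
    query = ["datasets", "summary", "genome", "accession", *accessions]
    if which("datasets"):
        # A local installation is available, so no container is needed.
        return query
    if which("singularity"):
        # Fall back to running the same query inside the container image.
        return ["singularity", "exec", image_path, *query]
    raise RuntimeError("Neither 'datasets' nor 'singularity' was found on PATH")
```

The returned list could be handed to `subprocess.run(..., capture_output=True, check=True)`; passing `which` as a parameter keeps the selection logic testable without either tool installed.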

containers/ncbi_datasets_v16.10.0.def (resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
containers/ncbi_datasets_v16.10.0.def (resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
ens-LCampbell and others added 4 commits March 26, 2024 13:51
Code refinement, superfluous condition

Co-authored-by: Matthieu Barba <mbarba@ebi.ac.uk>
refinement on condition

Co-authored-by: Matthieu Barba <mbarba@ebi.ac.uk>
Fix typos

Co-authored-by: Matthieu Barba <mbarba@ebi.ac.uk>
typo fix

Co-authored-by: Matthieu Barba <mbarba@ebi.ac.uk>
Contributor

@JAlvarezJarreta JAlvarezJarreta left a comment

This is an extremely useful piece of functionality, nice work.

I may want to revisit at some point whether we keep this as part of our library or as a separate item, for the single reason that this will not work inside a container (container within container). But for now, welcome to GenomIO!

containers/ncbi_datasets_v16.10.0.def (outdated, resolved)
containers/ncbi_datasets_v16.10.0.def (resolved)
pyproject.toml (resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
Contributor

@JAlvarezJarreta JAlvarezJarreta left a comment

Just a few minor edits and suggestions.

src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
src/python/ensembl/io/genomio/assembly/status.py (outdated, resolved)
ens-LCampbell and others added 13 commits April 10, 2024 10:20
Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
add newline

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
change shorthand qry -> query

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
qry -> query

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
qry->query

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
capitalisation to formatted string

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
Warning message improvement

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
Remove batch size comment, not needed default set

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
shorthand wording fix

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
wording change

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
fix to formatted string and list parentheses

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
runtime error instead of critical logging

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
ens-LCampbell and others added 6 commits April 10, 2024 11:45
Improvement to print_json call()

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
shorthand wording fix

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
allow empty dict and check in TSV creation instead

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
py3.8 dictionary dict -> Dict declaration

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
punctuation

Co-authored-by: J. Alvarez-Jarreta <jalvarez@ebi.ac.uk>
@ens-LCampbell ens-LCampbell merged commit b483b5b into main Apr 10, 2024
1 check passed
4 participants