Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Use duckdb as backend instead of sqlite #59

Merged
merged 9 commits into from
Apr 3, 2024

Conversation

thomas-reimonn
Copy link
Contributor

@thomas-reimonn thomas-reimonn commented Apr 2, 2024

This PR adds a backend (default: sqlite, optional: duckdb). Also it forces ordering by columns requested in order.

Benchmarking:

Running tests using DuckDB backend
pytest 40.40s user 7.76s system 196% cpu 24.544 total
Running tests using Sqlite3
pytest 39.02s user 10.77s system 97% cpu 51.043 total

DuckDB is twice as fast because it used two cores, but about the same number of cpu cycles.

@thomas-reimonn
Copy link
Contributor Author

thomas-reimonn commented Apr 2, 2024

Resolves issue #58

@ivirshup ivirshup added the performance 🐌 go brrrrr label Apr 3, 2024
@ivirshup
Copy link
Member

ivirshup commented Apr 3, 2024

It looks like duckdb isn't consistent about the ordering even when it's called the same way, I think that's going to cause bugs in peoples code.

I'd suggest that we order the results before returning them.

I also want to discuss scope of this a bit, but will continue that in #58

Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have a question about what the right way to order is, but we could probably merge this before figuring that out.

Do the bioconductor packages enforce a particular ordering?

@@ -295,7 +308,7 @@ def _build_query(
else:
query = self.db.table(table)
# add filter
query = query.filter(filter.convert()).select(cols)
query = query.filter(filter.convert()).select(cols).order_by(cols)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nvictus does this ordering look reasonable? Or is there a specific thing we should be ordering by?

@ivirshup
Copy link
Member

ivirshup commented Apr 3, 2024

I'm going to merge this an open an issue to discuss ordering

@ivirshup ivirshup merged commit 1c2a953 into scverse:main Apr 3, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants