ENH: Dask: sort and argsort #239

Merged: 1 commit, Jan 26, 2025

@crusaderky (Contributor) commented Jan 23, 2025

A crude implementation of sort and argsort for dask.array. It is functionally correct but can be extremely memory- and network-intensive.

A better solution would be to implement these two functions in dask.array itself, on top of the shuffle subsystem which is already used for dask.dataframe.DataFrame.sort_values.
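
For illustration, here is a minimal sketch of what a crude but correct approach can look like. It assumes the whole sort axis is first gathered into a single chunk and each block is then sorted independently with NumPy. This is not taken from the PR diff; `naive_sort` and `naive_argsort` are hypothetical names.

```python
import numpy as np
import dask.array as da


def naive_sort(x, axis=-1):
    """Hypothetical helper: sort a dask array along `axis`."""
    # Merge all chunks along the sort axis so each block holds complete lanes.
    # This rechunk is the step that can move a lot of data between workers.
    x = x.rechunk({axis % x.ndim: -1})
    # Sorting does not change block shapes, so map_blocks needs no shape hints.
    return x.map_blocks(np.sort, axis=axis, dtype=x.dtype)


def naive_argsort(x, axis=-1):
    """Hypothetical helper: indices that would sort `x` along `axis`."""
    x = x.rechunk({axis % x.ndim: -1})
    # Each block spans the full axis, so block-local indices are global indices.
    return x.map_blocks(np.argsort, axis=axis, kind="stable", dtype=np.intp)


x = da.from_array(np.array([[3, 1, 2, 0], [7, 5, 6, 4]]), chunks=(1, 2))
print(naive_sort(x, axis=1).compute())
print(naive_argsort(x, axis=0).compute())
```

The single-chunk rechunk is what makes the result correct without a distributed shuffle, and also what makes it expensive: every block must hold the full extent of the sort axis.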

FYI @fjetter @phofl @hendrikmakait

@crusaderky force-pushed the dask_sort branch 4 times, most recently from 1816c16 to 1a7316f, January 23, 2025 10:20
@crusaderky (Contributor, Author):

FYI @lucascolley @lithomas1

@crusaderky force-pushed the dask_sort branch 3 times, most recently from c0f8617 to 8500867, January 23, 2025 11:21
@crusaderky (Contributor, Author):

@ev-br @lucascolley ready for review and merge.

@crusaderky crusaderky mentioned this pull request Jan 23, 2025
@lucascolley (Member) left a comment:

Would you like to get feedback from a Dask expert before we merge this, or are you confident that it is at least good enough for now?

@crusaderky (Contributor, Author):

I'm confident that this is tolerable at least as a temporary crutch. It will work for some geometries and will go OOM for others, which IMHO is vastly better than not having anything at all.
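
To make "some geometries" concrete, here is a rough back-of-the-envelope estimate. It assumes the crude approach gathers the entire sort axis into one chunk, so that the per-block size is the limiting factor. That assumption, the helper name `single_chunk_nbytes`, and the example shapes are all illustrative and do not come from the PR.

```python
from math import prod


def single_chunk_nbytes(shape, chunks, axis, itemsize):
    """Bytes held by one block after merging all chunks along `axis`.

    `chunks` is the per-dimension block size before merging (assumed uniform).
    """
    block = list(chunks)
    block[axis] = shape[axis]  # the sort axis now spans the whole array
    return prod(block) * itemsize


shape = (1_000_000_000, 8)   # hypothetical float64 array
chunks = (1_000_000, 8)

# Sorting along the short axis keeps blocks small: 64 MB each.
print(single_chunk_nbytes(shape, chunks, axis=1, itemsize=8))  # 64000000

# Sorting along the long axis needs one 64 GB block: likely out of memory.
print(single_chunk_nbytes(shape, chunks, axis=0, itemsize=8))  # 64000000000
```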

@lucascolley merged commit fa558f2 into data-apis:main Jan 26, 2025
42 checks passed
@crusaderky deleted the dask_sort branch January 26, 2025 19:11