Write efficiency: direct I/O? #53
Comments
Yes! This looks great and I would probably use it myself. Thanks for putting together such a nice little demo.
I am also interested in this, but I have been lazily waiting for support from |
Great to hear! I'll work on a minimal PR as a basis for discussion.
Yeah, integrating this different style of I/O generically into such libraries is no small task, I imagine. |
Thanks for your work on this @sk1p! xref direct IO PRs and comments related to compressed writing/potential further improvements: |
Thank you for the discussion and reviews! To pull the discussion back here, from the closed PR(s): (#58):
Interesting. I'll need to have a closer look. Next on my TODO list is a full integration into our software, then I can see how well the approach works in practice - I'll write an update in this issue if I don't forget :) If there is a need, I could also work on adding |
I'm currently adding a zarr writer to our project, which can be roughly described as a data acquisition and live processing framework for electron microscopy. I'm trying to make the writing operation as low-overhead as possible, to make room for actual data processing in the same data pipeline. (I'm also interested in offering compressed writing, which of course has a different CPU vs I/O profile, for users with a beefier system, but that's a different topic.)
One approach I've used in the past was to use direct I/O to bypass the page cache, resulting in much better performance (much closer to what the hardware can actually deliver).
I've built a demo repository that compares zarrs uncompressed write speed with a small prototype that writes the chunks "manually" using direct I/O: https://github.com/LiberTEM/zarr-dio-proto/
On the system I have available (AMD EPYC 7F72, 2x KCM61VUL3T20 NVMe SSD in a RAID0), the direct I/O approach is about 5x faster than the buffered I/O approach. There's also a branch that directly puts the data into a page-size-aligned buffer, which is a bit less realistic but still interesting (it writes the 32 GiB in ~4s, which is about the limit of the SSDs). This is all on a single core.
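For readers unfamiliar with the technique, here is a minimal Python sketch of what writing a chunk with direct I/O involves. It is not the code from the prototype repository, only an illustration under stated assumptions: the names write_chunk and round_up are hypothetical, the 4096-byte alignment is an assumed value (the real requirement depends on the device and filesystem), and padding the write and then truncating the file back to the payload size is just one way to handle chunks whose size is not a multiple of the alignment. It falls back to buffered I/O where O_DIRECT is unavailable or rejected.

```python
import mmap
import os

ALIGN = 4096  # assumed alignment; the real value depends on device and filesystem


def round_up(n: int, align: int = ALIGN) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def _write_all(fd: int, view: memoryview) -> None:
    """Call os.write until the whole view has been written."""
    written = 0
    while written < len(view):
        written += os.write(fd, view[written:])


def write_chunk(path: str, payload: bytes, direct: bool = True) -> bool:
    """Write payload to path, bypassing the page cache where possible.

    Returns True if O_DIRECT was used, False if buffered I/O was used instead.
    """
    padded_len = max(ALIGN, round_up(len(payload)))
    base = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    o_direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    attempts = [base | o_direct, base] if (direct and o_direct) else [base]

    # Anonymous mmap memory is page-aligned, which covers the buffer
    # alignment that O_DIRECT expects; the length is padded to ALIGN.
    with mmap.mmap(-1, padded_len) as buf, memoryview(buf) as view:
        view[: len(payload)] = payload
        for flags in attempts:
            try:
                fd = os.open(path, flags, 0o644)
            except OSError:
                continue  # some filesystems reject O_DIRECT at open time
            try:
                _write_all(fd, view)
                # Trim the zero padding so the file holds only the payload.
                os.ftruncate(fd, len(payload))
                return bool(flags & o_direct)
            except OSError:
                continue  # e.g. EINVAL on write; retry with buffered I/O
            finally:
                os.close(fd)
    raise OSError(f"could not write {path}")
```

Calling write_chunk("c.0.0", data) writes one chunk file and reports whether the direct path was actually taken. The copy into the aligned buffer is the extra step that the page-size-aligned-buffer branch mentioned above avoids by placing the data there from the start.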
Is there interest in integrating a FilesystemStore with direct I/O capabilities into zarrs? Getting this working as fast as the prototype would require some structural changes, too, which probably have to be done incrementally.
I'm also interested in trying an io_uring implementation, which would be the modern way for high-performance I/O on Linux systems.