Handle range header client-side #370

Closed
2 tasks
eeroel opened this issue Nov 14, 2023 · 4 comments
Labels
feature-request (A feature should be added or improved.), p3 (This is a minor priority issue)

Comments

eeroel commented Nov 14, 2023

Describe the feature

Would it be possible to avoid the HeadObject requests when doing a GET range request? I noticed this comment but I wonder if it's something that's feasible, or in the plans?

* For the range header value could be parsed client-side, doing so presents a number of

Use Case

When reading data in Parquet format (e.g. in data lake applications), the file footer must be read first, so an implementation that reads from S3 has to start with a HeadObject request and therefore already knows the object size. The data itself may then be read in several small range requests, so a redundant HeadObject request for each of them adds latency. I understand that this library is optimized for throughput, but it would be great to get those performance benefits without the added latency when the amount of data read is small.
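
To make the access pattern concrete, here is a minimal Python sketch of how a reader that already knows the object size can build its own range values for the Parquet footer. It relies only on the Parquet file layout (a file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`). The helper names `tail_range` and `footer_range` are hypothetical and are not part of aws-c-s3 or any Parquet library.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TAIL_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


def tail_range(object_size: int) -> str:
    """Range header value covering the last 8 bytes of the object."""
    # A Parquet file also starts with the 4-byte magic, so 12 bytes is the floor.
    if object_size < TAIL_LEN + len(PARQUET_MAGIC):
        raise ValueError("object too small to be a Parquet file")
    return f"bytes={object_size - TAIL_LEN}-{object_size - 1}"


def footer_range(object_size: int, tail: bytes) -> str:
    """Range header value covering the footer metadata, given the 8 tail bytes."""
    if len(tail) != TAIL_LEN or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing Parquet magic at end of object")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len == 0:
        raise ValueError("empty footer")
    end = object_size - TAIL_LEN - 1      # last byte of the footer metadata
    start = end - footer_len + 1          # first byte of the footer metadata
    if start < len(PARQUET_MAGIC):
        raise ValueError("footer length exceeds object size")
    return f"bytes={start}-{end}"
```

Both values carry an explicit start offset, which is the case where a size lookup before each GET is redundant for the caller.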

Proposed Solution

I'm not familiar with the internals of the auto-range request implementation, but maybe the first request could be made to the last range (at the end of the object) so that an Unsatisfiable error will be returned if the range is out of bounds?
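
A rough Python sketch of that idea, assuming the requested range has an explicit start and end: split it into part-sized sub-ranges and issue the last one first. The names `split_range` and `request_order` and the part size are hypothetical; this is not how aws-c-s3 is implemented. Under HTTP range semantics a range is unsatisfiable only when its start lies at or past the end of the object, so the last part would fail only in that case.

```python
from typing import List, Tuple


def split_range(start: int, end: int, part_size: int) -> List[Tuple[int, int]]:
    """Split the inclusive byte range [start, end] into part-sized inclusive ranges."""
    if start < 0 or end < start or part_size <= 0:
        raise ValueError("invalid range or part size")
    parts = []
    pos = start
    while pos <= end:
        parts.append((pos, min(pos + part_size - 1, end)))
        pos += part_size
    return parts


def request_order(start: int, end: int, part_size: int) -> List[Tuple[int, int]]:
    """Same parts, but with the final part moved to the front so it is requested first."""
    parts = split_range(start, end, part_size)
    return [parts[-1]] + parts[:-1]
```

For example, `request_order(0, 24, 10)` yields the part `(20, 24)` first, followed by `(0, 9)` and `(10, 19)`.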

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change
eeroel added the feature-request (A feature should be added or improved.) and needs-triage (This issue or PR still needs to be triaged.) labels Nov 14, 2023
eeroel changed the title from "Handle range heade client-side" to "Handle range header client-side" Nov 14, 2023
Member

jmklix commented Nov 15, 2023

This is something that we would like to add support for, but this is not currently a high priority.

jmklix added the p3 (This is a minor priority issue) label and removed the needs-triage (This issue or PR still needs to be triaged.) label Nov 15, 2023
Contributor

waahm7 commented Dec 28, 2023

@eeroel Thank you for creating the issue. I have implemented client-side range-header handling, provided the range header includes a start-range. If the range header includes a start range, we no longer perform a HeadRequest. Does this solve your issue?
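
As an illustration of what client-side handling of the range header involves, here is a minimal Python sketch that parses a single-range `bytes=` value and reports whether the object size would still be needed. It is only a sketch under stated assumptions: `parse_range` and `needs_size_lookup` are hypothetical names, multi-range values are not handled, and the actual aws-c-s3 implementation is written in C and may differ.

```python
import re
from typing import Optional, Tuple

# Single range only: "bytes=START-END", "bytes=START-" or "bytes=-SUFFIX".
_RANGE_RE = re.compile(r"bytes=([0-9]*)-([0-9]*)")


def parse_range(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (start, end); a missing side is None. For "bytes=-N", end holds N."""
    m = _RANGE_RE.fullmatch(value.strip())
    if m is None or (m.group(1) == "" and m.group(2) == ""):
        raise ValueError(f"unsupported Range value: {value!r}")
    start = int(m.group(1)) if m.group(1) else None
    end = int(m.group(2)) if m.group(2) else None
    if start is not None and end is not None and end < start:
        raise ValueError(f"end before start in Range value: {value!r}")
    return start, end


def needs_size_lookup(value: str) -> bool:
    """True when the range has no start (a suffix range such as "bytes=-500")."""
    start, _ = parse_range(value)
    return start is None
```

With this sketch, `bytes=0-499` and `bytes=500-` both have a start and need no size lookup, while the suffix form `bytes=-500` does.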

waahm7 added the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Dec 28, 2023
Author

eeroel commented Dec 28, 2023

> @eeroel Thank you for creating the issue. I have implemented client-side range-header handling, provided the range header includes a start-range. If the range header includes a start range, we no longer perform a HeadRequest. Does this solve your issue?

Nice, yes it does!

github-actions bot removed the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Dec 28, 2023
Contributor

waahm7 commented Dec 28, 2023

@eeroel Thanks, this is resolved in https://github.com/awslabs/aws-c-s3/releases/tag/v0.4.8.

waahm7 closed this as completed Dec 28, 2023
3 participants