-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor GetObject request with Range Header to Skip HeadRequest - Part 3 #397
Conversation
@@ -53,6 +53,12 @@ struct aws_s3_auto_ranged_get { | |||
} synced_data; | |||
|
|||
uint32_t initial_message_has_range_header : 1; | |||
|
|||
bool initial_message_has_start_range; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using bool because we can not have a pointer to bit field https://stackoverflow.com/questions/13547352/c-cannot-take-address-of-bit-field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By putting these new variables between two other bit-fields, it prevents those bit-fields from getting packed together, and the only point of a bit-field is to get packed together and reduce the overall size of a struct.
IMHO, the bit-fields in aws-c-s3 aren't worth having because they make the code uglier (need to type != 0
instead of just treating them like a normal bool), and these structs are not a big contributor to memory usage. But if we're going to have bit-fields, we should get some benefit from them. Having bitfields that make your code ugly AND getting no benefit from them... is dumb.
I see how you just wanted to put related variables together though. Consider doing 1 of these:
- keep all bit-fields together at the end of the struct (the traditional place to put them)
- stop using bit-fields altogether in this struct (find/replace ugly code that does
!= 0
) - make this a bit-field (together with other bit-fields), but yeah you can't pass it by pointer so you'll need to make a temp variable
bool initial_message_has_start_range_tmp
and pass THAT to your function by pointer, and then set the bit-field to the tmp variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed description, I have updated it to option 1 for now. I would have preferred option 2 but don't want to include that refactor in this PR.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## refactor-empty-file #397 +/- ##
=======================================================
+ Coverage 88.79% 88.82% +0.03%
=======================================================
Files 21 21
Lines 6078 6141 +63
=======================================================
+ Hits 5397 5455 +58
- Misses 681 686 +5
|
source/s3_util.c
Outdated
} | ||
|
||
/* verify that start-range <= end-range */ | ||
if (*out_initial_message_has_end_range && *out_initial_start_range > *out_initial_end_range) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we perform a HeadRequest
with startRange > endRange
such as bytes=20-10
, S3 does not add a Content-Range
header in the HeadRequest
response, and Content-Length
reflects the total length of the object. The CLI downloads the entire object. The current CRT behavior is that it fails with AWS_ERROR_S3_MISSING_CONTENT_RANGE_HEADER
at https://github.com/awslabs/aws-c-s3/blob/main/source/s3_auto_ranged_get.c#L510C24-L510C66. According to the HTTP spec, “An int-range is invalid if the last-pos value is present and less than the first-pos.” . This changes it to AWS_ERROR_S3_INVALID_RANGE_HEADER
error at request time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
many trivial comments
@@ -53,6 +53,12 @@ struct aws_s3_auto_ranged_get { | |||
} synced_data; | |||
|
|||
uint32_t initial_message_has_range_header : 1; | |||
|
|||
bool initial_message_has_start_range; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By putting these new variables between two other bit-fields, it prevents those bit-fields from getting packed together, and the only point of a bit-field is to get packed together and reduce the overall size of a struct.
IMHO, the bit-fields in aws-c-s3 aren't worth having because they make the code uglier (need to type != 0
instead of just treating them like a normal bool), and these structs are not a big contributor to memory usage. But if we're going to have bit-fields, we should get some benefit from them. Having bitfields that make your code ugly AND getting no benefit from them... is dumb.
I see how you just wanted to put related variables together though. Consider doing 1 of these:
- keep all bit-fields together at the end of the struct (the traditional place to put them)
- stop using bit-fields altogether in this struct (find/replace ugly code that does
!= 0
) - make this a bit-field (together with other bit-fields), but yeah you can't pass it by pointer so you'll need to make a temp variable
bool initial_message_has_start_range_tmp
and pass THAT to your function by pointer, and then set the bit-field to the tmp variable
source/s3_auto_ranged_get.c
Outdated
// TODO: Should we align the first part? I need to run some benchmarks before re-adding first | ||
// part alignment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another benefit of aligning the first part is: it makes it more likely that we get checksums, due to aligning with UploadPart boundaries
but oof, yeah, it would suck to send 2 requests when we could have gotten away with 1
just throwing out thoughts, you've got more of this in your head right now than I do...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will never get the checksum of the first partial part request, even if we align it to part_size, so we can't validate the checksum. I observed no difference in performance between aligning the first request or not aligning it. Therefore, I have removed the logic for aligning the first part.
source/s3_auto_ranged_get.c
Outdated
object_range_end = object_size - 1; | ||
if (auto_ranged_get->initial_message_has_end_range) { | ||
object_range_end = aws_min_u64(object_range_end, auto_ranged_get->initial_object_range_end); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utterly trivial: reads better as if/else
object_range_end = object_size - 1; | |
if (auto_ranged_get->initial_message_has_end_range) { | |
object_range_end = aws_min_u64(object_range_end, auto_ranged_get->initial_object_range_end); | |
} | |
if (auto_ranged_get->initial_message_has_end_range) { | |
object_range_end = aws_min_u64(object_range_end, auto_ranged_get->initial_object_range_end); | |
} else { | |
object_range_end = object_size - 1; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated object_range_end
is used in aws_min_u64
. I have changed it to an if/else structure and updated that line to directly use object_size - 1
.
tests/s3_util_tests.c
Outdated
bool initial_message_has_start_range; | ||
bool initial_message_has_end_range; | ||
uint64_t initial_object_range_start; | ||
uint64_t initial_object_range_end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are the bools "initial_message" but the ints "initial_object"? Is there some subtle difference I'm not picking up on?
(personally, I would omit the "initial_X" prefix from all of them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No difference; I just kept it similar to existing initial_message_has_range_header
, object_range_start
, and object_range_end
. I agree that it is confusing, especially initial_object_*
. I have updated it to the following:
bool initial_message_has_start_range;
bool initial_message_has_end_range;
uint64_t initial_range_start;
uint64_t initial_range_end;
I think range_start
and range_end
would be confusing alongside object_range_start
and object_range_end
, so I have kept the initial prefix since initial_range_end
and object_range_end
might differ. Plus, object_range_end
is part of the synced block, which we don't need for initial_range_end
.
Co-authored-by: Michael Graeb <graebm@amazon.com>
Co-authored-by: Michael Graeb <graebm@amazon.com>
# Conflicts: # source/s3_auto_ranged_get.c
@@ -1355,7 +1357,7 @@ static int s_s3_meta_request_incoming_body( | |||
} | |||
|
|||
if (request->send_data.response_body.capacity == 0) { | |||
if (request->has_part_size_response_body) { | |||
if (request->has_part_size_response_body && successful_response) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore the buffer ticket, as the error response might not fit into the reserved buffer.
Issue #, if available:
#370
Description of changes:
GetObject
request withRange
header logic to be parsed on the client side, as long as the Range includes a start-range.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.