Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use aws_array_list<aws_byte_buffer*> for encoded_checksum_list #318

Merged
merged 14 commits into from
Jun 20, 2023

Conversation

TingDaoK
Copy link
Contributor

@TingDaoK TingDaoK commented Jun 19, 2023

Issue #, if available:

  • S3-Transfer-Manager Streaming #317
  • We used the array list of struct aws_byte_buf to store the checksum. And point to the checksum buffer in the list while we read from the stream and calculate the checksum. However, while we are using the buffer, the other thread from complete part callback can allocate a new part when we finish reading from the other part and may result in expand the array_list, which frees the old memory we are using, and lead to a crash.
    • here is where the list may grow
    • here is where we read from the list and pass it down to the input stream read

Description of changes:

  • use array list of struct aws_byte_buf* instead. However, the downside is we need to allocate the buffer every time we create a new one, which will add N allocation time, and may impact the performance (?)

TODO:

  • the checksum buffer will be initialized and updated by the aws_chunk_stream while we read from the input. So, if we use dynamic array with pre-allocated buffer, there will be a race condition between the expansion of the array to copy the buffer around and the buffer gets updated from the other thread. To prevent that:
    • a lock when we initialize the checksum buffer here, which I don't like by adding lock around.
    • Instead of aws_chunk_stream initialize the buffer, it just fill the given one without reallocate, which sounds like a workable solution if we have performance issue from extra allocation by this PR.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@TingDaoK TingDaoK marked this pull request as ready for review June 19, 2023 20:51
@TingDaoK TingDaoK changed the title fix the checksum array-list use aws_array_list<aws_byte_buffer*> for encoded_checksum_list Jun 19, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jun 19, 2023

Codecov Report

Merging #318 (8c48d23) into main (cabcef7) will decrease coverage by 0.08%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #318      +/-   ##
==========================================
- Coverage   88.91%   88.83%   -0.08%     
==========================================
  Files          17       17              
  Lines        4934     4936       +2     
==========================================
- Hits         4387     4385       -2     
- Misses        547      551       +4     
Impacted Files Coverage Δ
source/s3_auto_ranged_put.c 92.64% <100.00%> (+0.11%) ⬆️
source/s3_request_messages.c 74.88% <100.00%> (ø)

... and 1 file with indirect coverage changes

source/s3_auto_ranged_put.c Outdated Show resolved Hide resolved
source/s3_auto_ranged_put.c Outdated Show resolved Hide resolved
source/s3_auto_ranged_put.c Outdated Show resolved Hide resolved
Comment on lines 1240 to 1252
struct aws_byte_buf **checksum_buf_pointer = NULL;
aws_array_list_get_at_ptr(
&auto_ranged_put->synced_data.encoded_checksum_list, (void **)&checksum_buf, request->part_number - 1);
&auto_ranged_put->synced_data.encoded_checksum_list,
(void **)&checksum_buf_pointer,
request->part_number - 1);
/* Clean up the buffer in case of it's initialized before and retry happens. */
aws_byte_buf_clean_up(checksum_buf);
checksum_buf = *checksum_buf_pointer;
if (checksum_buf) {
aws_byte_buf_clean_up(checksum_buf);
} else {
checksum_buf = aws_mem_calloc(meta_request->allocator, 1, sizeof(struct aws_byte_buf));
*checksum_buf_pointer = checksum_buf;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is accessing synced_data.encoded_checksum_list without holding the lock

also, it's so damned confusing, it took me 20+ minutes to wrap my head around why we need to sometimes clean_up(), and other times allocate (but not aws_byte_buf_init())

I'm torn between suggesting comments that explaining what's going on here

or suggesting ways to make this code less confusing (stop allocating the byte-buf that the weird stream will initialize later)

maybe we can move this 1 block of code around to make more sense:

  1. delete everything here. s_s3_prepare_upload_part_finish() should not touch synced_data.encoded_checksum_list
  2. move the aws_mem_calloc(..., sizeof(struct aws_byte_buf)) up into the function above this, where we're already holding a lock to resize synced_data.encoded_checksum_list. But instead of setting NULL, set the allocated byte-buf
  3. move the aws_byte_buf_clean_up(checksum_buf); up into s_s3_prepare_upload_part() in the same else block with the comment /* Not the first time preparing request (e.g. retry).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, but i think we should address it in 2 steps. let fix the issue first with the way it currently works and lets discuss how to refactor it better and we can do that in follow up pr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, me and Dengke just had a chat and arrived at the same conclusion.
I can revisit this when I let the user pass in checksum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. Also, we discussed offline, instead of moving things around, I'll just add a lock here. But, still, we are updating the checksum buffer from the chunk stream without lock, which we should address later

source/s3_auto_ranged_put.c Outdated Show resolved Hide resolved
source/s3_auto_ranged_put.c Outdated Show resolved Hide resolved
@TingDaoK TingDaoK enabled auto-merge (squash) June 20, 2023 19:53
@TingDaoK TingDaoK merged commit 28f7b87 into main Jun 20, 2023
@TingDaoK TingDaoK deleted the fix-unknown-content-length branch June 20, 2023 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants