S3-Transfer-Manager Streaming #317
Hey @djchapm, thanks for reporting the issue. I did reproduce the crash and the occasional 400, and it's a bug on our side; I have a possible fix for it. But for the jump in time to complete the upload, I'm not sure I get what you mean.
So, the time really varies for me during my tests. I saw times between 23 and 40 seconds with a 1.28 GB object. Edit: I'll look into why there is such a difference between when the read completes and when the whole upload completes, to get more info.
I think the time difference you saw is unrelated to the unknown content-length change. I used aws-crt@0.21.17, which predates the unknown content-length work. I did a run with a 128 MB object, with this result:
In this run, it took 18.9 seconds, and from the CRT log we can see that the last upload part request was made at
However, the response was not received until
So, I think the time difference you saw really comes down to the network or the server side. To add more info, I also tested the latest change with unknown content length: the last upload part is created at basically the same time the body read completes, and the whole upload completes at basically the same time the last upload part response is received. The major difference I saw is that S3's response time to the request varies; one time it took about 7 seconds, as below,
and another time about 20 seconds, logs:
To add more:
Thanks @TingDaoK for the analysis. Regarding with/without a minimum part size: it was random for me, but it seemed I could push more data before hitting an error when it was unset. That may not be true, since it was somewhat random; I just mentioned it in case it meant something, because more often than not it was better with unknown size. On the response time, some background: this whole capability (uploading without knowing the content length) is something we currently work around ourselves. We buffer up to an estimated part size; once the buffer is full we push the part and start a new buffer, keeping track of parts and overall size so that we can create the CompleteMultipartUpload request at the end. Not sure if that is what you are doing or not. This is our old code on the AWS S3 V1 SDK, and I'm using it as a comparison against the new features in aws-c-s3 + Transfer Manager V2. My throughput is consistently 2-3 times higher with the old solution, and its response times are much more consistent. Maybe we're buffering more and sending larger chunks than you are, or maybe you're making more calls than we are for the final processing; I'm not sure. We'd like to get to the point where we see comparable performance. I'll see if I can get some numbers together on that.
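The manual buffering scheme described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual workaround code; `PartBuffer` and its callback are hypothetical names. The callback is where an UploadPart call would go, with its returned ETag recorded for the final CompleteMultipartUpload request.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.BiConsumer;

// Accumulates writes into an in-memory buffer; every time the buffer reaches
// the target part size, a full part is handed to a consumer (which would
// issue an UploadPart call and record the returned ETag so the final
// CompleteMultipartUpload request can be assembled at the end).
public class PartBuffer {
    private final int partSize;
    private final BiConsumer<Integer, byte[]> partConsumer; // (partNumber, bytes)
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int nextPartNumber = 1;

    public PartBuffer(int partSize, BiConsumer<Integer, byte[]> partConsumer) {
        this.partSize = partSize;
        this.partConsumer = partConsumer;
    }

    // Append data; emit one part per full partSize chunk accumulated so far.
    public void write(byte[] data) {
        buffer.write(data, 0, data.length);
        while (buffer.size() >= partSize) {
            byte[] all = buffer.toByteArray();
            byte[] part = new byte[partSize];
            System.arraycopy(all, 0, part, 0, partSize);
            buffer = new ByteArrayOutputStream();
            buffer.write(all, partSize, all.length - partSize);
            partConsumer.accept(nextPartNumber++, part);
        }
    }

    // Flush whatever remains as the final (possibly short) part.
    public void finish() {
        if (buffer.size() > 0) {
            partConsumer.accept(nextPartNumber++, buffer.toByteArray());
        }
    }
}
```

The point of the design is that no total content length is ever needed: part boundaries are decided purely by how much has been buffered.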
Figured out the discrepancy: with our manual workaround we are using compression, which makes a huge difference. Now the times are much more comparable. The times below are with a 15 MB part size. To use compression, I copied "BlockingOutputStreamAsyncRequestBody.java" from aws-core and wrapped it in a GZIPOutputStream; for some reason that class is locked down as package-private, so I'll submit a request to open that up. Attached is the updated test code.
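For context on why compression dominates here: the test payload is the same 128-byte line repeated, which gzip shrinks dramatically, so far fewer bytes cross the wire. A self-contained sketch using only `java.util.zip`, independent of the SDK wrapper mentioned above (`GzipDemo` is a made-up name):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Compresses the input with gzip and returns the compressed bytes.
    public static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Mimic the reproduction: the same 128-byte line repeated many times.
        byte[] line = new byte[128];
        Arrays.fill(line, (byte) 'x');
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        for (int i = 0; i < 10_000; i++) {
            data.write(line, 0, line.length);
        }
        byte[] raw = data.toByteArray();
        byte[] packed = compress(raw);
        System.out.println("raw=" + raw.length + " gzip=" + packed.length);
    }
}
```

In the workaround above, the same idea is applied by wrapping the request body's OutputStream in a GZIPOutputStream before parts are pushed, so the upload measures compressed bytes rather than raw bytes.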
The fix for the crash and the error has been merged and released; I don't think the discrepancy is related to CRT.
Please use the latest aws-crt-java@v0.22.2 https://github.com/awslabs/aws-crt-java/tree/v0.22.2, and let us know if you still have issues.
@TingDaoK what is the difference between https://github.com/awslabs/aws-crt-java and https://github.com/aws/aws-sdk-java-v2/tree/master/http-clients/aws-crt-client ?
@TingDaoK, when we run in AWS, we are seeing this stack trace using the code that @djchapm attached. It works without issue on Windows. software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: An input string was passed to a parser and the string was incorrectly formatted.
@rickyeng127 are you modifying the sample in any way? That is a common parser error, and in the S3 case it's probably an error in parsing the host header or endpoint override. One possible reason for it working on Windows but not on Linux is the use of backslashes in the path. @campidelli
@DmitriyMusatkin, I am not modifying the sample. The following is the sample code that is working on Windows but not in AWS (method bodies elided):

```java
public class S3StreamLoader {
    private Logger _logger = LogManager.getLogger(S3StreamLoader.class);
    private String _bucketName;
    private long _bytesWritten;

    public S3StreamLoader(String bucketName, String objectKey) throws InterruptedException { /* ... */ }
    public void initiateRequest() throws InterruptedException { /* ... */ }
    public void write(byte[] buf) { /* ... */ }
    public S3AsyncClient getS3AsyncClient() { /* ... */ }
    public S3TransferManager getTransferManager(S3AsyncClient s3AsyncClient) { /* ... */ }
    public CompletedUpload finish() { /* ... */ }
}

public class ConsumerFeed {
    private Logger _logger = LogManager.getLogger(ConsumerFeed.class);
    private Consumer _dataFeed;
    Flux _flux = Flux.create(sink -> { /* ... */ });

    public Flux getFlux() { /* ... */ }
    public CountDownLatch getSubscription() { /* ... */ }
    public Consumer getDataFeed() { /* ... */ }
    void complete() { /* ... */ }
}

public class S3TransferListener implements TransferListener {
    private Logger _logger = LogManager.getLogger(S3TransferListener.class);
    final String _resource;

    public S3TransferListener(String objectKey) { /* ... */ }
    private void status(long l) { /* ... */ }
    // @Override methods of TransferListener elided in the original paste
}

public class TestClass {
    public static void main(String[] args) { /* ... */ }
}
```
Hey @TingDaoK - I've run a couple tests today using aws-crt-java@v0.22.2, uploading files of > 17G with part size of 15M without issue. |
Hi @TingDaoK , Can you provide sample code on how you ran these tests? |
@rickyeng127 I just used the sample provided by @djchapm, and I don't think I made any changes. Edit: Oh, the exception actually comes from us; as Dmitriy said, enabling the log will probably help. @djchapm Sounds good! We can also update the issue from the Java SDK side.
@rickyeng127 I would recommend turning on trace logging in the SDK to see if it provides any more info on what's failing. You can use one of the init methods here to do that: https://awslabs.github.io/aws-crt-java/software/amazon/awssdk/crt/Log.html. I agree we should close this issue. @rickyeng127, if you continue facing the issue, I would recommend opening a separate one with sample code and a trace log.
Hi @DmitriyMusatkin, I enabled trace logging and do in fact see an error in the URI parser. Note that I am attempting to write to a file in a top-level bucket (with no prefix). Here is the trace log showing the error. Is this an error in AWS or in my code? As noted earlier, this works when connected to AWS remotely but fails when running natively in AWS. [TRACE] [2023-07-05T18:11:01Z] [00007fdbc6242700] [event-loop] - id=0x7fdbe42c5a80: detected more scheduled tasks with the next occurring at 0, using timeout of 0.
@rickyeng127, it seems that you have an invalid proxy configuration in your AWS settings, which is causing the issue. Please verify the proxy configuration and also check the |
@waahm7, these requests are being initiated from an ECS container to S3. Do I still need to configure a HTTPS_PROXY when running in this mode? |
@rickyeng127 It is not required to configure a proxy. Your environment has an invalid proxy configuration, which causes the issue when the S3 client tries to parse the proxy URL and fails.
@waahm7, when I debug these system properties, they all return null, meaning they haven't been set. How is the https_proxy environment variable being detected?
@rickyeng127 The environment variable name is |
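One common source of confusion here: `System.getProperty` reads JVM system properties, while proxy settings such as `https_proxy`/`HTTPS_PROXY` live in the process environment and are read with `System.getenv` (and environment variable names are case-sensitive on Linux). A minimal check, with `ProxyEnvCheck` as a hypothetical helper name:

```java
public class ProxyEnvCheck {
    // Returns the first non-empty value among the usual proxy environment
    // variable spellings. Note this uses System.getenv, not
    // System.getProperty: JVM system properties are a separate namespace,
    // which is why debugging properties can show null while an environment
    // variable is still set.
    public static String detectHttpsProxy() {
        String[] names = { "https_proxy", "HTTPS_PROXY" };
        for (String name : names) {
            String value = System.getenv(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("https proxy from environment: " + detectHttpsProxy());
    }
}
```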
Describe the bug
I've been testing the combination of the latest aws-c-s3 update regarding #285 and the ability to stream data of unknown size to S3 using a publisher.
I have two concerns aside from the errors I'm getting:
Platform: Macbook Pro, Ventura 13.4, 32GB, Apple M1 Max ARM64 processor.
Aws CLI: aws-cli/2.12.1 Python/3.11.4 Darwin/22.5.0 source/arm64
TargetThroughput: 20.0Gbps
Minimum Part Size: 1000000L (I think this causes issues after 128M)
I've mocked up data that is just a 128-byte string sent over and over in a ByteBuffer.
Errors include a failed response (400) from awssdk, missing checksums for parts, and a SIGSEGV in libobjc.A.dylib... I've also received a SIGABRT, which doesn't even give me a dump.
Attaching a simple Java project to test: configure credentials and a bucket name and execute; as you increase the number of lines, you'll start to see issues. The CRT log is created in whatever your working directory happens to be.
Expected Behavior
Files uploaded successfully and can manage at least 100G S3 Objects.
The time to close the stream doesn't grow as drastically with file size as it does now.
Current Behavior
Crashes, Failed uploads, heavy delay on completing the upload.
Reproduction Steps
s3AsyncTest.zip
Possible Solution
No response
Additional Information/Context
No response
aws-c-s3 version used
aws.sdk.version 2.20.79, aws.crt.version 0.22.1
Compiler and version used
openjdk 17.0.3 2022-04-19
Operating System and version
Darwin US10MAC44VWYPKH 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000 arm64 arm Darwin