
CRT HTTP Client: HTTP Connections are not properly released #4815

Closed
i-volkov opened this issue Jan 5, 2024 · 8 comments
Assignees: debora-ito
Labels: bug (This issue is a bug.) · crt-client · p2 (This is a standard priority issue)

Comments


i-volkov commented Jan 5, 2024

Describe the bug

We are using AwsCrtAsyncHttpClient for DynamoDB API calls.
Everything was working fine; however, after we upgraded to a newer version of the SDK, we noticed that the LeasedConcurrency metric grows steadily over time, and once it reaches maximum capacity all API requests start to fail with a timeout exception, apparently because the client cannot acquire new connections from the pool.
Another observation is that the rate of increase in LeasedConcurrency is correlated with the RetryCount, so it may be that connections from failed requests are not properly released.
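The failure mode described above can be illustrated with a minimal, self-contained sketch (not using the real SDK — the class and method names below are invented for illustration): a semaphore stands in for the connection pool, and an attempt that times out without releasing its lease leaks one permit per failure, until the pool is exhausted.

```java
import java.util.concurrent.Semaphore;

// Toy model of a connection pool: permits = available connections.
class LeakyPoolDemo {
    static final int MAX_CONCURRENCY = 4;
    static final Semaphore pool = new Semaphore(MAX_CONCURRENCY);

    // A request attempt that times out and (buggy path) never releases its lease.
    static void attemptWithoutRelease() throws InterruptedException {
        pool.acquire();    // lease a connection
        // ... the attempt times out here; the buggy code returns early
        // and never calls pool.release(), so the lease is lost.
    }

    // Mirrors the LeasedConcurrency metric: leases currently held.
    static int leasedConcurrency() {
        return MAX_CONCURRENCY - pool.availablePermits();
    }

    public static void main(String[] args) throws InterruptedException {
        // Each failed (and later retried) attempt leaks one connection.
        for (int i = 0; i < MAX_CONCURRENCY; i++) {
            attemptWithoutRelease();
        }
        // The pool is now exhausted: every further acquire would block,
        // which surfaces as timeouts on all subsequent requests.
        System.out.println("leased=" + leasedConcurrency()); // prints leased=4
    }
}
```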

Expected Behavior

All connections are properly released, and the connection pool always has available connections.

Current Behavior

Some connections are not properly released, which leads to a lack of available connections.

Reproduction Steps

var clientBuilder = DynamoDbAsyncClient.builder();

clientBuilder.overrideConfiguration(ClientOverrideConfiguration.builder()
   .apiCallAttemptTimeout(Duration.ofMillis(dynamoDbConfiguration.getApiCallAttemptTimeout().toMillis()))
   .apiCallTimeout(Duration.ofMillis(dynamoDbConfiguration.getApiCallTimeout().toMillis()))
   .retryPolicy(RetryPolicy.builder()
      .numRetries(dynamoDbConfiguration.getNumRetries())
      .retryCondition(RetryCondition.defaultRetryCondition())
      .build())
   .addMetricPublisher(new DynamoDbDataDogMetricPublisher())
   .build());

var client = clientBuilder
   .httpClient(AwsCrtAsyncHttpClient.builder()
      .connectionMaxIdleTime(httpClientConfiguration.getConnectionMaxIdleTime())
      .connectionTimeout(httpClientConfiguration.getConnectionTimeout())
      .maxConcurrency(httpClientConfiguration.getMaxConcurrency())
      .build())
   .build();

Possible Solution

No response

Additional Information/Context

The last working SDK version is 2.18.34, so theoretically the change which caused the regression should be here:
2.18.34...2.18.35
The issue is reproducible on the latest SDK version 2.22.10.

AWS Java SDK version used

2.18.35

JDK version used

openjdk version "17.0.7" 2023-04-18 LTS OpenJDK Runtime Environment Zulu17.42+19-CA (build 17.0.7+7-LTS)

Operating System and version

Linux 6.1.61-85.141.amzn2023.x86_64

@i-volkov i-volkov added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 5, 2024
@debora-ito
Member

Hi @i-volkov, thank you for reaching out.

  • Was there any change in code or in the environment, besides upgrading the SDK version?
  • Which DynamoDB API are you using?
  • Do you have logs with the retry errors that correlate with the increased LeasedConcurrency? They might help us reproduce the issue on our side.

@debora-ito debora-ito added investigating This issue is being investigated and/or work is in progress to resolve the issue. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-triage This issue or PR still needs to be triaged. labels Jan 9, 2024
@debora-ito debora-ito self-assigned this Jan 9, 2024
@debora-ito debora-ito added the p2 This is a standard priority issue label Jan 9, 2024
@i-volkov
Author

i-volkov commented Jan 10, 2024

Hi @i-volkov, thank you for reaching out.

  • Was there any change in code or in the environment, besides upgrading the SDK version?
  • Which DynamoDB API are you using?
  • Do you have logs with the retry errors that correlate with the increased LeasedConcurrency? They might help us reproduce the issue on our side.

Thank you for the response.
Answering your questions:

  • Only SDK version was updated
  • GetItem and PutItem
  • By errors I mean exceptions caused by exceeding the API attempt timeout. With a 50 ms limit per attempt there are more retries than with 150 ms, and LeasedConcurrency grows more slowly. We don't have logs for that, because these exceptions are logged by the AWS SDK at the DEBUG logging level, as I understand.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jan 10, 2024
@debora-ito
Member

debora-ito commented Jan 13, 2024

@i-volkov I think I'm able to reproduce this; I still need to confirm it's the same problem you are running into.

If I set a low apiCallAttemptTimeout like 20 ms, some requests will succeed, but some will fail and will be retried. When using version 2.18.34 (and aws-crt-client 2.18.34-PREVIEW) and calling GetItem in a loop, LeasedConcurrency is 2 at most. Running the same code with version 2.18.35 (and 2.18.35-PREVIEW for aws-crt-client), LeasedConcurrency jumps to 50.

We still need to investigate the root cause.

@debora-ito debora-ito removed the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Jan 13, 2024
@i-volkov
Author

i-volkov commented Jan 14, 2024

@debora-ito
Yes, this is pretty much the same behaviour we are observing; the only difference is that in our case it eventually leads to a lack of available connections in the pool, and all requests start to time out.

With a 50 ms limit per attempt there are more retries than with 150 ms, and LeasedConcurrency grows more slowly

The above sentence from my previous post is incorrect; the right way of phrasing it is: the more retries we have, the faster the LeasedConcurrency metric grows. Sorry for the confusion.

@joviegas
Contributor

joviegas commented Jan 18, 2024

The issue was occurring because of PR-3603, added in 2.18.35. Prior to that change, in 2.18.34, since we did not return when the future had completed exceptionally (Future.isCompletedExceptionally), the connection used to get closed in successResponseHandler.onStream(publisher), which is right below it.
However, the latest fix, PR-4825, introduced a new ResponseHandlerHelper that releases the connection in these cases. I tried with version 2.23.2 and no longer saw the issue.

The fix is available in version 2.23.2. I would recommend using the latest version.
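The shape of the fix described above — centralizing the release so that every completion path returns the connection — can be sketched with the same toy pool model as before (again, not the real SDK code; ResponseHandlerHelper's actual implementation differs): the release lives in a finally block, so success, timeout, and exception paths all return the lease.

```java
import java.util.concurrent.Semaphore;

// Sketch of the "always release" pattern: whichever way an attempt
// completes (success, timeout, or exception), the lease goes back.
class ReleasingPoolDemo {
    static final int MAX_CONCURRENCY = 4;
    static final Semaphore pool = new Semaphore(MAX_CONCURRENCY);

    static void attempt(boolean fails) throws InterruptedException {
        pool.acquire();    // lease a connection
        try {
            if (fails) {
                throw new RuntimeException("attempt timed out");
            }
            // ... handle the response stream ...
        } catch (RuntimeException e) {
            // the caller will retry this attempt
        } finally {
            pool.release(); // single release point for every outcome
        }
    }

    static int leasedConcurrency() {
        return MAX_CONCURRENCY - pool.availablePermits();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            attempt(i % 2 == 0); // half the attempts "time out"
        }
        // No leases are leaked, no matter how many attempts failed.
        System.out.println("leased=" + leasedConcurrency()); // prints leased=0
    }
}
```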

@i-volkov
Author

The issue was occurring because of PR-3603, added in 2.18.35. Prior to that change, in 2.18.34, since we did not return when the future had completed exceptionally (Future.isCompletedExceptionally), the connection used to get closed in successResponseHandler.onStream(publisher), which is right below it. However, the latest fix, PR-4825, introduced a new ResponseHandlerHelper that releases the connection in these cases. I tried with version 2.23.2 and no longer saw the issue.

The fix is available in version 2.23.2. I would recommend using the latest version.

Thank you! We did a quick test using version 2.23.2, and the issue is gone.

@debora-ito
Member

@i-volkov thank you for confirming the fix is working. Closing this.


⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
