-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decrement in_use_connections when a connection is not reusable #267
Conversation
Thanks for looking at this @pcorey ! @oliveigah do you mind taking a quick look at this? I'm surprised this seems to fix the issue. To me it looks like this could cause the values to become negative, but maybe I'm not thinking this through enough 🤔 |
Sure! Gonna take a look at this today after work. When I have more info I will post it here. |
Thank you so much! No rush at all 🙏 |
After gathering some context I agree with @sneako! It's not clear to me that this change solves the problem, in fact I did a test considering idle connection termination and this changes breaks it. The test is in this gist: https://gist.github.com/oliveigah/cbe7fa1ecb711a6006af5dd05c6803e7 The assert that breaks is the third call to assert {:ok,
[
%PoolMetrics{
pool_index: 1,
pool_size: 10,
available_connections: 2,
in_use_connections: 8
}
]} = Finch.get_pool_status(finch_name, shp) The result with the proposed changes are: {:ok, [%Finch.HTTP1.PoolMetrics{available_connections: 10, in_use_connections: 0, pool_index: 1, pool_size: 10}]} This happens because when the connection is idle, every time the user makes a checkout 2 calls to {:ok, [%Finch.HTTP1.PoolMetrics{available_connections: 18, in_use_connections: -8, pool_index: 1, pool_size: 10}]} In summary, I think this change does not address the real problem we might have here. About the root cause of the problem my thoughts are:
Gonna try to emulate some other scenarios to see if I can find any other possible root cause. @pcorey did this pool have a high throughput? I'm also suspecting about some weird interaction between metrics and pool termination, can you check if you have pool termination events in this pool? |
Thanks for looking into this @oliveigah! To add to the confusion, my "fix", as you guessed, didn't actually work. I was conflating metrics from three different tenants. Two of those tenants are behaving normally; The third tenant is the one having issues. Here's a plot of it's This tenant's Finch pool does experience higher load than the other two. The current request pattern is ~1 hour of 30-50 requests/minute followed by ~20 minutes of 1-5 requests/minute:
I'm not seeing any strange or unexpected errors in my logs. The only thing that may possibly be related is that all three of my tenants get connection closed errors from the upstream host (
I haven't had any pool termination events ( |
Thanks for the details @pcorey! The only problem I can see is some unexpected error on the Lines 115 to 122 in fb52686
I've done a test that artificially raise when calling it and then I got the same behaviour you described! The test is here: https://gist.github.com/oliveigah/a7cf1c7931440ba6c647866fd3e99d74 The best way to solve this is to add a new callback to nimble pool when the request is canceled https://github.com/dashbitco/nimble_pool/blob/deae8ffdb52ce5265a68299eea3e5074f1e8d5a7/lib/nimble_pool.ex#L443-L444. I've implemented a naive version of this new callback locally and it solves the problem. Gonna open an issue on nimble pool in order to gather some thoughts about this new callback. Then we can use it here and see if it solves your problem! Is it ok to you @sneako? |
PR was merged on nimble_pool. dashbitco/nimble_pool#44 Since the maintainer asked for us try this fix before releasing a new version of nimble pool, I've implemented it on my fork https://github.com/oliveigah/finch using it from master instead of hex. All of my previously broken tests have been fixed with the changes, but if you could test it on your system and report the results would be awesome @pcorey! Let me know if I can help with anything. |
@oliveigah I'm deploying with your fork now. I'll let you know how things look first thing Monday morning. Thanks for doing all of this! |
@oliveigah Everything is looking great in my metrics. It looks like your fix worked! Should I close this PR and you can submit your own? |
Awesome @pcorey, thanks for the feedback! Let's wait for a new nimble pool's relelase and then I can subimit the PR with the |
I've opened the new PR with the implementation of the new callback and added tests. I think we can cotinue this conversation over there. #268 |
Fixes #257