Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #1019 Fix redis transport socket timeout #1113

Merged
merged 4 commits into from
Oct 23, 2019
Merged

Conversation

kmctown
Copy link
Contributor

@kmctown kmctown commented Oct 22, 2019

Patch from @jschwartzentruber #1019 (comment)

Fixes #1019

Wanted to get the ball rolling.

I've got my branch running in a staging environment and can report that it is working well for me. Let me know if there's anything else I can do to help. cc: @thedrow

@auvipy
Copy link
Member

auvipy commented Oct 22, 2019

can you check the failing test?

@auvipy
Copy link
Member

auvipy commented Oct 22, 2019

 not_found.append(kall)

    if not_found:

        raise AssertionError(

            '%r not all found in call list' % (tuple(not_found),)
      ) from cause

E AssertionError: ('', (10, ), {}) not all found in call list

/opt/python/3.6.7/lib/python3.6/unittest/mock.py:860: AssertionError

@kmctown
Copy link
Contributor Author

kmctown commented Oct 22, 2019

 not_found.append(kall)

    if not_found:

        raise AssertionError(

            '%r not all found in call list' % (tuple(not_found),)
      ) from cause

E AssertionError: ('', (10, ), {}) not all found in call list

/opt/python/3.6.7/lib/python3.6/unittest/mock.py:860: AssertionError

Thanks. Looking into it...

@codecov
Copy link

codecov bot commented Oct 22, 2019

Codecov Report

Merging #1113 into master will decrease coverage by 0.06%.
The diff coverage is 28.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1113      +/-   ##
==========================================
- Coverage    88.5%   88.44%   -0.07%     
==========================================
  Files          64       64              
  Lines        6716     6723       +7     
  Branches      811      814       +3     
==========================================
+ Hits         5944     5946       +2     
- Misses        677      682       +5     
  Partials       95       95
Impacted Files Coverage Δ
kombu/transport/redis.py 88.33% <28.57%> (-0.59%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b51d1d6...daa8380. Read the comment docs.

@auvipy
Copy link
Member

auvipy commented Oct 22, 2019

raise AssertionError(

            '%s call not found' % expected_string
      ) from cause

E AssertionError: rpush('george', '{"body": "BODY2", "headers": {"redelivered": true, "x-funny": 1}}') call not found

on python 3.5, I have restarted that build, let's check that again

@kmctown
Copy link
Contributor Author

kmctown commented Oct 22, 2019

Great, thanks. That one doesn't look related. Stepping through the code, I don't see where it overlaps with these changes.

@kmctown
Copy link
Contributor Author

kmctown commented Oct 22, 2019

We do need to bump the minimum py-redis version though.

Good call. Bumped to 3.3.11.

@auvipy auvipy merged commit 2d92827 into celery:master Oct 23, 2019
@ppbdrinker
Copy link

Can we have a release for this beauty please?

@kmctown
Copy link
Contributor Author

kmctown commented Oct 28, 2019

Have an update: it seems to have mostly fixed it but I am still getting some occasional broken pipes, so we are not out of the woods yet. It is greatly reduced, though, at least on my workload. I think the 30 timeout might be too high, and we might want 20-25 per @andymccurdy recommendation over on redis-py. Seems like the best solution is to make the configurable broker_transport_options update to control health_check_interval and reduce the default to 20 or 25.

@andymccurdy
Copy link
Contributor

The health_check_interval should definitely be configurable. It's going to vary based on the running environment. If the Redis server or other network appliances between the client and server are configured with timeouts, the user will want to set this value to something less than the min of all of those.

@auvipy
Copy link
Member

auvipy commented Oct 28, 2019

Can we have a release for this beauty please?

new kombu and celery RC release is due in next 2/3 days

@ppbdrinker
Copy link

10 days ago
new kombu and celery RC release is due in next 2/3 days

Any news?

@Ashish-Bansal
Copy link
Contributor

Ashish-Bansal commented Nov 11, 2019

I tried using the latest package it didn't seem to reduce error rate for me in the production. I tried to investigate it. With the health check enabled, it should be sending the PING call to the server to get PONG reply.

redis-cli monitor | ag PING

^but I don't get expected output from it after every 30 seconds. So, for me health checks ain't really working.

I checked this PR and found that we are calling check_health on the client.

https://github.com/andymccurdy/redis-py/blob/29f4aa43aec50ae0e28df4d3fadfcb17cebe2a2f/redis/client.py#L3193

In redis-py's code, conn.health_check_interval is 0 by default. So, it won't actually send the ping.

So, I think we also need to set health_check_interval parameter before calling that function. Anything less than 30(our hardcoded value) should work.

Also I cannot pass health_check_interval into broker_transport_options because of hardcoded connection parameters in the kombu

connparams = {

I believe kombu should allow passing params to the underlying transport blindly but may be there were reasons why it wasn't done in that way.

Anyway, I guess for the time being, I'll try to do monkey-patching.

@auvipy
Copy link
Member

auvipy commented Nov 11, 2019

did you try celery==4.4.0rc4?

@Ashish-Bansal
Copy link
Contributor

Ashish-Bansal commented Nov 11, 2019

@auvipy So you mean we are passing health_check_interval in celery==4.4.0rc4 ?

I think I did try it but still let me check once again.

Edit:

Quick search in celery doesn't show any results : https://github.com/celery/celery/search?q=health_check_interval but anyway I'll try to use celery 4.4.0rc4.

@Ashish-Bansal
Copy link
Contributor

Alright I just tried using celery==4.4.0rc4, it's not working.

By setting client.connection.health_check_interval = 5 in kombu, I'm able to see ping messages.

@auvipy
Copy link
Member

auvipy commented Nov 11, 2019

are you up for any fix as PR?

@Ashish-Bansal
Copy link
Contributor

are you up for any fix as PR?

Hm, sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Error 110 while writing to socket. Connection timed out." With kombu 4.4.0/4.5.0 and redis 3.2.0/3.2.1
5 participants