Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance Optimization] Use read instead of readfull #1026

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

grcooper
Copy link
Collaborator

@grcooper grcooper commented Feb 7, 2025

The latest versions of Ruby us the underlying read(2) non-blocking socket command under the hood. We no longer need to use read_full manually and interrupt for simple reads from the socket as Ruby will do that for us.

This might be a slight performance negative to people using threaded models on older versions of ruby.

This was mostly used in the get command and you can see the biggest performance difference there:

Using read(count) this PR:
                                            user     system      total        real
set:plain:dalli                         0.142571   0.102867   0.245438 (  0.455319)
setq:plain:dalli                        0.065390   0.035589   0.100979 (  0.101959)
set:ruby:dalli                          0.149596   0.099115   0.248711 (  0.440472)
get:plain:dalli                         0.148761   0.103029   0.251790 (  0.450469)
get:ruby:dalli                          0.146692   0.102902   0.249594 (  0.448702)
multiget:ruby:dalli                     0.125370   0.056108   0.181478 (  0.192209)
missing:ruby:dalli                      0.129112   0.101122   0.230234 (  0.424371)
mixed:ruby:dalli                        0.300936   0.199855   0.500791 (  0.884262)
mixedq:ruby:dalli                       0.429666   0.295973   0.725639 (  1.203018)
incr:ruby:dalli                         0.048352   0.032395   0.080747 (  0.146401)

using read_full (main)
                                            user     system      total        real
set:plain:dalli                         0.179813   0.113714   0.293527 (  0.494273)
setq:plain:dalli                        0.067975   0.036163   0.104138 (  0.105059)
set:ruby:dalli                          0.186732   0.110262   0.296994 (  0.482279)
get:plain:dalli                         0.187633   0.116741   0.304374 (  0.483481)
get:ruby:dalli                          0.189053   0.115436   0.304489 (  0.485613)
multiget:ruby:dalli                     0.125182   0.056756   0.181938 (  0.192077)
missing:ruby:dalli                      0.168511   0.114001   0.282512 (  0.459913)
mixed:ruby:dalli                        0.375702   0.226242   0.601944 (  0.976100)
mixedq:ruby:dalli                       0.521129   0.335840   0.856969 (  1.321618)
incr:ruby:dalli                         0.062385   0.036716   0.099101 (  0.158343)

I have left read_available as it is still needed when we have multiple servers we are selecting from in a get_multi using IO.select.

@petergoldstein
Copy link
Owner

I like this change, and I don't think a slight performance impact on older rubies is a material concern.

@grcooper grcooper force-pushed the use-regular-io-gets branch from 5dcf1b4 to 4ebd9f8 Compare February 7, 2025 15:35
@petergoldstein
Copy link
Owner

@grcooper Do we know why we're seeing repeated test failures?

@grcooper
Copy link
Collaborator Author

grcooper commented Feb 11, 2025

@petergoldstein

On our fork in Shopify/Dalli we fixed this by removing the usages of memcached_mock. I'm not exactly sure the reason behind it, but I think the process is hanging in some cases, whereas when we used non-blocking io previously it would just return with a wait_readable.

I have another branch where I am thinking of replacing memcached_mock with either Toxiproxy (https://github.com/Shopify/toxiproxy-ruby) to handle timeouts etc. And also some minitest mocking.

We also noticed a (kind of) bug that I think is leading to this behaviour not being as apparent, as everything is returned as a Dalli::RingError instead of the proper timeout error, network error, etc, becuase we do retries even when we have a hard down.

See: Shopify#31

So I think in order to merge this I want 3 other prs:

  1. Add Toxiproxy as a replacement for any network errors (down errors, timeouts)
  2. Replace memcached_mock's final use cases which are mostly just returning data from a socket (and add mocha mocking to do so)

Those two should help this PR along.

However, then we should also add the PR above (31 from our fork) to have more appropriate errors being raised and not just RingError every time. But this is possibly a breaking change depending on what people are rescuing.
3. Merge Shopify#31 into main

@grcooper
Copy link
Collaborator Author

I started the rollout of some of the above fixes here: #1030

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants