Skip to content
This repository has been archived by the owner on Oct 27, 2023. It is now read-only.

Perfectioning the Rate Limiter in Multi-Process environment #2

Closed
iann838 opened this issue Jul 19, 2020 · 1 comment
Closed

Perfectioning the Rate Limiter in Multi-Process environment #2

iann838 opened this issue Jul 19, 2020 · 1 comment
Labels
performance Problem related to performance

Comments

@iann838
Copy link
Owner

iann838 commented Jul 19, 2020

Issue presented by maintainer

Currently the rate limiter of cassiopeia and django-cassiopeia has small problems dealing with rate limiting in multiprocess environments. The issues caused is mainly caused by:

  • When cass is not imported in root level, therefore creating new instances for each execution (Fixed by django-cassiopeia in 1.0.0)
  • In a scenario where multi-process is involved, one of these two is happening:
    1. When a function is called in separate threads, the rate limiter is not sync-ed because there are multiple instances of the rate limiter. Example of 2 threads/processes:
      1. Current rate limit header at riot side is x:x, 99:100.
      2. process A is updated the limiter to x:x, 100:100.
      3. process B could not get the copy from A therefore it is still x:x, 99:100.
      4. next call is received by B, ratelimited is evaluated to False, then B executes it, and 429 is returned.
    2. If no new instance of the rate limiter is created (only one instance exists), then it is not totally atomic:
      1. Current rate limit header at riot side is x:x, 99:100.
      2. process A and B both are incrementing the limiter context by 1.
      3. process A updated the limiter context to x:x, 100:100.
      4. process B is lost in the atomic context because A didn't update at time.
      5. process B limiter context remains x:x, 99:100.
      6. next call is received by B, ratelimited is evaluated to False, then B executes it, and 429 is returned.

Temporary fix (or rather state)

Currently when a 429 is returned, the next calls will stop automatically because the root level cass defined by django-cassiopeia puts a lock (this is still unclear, but visually this is happening). Enough to prevent a black list.

  • Severity: Low to Middle
  • Priority: High

Definitive Fix is scheduled together with the next Django big release 3.1

Initial thoughts: I have in mind to actually replace this rate limiter entirely, since the actual version uses a python object that is not thread accessible. It can be replaced ... by another cache (sorry if cache is already overwhelming your head), a cache that can do atomic ++ and --, and set an error pass lower to your limit (e.g. rate limits stops at 98/100) due to latency (as small as 10ms). This might still fail if you have multiple servers that distances each other with high latency, and the only solution is to keep lower the error pass.

@iann838 iann838 added the performance Problem related to performance label Jul 19, 2020
@iann838 iann838 pinned this issue Jul 19, 2020
@iann838
Copy link
Owner Author

iann838 commented Sep 30, 2020

Closing issue

Due to the nature of Cassiopeia code, it is very hard to implement such rate limiter without impacting performance, since a minimun 429 fault tolerance is allowed, I am closing this issue

If you want a better rate limiter that is cross process safe, consider using Pyot instead.

@iann838 iann838 closed this as completed Sep 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance Problem related to performance
Projects
None yet
Development

No branches or pull requests

1 participant