Perfectioning the Rate Limiter in Multi-Process environment #2

iann838 · 2020-07-19T22:38:23Z

Issue presented by maintainer

Currently the rate limiter of cassiopeia and django-cassiopeia has small problems dealing with rate limiting in multiprocess environments. The issues caused is mainly caused by:

When cass is not imported in root level, therefore creating new instances for each execution (Fixed by django-cassiopeia in 1.0.0)
In a scenario where multi-process is involved, one of these two is happening:
1. When a function is called in separate threads, the rate limiter is not sync-ed because there are multiple instances of the rate limiter. Example of 2 threads/processes:
  1. Current rate limit header at riot side is x:x, 99:100.
  2. process A is updated the limiter to x:x, 100:100.
  3. process B could not get the copy from A therefore it is still x:x, 99:100.
  4. next call is received by B, ratelimited is evaluated to False, then B executes it, and 429 is returned.
2. If no new instance of the rate limiter is created (only one instance exists), then it is not totally atomic:
  1. Current rate limit header at riot side is x:x, 99:100.
  2. process A and B both are incrementing the limiter context by 1.
  3. process A updated the limiter context to x:x, 100:100.
  4. process B is lost in the atomic context because A didn't update at time.
  5. process B limiter context remains x:x, 99:100.
  6. next call is received by B, ratelimited is evaluated to False, then B executes it, and 429 is returned.

Temporary fix (or rather state)

Currently when a 429 is returned, the next calls will stop automatically because the root level cass defined by django-cassiopeia puts a lock (this is still unclear, but visually this is happening). Enough to prevent a black list.

Severity: Low to Middle
Priority: High

Definitive Fix is scheduled together with the next Django big release 3.1

Initial thoughts: I have in mind to actually replace this rate limiter entirely, since the actual version uses a python object that is not thread accessible. It can be replaced ... by another cache (sorry if cache is already overwhelming your head), a cache that can do atomic ++ and --, and set an error pass lower to your limit (e.g. rate limits stops at 98/100) due to latency (as small as 10ms). This might still fail if you have multiple servers that distances each other with high latency, and the only solution is to keep lower the error pass.

The text was updated successfully, but these errors were encountered:

iann838 · 2020-09-30T16:52:59Z

Closing issue

Due to the nature of Cassiopeia code, it is very hard to implement such rate limiter without impacting performance, since a minimun 429 fault tolerance is allowed, I am closing this issue

If you want a better rate limiter that is cross process safe, consider using Pyot instead.

iann838 added the performance Problem related to performance label Jul 19, 2020

iann838 pinned this issue Jul 19, 2020

iann838 closed this as completed Sep 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perfectioning the Rate Limiter in Multi-Process environment #2

Perfectioning the Rate Limiter in Multi-Process environment #2

iann838 commented Jul 19, 2020

iann838 commented Sep 30, 2020

Perfectioning the Rate Limiter in Multi-Process environment #2

Perfectioning the Rate Limiter in Multi-Process environment #2

Comments

iann838 commented Jul 19, 2020

Issue presented by maintainer

Temporary fix (or rather state)

Definitive Fix is scheduled together with the next Django big release 3.1

iann838 commented Sep 30, 2020

Closing issue