Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event queue loops infinitely with warnings in Revolt\EventLoop\TimerQueue #104

Open
georgebarbarosie opened this issue Feb 25, 2025 · 5 comments

Comments

@georgebarbarosie
Copy link

This is very difficult to reproduce, and I am still actively trying to create a testcase with TimerQueue used in isolation that reproduces the underlying issue otuside the event loop itself. Until then, maybe this is going to be obvious to somebody more knowledgeable of the TimerQueue implementation.

After hours (sometimes days) of running the event loop, there is a chance that the event loop blocks and throws an apparently endless stream of warnings of the following nature:

Warning: Undefined array key 321 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 108
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 108
Warning: Undefined array key 321 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 109
Warning: Undefined array key 322 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 109
Warning: Undefined array key 322 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 109
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 109
Warning: Undefined array key 322 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 113
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 113
Warning: Undefined array key 322 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 129
Warning: Undefined array key 322 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 130
Warning: Attempt to read property "id" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 130
Warning: Undefined array key 306 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 143
Warning: Attempt to read property "id" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 144
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 108
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 113
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 68Warning: Attempt to read property "id" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 142
Warning: Undefined array key 306 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 143
Warning: Attempt to read property "id" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 144
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 108
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 113
Warning: Undefined array key 306 in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 143
Warning: Attempt to read property "id" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 144
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 149
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 95Warning: Attempt to read property "id" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 133
Warning: Attempt to read property "expiration" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 95
Warning: Attempt to read property "id" on null in /app/vendor/revolt/event-loop/src/EventLoop/Internal/TimerQueue.php on line 133

And then a seemingly endless stream of almost the same, complaining at infinite about array key 306 on line 143, then a bunch of read property on null ('expiration', 'id') on lines 144, 149, 95, 133.

As soon as I, and if, can reliably reproduce this outside of a production environment, I will update. I am currently untangling the code to the best of my abilities.

This looks like a pure code problem, but if it matters:
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
PHP 8.2.27 (cli) (built: Feb 25 2025 02:25:39) (NTS)

Using Revolt/event-loop 1.0.7 (was same on 1.0.6) as part of amphp/amp 3.0.2.

@kelunik
Copy link
Member

kelunik commented Feb 25, 2025

Any specific changes that you made to your application that triggered this?

@georgebarbarosie
Copy link
Author

Unfortunately nothing that can be easily pinpointed, we're developing the solution quite organically and we've now noticed this issue after stabilizing the solution. We run 25 processes in production with at least 10 of them getting fairly heavy "traffic" (redis events being listened to, I/O is AmPHP RedisClient, fully-asynchronous pgsql, AmPHP HttpServer and HttpClient for aggregating external APIs). And it's still a chance event that we observe one of them hanging like that. We're testing switching to the pecl/ev driver and if we succeed we won't see this show up in production anymore (as far as I can see TimerQueue is used exclusively in StreamSelectDriver). I will try my best to come up with a repeatable stress test directly for the TimerQueue, in isolation. I think this is my best chance of catching the issue.

@georgebarbarosie
Copy link
Author

I've run as heavy a stress test of TimerQueue as I could think of , it's not failing under any circumstances. Assuming the same TimerCallback doesn't get added twice in the queue, I found no way of triggering the same bug. I haven't tested what happens if the assertion in the insert() method fails (in production we keep zend.assertions disabled). I cannot easily discern a codepath in AbstractDriver + StreamSelectDriver that would make that assertion fail, hence why I haven't tested such a case. In case anybody wants to try, or enhance, the stress-test I've written, I'm providing it here. Feel free to close this issue.

use Revolt\EventLoop\Internal\TimerQueue;
use Revolt\EventLoop\Internal\TimerCallback;

$queue = new TimerQueue();

// Global error handler to catch and display warnings
set_error_handler(function ($errno, $errstr, $errfile, $errline) {
    global $queue;
    echo "ERROR: $errstr in $errfile on line $errline\n";
    echo "Stack trace:\n";
    debug_print_backtrace();
    echo "\n-----\n\n";
    var_dump($queue);
    // Halt execution on first sign of trouble
    die;
});

$extracted = 0;
$removed = 0;
$added = 0;
$lifetimeAdded = 0;
$nextId = 'a';
$callbackFn = fn() => null;
$iterations = 0;
$lastReport = microtime(true);

while (true) {
    $iterations++;
    // in a loop, insert a bunch of callbacks with expirations in the very near future (0-500ms) or past (0-100ms).
    // Out of the bunch, select a few randomly (20% chance) to remove using $queue->remove($callback)
    $toRemove = [];
    for ($i = random_int(2, 100); $i > 0; $i--) {
        $callback = new TimerCallback($nextId++, 0, $callbackFn, microtime(true) + (random_int(-100, 500) / 1000));
        // 20% chance to remove the callback
        if (random_int(1, 5) === 1) {
            $toRemove[] = $callback;
        }
        $queue->insert($callback);
        $added++;
    }

    foreach ($toRemove as $callback) {
        $queue->remove($callback);
        $removed++;
    }

    // // sleep 2ms
    // usleep(2000);

    // pop all expired callbacks off the queue
    
    while ($callback = $queue->extract(microtime(true))) $extracted++;;

    // every 5 seconds, report status
    if (microtime(true) - $lastReport > 5) {
        $lastReport = microtime(true);
        $lifetimeAdded += $added;
        echo "Added $added callbacks (total: $lifetimeAdded)\n";
        echo "Extracted $extracted callbacks\n";
        echo "Removed $removed callbacks\n";
        echo "Current iteration: $iterations, nextID: $nextId\n";
        // echo $queue->dump_contents();
        echo "---------------------------------\n\n";
        $added = 0;
        $removed = 0;
        $extracted = 0;
    }
}

@kelunik
Copy link
Member

kelunik commented Feb 26, 2025

Thank you for the additional details! If you can't run with assertions on production, maybe you can modify the code to run these assertions as normal ifs instead of them being assertions?

@georgebarbarosie
Copy link
Author

We've switched to the pecl/ev driver on production (which naturally eliminated the problem), I will be pushing a code change to make the driver and zend.assertions configurable, and will switch all our staging environments to use streamselect driver with assertions. The amount of traffic run through staging is significantly lower, so I expect it will take a long time until we observe the bug again. I'll keep you guys posted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants