
MySQL connection exceptions #2038

Open
adrianbj opened this issue Feb 7, 2025 · 1 comment
Comments

adrianbj commented Feb 7, 2025

Hi @ryancramerdesign - I can't find it at the moment, but I know some time ago there was a discussion about PW retrying connections automatically, which led you to add this:

https://github.com/processwire/processwire/blob/f22739a54c8fd8d1ce3e41a9b95e37129f6376b1/wire/core/DatabaseQuery.php#L732-L743

The problem is that I don't think it covers enough scenarios. I recently had a failure with a Digital Ocean managed database connection, even with a standby node in place. Apparently "There was a failover event that involved the underlying node switching to the standby, which may have been triggered by either a degraded primary node or a missed heartbeat signal from the primary."

The error was "PDOException SQLSTATE[08S01]: Communication link failure: 1053 Server shutdown in progress" on $query->execute();

I am not sure exactly what would be involved in handling this, but would it be as simple as removing these checks?

$retry = $code === 2006 || stripos($msg, 'MySQL server has gone away') !== false;
if($retry && $numTries < $options['maxTries']) {
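
For illustration, something along these lines is roughly what I have in mind (a sketch only; the list of "transient" error codes is just my guess, not anything PW currently defines, and it reuses the same $code / $msg / $options / $numTries variables as the snippet above):

$transientCodes = array(
    1053, // Server shutdown in progress (the error from the DO failover)
    2002, // Can't connect to MySQL server
    2006, // MySQL server has gone away
    2013, // Lost connection to MySQL server during query
);
$retry = in_array((int) $code, $transientCodes, true)
    || stripos($msg, 'MySQL server has gone away') !== false
    || stripos($msg, 'Communication link failure') !== false;
if($retry && $numTries < $options['maxTries']) {
    // reconnect and re-run the query, as the existing code already does
}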

Also, is there really a good reason for maxTries to be set at all, let alone to something as low as 3? My worry is that if a database connection stops responding, it could be a substantial period of time before it's available again. In my case the failure hit background tasks, and most of them failed because they were inside a large loop.

I would love it if this could be made more robust so it handles other transient exception types, not just the classic 2006 "gone away" error.

adrianbj (Author) commented:

@ryancramerdesign - a little more info from Digital Ocean.

They promoted my standby node to the new master and spun up a new standby node. They didn't provide an exact time that this took, but apparently it should not have taken more than 8 minutes.

But, they also noted "Note that our platform would trigger a node replacement when there's 180 seconds period of unavailability, so you should account for at least that amount of time when you're setting your retry and timeout values."

So I am not sure exactly what the potential downtime could be, but I do think that PW's retry needs to be more robust. Either that, or I need to add a custom try/catch that doesn't give up after 3 tries and continues the foreach loop once a connection is available again.
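
If it ends up being something I handle on my side, I'm picturing a wrapper roughly like this (a sketch only; the executeWithRetry name, the ~4 minute budget and the message matching are my own assumptions, sized to cover at least the 180-second window they mention, and $items just stands in for whatever the background task loops over):

// Caller-side retry wrapper (not part of the PW core). Keeps retrying
// transient connection failures for up to $maxWait seconds with a capped
// exponential backoff, then rethrows so a genuine outage still surfaces.
function executeWithRetry(callable $task, int $maxWait = 240) {
    $start = time();
    $delay = 2; // seconds between attempts, doubled up to a cap
    while(true) {
        try {
            return $task();
        } catch(\PDOException $e) {
            $msg = $e->getMessage();
            $transient = stripos($msg, 'Communication link failure') !== false
                || stripos($msg, 'server has gone away') !== false
                || stripos($msg, 'Server shutdown in progress') !== false;
            // give up if it doesn't look transient, or we've waited long enough
            if(!$transient || (time() - $start) > $maxWait) throw $e;
            sleep($delay);
            $delay = min($delay * 2, 30);
        }
    }
}

// inside the background task:
foreach($items as $item) {
    executeWithRetry(function() use ($item) {
        // ... run the query for this $item ...
    });
}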

I would love your thoughts on whether you'd be willing to improve this in the core, or whether you think it's something I need to handle myself.

Thanks.
