[BUG] Server hangs on changelevel #138
Comments
The database normally waits until all queries have finished, which is usually what you'd want. Does the problem you mention only occur if queries are launched after the change level has been run? Or is there a particular reason why the busy queries are split into two parts?
The code was a quick example by Warden Potato. It's a dirty way of replicating the problem, but at this point I'm confident a query is getting stuck in a "pending" state, which makes the server hang, waiting forever. Since adding the workaround code my server has stopped crashing, but every instance in the logs says the queueSize is 0. So either I've been super lucky and haven't hit the crash scenario since adding it (unlikely, as it was like clockwork), or the query is stuck in some state where it's pending but isn't in the queue.
I've noticed these methods in the mysqloo table: Would these be beneficial to add to my debug prints prior to my workaround, so you can get an idea of where it might be stuck?
I have added the following code to log some data:
local data = {}
data["time"] = os.date("%c")
data["timestamp"] = os.time()
data["queueSize"] = Prometheus.DB.Module.DB:queueSize()
data["preAbort"] = {
    allocationCount = mysqloo.allocationCount(),
    deallocationCount = mysqloo.deallocationCount(),
    objectCount = mysqloo.objectCount(),
    referenceCreatedCount = mysqloo.referenceCreatedCount(),
    referenceFreedCount = mysqloo.referenceFreedCount(),
}
Prometheus.DB.Module.DB:abortAllQueries()
data["postAbort"] = {
    allocationCount = mysqloo.allocationCount(),
    deallocationCount = mysqloo.deallocationCount(),
    objectCount = mysqloo.objectCount(),
    referenceCreatedCount = mysqloo.referenceCreatedCount(),
    referenceFreedCount = mysqloo.referenceFreedCount(),
}
Prometheus.DB.Module.DB:disconnect(false)
data["postDisconnect"] = {
    allocationCount = mysqloo.allocationCount(),
    deallocationCount = mysqloo.deallocationCount(),
    objectCount = mysqloo.objectCount(),
    referenceCreatedCount = mysqloo.referenceCreatedCount(),
    referenceFreedCount = mysqloo.referenceFreedCount(),
}
local jsonString = util.TableToJSON(data, true)
local fileName = "prometheus_mysqloo_log.txt"
if file.Exists(fileName, "DATA") then
    file.Append(fileName, "\n" .. jsonString)
else
    file.Write(fileName, jsonString)
end
I'll check back in a few days if there's anything that sticks out.
EDIT: Added logging for postDisconnect (just in case this is what actually fixed it, not sure if it was the abort or disconnect atm)
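The three counter tables in the logging code above are identical, so they could be collapsed into one helper. This is only a rough sketch: `snapshotCounters` and `counterNames` are made-up names, and the helper takes the library table as an argument (here assumed to be `mysqloo`) instead of reading a global. The five function names are the ones already used in the snippet above.

```lua
-- Counter functions taken from the logging snippet in this thread.
local counterNames = {
    "allocationCount",
    "deallocationCount",
    "objectCount",
    "referenceCreatedCount",
    "referenceFreedCount",
}

-- Hypothetical helper: calls every counter function on the given library
-- table and returns the results keyed by function name.
local function snapshotCounters(lib)
    local snapshot = {}
    for _, name in ipairs(counterNames) do
        snapshot[name] = lib[name]()
    end
    return snapshot
end
```

With that in place, each stage would become a single line, for example `data["preAbort"] = snapshotCounters(mysqloo)`.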
@FredyH
Yeah so it looks like there are still some queries running.
Great news! I saw that commit and figured it was probably related. Two quick questions: will it call the fail callback, and is there a default timeout? It isn't my code that's failing, and it would be great if I could avoid editing.
Hi, I released a beta version with the new connection timeouts.
Yes, this is exactly what I've been experiencing for some years now: the server hangs on a timeout after attempting to change maps because there is no TCP connection. For me, though, the lost connection has always lasted from the previous map change right through to the upcoming one (which it never reaches, due to the infinite loop you mention, I think). I will try the beta version now with a reasonable timeout, even though that means data loss :/
Please let me know if it worked for you!
My server is suffering from a weird issue where it hangs when changelevel occurs: no actual crash, it just becomes unresponsive. Some research has led me to this module being the cause.
This is related to: #118
Big shoutout to Warden Potato on the gmod discord for making me aware and providing this code to replicate:
NOTE: The DB should be remote and the query somewhat slow; this ensures the issue reproduces.
A simple workaround is:
However, this could result in data loss. You'd also have to add an instance for each DB specifically, unless there's a way to grab all DB objects.
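One way around the "instance for each DB" problem might be to keep your own list of database handles and clean them all up in one place. A minimal sketch, under these assumptions: `trackDatabase`, `shutdownAllDatabases`, and the hook identifier are made-up names, every handle is registered by hand when it is created, and the cleanup uses the same `abortAllQueries()` and `disconnect(false)` calls as the logging code in this thread. The data-loss caveat still applies, since aborted queries never run.

```lua
-- Hypothetical registry of database handles. mysqloo does not provide this;
-- each handle has to be registered manually when it is created.
local trackedDatabases = {}

local function trackDatabase(db)
    trackedDatabases[#trackedDatabases + 1] = db
    return db
end

-- Aborts pending queries and disconnects without waiting, for every tracked
-- handle. Returns the number of handles processed.
local function shutdownAllDatabases()
    local count = 0
    for _, db in ipairs(trackedDatabases) do
        db:abortAllQueries()   -- pending queries are dropped (possible data loss)
        db:disconnect(false)   -- false: do not wait for remaining queries
        count = count + 1
    end
    return count
end

-- Inside Garry's Mod, run the cleanup on the ShutDown hook. The guard lets
-- the file load in plain Lua, where the hook library does not exist.
if hook and hook.Add then
    hook.Add("ShutDown", "AbortPendingQueriesOnShutdown", shutdownAllDatabases)
end
```

Each connection would then be wrapped once where it is created, for example `local db = trackDatabase(mysqloo.connect(...))` with your usual connection arguments.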