You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[ Analyzed from Firestorm viewer logs (https://jira.firestormviewer.org/browse/FIRE-35085) but I'm duplicating it in viewer as something similar is very likely present in the long poll code. ]
This is likely a bug in the SL base code but I'm reporting via FS Jira as there may be special handling in Firestorm. There will be a GHI on secondlife/viewer as well.
The following came in from logs from a resident so I'm editing to limit information leakage. But the gist should be clear. This stems from a mix of the EventQueueGet work underway and the eternal search for better TPs/RCs. Resi reported a TP failure, I decide to dig. Here's the discovery.
Working backwards, resi had a teleport failure with disconnect from the grid. Examining Firestorm.log indicated a missing TeleportFinish message was the likely cause.
And that will end a session. So, why was the EventQueue filled? Was it not being read? Hitting a different set of logs for the session, I found that the last event-get long-poll to the simulator was around Jan 22 19:46, almost five hours earlier.
Bouncing back to the viewer,
2025-01-22T19:46:54Z WARNING #CoreHttp# llcorehttp/_httppolicy.cpp(404) LLCore::HttpPolicy::stageAfterCompletion : HTTP request <handle1> failed after 0 retries. Reason: SSL connect error (Easy_35)
2025-01-22T19:46:54Z WARNING #CoreHttp# llcorehttp/_httppolicy.cpp(404) LLCore::HttpPolicy::stageAfterCompletion : HTTP request <handle2> failed after 0 retries. Reason: SSL connect error (Easy_35)
2025-01-22T19:46:54Z WARNING #CoreHttp# llcorehttp/_httppolicy.cpp(404) LLCore::HttpPolicy::stageAfterCompletion : HTTP request <handle3> failed after 0 retries. Reason: SSL connect error (Easy_35)
2025-01-22T19:46:54Z WARNING #CoreHttp# llcorehttp/_httppolicy.cpp(404) LLCore::HttpPolicy::stageAfterCompletion : HTTP request <handle4> failed after 0 retries. Reason: SSL connect error (Easy_35)
2025-01-22T19:46:54Z WARNING #CoreHttp# llcorehttp/_httppolicy.cpp(404) LLCore::HttpPolicy::stageAfterCompletion : HTTP request <handle5> failed after 0 retries. Reason: SSL connect error (Easy_35)
...
2025-01-22T19:46:54Z WARNING #CoreHTTP# llmessage/llcorehttputil.cpp(275) LLCoreHttpUtil::HttpCoroHandler::onCompleted : Possible failure [Easy_35] cannot POST url 'https://simhost-0f7e4be4373667b6e.agni.secondlife.io:12043/cap/<cap1>' because SSL connect error
2025-01-22T19:46:54Z WARNING #CoreHTTP# llmessage/llcorehttputil.cpp(275) LLCoreHttpUtil::HttpCoroHandler::onCompleted : Possible failure [Easy_35] cannot POST url 'https://simhost-0fb0a28082d21b6f3.agni.secondlife.io:12043/cap/<cap2>' because SSL connect error
2025-01-22T19:46:54Z WARNING #CoreHTTP# llmessage/llcorehttputil.cpp(275) LLCoreHttpUtil::HttpCoroHandler::onCompleted : Possible failure [Easy_35] cannot POST url 'https://simhost-0bea8ada1571717de.agni.secondlife.io:12043/cap/<cap3>' because SSL connect error
2025-01-22T19:46:54Z WARNING #CoreHTTP# llmessage/llcorehttputil.cpp(275) LLCoreHttpUtil::HttpCoroHandler::onCompleted : Possible failure [Easy_35] cannot POST url 'https://simhost-006b1b658c5a1fbc9.agni.secondlife.io:12043/cap/<cap4>' because SSL connect error
2025-01-22T19:46:54Z WARNING #CoreHTTP# llmessage/llcorehttputil.cpp(275) LLCoreHttpUtil::HttpCoroHandler::onCompleted : Possible failure [Easy_35] cannot POST url 'https://simhost-0b61d1438e744c2e7.agni.secondlife.io:12043/cap/<cap5>' because SSL connect error
...
2025-01-22T19:46:54Z WARNING #LLEventPollImpl# newview/lleventpoll.cpp(243) LLEventPolling::Details::LLEventPollImpl::eventPollCoro : Critical error from poll request returned from libraries. Canceling coroutine.
2025-01-22T19:46:54Z WARNING #LLEventPollImpl# newview/lleventpoll.cpp(243) LLEventPolling::Details::LLEventPollImpl::eventPollCoro : Critical error from poll request returned from libraries. Canceling coroutine.
2025-01-22T19:46:54Z WARNING #LLEventPollImpl# newview/lleventpoll.cpp(243) LLEventPolling::Details::LLEventPollImpl::eventPollCoro : Critical error from poll request returned from libraries. Canceling coroutine.
2025-01-22T19:46:54Z WARNING #LLEventPollImpl# newview/lleventpoll.cpp(243) LLEventPolling::Details::LLEventPollImpl::eventPollCoro : Critical error from poll request returned from libraries. Canceling coroutine.
2025-01-22T19:46:54Z WARNING #LLEventPollImpl# newview/lleventpoll.cpp(243) LLEventPolling::Details::LLEventPollImpl::eventPollCoro : Critical error from poll request returned from libraries. Canceling coroutine.
So, certain spurious/transitory errors are leading to the abandonment not of the full circuit but just of the HTTP half found in EventQueue. For both main and child regions. The EventQueue continues to function on the simulator side - there's been no takedown of the circuit. But the EventQueue fills, blocks further entries, and only critical operations, such as TeleportFinish, lead to a synchronization event which, unfortunately, ends the session. Sometimes hours after the causal failure.
Reproduction steps
Run a viewer
Wait a long time
The text was updated successfully, but these errors were encountered:
Dan, I filed this. It's a viewer issue in the case of FS. I haven't verified that it's identical in our viewer but suspect it probably is. Not a server issue. Bad client-side behavior.
To secure some related server-side historical metrics, attaching the following which you can get in the CloudWatch dashboards.
One side-effect of the viewer issue is that events will queue up and get dropped on the simulator side. That happens for many reason and this chart is the result (up to 8E6 events lost per day).
Environment
Firestorm 7.1.11.76496 Release
Description
[ Analyzed from Firestorm viewer logs (https://jira.firestormviewer.org/browse/FIRE-35085) but I'm duplicating it in viewer as something similar is very likely present in the long poll code. ]
This is likely a bug in the SL base code but I'm reporting via FS Jira as there may be special handling in Firestorm. There will be a GHI on secondlife/viewer as well.
The following came in from logs from a resident so I'm editing to limit information leakage. But the gist should be clear. This stems from a mix of the EventQueueGet work underway and the eternal search for better TPs/RCs. Resi reported a TP failure, I decide to dig. Here's the discovery.
Working backwards, resi had a teleport failure with disconnect from the grid. Examining Firestorm.log indicated a missing TeleportFinish message was the likely cause.
Moving to server, I found:
And that will end a session. So, why was the EventQueue filled? Was it not being read? Hitting a different set of logs for the session, I found that the last event-get long-poll to the simulator was around Jan 22 19:46, almost five hours earlier.
Bouncing back to the viewer,
So, certain spurious/transitory errors are leading to the abandonment not of the full circuit but just of the HTTP half found in EventQueue. For both main and child regions. The EventQueue continues to function on the simulator side - there's been no takedown of the circuit. But the EventQueue fills, blocks further entries, and only critical operations, such as TeleportFinish, lead to a synchronization event which, unfortunately, ends the session. Sometimes hours after the causal failure.
Reproduction steps
The text was updated successfully, but these errors were encountered: