Unexpected long time until server session with subscriptions times out despite keepalive #1577
Comments
I just made a similar test with current develop HEAD and coap-server and coap-client:
It takes over 5 minutes in this case until the session is closed, but there is no keepalive failure.
The (unmodified) coap-server sets the keepalive timeout to 30 secs, not the 180 secs you were referring to. A keep-alive is only transmitted if there has been no other traffic active on the session for the keepalive timeout. In this case, a 'time' response is actively being transmitted, but is not getting any acknowledgement and hence is retransmitted n times. Here, the libcoap logic says: after retransmission 4 plus 30 secs it is time to send a keep-alive - ah - a transmission is already active but has not seen a response for 30 seconds - and it then closes the session as you are seeing. This is instead of trying to send a keep-alive (which it is not allowed to do, as the number of active CON transmissions is already 1, which is the value of NSTART). So, I think the code is doing the correct thing here for this scenario. The default time until the subscription is deleted is (2+4+8+16)*1.5 + 30 = 75 seconds - in your case this was 74.286 (there needs to be an element of randomness here). The default time for an idle server session to get deleted is 300 secs (configurable via coap_context_set_session_timeout()). [Upping the value of NSTART is generally not recommended as there are lots of potential side effects.] I am trying to emulate your initial scenario to see what is happening.
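For reference, a minimal sketch of where those numbers come from, using the RFC 7252 transmission defaults and libcoap's coap_context_set_keepalive(); the 30 s value mirrors what the example coap-server uses:

```c
#include <coap3/coap.h>

/* Keep-alive interval used by the example coap-server (seconds). */
#define KEEPALIVE_SECS 30

static void
configure_keepalive(coap_context_t *ctx) {
  /* Send a keep-alive on a session after KEEPALIVE_SECS without traffic. */
  coap_context_set_keepalive(ctx, KEEPALIVE_SECS);
}

/*
 * RFC 7252 defaults: ACK_TIMEOUT = 2 s, ACK_RANDOM_FACTOR = 1.5, MAX_RETRANSMIT = 4.
 *
 * Retransmission span = (2 + 4 + 8 + 16) * 1.5 = 45 s
 * Subscription deleted after span + KEEPALIVE_SECS = 45 + 30 = 75 s
 * (the per-retry random factor explains the observed 74.286 s).
 */
```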
I think this is your issue. A server side session will have a reference count of 0 when created. An Observe request (that succeeds) will bump the reference count. When the subscription is deleted, the reference count is decremented. A server session with a reference count of 0 will eventually time out and be deleted. However, if any new traffic comes in, this session will get used for handling the request/response, but still starting from a reference count of 0. So, calling coap_session_release() on a server side session does not remove the session - server sessions are only removed by the library itself (idle timeout or keepalive failure).
This is different however for a client side session - here, the client side session starts off with a reference count of 1 and is only freed when the application calls coap_session_release() and the count drops to 0. However, server side, you can do coap_session_set_type_client() to convert the session to a client type session (the reference count is bumped to 1), after which coap_session_release() will actually remove it.
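A minimal sketch of that reference-count difference, assuming a UDP client session for illustration:

```c
#include <coap3/coap.h>

/* Client side: the session returned by coap_new_client_session() already
 * carries one reference owned by the application, so the application must
 * release it when done - that release is what frees the session. */
static void
client_example(coap_context_t *ctx, const coap_address_t *server) {
  coap_session_t *session =
      coap_new_client_session(ctx, NULL, server, COAP_PROTO_UDP);
  if (!session)
    return;
  /* ... send requests, run coap_io_process() ... */
  coap_session_release(session);    /* 1 -> 0: session is freed */
}

/* Server side: sessions arrive from libcoap with a reference count of 0 and
 * are removed by the library itself (idle timeout, keepalive failure).  An
 * application that wants to keep one alive has to take its own reference. */
static coap_session_t *
server_example_hold(coap_session_t *incoming) {
  return coap_session_reference(incoming);   /* 0 -> 1: library won't delete it */
}
```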
Thank you for the detailed explanation! So - in theory - an observed resource could decide to publish a new value right before the keepalive timeout and postpone the timeout for another ~75 seconds? I will try your suggestion and change the session type.
Thinking about things, it is unsafe to remove a session in an event handler, as it is possible / likely that the code that called the event handler will continue to reference the session. So don't try my suggestion. The libcoap code that calls the event handler with COAP_EVENT_KEEPALIVE_FAILURE goes on to delete all the observe subscriptions and then forces the removal of the session (done in coap_session_server_keepalive_failed()).
For a particular session, if the COAP_EVENT_KEEPALIVE_FAILURE event is triggered, then everything should be cleaned up and things should not continue as you are seeing.
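A minimal sketch of an event handler that follows this advice - clean up application state when the session is actually deleted and leave the removal of the session itself to libcoap (the per-session state type and my_state_free() are hypothetical application code):

```c
#include <coap3/coap.h>

/* Hypothetical per-session application state. */
typedef struct my_session_state_t my_session_state_t;
void my_state_free(my_session_state_t *state);

static int
event_handler(coap_session_t *session, const coap_event_t event) {
  switch (event) {
  case COAP_EVENT_KEEPALIVE_FAILURE:
    /* Do NOT call coap_session_release() here - libcoap deletes the observe
     * subscriptions and removes the server session itself after this event. */
    coap_log_info("keepalive failure, session will be removed by libcoap\n");
    break;
  case COAP_EVENT_SERVER_SESSION_DEL:
    /* The session really is going away now - release application structures. */
    my_state_free(coap_session_get_app_data(session));
    break;
  default:
    break;
  }
  return 0;
}

/* Registered once at startup with: coap_register_event_handler(ctx, event_handler); */
```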
Please try #1578 with your original scenario and let us know how it goes.
I was just able to do a quick test and it looks better now, with less than 5 minutes until the COAP_EVENT_SERVER_SESSION_DEL. I will take a closer look tomorrow. Thank you!
Thanks for the update. I note that there are 2 different sessions in the logs you recently added.
After some more tries it looks like the COAP_EVENT_KEEPALIVE_FAILURE event is now emitted around 1.5 to 2 minutes after killing the client:
In my event handler, I now call: […]
and finally: […]
I will check if I am doing something wrong here.
Definitely something I have missed here, but not sure what.
This is done in coap_session_server_keepalive_failed(). Looking at that function, all the resources are iterated through, deleting any observer subscriptions for the session for that resource - but you are only seeing some of them deleted, as if there are multiple subscriptions active for the same resource and this logic is only deleting the first one per resource. It may help to have visibility of the client code you are using for testing.
Ok, I will remove it again.
This is indeed what is happening. The client in this case is a development tool that observes all available resources in one window and the relevant resources for a chosen topic in another, and (due to limited development time) every window observes the resources itself. I will try it again with only a single observation per resource tomorrow. Thank you!
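For illustration, a rough sketch of how a client like that ends up with two subscriptions on one resource - the same session sends two Observe-establish GETs with different tokens (the /time path is just an example):

```c
#include <coap3/coap.h>

/* Send one Observe-establish GET for /time with the given one-byte token.
 * Called twice with different tokens on the same session, this creates two
 * independent subscriptions for the same resource. */
static void
observe_time(coap_session_t *session, uint8_t token_byte) {
  uint8_t buf[4];
  coap_pdu_t *pdu = coap_pdu_init(COAP_MESSAGE_CON, COAP_REQUEST_CODE_GET,
                                  coap_new_message_id(session),
                                  coap_session_max_pdu_size(session));
  if (!pdu)
    return;
  coap_add_token(pdu, 1, &token_byte);
  /* Options must be added in option-number order: Observe (6), Uri-Path (11). */
  coap_add_option(pdu, COAP_OPTION_OBSERVE,
                  coap_encode_var_safe(buf, sizeof(buf), COAP_OBSERVE_ESTABLISH), buf);
  coap_add_option(pdu, COAP_OPTION_URI_PATH, 4, (const uint8_t *)"time");
  coap_send(session, pdu);
}

/* e.g. observe_time(session, 0x01); observe_time(session, 0x02); */
```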
Multiple observations per resource needs to be tested out (as well as the single observer case). I have just pushed a minor change which should handle multiple observations per resource for further testing.
With the new commit, I see a lot more subscription removals after the keepalive failure but the session still stays alive:
It looks like the remaining subscriptions are for resources that are handled by the unknown resource handler. If I am not mistaken, the handler creates resources dynamically for these requests, though.
Interesting thought. However, when these resources are created and then added, they are added into the list in the same way as the initial resources, by using coap_add_resource().
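For context, this is roughly the dynamic-creation pattern being referred to (modelled on the libcoap example server; handler names and response codes here are placeholders): the handler registered via coap_resource_unknown_init2() spawns a concrete resource for the requested path and adds it with coap_add_resource(), after which it is found like any statically created resource.

```c
#include <coap3/coap.h>

static void
hnd_get_dynamic(coap_resource_t *resource, coap_session_t *session,
                const coap_pdu_t *request, const coap_string_t *query,
                coap_pdu_t *response) {
  (void)resource; (void)session; (void)request; (void)query;
  coap_pdu_set_code(response, COAP_RESPONSE_CODE_CONTENT);
}

/* Handler attached with coap_resource_unknown_init2(): invoked for requests
 * to URIs that have no matching resource.  It creates a real resource for
 * the requested path and adds it to the context, so subsequent requests
 * (and subscriptions) are tracked against that new resource. */
static void
hnd_unknown(coap_resource_t *resource, coap_session_t *session,
            const coap_pdu_t *request, const coap_string_t *query,
            coap_pdu_t *response) {
  (void)resource; (void)query;
  coap_string_t *uri_path = coap_get_uri_path(request);
  if (!uri_path) {
    coap_pdu_set_code(response, COAP_RESPONSE_CODE_NOT_FOUND);
    return;
  }
  coap_resource_t *r =
      coap_resource_init(coap_new_str_const(uri_path->s, uri_path->length),
                         COAP_RESOURCE_FLAGS_RELEASE_URI);
  coap_register_request_handler(r, COAP_REQUEST_GET, hnd_get_dynamic);
  coap_resource_set_get_observable(r, 1);
  coap_add_resource(coap_session_get_context(session), r);
  coap_delete_string(uri_path);
  /* Respond to this first request directly (or re-dispatch to the new handler). */
  coap_pdu_set_code(response, COAP_RESPONSE_CODE_CREATED);
}
```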
I have just pushed a change that reports on the resource URI for the subscription being deleted, to try to help establish why they are not all getting deleted on keep-alive failure for the session.
As the initial request already activates observation, is this handled in a special way? I found this note: "It is not possible to observe the unknown_resource" (Line 3471 in edd5f37), but in the following code it looks like coap_add_observer() is called nonetheless. Maybe the subscription is added to the unknown_resource of the context?
The remaining subscriptions are all for paths that were initially handled by the unknown resource handler. (Edit: better wording)
Good to establish that. I see that coap_add_observer() does get called in that case, so the subscription ends up tracked against the unknown resource. Should we allow observation of the unknown resource at all? However, if the unknown resource handler spawns off a new resource matching the requested URI, the subscription is not moved across to it. I am inclined to disable observation of the unknown resource.
See #1583 for disabling this.
In my case it would be good if the newly created resource took over the subscriptions from the unknown resource. On my device, I use one CoAP server which handles all incoming traffic and forwards certain URIs to other processes on the device which handle requests that require elevated privileges. To avoid close coupling of these parts (especially during boot), the CoAP server/reverse proxy queries the […]. If you think this is doable I can try to come up with an implementation. If an observe notification can only be triggered by coap_resource_notify_observers(), maybe disabling this function for the unknown resource would be good.
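As background, a minimal sketch of how an unsolicited observe notification is normally produced (the function name below is a placeholder for wherever the backend's new value arrives): the application marks the resource as changed, and libcoap then invokes the resource's GET handler per observer on the next coap_io_process() to send the notifications.

```c
#include <coap3/coap.h>

/* Called whenever the backend pushes a new value for a resource that the
 * reverse proxy exposes and that clients may be observing. */
static void
backend_value_changed(coap_resource_t *resource) {
  /* NULL query: notify all observers of this resource. */
  coap_resource_notify_observers(resource, NULL);
}
```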
OK. So, if I understand what you are saying, an Observe request comes in and is handled by the unknown resource handler, which creates a new resource. You would like the subscription currently associated with the unknown resource to be moved across to the newly created resource. Correct? I guess my question is - who should be generating the unsolicited observe response - the reverse proxy or the back end server? I would be expecting the Observe Establish to be passed through the reverse proxy to the backend server without the reverse-proxy having to know about the observe.
Need to double check the code that unsolicited observe responses are only triggered by coap_resource_notify_observers().
Yes. This would also solve this last issue, I guess?
At the moment, the reverse proxy uses async to "delay" the response to the client and then sends its own (mostly copied) request to the backend. When the response is received, it triggers the async and returns the response (code and payload) to the client. If the initial request has […].
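Roughly what that async pattern looks like with the libcoap async API - a sketch only; my_forward_to_backend() is a placeholder for the application's forwarding code, and the delay of 0 is intended as "wait until coap_async_trigger() is called":

```c
#include <coap3/coap.h>

/* Hypothetical helper that forwards a copy of the request to the backend
 * and remembers the async handle so it can be triggered on reply. */
void my_forward_to_backend(coap_session_t *client_session,
                           const coap_pdu_t *request, coap_async_t *async);

/* GET handler on the reverse proxy: the first pass registers an async and
 * returns no response yet; the second pass (after coap_async_trigger())
 * fills in the real response. */
static void
hnd_proxy_get(coap_resource_t *resource, coap_session_t *session,
              const coap_pdu_t *request, const coap_string_t *query,
              coap_pdu_t *response) {
  (void)resource; (void)query;
  coap_bin_const_t token = coap_pdu_get_token(request);
  coap_async_t *async = coap_find_async(session, token);

  if (!async) {
    /* First invocation: park the request and forward it to the backend. */
    async = coap_register_async(session, request, 0);
    if (async)
      my_forward_to_backend(session, request, async);
    return;  /* no response code set: the real response is built on the next pass */
  }

  /* Second invocation, after the backend reply arrived and
   * coap_async_trigger(async) was called: copy code/payload across. */
  coap_pdu_set_code(response, COAP_RESPONSE_CODE_CONTENT);
  /* ... add the backend payload, e.g. via coap_add_data_large_response() ... */
  coap_free_async(session, async);
}
```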
Yes. Not sure yet how to trap the adding of a new resource in the application's unknown handler to move the subscription across. It may be that the application has to add in the subscription request.
Have you looked at coap_proxy_forward_request()? See coap_proxy_forward_request(3). It seems the man page is broken - I will get that fixed. Then there would be no need for your own async handling.
Maybe some sort of […]
Ah, I saw it, but I thought it was maybe for another use case as I did not understand what I should put into the server list parameter. Thank you for your help!
#1586 should now clear down all observations on keepalive timeout.
If you have internal servers A, B, C and D, then you will need 4 different coap_proxy_server_list_t definitions. So for server A, it could be a reverse_proxyA definition, etc. I have updated the documentation in #1587 to make usage clearer as well as fixed an issue with COAP_PROXY_REVERSE_STRIP.
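A very rough sketch of how that might look from a request handler; the exact coap_proxy_forward_request() signature should be checked against the updated coap_proxy_forward_request(3) man page, and reverse_proxyA is assumed to be a coap_proxy_server_list_t configured elsewhere to point at internal server A:

```c
#include <coap3/coap.h>

/* Assumed to be filled in elsewhere (server address, COAP_PROXY_REVERSE_STRIP, ...). */
extern coap_proxy_server_list_t reverse_proxyA;

/* Handler for the URIs that belong to internal server A: hand the whole
 * request/response exchange over to the libcoap proxy code instead of
 * rolling our own async forwarding. */
static void
hnd_forward_to_a(coap_resource_t *resource, coap_session_t *session,
                 const coap_pdu_t *request, const coap_string_t *query,
                 coap_pdu_t *response) {
  (void)query;
  if (!coap_proxy_forward_request(session, request, response, resource,
                                  NULL /* cache_key */, &reverse_proxyA)) {
    coap_log_debug("unable to forward request to internal server A\n");
  }
}
```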
The session is now completely shut down after a COAP_EVENT_KEEPALIVE_FAILURE, thank you! I read the new proxy documentation and it is much clearer to me now, thanks. I received some additional tasks, so it might take a few days until I can replace my async logic with these proxy functions.
Hello,
I am currently experimenting with the new server-side keepalive feature and it looks to me like the session takes too long to time out. In my server context, I set a keepalive interval of 180 seconds, as I just want to get notified if a (mobile) client vanishes so I can release resources. My client observes a lot of resources and, in this experiment, I kill it hard after starting a block transfer. Below is the last block received before the kill (the RTC was not synchronized during this run):
Afterwards, I see occasional retransmits:
Single removals of a subscription:
And after over ten minutes, COAP_EVENT_KEEPALIVE_FAILURE is emitted:
and lots of other subscriptions are removed:
In my COAP_EVENT_KEEPALIVE_FAILURE handler, I call coap_session_release(). Interestingly, I still see debug output with the session's port 49738 afterwards:
And after I stop the server application with CTRL+C there is more traffic:
The long delay is not a problem for me, I just wanted to ask if this is to be expected for some reason in this case? Regarding the traffic after the event, I guess that I should check if I have the session referenced somewhere else - or is there a way to force-close a session after a keepalive failure so that my COAP_EVENT_*_CLOSED handlers release all my corresponding structures? The server is using the fairly recent commit ba09d0f.
Thank you!