NACKs are being sent by a receiver when the sender goes idle #69
-
I have an application that sends a series of messages, goes idle for a period of time, sends another series of messages, goes idle again, and so on in that pattern. The receiving application receives all of the expected messages, but it starts to send NACKs if the idle period is more than about 30 seconds. I turned on trace and debug messages, and I can see from those that the receiver received all of the expected application messages, yet it still starts sending NACKs during the idle period. The sender seems to ignore the NACKs, possibly because it no longer has the data indicated in them; I'm not sure about that. I'm including the log files from the sending and receiving applications to see if you can determine why the NACKs are occurring. In this test run, the sender sends 50 messages and is then idle for 60 seconds, repeating that pattern 3 times (at the end of the third series, the application exits instead of going idle). In my real application, the number of messages in a series and the length of the idle period are both variable. Perhaps there is a configurable value in the API, which I don't know about, that I should be setting to avoid this NACK behavior during idle periods.

There is also another behavior I have noticed that I would like to ask about. When my sender application is idle (from the application's perspective), CMD(CC) messages are still being sent (by the thread(s) created by the NORM API), and the receiver responds with ACK(CC) messages. Even though these messages are being exchanged, the receiver gets "Remote Sender Inactive" events. The "Inactive" event seems to occur just before the CMD(CC) message; after the CMD(CC) message, the receiver gets a "Remote Sender Active" event. I'm only guessing here, but it seems as if those "Inactive" events shouldn't occur if the CMD(CC) messages are being received, so perhaps there is a timing issue. I don't have logs showing the Inactive/Active events in relation to the CMD(CC) messages, but I could reproduce that situation if it would help you check whether there is an issue in the API.

I appreciate any insight you can provide regarding these two scenarios. I haven't attached files here before; I hope I've done it correctly.
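For reference, here is a stripped-down sketch of the send/idle pattern my sender follows. This is not my actual application code; the address, port, buffer sizes, and message contents are placeholders, and the exact signatures should be checked against norm.h.

```cpp
// Simplified sketch of the bursty sender described above: write a series
// of messages to a NORM stream, flush, idle, and repeat.
#include <norm.h>
#include <cstdio>
#include <unistd.h>

int main()
{
    NormInstanceHandle instance = NormCreateInstance();
    NormSessionHandle session = NormCreateSession(instance, "224.1.2.3", 6003, 1);

    // 1 MB tx buffer, 1400-byte segments, 16 data + 4 parity segments per block
    NormStartSender(session, 1, 1024 * 1024, 1400, 16, 4);
    NormObjectHandle stream = NormStreamOpen(session, 1024 * 1024);

    for (int series = 0; series < 3; series++)
    {
        for (int i = 0; i < 50; i++)
        {
            char msg[64];
            int len = snprintf(msg, sizeof(msg), "series %d message %d", series, i);
            // Real code should check the return value in case the stream buffer is full
            NormStreamWrite(stream, msg, (unsigned int)len);
            NormStreamMarkEom(stream);  // mark an application message boundary
        }
        NormStreamFlush(stream, true, NORM_FLUSH_ACTIVE);
        if (series < 2) sleep(60);      // idle period between series
    }

    NormStreamClose(stream, true);      // graceful close
    NormStopSender(session);
    NormDestroySession(session);
    NormDestroyInstance(instance);
    return 0;
}
```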
-
I was able to look at the log files you posted. I think the two things you mention here may be related. By default, a receiver will time out an "inactive" sender based on measured RTT, etc. When the sender has no data to send, it reduces the rate of its NORM_CMD(CC) probes (used for measuring RTT). So I think there could be a sort of race condition here where the receiver declares a sender "inactive" in a shorter time frame than the interval at which those NORM_CMD(CC) probes are sent. That is somewhat benign (but annoying) by itself.

Another default behavior is that the receiver will drop buffering state for a sender that has gone inactive. The idea is that in a group with multiple senders, the receiver only allocates buffer space for the senders that actually need it. What may be happening is that when the sender becomes "active" again, the receiver sends a NACK to "re-sync" to the sender's transmission in case an outage was the cause of the sender's apparent inactivity. Do you happen to be using the NORM_SYNC_ALL or NORM_SYNC_STREAM sync policy?

I mention these as default behaviors since there may be some API usage that can adjust them. I will have to look into it since I can't recall off the top of my head, and an outline of how your sender and receiver use the API would be helpful. I think you are using NORM_OBJECT_STREAM delivery, right?
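For reference, the sync policy is something the receiver side sets up front. A minimal sketch (placeholder address, port, and buffer sizes; check norm.h in your NORM version for the exact signatures):

```cpp
// Minimal receiver-side setup showing where the sync policy is chosen.
// NORM_SYNC_CURRENT (the default) syncs to the sender's current transmit
// position; NORM_SYNC_ALL and NORM_SYNC_STREAM ask for repair of earlier
// content, which affects how the receiver NACKs when it (re)syncs.
#include <norm.h>

NormSessionHandle StartReceiver(NormInstanceHandle instance)
{
    NormSessionHandle session = NormCreateSession(instance, "224.1.2.3", 6003, 2);

    NormSetDefaultSyncPolicy(session, NORM_SYNC_CURRENT);  // or NORM_SYNC_ALL / NORM_SYNC_STREAM

    NormStartReceiver(session, 1024 * 1024);  // per-sender buffer space
    return session;
}
```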
-
I have checked in a change that I think will resolve your extraneous NACKing issue. I updated NormSenderNode::RepairCheck() to use a new "BLIND_CHECK" check level on sender inactivity timeout or reactivation; it inspects the current repair state to determine the right NACK approach. For your case, I think that will yield the desired behavior.

BTW, the reason the sender was not responding to the NACKs was that the receiver was NACKing for a portion of the stream the sender had not yet sent.

Please let me know if this resolves the issue. The "sender inactive / idle" notification at the receiver is just a cue to your application that the remote sender isn't actively sending data. In a multicast group where multiple senders may be sending data, this lets the application free up memory used for senders that aren't actively sending content (i.e., via the NormNodeFreeBuffers() call that can be made at that point).
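For illustration, handling those notifications in the receiver's event loop can be as simple as something like the sketch below. Whether to call NormNodeFreeBuffers() on inactivity is an application choice; for a single-sender case like yours it is fine to just ignore the event.

```cpp
// Sketch of a receiver event loop reacting to the inactive/active cues.
// Freeing buffers on inactivity is optional and mainly useful when many
// senders share the group; otherwise the events can simply be logged.
#include <norm.h>
#include <cstdio>

void RunReceiverLoop(NormInstanceHandle instance)
{
    NormEvent event;
    while (NormGetNextEvent(instance, &event))
    {
        switch (event.type)
        {
            case NORM_REMOTE_SENDER_INACTIVE:
                fprintf(stderr, "sender %u inactive\n", (unsigned int)NormNodeGetId(event.sender));
                // NormNodeFreeBuffers(event.sender);  // optional: reclaim buffer space
                break;
            case NORM_REMOTE_SENDER_ACTIVE:
                fprintf(stderr, "sender %u active again\n", (unsigned int)NormNodeGetId(event.sender));
                break;
            case NORM_RX_OBJECT_UPDATED:
                // NormStreamRead(event.object, ...) to pull received stream data
                break;
            default:
                break;
        }
    }
}
```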