-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xdsclient: ensure xDS node ID in included in NACK and connectivity errors #8103
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #8103 +/- ##
==========================================
- Coverage 82.24% 82.21% -0.04%
==========================================
Files 387 387
Lines 38967 38947 -20
==========================================
- Hits 32048 32019 -29
- Misses 5599 5606 +7
- Partials 1320 1322 +2 |
// To prevent indefinite blocking during xDS client close, which is caused | ||
// by a blocking backoff channel write, cancel the test context early given | ||
// that the test is complete. | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain this more why we need to cancel explicitly for this test even though we have defer cancel() above? Why don't we need to do it on line 152 and line 133 as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the ADS stream implementation works is as follows:
- The
runner
goroutine runs the following logic with backoff. See: https://github.com/grpc/grpc-go/blob/master/xds/internal/xdsclient/transport/ads/ads_stream.go#L249. It uses thebackoff.RunF
method.- Creates a new ADS stream
- Sends the stream on a channel that is being read by the
send
goroutine, to send messages on the stream - Calls the
recv
method to read messages from the stream
- Typically, the
backoff
function returns immediately, but returns a duration for which to backoff. - The
backoff.RunF
method has a select on the context that it was passed and a timer created for the backoff duration. So, when the context is canceled while it is waiting for backoff to expire, the select will ensure that the function exits immediately (instead of waiting for backoff to expire). - So, when the xdschannel is being closed, it triggers a close of the ADS stream implementation, which cancels the context passed to the
backoff.RunF
method. - Therefore things exit cleanly even when in backoff.
But in our test, the backoff function is actually blocked on a write to a channel, which is read by the test. This means that the the backoff.RunF
method is blocked on calling the passed backoff function and not in the select. So, if we don't ensure that the backoff function passed to backoff.RunF
returns early (which can be done by canceling the context that is uses), then this test will run for defaultTestTimeout
duration.
I've changed the test to create a separate context for the backoff function and I only cancel that here. Let me know if this sounds ok to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for explaining.
This PR modifies existing tests for NACK and connectivity errors, to ensure that the node ID is reported to the watcher in these cases.
Addresses bullet points 2 and 3 of #7931.
RELEASE NOTES: none