Fix race condition (also a regression of the PR 19139) #19221

Open
wants to merge 3 commits into
base: main

ahrtr
Member

@ahrtr ahrtr commented Jan 17, 2025

Fix #19172

Please review this PR commit by commit.

Three high-level thoughts:

  • There are multiple levels of goroutines. The grandparent (StartEtcd) creates multiple child goroutines (client listeners, peer listeners, and metrics listeners). The client listeners create some grandchild goroutines (see the first commit). Each goroutine should manage only its immediate children.
  • For sync.WaitGroup, we should always call wg.Add and wg.Wait in the same goroutine.
  • When stopping etcd, we should first close all listeners and the context, and only afterwards close the etcdserver.

cc @serathius @fuweid @ivanvc @jmhbnz @joshuazh-x

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


codecov bot commented Jan 17, 2025

Codecov Report

Attention: Patch coverage is 84.37500% with 5 lines in your changes missing coverage. Please review.

Project coverage is 68.77%. Comparing base (2f37f48) to head (3fa96c8).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| server/embed/serve.go | 81.25% | 1 Missing and 2 partials ⚠️ |
| server/embed/etcd.go | 87.50% | 2 Missing ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| server/embed/etcd.go | 75.68% <87.50%> (-0.18%) ⬇️ |
| server/embed/serve.go | 59.38% <81.25%> (+1.58%) ⬆️ |

... and 23 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19221      +/-   ##
==========================================
- Coverage   68.80%   68.77%   -0.04%     
==========================================
  Files         420      420              
  Lines       35650    35665      +15     
==========================================
- Hits        24529    24528       -1     
- Misses       9694     9713      +19     
+ Partials     1427     1424       -3     

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ahrtr force-pushed the race-20250117 branch 2 times, most recently from c76bbeb to b1e5ebc, on January 17, 2025 19:29
@ahrtr
Member Author

ahrtr commented Jan 17, 2025

@fuweid @ivanvc @jmhbnz @serathius

This PR fixes a regression caused by #19139, so let's get it merged and backported to 3.5 and probably 3.4. We need to get it included in 3.5.18.

@ahrtr
Member Author

ahrtr commented Jan 17, 2025

/test pull-etcd-integration-1-cpu-arm64

ahrtr added 3 commits January 18, 2025 10:12
…te before it returns

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
… the errc

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
@k8s-ci-robot

@ahrtr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-etcd-e2e-arm64 | 3fa96c8 | link | true | /test pull-etcd-e2e-arm64 |
| pull-etcd-e2e-386 | 3fa96c8 | link | true | /test pull-etcd-e2e-386 |
| pull-etcd-e2e-amd64 | 3fa96c8 | link | true | /test pull-etcd-e2e-amd64 |


Development

Successfully merging this pull request may close these issues.

Race condition when closing the embedded etcd
2 participants