
Support configuration of waiting times #110

Open
deric wants to merge 2 commits into master

Conversation

deric commented Feb 7, 2025

The default execution timeout is 300 seconds; with 15 retries it could stall Puppet for 45 minutes if the API server is not running.

Default execution timeout is 300 seconds; with 15 retries it could stall
Puppet for 45 minutes.
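
For context, this multiplication comes straight from Puppet's built-in exec retry attributes. A minimal sketch of how the stall arises, assuming the wait is an ordinary exec (the title and command are illustrative, not the module's actual code; `timeout`, `tries`, and `try_sleep` are standard exec parameters, and 300 seconds is Puppet's default exec timeout):

```puppet
# Illustrative only: a health probe retried via Puppet's exec attributes.
# If every attempt hangs until the per-attempt timeout, the worst case is
# roughly timeout * tries before the resource finally fails.
exec { 'wait-for-kube-apiserver':
  command   => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /healthz',
  timeout   => 300, # seconds allowed per attempt (Puppet's default)
  tries     => 15,  # attempts before the resource is marked failed
  try_sleep => 2,   # pause between attempts
}
```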
ananace (Member) commented Feb 7, 2025

Note that kubectl itself has its own execution timeout, so a 45-minute stall is entirely impossible. But with 15 retries and a server that's broken enough to accept connections but never actually finish transferring the response, you could maybe reach 10 minutes.

I'd prefer that you not lower the retry count, though; in most of my tests I'm seeing retry counts around 8-10 for the first startup, before Kubernetes has finished initial setup and can start accepting the core resources that need to be installed. Reducing it below that limit would mean having to apply another catalog just to get the basic cluster online.
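
For what the requested configurability could look like, here is a hedged sketch; the class name `profile::k8s_wait` and its parameters are hypothetical stand-ins, not this PR's actual API:

```puppet
# Hypothetical wrapper exposing the waiting times as tunables; none of
# these names come from the puppet-k8s module itself.
class profile::k8s_wait (
  Integer[1] $wait_timeout   = 300, # seconds per attempt
  Integer[1] $wait_tries     = 15,  # keep >= ~10 per the first-startup observation above
  Integer[0] $wait_try_sleep = 2,   # seconds between attempts
) {
  exec { 'wait-for-kube-apiserver':
    command   => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /healthz',
    timeout   => $wait_timeout,
    tries     => $wait_tries,
    try_sleep => $wait_try_sleep,
  }
}
```

With Puppet's automatic parameter lookup, such knobs would then be tunable from Hiera, e.g. `profile::k8s_wait::wait_tries: 10`.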

ananace added the enhancement label on Feb 7, 2025
deric (Author) commented Feb 7, 2025

@ananace Sure, the point is to allow configuration of the waiting times. I've updated the default; please test it for your use case. The problem is that the exec also runs with the --noop flag.
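
As an aside on the --noop observation: one standard Puppet mechanism that makes a resource run even under --noop is an explicit `noop => false` metaparameter, which modules sometimes set on wait-style checks so the rest of the catalog can still be evaluated. Whether that is the cause here is an assumption; a sketch:

```puppet
# Assumption: a wait exec carrying `noop => false` is applied even when the
# agent runs with --noop, which would match the behaviour described above.
# Title and command are illustrative.
exec { 'wait-for-kube-apiserver':
  command => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /healthz',
  tries   => 15,
  noop    => false, # overrides agent-wide --noop for this one resource
}
```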

deric (Author) commented Feb 7, 2025

With 15 retries, the Puppet run times out after 40 minutes. IMHO it shouldn't take that long.

error: Get "https://localhost:6443/api?timeout=32s": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:49380->127.0.0.1:6443: read: connection reset by peer
Error: /Stage[main]/K8s::Server::Resources/Kubectl_apply[kubeconfig-in-cluster]: Could not evaluate: Execution of '/usr/bin/kubectl --namespace kube-system --kubeconfig /root/.kube/config get ConfigMap kubeconfig-in-cluster --output json' returned 1: E0207 16:54:36.422891   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:49222->127.0.0.1:6443: read: connection reset by peer"
E0207 16:54:57.115619   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:38800->127.0.0.1:6443: read: connection reset by peer"
E0207 16:55:17.678958   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:52732->127.0.0.1:6443: read: connection reset by peer"
E0207 16:55:38.273354   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:60918->127.0.0.1:6443: read: connection reset by peer"
E0207 16:55:58.952072   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:37034->127.0.0.1:6443: read: connection reset by peer"
error: Get "https://localhost:6443/api?timeout=32s": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:37034->127.0.0.1:6443: read: connection reset by peer
Error: /Stage[main]/K8s::Server::Resources::Bootstrap/K8s::Server::Bootstrap_token[puppet]/Kubectl_apply[bootstrap-token-puppet]: Could not evaluate: Execution of '/usr/bin/kubectl --namespace kube-system --kubeconfig /root/.kube/config get Secret bootstrap-token-puppet --output json' returned 1: E0207 16:56:19.391543   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:51516->127.0.0.1:6443: read: connection reset by peer"
E0207 16:56:40.117858   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:49754->127.0.0.1:6443: read: connection reset by peer"
E0207 16:57:00.764906   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:49988->127.0.0.1:6443: read: connection reset by peer"
E0207 16:57:21.445790   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:34018->127.0.0.1:6443: read: connection reset by peer"
E0207 16:57:42.176439   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:44482->127.0.0.1:6443: read: connection reset by peer"
error: Get "https://localhost:6443/api?timeout=32s": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:44482->127.0.0.1:6443: read: connection reset by peer
Info: Stage[main]: Unscheduling all events on Stage[main]
Notice: Applied catalog in 2427.20 seconds

ananace (Member) commented Feb 9, 2025

That particular issue you're seeing is exactly what the wait_online type is there to prevent: the server not being up, and every kubectl_apply therefore failing slowly, one by one.
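
The intended shape of that relationship, as a hedged sketch (titles illustrative; the module's real wait_online implementation may differ): one online check ordered before every Kubectl_apply, so a dead apiserver fails once, up front, instead of each resource timing out on its own.

```puppet
# A single wait, ordered before all kubectl_apply resources.
exec { 'k8s-apiserver-wait-online':
  command   => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /healthz',
  tries     => 15,
  try_sleep => 2,
}
Exec['k8s-apiserver-wait-online'] -> Kubectl_apply <| |>
```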

deric (Author) commented Feb 9, 2025

> That particular issue you're seeing is exactly what the wait_online type is there to prevent

Well, obviously it isn't working.

ananace (Member) commented Feb 9, 2025

Which is definitely odd; I can't get it to build a catalog where that relationship isn't in place, so that particular part is very unlikely to be the issue.

Any catalog application that requires the apiserver to be (re)started will trigger the wait for it to go online, and in any catalog application where the apiserver is already running, it should already be up and responding to requests.

The only way I can think of for your scenario to occur is if the catalog starts the apiserver and waits for it to go online (or finds it already started as the catalog is being applied), and the apiserver then crashes while Puppet moves on to applying resources onto it.
Or if it's broken enough that it starts successfully but never actually reaches the point of accepting requests, so Puppet assumes it to be working.
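
For that last case, one hedged mitigation is to gate on actual readiness rather than on the service merely having started, and to cap each probe so a hung TLS handshake cannot consume a full per-attempt timeout. The /readyz endpoint and the --request-timeout flag are real kube-apiserver/kubectl features; the resource itself is illustrative:

```puppet
# Probe readiness, not mere liveness, with a short per-request cap.
exec { 'k8s-apiserver-wait-ready':
  command   => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /readyz --request-timeout=5s',
  tries     => 15,
  try_sleep => 5,
}
```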
