
Support configuration of waiting times #110

Open
deric wants to merge 2 commits into master

Conversation

deric commented Feb 7, 2025

The default execution timeout is 300 seconds; with 15 retries it could stall Puppet for 45 minutes if the API server is not running.

Default execution timeout is 300 seconds; with 15 retries it could stall
Puppet for 45 minutes.
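
For context, this multiplication comes straight from Puppet's built-in exec retry attributes. A minimal sketch of how the stall arises, assuming the wait is an ordinary exec (the title and command are illustrative, not the module's actual code; `timeout`, `tries`, and `try_sleep` are standard exec parameters, and 300 seconds is Puppet's default exec timeout):

```puppet
# Illustrative only: a health probe retried via Puppet's exec attributes.
# If every attempt hangs until the per-attempt timeout, the worst case is
# roughly timeout * tries before the resource finally fails.
exec { 'wait-for-kube-apiserver':
  command   => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /healthz',
  timeout   => 300, # seconds allowed per attempt (Puppet's default)
  tries     => 15,  # attempts before the resource is marked failed
  try_sleep => 2,   # pause between attempts
}
```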
ananace (Member) commented Feb 7, 2025

Note that kubectl itself has its own execution timeout, so a 45-minute stall is entirely impossible. But with 15 retries and a server that's broken enough to accept connections but never actually finish transferring the response, you could maybe reach 10 minutes.

I'd prefer that you not lower the retry count, though; in most of my tests I'm seeing retry counts around 8-10 for the first startup, before Kubernetes has finished initial setup and can start accepting the core resources that need to be installed. Reducing it below that limit would mean having to apply another catalog just to get the basic cluster online.
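
For what the requested configurability could look like, here is a hedged sketch; the class name `profile::k8s_wait` and its parameters are hypothetical stand-ins, not this PR's actual API:

```puppet
# Hypothetical wrapper exposing the waiting times as tunables; none of
# these names come from the puppet-k8s module itself.
class profile::k8s_wait (
  Integer[1] $wait_timeout   = 300, # seconds per attempt
  Integer[1] $wait_tries     = 15,  # keep >= ~10 per the first-startup observation above
  Integer[0] $wait_try_sleep = 2,   # seconds between attempts
) {
  exec { 'wait-for-kube-apiserver':
    command   => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /healthz',
    timeout   => $wait_timeout,
    tries     => $wait_tries,
    try_sleep => $wait_try_sleep,
  }
}
```

With Puppet's automatic parameter lookup, such knobs would then be tunable from Hiera, e.g. `profile::k8s_wait::wait_tries: 10`.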

ananace added the enhancement label on Feb 7, 2025
deric (Author) commented Feb 7, 2025

@ananace Sure, the point is to allow configuration of the waiting times. I've updated the default; please test it for your use case. The problem is that the exec also runs with the --noop flag.
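
As an aside on the --noop observation: one standard Puppet mechanism that makes a resource run even under --noop is an explicit `noop => false` metaparameter, which modules sometimes set on wait-style checks so the rest of the catalog can still be evaluated. Whether that is the cause here is an assumption; a sketch:

```puppet
# Assumption: a wait exec carrying `noop => false` is applied even when the
# agent runs with --noop, which would match the behaviour described above.
# Title and command are illustrative.
exec { 'wait-for-kube-apiserver':
  command => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /healthz',
  tries   => 15,
  noop    => false, # overrides agent-wide --noop for this one resource
}
```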

deric (Author) commented Feb 7, 2025

With 15 retries, the Puppet run times out after 40 minutes. IMHO it shouldn't take that long.

error: Get "https://localhost:6443/api?timeout=32s": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:49380->127.0.0.1:6443: read: connection reset by peer
Error: /Stage[main]/K8s::Server::Resources/Kubectl_apply[kubeconfig-in-cluster]: Could not evaluate: Execution of '/usr/bin/kubectl --namespace kube-system --kubeconfig /root/.kube/config get ConfigMap kubeconfig-in-cluster --output json' returned 1: E0207 16:54:36.422891   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:49222->127.0.0.1:6443: read: connection reset by peer"
E0207 16:54:57.115619   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:38800->127.0.0.1:6443: read: connection reset by peer"
E0207 16:55:17.678958   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:52732->127.0.0.1:6443: read: connection reset by peer"
E0207 16:55:38.273354   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:60918->127.0.0.1:6443: read: connection reset by peer"
E0207 16:55:58.952072   11214 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:37034->127.0.0.1:6443: read: connection reset by peer"
error: Get "https://localhost:6443/api?timeout=32s": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:37034->127.0.0.1:6443: read: connection reset by peer
Error: /Stage[main]/K8s::Server::Resources::Bootstrap/K8s::Server::Bootstrap_token[puppet]/Kubectl_apply[bootstrap-token-puppet]: Could not evaluate: Execution of '/usr/bin/kubectl --namespace kube-system --kubeconfig /root/.kube/config get Secret bootstrap-token-puppet --output json' returned 1: E0207 16:56:19.391543   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:51516->127.0.0.1:6443: read: connection reset by peer"
E0207 16:56:40.117858   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:49754->127.0.0.1:6443: read: connection reset by peer"
E0207 16:57:00.764906   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:49988->127.0.0.1:6443: read: connection reset by peer"
E0207 16:57:21.445790   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:34018->127.0.0.1:6443: read: connection reset by peer"
E0207 16:57:42.176439   11527 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://localhost:6443/api?timeout=32s\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:44482->127.0.0.1:6443: read: connection reset by peer"
error: Get "https://localhost:6443/api?timeout=32s": net/http: TLS handshake timeout - error from a previous attempt: read tcp 127.0.0.1:44482->127.0.0.1:6443: read: connection reset by peer
Info: Stage[main]: Unscheduling all events on Stage[main]
Notice: Applied catalog in 2427.20 seconds

ananace (Member) commented Feb 9, 2025

That particular issue you're seeing is exactly what the wait_online type is there to prevent: the server not being up, and every kubectl_apply therefore failing slowly, one by one.
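
The intended shape of that relationship, as a hedged sketch (titles illustrative; the module's real wait_online implementation may differ): one online check ordered before every Kubectl_apply, so a dead apiserver fails once, up front, instead of each resource timing out on its own.

```puppet
# A single wait, ordered before all kubectl_apply resources.
exec { 'k8s-apiserver-wait-online':
  command   => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /healthz',
  tries     => 15,
  try_sleep => 2,
}
Exec['k8s-apiserver-wait-online'] -> Kubectl_apply <| |>
```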

deric (Author) commented Feb 9, 2025

> That particular issue you're seeing is exactly what the wait_online type is there to prevent

Well, obviously it isn't working.

ananace (Member) commented Feb 9, 2025

Which is definitely odd; I can't get it to build a catalog where that relationship isn't in place, so that particular part is very unlikely to be the issue.

Any catalog application that requires the apiserver to be (re)started will trigger the wait for it to go online, and in any catalog application where the apiserver is already running, it should already be up and responding to requests.

The only way I can think of for your scenario to occur is if the catalog starts the apiserver and waits for it to go online (or finds it already started as the catalog is being applied), and the apiserver then crashes while Puppet moves on to applying resources onto it.
Or if it's broken enough that it starts successfully but never actually reaches the point of accepting requests, so Puppet assumes it to be working.
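
For that last case, one hedged mitigation is to gate on actual readiness rather than on the service merely having started, and to cap each probe so a hung TLS handshake cannot consume a full per-attempt timeout. The /readyz endpoint and the --request-timeout flag are real kube-apiserver/kubectl features; the resource itself is illustrative:

```puppet
# Probe readiness, not mere liveness, with a short per-request cap.
exec { 'k8s-apiserver-wait-ready':
  command   => '/usr/bin/kubectl --kubeconfig /root/.kube/config get --raw /readyz --request-timeout=5s',
  tries     => 15,
  try_sleep => 5,
}
```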
