When the executor of a Spark 3 application has a lifecycle, the webhook will always deny it #2457

Open
1 task done
pvbouwel opened this issue Mar 5, 2025 · 0 comments
Labels
kind/bug Something isn't working

pvbouwel commented Mar 5, 2025

What happened?

  • ✋ I have searched the open/closed issues and my issue is not listed.

Our Spark applications were not getting any executors.

Reproduction Code

I do not have a simple reproduction ready, but as I understand it, any Spark application spec that uses Spark 3 and defines a lifecycle on the executor will reproduce this.

Expected behavior

Normal Spark execution: executors come up just as they would if no lifecycle hook were defined.

Actual behavior

Only the drivers became available. The spark-operator webhook log showed that it was denying the executor pods:

{"level":"INFO","ts":"2025-03-04T16:03:16.787Z","caller":"webhook/sparkpod_defaulter.go:93","msg":"Denying Spark pod","name":"my-app-bad7229561dd2950-exec-40","namespace":"spark","errorMessage":"Spark container executor not found in pod my-app-bad7229561dd2950-exec-40"}

Environment & Versions

  • Kubernetes Version: 1.25.5
  • Spark Operator Version: 2.0.2
  • Apache Spark Version: 3.5.2

Additional context

I have patched the code and was able to get our environment operating as expected, so I will create a PR soon. The PR will make the problem in the code clear: it seems to have been solved for many of the 'adders', but the lifecycle one appears to have been forgotten.
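To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the operator's actual code: the `Pod` and `Container` types, `findContainer`, and `addExecutorLifecycle` are hypothetical stand-ins. It assumes that the lifecycle code path looked up only the Spark 2 container name `executor` (which is what the error message in the log above suggests), while Spark 3 names the executor container `spark-kubernetes-executor` by default.

```go
package main

import (
	"fmt"
	"strings"
)

// Container and Pod are simplified, hypothetical stand-ins for the
// Kubernetes types; they keep only the fields this sketch needs.
type Container struct {
	Name         string
	HasLifecycle bool
}

type Pod struct {
	Name       string
	Containers []Container
}

const (
	// Executor container name used by Spark 2.x.
	spark2ExecutorContainer = "executor"
	// Default executor container name used by Spark 3.x.
	spark3ExecutorContainer = "spark-kubernetes-executor"
)

// findContainer returns the index of the first container whose name
// matches one of the given names, or -1 if none matches.
func findContainer(pod *Pod, names ...string) int {
	for i, c := range pod.Containers {
		for _, name := range names {
			if c.Name == name {
				return i
			}
		}
	}
	return -1
}

// addExecutorLifecycle marks the executor container as carrying a
// lifecycle. It fails when no container matches any accepted name.
func addExecutorLifecycle(pod *Pod, names ...string) error {
	i := findContainer(pod, names...)
	if i < 0 {
		return fmt.Errorf("Spark container %s not found in pod %s",
			strings.Join(names, " or "), pod.Name)
	}
	pod.Containers[i].HasLifecycle = true
	return nil
}

func main() {
	pod := &Pod{
		Name:       "my-app-exec-1",
		Containers: []Container{{Name: spark3ExecutorContainer}},
	}

	// Assumed buggy behaviour: only the Spark 2 name is accepted,
	// so a Spark 3 executor pod is rejected.
	fmt.Println(addExecutorLifecycle(pod, spark2ExecutorContainer))

	// Assumed fixed behaviour: both names are accepted.
	fmt.Println(addExecutorLifecycle(pod, spark2ExecutorContainer, spark3ExecutorContainer))
}
```

Under these assumptions, the first call returns an error of the same shape as the one in the webhook log, and the second call succeeds. The actual fix is the one in the linked PR.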

Impacted by this bug?

Give it a 👍 We prioritize the issues with most 👍

pvbouwel added the kind/bug label Mar 5, 2025
pvbouwel added a commit to pvbouwel/spark-operator that referenced this issue Mar 5, 2025
Before this fix, if you have a Spark 3.x spec where the executor has a lifecycle, the webhook fails to identify the correct container, as described in issue [2457](kubeflow#2457).

Signed-off-by: pvbouwel <463976+pvbouwel@users.noreply.github.com>
google-oss-prow bot pushed a commit that referenced this issue Mar 6, 2025
* bugfix: A lifecycle on a spark3 executor should not fail

Before this fix, if you have a Spark 3.x spec where the executor has a lifecycle, the webhook fails to identify the correct container, as described in issue [2457](#2457).

Signed-off-by: pvbouwel <463976+pvbouwel@users.noreply.github.com>

* tests: Add coverage for spark3 executor with a lifecycle

Signed-off-by: pvbouwel <463976+pvbouwel@users.noreply.github.com>

* make go-fmt

Signed-off-by: pvbouwel <463976+pvbouwel@users.noreply.github.com>

---------

Signed-off-by: pvbouwel <463976+pvbouwel@users.noreply.github.com>