Race Condition Issue: Assign Order of Execution to Certain Components. #5820

Open
acarlstein opened this issue Dec 4, 2024 · 5 comments
Labels
needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@acarlstein

acarlstein commented Dec 4, 2024

Summary

There are occasions where the order of deployment of components (of kinds such as Job, CloudFunctionsFunction, etc.) matters. Find a way to indicate in the kustomization.yaml file in which order certain components should be deployed.

Description

Let's assume you're trying to deploy a Cloud Function using the CRD CloudFunctionsFunction of apiVersion cloudfunctions.cnrm.cloud.google.com/v1beta1.

This component requires that either:

  • You provide a URL to a repository where the code resides, or
  • You provide a URL to a storage bucket where the code resides inside a ZIP file.

Regrettably, the repository where the code resides isn't accessible to CloudFunctionsFunction; therefore, you can only follow the ZIP-file approach. This increases complexity because we want to keep everything in one place.
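
For reference, the ZIP-in-a-bucket variant looks roughly like the sketch below. The field names (such as sourceArchiveUrl) are from memory of the Config Connector reference, and all names and values are placeholders, so double-check them against the v1beta1 CRD:

```yaml
# Sketch only: name, region, runtime, and the bucket path are placeholders,
# and the field names should be verified against the v1beta1 CRD.
apiVersion: cloudfunctions.cnrm.cloud.google.com/v1beta1
kind: CloudFunctionsFunction
metadata:
  name: my-function
spec:
  region: us-central1
  runtime: python311
  entryPoint: main
  httpsTrigger: {}                                   # trigger details elided
  sourceArchiveUrl: gs://my-bucket/function-src.zip  # the ZIP the Job must upload first
```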

We tried to work around this by:

  1. Storing the code inside a ConfigMap of apiVersion v1.
  2. Using a Job of apiVersion batch/v1 to (1) copy the code into a ZIP file and (2) save the ZIP file into a storage bucket.
  3. Having the CloudFunctionsFunction use the ZIP file from the storage bucket (see the kustomization sketch below).
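
A minimal kustomization.yaml for this setup might look like the sketch below (file names are placeholders). Note that listing the resources in this order does not make kustomize wait for the Job to finish:

```yaml
# Sketch: kustomize renders all three manifests together, but it has no notion
# of "apply the Job and wait for it before the CloudFunctionsFunction".
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - function-source-configmap.yaml   # step 1: ConfigMap with the code
  - zip-and-upload-job.yaml          # step 2: Job that zips and uploads
  - cloud-function.yaml              # step 3: CloudFunctionsFunction
```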

The Problem

The problem is the order of execution. Kustomize sometimes deploys the CloudFunctionsFunction before the Job has zipped and stored the code from the ConfigMap. The CloudFunctionsFunction deploys "successfully" but fails to run because the ZIP file is missing, and it does not retry fetching it afterwards. This is a race condition.

Proposed Solutions

The following are some solutions to this issue:

  1. Following the example of systems such as Terraform and Blueprints, introduce a dependsOn argument.
  2. Allow annotations that indicate which components should be deployed first (a hypothetical sketch follows this list). Example:
    job.yaml: Order: "1"
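
Purely as an illustration of option 2, a hypothetical ordering annotation on the Job might look like this (no such annotation exists in kustomize today; the name is invented):

```yaml
# Hypothetical sketch only: kustomize does not currently recognize any
# deploy-order annotation; this is just what the proposal could look like.
apiVersion: batch/v1
kind: Job
metadata:
  name: zip-and-upload
  annotations:
    kustomize.config.k8s.io/deploy-order: "1"   # invented annotation name
```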

Definition of Done

Provide a mechanism that allows indicating the order of deployment of all or certain components, ensuring that some components are deployed before others.

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 4, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@acarlstein acarlstein changed the title Assign Order of Execution to Certain Components. Race Condition Issue: Assign Order of Execution to Certain Components. Dec 4, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2025
@acarlstein
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2025
@isarns
Contributor

isarns commented Mar 4, 2025

Honestly, I don't think this is doable with just Kustomize. It's great for templating and patching, but managing deployment order isn't really its job. If you really need to enforce order, you might look at using Helm, which has hooks for ordering, or Argo CD, where you can set up sync waves to control what gets deployed when. A native dependsOn solution in Kustomize seems out of scope.
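
For illustration, the Argo CD route would look roughly like this (names and the image are placeholders): resources in a lower sync wave have to be healthy, i.e. the Job completed, before the next wave is applied.

```yaml
# Sketch of Argo CD sync waves: wave 0 (the Job) must finish before wave 1
# (the CloudFunctionsFunction) is applied. Names and image are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: zip-and-upload
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: upload
          image: google/cloud-sdk:slim   # placeholder; needs zip + gsutil available
          command: ["sh", "-c", "echo 'zip the source and gsutil cp it to the bucket here'"]
---
apiVersion: cloudfunctions.cnrm.cloud.google.com/v1beta1
kind: CloudFunctionsFunction
metadata:
  name: my-function
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # applied only after wave 0 is healthy
spec:
  region: us-central1
  runtime: python311
  entryPoint: main
  httpsTrigger: {}
  sourceArchiveUrl: gs://my-bucket/function-src.zip
```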

@DanInProgress

I generally agree with @isarns. It seems more like a job for an operator or a scheduling plugin of some sort. Some slightly out-of-the-box suggestions that might suit use cases similar to yours:

Job creates depending resource

  1. kustomize creates a configMap with a CloudFunctionsFunction template
  2. kustomize creates a Job specifying serviceAccountName and the configMap (rough sketch below)
    1. the Job uploads the zip to GCS
    2. the Job creates the CloudFunctionsFunction with the appropriate URL
  3. the CloudFunctionsFunction is successfully created, referencing the zip from the previous step
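
A rough sketch of step 2, assuming a service account with RBAC permission to create CloudFunctionsFunction objects and placeholder names throughout:

```yaml
# Sketch only: the service account, ConfigMap, image, and file names are
# placeholders; the Job uploads the ZIP and then applies the templated function.
apiVersion: batch/v1
kind: Job
metadata:
  name: upload-and-create-function
spec:
  template:
    spec:
      serviceAccountName: function-deployer     # needs RBAC to create CloudFunctionsFunction
      restartPolicy: Never
      volumes:
        - name: function-template
          configMap:
            name: cloud-function-template       # rendered by kustomize in step 1
      containers:
        - name: deploy
          image: bitnami/kubectl:latest         # placeholder; also needs a way to upload to GCS
          volumeMounts:
            - name: function-template
              mountPath: /manifests
          command:
            - sh
            - -c
            - |
              # placeholder: zip the source and upload it to the bucket here,
              # then create the function that points at the uploaded ZIP
              kubectl apply -f /manifests/cloudfunctionsfunction.yaml
```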

Job creates required resource

This is unlikely to work for CloudFunctionsFunction because the zip ref is immutable, but it may work for similar uses:

  1. kustomize creates Job specifying serviceAccountName
  2. kustomize creates <DependingResource> with a field like ...configMapRef set to name: archive-url
  3. <DependingResource> status/condition is updated to a pending state because configMap archive-url is not found
  4. Job uploads zip to GCS
  5. Job creates configMap archive-url containing the GCS URL
  6. <DependingResource> status/condition is updated by its controller and continues execution.

I've personally used a very similar pattern to gracefully handle sync delays for secrets from a shared vault in a home-grown controller. For Pod resources, envFrom.*.secretRef causes the pod to enter condition MountFailed until the secret exists (this would also work with pods created by a Deployment or Job).
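
A minimal illustration of that gating behavior (names are placeholders): the container below will not start until the referenced secret exists, and the kubelet keeps retrying, so a Job that creates the secret later effectively orders the two.

```yaml
# Minimal example: the pod is created immediately, but its container is held
# back until the secret "archive-url" exists (created later by the Job).
apiVersion: v1
kind: Pod
metadata:
  name: waits-for-secret
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "env"]
      envFrom:
        - secretRef:
            name: archive-url
```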

Homegrown controller

If this is a common pattern for you or your org, you could write a custom controller that handles both the zip upload and the creation of the CloudFunctionsFunction in a standard way.
