feat(backend): configurable log level for driver / launcher images #11278

Merged: 9 commits merged on Feb 21, 2025

CarterFendley
Contributor

@CarterFendley CarterFendley commented Oct 8, 2024

IMPORTANT

Design changes have been made since the original description; see the later comment introducing PIPELINE_LOG_LEVEL for the updated usage.



Description of your changes:

This PR adds the ability to change log level in the driver / launcher containers. This is implemented in a similar pattern as the overrides for driver / launcher images. Specifically, you can add the following environment variables to the ml-pipeline deployments:

    spec:
      containers:
      - env:
        - name: DRIVER_LOG_LEVEL
          value: "3"
        - name: LAUNCHER_LOG_LEVEL
          value: "3"

Note: The value must be a quoted string. An unquoted number such as 3 (rather than "3") makes the deployment spec invalid; validation fails and kubectl edit rejects the change with the message: error: deployments.apps "ml-pipeline" is invalid.
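
The quoting requirement follows from the env value field being a string in the Kubernetes API, while an unquoted 3 in YAML is read as a number. The sketch below illustrates the same kind of type mismatch with Go's encoding/json. The envVar struct and decodeEnvVar helper are hypothetical, simplified stand-ins written for illustration; this is not the API server's actual validation code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envVar is a simplified, hypothetical stand-in for a container env entry.
// Both fields are strings, mirroring the shape of the Kubernetes env entry.
type envVar struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// decodeEnvVar decodes one JSON-encoded env entry into an envVar.
func decodeEnvVar(raw string) (envVar, error) {
	var e envVar
	err := json.Unmarshal([]byte(raw), &e)
	return e, err
}

func main() {
	// Quoted value: decodes cleanly into the string field.
	quoted, err := decodeEnvVar(`{"name":"DRIVER_LOG_LEVEL","value":"3"}`)
	fmt.Println("quoted value:", quoted.Value, "error:", err)

	// Bare number: the decoder refuses to store a number in a string field.
	_, err = decodeEnvVar(`{"name":"DRIVER_LOG_LEVEL","value":3}`)
	fmt.Println("bare number rejected:", err != nil)
}
```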

Other minor alterations

  1. In this commit two locations were updated to use the workflowCompiler.driverImage and workflowCompiler.launcherImage attributes which are populated here. This is a very minor change but seemed better to invoke only once and match other such usages (in importer.go and dag.go). If there are reasons this should be re-invoked, please let me know.
  2. The --copy flag was moved into the arguments block, to match other implementations. Again, let me know if this is not wanted.

Feedback wanted

The environment variable names are similar to V2_LAUNCHER_IMAGE and V2_DRIVER_IMAGE but without the V2_ prefix. I have no strong preference here, so I am happy to take any path if anyone else does.


Hi @CarterFendley. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hbelmiro
Contributor

hbelmiro commented Oct 8, 2024

/ok-to-test

@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Oct 9, 2024
@HumairAK HumairAK added this to the KFP 2.4.0 milestone Oct 9, 2024
Contributor

@droctothorpe droctothorpe left a comment

LGTM. Thank you for tackling this, @CarterFendley!

@rimolive
Member

rimolive commented Oct 9, 2024

Maybe worth adding just one unit test to verify if setting both env vars will generate the Workflow yaml with the new flags.

@CarterFendley
Contributor Author

> Maybe worth adding just one unit test to verify if setting both env vars will generate the Workflow yaml with the new flags.

Will do :)

@CarterFendley CarterFendley force-pushed the carter/log-level branch 2 times, most recently from 3f73d4c to 714901f Compare October 15, 2024 21:46
@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Oct 15, 2024
@CarterFendley
Contributor Author

CarterFendley commented Oct 15, 2024

Okay, in this commit I have updated the compiler tests with logic to optionally take in environment variables and set them:

if tt.envVars != nil {
	for _, envVar := range tt.envVars {
		// Each entry has the form "KEY=VALUE"; spaces are stripped before splitting.
		parts := strings.Split(strings.ReplaceAll(envVar, " ", ""), "=")
		os.Setenv(parts[0], parts[1])

		// Unset after the test case has ended
		defer os.Unsetenv(parts[0])
	}
}

Two test cases and golden files have been added to exercise the logic included in this PR:

{
	jobPath:          "../testdata/hello_world.json",
	platformSpecPath: "",
	argoYAMLPath:     "testdata/with_logging/hello_world.yaml",
	envVars:          []string{"DRIVER_LOG_LEVEL=5", "LAUNCHER_LOG_LEVEL=5"},
},
{
	jobPath:          "../testdata/importer.json",
	platformSpecPath: "",
	argoYAMLPath:     "testdata/with_logging/importer.yaml",
	envVars:          []string{"DRIVER_LOG_LEVEL=5", "LAUNCHER_LOG_LEVEL=5"},
},
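
As a possible refinement of the "KEY=VALUE" handling in the test loop above, the entry could be split at the first "=" only, so that a value containing "=" is not truncated and a malformed entry is reported instead of causing an index panic. The parseEnvVar helper below is hypothetical and not part of this PR. It also trims surrounding whitespace rather than removing every space, which is a deliberate difference from the snippet above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnvVar splits a "KEY=VALUE" entry at the first "=" only.
// Hypothetical helper for illustration; not taken from the PR.
func parseEnvVar(entry string) (string, string, error) {
	key, value, found := strings.Cut(entry, "=")
	key = strings.TrimSpace(key)
	if !found || key == "" {
		return "", "", fmt.Errorf("invalid env var entry %q, want KEY=VALUE", entry)
	}
	return key, strings.TrimSpace(value), nil
}

func main() {
	for _, entry := range []string{"DRIVER_LOG_LEVEL=5", "OPTS = a=b", "NOEQUALS"} {
		key, value, err := parseEnvVar(entry)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("key=%q value=%q\n", key, value)
	}
}
```

Inside a test, t.Setenv(key, value) (available since Go 1.17) restores the previous value when the test or subtest finishes, which could replace the manual os.Setenv and deferred os.Unsetenv pair. It cannot be used in tests marked parallel.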

@github-actions github-actions bot added and then removed the ci-passed label Oct 15, 2024
@rimolive
Member

/lgtm

Contributor

@hbelmiro hbelmiro left a comment

Thank you @CarterFendley.
Looks good. I just left a few minor comments.

@@ -303,8 +318,9 @@ func (c *workflowCompiler) addContainerExecutorTemplate(refName string) string {
 		InitContainers: []wfapi.UserContainer{{
 			Container: k8score.Container{
 				Name:    "kfp-launcher",
-				Image:   GetLauncherImage(),
-				Command: []string{"launcher-v2", "--copy", component.KFPLauncherPath},
+				Image:   c.launcherImage,
Contributor

Any reason for this change besides optimization?

Contributor Author

No, not really. It just makes adding flags a bit more concise and follows the pattern used in the other Container definitions for the driver / launcher.

Contributor

@hbelmiro hbelmiro left a comment

/lgtm

Signed-off-by: carter.fendley <carter.fendley@gmail.com>
@CarterFendley
Contributor Author

Modifications have been made to address this issue found by @gregsheremeta; thanks for pointing that out! Because the issue involved the driver setting the launcher's log level, the design was changed to a single unified PIPELINE_LOG_LEVEL env var. This avoids somewhat confusing implementations such as passing LAUNCHER_LOG_LEVEL to the driver as a command-line argument.

The new usage is to set the following environment variable on the ml-pipeline deployment:

    spec:
      containers:
      - env:
        - name: PIPELINE_LOG_LEVEL
          value: "3"

As before, the value must be a quoted string: an unquoted number such as 3 (rather than "3") makes the deployment spec invalid, so validation fails and kubectl edit rejects the change with the message: error: deployments.apps "ml-pipeline" is invalid.
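
A minimal sketch of the unified design, assuming the level is an integer and is forwarded to both binaries as a command-line flag. The "--log_level" flag name, the logLevelArgs helper, and the bare "driver" command are assumptions made for illustration and may not match the merged code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// logLevelEnv is the single variable described in the comment above.
const logLevelEnv = "PIPELINE_LOG_LEVEL"

// logLevelArgs turns a raw level into extra command-line arguments.
// The "--log_level" flag name is an assumption, not confirmed by the PR text.
func logLevelArgs(raw string) ([]string, error) {
	if raw == "" {
		return nil, nil // unset: leave the binary's default verbosity alone
	}
	if _, err := strconv.Atoi(raw); err != nil {
		return nil, fmt.Errorf("%s must be an integer, got %q", logLevelEnv, raw)
	}
	return []string{"--log_level", raw}, nil
}

func main() {
	extra, err := logLevelArgs(os.Getenv(logLevelEnv))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// One source of truth feeds both commands, so the driver never needs a
	// separate launcher-specific setting.
	driverCmd := append([]string{"driver"}, extra...)
	launcherCmd := append([]string{"launcher-v2"}, extra...)
	fmt.Println(driverCmd)
	fmt.Println(launcherCmd)
}
```

Reading the variable in one place keeps the driver and launcher consistent, which is the motivation given for dropping the two separate variables.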

After these updates, the main container now also runs the launcher with the configured log level.
[Screenshot 2025-02-19 at 7:34:27 PM]

@hbelmiro @HumairAK, or any others: Please let me know if you have any additional feedback on this PR. Apologies for the delay in the patch!

@HumairAK
Collaborator

/lgtm
/approve


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: droctothorpe, HumairAK

@google-oss-prow google-oss-prow bot merged commit d2c0376 into kubeflow:master Feb 21, 2025
34 of 36 checks passed