This repository has been archived by the owner on Aug 17, 2023. It is now read-only.

Updated manager.py for reading secrets to check for the existence of … #495

Open
wants to merge 1 commit into
base: master

Conversation

raviranjan0309

Issue:
Running Kubeflow Fairing from a Kubeflow Pipeline fails with the following error:
IOError: [Errno 2] No such file or directory: '/etc/secrets/user-gcp-sa.json'

Reason:
When Kubeflow Fairing launches a pod for model training (for example, LightGBM distributed parallel training), it checks for the secret with a list call instead of a get. When the list call fails for lack of permission, Fairing does not attach any secrets to the pod, which causes the whole Kubeflow pipeline to fail.

Description of your changes:
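The secret_exists function incorrectly uses list_namespaced_secret (secrets = client.CoreV1Api().list_namespaced_secret(namespace), at https://github.com/kubeflow/fairing/blob/e0d44e870b467bbd773f836a8b1b648b79019dd1/kubeflow/fairing/kubernetes/manager.py#L220) to check for the existence of a secret instead of read_namespaced_secret.
The default pipeline-runner service account (SA) used by Kubeflow Pipelines does not have the list permission for secrets (https://github.com/kubeflow/manifests/blob/9e6ef8681003c195560fb7b3b75211830c19b7a0/pipeline/pipelines-runner/base/cluster-role.yaml#L9), so Fairing cannot check that the secret exists and does not attach it. See kubeflow/pipelines#3742.
With this change, I was able to run Kubeflow Fairing from a Kubeflow pipeline successfully.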
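
To illustrate why the choice of call matters, here is a small sketch of how the two client calls map to Kubernetes RBAC verbs. Reading one named object requires the "get" verb, while listing a collection requires "list". The `granted_verbs` table and the `is_allowed` helper are hypothetical, and the assumption that the service account holds only "get" on secrets is for illustration; this is not how the API server evaluates RBAC internally.

```python
# RBAC verb required by each Python client call.
# A GET on a single named object needs "get"; a GET on the collection needs "list".
REQUIRED_VERB = {
    "read_namespaced_secret": "get",
    "list_namespaced_secret": "list",
}

# Assumption for illustration: a role that grants only "get" on secrets.
granted_verbs = {"secrets": {"get"}}


def is_allowed(call, resource="secrets", grants=granted_verbs):
    """Hypothetical helper: True if the role grants the verb the call needs."""
    return REQUIRED_VERB[call] in grants.get(resource, set())


print(is_allowed("read_namespaced_secret"))
print(is_allowed("list_namespaced_secret"))
```

Under that assumption, a check based on read_namespaced_secret can succeed where a check based on list_namespaced_secret is rejected.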

…a secret instead of listing

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jinchihe
You can assign the PR to them by writing /assign @jinchihe in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeflow-bot

This change is Reviewable

@k8s-ci-robot
Contributor

Hi @raviranjan0309. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@raviranjan0309
Author

/assign @jinchihe Please verify my changes and provide your feedback.

@raviranjan0309
Author

/assign @jinchihe

Please take a look at my PR and let me know if you need more details.

@raviranjan0309
Author

raviranjan0309 commented May 23, 2020

@karthikv2k @jinchihe @iancoffey Please review this pull request. I am building an ML platform using Kubeflow, and this change is crucial for running Kubeflow Fairing from a pipeline.

@jinchihe
Member

/ok-to-test
/assign

@k8s-ci-robot
Contributor

@raviranjan0309: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| kubeflow-fairing-presubmit | 363af5b | link | /test kubeflow-fairing-presubmit |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@raviranjan0309
Author

@jinchihe I see that the test failed. I only made a single-line code change and checked it in. Do I need to take any extra steps to make this test pass?

@@ -217,7 +217,7 @@ def secret_exists(self, name, namespace):
:returns: bool: True if the secret exists, otherwise return False.

"""
- secrets = client.CoreV1Api().list_namespaced_secret(namespace)
+ secrets = client.CoreV1Api().read_namespaced_secret(namespace)
Member

According to the docs (https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#read_namespaced_secret), read_namespaced_secret should be called like this: read_namespaced_secret(name, namespace, pretty=pretty, exact=exact, export=export). The name argument is required.
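
One way the existence check could look with the required name argument is sketched below. It is not the change proposed in this PR. To keep it runnable without a cluster, it uses hypothetical stand-ins (`FakeApiError`, `FakeCoreV1Api`) and passes the API object in as a parameter; with the real client you would pass `kubernetes.client.CoreV1Api()`, and the exception raised would be `kubernetes.client.rest.ApiException`, which carries an HTTP `status`. The secret name and namespace are illustrative assumptions.

```python
class FakeApiError(Exception):
    """Hypothetical stand-in for kubernetes.client.rest.ApiException."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


class FakeCoreV1Api:
    """Hypothetical in-memory stand-in for kubernetes.client.CoreV1Api."""

    def __init__(self, secrets):
        self._secrets = set(secrets)  # {(name, namespace), ...}

    def read_namespaced_secret(self, name, namespace):
        if (name, namespace) not in self._secrets:
            raise FakeApiError(404)
        return {"metadata": {"name": name, "namespace": namespace}}


def secret_exists(api, name, namespace):
    """Return True if the named secret exists, using a get rather than a list."""
    try:
        api.read_namespaced_secret(name, namespace)
        return True
    except Exception as err:
        # A 404 status means the secret is absent.
        if getattr(err, "status", None) == 404:
            return False
        # Anything else (for example 403 Forbidden) is re-raised so that a
        # permission problem is not silently treated as "secret missing".
        raise


# Assumed names, for illustration only.
api = FakeCoreV1Api({("user-gcp-sa", "kubeflow")})
print(secret_exists(api, "user-gcp-sa", "kubeflow"))
print(secret_exists(api, "missing", "kubeflow"))
```

Re-raising non-404 errors is a design choice in this sketch: it avoids the failure mode described in the PR, where a permission error led to the secret quietly not being attached.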

@jinchihe
Member

@raviranjan0309 We need to ensure the CI test passes. Also, I'm not sure how the permission requirements differ between list_namespaced_secret and read_namespaced_secret. If you are sure this works, could you please provide more details? Thanks.

@jinchihe
Member

@raviranjan0309 Would you please update this PR? Thanks.

@raviranjan0309
Author

@jinchihe I am working on updating the secrets permissions for the pipeline runner in the Kubeflow manifests. If that works, there will be no need for the change here.

@jinchihe
Member

> @jinchihe I am working on updating the secrets permissions for the pipeline runner in the Kubeflow manifests. If that works, there will be no need for the change here.

OK. Thanks.
