KEP-2170: Adding CEL validations on v2 TrainJob CRD #2260

Open
wants to merge 1 commit into
base: master

akshaychitneni
@akshaychitneni akshaychitneni commented Sep 16, 2024

What this PR does / why we need it:
This PR relates to #2209 and adds CEL validations to the TrainJob CRD. I will follow up with the validations implemented in the webhook in a separate PR.

cc @andreyvelich @tenzen-y

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member

@tenzen-y tenzen-y left a comment

Thanks for doing this.
I left my first feedback.

Member

Could you implement integration tests to verify that these validations work?
We can implement those tests in https://github.com/kubeflow/training-operator/tree/126110fd4d76439bd04ca9fdf96bafb7ea3b6910/test/integration/webhook.v2.

@coveralls

coveralls commented Sep 17, 2024

Pull Request Test Coverage Report for Build 11003550343

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall first build on cel-crd at 100.0%

Totals Coverage Status
Change from base Build 11001381410: 100.0%
Covered Lines: 66
Relevant Lines: 66

💛 - Coveralls

@tenzen-y
Member

/hold

@tenzen-y
Member

Additionally, could you sign the DCO?

Signed-off-by: Akshay Chitneni <achitneni@apple.com>
@andreyvelich
Member

/ok-to-test
/rerun-all

@andreyvelich andreyvelich changed the title from "Adding CEL validations on v2 TrainJob CRD" to "KEP-2170: Adding CEL validations on v2 TrainJob CRD" Sep 30, 2024
@andreyvelich
Member

/assign @saileshd1402 @varshaprasad96

@andreyvelich: GitHub didn't allow me to assign the following users: saileshd1402, varshaprasad96.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @saileshd1402 @varshaprasad96

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

@andreyvelich andreyvelich left a comment

Thank you for this @akshaychitneni!
I left my initial comments.
/assign @kubeflow/wg-training-leads

@@ -56,6 +56,7 @@ type TrainJobList struct {
}

// TrainJobSpec represents specification of the desired TrainJob.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.managedBy) || has(self.managedBy)", message="ManagedBy is required once set"
Member

Why do we need this?

Contributor

@varshaprasad96 varshaprasad96 Oct 2, 2024

This is a type-scoped rule that ensures the field is not removed once it has been set. I'm not sure it is necessary, since a default is being set.

Member

Don't we set it here?

// +kubebuilder:validation:XValidation:rule="self == oldSelf", message="ManagedBy value is immutable"
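
For readers following this thread, here is a minimal plain-Go sketch of how the two rules differ. It only models the CEL expressions quoted above; it does not run CEL or the API server. The `trainJobSpec` type, both function names, and the `example.com/...` values are hypothetical, and the assumption that a field-scoped `self == oldSelf` rule is skipped when the field is absent on either side is stated in the code comments. It also ignores defaulting, which may make the removal case unreachable in practice, as noted above.

```go
package main

import "fmt"

// trainJobSpec is a hypothetical stand-in for TrainJobSpec that keeps only
// the optional managedBy field discussed in this thread.
type trainJobSpec struct {
	ManagedBy *string
}

// requiredOnceSet models the type-scoped rule
// !has(oldSelf.managedBy) || has(self.managedBy).
func requiredOnceSet(oldSpec, newSpec trainJobSpec) bool {
	return oldSpec.ManagedBy == nil || newSpec.ManagedBy != nil
}

// immutableWhenPresent models the field-scoped rule self == oldSelf.
// Assumption: the rule is skipped (treated as passing) when the field is
// missing on either side, because there is no pair of values to compare.
func immutableWhenPresent(oldSpec, newSpec trainJobSpec) bool {
	if oldSpec.ManagedBy == nil || newSpec.ManagedBy == nil {
		return true
	}
	return *oldSpec.ManagedBy == *newSpec.ManagedBy
}

// strPtr returns a pointer to s, mirroring how optional API fields are set.
func strPtr(s string) *string { return &s }

func main() {
	set := trainJobSpec{ManagedBy: strPtr("example.com/controller-a")}
	changed := trainJobSpec{ManagedBy: strPtr("example.com/controller-b")}
	unset := trainJobSpec{}

	// Removing the field: only the type-scoped rule can object.
	fmt.Println("remove:", requiredOnceSet(set, unset), immutableWhenPresent(set, unset))
	// Changing the value: only the field-scoped rule can object.
	fmt.Println("change:", requiredOnceSet(set, changed), immutableWhenPresent(set, changed))
}
```

Under these assumptions the two rules cover different updates: removal is caught only by the type-scoped rule, and a value change only by the field-scoped one.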

APIGroup *string `json:"apiGroup,omitempty"`

// Kind of the runtime being referenced.
// It must be one of TrainingRuntime or ClusterTrainingRuntime.
// Defaults to ClusterTrainingRuntime.
// +kubebuilder:default="ClusterTrainingRuntime"
// +kubebuilder:validation:XValidation:rule="self in ['ClusterTrainingRuntime', 'TrainingRuntime']", message="Kind must be ClusterTrainingRuntime or TrainingRuntime if set"
Member

@tenzen-y Any thoughts on this validation, given that we want to accept user-defined CRDs in the runtime reference as part of the runtime framework (#2248)?
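
To make the concern concrete, here is a small plain-Go sketch of what the quoted rule `self in ['ClusterTrainingRuntime', 'TrainingRuntime']` accepts. It is only a model of the expression, not the CEL evaluation performed by the API server; `kindAllowed` and the `MyCustomRuntime` kind are hypothetical names used for illustration.

```go
package main

import "fmt"

// allowedKinds mirrors the list in the CEL rule quoted above.
var allowedKinds = []string{"ClusterTrainingRuntime", "TrainingRuntime"}

// kindAllowed models: self in ['ClusterTrainingRuntime', 'TrainingRuntime'].
func kindAllowed(kind string) bool {
	for _, k := range allowedKinds {
		if k == kind {
			return true
		}
	}
	return false
}

func main() {
	// "MyCustomRuntime" is a hypothetical user-defined runtime kind.
	for _, k := range []string{"ClusterTrainingRuntime", "TrainingRuntime", "MyCustomRuntime"} {
		fmt.Printf("%s allowed: %t\n", k, kindAllowed(k))
	}
}
```

With a fixed allow-list like this, any user-defined kind would be rejected at admission, so the rule would likely need to be relaxed or dropped if custom runtimes are to be accepted.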

@@ -56,6 +56,7 @@ type TrainJobList struct {
}

// TrainJobSpec represents specification of the desired TrainJob.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.managedBy) || has(self.managedBy)", message="ManagedBy is required once set"
type TrainJobSpec struct {
Member

@akshaychitneni Do we want to add validations/defaults for other parts of TrainJob (e.g. Trainer, DatasetConfig, ModelConfig) as part of this PR?

@@ -0,0 +1,155 @@
package cel_v2
Member

I think we can add those integration tests to /test/integration/trainjob_controller_test.go, similar to JobSet: https://github.com/kubernetes-sigs/jobset/blob/main/test/integration/controller/jobset_controller_test.go#L49
WDYT @akshaychitneni @tenzen-y?

Comment on lines +46 to +47
apiGroup := "kubeflow.org"
kind := "TrainingRuntime"
Member

Let's create a new constants module at /pkg/constants/constants.go for common constants such as Kind and APIGroup, since we will use them in several places.
WDYT @tenzen-y @akshaychitneni?

Contributor

@varshaprasad96 varshaprasad96 Oct 2, 2024

Do we want to add these under the respective API's groupversion_info.go instead, so that the import paths stay cleaner when these constants are used for both API versions?

Member

Sure! I think the GroupVersion is already set here: https://github.com/kubeflow/training-operator/blob/master/pkg/apis/kubeflow.org/v2alpha1/groupversion_info.go#L29, but the Kind is not.
@varshaprasad96 @tenzen-y Where do you think we should put the Kind constants?

Contributor

It would be good to have Kind defined in there too.
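
As a rough illustration of the suggestion in this thread, the sketch below shows Kind constants declared once and reused in place of the string literals from the test. The constant names (`GroupName`, `TrainingRuntimeKind`, `ClusterTrainingRuntimeKind`), the `runtimeRef` type, and `newRuntimeRef` are hypothetical; only the string values come from this PR. In the repository these would presumably live in the API package next to groupversion_info.go, and `package main` is used here only to keep the example runnable.

```go
package main

import "fmt"

// Hypothetical constant names; only the string values appear in this PR.
const (
	GroupName                  = "kubeflow.org"
	TrainingRuntimeKind        = "TrainingRuntime"
	ClusterTrainingRuntimeKind = "ClusterTrainingRuntime"
)

// runtimeRef is a simplified stand-in for the runtime reference in TrainJobSpec.
type runtimeRef struct {
	APIGroup *string
	Kind     *string
}

// newRuntimeRef builds a reference from the shared constants instead of
// repeating string literals in each test or controller.
func newRuntimeRef(kind string) runtimeRef {
	group := GroupName
	return runtimeRef{APIGroup: &group, Kind: &kind}
}

func main() {
	ref := newRuntimeRef(TrainingRuntimeKind)
	fmt.Printf("apiGroup=%s kind=%s\n", *ref.APIGroup, *ref.Kind)
}
```

Keeping the constants in the API package would let both tests and controllers import them from a single place, which is the trade-off being weighed against a separate /pkg/constants module.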

5 participants