CORS-3687: Enhancement proposal for setting EIPs for Ingress Controller via installer #1688

Open
wants to merge 1 commit into
base: master

Contributor

@miheer miheer commented Sep 26, 2024

Enhancement proposal for setting EIPs for Ingress Controller via installer

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 26, 2024
@openshift-ci-robot

openshift-ci-robot commented Sep 26, 2024

@miheer: This pull request references CORS-3687 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@miheer
Contributor Author

miheer commented Sep 26, 2024

@mtulio @patrickdillon @r4f4 PTAL


platform:
  aws:
    region: <AWS region>
    lbType: NLB

Does the installer have to validate the lbType is NLB when eipAllocations are specified? What happens when lbType: Classic?

Contributor Author

Yes, we can check this. I will add a CEL validation rule.

Contributor

+1 to this question, I don't see this in the EP yet.

Also, the installer doesn't use CEL, but I think it's fine to provide it in this proposal as validation guidelines.
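
Since the installer does not evaluate CEL, the check asked about above could be written as plain Go validation. The following is only a minimal sketch: the `awsPlatform` type and its field names are hypothetical stand-ins that mirror the YAML under review, not the installer's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// awsPlatform is a hypothetical, trimmed-down stand-in for the installer's
// AWS platform type. Field names mirror the install-config YAML in this thread.
type awsPlatform struct {
	LBType         string   // "NLB" or "Classic"
	EIPAllocations []string // assumed: EIP allocation IDs taken from install-config
}

// validateLBType rejects eipAllocations unless lbType is NLB.
// An install-config without eipAllocations is left untouched.
func validateLBType(p awsPlatform) error {
	if len(p.EIPAllocations) > 0 && p.LBType != "NLB" {
		return errors.New("eipAllocations can only be set when lbType is NLB")
	}
	return nil
}

func main() {
	classic := awsPlatform{LBType: "Classic", EIPAllocations: []string{"eipalloc-1"}}
	nlb := awsPlatform{LBType: "NLB", EIPAllocations: []string{"eipalloc-1"}}
	fmt.Println("Classic:", validateLBType(classic))
	fmt.Println("NLB:", validateLBType(nlb))
}
```

With this shape, `lbType: Classic` combined with EIP allocations fails at install-config validation instead of surfacing later as a load balancer sync error.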

  aws:
    region: <AWS region>
    lbType: NLB
    networkLoadBalancer:

Suggested change
networkLoadBalancer:
ingressNetworkLoadBalancer:

So it's not confused with the one created by the Installer.

Contributor Author

OK I will make that change.

Contributor Author

changes made


In the `IngressController` status, check the status for the following:
- Error messages for invalid eips or eips not present in the subnet of the VPC is `The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound:` for status type `LoadBalancerReady` and `Available`.
Contributor

@sadasu sadasu Oct 3, 2024

When this error occurs, how does the customer/user remedy the situation? Restart installation with correct EIPs or can the user edit the ingressNetworkLoadBalancer list within the Ingress Config spec so that CCM can successfully reconcile the creation of the Ingress LB or both? Any preference?

Contributor Author

@sadasu can you please check point 3 under section ### Implementation Details/Notes/Constraints

Please let me know WYT.

Contributor Author

@miheer miheer Oct 21, 2024

Also, if an error occurs, we can either stop or ask the user for the correct input. I need to check the installer code to see whether we can add a loop.

Contributor

> @sadasu can you please check point 3 under section ### Implementation Details/Notes/Constraints
>
> Please let me know WYT.

@miheer OK with that section.
Probably out of scope for this enhancement, but when the Installer determines the predicted LB subnet count, would it be useful for the Installer to pass the predicted LB subnets to the AWS CCM via a manifest? That way we would not duplicate the logic of figuring out the correct subnets to use. /cc @patrickdillon @mtulio

@miheer miheer force-pushed the eip-installer branch 4 times, most recently from 9f2d0ae to c39762e Compare October 21, 2024 06:55
@miheer
Contributor Author

miheer commented Oct 21, 2024

@mtulio @patrickdillon @r4f4 @sadasu @gcs278 @Miciah PTAL, especially the validation part. I need the installer team's help reviewing point 3 under section ### Implementation Details/Notes/Constraints

(i.e. any subnet without another cluster's kubernetes.io/cluster/<cluster-id> tag).
We can call this Predicted LB Subnet Count.

We can examine the following scenarios:

Can Predicted LB Subnet count < BYO Subnet count? If not, these are the possible scenarios:

  1. EIP Allocations count < BYO Subnet count: error
  2. EIP Allocations count < Predicted LB Subnet count: error

Is it an issue if extra EIP Allocations are supplied?

Contributor Author

The number of EIP allocations must exactly match the number of subnets. It can be neither fewer nor greater.

Contributor

I think @r4f4 is referring to the case where the number of BYO subnets is unknown, and what constitutes an invalid situation in that case.

Contributor

@gcs278 gcs278 left a comment

Still working on reviewing. Only a couple of comments for now, will come back later.

Comment on lines 17 to 28
creation-date: 2024-05-29
last-updated: 2024-09-04
tracking-link:
- https://issues.redhat.com/browse/CORS-3440
see-also:
- "enhancements/ingress/lb-subnet-selection-aws.md"
replaces:
- "enhancements/installer/aws-customer-provided-subnets.md"
superseded-by:
Contributor

I think you forgot to update some of the fields after a copy/paste

Suggested change
creation-date: 2024-05-29
last-updated: 2024-09-04
tracking-link:
- https://issues.redhat.com/browse/CORS-3440
see-also:
- "enhancements/ingress/lb-subnet-selection-aws.md"
replaces:
- "enhancements/installer/aws-customer-provided-subnets.md"
superseded-by:
creation-date: 2024-?
last-updated: 2024-?
tracking-link:
- https://issues.redhat.com/browse/CORS-3687
see-also:
replaces:
superseded-by:


This enhancement extends the OpenShift Installer's install-config, enabling cluster admins to
configure EIPs for AWS NLB load balancer created for their default NLB IngressController at install time.
This proposal allows the install-time configuration of subnets for the `default` IngressController.
Contributor

Suggested change
This proposal allows the install-time configuration of subnets for the `default` IngressController.

## Motivation

### User Stories
- As a cluster administrator using installer, I want to configure default NLB IngressController to use EIPs.
Contributor

So you just defined the `default` IngressController above, but then don't use the backticks. Just want to make sure we're indeed talking about the `default` IngressController.

Suggested change
- As a cluster administrator using installer, I want to configure default NLB IngressController to use EIPs.
- As a cluster administrator using installer, I want to configure `default` NLB IngressController to use EIPs.

Here's one recommended option:

The Installer should count all LB subnets by predicting which subnets will be chosen by the AWS CCM
(i.e. any subnet without another cluster's kubernetes.io/cluster/<cluster-id> tag).
Contributor

The criterion isn't only the absence of another cluster's kubernetes.io/cluster/<cluster-id> tag: a subnet also won't be selected if the load balancer is external and the subnet is private.

See https://github.com/openshift/enhancements/pull/1634/files#diff-ffcfdf0d21ba360a17e0ac9846e83eec8ecf0ba8b15d429d42e7de2dbd0bfaf7R109-R116
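
To make the two selection rules discussed here concrete, the prediction could be sketched roughly as below. This is a literal reading of the rules stated in this thread, not a reproduction of the AWS CCM's full subnet discovery logic. The `subnet` type and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

const clusterTagPrefix = "kubernetes.io/cluster/"

// subnet is a hypothetical, simplified view of a VPC subnet.
type subnet struct {
	ID     string
	Public bool
	Tags   map[string]string
}

// taggedForOtherCluster reports whether s carries a
// kubernetes.io/cluster/<cluster-id> tag for a cluster other than clusterID.
func taggedForOtherCluster(s subnet, clusterID string) bool {
	for key := range s.Tags {
		if strings.HasPrefix(key, clusterTagPrefix) && key != clusterTagPrefix+clusterID {
			return true
		}
	}
	return false
}

// predictedLBSubnets applies the two rules from this thread:
//  1. skip subnets tagged for another cluster;
//  2. skip private subnets when the load balancer is external.
func predictedLBSubnets(subnets []subnet, clusterID string, external bool) []string {
	var ids []string
	for _, s := range subnets {
		if taggedForOtherCluster(s, clusterID) {
			continue
		}
		if external && !s.Public {
			continue
		}
		ids = append(ids, s.ID)
	}
	return ids
}

func main() {
	subnets := []subnet{
		{ID: "subnet-a", Public: true},
		{ID: "subnet-b", Public: false},
		{ID: "subnet-c", Public: true, Tags: map[string]string{clusterTagPrefix + "other": "owned"}},
	}
	fmt.Println(predictedLBSubnets(subnets, "mine", true))
}
```

The length of the returned slice would then play the role of the "Predicted LB Subnet Count" that gets compared with the number of EIP allocations.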

Contributor Author

OK

which uses the value from the field `eipAllocations` of `IngressController` CR.

2. #### Validation on installer when installing in managed VPC (full-automated) based in the discovered zones used to create the cluster.
We will be comparing the number of Availability Zones in the region to the number of eipAllocations passed in the `install-config.yaml`.
Contributor

Probably worth an explanation why: because the cluster will select 1 subnet per AZ, and the number of EIPs must be equal to subnets.

Contributor Author

OK


We can examine the following scenarios:

##### BYO Subnet Count != EIPs Allocations && BYO Subnet Count == Predicted LB Subnet count && Predicted LB Subnet count != EIPs Allocations:
Contributor

@gcs278 gcs278 Oct 24, 2024

We might be able to eliminate all of this complex validation if:

But it's a tough call. Maybe we have to do this validation in the beginning so you can release the EIP Allocations feature, but when the new subnets API comes out, you can get rid of all of this complexity. I think that would be a massive win as far as a maintenance burden.

I would keep this validation here for now, but I will let you know if there are any updates.

@gcs278
Contributor

gcs278 commented Oct 24, 2024

/assign

Contributor

@gcs278 gcs278 left a comment

Thanks, good information in here. I think validation is the biggest item of contention that we need to solve.

### Non-Goals
- Creation of EIPs in AWS.
- Static IP usage with NLBs for OpenShift API server, DNS, Nat Gateways, LBs, Instances.
- To assign IPs from a Customer Owned IP (CoIP) Pool when using Outposts.
Contributor

It's pretty obvious, but maybe worth a mention that this default is not for user-created IngressControllers.

Suggested change
- To assign IPs from a Customer Owned IP (CoIP) Pool when using Outposts.
- To assign IPs from a Customer Owned IP (CoIP) Pool when using Outposts.
- Set default EIPs for user-created IngressControllers

- To assign IPs from a Customer Owned IP (CoIP) Pool when using Outposts.

## Proposal
This enhancement adds API fields in the installer and the IngressController specification
Contributor

I don't think this proposal is adding any APIs to the IngressController right?

Suggested change
This enhancement adds API fields in the installer and the IngressController specification
This enhancement adds API fields in the installer and the Ingress Config specification

### API Extensions

#### Installer Updates
- The first API extension for setting `eipAllocations` is in the installer [Platform](https://github.com/openshift/installer/blob/master/pkg/types/aws/platform.go) type, where the new field `NetworkLoadBalancerParameters` is added as an optional field.
Contributor

I don't see this field name.

Suggested change
- The first API extension for setting `eipAllocations` is in the installer [Platform](https://github.com/openshift/installer/blob/master/pkg/types/aws/platform.go) type, where the new field `NetworkLoadBalancerParameters` is added as an optional field.
- The first API extension for setting `eipAllocations` is in the installer [Platform](https://github.com/openshift/installer/blob/master/pkg/types/aws/platform.go) type, where the new field `eipAllocations` is added as an optional field.

Comment on lines 87 to 88
// eipAllocations holds eipAllocations for an default AWS
// NLB IngressController.
Contributor

You created this as a struct of EIPAllocations, good idea, so it may be extended in the future if anything else needs EIPs. But shouldn't this godoc reflect that it's a generic structure? Does this reflect your idea of the API?

Suggested change
// eipAllocations holds eipAllocations for an default AWS
// NLB IngressController.
// eipAllocations contains Elastic IP (EIP) allocations for AWS resources
// within the cluster.

Comment on lines 94 to 95
// EIPAllocations holds configuration parameters for an
// default AWS NLB IngressController. For Example: Setting AWS EIPs https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
Contributor

Also here. I think the link is unnecessary, but keep it if you like it.

Suggested change
// EIPAllocations holds configuration parameters for an
// default AWS NLB IngressController. For Example: Setting AWS EIPs https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
// EIPAllocations contains Elastic IP (EIP) allocations for AWS resources
// within the cluster.


## Open Questions
- Q: As per [EP](https://github.com/openshift/enhancements/pull/1634), old subnets field will be deprecated. So, shall we skip the validation for checking
number of `BYO Subnets` provided in the `install-config.yaml` with the number of eipAllocations ? Or shall we compare the old subnets field with the eipAllocations ?
Contributor

I'm 50/50 on doing validation. It's going to be painful to write and maintain, but also it sucks when I install a cluster and realize 45 minutes later that I didn't add enough EIPs. Instant feedback is much better, as long as we get it right.

Like I said in another comment, I'm pushing for a simplification in the new subnets field, where this whole "predicted LB subnet count" goes away. If that's introduced, validation will be trivial: len(subnets) == len(eips). I yield to the installer team on this one.

Contributor Author

## Open Questions
- Q: As per [EP](https://github.com/openshift/enhancements/pull/1634), old subnets field will be deprecated. So, shall we skip the validation for checking
number of `BYO Subnets` provided in the `install-config.yaml` with the number of eipAllocations ? Or shall we compare the old subnets field with the eipAllocations ?

Contributor

Mind adding one more that I'm curious about? CC: @JoelSpeed

Suggested change
- Q: Should we split Ingress Config into defaulting for the default IngressController and defaulting for user-created IngressControllers?

the Predicted LB Subnets != BYO Subnet Count scenario as not valid? And possibly block future installs as a resolution
to https://issues.redhat.com/browse/OCPBUGS-17432? That would make EIP Allocation a lot easier, but not sure if that's realistic.

4. #### Validation to check if EIPs are not already assigned to resources.
Contributor

What about if the EIP exists? I don't think you explicitly mention that in these sections.

Contributor Author

@miheer miheer Oct 28, 2024

I think you mean the case where the EIP does not exist, because we need the EIPs to be present but unassociated, right?

Contributor Author

I made this change:
4. #### Validation to check if EIPs exist and are not already assigned to resources.

EIPs can be assigned to many resource types, like NAT Gateways, *LBs, Instances, etc. The attribute associationId will be set when the EIP is already associated.
To mitigate this, we could add that validation, at least in the installer, to give the user quick feedback (failing during install-config validation) when a provided EIP is already associated with another resource.
It would be nice to have a validation before setting the annotation for the CCM, keeping the operator degraded rather than disrupting the service.
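
As an illustration of the "exists and is not already associated" check, here is a minimal sketch. It assumes the installer has already looked up the addresses (for example through the EC2 DescribeAddresses API) and put them in a map keyed by allocation ID. The `eipInfo` type and `validateEIPs` function are hypothetical names, not existing installer code.

```go
package main

import "fmt"

// eipInfo is a hypothetical summary of one Elastic IP as reported by AWS.
type eipInfo struct {
	AllocationID  string
	AssociationID string // empty when the EIP is not associated with any resource
}

// validateEIPs returns one error per requested allocation ID that either
// does not exist in known or is already associated with another resource.
func validateEIPs(requested []string, known map[string]eipInfo) []error {
	var errs []error
	for _, id := range requested {
		info, ok := known[id]
		switch {
		case !ok:
			errs = append(errs, fmt.Errorf("EIP allocation %q does not exist", id))
		case info.AssociationID != "":
			errs = append(errs, fmt.Errorf("EIP allocation %q is already associated (%s)", id, info.AssociationID))
		}
	}
	return errs
}

func main() {
	known := map[string]eipInfo{
		"eipalloc-free": {AllocationID: "eipalloc-free"},
		"eipalloc-used": {AllocationID: "eipalloc-used", AssociationID: "eipassoc-123"},
	}
	for _, err := range validateEIPs([]string{"eipalloc-free", "eipalloc-used", "eipalloc-missing"}, known) {
		fmt.Println(err)
	}
}
```

Collecting all errors instead of stopping at the first one lets the user fix every bad allocation ID in a single pass.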

Contributor

Could we add a small section on the defaulting mechanics for the IngressController? It's definitely an implementation detail, but one that feels important enough to mention here. Something @Miciah or @JoelSpeed might be interested in commenting on. I wonder if doing status "right" was a bit of a mistake, because now things are becoming inconsistent in the API. Either way, that ship sailed in 4.17.

Suggested change
#### IngressConfig EIP Allocation Defaulting Mechanics for Ingress Controller
Traditionally, the Ingress Operator has populated default values from the Ingress Config into the `status`, making `status` effectively reflect the desired state of the IngressController. However, since `eipAllocations` in `status` represents the **actual** state, not the **desired** state, the default `eipAllocations` values must be set in the `spec` when the Ingress Operator initially admits the IngressController.
This approach is new. The Ingress Operator does not typically set default values in `spec` for load balancer configurations if the user hasn’t explicitly provided them. While this defaulting pattern is more consistent with Kubernetes conventions for `spec` and `status` (and is also our only option in this situation), it's important to acknowledge that this inconsistency in defaulting behavior could cause confusion for users.

Contributor

@gcs278 @miheer if we were to make all the status fields of the Ingress Config consistent and populate them with the actual state, is the issue that there is no field that reflects the defaults?

Contributor Author

IngressConfig is no longer relevant.


### Goals
- Users are able to use EIPs for a default NLB `IngressController` at install time.
- Check for unassociated EIPs before passing to CCM.
Contributor

Think we should make this more generic? Otherwise, you are missing a couple of validations here.

Suggested change
- Check for unassociated EIPs before passing to CCM.
- Add validation to the installer to prevent invalid EIP configurations

// +listType=atomic
// +kubebuilder:validation:XValidation:rule=`self.all(x, self.exists_one(y, x == y))`,message="eipAllocations cannot contain duplicates"
// +kubebuilder:validation:MaxItems=10
IngressNetworkLoadBalancer []EIPAllocation `json:"ingressNetworkLoadBalancer"`
Contributor

Could the existing PublicIpv4Pool field be used instead to satisfy the requirements?

https://github.com/openshift/installer/blob/694d083d3332bd5b892f0098a91be6c206b18fce/pkg/types/aws/platform.go#L119-L122

I think the answer is no. IIUC, users want to set up firewall rules in advance, so they need to specify explicit IP addresses. A pool won't suffice. It may be worth drawing a distinction between your proposal and this existing field. Would be interested in @mtulio's take on this.

Contributor Author

@mtulio PTAL .

Contributor

@mtulio mtulio Dec 10, 2024

@patrickdillon is correct: No. Those are different use cases. The PublicIpv4Pool field expects the ID of a pool advertised to AWS (a Public IPv4 pool owned by the user), and the installer (CAPA) will allocate any IPs (EIPs) from that pool. The EIP proposal requires EIPs that are pre-allocated and presented to the installer; those can come from any existing pool option: amazon-provided, user-pool (BYO Public IPv4), or customer-owned IP (CoIP/Outposts).

Comment on lines 350 to 360
I think we need some feedback from installer-team, @patrickdillon or @mtulio or @sadasu on this type of validation. Should the installer team consider
the Predicted LB Subnets != BYO Subnet Count scenario as not valid? And possibly block future installs as a resolution
to https://issues.redhat.com/browse/OCPBUGS-17432? That would make EIP Allocation a lot easier, but not sure if that's realistic.
Contributor

OCPBUGS-17432 is closed, but I believe the problem is not actually resolved until RFE-1717 is implemented, correct?

Are you proposing enforcing this validation only when EIP allocations are specified? I'm certainly open to that. If we do it in all cases (i.e. when EIP allocations are not specified) that might be more tricky, as the install config that "worked" before, now starts failing...

But in the case of EIP Allocations, this is new functionality so we won't break any existing workflows. If we did throw a validation error when EIP Allocation Count != Predicted LB Subnets, then users would be able to resolve the issue on their own, right? Say by adding the unmanaged tag to other subnets in the VPC...

Contributor Author

@gcs278 ^^ WDYT ?

Contributor

@gcs278 gcs278 Oct 28, 2024

> OCPBUGS-17432 is closed, but I believe the problem is not actually resolved until RFE-1717 is implemented, correct?

Correct. It was closed as "not a bug", but I do think it could still be considered a bug, or at least it's confusing UX that has led to other RFEs like https://issues.redhat.com/browse/RFE-2816, and it was the reason why Service Delivery opened the subnet RFE https://issues.redhat.com/browse/RFE-1717 in the first place. Everything comes back to OCPBUGS-17432...

> Are you proposing enforcing this validation only when EIP allocations are specified?

Right. Doing it in all cases (with no EIPs) would be something I could implement in the subnets EP #1634.

> If we do it in all cases (i.e. when EIP allocations are not specified) that might be more tricky, as the install config that "worked" before, now starts failing...

Yes, in the context of this EP, but I don't think that applies if we deprecate the old field and add a new subnetConfig field like the suggestion in #1634. Behavior can change since customers have to explicitly opt into the new field. I recently realized this and suggested it as a UX improvement that comes with the new field. My suggestion, rather than erroring out when the subnet counts aren't equal, is to just make Cluster Subnets == IngressController subnets (and completely bypass the AWS CCM subnet discovery logic). But this is a discussion for that EP, so feel free to jump in there.

> If we did throw a validation error when EIP Allocation Count != Predicted LB Subnets, then users would be able to resolve the issue on their own, right? Say by adding the unmanaged tag to other subnets in the VPC...

Right.

Contributor

Or why not mark the operator as Degraded until the issue is straightened out by the admin?

Contributor Author

I am not sure how the installer would do this. I think this is out of scope for this EP. It would require logic in CIO.

Contributor Author

Also, we can pursue this once Grant's subnet EP is in progress.

@candita
Contributor

candita commented Nov 13, 2024

I will look into the validation part on this.
/assign @candita

// +listType=atomic
// +kubebuilder:validation:XValidation:rule=`self.all(x, self.exists_one(y, x == y))`,message="eipAllocations cannot contain duplicates"
// +kubebuilder:validation:MaxItems=10
IngressNetworkLoadBalancer []EIPAllocation `json:"ingressNetworkLoadBalancer"`
Contributor

Is IngressNetworkLoadBalancer an existing field name? If it's a new name, it doesn't seem suitable to describe a list of EIP allocations.

Contributor Author

This name was already discussed with the installer team. I am not able to find the thread.
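
Separately from the naming question, the CEL rule on the field under review (`self.all(x, self.exists_one(y, x == y))`) rejects lists with duplicate entries. Because the installer does not evaluate CEL, an equivalent check in Go might look like the sketch below. The function name is hypothetical.

```go
package main

import "fmt"

// duplicateEIPAllocations returns every allocation ID that appears more than
// once in ids, each reported once, in the order the repeats are found.
// An empty result means the list satisfies the uniqueness rule.
func duplicateEIPAllocations(ids []string) []string {
	seen := make(map[string]int, len(ids))
	var dups []string
	for _, id := range ids {
		seen[id]++
		if seen[id] == 2 { // second sighting: record the duplicate once
			dups = append(dups, id)
		}
	}
	return dups
}

func main() {
	ids := []string{"eipalloc-aaa", "eipalloc-bbb", "eipalloc-aaa"}
	if dups := duplicateEIPAllocations(ids); len(dups) > 0 {
		fmt.Printf("eipAllocations cannot contain duplicates: %v\n", dups)
	}
}
```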


### Implementation Details/Notes/Constraints

1. #### Set EIP through installer for the default IngressController:
Contributor

Suggested change
1. #### Set EIP through installer for the default IngressController:
1. #### Configuring EIP for the default IngressController at installation time:

Also, the numbers 1 - 6 scattered through this section aren't helpful. It makes it seem like an ordered list of steps. I suggest using the # formatting to distinguish sections and subsections.

3. then creating a service of service type load balancer with the annotation `service.beta.kubernetes.io/aws-load-balancer-eip-allocations`,
which uses the value from the field `eipAllocations` of `IngressController` CR.

2. #### Validation on installer when installing in managed VPC (full-automated) based in the discovered zones used to create the cluster.
Contributor

All the Validation subsections belong in one section entitled "Field Validation".

I'm not exactly sure what you mean here, but maybe this:

Suggested change
2. #### Validation on installer when installing in managed VPC (full-automated) based in the discovered zones used to create the cluster.
#### Field Validation
##### The number of Availability Zones must match the number of `eipAllocations`
The installer must ensure this when installing in managed VPC (full-automated). It can compare the number of `eipAllocations` to the discovered zones.

Contributor Author

Added Field Validation

which uses the value from the field `eipAllocations` of `IngressController` CR.

2. #### Validation on installer when installing in managed VPC (full-automated) based in the discovered zones used to create the cluster.
We will be comparing the number of Availability Zones in the region to the number of eipAllocations passed in the `install-config.yaml` because the cluster will select 1 subnet per AZ, and the number of EIPs must be equal to subnets..
Contributor

Suggested change
We will be comparing the number of Availability Zones in the region to the number of eipAllocations passed in the `install-config.yaml` because the cluster will select 1 subnet per AZ, and the number of EIPs must be equal to subnets..
Add a function to the installer code to compare the number of Availability Zones in the region to the number of eipAllocations passed in the `install-config.yaml`. This is required because the cluster will select one subnet per AZ, so the number of EIPs must be equal to the number of subnets.
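
A function along the lines suggested above could be as small as the following sketch. It assumes the installer already knows the zones it will use for the managed VPC. The function name and parameters are hypothetical.

```go
package main

import "fmt"

// validateEIPCountForZones checks that the number of EIP allocations equals
// the number of availability zones. The cluster selects one subnet per zone,
// and the load balancer needs exactly one EIP per subnet.
func validateEIPCountForZones(zones, eipAllocations []string) error {
	if len(eipAllocations) == 0 {
		return nil // no EIPs requested, so there is nothing to validate
	}
	if len(eipAllocations) != len(zones) {
		return fmt.Errorf("got %d eipAllocations but the cluster spans %d availability zones; the counts must be equal",
			len(eipAllocations), len(zones))
	}
	return nil
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	fmt.Println(validateEIPCountForZones(zones, []string{"eipalloc-1", "eipalloc-2"}))
	fmt.Println(validateEIPCountForZones(zones, []string{"eipalloc-1", "eipalloc-2", "eipalloc-3"}))
}
```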

2. #### Validation on installer when installing in managed VPC (full-automated) based in the discovered zones used to create the cluster.
We will be comparing the number of Availability Zones in the region to the number of eipAllocations passed in the `install-config.yaml` because the cluster will select 1 subnet per AZ, and the number of EIPs must be equal to subnets..

3. #### Validation on installer when installing in unmanaged (BYO VPC)
Contributor

Suggested change
3. #### Validation on installer when installing in unmanaged (BYO VPC)
##### The number of public subnets must match the number of `eipAllocations` in an unmanaged VPC (BYO VPC)

Comment on lines 328 to 329
We will be comparing if the number of public subnets added to the install-config matches with len(eipAllocations).
However, The problem is that the AWS CCM can select subnets that aren't provided in the BYO Subnets, see https://issues.redhat.com/browse/OCPBUGS-17432.
Contributor

@candita candita Nov 13, 2024

I think this is what you mean?

Suggested change
We will be comparing if the number of public subnets added to the install-config matches with len(eipAllocations).
However, The problem is that the AWS CCM can select subnets that aren't provided in the BYO Subnets, see https://issues.redhat.com/browse/OCPBUGS-17432.
Add a function to the installer code to compare the number of public subnets added to the install-config to the number of `eipAllocations`.
This validation is hindered by the fact that the AWS CCM may add subnets it discovers independently, which aren't contained in the BYO Subnets list.

The referenced bug is closed as Not a Bug, and rather wordy, so you should summarize it.

Contributor Author

this part has been removed.

3. #### Validation on installer when installing in unmanaged (BYO VPC)
We will be comparing if the number of public subnets added to the install-config matches with len(eipAllocations).
However, The problem is that the AWS CCM can select subnets that aren't provided in the BYO Subnets, see https://issues.redhat.com/browse/OCPBUGS-17432.
Here's one recommended option:
Contributor

I think you have to call this out and commit to it as a solution or omit it and find some other way.

Suggested change
Here's one recommended option:
The solution is

Contributor Author

This part has been removed.

Comment on lines 332 to 334
The Installer should count all LB subnets by predicting what subnets be chosen by the AWS CCM
(i.e. any subnet without another cluster's kubernetes.io/cluster/<cluster-id> tag and the load balancer is not external and the subnet is not private).
We can call this Predicted LB Subnet Count.
Contributor

Personally, I don't think this is a great solution as described. Can you explain why you can't provide a more certain outcome instead of a prediction? Why can't you say: the first n subnets are chosen such that the subnet is not already claimed by another cluster, the load balancer is not external, and the subnet is not private, where n equals the number of eipAllocations?

Contributor Author

The predicted part has been removed from this EP.

Comment on lines 340 to 341
Throw a simple error that just says, EIP != Provided Subnets:
The number of EIP Allocations does not equal the number of provided Subnets, the cluster will fail.
Contributor

I believe the pattern is to update a cluster operator status as Degraded while the counts don't match. Then the cluster would fail after a timeout for that degraded operator. The admin can get a chance to correct the situation with a proper error message. cc @wking, who might have ideas about this.

Contributor

These are installer validations, so if the validation fails we throw an error and don't start the install.

Throw a simple error that just says, EIP != Provided Subnets:
The number of EIP Allocations does not equal the number of provided Subnets, the cluster will fail.

##### BYO Subnet Count == EIPs Allocations && BYO Subnet Count != Predicted LB Subnet count && Predicted LB Subnet count != EIPs Allocations:
Contributor

Why do we even have to check any further if the BYO Subnet Count matches the eipAllocation count?

Contributor Author

prediction part has been removed.

Comment on lines 351 to 352
This is an odd scenario. The user got the # of a EIPs == Predicted LB Subnet count, I suppose because they anticipated the AWS CCM's generous
selection of subnets. This is valid scenario for no error message.
Contributor

Suggested change
This is an odd scenario. The user got the # of a EIPs == Predicted LB Subnet count, I suppose because they anticipated the AWS CCM's generous
selection of subnets. This is valid scenario for no error message.
This is a valid scenario. The user managed to match the number of EIPs to the Predicted LB Subnet count.

Contributor Author

prediction part has been removed


##### BYO Subnet Count == EIPs Allocations && BYO Subnet Count == Predicted LB Subnet count && Predicted LB Subnet count == EIPs Allocations:

Obvious valid scenario.
Contributor

Suggested change
Obvious valid scenario.
This is a valid scenario.

Contributor Author

prediction part has been removed.

the Predicted LB Subnets != BYO Subnet Count scenario as not valid? And possibly block future installs as a resolution
to https://issues.redhat.com/browse/OCPBUGS-17432? That would make EIP Allocation a lot easier, but not sure if that's realistic.

4. #### Validation to check if EIPs exist and are not already assigned to resources.
Contributor

Suggested change
4. #### Validation to check if EIPs exist and are not already assigned to resources.
##### Ensure that EIPs exist and are not already assigned.

to https://issues.redhat.com/browse/OCPBUGS-17432? That would make EIP Allocation a lot easier, but not sure if that's realistic.

4. #### Validation to check if EIPs exist and are not already assigned to resources.
EIPs can be assigned to many resource types, like Nat Gateways, *LBs, Instances, etc. The attribute associationId will be set when the EIP is already associated.
Contributor

Suggested change
EIPs can be assigned to many resource types, like Nat Gateways, *LBs, Instances, etc. The attribute associationId will be set when the EIP is already associated.
EIPs can be assigned to many resource types, like NAT gateways, load balancers, instances, etc. The attribute `associationId` will be set when the EIP is already assigned.

Contributor Author

Actually, "associated" is the right word, so I will keep it; for the rest I will follow your suggestion.
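The check discussed in this thread (each EIP must exist and must not already be associated) can be sketched as follows. This is only an illustration, not the installer's actual code: the `address` type and the `validateEIPsAvailable` function are hypothetical names, and the sketch assumes the caller has already fetched the address records from AWS and copied the allocation ID and `associationId` into this local type.

```go
package main

import (
	"errors"
	"fmt"
)

// address is a hypothetical local type holding only the two attributes
// this check needs. AssociationID is empty when the EIP is not associated.
type address struct {
	AllocationID  string
	AssociationID string
}

// validateEIPsAvailable (hypothetical name) reports every requested
// allocation ID that is missing from found or already associated.
func validateEIPsAvailable(requested []string, found []address) error {
	byID := make(map[string]address, len(found))
	for _, a := range found {
		byID[a.AllocationID] = a
	}

	var errs []error
	for _, id := range requested {
		a, ok := byID[id]
		switch {
		case !ok:
			errs = append(errs, fmt.Errorf("eipAllocation %q does not exist", id))
		case a.AssociationID != "":
			errs = append(errs, fmt.Errorf("eipAllocation %q is already associated (%s)", id, a.AssociationID))
		}
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	found := []address{
		{AllocationID: "eipalloc-aaa"},
		{AllocationID: "eipalloc-bbb", AssociationID: "eipassoc-123"},
	}
	err := validateEIPsAvailable([]string{"eipalloc-aaa", "eipalloc-bbb", "eipalloc-ccc"}, found)
	fmt.Println(err)
}
```

Collecting all failures before returning lets the user fix every bad allocation ID in one pass instead of rerunning the validation once per error.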

4. #### Validation to check if EIPs exist and are not already assigned to resources.
EIPs can be assigned to many resource types, like Nat Gateways, *LBs, Instances, etc. The attribute associationId will be set when the EIP is already associated.
To mitigate this we could add that validation, at least on installer, to provide quick-feedback (fail when validate install-config) to the user when the provided EIP is already associated to another resource.
It would be nice to have a validation before setting the annotation to CCM, keeping the operator degraded before disrupting the service.
Contributor

This is an implementation discussion. Couldn't you proactively declare that you will update the operator status, rather than mentioning it would be nice to have? If it's out of scope, then say it's out of scope.

Suggested change
It would be nice to have a validation before setting the annotation to CCM, keeping the operator degraded before disrupting the service.
It would be nice to have a validation before setting the annotation to CCM, keeping the operator Degraded before disrupting the service.

To mitigate this we could add that validation, at least on installer, to provide quick-feedback (fail when validate install-config) to the user when the provided EIP is already associated to another resource.
It would be nice to have a validation before setting the annotation to CCM, keeping the operator degraded before disrupting the service.

5. #### IngressConfig EIP Allocation Defaulting Mechanics for Ingress Controller
Contributor

@candita candita Nov 13, 2024

Suggested change
5. #### IngressConfig EIP Allocation Defaulting Mechanics for Ingress Controller
#### EIP Allocation Defaulting Mechanics for Ingress Controller

`eipAllocations` values must be set in the `spec` when the Ingress Operator initially admits the
IngressController.
This approach is new. The Ingress Operator does not typically set default values in `spec` for load balancer
configurations if the user hasn’t explicitly provided them. While this defaulting pattern is more consistent
Contributor

The admin user has explicitly provided them, as a part of the installation config.

### The `default` IngressController

This proposal will refer to the IngressController that gets created automatically during installation and handles
the platform routes (console, auth, canary, etc.) as the `default` IngressController.
Contributor

@candita candita Nov 13, 2024

Can you mention whether this feature makes the installation a UPI (user-provisioned) rather than IPI installation?

For example, will the configuration settings be listed in the installation config parameter docs:
https://docs.openshift.com/container-platform/4.17/installing/installing_aws/installation-config-parameters-aws.html#installation-config-parameters-aws

Contributor Author

I made a modification:
This proposal will refer to the IngressController that gets created automatically during user provisioned installation and handles
the platform routes (console, auth, canary, etc.) as the default IngressController.

Comment on lines 378 to 255
6. #### Validation to check if lbType was to NLB when eipAllocations were provided in the installer
EIPs can provided only for `NLB` type `IngressController` so, the installer will be check for the lbType set to NLB when `eipAllocations` are provided
in the `install-config.yaml`. We can't add a CEL in the Platform API type of the installer so a validation will need to be
added in the installer code.
Contributor

This section should be moved up to the "Validation" section.

Also, you didn't mention CEL in any other part of the proposal. Can you explain why we can't use CEL to validate this or other things, in a section at the beginning of the Implementation Details?

Suggested change
6. #### Validation to check if lbType was to NLB when eipAllocations were provided in the installer
EIPs can provided only for `NLB` type `IngressController` so, the installer will be check for the lbType set to NLB when `eipAllocations` are provided
in the `install-config.yaml`. We can't add a CEL in the Platform API type of the installer so a validation will need to be
added in the installer code.
##### Ensure `lbType` is set to NLB when `eipAllocations` are configured at install-time
EIPs are valid only for an `NLB` type `IngressController`. We can't add a CEL in the Platform API type of the installer so this validation will happen in the installer code.

Contributor Author

We are not using CEL but we have added the logic in the installer code.
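For readers following the thread, a minimal sketch of what such an installer-side check could look like is shown below. It is not the code from the installer PR: `validateLBTypeForEIPs` is a hypothetical name, and passing `lbType` as a plain string with the value `NLB` is an assumption made for illustration.

```go
package main

import "fmt"

// validateLBTypeForEIPs (hypothetical name) rejects an install-config that
// sets eipAllocations while lbType is anything other than "NLB".
func validateLBTypeForEIPs(lbType string, eipAllocations []string) error {
	if len(eipAllocations) > 0 && lbType != "NLB" {
		return fmt.Errorf("eipAllocations requires lbType NLB, got %q", lbType)
	}
	return nil
}

func main() {
	eips := []string{"eipalloc-aaa", "eipalloc-bbb"}
	if err := validateLBTypeForEIPs("Classic", eips); err != nil {
		fmt.Println("install-config validation failed:", err)
	}
}
```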

Comment on lines +389 to +264
won't get assigned EIPs, which will cause the LoadBalancer service to be in persistently pending state by the
Cloud Controller Manager (CCM). The reason for the persistently pending state is posted to the status of the IngressController.
Contributor

You mean Degraded state, right?

Contributor Author

The degradation happens after some time. Initially we see the Pending state.

- Q: As per [EP](https://github.com/openshift/enhancements/pull/1634), old subnets field will be deprecated. So, shall we skip the validation for checking
number of `BYO Subnets` provided in the `install-config.yaml` with the number of eipAllocations ? Or shall we compare the old subnets field with the eipAllocations ?

- Q: Should we split Ingress Config into defaulting for the default IngressController and defaulting for user-created IngressControllers?
Contributor

This is not relevant to this feature though, right? Because this feature is only for the default IC.

Contributor Author

removing it as we no longer use Ingress config

Contributor Author

I have kept it but added an answer.


## Infrastructure Needed

This EP works in AWS environment as AWS EIPs work on AWS environment only.
Contributor

@candita candita Nov 13, 2024

Suggested change
This EP works in AWS environment as AWS EIPs work on AWS environment only.
Because EIPs are AWS objects, this proposal is valid only for the AWS environment.

Contributor

openshift-ci bot commented Dec 9, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from candita. For more information see the Kubernetes Code Review Process.


@miheer
Contributor Author

miheer commented Dec 9, 2024

@candita PTAL. I have made the requested changes.

Contributor

openshift-ci bot commented Dec 9, 2024

@miheer: all tests passed!


Comment on lines +241 to +250
##### EIP Allocation Defaulting Mechanics for Ingress Controller
Traditionally, the Ingress Operator has populated default values from the Ingress Config into the `status`,
making `status` effectively reflect the desired state of the IngressController. However, since
`eipAllocations` in `status` represents the **actual** state, not the **desired** state, the default
`eipAllocations` values must be set in the `spec` when the Ingress Operator initially admits the
IngressController.
This approach is new. The Ingress Operator does not typically set default values in `spec` for load balancer
configurations if the user hasn’t explicitly provided them. While this defaulting pattern is more consistent
with Kubernetes conventions for `spec` and `status` (and is also our only option in this situation), it's
important to acknowledge that this inconsistency in defaulting behavior could cause confusion for users.
Contributor

You should remove this section now, we simplified to remove the Ingress Config like you mentioned, so no relevance to EIPAllocations.

Add a function to the installer code to compare the number of Availability Zones in the region to the number of eipAllocations passed in the `install-config.yaml`. This is required because the cluster will select one subnet per AZ, so the number of EIPs must be equal to the number of subnets.
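A function of the kind described in the line above might look like the following sketch. The name `validateEIPAllocationCount` and the plain string slices are hypothetical; the sketch also assumes that an empty `eipAllocations` list means the feature is not in use and should pass validation.

```go
package main

import "fmt"

// validateEIPAllocationCount (hypothetical name) checks that one EIP
// allocation is supplied per availability zone, since the cluster selects
// one subnet per zone.
func validateEIPAllocationCount(zones, eipAllocations []string) error {
	if len(eipAllocations) == 0 {
		// Assumption: the field is optional, so an empty list is valid.
		return nil
	}
	if len(eipAllocations) != len(zones) {
		return fmt.Errorf(
			"number of eipAllocations (%d) must equal the number of availability zones (%d)",
			len(eipAllocations), len(zones))
	}
	return nil
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	eips := []string{"eipalloc-aaa", "eipalloc-bbb"}
	if err := validateEIPAllocationCount(zones, eips); err != nil {
		fmt.Println("install-config validation failed:", err)
	}
}
```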

##### The number of public subnets must match the number of `eipAllocations` in an unmanaged VPC (BYO VPC)
We will be comparing if the number of public subnets added to the install-config matches with len(eipAllocations).
Contributor

See openshift/installer#8204 (comment), it's not quite this simple of a comparison.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 7, 2025