Optional removal of node taint on successful IP assignment (#146)
* feat: Remove taint key from node if provided

* refactor: Remove logger dependency from Tainter

* feat: Unit tests for Tainter implementation

* chore: Add section to README about Node Taints feature

* fix: Log warning when taint key not found on node

* fix: Add missing operator property to toleration in readme

* feat: Update Helm chart to support TAINT_KEY feature

* fix: Suppress linter
RagnarHal authored May 13, 2024
1 parent a5d4618 commit f24b10b
Showing 8 changed files with 423 additions and 1 deletion.
41 changes: 40 additions & 1 deletion README.md
@@ -48,7 +48,7 @@ To enable IPv6 support, set the `ipv6` flag (or set `IPV6` environment variable)

### Kubernetes Service Account

- KubeIP requires a Kubernetes service account with the following permissions:
+ KubeIP requires a Kubernetes service account with at least the following permissions:

```yaml
apiVersion: v1
@@ -129,6 +129,44 @@ spec:
value: "true"
```
### Node Taints

KubeIP can be configured to remove a taint from its node once the static IP has been successfully assigned. Tainting new nodes and letting KubeIP remove the taint keeps workloads from being scheduled on a node until it has received its static IP address. This is useful, for example, when a workload must call resources protected by IP whitelisting, because it prevents a race between KubeIP and the workload on newly provisioned nodes.

To enable this feature, set the `taint-key` configuration parameter (see [How to run KubeIP](#how-to-run-kubeip)) to the taint key that should be removed. Then add a toleration to the KubeIP DaemonSet so that KubeIP itself can be scheduled on the tainted nodes. For example, if new nodes are created with the taint key `kubeip.com/not-ready`:

```diff
  kind: DaemonSet
  spec:
    template:
      spec:
        serviceAccountName: kubeip-service-account
+       tolerations:
+         - key: kubeip.com/not-ready
+           operator: Exists
+           effect: NoSchedule
        containers:
          - name: kubeip
            image: doitintl/kubeip-agent
            env:
+             - name: TAINT_KEY
+               value: kubeip.com/not-ready
```
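The toleration above is what lets the KubeIP pod land on a node that still carries the taint. The following is a minimal sketch of how a toleration is compared against a taint, assuming the simplified matching rules from the Kubernetes taints and tolerations documentation. The `Taint` and `Toleration` structs and the `tolerates` function are hypothetical stand-ins written for this example, not the real `k8s.io/api/core/v1` types.

```go
package main

import "fmt"

// Taint and Toleration are hypothetical, trimmed-down stand-ins for the
// Kubernetes core/v1 types; only the fields needed here are modelled.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether tol matches taint under simplified rules:
// an empty effect or key on the toleration acts as a wildcard, "Exists"
// ignores the value, and "Equal" (or no operator) compares values.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "Equal", "":
		return tol.Value == taint.Value
	default:
		return false
	}
}

func main() {
	notReady := Taint{Key: "kubeip.com/not-ready", Effect: "NoSchedule"}

	// The toleration added to the KubeIP DaemonSet in the example above.
	kubeip := Toleration{Key: "kubeip.com/not-ready", Operator: "Exists", Effect: "NoSchedule"}
	// A toleration for an unrelated key, as an ordinary workload might have.
	other := Toleration{Key: "example.com/other", Operator: "Exists"}

	fmt.Println("kubeip tolerates taint:", tolerates(kubeip, notReady))
	fmt.Println("other tolerates taint:", tolerates(other, notReady))
}
```

Because the operator is `Exists`, the toleration matches regardless of the value the taint was created with, which is why the example does not specify one.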

The parameter has no default value. If it is not set, KubeIP does not attempt to remove any taints. If the provided taint key is not present on the node, KubeIP logs a warning and continues normally. If the taint key is present but removing it fails, KubeIP releases the IP address back into the pool before restarting and trying again.
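The four cases described above can be sketched as a small decision function. This is an illustration under assumptions, not KubeIP's actual code: `handleTaint`, `removeTaintFunc`, `releaseIPFunc`, the outcome strings, and `errPatchForbidden` are all hypothetical names introduced for the example, and the real agent talks to the Kubernetes API and the cloud provider instead of calling stubs.

```go
package main

import (
	"errors"
	"fmt"
)

// removeTaintFunc and releaseIPFunc are hypothetical stand-ins for the
// taint removal and IP release steps performed by the agent.
type removeTaintFunc func(key string) (bool, error)
type releaseIPFunc func() error

// errPatchForbidden simulates a failed node patch in the demo below.
var errPatchForbidden = errors.New("patch forbidden")

// handleTaint returns one of "disabled", "removed", "not-present" or
// "failed". On failure it releases the IP first and then returns the error
// so the caller can exit and be restarted.
func handleTaint(key string, remove removeTaintFunc, release releaseIPFunc) (string, error) {
	if key == "" {
		return "disabled", nil // feature not configured, nothing to do
	}
	removed, err := remove(key)
	if err != nil {
		// A release failure is only reported; the removal error is returned.
		if relErr := release(); relErr != nil {
			fmt.Println("releasing IP failed:", relErr)
		}
		return "failed", fmt.Errorf("removing node taint key: %w", err)
	}
	if !removed {
		return "not-present", nil
	}
	return "removed", nil
}

func main() {
	release := func() error { fmt.Println("IP released"); return nil }
	present := func(string) (bool, error) { return true, nil }
	absent := func(string) (bool, error) { return false, nil }
	broken := func(string) (bool, error) { return false, errPatchForbidden }

	cases := []struct {
		key    string
		remove removeTaintFunc
	}{
		{"", present},
		{"kubeip.com/not-ready", present},
		{"kubeip.com/not-ready", absent},
		{"kubeip.com/not-ready", broken},
	}
	for _, c := range cases {
		outcome, err := handleTaint(c.key, c.remove, release)
		fmt.Println(outcome, err)
	}
}
```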

Using this feature requires KubeIP to have permission to patch nodes, so the `ClusterRole` resource rules need to be updated. **Note that if this configuration option is not set, KubeIP will not attempt to patch any nodes, and the change to the rules is not necessary.**

Keep in mind that this gives KubeIP permission to update any node in your cluster, so make sure that this aligns with your security requirements before enabling this feature.

```diff
  rules:
    - apiGroups: [ "" ]
      resources: [ "nodes" ]
-     verbs: [ "get" ]
+     verbs: [ "get", "patch" ]
```

### AWS

Make sure that KubeIP DaemonSet is deployed on nodes that have a public IP (node running in public subnet) and uses a Kubernetes service
@@ -231,6 +269,7 @@ OPTIONS:
--project value name of the GCP project or the AWS account ID (not needed if running in node) [$PROJECT]
--region value name of the GCP region or the AWS region (not needed if running in node) [$REGION]
--release-on-exit release the static public IP address on exit (default: true) [$RELEASE_ON_EXIT]
--taint-key value specify a taint key to remove from the node once the static public IP address is assigned [$TAINT_KEY]
--retry-attempts value number of attempts to assign the static public IP address (default: 10) [$RETRY_ATTEMPTS]
--retry-interval value when the agent fails to assign the static public IP address, it will retry after this interval (default: 5m0s) [$RETRY_INTERVAL]
--lease-duration value duration of the kubernetes lease (default: 5) [$LEASE_DURATION]
4 changes: 4 additions & 0 deletions chart/templates/clusterrole.yaml
@@ -8,7 +8,11 @@ metadata:
rules:
  - apiGroups: [ "" ]
    resources: [ "nodes" ]
    {{- if .Values.rbac.allowNodesPatchPermission }}
    verbs: [ "get", "patch" ]
    {{- else }}
    verbs: [ "get" ]
    {{- end }}
  - apiGroups: [ "coordination.k8s.io" ]
    resources: [ "leases" ]
    verbs: [ "create", "delete", "get" ]
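Helm templates are built on Go's `text/template`, so the effect of the conditional above can be reproduced with the standard library alone. The sketch below is only an approximation of what Helm does: the `renderRule` helper and the hand-built values map are assumptions for this example, and real Helm rendering adds its own functions and values merging on top.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// rule reproduces the nodes rule from the chart template above.
const rule = `- apiGroups: [ "" ]
  resources: [ "nodes" ]
  {{- if .Values.rbac.allowNodesPatchPermission }}
  verbs: [ "get", "patch" ]
  {{- else }}
  verbs: [ "get" ]
  {{- end }}
`

// renderRule is a hypothetical helper that renders the rule with
// rbac.allowNodesPatchPermission set to allowPatch.
func renderRule(allowPatch bool) (string, error) {
	tmpl, err := template.New("clusterrole").Parse(rule)
	if err != nil {
		return "", err
	}
	values := map[string]interface{}{
		"Values": map[string]interface{}{
			"rbac": map[string]interface{}{
				"allowNodesPatchPermission": allowPatch,
			},
		},
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, values); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	for _, allow := range []bool{false, true} {
		out, err := renderRule(allow)
		if err != nil {
			panic(err)
		}
		fmt.Printf("allowNodesPatchPermission=%v\n%s\n", allow, out)
	}
}
```

The `{{-` trim markers remove the newline and indentation before each action, so exactly one `verbs` line is emitted in either branch.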
2 changes: 2 additions & 0 deletions chart/templates/daemonset.yaml
@@ -42,6 +42,8 @@ spec:
                  fieldPath: spec.nodeName
            - name: FILTER
              value: {{ .Values.daemonSet.env.FILTER | quote }}
            - name: TAINT_KEY
              value: {{ .Values.daemonSet.env.TAINT_KEY | quote }}
            - name: LOG_LEVEL
              value: {{ .Values.daemonSet.env.LOG_LEVEL | quote }}
            - name: LOG_JSON
2 changes: 2 additions & 0 deletions chart/values.yaml
@@ -25,6 +25,7 @@ serviceAccount:
# Role-Based Access Control (RBAC) configuration.
rbac:
  create: true
  allowNodesPatchPermission: false

# DaemonSet configuration.
daemonSet:
@@ -35,6 +36,7 @@ daemonSet:
    kubeip: use
  env:
    FILTER: labels.kubeip=reserved;labels.environment=demo
    TAINT_KEY: ""
    LOG_LEVEL: debug
    LOG_JSON: true
  resources:
26 changes: 26 additions & 0 deletions cmd/main.go
@@ -174,6 +174,26 @@ func run(c context.Context, log *logrus.Entry, cfg *config.Config) error {
		return errors.Wrap(err, "assigning static public IP address")
	}

	if cfg.TaintKey != "" {
		logger := log.WithField("taint-key", cfg.TaintKey)
		tainter := nd.NewTainter(clientset)

		didRemoveTaint, err := tainter.RemoveTaintKey(ctx, n, cfg.TaintKey)
		if err != nil {
			logger.Error("removing taint key failed, releasing static public IP address")
			if releaseErr := releaseIP(assigner, n); releaseErr != nil { //nolint:contextcheck
				log.WithError(releaseErr).Error("releasing static public IP address after taint key removal failed")
			}
			return errors.Wrap(err, "removing node taint key")
		}

		if didRemoveTaint {
			logger.Info("taint key removed successfully")
		} else {
			logger.Warning("taint key not present on node, skipped removal")
		}
	}

	// pause the agent to prevent it from exiting immediately after assigning the static public IP address
	// wait for the context to be done: SIGTERM, SIGINT
	<-ctx.Done()
@@ -303,6 +323,12 @@ func main() {
				Category: "Configuration",
				Value: true,
			},
			&cli.StringFlag{
				Name: "taint-key",
				Usage: "specify a taint key to remove from the node once the static public IP address is assigned",
				EnvVars: []string{"TAINT_KEY"},
				Category: "Configuration",
			},
			&cli.StringFlag{
				Name: "log-level",
				Usage: "set log level (debug, info(*), warning, error, fatal, panic)",
3 changes: 3 additions & 0 deletions internal/config/config.go
@@ -33,6 +33,8 @@ type Config struct {
	LeaseDuration int `json:"lease-duration"`
	// LeaseNamespace is the namespace of the kubernetes lease
	LeaseNamespace string `json:"lease-namespace"`
	// TaintKey is the taint key to remove from the node once the IP address is assigned
	TaintKey string `json:"taint-key"`
}

func NewConfig(c *cli.Context) *Config {
@@ -50,5 +52,6 @@ func NewConfig(c *cli.Context) *Config {
	cfg.ReleaseOnExit = c.Bool("release-on-exit")
	cfg.LeaseDuration = c.Int("lease-duration")
	cfg.LeaseNamespace = c.String("lease-namespace")
	cfg.TaintKey = c.String("taint-key")
	return &cfg
}
73 changes: 73 additions & 0 deletions internal/node/tainter.go
@@ -0,0 +1,73 @@
package node

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/doitintl/kubeip/internal/types"
	"github.com/pkg/errors"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	typesv1 "k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

type Tainter interface {
	RemoveTaintKey(ctx context.Context, node *types.Node, taintKey string) (bool, error)
}

type tainter struct {
	client kubernetes.Interface
}

func deleteTaintsByKey(taints []v1.Taint, taintKey string) ([]v1.Taint, bool) {
	newTaints := []v1.Taint{}
	didDelete := false

	for i := range taints {
		if taintKey == taints[i].Key {
			didDelete = true
			continue
		}
		newTaints = append(newTaints, taints[i])
	}

	return newTaints, didDelete
}

func NewTainter(client kubernetes.Interface) Tainter {
	return &tainter{
		client: client,
	}
}

func (t *tainter) RemoveTaintKey(ctx context.Context, node *types.Node, taintKey string) (bool, error) {
	// get node object from API server
	n, err := t.client.CoreV1().Nodes().Get(ctx, node.Name, metav1.GetOptions{})
	if err != nil {
		return false, errors.Wrap(err, "failed to get kubernetes node")
	}

	// Remove taint from the node representation
	newTaints, didDelete := deleteTaintsByKey(n.Spec.Taints, taintKey)
	if !didDelete {
		return false, nil
	}

	// Marshal the remaining taints of the node into json format for patching.
	// The remaining taints may be empty, and that will result in an empty json array "[]"
	newTaintsMarshaled, err := json.Marshal(newTaints)
	if err != nil {
		return false, errors.Wrap(err, "failed to marshal new taints")
	}

	// Patch the node with only the remaining taints
	patch := fmt.Sprintf(`{"spec":{"taints":%v}}`, string(newTaintsMarshaled))
	_, err = t.client.CoreV1().Nodes().Patch(ctx, node.Name, typesv1.MergePatchType, []byte(patch), metav1.PatchOptions{})
	if err != nil {
		return false, errors.Wrap(err, "failed to patch node taints")
	}

	return true, nil
}
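To make the patch body built in `RemoveTaintKey` concrete, here is a standalone sketch that produces the same kind of JSON merge patch without a cluster. It assumes a trimmed-down `Taint` struct in place of `v1.Taint`, and `buildTaintPatch` is a hypothetical helper that does not exist in the commit. Since a JSON merge patch replaces arrays wholesale, the patch carries the full list of remaining taints, which means a taint added by another actor between the Get and the Patch could be overwritten.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Taint is a hypothetical, trimmed-down stand-in for k8s.io/api/core/v1.Taint.
type Taint struct {
	Key    string `json:"key"`
	Value  string `json:"value,omitempty"`
	Effect string `json:"effect"`
}

// deleteTaintsByKey mirrors the function in tainter.go: it returns the taints
// whose key differs from taintKey, and whether anything was dropped.
func deleteTaintsByKey(taints []Taint, taintKey string) ([]Taint, bool) {
	newTaints := []Taint{}
	didDelete := false
	for i := range taints {
		if taintKey == taints[i].Key {
			didDelete = true
			continue
		}
		newTaints = append(newTaints, taints[i])
	}
	return newTaints, didDelete
}

// buildTaintPatch wraps the remaining taints in a merge patch document.
func buildTaintPatch(remaining []Taint) (string, error) {
	b, err := json.Marshal(remaining)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(`{"spec":{"taints":%s}}`, b), nil
}

func main() {
	taints := []Taint{
		{Key: "kubeip.com/not-ready", Effect: "NoSchedule"},
		{Key: "example.com/dedicated", Value: "db", Effect: "NoSchedule"},
	}
	remaining, removed := deleteTaintsByKey(taints, "kubeip.com/not-ready")
	patch, err := buildTaintPatch(remaining)
	if err != nil {
		panic(err)
	}
	fmt.Println(removed, patch)
}
```

Because `deleteTaintsByKey` starts from an empty, non-nil slice, removing the only taint yields `[]` in the patch instead of `null`, matching the comment in the committed code.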