Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTAGENT-254 Add support for enhanced RBAC permissions for otel-agent #1693

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

krlv
Copy link
Contributor

@krlv krlv commented Feb 7, 2025

What this PR does / why we need it:

Add support for additional RBAC permissions required by k8sattributes processor.

RBAC Permission Issues in otel-agent

The OpenTelemetry Collector’s k8sattributes processor requires specific RBAC permissions to connect to the Kubernetes API server in order to enrich telemetry data with Kubernetes metadata. However, the default Service Account created by the Helm chart doesn't have these permissions: node agent (deployed as DaemonSet) retrieves pod information from the Kubelet /pod endpoints. This approach avoids overloading the Kubernetes API server in large clusters, but leads to errors in otel-agent when k8sattributes enabled: (Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User <serviceAccount> cannot list resource "pods" in API group "" at the cluster scope)

Proposed Solution

Create the necessary RBAC ClusterRole and ClusterRoleBinding for the k8sattributes processor. The creation logic will be tied to a new boolean parameter – datadog.otelCollector.rbac.create, which, when set to true, enables the chart to inspect the OTel configuration. If any k8sattributes processor is set to passthrough: false (i.e., it needs to call the Kubernetes API), the Helm chart will generate the required RBAC resources.

Implementation Details

  1. New boolean parameter: datadog.otelCollector.rbac.create (true | false).
  2. Conditional RBAC generation:
    1. If agents.rbac.create: true and datadog.otelCollector.rbac.create: true, the Helm chart checks the OTel Collector configuration for any k8sattributes processors configured with passthrough option disabled (either by omitting it or by explicitly setting it passthrough: false).
    2. If found, the necessary K8s ClusterRole and ClusterRoleBinding are generated and associated with the default agent’s Service Account.
  3. New list parameter: datadog.otelCollector.rbac.rules: []
    1. If agents.rbac.create:true and datadog.otelCollector.rbac.create:true, and datadog.otelCollector.rbac.rules provided, these rules are added to the otel-agent ClusterRole.

Special notes for your reviewer:

Why Cluster Role for additional RBAC permissions?

With just a role binding, the k8sattributes processor cannot query metadata such as labels and annotations from k8s nodes and namespaces which are cluster-scoped objects. This also means that the processor cannot set the value for k8s.cluster.uid attribute if enabled, since the k8s.cluster.uid attribute is set to the uid of the namespace kube-system which is not queryable with namespaced rbac.

Why introduce a separate datadog.otelCollector.rbac parameter vs reusing agents.rbac?
  1. Granular Control: separate parameter allows enabling or disabling additional permissions independently.
  2. Clarity Around OTel Functionality: Tying these specific permissions to the section of the Helm chart that configures Embedded OTel Collector helps customers understand what feature is driving the need for the extra RBAC.

Note: technically, any ClusterRole and ClusterRoleBinding introduced under datadog.otelCollector.rbac will apply to the entire pod housing the Datadog Agent with Embedded OTel Collector. While this may not perfectly map to the container-based naming convention, it provides a convenient and explicit toggle for customers. Existing precedent – datadog.secretBackend.roles parameter: scoped to Secret Backend feature; changes affect the default Service Account, and thus additional permissions available to all containers in the node agent pod.

Checklist

  • Chart Version bumped
  • Documentation has been updated with helm-docs (run: .github/helm-docs.sh)
  • CHANGELOG.md has been updated
  • Variables are documented in the README.md

Copy link
Member

@truthbk truthbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me, I think you have a typo, but that's about it.

@@ -803,6 +803,8 @@ helm install <RELEASE_NAME> \
| datadog.otelCollector.config | string | `nil` | OTel collector configuration |
| datadog.otelCollector.enabled | bool | `false` | Enable the OTel Collector |
| datadog.otelCollector.ports | list | `[{"containerPort":"4317","name":"otel-grpc"},{"containerPort":"4318","name":"otel-http"}]` | Ports that OTel Collector is listening |
| datadog.otelCollector.rbac.create | bool | `true` | If true, create RBAC resources for OTel Collector |
| datadog.otelCollector.rbac.rules | list | `[]` | A set of additional RBAC rules for OTel Collector |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rules or roles?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's rules. A single ClusterRole should be enough for otel-agent. If user have components that requires additional RBAC permissions, they can add these as rules to the ClusterRole via datadog.otelCollector.rbac.rules param; no need to create and bind additional ClusterRole(s).

Some of the users of opentelemetry-helm-charts might be familiar with this approach. It's very similar to what they use for the components that require custom RBAC permissions (components that aren't covered by presets) -- user can explicitly list additional rules that will be added to ClutserRole:
https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/values.yaml#L226-L237

@@ -0,0 +1,40 @@
{{- if and .Values.agents.rbac.create (eq (include "should-enable-otel-agent" .) "true") .Values.datadog.otelCollector.rbac.create -}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: should-enable-otel-agent defined here

@@ -803,6 +803,8 @@ helm install <RELEASE_NAME> \
| datadog.otelCollector.config | string | `nil` | OTel collector configuration |
| datadog.otelCollector.enabled | bool | `false` | Enable the OTel Collector |
| datadog.otelCollector.ports | list | `[{"containerPort":"4317","name":"otel-grpc"},{"containerPort":"4318","name":"otel-http"}]` | Ports that OTel Collector is listening |
| datadog.otelCollector.rbac.create | bool | `true` | If true, create RBAC resources for OTel Collector |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit unclear from name / description that it's for k8sattributes processor. The field that creates the rbac in the otel helm charts is named kubernetesAttributes (although that one also adds the processor to pipelines). I think either changing the name or description to make this clearer would be good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've change description, hope it's clearer now:

  • datadog.otelCollector.rbac.create controls creation of additional ClusterRole for otel-agent required by Kubernetes Attributes processor.
  • datadog.otelCollector.rbac.rules can be used to support additional RBAC permissions required by OTel components that are not included by default with otel-agent.

I prefer not to tie the params specifically to k8sattributes processor for 2 reasons:

  1. As you mentioned, we're not replicating the preset behavior from OTel Helm chart (we don't automatically add processors to pipelines, we only provide the necessary RBAC permissions if the customer decides to use k8sattributes processor)
  2. Customers can re-use these params for other components we don't currently support. For example, although Datadog's Kubernetes integration already collects K8s events, metrics, and logs out-of-the-box, come customers may chose to continue using k8sobjects receiver. In that case, they will be able to use datadog.otelCollector.rbac.rules to define any additional permissions required to collect collect Kubernetes events.

By keeping the parameter names more general, we can cover a range of potential OTel Collector use cases instead of tying everything to just one processor.

@krlv krlv force-pushed the krlv/OTAGENT-254_k8s_permissions branch from c9ff915 to a9632ef Compare February 10, 2025 08:12
@github-actions github-actions bot added the chart/datadog This issue or pull request is related to the datadog chart label Feb 10, 2025
@krlv krlv force-pushed the krlv/OTAGENT-254_k8s_permissions branch 2 times, most recently from 63669c7 to 8338179 Compare February 10, 2025 08:43
@krlv krlv force-pushed the krlv/OTAGENT-254_k8s_permissions branch from 8338179 to c27e362 Compare February 19, 2025 00:02
@krlv krlv requested a review from clamoriniere February 19, 2025 00:04
@krlv krlv marked this pull request as ready for review February 19, 2025 00:05
@krlv krlv requested a review from a team as a code owner February 19, 2025 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chart/datadog This issue or pull request is related to the datadog chart
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants