Roadmap
Extend the fluent forward exporter to support our TC -> LO use case.
We need a way to customize the resources the controller creates. The preferred approach is to reuse the typeoverride solution that we already have for SyslogNG.
Currently only one collector can manage a tenant, which we enforce through the tenant status. However, we want to allow multiple different external or internal sources to implement the same tenancy rules. The idea is to dedicate the current Controller resource to the Kubernetes log collection use case and introduce separate CRDs for use cases such as receiving telemetry from external sources (where we process not just logs but metrics and traces as well). Even for the Kubernetes collector there is a use case where the current one-to-many relationship is too limited, because we would need multiple connectors to implement the global tenant configuration (the use case is multiple isolated node groups with a single global infra tenant).
The problem we are facing is that the current way of including pod labels (and other resource attributes) is suboptimal in certain cases:
Resource SchemaURL:
Resource attributes:
-> k8s.container.name: Str(log-generator)
-> k8s.namespace.name: Str(tenant-demo-2)
-> k8s.pod.name: Str(log-generator-7ff5bb5c6f-624pp)
-> k8s.container.restart_count: Str(0)
-> k8s.pod.uid: Str(a39573bd-f899-491f-a37a-3e8e98c5b003)
-> k8s.pod.labels.app.kubernetes.io/instance: Str(log-generator)
-> k8s.pod.labels.app.kubernetes.io/name: Str(log-generator)
-> k8s.pod.start_time: Str(2024-03-21T10:38:02Z)
-> k8s.node.name: Str(loki)
-> k8s.pod.labels.pod-template-hash: Str(7ff5bb5c6f)
-> k8s.deployment.name: Str(log-generator)
-> loki.resource.labels: Str(k8s.pod.name, k8s.namespace.name)
- pending upstream fix:
Qs:
- buffer metrics?
- PVCs (or any alternative) with daemonsets: https://kubernetes.io/docs/concepts/storage/volumes/#local
When we deal with lots of outputs, one slow output can fill up the queues. If the queues are bounded, there will be backpressure, and under backpressure the source will stop. The idea here is to use separate receivers per tenant, but this needs to be verified.
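The effect described above can be illustrated with a small tick-based simulation. This is only a sketch under simplifying assumptions, not a model of the actual collector: the function name `simulate`, the queue size, and the drain rates are all hypothetical. A receiver shared by every output stops reading as soon as any output queue is full, while a dedicated receiver per output only stops for its own queue.

```python
from collections import deque


def simulate(ticks, drain_every, queue_size, shared_receiver):
    """Simulate bounded output queues fed by one shared or several dedicated receivers.

    drain_every maps an output name to N, meaning that output sends one
    record every N ticks (hypothetical rates, for illustration only).
    Returns the number of records each output delivered.
    """
    queues = {name: deque() for name in drain_every}
    delivered = {name: 0 for name in drain_every}
    for tick in range(1, ticks + 1):
        # Each output drains at most one record when its turn comes.
        for name, every in drain_every.items():
            if tick % every == 0 and queues[name]:
                queues[name].popleft()
                delivered[name] += 1
        # Determine which queues are full before the receivers read.
        full = {name for name, q in queues.items() if len(q) >= queue_size}
        for name, q in queues.items():
            # A shared receiver is blocked by any full queue;
            # a dedicated receiver is blocked only by its own queue.
            blocked = bool(full) if shared_receiver else name in full
            if not blocked:
                q.append(tick)
    return delivered


rates = {"fast": 1, "slow": 10}
shared = simulate(100, rates, queue_size=5, shared_receiver=True)
isolated = simulate(100, rates, queue_size=5, shared_receiver=False)
print("shared receiver:   ", shared)
print("dedicated receivers:", isolated)
```

In this toy model the slow output delivers the same amount either way, but the fast output is throttled to roughly the slow output's rate when the receiver is shared. Whether separate receivers behave this way in the real collector still needs to be measured.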
Look at how hot reload could improve the configuration update flow.
We want a PoC first through a discovery session. Metrics and traces will most probably require separate pipelines.
Currently we use OTTL to demonstrate the capabilities of the subscription filter, but we want to avoid that in the long run for security and operational maintainability reasons.
A tangible example: instead of writing OTTL, the user should provide filter expressions based on, for example, Kubernetes labels, which should be validated through the API, a webhook, or the controller itself.
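A minimal sketch of what such label-based filtering could look like, assuming the filter reuses the shape of a Kubernetes label selector (`matchLabels` and `matchExpressions` with the `In`, `NotIn`, `Exists` and `DoesNotExist` operators). The function names `validate_selector` and `matches` are hypothetical and do not refer to an existing API of the controller.

```python
VALID_OPERATORS = {"In", "NotIn", "Exists", "DoesNotExist"}


def validate_selector(selector):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for key, value in selector.get("matchLabels", {}).items():
        if not isinstance(key, str) or not isinstance(value, str):
            errors.append(f"matchLabels entry {key!r} must map a string to a string")
    for index, expr in enumerate(selector.get("matchExpressions", [])):
        operator = expr.get("operator")
        values = expr.get("values", [])
        if not expr.get("key"):
            errors.append(f"matchExpressions[{index}]: key is required")
        if operator not in VALID_OPERATORS:
            errors.append(f"matchExpressions[{index}]: unknown operator {operator!r}")
        elif operator in ("In", "NotIn") and not values:
            errors.append(f"matchExpressions[{index}]: {operator} requires values")
        elif operator in ("Exists", "DoesNotExist") and values:
            errors.append(f"matchExpressions[{index}]: {operator} forbids values")
    return errors


def matches(selector, labels):
    """Evaluate an already validated selector against a pod's labels."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, operator = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if operator == "In" and labels.get(key) not in values:
            return False
        if operator == "NotIn" and labels.get(key) in values:
            return False
        if operator == "Exists" and key not in labels:
            return False
        if operator == "DoesNotExist" and key in labels:
            return False
    return True


pod_labels = {
    "app.kubernetes.io/name": "log-generator",
    "pod-template-hash": "7ff5bb5c6f",
}
selector = {"matchLabels": {"app.kubernetes.io/name": "log-generator"}}
print(validate_selector(selector), matches(selector, pod_labels))
```

Because the selector is plain data with a closed set of operators, it can be checked declaratively at admission time, which is harder to do for arbitrary OTTL statements.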
We lack a complete solution for collecting byte metrics, although we already plan to use the count connector. Another approach, which doesn't involve duplicating logs, is implemented in Bindplane: https://github.com/observIQ/bindplane-agent/tree/release/v1.43.0/processor/metricextractprocessor
We have to keep considering both approaches until we can have a good measurement.
Qs
- understand what OpAMP provides in terms of metrics
- host logs
- file based logs through a managed sidecar container
- logs sent to a network/otlp endpoint directly
- kubernetes event log
- metrics and traces
Currently the receiver configuration is tuned to support containerd only.
Short term: add a note in the docs that it only works with containerd for now. Idea to investigate: set up a fallback parser to support both.
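The fallback idea could be sketched as follows. This assumes that "both" refers to the CRI log format written by containerd (`<timestamp> <stream> <P|F> <message>`) and the Docker `json-file` format (`{"log": ..., "stream": ..., "time": ...}`); the roadmap itself does not name the second runtime. The function name `parse_container_log` is hypothetical, and in the collector this logic would live in the receiver's operator configuration rather than in Python.

```python
import json
import re

# CRI format: RFC 3339 timestamp, stream name, partial/full flag, message.
CRI_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2}T\S+) (?P<stream>stdout|stderr) (?P<flag>[PF]) (?P<body>.*)$"
)


def parse_container_log(line):
    """Try the CRI format first, then fall back to Docker json-file.

    Returns a dict with format, time, stream and body,
    or None when neither parser recognizes the line.
    """
    match = CRI_LINE.match(line)
    if match:
        return {
            "format": "cri",
            "time": match.group("time"),
            "stream": match.group("stream"),
            "body": match.group("body"),
        }
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None
    if isinstance(entry, dict) and "log" in entry:
        return {
            "format": "docker",
            "time": entry.get("time"),
            "stream": entry.get("stream"),
            # json-file entries keep the trailing newline of the original line.
            "body": entry["log"].rstrip("\n"),
        }
    return None


print(parse_container_log("2024-03-21T10:38:02.123456789Z stdout F hello world"))
```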
We could possibly optimize for the case where subscriptions overlap heavily in their label selectors and might therefore send the same data to the same destination multiple times. Instead of using a routerconnector for subscriptions, we could use a single pipeline that adds all the subscriptions as subsequent processors, and then use a routerconnector to route the messages, already labeled with the subscription ID, to the right output.
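A rough sketch of the label-then-route idea, using plain Python data instead of collector pipelines. The record and subscription shapes, as well as the names `label_records` and `route`, are hypothetical. The first step tags every record with the IDs of all matching subscriptions in a single pass; the second step delivers each record at most once per distinct output, even when several subscriptions point to the same destination.

```python
def label_records(records, subscriptions):
    """Single pass: attach the IDs of all matching subscriptions to each record."""
    labeled = []
    for record in records:
        ids = [
            sub["id"]
            for sub in subscriptions
            if all(record["labels"].get(k) == v for k, v in sub["match"].items())
        ]
        labeled.append({**record, "subscription_ids": ids})
    return labeled


def route(labeled, subscriptions):
    """Router step: send each record once per distinct output."""
    outputs_by_id = {sub["id"]: sub["outputs"] for sub in subscriptions}
    delivered = {}
    for record in labeled:
        seen = set()
        for sub_id in record["subscription_ids"]:
            for output in outputs_by_id[sub_id]:
                if output not in seen:
                    seen.add(output)
                    delivered.setdefault(output, []).append(record["body"])
    return delivered


subscriptions = [
    {"id": "sub-a", "match": {"app": "web"}, "outputs": ["loki"]},
    {"id": "sub-b", "match": {"team": "x"}, "outputs": ["loki", "otlp"]},
]
records = [
    {"body": "r1", "labels": {"app": "web", "team": "x"}},
    {"body": "r2", "labels": {"app": "db"}},
]
print(route(label_records(records, subscriptions), subscriptions))
```

Here `r1` matches both subscriptions but reaches the shared `loki` output only once, which is the duplication this optimization is meant to avoid.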
Go with the simplest possible solution.
Existing alternatives currently (and possible improvement ideas)
- silly config check is available by default
- there is an option in the collector for syntax check, not implemented for the operator
- implementing a full config check by running an isolated job (probably not needed for our scenario, more for an aggregator where custom configs are applied by the user)
See the following issue: https://github.com/open-telemetry/opentelemetry-collector/issues/4205
- Output secret management
- OTTL elimination from Subscription API
- Output API revamp (OTLP/Loki/Fluent)
- Shared output
- Variable size collector to support different node sizes -> daemonset does not support it
- Multiple daemonsets for a single collector vs. multiple collectors