Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit metrics via otel instead of custom format #4762

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

andreasohlund
Copy link
Member

@andreasohlund andreasohlund commented Jan 31, 2025

This replaces the homegrown metrics in the audit instance with OTEL to allow for easier consumption in visualization tools like Prometheus, Azure Monitor etc

@andreasohlund andreasohlund self-assigned this Jan 31, 2025
@andreasohlund
Copy link
Member Author

@lailabougria can you spot anything off in terms of how we emit the telemetry? naming etc

new DetectSuccessfulRetriesEnricher(),
new SagaRelationshipsEnricher()
}.Concat(auditEnrichers).ToArray();
var enrichers = new IEnrichImportedAuditMessages[] { new MessageTypeEnricher(), new EnrichWithTrackingIds(), new ProcessingStatisticsEnricher(), new DetectNewEndpointsFromAuditImportsEnricher(endpointInstanceMonitoring), new DetectSuccessfulRetriesEnricher(), new SagaRelationshipsEnricher() }.Concat(auditEnrichers).ToArray();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreasohlund update your Rider formatting so that this will not be put on a single line

readonly Meter batchSizeMeter;
readonly Meter batchDurationMeter;
readonly Counter receivedMeter;
readonly Histogram<long> auditBatchSize = AuditMetrics.Meter.CreateHistogram<long>($"{AuditMetrics.Prefix}.batch_size_audits");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this called batch_size_audits when the application is particular.servicecontrol.audit?

Comment on lines +286 to +290
readonly Counter<long> storedAudits = AuditMetrics.Meter.CreateCounter<long>($"{AuditMetrics.Prefix}.stored_audit_messages");
readonly Counter<long> storedSagas = AuditMetrics.Meter.CreateCounter<long>($"{AuditMetrics.Prefix}.stored_saga_audits");
readonly Histogram<double> auditBulkInsertDuration = AuditMetrics.Meter.CreateHistogram<double>($"{AuditMetrics.Prefix}.bulk_insert_duration_audits", unit: "ms");
readonly Histogram<double> sagaAuditBulkInsertDuration = AuditMetrics.Meter.CreateHistogram<double>($"{AuditMetrics.Prefix}.bulk_insert_duration_sagas", unit: "ms");
readonly Histogram<double> auditCommitDuration = AuditMetrics.Meter.CreateHistogram<double>($"{AuditMetrics.Prefix}.commit_duration_audits", unit: "ms");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are better off managing all these metrics in one class with different methods for signaling a counter.

This way such method could also increase multiple counters or one instance counter and a lenght counters.

This also makes it much clearer what actual metrics exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially because here its also not clear what the name will be. I think its better if all names are listed in a single location including there prefix/scope.

{
if (!Uri.TryCreate(settings.OtelMetricsUrl, UriKind.Absolute, out var otelMetricsUri))
{
throw new UriFormatException($"Invalid OtelMetricsUrl: {settings.OtelMetricsUrl}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was returned in a search:

The environment variable OTEL_EXPORTER_OTLP_ENDPOINT exists as a standardized way to specify the OpenTelemetry OTLP endpoint. By default, it points to http://localhost:4317 for gRPC and http://localhost:4318 for HTTP/protobuf.
There are also some related environment variables:

OTEL_EXPORTER_OTLP_PROTOCOL for specifying the protocol (grpc, http/protobuf)
OTEL_EXPORTER_OTLP_HEADERS for additional headers
OTEL_EXPORTER_OTLP_TIMEOUT for request timeouts

Maybe instead we use a more verbose name?

OTEL_EXPORTER_OTLP_ENDPOINT

If not maybe immediately we should ensure that environment variable expansion is active so you could do:

<add key="ServiceControl.Audit/OTEL_EXPORTER_OTLP_ENDPOINT" value="%OTEL_EXPORTER_OTLP_ENDPOINT%" />

or for that matter:

SERVICECONTROL_AUDIT_OTEL_EXPORTER_OTLP_ENDPOINT=%OTEL_EXPORTER_OTLP_ENDPOINT%

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, while testing envvar expansion behavior I stumbled upon the settings reader:

// The 3 app namespaces can be loaded from env vars without the namespace, so settings like TransportType can be
// shared between all instances in a container .env file. Settings in all other namespaces must be fully specified.
if (CanSetWithoutNamespaceList.Contains(settingsNamespace))
{
var nameOnly = regex.Replace(name, "_");
yield return nameOnly.ToUpperInvariant();
yield return nameOnly;
}
}
static readonly HashSet<SettingsRootNamespace> CanSetWithoutNamespaceList =
[
new SettingsRootNamespace("ServiceControl"),
new SettingsRootNamespace("ServiceControl.Audit"),
new SettingsRootNamespace("Monitoring")
];

This means we can't use it with a prefix as that logic would still load the value directly from OTEL_EXPORTER_OTLP_ENDPOINT.

}
builder.Services.AddOpenTelemetry()
.ConfigureResource(b => b.AddService(
serviceName: "Particular.ServiceControl.Audit",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be added to AuditMetrics

Comment on lines +5 to +10
static class AuditMetrics
{
public const string MeterName = "Particular.ServiceControl";
public static readonly Meter Meter = new(MeterName, "0.1.0");
public static readonly string Prefix = "particular.servicecontrol.audit";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should better define its intend:

  • No need for Audit in the name as the namespace already is clear
  • Its for OpenTelemetry
  • These are constants
Suggested change
static class AuditMetrics
{
public const string MeterName = "Particular.ServiceControl";
public static readonly Meter Meter = new(MeterName, "0.1.0");
public static readonly string Prefix = "particular.servicecontrol.audit";
}
static class TelemetryGlobals
{
public const string Source = "Particular.ServiceControl";
public const string Scope = "particular.servicecontrol.audit";
public static readonly Meter Meter = new(MeterName, "0.1.0");
}

@@ -109,6 +109,7 @@ public string RootUrl
public int Port { get; set; }

public bool PrintMetrics => SettingsReader.Read<bool>(SettingsRootNamespace, "PrintMetrics");
public string OtelMetricsUrl { get; set; } = SettingsReader.Read<string>(SettingsRootNamespace, nameof(OtelMetricsUrl));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment on OTEL_EXPORTER_OTLP_ENDPOINT

Suggested change
public string OtelMetricsUrl { get; set; } = SettingsReader.Read<string>(SettingsRootNamespace, nameof(OtelMetricsUrl));
const string OTEL_EXPORTER_OTLP_ENDPOINT = nameof(OTEL_EXPORTER_OTLP_ENDPOINT);
[]
public string OtelExporterOltpEndpointUrl { get; init; } = SettingsReader.Read<string>(SettingsRootNamespace, OTEL_EXPORTER_OTLP_ENDPOINT);

Comment on lines +76 to +78
var auditSw = Stopwatch.StartNew();
await unitOfWork.RecordProcessedMessage(processedMessage, context.Body);
auditBulkInsertDuration.Record(auditSw.ElapsedMilliseconds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of things done in this method. Even multiple measurements.
I think we should consider doing the measures in the unitOfWork methods itself. The measurement is then simply that whole method and isn't making this method even worse then it already is :-)

readonly Histogram<long> auditBatchSize = AuditMetrics.Meter.CreateHistogram<long>($"{AuditMetrics.Prefix}.batch_size_audits");
readonly Histogram<double> auditBatchDuration = AuditMetrics.Meter.CreateHistogram<double>($"{AuditMetrics.Prefix}.batch_duration_audits", unit: "ms");
readonly Histogram<double> auditMessageSize = AuditMetrics.Meter.CreateHistogram<double>($"{AuditMetrics.Prefix}.audit_message_size", unit: "kilobytes");
readonly Counter<long> receivedAudits = AuditMetrics.Meter.CreateCounter<long>($"{AuditMetrics.Prefix}.received_audits");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something wrong with the naming here. THese shouldn't have a prefix like this:

Suggested change
readonly Counter<long> receivedAudits = AuditMetrics.Meter.CreateCounter<long>($"{AuditMetrics.Prefix}.received_audits");
readonly Counter<long> receivedAudits = AuditMetrics.Meter.CreateCounter<long>(
"received_total",
description: "Total number of audit messages received"
);

readonly Meter bulkInsertCommitDurationMeter;
readonly IMessageSession messageSession;
readonly Lazy<IMessageDispatcher> messageDispatcher;
readonly Counter<long> storedAudits = AuditMetrics.Meter.CreateCounter<long>($"{AuditMetrics.Prefix}.stored_audit_messages");
Copy link
Member

@ramonsmits ramonsmits Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, these prefixes shouldn't be used and isn't according to spec. The Meter should define the prefix/scope AFAIK:

var meter = meterFactory.Create("Particular.ServiceControl.Audit");
stored_messages_total = meter.CreateCounter<int>("stored_messages_total");

or

var meter = meterFactory.Create("Particular.ServiceControl");
stored_messages_total = meter.CreateCounter<int>("audit.stored_messages_total");

but as we have seperate instance my preference would be the 1st.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants