
[sdk-metrics] Remove locking around exemplar updates in MetricPoint #5535

Merged

Conversation

@CodeBlanch CodeBlanch commented Apr 12, 2024

Follow-up to #5465

Changes

  • Allow concurrent exemplar updates inside MetricPoint (see the usage sketch below)
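
As a usage-level sketch of the scenario this targets (meter, instrument, and tag names are illustrative; the MeterProvider and exporter wiring is omitted): many threads recording against the same MetricPoint while exemplar sampling is enabled.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

var meter = new Meter("StressMeter");                    // illustrative meter name
var counter = meter.CreateCounter<long>("hot_counter");  // illustrative instrument name

Parallel.For(0, Environment.ProcessorCount, _ =>
{
    for (var i = 0; i < 1_000_000; i++)
    {
        // Every update here lands on the same MetricPoint; with exemplars
        // enabled, the exemplar offer inside that update previously took a
        // lock and now does not.
        counter.Add(1, new KeyValuePair<string, object?>("DimName1", "DimValue1"));
    }
});
```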

Benchmarks

Counter

Before

| Method | AggregationTemporality | ExemplarFilterType | Mean | Error | StdDev |
|---|---|---|---|---|---|
| CounterHotPath | Cumulative | AlwaysOff | 10.65 ns | 0.144 ns | 0.128 ns |
| CounterWith1LabelsHotPath | Cumulative | AlwaysOff | 37.06 ns | 0.576 ns | 0.539 ns |
| CounterWith2LabelsHotPath | Cumulative | AlwaysOff | 44.85 ns | 0.647 ns | 0.606 ns |
| CounterWith3LabelsHotPath | Cumulative | AlwaysOff | 62.01 ns | 0.479 ns | 0.448 ns |
| CounterHotPath | Cumulative | AlwaysOn | 22.88 ns | 0.205 ns | 0.182 ns |
| CounterWith1LabelsHotPath | Cumulative | AlwaysOn | 49.37 ns | 0.271 ns | 0.253 ns |
| CounterWith2LabelsHotPath | Cumulative | AlwaysOn | 56.46 ns | 0.391 ns | 0.366 ns |
| CounterWith3LabelsHotPath | Cumulative | AlwaysOn | 73.57 ns | 0.951 ns | 0.889 ns |

After

| Method | AggregationTemporality | ExemplarFilterType | Mean | Error | StdDev |
|---|---|---|---|---|---|
| CounterHotPath | Cumulative | AlwaysOff | 10.87 ns | 0.069 ns | 0.065 ns |
| CounterWith1LabelsHotPath | Cumulative | AlwaysOff | 36.05 ns | 0.481 ns | 0.426 ns |
| CounterWith2LabelsHotPath | Cumulative | AlwaysOff | 44.59 ns | 0.576 ns | 0.539 ns |
| CounterWith3LabelsHotPath | Cumulative | AlwaysOff | 62.53 ns | 1.217 ns | 1.139 ns |
| CounterHotPath | Cumulative | AlwaysOn | 17.44 ns | 0.375 ns | 0.500 ns |
| CounterWith1LabelsHotPath | Cumulative | AlwaysOn | 43.57 ns | 0.381 ns | 0.356 ns |
| CounterWith2LabelsHotPath | Cumulative | AlwaysOn | 51.89 ns | 0.364 ns | 0.323 ns |
| CounterWith3LabelsHotPath | Cumulative | AlwaysOn | 69.13 ns | 0.785 ns | 0.734 ns |

With exemplars disabled, the counter performs essentially as it did before. With exemplars enabled, the counter gets a bit faster because it no longer needs to fuss with a lock acquire/release for most measurements.
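
For illustration, a toy sketch of the shape of the change on the counter hot path (not the actual MetricPoint code, which also handles reservoir sampling and snapshot coordination with Collect): the aggregated value and the exemplar offer are both published with interlocked operations instead of a lock acquire/release per measurement.

```csharp
using System.Threading;

public sealed class ToyCounterPoint
{
    private long runningSum;
    private long offeredExemplarValue;

    public void Update(long number)
    {
        // The aggregated value is updated atomically...
        Interlocked.Add(ref this.runningSum, number);

        // ...and the exemplar offer is a single atomic publish rather than a
        // lock acquire/release wrapped around the whole update.
        Interlocked.Exchange(ref this.offeredExemplarValue, number);
    }

    public (long Sum, long LastOfferedExemplar) Snapshot()
        => (Interlocked.Read(ref this.runningSum), Interlocked.Read(ref this.offeredExemplarValue));
}
```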

Histogram

Before

| Method | BoundCount | ExemplarFilterType | Mean | Error | StdDev |
|---|---|---|---|---|---|
| HistogramHotPath | 10 | AlwaysOff | 36.36 ns | 0.303 ns | 0.283 ns |
| HistogramWith1LabelHotPath | 10 | AlwaysOff | 64.80 ns | 0.930 ns | 0.870 ns |
| HistogramWith3LabelsHotPath | 10 | AlwaysOff | 109.65 ns | 1.654 ns | 1.547 ns |
| HistogramHotPath | 10 | AlwaysOn | 64.74 ns | 1.292 ns | 1.269 ns |
| HistogramWith1LabelHotPath | 10 | AlwaysOn | 95.56 ns | 0.925 ns | 0.820 ns |
| HistogramWith3LabelsHotPath | 10 | AlwaysOn | 154.83 ns | 1.675 ns | 1.566 ns |

After

| Method | BoundCount | ExemplarFilterType | Mean | Error | StdDev |
|---|---|---|---|---|---|
| HistogramHotPath | 10 | AlwaysOff | 37.67 ns | 0.561 ns | 0.497 ns |
| HistogramWith1LabelHotPath | 10 | AlwaysOff | 64.87 ns | 1.034 ns | 0.967 ns |
| HistogramWith3LabelsHotPath | 10 | AlwaysOff | 109.15 ns | 1.058 ns | 0.938 ns |
| HistogramHotPath | 10 | AlwaysOn | 66.24 ns | 0.994 ns | 0.881 ns |
| HistogramWith1LabelHotPath | 10 | AlwaysOn | 94.98 ns | 0.715 ns | 0.669 ns |
| HistogramWith3LabelsHotPath | 10 | AlwaysOn | 154.98 ns | 2.008 ns | 1.878 ns |

With exemplars disabled, the histogram performs essentially as it did before. With exemplars enabled, the histogram is actually slower by one Interlocked.Exchange call. Hoping to improve that in follow-ups, but it doesn't cost a whole lot.
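
To illustrate where that extra cost sits (a toy type, not the SDK's histogram code): the bucket update still needs mutual exclusion, while the exemplar offer moves outside the lock and is published with a single Interlocked.Exchange per measurement.

```csharp
using System;
using System.Threading;

public sealed class ToyHistogramPoint
{
    private readonly object bucketLock = new();
    private readonly long[] bucketCounts = new long[10];
    private long offeredExemplarBits; // latest offered value, stored as raw double bits

    public void Update(double value, int bucketIndex)
    {
        // Bucket counts still require mutual exclusion...
        lock (this.bucketLock)
        {
            this.bucketCounts[bucketIndex]++;
        }

        // ...but the exemplar offer happens outside the lock: one extra
        // interlocked operation per measurement.
        Interlocked.Exchange(ref this.offeredExemplarBits, BitConverter.DoubleToInt64Bits(value));
    }
}
```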

Stress test

Here is where these changes shine. The benchmarks above don't have any concurrency.

Setup

The stress test is run like this: OpenTelemetry.Tests.Stress.Metrics.exe -d 300 -t [Counter|Histogram] -o -e -v

Explanation of parameters:

  • -d: Run the test for 300 seconds (5 minutes).
  • -t: Run the test using either Counters or Histograms.
  • -o: Enable OTLP export every 5000 ms. This is done to measure the impact of allocations and locks introduced during collection.
  • -e: Enable exemplars.
  • -v: Enable a view which filters down the tags. This increases contention by reducing the number of unique MetricPoints being updated (see the configuration sketch after this list).
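
For reference, a rough sketch of the SDK configuration the -o/-e/-v options correspond to (instrument and tag names are illustrative, the OTLP exporter package is assumed to be referenced, and SetExemplarFilter may still require the experimental-API suppression depending on the SDK version; the actual stress-test wiring may differ):

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("TestMeter")
    // -e: offer exemplars for all measurements.
    .SetExemplarFilter(ExemplarFilterType.AlwaysOn)
    // -v: a view that keeps only one tag, funneling updates into fewer unique MetricPoints.
    .AddView("testCounter", new MetricStreamConfiguration { TagKeys = new[] { "DimName1" } })
    // -o: periodic OTLP export every 5000 ms.
    .AddOtlpExporter((exporterOptions, readerOptions) =>
        readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds = 5000)
    .Build();
```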

Results

Before

Counter:

  • Total Running Time (Seconds): 300
  • Total Loops: 3,414,183,059
  • Average Loops/Second: 11,379,283
  • Average CPU Cycles/Loop: 1,067
  • GC Total Allocated Bytes: 8,265,360

Histogram:

  • Total Running Time (Seconds): 300
  • Total Loops: 1,819,087,928
  • Average Loops/Second: 6,059,365
  • Average CPU Cycles/Loop: 1,370
  • GC Total Allocated Bytes: 11,479,464

After

Counter:

  • Total Running Time (Seconds): 300
  • Total Loops: 8,683,537,886
  • Average Loops/Second: 28,925,264
  • Average CPU Cycles/Loop: 1,951
  • GC Total Allocated Bytes: 7,861,152

For counters, throughput (loops/second) has more than doubled (roughly 2.5x).

Histogram:

  • Total Running Time (Seconds): 300
  • Total Loops: 3,100,465,288
  • Average Loops/Second: 10,331,268
  • Average CPU Cycles/Loop: 2,546
  • GC Total Allocated Bytes: 11,493,808

For histograms, throughput improved by roughly 1.7x but is still well behind counters.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)

@CodeBlanch CodeBlanch added pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package metrics Metrics signal related labels Apr 12, 2024
@CodeBlanch CodeBlanch requested a review from a team April 12, 2024 21:17

codecov bot commented Apr 12, 2024

Codecov Report

Attention: Patch coverage is 93.82716% with 5 lines in your changes missing coverage. Please review.

Project coverage is 85.52%. Comparing base (6250307) to head (9ad2f34).
Report is 181 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #5535      +/-   ##
==========================================
+ Coverage   83.38%   85.52%   +2.13%     
==========================================
  Files         297      289       -8     
  Lines       12531    12475      -56     
==========================================
+ Hits        10449    10669     +220     
+ Misses       2082     1806     -276     
| Flag | Coverage Δ |
|---|---|
| unittests | ? |
| unittests-Solution-Experimental | 85.51% <93.82%> (?) |
| unittests-Solution-Stable | 85.50% <93.82%> (?) |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|---|---|
| src/OpenTelemetry/Metrics/AggregatorStore.cs | 85.42% <75.00%> (+5.03%) ⬆️ |
| src/OpenTelemetry/Metrics/MetricPoint.cs | 94.45% <94.80%> (+25.98%) ⬆️ |

... and 76 files with indirect coverage changes

{
    if (number < 0)
    {
        // Negative measurements are not supported by the Base2 exponential
        // histogram; skip the update without marking the point CollectPending.
        this.CompleteUpdateWithoutMeasurement();

Member

bug fix? do we need changelog for this?

Member Author

You think it is worth mentioning? Before, for measurements < 0 against an exponential histogram, we no-op the update but still set MetricPointStatus = MetricPointStatus.CollectPending, which would cause the MetricPoint to be collected during the next export. Not sure if that is actually a bug or just odd 🤣 For cumulative I don't think it would mean anything. For delta, maybe?

/cc @alanwest
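
For context, a minimal sketch of the behavior being discussed (meter, instrument, and view names are illustrative; it assumes the Base2 exponential bucket histogram aggregation is selected via a view, which may require a newer or experimental SDK surface depending on the version in use):

```csharp
using System.Diagnostics.Metrics;
using OpenTelemetry;
using OpenTelemetry.Metrics;

using var meter = new Meter("TestMeter");
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("TestMeter")
    .AddView("latency", new Base2ExponentialBucketHistogramConfiguration())
    .Build();

var histogram = meter.CreateHistogram<double>("latency");

// Unsupported for the exponential aggregation: the update is a no-op. Before
// this change the MetricPoint was still flagged CollectPending, so a
// zero-valued point could show up in the next export.
histogram.Record(-5);

// Normal measurement; recorded as usual.
histogram.Record(10);
```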

Member

Pretty subtle and not very noteworthy. If it were just me, I'd skip the changelog, but if you feel differently, I'll volunteer some verbiage:

Fixed an issue where a zero-valued histogram metric point may be exported after recording a negative measurement. This could only occur when using the Base2ExponentialHistogram aggregation. Negative measurements against an exponential histogram are not supported and should be ignored.

Member

@vishweshbankwar vishweshbankwar Apr 16, 2024

Since it does not have any substantial end-user impact, I guess it's OK to skip it.

I pointed this out as I noticed the change looked unrelated to the PR.

@CodeBlanch CodeBlanch merged commit bf423cc into open-telemetry:main Apr 17, 2024
39 checks passed
@CodeBlanch CodeBlanch deleted the sdk-metrics-no-exemplar-lock branch April 17, 2024 16:55