
[sdk-metrics] Clean up locking in MetricPoint #5368

Merged
CodeBlanch merged 7 commits into open-telemetry:main from sdk-metricpoint-lock-fixes on Feb 20, 2024

Conversation

@CodeBlanch (Member) commented Feb 16, 2024

Changes

  • Use the isCriticalSectionOccupied field on MetricPointOptionalComponents for both types of histograms instead of maintaining two additional flags (a little less memory consumed; see the sketch below).
  • Try to obtain the lock cheaply first, before initializing SpinWait.
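
As a rough sketch of the first bullet (simplified names and layout, not the actual SDK code):

    internal sealed class MetricPointOptionalComponents
    {
        // One lock flag shared by both histogram variants, instead of a
        // separate flag on HistogramBuckets and on
        // Base2ExponentialBucketHistogram. Only one of the two histogram
        // references below is ever non-null for a given MetricPoint.
        internal int IsCriticalSectionOccupied;

        internal HistogramBuckets? HistogramBuckets;
        internal Base2ExponentialBucketHistogram? Base2ExponentialBucketHistogram;
    }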

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)

@CodeBlanch added the metrics (Metrics signal related) label Feb 16, 2024
@CodeBlanch CodeBlanch requested a review from a team February 16, 2024 20:13

codecov bot commented Feb 16, 2024

Codecov Report

Attention: 31 lines in your changes are missing coverage. Please review.

Comparison is base (6250307) 83.38% compared to head (2530c2a) 83.35%.
Report is 90 commits behind head on main.


@@            Coverage Diff             @@
##             main    #5368      +/-   ##
==========================================
- Coverage   83.38%   83.35%   -0.03%     
==========================================
  Files         297      277      -20     
  Lines       12531    11928     -603     
==========================================
- Hits        10449     9943     -506     
+ Misses       2082     1985      -97     
Flag                             Coverage Δ
unittests                        ?
unittests-Solution-Experimental  83.33% <61.72%> (?)
unittests-Solution-Stable        83.02% <61.72%> (?)

Flags with carried forward coverage won't be shown.

Files                                                   Coverage Δ
...lemetry/Metrics/Base2ExponentialBucketHistogram.cs   100.00% <ø> (ø)
src/OpenTelemetry/Metrics/HistogramBuckets.cs           100.00% <ø> (ø)
...Telemetry/Metrics/MetricPointOptionalComponents.cs   88.88% <100.00%> (+11.11%) ⬆️
src/OpenTelemetry/Metrics/MetricPoint.cs                68.07% <56.94%> (-0.40%) ⬇️

... and 48 files with indirect coverage changes

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void AcquireLock()
    {
        if (Interlocked.CompareExchange(ref this.isCriticalSectionOccupied, 1, 0) != 0)
Member commented:

Let's add some code comments to help with intuition/future readers? Something like:

isCriticalSectionOccupied = 0 or 1 indicates whether some thread currently occupies the critical section.
The CAS is to do the following atomically:

If the original value was 0 (i.e. unoccupied), set it to 1 (i.e. occupied) and return (lock successfully acquired).
If the original value was 1 (i.e. already occupied), don't change it, but spin and keep trying indefinitely until we succeed.
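
A sketch of how those comments might read in place (assuming the slow path is the AcquireLockRare helper that appears later in this thread):

    // isCriticalSectionOccupied: 0 = unoccupied, 1 = occupied by some thread.
    // The CAS performs the following atomically:
    //  - If the value was 0 (unoccupied), set it to 1 (occupied); the lock
    //    was acquired and we return immediately.
    //  - If the value was 1 (already occupied), leave it unchanged and take
    //    the slow path, which spins until the lock is acquired.
    if (Interlocked.CompareExchange(ref this.isCriticalSectionOccupied, 1, 0) != 0)
    {
        this.AcquireLockRare();
    }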

@cijothomas (Member) commented:

@CodeBlanch The volatile switch seems unnecessary. The benchmark proves it is faster iff the state was 0. But a thread would never release the lock unless the state was 1...

    {
        if (Interlocked.Exchange(ref this.isCriticalSectionOccupied, 1) != 0)
        {
            this.AcquireLockRare();
Contributor commented:

Would this really be rare? 😄

CodeBlanch (Member, Author) replied:

Define rare 😄

Let me explain why I split this method up. Before we had this:

        var sw = default(SpinWait);
        while (Interlocked.Exchange(ref isCriticalSectionOccupied, 1) != 0)
        {
            sw.SpinOnce();
        }

C# initializes all locals by default as part of its safety system, so we pay for the initialization of sw even though we might not use it. The split essentially makes acquisition optimistic: if we can get the lock immediately, we're done. Otherwise we jump into the rarer case where we pay for the SpinWait because we know we need it.

For the thread that got the lock, it saved a tiny bit of CPU so it can release the lock even faster. For threads that need to wait, we're going to spin them anyway so it seems OK to me to ask them to do an extra call/jump. Thoughts?
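
For reference, a sketch of the split version pieced together from the snippets in this thread (the NoInlining attribute and the exact body of AcquireLockRare are assumptions):

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void AcquireLock()
    {
        // Optimistic fast path: no SpinWait is initialized when the lock is free.
        if (Interlocked.Exchange(ref this.isCriticalSectionOccupied, 1) != 0)
        {
            this.AcquireLockRare();
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void AcquireLockRare()
    {
        // Slow path: only contended threads pay for the SpinWait.
        var sw = default(SpinWait);
        do
        {
            sw.SpinOnce();
        }
        while (Interlocked.Exchange(ref this.isCriticalSectionOccupied, 1) != 0);
    }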

Contributor replied:

Define rare 😄

I think having apps where multiple threads update the same MetricPoint would be a common thing.

For the thread that got the lock, it saved a tiny bit of CPU so it can release the lock even faster.

I don't think splitting the acquire-lock code would speed up the release. In the existing code, the thread that acquires the lock would have already created the SpinWait struct before acquiring the lock, so nothing changes in what happens after a thread takes the lock. Or would the speed-up come from using if vs. while?
 
Having said all that, I'm okay with this change to avoid creating SpinWait struct for the threads that need to wait.

@CodeBlanch (Member, Author) commented:

@cijothomas

The volatile switch seems unnecessary. The benchmark proves it is faster iff the state was 0. But a thread would never release the lock unless the state was 1...

I think you are misreading the benchmark.

These are testing calls to Interlocked.Exchange and then either Volatile.Write or another Interlocked.Exchange to release the lock:

Method            LockState  Mean      Error      StdDev
ExchangeVolatile  0          3.950 ns  0.0382 ns  0.0339 ns
Exchange          0          7.181 ns  0.0737 ns  0.0576 ns

These are testing calls to Interlocked.Exchange and then no other calls are made because the lock was already taken:

Method            LockState  Mean      Error      StdDev
ExchangeVolatile  1          3.739 ns  0.0541 ns  0.0506 ns
Exchange          1          3.708 ns  0.0701 ns  0.0547 ns

@utpilla (Contributor) commented Feb 16, 2024

  • Use the isCriticalSectionOccupied field on MetricPointOptionalComponents for both types of histograms instead of maintaining two additional flags (a little less memory consumed).

I don't think this reduces memory consumption. For any given MetricPoint, you either initialize HistogramBuckets or Base2ExponentialBucketHistogram. We should never initialize both.

  • Try to obtain the lock cheaply first before initializing SpinWait.

Does it really make locking cheaper? Creating a simple default SpinWait struct shouldn't cause any measurable impact.

  • Switch ReleaseLock from Interlocked.Exchange to Volatile.Write.

Volatile.Write comes with weaker guarantees than Interlocked.Exchange. Are you sure that our logic would not be affected by these weaker guarantees?

@cijothomas (Member) commented:

Use the isCriticalSectionOccupied field on MetricPointOptionalComponents for both types of histograms instead of maintaining two additional flags (a little less memory consumed).

I don't think this reduces memory consumption. For any given MetricPoint, you either initialize HistogramBuckets or Base2ExponentialBucketHistogram. We should never initialize both.

This PR removes the locking field from the histogram types and relies on the one on MetricPointOptionalComponents, since MetricPointOptionalComponents already has that field and always exists for any type of histogram.

@CodeBlanch (Member, Author) commented:

I don't think this reduces memory consumption. For any given MetricPoint, you either initialize HistogramBuckets or Base2ExponentialBucketHistogram. We should never initialize both.

If you are going to end up with either HistogramBuckets or Base2ExponentialBucketHistogram, you already have a MetricPointOptionalComponents (which already has an isCriticalSectionOccupied). So the two objects you get are either MetricPointOptionalComponents + HistogramBuckets or MetricPointOptionalComponents + Base2ExponentialBucketHistogram. For histograms we'll save 4 bytes × the number of metric points.

@utpilla (Contributor) commented Feb 16, 2024

This PR removes the locking field from the histogram types and relies on the one on MetricPointOptionalComponents, since MetricPointOptionalComponents already has that field and always exists for any type of histogram.

If you are going to end up with either HistogramBuckets or Base2ExponentialBucketHistogram, you already have a MetricPointOptionalComponents (which already has an isCriticalSectionOccupied). So the two objects you get are either MetricPointOptionalComponents + HistogramBuckets or MetricPointOptionalComponents + Base2ExponentialBucketHistogram. For histograms we'll save 4 bytes × the number of metric points.

My bad! I missed the already existing IsCriticalSectionOccupied field on the MetricPointOptionalComponents class.

@CodeBlanch (Member, Author) commented:

@utpilla

Volatile.Write comes with weaker guarantees than Interlocked.Exchange. Are you sure that our logic would not be affected by these weaker guarantees?

I think it is good but this always comes up 😄

From the docs: https://learn.microsoft.com/dotnet/api/system.threading.volatile

Volatile reads and writes ensure that a value is read or written to memory and not cached (for example, in a processor register). Thus, you can use these operations to synchronize access to a field that can be updated by another thread or by hardware.

That is really what we want.

Interlocked.Exchange gives you a read + write as one atomic operation. But for releasing the lock we don't need the read part, only the write. I think that's why it is faster.
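
In code, the two release variants being compared (a sketch; the enclosing ReleaseLock method is elided):

    // Release as it was: an atomic read + write (the old value is read back
    // and then discarded).
    Interlocked.Exchange(ref this.isCriticalSectionOccupied, 0);

    // Release as proposed here: just the write, with volatile semantics.
    Volatile.Write(ref this.isCriticalSectionOccupied, 0);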

@utpilla (Contributor) commented Feb 17, 2024

From the docs: https://learn.microsoft.com/dotnet/api/system.threading.volatile

Volatile reads and writes ensure that a value is read or written to memory and not cached (for example, in a processor register). Thus, you can use these operations to synchronize access to a field that can be updated by another thread or by hardware.

That is really what we want.

Interlocked.Exchange gives you a read + write as one atomic operation. But for releasing the lock we don't need the read part, only the write. I think that's why it is faster.

@CodeBlanch There are a few other things mentioned in the same doc:

  1. Reordering of memory operations

A volatile write operation prevents earlier memory operations on the thread from being reordered to occur after the volatile write. A volatile read operation prevents later memory operations on the thread from being reordered to occur before the volatile read.

Since we are using a volatile write, later memory operations on the thread could be reordered to occur before the volatile write. We have to review our code to see if this reordering could affect the MetricPoint updates or snapshot. The reordering may or may not be an issue for us, but we need to review that carefully first. A concrete example of how the reorderings may or may not affect us would probably help.

  2. The second example on the doc says:

The volatile write to y does not guarantee that a following volatile read of y on a different processor will see the updated value.

This sounds problematic. In our case, this would be equivalent to having a thread release the lock but another thread on a different processor not finding the lock released. The doc doesn't clearly mention whether this would only be a temporary issue and another thread on a different processor would eventually see it updated.

@CodeBlanch (Member, Author) commented:

@utpilla

I put ReleaseLock back to Interlocked.Exchange.

Ran it by @stephentoub. If I understand him correctly, Volatile.Write guarantees nothing will move out of the lock, so it is pretty safe to switch to Volatile.Write. But it doesn't guarantee other things won't move into the lock. If things move into the lock, it could be held longer. Since the goal here was to reduce the time we hold the lock, I decided to put it back so it is at least deterministic.

PS: Another interesting tidbit Stephen shared with me is the movement isn't a concern on x86/64 (due to its strong memory model) but could be a concern on ARM.
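
A hypothetical sketch of the concern (field names invented for illustration):

    this.runningValue += value;                             // protected update, inside the lock
    Volatile.Write(ref this.isCriticalSectionOccupied, 0);  // release
    this.unrelatedWork++;                                   // logically after the lock

    // A volatile write only prevents *earlier* operations from moving after it.
    // The later increment is still free to be reordered *before* the release,
    // i.e. into the critical section, lengthening the time the lock is held.
    // Interlocked.Exchange is a full barrier, so nothing moves across it in
    // either direction.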

@cijothomas (Member) left a comment:

LGTM.

@utpilla (Contributor) commented Feb 20, 2024

From the docs: https://learn.microsoft.com/dotnet/api/system.threading.volatile

  2. The second example on the doc says:

The volatile write to y does not guarantee that a following volatile read of y on a different processor will see the updated value.

@stephentoub The doc doesn't explicitly mention whether this updated value will eventually (in subsequent reads) be made available to another processor. Is it fair to assume that the unavailability of the updated value would only be a transient issue?

@stephentoub commented:

Is it fair to assume that the unavailability of the updated value would only be a transient issue?

Yes. And there's not really any difference in that regard between a volatile or interlocked write.

@CodeBlanch (Member, Author) commented:

The doc is strange. Doesn't this seem contradictory?

[screenshot of the System.Threading.Volatile docs]

@stephentoub commented:

doesn't this seem contradictory?

Contradictory how?

@CodeBlanch CodeBlanch merged commit 4ae0aaf into open-telemetry:main Feb 20, 2024
37 checks passed
@CodeBlanch CodeBlanch deleted the sdk-metricpoint-lock-fixes branch February 20, 2024 21:45
@CodeBlanch (Member, Author) commented:

Contradictory how?

The content in the "Note" seems to say you won't see a cached value on some other processor.

@stephentoub commented:

Contradictory how?

The content in the "Note" seems to say you won't see a cached value on some other processor.

The "is read or written to memory and not cached" is from the point of view of the thread/core doing the reading/writing.

@utpilla (Contributor) commented Feb 20, 2024

@stephentoub A few follow-up questions:

Is it fair to assume that the unavailability of the updated value would only be a transient issue?

Yes. And there's not really any difference in that regard between a volatile or interlocked write.

Interlocked methods are supposed to perform atomic operations, right? So they shouldn't lead to any intermediate states where one thread updates the value of a variable but another thread (that might or might not be on the same processor) doesn't see the updated value?

The "is read or written to memory and not cached" is from the point of view of the thread/core doing the reading/writing.

The note seems contradictory because we have been assuming that if an update is written to memory (instead of just to a processor register), then it should be discoverable/available to every other thread. Is that assumption wrong?

@stephentoub commented:

So they shouldn't lead to any intermediate states where one thread updates the value of a variable but another thread (that might or might not be on the same processor) doesn't see the updated value?

If both cores are using interlocked/atomic operations to mutate the same shared value, then they need to be coordinating, yes. That doesn't say anything, though, about one thread manipulating a memory location with an interlocked and another thread just reading that value... there's no guarantee that that other thread/core will see that value any sooner with it having been written with some atomic interlocked exchange than having been written with a volatile write. Imagine that other thread is just spinning in a while (*memoryLocation) { }... the value could have been enregistered so it doesn't see any changes, it might still be reading from a cache, the compiler may even have hoisted the read and changed the loop to be while (true).

then it should be discoverable/available to every other thread

And what if that other thread is reading the value from a register?
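
A sketch of the failure mode being described (hypothetical type and field names):

    using System.Threading;

    public class SpinExample
    {
        private int flag = 1; // cleared by another thread

        public void WaitPlainRead()
        {
            // Without a volatile read, the JIT may keep 'flag' in a register or
            // hoist the read out of the loop, effectively turning this into
            // while (true) { } no matter what another thread writes.
            while (this.flag != 0)
            {
            }
        }

        public void WaitVolatileRead()
        {
            // Volatile.Read forces a fresh read of the field on every iteration.
            while (Volatile.Read(ref this.flag) != 0)
            {
            }
        }
    }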

@utpilla (Contributor) commented Feb 21, 2024

@stephentoub

So they shouldn't lead to any intermediate states where one thread updates the value of a variable but another thread (that might or might not be on the same processor) doesn't see the updated value?

If both cores are using interlocked/atomic operations to mutate the same shared value, then they need to be coordinating, yes. That doesn't say anything, though, about one thread manipulating a memory location with an interlocked and another thread just reading that value

then it should be discoverable/available to every other thread

And what if that other thread is reading the value from a register?

Okay so it looks like both the questions that I asked are kind of pointing to the same scenario:

Let's say thread A writes to a variable using Interlocked or Volatile methods, so the write goes to memory. Now, if another thread B (which may or may not be on the same processor that made the write) tries to read the data, it may or may not observe the latest update depending on how it reads:

  1. If it reads the data using an Interlocked operation, then it should immediately see the update made by thread A
  2. If it's a simple read without using any synchronization construct, then there are no guarantees whether thread B would see the update by thread A as it might always read the value from its processor register.
  3. What if it's a volatile read, is it guaranteed that thread B should immediately see the update made by thread A?

I tried to summarize my understanding of this issue. Could you please confirm if my understanding of points 1 and 2 is correct? Also, could you answer what should be expected for point 3?

@stephentoub commented:

In general, there aren't guarantees being made about how quickly data becomes visible to other threads. The guarantees that are made are around the semantics of the operations. Volatile.Read/Write make guarantees about the order in which operations will be visible to other threads; they don't promise that the results of operations will be visible quickly or slowly, but that when they are visible, they'll be visible in the guaranteed order. Let's say you have this code using Volatile:

using System.Threading;

public class C
{
    private int _value;
    
    public int M1() => _value;
    public int M2() => Volatile.Read(ref _value);
}

Here's what SharpLab produces, on x64:

C.M1()
    L0000: mov eax, [rcx+8]
    L0003: ret

C.M2()
    L0000: mov eax, [rcx+8]
    L0003: ret

Note that they're identical. That's because the x64 memory model already prohibits the kinds of reorderings that volatile also prevents, so the JIT doesn't need to emit anything specific for volatile. And as such, there can't be any additional guarantees here about the speed at which data becomes visible to other threads, because if there were, such guarantees would be needed for all writes, or else these wouldn't be identical.

Interlocked.Exchange et al similarly don't make any guarantees about how quickly or slowly data will be visible, but they do make atomicity guarantees around the semantics of the operations (e.g. a new value will be written and the old one returned, atomically), and those guarantees mean synchronization is in play that necessitates certain behavior when interacting with operations being performed on other threads/cores, i.e. you might not see the updated value on another thread unless you perform an operation there where the semantics demand it and force the synchronization for that specific memory location / cache line.

So I'm struggling to answer the questions that are in terms of timing / immediacy.
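
To illustrate the ordering guarantee being described, a classic publish/consume sketch (not code from this PR):

    using System;
    using System.Threading;

    public class Publication
    {
        private int data;
        private int ready;

        public void Publish()
        {
            this.data = 42;                     // ordinary write
            Volatile.Write(ref this.ready, 1);  // the earlier write to 'data'
                                                // cannot be reordered after this
        }

        public void Consume()
        {
            if (Volatile.Read(ref this.ready) == 1) // later reads cannot move
            {                                       // before this read
                // If ready == 1 was observed, data == 42 is guaranteed to be
                // observed too. Nothing is promised about *when* ready becomes
                // visible, only that the writes become visible in this order.
                Console.WriteLine(this.data);
            }
        }
    }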

@utpilla (Contributor) commented Feb 21, 2024

@stephentoub Thanks a lot for that explanation!

you might not see the updated value on another thread unless you perform an operation there where the semantics demand it and force the synchronization for that specific memory location / cache line.

What would these operations be whose semantics demand seeing the updated value (by forcing synchronization for that memory location/cache line)? Would any of these qualify: a lock statement, Interlocked methods, or a volatile read?
