[sdk-metrics] Clean up locking in MetricPoint #5368
Conversation
Codecov Report

Attention:

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #5368      +/-   ##
==========================================
- Coverage   83.38%   83.35%   -0.03%
==========================================
  Files         297      277      -20
  Lines       12531    11928     -603
==========================================
- Hits        10449     9943     -506
+ Misses       2082     1985      -97
```

Flags with carried forward coverage won't be shown.
```csharp
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void AcquireLock()
{
    if (Interlocked.CompareExchange(ref this.isCriticalSectionOccupied, 1, 0) != 0)
```
Let's add some code comments to help with intuition for future readers? Something like the following (see the sketch right after this list):

- isCriticalSectionOccupied (0 or 1) indicates whether some thread currently occupies the critical section.
- The CAS does the following atomically:
  - If the original value was 0 (i.e. unoccupied), set it to 1 (i.e. occupied) and return (successfully acquired the lock).
  - If the original value was 1 (i.e. already occupied), don't change it, but spin and keep trying indefinitely until we succeed.
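A minimal sketch of how that commented fast path could look; the comment wording is illustrative, based on the suggestion above:

```csharp
// isCriticalSectionOccupied: 0 means the critical section is unoccupied,
// 1 means some thread currently holds it.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void AcquireLock()
{
    // The CAS does the following atomically:
    //  - If the original value was 0 (unoccupied), set it to 1 (occupied)
    //    and return: the lock was acquired.
    //  - If the original value was 1 (already occupied), leave it unchanged
    //    and fall into the spin path, which retries until it succeeds.
    if (Interlocked.CompareExchange(ref this.isCriticalSectionOccupied, 1, 0) != 0)
    {
        this.AcquireLockRare(); // spin path, sketched later in this thread
    }
}
```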
@CodeBlanch The volatile switch seems unnecessary. The benchmark proves it is faster, but only when the state was 0, and a thread would never release the lock unless the state was 1...
```csharp
{
    if (Interlocked.Exchange(ref this.isCriticalSectionOccupied, 1) != 0)
    {
        this.AcquireLockRare();
```
Would this really be rare? 😄
Define rare 😄
Let me explain why I split this method up. Before, we had this:

```csharp
var sw = default(SpinWait);
while (Interlocked.Exchange(ref isCriticalSectionOccupied, 1) != 0)
{
    sw.SpinOnce();
}
```

C# initializes all locals by default as part of its safety system, so we pay for the initialization of sw even though we might not use it. What the split does is make it optimistic, essentially: if we can get the lock, we're done. Otherwise we jump into a rarer case where we pay for the SpinWait because we know we need it.
For the thread that got the lock, it saved a tiny bit of CPU so it can release the lock even faster. For threads that need to wait, we're going to spin them anyway so it seems OK to me to ask them to do an extra call/jump. Thoughts?
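To make the split concrete, here is a minimal sketch of the rare path, assuming the shape described above; it is a fragment that assumes the surrounding class and the usual System.Threading / System.Runtime.CompilerServices usings, and the NoInlining attribute is an assumption to keep the fast path small:

```csharp
// Assumed slow path: the SpinWait is only constructed after the optimistic
// exchange has already failed, so uncontended acquires never pay for it.
[MethodImpl(MethodImplOptions.NoInlining)]
private void AcquireLockRare()
{
    var sw = default(SpinWait);
    do
    {
        // The caller already failed one exchange, so spin first, then retry.
        sw.SpinOnce();
    }
    while (Interlocked.Exchange(ref this.isCriticalSectionOccupied, 1) != 0);
}
```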
> Define rare 😄

I think having apps where multiple threads update the same MetricPoint would be a common thing.

> For the thread that got the lock, it saved a tiny bit of CPU so it can release the lock even faster.

I don't think splitting the acquire-lock code would speed up the release. In the existing code, the thread that acquires the lock would have already created the SpinWait struct before acquiring the lock, so nothing has changed in what happens after a thread takes the lock. Or would the speedup come from using if vs while?

Having said all that, I'm okay with this change to avoid creating the SpinWait struct for the threads that need to wait.
I think you are misreading the benchmark. These are testing calls to
I don't think this helps reduce memory consumption. For any given MetricPoint, you either initialize
Does it really make locking cheap? Creating a simple default SpinWait struct shouldn't cause any measurable impact.
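For what it's worth, here is a hypothetical BenchmarkDotNet sketch of exactly the question being debated (whether eagerly constructing the SpinWait costs anything on the uncontended path); all names are illustrative, not from the PR:

```csharp
using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class UncontendedLockBenchmarks
{
    private int occupied;

    [Benchmark(Baseline = true)]
    public void EagerSpinWait()
    {
        // Old shape: the SpinWait local is zero-initialized even when the
        // lock is acquired on the first exchange.
        var sw = default(SpinWait);
        while (Interlocked.Exchange(ref this.occupied, 1) != 0)
        {
            sw.SpinOnce();
        }

        Volatile.Write(ref this.occupied, 0);
    }

    [Benchmark]
    public void OptimisticAcquire()
    {
        // New shape: only the contended case pays for the SpinWait.
        if (Interlocked.Exchange(ref this.occupied, 1) != 0)
        {
            this.AcquireRare();
        }

        Volatile.Write(ref this.occupied, 0);
    }

    private void AcquireRare()
    {
        var sw = default(SpinWait);
        do
        {
            sw.SpinOnce();
        }
        while (Interlocked.Exchange(ref this.occupied, 1) != 0);
    }

    public static void Main(string[] args) => BenchmarkRunner.Run<UncontendedLockBenchmarks>();
}
```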
This PR removes the locking field from the histogram types and relies on the one on MetricPointOptionalComponents, since MetricPointOptionalComponents already has that field and will always exist for any type of histogram.
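A hypothetical sketch of the layout that describes; the field names besides isCriticalSectionOccupied are assumptions for illustration only:

```csharp
internal sealed class MetricPointOptionalComponents
{
    // Assumed fields: exactly one of these is initialized, depending on
    // which histogram type the MetricPoint uses.
    public object? HistogramBuckets;
    public object? Base2ExponentialBucketHistogram;

    // The single shared lock flag (0 = free, 1 = held) that both histogram
    // types rely on after this PR, replacing per-histogram flags.
    private int isCriticalSectionOccupied;

    // AcquireLock()/AcquireLockRare()/ReleaseLock() as sketched elsewhere
    // in this thread.
}
```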
If you are going to end up with either
My bad! I missed the already existing
I think it is good but this always comes up 😄 From the docs: https://learn.microsoft.com/dotnet/api/system.threading.volatile

That is really what we want.
@CodeBlanch There are a few other things mentioned in the same doc:

Since we are using a volatile write, later memory operations on the thread could be reordered before the volatile write. We have to review our code to see if this reordering could affect the MetricPoint updates or snapshot. The reordering may or may not be an issue for us, but we need to carefully review that first. A concrete example of how the reorderings may or may not affect us would probably help.

This sounds problematic. In our case, this would be equivalent to having a thread release the lock but another thread on a different processor not finding the lock released. The doc doesn't clearly mention whether this would only be a temporary issue and whether another thread on a different processor would eventually see it updated.
I put… Ran it by @stephentoub. If I understand him correctly…

PS: Another interesting tidbit Stephen shared with me is that the movement isn't a concern on x86/64 (due to its strong memory model) but could be a concern on ARM.
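For context, a minimal sketch of the release side being discussed, assuming a plain volatile write (the method shape is an assumption):

```csharp
public void ReleaseLock()
{
    // Release semantics: no read/write that occurs before this call can be
    // reordered after it, so every mutation made inside the critical section
    // becomes visible no later than the lock appearing free. Operations that
    // occur after this call on the same thread may still move before it,
    // which is the reordering discussed above.
    Volatile.Write(ref this.isCriticalSectionOccupied, 0);
}
```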
LGTM.
@stephentoub The doc doesn't explicitly mention whether this updated value will eventually (in subsequent reads) be made available to another processor. Is it fair to assume that the unavailability of the updated value would only be a transient issue?

Yes. And there's not really any difference in that regard between a volatile or interlocked write.

Contradictory how?

The content in the "Note" seems to say you won't see a cached value on some other processor.

The "is read or written to memory and not cached" is from the point of view of the thread/core doing the reading/writing.
@stephentoub A few follow-up questions:

Interlocked methods are supposed to perform atomic operations, right? So they shouldn't lead to any intermediate states where one thread updates the value of a variable but another thread (which might or might not be on the same processor) doesn't see the updated value?

The note seems contradictory because we have an assumption that if an update is written to memory (instead of just the processor register), then it should be discoverable/available to every other thread. Is that assumption wrong?
If both cores are using interlocked/atomic operations to mutate the same shared value, then they need to be coordinating, yes. That doesn't say anything, though, about one thread manipulating a memory location with an interlocked operation and another thread just reading that value... there's no guarantee that that other thread/core will see that value any sooner with it having been written with some atomic interlocked exchange than having been written with a volatile write. Imagine that other thread is just spinning in a

And what if that other thread is reading the value from a register?
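A small illustrative sketch of that register concern (hypothetical type, field, and method names):

```csharp
using System.Threading;

public class FlagWaiter
{
    private int flag;

    // A writer signals by setting the flag.
    public void Signal() => Volatile.Write(ref this.flag, 1);

    // Risky: without volatile semantics the JIT may hoist the load out of
    // the loop, leaving this thread spinning on a stale value held in a
    // register and never observing the writer's update.
    public void WaitPlain()
    {
        while (this.flag == 0)
        {
        }
    }

    // Volatile.Read forces a fresh load on every iteration, so the writer's
    // update is eventually observed.
    public void WaitVolatile()
    {
        while (Volatile.Read(ref this.flag) == 0)
        {
        }
    }
}
```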
Okay, so it looks like both the questions that I asked are kind of pointing to the same scenario: let's say thread A writes to a variable using Interlocked or Volatile methods. This write is written to memory. Now, if we have another thread B (which may or may not be on the same processor that made the write) that is trying to read the data, it may or may not discover the latest update depending on how it reads the data:

I tried to summarize my understanding of this issue. Could you please confirm if my understanding of points 1 and 2 is correct? Also, could you answer what should be expected for point 3?
In general, there aren't guarantees being made about how quickly data becomes visible to other threads. The guarantees that are made are around the semantics of the operations. Volatile.Read/Write make guarantees about the order in which operations will be visible to other threads; they don't promise that the results of operations will be visible quickly or slowly, but that when they are visible, they'll be visible in the guaranteed order. Let's say you have this code using Volatile:

```csharp
using System.Threading;

public class C
{
    private int _value;

    public int M1() => _value;

    public int M2() => Volatile.Read(ref _value);
}
```

Here's what SharpLab produces, on x64:
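```
; Reconstructed from a SharpLab run of the snippet above; representative
; output, exact field offsets may differ.
C.M1()
    L0000: mov eax, [rcx+8]
    L0003: ret

C.M2()
    L0000: mov eax, [rcx+8]
    L0003: ret
```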
Note that they're identical. That's because the x64 memory model already prohibits the kinds of reorderings that volatile also prevents, so the JIT doesn't need to emit anything specific for volatile. And as such, there can't be any additional guarantees here about the speed at which data becomes visible to other threads, because if there were, such guarantees would be needed for all writes, or else these wouldn't be identical.

Interlocked.Exchange et al similarly don't make any guarantees about how quickly or slowly data will be visible, but they do make atomicity guarantees around the semantics of the operations (e.g. a new value will be written and the old one returned, atomically), and those guarantees mean synchronization is in play that necessitates certain behavior when interacting with operations being performed on other threads/cores, i.e. you might not see the updated value on another thread unless you perform an operation there where the semantics demand it and force the synchronization for that specific memory location / cache line. So I'm struggling to answer the questions that are in terms of timing / immediacy.
@stephentoub Thanks a lot for that explanation!

What would these operations be whose semantics demand seeing the updated value (by forcing the synchronization for the memory location/cache line)? Any of these operations: the lock statement, Interlocked methods, or a Volatile read?
Changes

- Use the isCriticalSectionOccupied flag on MetricPointOptionalComponents for both types of histograms instead of maintaining two additional flags (a little less memory consumed).
- Only construct the SpinWait once a thread actually has to wait for the lock.

Merge requirement checklist