Memory barrier generators
Here is my attempt to provide a quasi-complete list in one answer. If I run across any others I will update this answer from time to time.
Mechanisms that are generally agreed upon to cause implicit barriers:

- All `Monitor` class methods, including the C# keyword `lock`.
- All `Interlocked` class methods.
- All `Volatile` class methods (.NET 4.5+).
- Most `SpinLock` methods, including `Enter` and `Exit`.
- `Thread.Join`
- `Thread.VolatileRead` and `Thread.VolatileWrite`
- `Thread.MemoryBarrier`
- The `volatile` keyword.
- Anything that starts a thread or causes a delegate to execute on another thread, including `QueueUserWorkItem`, `Task.Factory.StartNew`, `Thread.Start`, compiler-supplied `BeginInvoke` methods, etc.
- Using a signaling mechanism such as `ManualResetEvent`, `AutoResetEvent`, `CountdownEvent`, `Semaphore`, `Barrier`, etc.
- Using marshaling operations such as `Control.Invoke`, `Dispatcher.Invoke`, `SynchronizationContext.Post`, etc.
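As an illustration of the first item, here is a minimal sketch (the class and member names are my own) of how `lock` alone makes a cross-thread publication safe, because `Monitor.Enter` has acquire semantics and `Monitor.Exit` has release semantics:

```csharp
using System.Threading;

class Publication
{
    private readonly object _gate = new object();
    private int _payload;
    private bool _ready;

    // Writer thread: the release fence at the end of the lock block
    // guarantees _payload is visible before _ready becomes visible.
    public void Publish(int value)
    {
        lock (_gate)
        {
            _payload = value;
            _ready = true;
        }
    }

    // Reader thread: the acquire fence on entering the lock guarantees
    // fresh reads of both fields.
    public bool TryConsume(out int value)
    {
        lock (_gate)
        {
            value = _payload;
            return _ready;
        }
    }
}
```

No explicit `Thread.MemoryBarrier` call is needed; the `Monitor` methods generated for the `lock` block supply the fences.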
Mechanisms that are speculated (but not known for certain) to cause implicit barriers:

- `Thread.Sleep` (proposed by myself and possibly others, due to the fact that code which exhibits a memory barrier problem can be fixed with this method)
- `Thread.Yield`
- `Thread.SpinWait`
- `Lazy<T>`, depending on which `LazyThreadSafetyMode` is specified
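For the `Lazy<T>` case, the barrier behavior plausibly depends on the mode: `ExecutionAndPublication` uses locking internally, so the constructed value is safely published, while `None` makes no such promise. A sketch (the `Config` payload type is a made-up example):

```csharp
using System;
using System.Threading;

class Config { public int Port = 8080; }  // hypothetical payload type

static class LazyExample
{
    // ExecutionAndPublication: the factory runs at most once, under a lock,
    // so the constructed Config is safely published to all threads.
    static readonly Lazy<Config> Shared = new Lazy<Config>(
        () => new Config(),
        LazyThreadSafetyMode.ExecutionAndPublication);

    static void Main()
    {
        // Any thread may touch Shared.Value; initialization is thread-safe.
        Console.WriteLine(Shared.Value.Port);
    }
}
```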
Other notable mentions:

- The default add and remove handlers for events in C#, since they use `lock` or `Interlocked.CompareExchange`.
- x86 stores have release fence semantics.
- Microsoft's implementation of the CLI has release fence semantics on writes, despite the fact that the ECMA specification does not mandate it.
- `MarshalByRefObject` seems to suppress certain optimizations in subclasses, which may make it appear as if an implicit memory barrier were present. Thanks to Hans Passant for discovering this and bringing it to my attention.¹

¹ This explains why `BackgroundWorker` works correctly without having `volatile` on the underlying field for the `CancellationPending` property.
Memory barrier on single core ARM
Why would a context switch on a single core behave differently compared to two threads on different cores (aside from cache coherency issues)?
The threads on separate cores may act at exactly the same time. You still have issues on a single core.
Somewhere here on Stack Overflow it is also stated that memory barriers are not required on single-core processors.
That information may have been taken out of context (or may not provide enough context).
Wikipedia's Memory barrier and Memory ordering pages have sections Out-of-order execution versus compiler reordering optimizations and Compile time/Run time ordering. There are many places in a pipeline where the ordering of memory may matter. In some cases, this may be taken care of by the compiler, by the OS, or by our own code.
Compiler memory barriers apply to a single CPU. They are especially useful with hardware where the ordering and timing of writes and reads matter.
Linux defines some more types of memory barriers:

- Write/store barriers.
- Data dependency barriers.
- Read/load barriers.
- General memory barriers.

These map fairly well to `DMB` (`DSB` and `IMB` are more for code modification).
More advanced ARM CPUs have multiple load/store units. In theory, some non-preemptive thread switch (see Note 1), especially with aliased memory, could cause issues for a multi-threaded single-CPU application. However, it would be fairly hard to construct this case.
For the most part, good memory ordering is handled by the CPU when scheduling instructions. A common case where it does matter on a single CPU is for system-level programmers altering `CP15` registers. For instance, an `ISB` should be issued when turning on the MMU. The same may be true for certain hardware/device registers. Finally, a program loader will need barriers as well as cache operations, even on single-CPU systems.
UnixSmurf wrote these blogs on memory access ordering:
- Intro
- Barriers and the Linux kernel
- Memory access and the ARM architecture
The topic is complex and you have to be specific about the types of barriers you are discussing.
Note 1: I say non-preemptive because, if an interrupt occurs, the single CPU will probably ensure that all outstanding memory requests complete. With a non-preemptive switch, you do something like `longjmp` to change threads. In theory, you could change contexts before all writes had completed. The system would only need a `DMB` in the `yield()` to avoid it.
Is function call an effective memory barrier for modern platforms?
Memory barriers aren't just there to prevent instruction reordering. Even if instructions aren't reordered, you can still run into cache coherence problems. As for the reordering, it depends on your compiler and settings. ICC is particularly aggressive with reordering; MSVC with whole-program optimization can be, too.
If your shared data variable is declared as `volatile`, most compilers will generate a memory barrier around reads and writes of the variable and prevent reordering, even though the spec does not require it. This is not the correct way of using `volatile`, nor what it was meant for.
(If I had any votes left, I'd +1 your question for the narration.)
Thread safe usage of lock helpers (concerning memory barriers)
No, you do not need to do anything special to guarantee that memory barriers are created. Almost any mechanism used to get a method executing on another thread produces a release-fence barrier on the calling thread and an acquire-fence barrier on the worker thread (they may actually be full-fence barriers). So either `QueueUserWorkItem` or `Thread.Start` will automatically insert the necessary barriers. Your code is safe.
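A sketch of why this is safe: anything written before `Thread.Start` is visible to the started thread without extra synchronization, because of the release/acquire pair inserted around thread creation.

```csharp
using System;
using System.Threading;

class StartPublishes
{
    static int _data;  // deliberately not volatile

    static void Main()
    {
        _data = 42;  // this write happens before Thread.Start's release fence

        var worker = new Thread(() =>
        {
            // Acquire fence on the worker side: the thread is guaranteed
            // to observe the value written before Start was called.
            Console.WriteLine(_data);
        });
        worker.Start();
        worker.Join();
    }
}
```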
Also, as a matter of tangential interest, `Thread.Sleep` also generates a memory barrier. This is interesting because some people naively use `Thread.Sleep` to simulate thread interleaving. If that strategy were used to troubleshoot low-lock code, it could very well mask the problem you were trying to find.
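To illustrate the masking effect, a hedged sketch: the loop below can hang under an optimizing JIT because the read of `_stop` may be hoisted out of the loop, yet inserting a `Thread.Sleep` into the loop body can make the hang disappear during testing without actually fixing the race.

```csharp
using System.Threading;

class SleepMasksTheBug
{
    static bool _stop;  // not volatile: the JIT may hoist the read out of the loop

    static void Main()
    {
        var worker = new Thread(() =>
        {
            while (!_stop)
            {
                // Adding Thread.Sleep(1) here often "fixes" the hang, because
                // Sleep acts as a barrier -- but the underlying race remains.
            }
        });
        worker.Start();
        Thread.Sleep(100);
        _stop = true;   // may never be observed by the worker on some JITs
        worker.Join();  // in the buggy form, this join can block forever
    }
}
```

Note that this program is a demonstration of the bug: on some runtimes it terminates, on others it hangs.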
Explanation of Thread.MemoryBarrier() Bug with OoOP
It doesn't fix any issues. It's a fake fix, and rather dangerous in production code, because it may or may not work.
The core problem is in this line
static bool stop = false;
The variable that stops the `while` loop is not volatile, which means it may or may not be re-read from memory on each iteration. Its value can be cached, so that only the last value read is presented to the loop (which may not be the actual current value).
This code
// Thread.MemoryBarrier() or Console.WriteLine() fixes issue
may or may not fix the issue on different platforms. A memory barrier or a console write just happens to force the application to read fresh values on a particular system; it may not do the same elsewhere.
Additionally, `volatile` and `Thread.MemoryBarrier()` only provide weak guarantees, which means they don't give 100% assurance that a read value will always be the latest on all systems and CPUs.
Eric Lippert says:

> The true semantics of volatile reads and writes are considerably more complex than I've outlined here; in fact they do not actually guarantee that every processor stops what it is doing and updates caches to/from main memory. Rather, they provide weaker guarantees about how memory accesses before and after reads and writes may be observed to be ordered with respect to each other. Certain operations such as creating a new thread, entering a lock, or using one of the Interlocked family of methods introduce stronger guarantees about observation of ordering. If you want more details, read sections 3.10 and 10.5.3 of the C# 4.0 specification.
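Given those weak-but-sufficient ordering guarantees, the conventional fix for the stop-flag pattern is to make the field `volatile` (or read it with `Volatile.Read`), rather than relying on a `Thread.MemoryBarrier()` or `Console.WriteLine()` call that happens to work on one platform:

```csharp
using System.Threading;

class StopFlag
{
    // volatile forces a fresh read each iteration; the value cannot be
    // cached in a register across the loop.
    static volatile bool stop;

    static void Main()
    {
        var worker = new Thread(() => { while (!stop) { /* do work */ } });
        worker.Start();
        Thread.Sleep(50);
        stop = true;     // volatile write: guaranteed to become visible
        worker.Join();   // terminates reliably on conforming runtimes
    }
}
```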
Why does multithreaded code using CancellationTokenSource.Cancel require less anti-reordering measures
As noted by the author of The Old New Thing in his comment, the `source.Cancel();` call in multithreaded code is protected from reordering by its internal implementation.
The reference source (https://referencesource.microsoft.com/#mscorlib/system/threading/CancellationTokenSource.cs,723) shows that `CancellationTokenSource` relies upon `Interlocked` class methods.
According to Joe Albahari, all methods on the `Interlocked` class implicitly generate full fences: http://www.albahari.com/threading/part4.aspx#_Memory_Barriers_and_Volatility
So one can freely call `CancellationTokenSource.Cancel` inside a delegate body, without an additional lock or memory barrier, even when it is accessed by multiple tasks.
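The same `Interlocked` full-fence property can be relied on directly. A sketch (my own example, not taken from the reference source) of one-shot lock-free publication via `Interlocked.CompareExchange`:

```csharp
using System.Threading;

class OneShot
{
    private string _message;  // null until published

    // Many tasks may race to publish; exactly one wins. CompareExchange is
    // a full fence, so any reader that observes a non-null _message also
    // observes it fully constructed.
    public bool TryPublish(string message) =>
        Interlocked.CompareExchange(ref _message, message, null) == null;

    // Volatile.Read ensures the reader does not use a stale cached value.
    public string Read() => Volatile.Read(ref _message);
}
```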