Where is InterlockedRead? - multithreading

Where is InterlockedRead? - multithreading

Win32 api has a set of InterlockedXXX functions to atomically and synchronously manipulate simple variables, however there doesn't seem to be any InterlockedRead function, to simply retrive the value of the variable. How come?
MSDN says that:
Simple reads and writes to properly-aligned 32-bit variables are atomic operations
but adds:
However, access is not guaranteed to be synchronized. If two threads are reading and writing from the same variable, you cannot determine if one thread will perform its read operation before the other performs its write operation.
Which means, as I understand it, that a simple read operation of a variable can take place while another, say, InterlockedAdd operation is in place. So why isn't there an interlocked function to read a variable?
I guess the value can be read as the result InterlockedAdd-ing zero, but that doesn't seem the right way to go.

The normal way of implementing this is to use a compare-exchange operation (e.g. InterlockedCompareExchange64) where both values are the same. I have a sneaking suspicion this can be performed more efficiently than an add of 0 for some reason, but I have no evidence to back this up.
Interestingly, .NET's Interlocked class didn't gain a Read method until .NET 2.0. I believe that Interlocked.Read is implemented using Interlocked.CompareExchange. (Note that the documentation for Interlocked.Read strikes me as somewhat misleading - it talks about atomicity, but not volatility, which means something very specific on .NET. I'm not sure what the Win32 memory model guarantees about visibility of newly written values from a different thread, if anything.)

I think that your interpretation of "not synchronized" is wrong. Simple reads are atomic, but you have to take care of reordering and memory visibility issues yourself. The former is handled by using fence instructions at appropriate places, the latter is a non-issue with read (but a potential concurrent write has to ensure proper visibility, which Interlocked functions should do if they map to LOCKED asm instructions).

The crux of this whole discussion is proper alignment, which is devined in Partition I of xxx, in section '12.6.2 Alignment':
Built-in datatypes shall be properly aligned, which is defined as follows:
• 1-byte, 2-byte, and 4-byte data is properly aligned when it is stored at
a 1-byte, 2-byte, or 4-byte boundary, respectively.
• 8-byte data is properly aligned when it is stored on the same boundary
required by the underlying hardware for atomic access to a native int.
Basically, all 32-bit values have the required alignment, and on a 64-bit platform, 64-bit values also have the required alignment.
Note though: There are attributes to explicitly alter the layout of classes in memory, which may cause you to lose this alignment. These are attributes specificially for this purpose though, so unless you have set out to alter the layout, this should not apply to you.
With that out of the way, the purpose of the Interlocked class is to provide operations that (to paraphrase) can only be observed in their 'before' or 'after' state. Interlocked operations are normally only of concern when modifying memory (typically in some non-trivial compare-exchange type way). As the MSDN article you found indicates, read operations (when properly aligned) can be considered atomic at all times without further precautions.
There are however other considerations when dealing with read operations:
On modern CPUs, although the read may be atomic, it also may return the wrong value from a stale cache somewhere... this is where you may need to make the field 'volatile' to get the behaviour you expect
If you are dealing with a 64-bit value on 32-bit hardware, you may need to use the Interlocked.Read operation to guarantee the whole 64-bit value is read in a single atomic operation (otherwise it may be performed as 2 separate 32-bit reads which can be from either side of a memory update)
Re-ordering of your reads / writes may cause you to not get the value you expected; in which case some memory barrier may be needed (either explicit, or through the use of the Interlocked class operations)
Short summary; as far as atomicity goes, it is very likely that what you are doing does not need any special instruction for the read... there may however be other things you need to be careful of, depending on what exactly you are doing.

Related

Does a variable only read by one thread, read and written by another, need synchronization?

Motive:
I am just learning the fundamentals of multithreading, not close to finishing them, but I'd like to ask a question this early in my learning journey to guide me toward the topics most relevant to my project I 'm working on.
Main:
a. If a process has two threads, one that edits a set of variables, the other only reads said variables and never edits their values; Then do we need any sort of synchronization for guaranteeing the validity of the read values by the reading thread?
b. Is it possible for the OS scheduling these two threads to cause the reading-thread to read a variable in a memory location in the exact same moment while the writing-thread is writing into the same memory location, or that's just a hardware/bus situation will never be allowed happen and a software designer should never care about that? What if the variable is a large struct instead of a little int or char?

a. If a process has two threads, one that edits a set of variables, the other only reads said variables and never edits their values; Then do we need any sort of synchronization for guaranteeing the validity of the read values by the reading thread?
In general, yes. Otherwise, the thread editing the value could change the value only locally so that the other thread will never see the value change. This can happens because of compilers (that could use registers to read/store variables) but also because of the hardware (regarding the cache coherence mechanism used on the target platform). Generally, locks, atomic variables and memory barriers are used to perform such synchronizations.
b. Is it possible for the OS scheduling these two threads to cause the reading-thread to read a variable in a memory location in the exact same moment while the writing-thread is writing into the same memory location, or that's just a hardware/bus situation will never be allowed happen and a software designer should never care about that? What if the variable is a large struct instead of a little int or char?
In general, there is no guarantee that accesses are done atomically. Theoretically, two cores executing each one a thread can load/store the same variable at the same time (but often not in practice). It is very dependent of the target platform.
For processor having (coherent) caches (ie. all modern mainstream processors) cache lines (ie. chunks of typically 64 or 128 bytes) have a huge impact on the implicit synchronization between threads. This is a complex topic, but you can first read more about cache coherence in order to understand how the memory hierarchy works on modern platforms.
The cache coherence protocol prevent two load/store being done exactly at the same time in the same cache line. If the variable cross multiple cache lines, then there is no protection.
On widespread x86/x86-64 platforms, variables having primitive types of <= 8 bytes can be modified atomically (because the bus support that as well as the DRAM and the cache) assuming the address is correctly aligned (it does not cross cache lines). However, this does not means all such accesses are atomic. You need to specify this to the compiler/interpreter/etc. so it produces/executes the correct instructions. Note that there is also an extension for 16-bytes atomics. There is also an instruction set extension for the support of transactional memory. For wider types (or possibly composite ones) you likely need a lock or an atomic state to control the atomicity of the access to the target variable.

Why are loads required when using atomics

When using atomics in Go (and other languages like c++) its advised to use an atomic load operation for reading a concurrently written value.
If the definition (as I understand it) of an atomic write (be it a store or an integer increment) is that no thread can view a partial write, why is an atomic load required?
Would a plain load of the memory address always be safe from a torn view, if only atomic stores are used on that memory address?

This answer is mainly for C and C++ as I am not directly familiar with atomics in many other languages, but I suspect they are similar.
It's true that many actual machines work this way, in some cases. For instance, on x86-64, ordinary load instructions are atomic with respect to ordinary stores or locked read-modify-write instructions. So for types that can be loaded with a single instruction, you could in principle use ordinary assignment and avoid tearing.
But there are cases where this would not work. For instance:
Types which are not lock-free (e.g. structs of more than a couple words). In this case, several instructions are needed to load or store, and so a lock must be taken around them, or tearing is entirely possible. The atomic load function knows to take the lock, an ordinary assignment wouldn't.
Types which can be lock-free but need special handling. For example, 64-bit long long int on x86-32. An ordinary load would execute two 32-bit integer load instructions (which are individually atomic), and so even if the store is atomic, it could happen in between. But the atomic load function can emit a 64-bit floating point or SIMD load, which is less efficient but does it in one atomic instruction. Example on godbolt.
As such, the language promises atomicity only when the store and load both use the provided atomic functions. - your "definition" is not accurate for C or C++. By requiring the programmer to always use an atomic load, the language provides a "hook" where implementations can take appropriate action if needed. In cases where an ordinary load would suffice, the implementation can optimize accordingly and nothing is lost.
Another point is that the atomic load provides a place to put a memory barrier when one is wanted (any ordering except relaxed). Some architectures include load instructions with a built-in barrier (e.g. ARM64's ldar), and making the barrier part of the load at the language level makes it easier for the compiler to take advantage of this. If you had to do a regular assignment followed by a call to a barrier function, it would be harder for the compiler to figure out that it could optimize them into ldar.

Does all shared variables needs to be atomic?

I have started working to learn multicore programming. I started leaning c++11 atomics. I would like to know if all he shared variables needs to be atomic?

The only time a variable needs to be "atomic", i.e., can be updated in "one fell swoop, without any other thread being able to read it meanwhile", is if it even can be read while someone else is updating it. For instance, if it's set on initialization and then never changes, no one can ever read it while it's changing, and so it doesn't have to be atomic. On the other hand, if it ever changes after initialization and there's a risk that anyone else than the changing thread reads it while it's changing, then it needs to be atomic (atomic intrinsics or protected by mutex or otherwise)

If multiple thread accessing (read/write) the same variable then it should be atomic.
Also you can go though this.

Not necessarily for all scenarios. Also, note that atomicity of variable access alone does not guarantee full thread safety. It just ensures that the particular variable being read is got as a whole.In some architectures, the read operation does not happen in single assembly instruction. For example, if you are reading 64 bit value, the compiler might implement the read using two load instructions of assembly such that the 1st instruction reads the lower 32 bits and the 2nd instruction reads the higher 32 bits. This in turn can lead to race condition. So, atomic reads are preferred.

Usage of registers by the compiler in multithreaded program

It is a general question but:
In a multithreaded program, is it safe for the compiler to use registers to temporarily store global variables?
I think its not, since storing global variables in registers may change saved values
for other threads.
And how about using registers to store local variables defined within a function?
I think it is ok,since no other thread will be able to get these variables.
Please correct me if im wrong.
Thank you!

Things are much more complicated than you think they are.
Even if the compiler stores a value to memory, the CPU generally does not immediately push the data out to RAM. It stores it in a cache (and some systems have 2 or 3 levels of caches between the processor and the memory).
To make things worse, the order of instructions that the compiler decides, may not be what actually gets executed as many processors can reorder instructions (and even sub-parts of instructions) in their own pipelines.
In general, in a multithreaded environment you should personally take care to never access (either read or write) the same memory from two separate threads unless one of the following is true:
you are using one of several special atomic operations that ensure proper synchronization.
you have used one of several synchronization operations to "reserve" access to shared data and then to "relinquish" it. These do include the required memory barriers that also guarantee the data is what it's supposed to be.
You may want to read http://en.wikipedia.org/wiki/Memory_ordering#Memory_barrier_types and http://en.wikipedia.org/wiki/Memory_barrier
If you are ready for a little headache and want to see how complicated things can actually get, here is your evening lecture Memory Barriers: a Hardware View for Software Hackers.

'Safe' is not really the right word to use. Many higher level languages (eg. C) do not have a threading model and so the language specification says nothing about mutli-threaded interactions.
If you are not using any kind of locking primitives then you have no guarantees what so ever about how the different threads interact. So the compiler is within its rights to use registers for global variables.
Even if you are using locking the behaviour can still be tricky: if you read a variable, then grab a lock and then read the variable again the compiler still has no way of knowing if it has to read the variable from memory again, or can use the earlier value it stored in a register.
In C/C++ declaring a variable as volatile will force the compiler to always reload the variable from memory and solve this particular instance.
There are also 'Interlocked*' primitives on most systems that have guaranteed atomicity semantics which can be used to ensure certain operations are threadsafe. Locking primitives are typically built on these low level operations.

In a multithreaded program, you have one of two cases: if it's running on a uniprocessor (single core, single CPU), then switching between threads is handled like switching between processes (although it's not quite as much work since the threads operate in the same virtual memory space) - all registers of one thread are saved during the transition to another thread, so using registers for whatever purpose is fine. This is the job of the context switch routines that the OS uses, and the register set is considered part of a threads (or processes) context. If you have a multiprocessor system - either multiple CPUs or multiple cores on a single CPU - each processor has its own distinct set of registers, so again, using registers for storing things is fine. On top of that, of course, context switching on a particular CPU will save the registers of the old thread/process before switching to the new one, so everything is preserved.
That said, on some architectures and/or with some OSes, there might be specific exceptions to that, because certain registers are reserved by the ABI for specific uses by the OS or by the libraries that provide an interface to the OS, but your compiler(s) generally have that type of knowledge of your platform built in. You need to be aware of them, though, if you're doing inline assembly or certain other "low-level" things...

Can threads safely read variables set by VCL events?

Is it safe for a thread to READ a variable set by a Delphi VCL event?
When a user clicks on a VCL TCheckbox, the main thread sets a boolean to the checkbox's Checked state.
CheckboxState := CheckBox1.Checked;
At any time, a thread reads that variable
if CheckBoxState then ...
It doesn't matter if the thread "misses" a change to the boolean, because the thread checks the variable in a loop as it does other things. So it will see the state change eventually...
Is this safe? Or do I need special code? Is surrounding the read and write of the variable (in the thread and main thread respectively) with critical code calls necessary and sufficient?
As I said, it doesn't matter if the thread gets the "wrong" value, but I keep thinking that there might be a low-level problem if one thread tries to read a variable while the main thread is in the middle of writing it, or vice versa.
My question is similar to this one: Cross thread reading of a variable who's value is not considered important.
(Also related to my previous question: Using EnterCriticalSection in Thread to update VCL label)

This is safe, for three reasons:
Only one thread writes to the variable.
The variable is only one byte, so there is no way to read an inconsistent value. It will be read either as True or as False. There can't be alignment issues with Delphi boolean values.
The Delphi compiler does no extensive checks whether a variable is actually written to, and does not "optimize" away any code if not. Non-local variables will always be read, there is no need for the volatile specifier.
Having said that, if you are really unsure about this you could use an integer value instead of the boolean, and use the InterlockedExchange() function to write to the variable. This is overkill here, but it's a good technique to know about, because for single machine word sized values it may eliminate the need for locks.
You can also replace the boolean by a proper synchronization primitive, like an event, and have the thread block on that - this would help you eliminate busy loops in the thread.

In your case (Checked property) a read operation is atomic, so it is safe. That is the same as with TThread.Terminated property; the simple read and write operations for the properly aligned bytes, words and doublewords are atomic. You can check intel documentation for more information:
CHAPTER 8 - MULTIPLE-PROCESSOR MANAGEMENT
8.1.1 Guaranteed Atomic Operations
The Intel486 processor (and newer processors since) guarantees that the following
basic memory operations will always be carried out atomically:
Reading or writing a byte
Reading or writing a word aligned on a 16-bit boundary
Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (and newer processors since) guarantees that the following
additional memory operations will always be carried out atomically:
Reading or writing a quadword aligned on a 64-bit boundary
16-bit accesses to uncached memory locations that fit within a 32-bit data bus

For the example you give, it will be safe. Technically the main issue is in cases where your variable goes over a machine word in size, and thus you might get the high word and long word not synchronized. For small values though, this is not an issue.
If you think it is a potential problem, for example using a pointer, then the thing to use is a TCriticalSection to control read and writes to the item. This is fast enough for all practical situations, and ensures you are 100% safe.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string