Atomic Operations in the Linux Kernel

typedef struct { int counter; } atomic_t;
What does atomic_t mean? How does the compiler treat it? Historically, counter was declared volatile, which implied it was a CPU register, right?

The reason it is declared as a struct like that is so that the programmer using it is forced (gently reminded, rather) to use the access functions to manipulate it. For example, aval = 27 would not compile. Neither would aval++.
The volatile keyword has always meant the opposite of a CPU register: it means a value that has to be read from and written to memory directly.
If counter was historically volatile, that was not sufficient, because volatile has never been good enough on its own to ensure proper atomic updates. I believe that the current atomic manipulator functions use a cast through a volatile pointer combined with the appropriate memory barriers, and inline assembly for the operations that the compiler cannot generate properly by itself.

atomic_t indicates that it is an atomic type. The compiler treats it as an ordinary typedef'd struct. I don't know the history, but volatile is usually used to suppress compiler optimizations, and it doesn't imply a CPU register.
As its name implies, every operation on it is atomic, i.e. it happens as one indivisible step, so no other CPU can observe it half-done. atomic_t comes with a set of helpers (atomic_inc, atomic_dec, atomic_or and many more) for manipulating the value. On x86, the read-modify-write helpers typically use a lock-prefixed instruction so that the whole update is indivisible; other architectures use their own mechanisms.
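The wrapper-struct idea and the helper functions described in the answers above can be sketched in user space. This is only an illustration using C11 <stdatomic.h>, not the kernel's implementation; the names my_atomic_t, MY_ATOMIC_INIT, my_atomic_read, my_atomic_set, my_atomic_inc and my_atomic_dec_and_test are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space analogue of the kernel's atomic_t (not kernel code).
 * Wrapping the counter in a struct means "v = 27" or "v++" will not compile,
 * so callers have to go through the accessor functions below. */
typedef struct { atomic_int counter; } my_atomic_t;

#define MY_ATOMIC_INIT(i) { (i) }

/* Read the current value as one indivisible load. */
static inline int my_atomic_read(my_atomic_t *v)
{
    return atomic_load(&v->counter);
}

/* Overwrite the value as one indivisible store. */
static inline void my_atomic_set(my_atomic_t *v, int i)
{
    atomic_store(&v->counter, i);
}

/* Increment as a single read-modify-write, so concurrent callers
 * never lose an update. */
static inline void my_atomic_inc(my_atomic_t *v)
{
    atomic_fetch_add(&v->counter, 1);
}

/* Decrement and report whether the result reached zero
 * (the usual reference-count idiom). */
static inline int my_atomic_dec_and_test(my_atomic_t *v)
{
    return atomic_fetch_sub(&v->counter, 1) == 1;
}
```

A caller would declare my_atomic_t refs = MY_ATOMIC_INIT(1); and then only touch it through these functions.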

Related

Why use an AtomicU32 in Rust, given that u32 already implements Sync?

The std::sync::atomic module contains a number of atomic variants of primitive types, with the stated purpose that these types are now thread-safe. However, all the primitives that correspond to the atomic types already implement Send and Sync, and should therefore already be thread-safe. What's the reasoning behind the Atomic types?
Generally, non-atomic integers are safe to share across threads because they're immutable. If you attempt to modify the value, you implicitly create a new one in most cases because they're Copy. However, it isn't safe to share a mutable reference to a u32 across threads (or have both mutable and immutable references to the same value), which practically means that you won't be able to modify the variable and have another thread see the results. An atomic type has some additional behavior which makes it safe.
In the more general case, using non-atomic operations doesn't guarantee that a change made in one thread will be visible in another. Many architectures, especially RISC architectures, do not guarantee that behavior without additional instructions.
In addition, compilers often reorder accesses to memory in functions and in some cases, across functions, and an atomic type with an appropriate barrier is required to indicate to the compiler that such behavior is not wanted.
Finally, atomic operations are often required to logically update the contents of a variable. For example, I may want to atomically add 1 to a variable. On a load-store architecture such as ARM, I cannot modify the contents of memory with an add instruction; I can only perform arithmetic on registers. Consequently, an atomic add is multiple instructions, usually consisting of a load-linked, which loads a memory location, the add operation on the register, and then a store-conditional, which stores the value if the memory location has not changed. There's also a loop to retry if it has.
These are why atomic operations are needed and generally useful across languages. So while one can use non-atomic operations in non-Rust languages, they don't generally produce useful results, and since one typically wants one's code to function correctly, atomic operations are desirable for correctness. Rust's atomic types guarantee this behavior by generating suitable instructions and therefore can be safely shared across threads.
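The load, add, store-if-unchanged, retry sequence described above can be written out explicitly. The sketch below uses C11 atomics rather than Rust, and the helper name fetch_add_retry is hypothetical; in real code a single atomic_fetch_add would normally do the same job, and the compiler decides which instructions to emit.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: atomically add `delta` to *addr and return the old
 * value, written as an explicit retry loop to mirror the
 * load-linked/store-conditional pattern.
 */
static int fetch_add_retry(atomic_int *addr, int delta)
{
    /* "load": take a snapshot of the current value */
    int old = atomic_load_explicit(addr, memory_order_relaxed);

    /* "store if unchanged": succeeds only if *addr still equals `old`.
     * On failure `old` is refreshed with the value actually found, and
     * `old + delta` is recomputed on the next iteration. */
    while (!atomic_compare_exchange_weak_explicit(
               addr, &old, old + delta,
               memory_order_seq_cst, memory_order_relaxed)) {
        /* another thread changed *addr, or the weak form failed
         * spuriously; retry */
    }
    return old;
}
```

The weak compare-exchange is used because it is allowed to fail spuriously, which matches how a store-conditional behaves and is harmless inside a loop.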

Shared variables in OpenMP

I have a very basic question (maybe stupid) regarding shared variables in OpenMP. Consider the following code:
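#include <stdio.h>
#include <omp.h>

int main(void)
{
    int numthreads;
    #pragma omp parallel default(none) shared(numthreads)
    {
        /* every thread writes the same shared variable, without synchronization */
        numthreads = omp_get_num_threads();
        printf("%d\n", numthreads);
    }
    return 0;
}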
Now the value of numthreads is the same for all threads. Is there a possibility that, since several threads are writing the same value to the same variable, the value might get garbled? Or is this operation on a primitive datatype guaranteed to be atomic?
As per the standard, this is not safe:
A single access to a variable may be implemented with multiple load or store instructions, and
hence is not guaranteed to be atomic with respect to other accesses to the same variable.
[...]
If multiple threads write without synchronization to the same memory unit, including cases due to
atomicity considerations as described above, then a data race occurs. [...] If a data race occurs then the result of the program is unspecified.
I strongly recommend reading section 1.4.1, Structure of the OpenMP Memory Model. While it's not the easiest read, it's very specific and quite clear, and it explains this far better than I could here.
Two things need to be considered about shared variables in OpenMP: atomicity of access and the temporary view of memory.
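One way to avoid the race, sketched here under the assumption that the goal is simply to record the team size once, is to let a single thread do the write. The helper name team_size is hypothetical, and the _OPENMP guards are only there so the sketch also builds when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the team size without a data race. */
static int team_size(void)
{
    int numthreads = 1;   /* value used when compiled without OpenMP */

    #pragma omp parallel default(none) shared(numthreads)
    {
        /* Exactly one thread executes this block. The implicit barrier
         * (with its flush) at the end of `single` makes the value visible
         * to the other threads, which covers both atomicity of access and
         * the temporary view of memory. */
        #pragma omp single
        {
#ifdef _OPENMP
            numthreads = omp_get_num_threads();
#endif
        }
        /* From here on, every thread may read numthreads safely. */
    }
    return numthreads;
}
```

If every thread really has to perform the write, reading the value into a private variable and storing it under #pragma omp atomic write is another option.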

What is the use-case of atomic read

I understand that an atomic read serializes the read operations that are performed by multiple threads.
What I don't understand is what is the use case?
More interestingly, I've found some implementation of atomic read which is
static inline int32_t ASMAtomicRead32(volatile int32_t *pi32)
{
    return *pi32;
}
Where the only distinction to regular read is volatile. Does it mean that atomic read is the same as volatile read?
"I understand that atomic read serializes the read operations that are performed by multiple threads."
That's not quite right. How can you ensure the order of reads if there is no write that stores a different value? Even when you have both a read and a write, they are not necessarily serialized unless the correct memory semantics are used with both operations, e.g. 'store-with-release' and 'load-with-acquire'. In your particular example, the memory semantics are relaxed. On x86, though, every ordinary load has acquire semantics and every ordinary store has release semantics at the hardware level (unless non-temporal stores are used).
"What I don't understand is what is the use case?"
An atomic read must ensure that the data is read in one shot, so that no other thread can store part of the data in between. Thus it usually ensures the alignment of the atomic variable (since a read of an aligned machine word is atomic) or works around non-aligned cases using heavier instructions. Finally, it ensures that the read is neither optimized out by the compiler nor reordered across other operations in this thread (according to the memory semantics).
"Does it mean that atomic read is the same as volatile read?"
In a few words, volatile was not intended for such a use case, but it can sometimes be abused for it when the other requirements are met as well. For your example, my analysis is the following:
int32_t is likely a machine word or smaller: OK.
Usually everything is aligned on at least a 4-byte boundary, though there is no guarantee in your example.
volatile ensures that the read is not optimized out.
There is no guarantee that it will not be reordered, either by the processor (OK on x86) or by the compiler (bad: the compiler may still move ordinary, non-volatile accesses across it).
Please refer to Arch's blog and Concurrency: Atomic and volatile in C++11 memory model for the details.
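A typical use case for an atomic read with explicit memory semantics is publishing data through a flag. The sketch below uses C11 atomics; the names payload, ready, publish and try_consume are hypothetical. It pairs a store-with-release with a load-with-acquire, which a plain volatile read does not express.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary (non-atomic) data */
static atomic_bool ready;    /* publication flag, zero-initialized (false) */

/* Producer: write the data first, then publish it with a release store.
 * The release ordering keeps the payload write from moving after the flag. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: acquire load of the flag. If it reads true, the payload write
 * made before the matching release store is guaranteed to be visible. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

With a volatile flag instead of an atomic one, the compiler would be free to move the access to payload across the flag access, which is exactly the reordering problem listed above.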

Does atomic_cmpxchg() imply memory barriers?

The following two citations seem to contradict each other:
https://www.kernel.org/doc/Documentation/atomic_ops.txt
int atomic_cmpxchg(atomic_t *v, int old, int new);
This performs an atomic compare exchange operation on the atomic value
v, with the given old and new values. Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as
all other accesses of *v are performed through atomic_xxx operations.
atomic_cmpxchg requires explicit memory barriers around the operation.
vs
https://www.kernel.org/doc/Documentation/memory-barriers.txt
Any atomic operation that modifies some state in memory and returns
information about the state (old or new) implies an SMP-conditional
general memory barrier (smp_mb()) on each side of the actual operation
(with the exception of explicit lock operations, described later).
These include:
<...>
atomic_xchg();
atomic_cmpxchg();
<...>
These are used for such things as implementing LOCK-class and UNLOCK-class operations and adjusting reference
counters towards object destruction, and as such the implicit memory
barrier effects are necessary.
So should one put memory barriers around atomic_xchg() manually?
I'm not yet familiar with the specifics of Linux kernel programming, so here is a partial (general) answer.
On x86, a lock-prefixed cmpxchg acts as a full memory fence, so there is no need for mfence/lfence/sfence around it.
On other architectures with a relaxed memory model, it can be coupled with other memory semantics, e.g. "release", depending on how atomic_cmpxchg() is translated into machine instructions.
That covers the processor side of things. However, the compiler can also reorder operations, so if a compiler barrier is not implied by atomic_cmpxchg() (e.g. through __asm__ __volatile__ with a "memory" clobber), you would need one.
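For comparison, this is roughly how a compare-exchange with its ordering spelled out looks in portable user-space C11. It is a sketch, not the kernel's atomic_cmpxchg(): the names my_cmpxchg, my_trylock and my_unlock are hypothetical, and memory_order_seq_cst is assumed here as the closest C11 counterpart of a fully ordered operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space analogue of atomic_cmpxchg(): if *v equals
 * `expected`, store `desired`; in either case return the value that was
 * found in *v. The ordering is part of the call, so the caller adds no
 * separate fence.
 */
static int my_cmpxchg(atomic_int *v, int expected, int desired)
{
    /* On failure, `expected` is overwritten with the value actually found;
     * on success it still holds the old value. */
    atomic_compare_exchange_strong_explicit(
        v, &expected, desired,
        memory_order_seq_cst, memory_order_seq_cst);
    return expected;
}

/* A trylock built on it: 0 means unlocked, 1 means locked. */
static bool my_trylock(atomic_int *lock)
{
    return my_cmpxchg(lock, 0, 1) == 0;
}

/* Unlock with a release store so that writes made inside the critical
 * section are visible to the next thread that takes the lock. */
static void my_unlock(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Because the atomic operations carry their ordering with them, the compiler is also not allowed to move surrounding memory accesses across them in ways that ordering forbids, which addresses the compiler-side concern as well.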

volatile keyword for objects in C++

I have a thread-safe counter object (it's a class which uses std::atomic load() and store())
as one of the class members. Thread 1 increments the counter and Thread 2 reads the counter.
Usually, primitive types (int etc.) which are shared by different threads are declared volatile to prevent any compiler optimizations. Do I have to declare this thread-safe counter object, which is shared by 2 different threads, as volatile?
Could someone provide more insight into this ?
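No. There is no need if the object is declared atomic.
A C or C++ compiler may not reorder reads and writes to volatile memory locations relative to one another, nor may it omit a read or write to a volatile memory location.
std::atomic already gives you what people try to get from volatile in this situation (reads and writes that other threads actually observe), and it additionally provides atomicity and inter-thread ordering, so there is no need to declare the object volatile as well.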
Take a look at: volatile (C++) msdn article
You don't have to because
"The volatile keyword in C++11 ISO Standard code is to be used only for hardware access; do not use it for inter-thread communication. For inter-thread communication, use mechanisms such as std::atomic from the C++ Standard Template Library."

Resources