Do all shared variables need to be atomic? - multithreading

I have started learning multicore programming, beginning with C++11 atomics. I would like to know: do all shared variables need to be atomic?

The only time a variable needs to be "atomic", i.e., updatable in "one fell swoop, without any other thread being able to read it meanwhile", is if it can even be read while someone else is updating it. For instance, if it's set on initialization and then never changes, no one can ever read it while it's changing, and so it doesn't have to be atomic. On the other hand, if it ever changes after initialization and there's a risk that any thread other than the changing thread reads it while it's changing, then it needs to be atomic (via atomic intrinsics, protection by a mutex, or otherwise).
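A minimal C++11 sketch of the two cases just described (the names are illustrative, not from the question):

#include <atomic>
#include <thread>

int config = 42;              // written once before threads start: no atomics needed
std::atomic<int> counter{0};  // modified while others may read it: must be atomic

int main() {
    std::thread t1([] { counter.fetch_add(1); });                      // concurrent write, safe
    std::thread t2([] { int c = counter.load() + config; (void)c; });  // concurrent read, safe
    t1.join();
    t2.join();
}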

If multiple threads access (read/write) the same variable, then it should be atomic.

Not necessarily for all scenarios. Also, note that atomicity of variable access alone does not guarantee full thread safety. It just ensures that the particular variable being read is obtained as a whole. On some architectures, the read operation does not happen in a single assembly instruction. For example, if you are reading a 64-bit value, the compiler might implement the read using two load instructions, such that the first instruction reads the lower 32 bits and the second reads the higher 32 bits. This in turn can lead to a race condition (a torn read). So, atomic reads are preferred.
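A sketch of that torn-read hazard, assuming a hypothetical 32-bit target where a 64-bit load takes two instructions; the plain read is exactly the bug being described:

#include <atomic>
#include <cstdint>
#include <thread>

uint64_t plain = 0;               // may be loaded/stored as two 32-bit halves
std::atomic<uint64_t> whole{0};   // always read/written as a whole

int main() {
    std::thread writer([] {
        plain = 0xFFFFFFFF00000000ull;        // possibly two stores: a data race
        whole.store(0xFFFFFFFF00000000ull);   // a single atomic store
    });
    std::thread reader([] {
        uint64_t a = plain;        // undefined behaviour: could observe a torn value
        uint64_t b = whole.load(); // guaranteed to be 0 or the full new value
        (void)a; (void)b;
    });
    writer.join();
    reader.join();
}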

Related

Does a variable only read by one thread, read and written by another, need synchronization?

Motive:
I am just learning the fundamentals of multithreading, not close to finishing them, but I'd like to ask a question this early in my learning journey to guide me toward the topics most relevant to the project I'm working on.
Main:
a. If a process has two threads, one that edits a set of variables while the other only reads said variables and never edits their values, do we need any sort of synchronization to guarantee the validity of the values read by the reading thread?
b. Is it possible for the OS scheduling these two threads to cause the reading thread to read a variable in a memory location at the exact same moment the writing thread is writing into that same memory location, or is that just a hardware/bus situation that will never be allowed to happen, so a software designer should never have to care about it? What if the variable is a large struct instead of a little int or char?
a. If a process has two threads, one that edits a set of variables while the other only reads said variables and never edits their values, do we need any sort of synchronization to guarantee the validity of the values read by the reading thread?
In general, yes. Otherwise, the thread editing the value could change it only locally, so that the other thread never sees the value change. This can happen because of the compiler (which could keep variables in registers) but also because of the hardware (specifically, the cache-coherence mechanism used on the target platform). Generally, locks, atomic variables and memory barriers are used to perform such synchronization.
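A minimal sketch of that visibility problem in C++11 (if done were a plain bool, the compiler could legally keep it in a register and the reader's loop might never terminate):

#include <atomic>
#include <thread>

std::atomic<bool> done{false};

int main() {
    std::thread reader([] {
        while (!done.load()) { }  // guaranteed to eventually observe the store below
    });
    done.store(true);             // the editing thread's write becomes visible
    reader.join();
}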
b. Is it possible for the OS scheduling these two threads to cause the reading thread to read a variable in a memory location at the exact same moment the writing thread is writing into that same memory location, or is that just a hardware/bus situation that will never be allowed to happen, so a software designer should never have to care about it? What if the variable is a large struct instead of a little int or char?
In general, there is no guarantee that accesses are done atomically. Theoretically, two cores, each executing one thread, can load/store the same variable at the same time (though often not in practice). It is very dependent on the target platform.
On processors with (coherent) caches (i.e., all modern mainstream processors), cache lines (i.e., chunks of typically 64 or 128 bytes) have a huge impact on the implicit synchronization between threads. This is a complex topic, but you can first read more about cache coherence in order to understand how the memory hierarchy works on modern platforms.
The cache-coherence protocol prevents two loads/stores from being done at exactly the same time within the same cache line. If the variable crosses multiple cache lines, then there is no such protection.
On widespread x86/x86-64 platforms, variables of primitive types of <= 8 bytes can be modified atomically (because the bus supports that, as do the DRAM and the cache) assuming the address is correctly aligned (i.e., the variable does not cross cache lines). However, this does not mean all such accesses are atomic. You need to specify this to the compiler/interpreter/etc. so it produces/executes the correct instructions. Note that there is also an extension for 16-byte atomics, as well as an instruction-set extension for the support of transactional memory. For wider types (or possibly composite ones) you likely need a lock or an atomic state to control the atomicity of access to the target variable.
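If you want to see what the platform actually gives you, std::atomic lets you ask; a short sketch (the 16-byte Pair type is a made-up example, and the static_assert reflects the x86-64 case discussed above):

#include <atomic>
#include <cstdint>

struct Pair { std::int64_t a, b; };  // 16 bytes: may or may not be lock-free

int main() {
    // Compile-time check (C++17): 8-byte atomics are lock-free on x86-64.
    static_assert(std::atomic<std::int64_t>::is_always_lock_free, "8-byte atomics expected");

    // Runtime check: a 16-byte atomic may fall back to an internal lock
    // unless the platform supports an instruction like cmpxchg16b.
    std::atomic<Pair> p{};
    bool lock_free = p.is_lock_free();
    (void)lock_free;
}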

Deciding the critical section of kernel code

Hi, I am writing kernel code that is intended to do process scheduling and multi-threaded execution. I've studied locking mechanisms and their functionality. Is there a rule of thumb about what sort of data structure in a critical section should be protected by locking (mutex/semaphores/spinlocks)?
I know that wherever there is a chance of concurrency in a piece of code, we require a lock. But how do we decide what happens if we miss one and the test cases don't catch it? Earlier I wrote code for system calls and file systems where I never cared about taking locks.
Is there a rule of thumb about what sort of data structure in a critical section should be protected by locking?
Any object (global variable, field of a structure, etc.) that is accessed concurrently, where at least one access is a write, requires some locking discipline.
But how do we decide what happens if we miss one and the test cases don't catch it?
Good practice is an appropriate comment for every declaration of a variable, structure, or structure field that requires a locking discipline for access. Anyone who uses this variable reads this comment and writes the corresponding access code. The kernel core and modules tend to follow this strategy.
As for testing, common testing rarely reveals concurrency issues because of their low probability. When testing kernel modules, I would advise using KernelStrider, which attempts to prove the correctness of concurrent memory accesses, or RaceHound, which increases the probability of concurrency issues occurring and checks for them.
It is always safe to grab a lock for the duration of any code that accesses any shared data, but this is slow since it means only one thread at a time can run significant chunks of code.
Depending on the data in question, though, there may be shortcuts that are safe and fast. If it is a simple integer (and by integer I mean the native word size of the CPU, i.e., not a 64-bit value on a 32-bit CPU), then you may not need to do any locking: if one thread tries to write to the integer and another reads it at the same time, the reader will either get the old value or the new value, never a mix of the two. If the reader doesn't care that it might get the old value, then there is no need for a lock.
If, however, you are updating two integers together, and it would be bad for the reader to get the new value for one and the old value for the other, then you need a lock. Another example is if the thread is incrementing the integer. That normally involves a read, an add, and a write. If one thread reads the old value, then the other manages to read, add, and write the new value, then the first thread adds and writes its new value, both believe they have incremented the variable; but instead of being incremented twice, it was only incremented once. This needs either a lock or the use of an atomic increment primitive to ensure that the read/modify/write cycle cannot be interrupted. There are also atomic test-and-set primitives, so you can read a value, do some math on it, then try to write it back, but the write only succeeds if the variable still holds the original value. That is, if another thread changed it since the time you read it, the test-and-set will fail; you can then discard your new value and start over with a read of the value the other thread set, and try to test-and-set it again.
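A short C++ sketch of both primitives just mentioned, using std::atomic (the update computation is a made-up example):

#include <atomic>

std::atomic<int> counter{0};

void safe_increment() {
    // fetch_add makes the read/modify/write cycle a single atomic operation,
    // so two threads can never lose an increment.
    counter.fetch_add(1);
}

void test_and_set_style_update() {
    // The test-and-set pattern: read the value, do some math on it, and only
    // write it back if it still holds the original value; otherwise retry.
    int old_value = counter.load();
    int new_value;
    do {
        new_value = old_value * 2 + 1;  // any computation on the old value
        // on failure, compare_exchange_weak reloads old_value, so we just loop
    } while (!counter.compare_exchange_weak(old_value, new_value));
}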
Pointers are really just integers, so if you set up a data structure and then store a pointer to it where another thread can find it, you don't need a lock, as long as you set up the structure fully before you store its address in the pointer. Another thread reading the pointer (it will need to make sure to read the pointer only once, i.e., by storing it in a local variable and then using only that to refer to the structure from then on) will either see the new structure or the old one, but never an intermediate state. If most threads only read the structure via the pointer, and any that want to write do so either with a lock or an atomic test-and-set of the pointer, this is sufficient. Any time you want to modify any member of the structure, though, you have to copy it to a new one, change the new one, then update the pointer. This is essentially how the kernel's RCU (read, copy, update) mechanism works.
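A sketch of that pointer-publication pattern in C++ terms; note that in C++ the pointer itself must be atomic with release/acquire ordering (a plain pointer store would be a data race), and reclaiming the old structure, the hard part of real RCU, is left out:

#include <atomic>
#include <string>

struct Config { std::string host; int port; };  // made-up payload

std::atomic<Config*> current{nullptr};

void publish() {
    Config* c = new Config{"example.com", 80};    // set up the structure fully...
    current.store(c, std::memory_order_release);  // ...then publish its address
}

void reader() {
    // Read the pointer exactly once into a local, as described above.
    Config* c = current.load(std::memory_order_acquire);
    if (c) {
        int p = c->port;  // sees a fully constructed Config, old or new
        (void)p;
    }
}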
Ideally, during design you should enumerate all the resources available in your system, the related threads, and the communication and sharing mechanisms. Determining the following for every resource, and maintaining a proper checklist whenever a change is made, can be of great help:
The duration for which the resource will be busy (utilization of the resource) & the type of lock
The number of tasks queued on that particular resource (load) & their priority
The type of communication/sharing mechanism related to the resource
Error conditions related to the resource
If possible, it is better to have a flow diagram depicting the resources, utilization, locks, load, communication/sharing mechanism and errors.
This process can help you in determining the missing scenarios/unknowns, critical sections and also in identification of bottlenecks.
On top of the above process, you may also need certain tools that can help you in testing / further analysis to rule out hidden problems, if any:
Helgrind - a Valgrind tool for detecting synchronisation errors. This can help in identifying data races/synchronization issues due to improper locking, lock ordering that can cause deadlocks, and also improper POSIX thread API usage that can have later impacts. Refer: http://valgrind.org/docs/manual/hg-manual.html
Locksmith - For determining common lock errors that may arise during runtime or that may cause deadlocks.
ThreadSanitizer - For detecting race conditions. It displays all the accesses and locks involved for each race.
Sparse can help to list the locks acquired and released by a function, and also to identify issues such as mixing pointers to user address space with pointers to kernel address space.
Lockdep - For debugging of locks
iotop - For determining the current I/O usage by processes or threads on the system by monitoring the I/O usage information output by the kernel.
LTTng - For tracing possible race conditions and interrupt cascades. (A successor to LTT - a combination of kprobes, tracepoint and perf functionality.)
Ftrace - A Linux kernel internal tracer for analysing/debugging latency and performance issues.
lsof and fuser can be handy in determining the processes having lock and the kind of locks.
Profiling can help in determining where exactly the time is being spent by the kernel. This can be done with tools like perf and OProfile.
strace can intercept and record the system calls made by a process and the signals it receives. It shows the order of events and all the return/resumption paths of calls.

Perl ithreads :shared variables - multiprocessor kernel threads - visibility

perlthrtut excerpt:
Note that a shared variable guarantees that if two or more threads try to modify it at the same time, the internal state of the variable will not become corrupted. However, there are no guarantees beyond this, as explained in the next section.
I am working on Linux with support for multiprocessor kernel threads.
Is there a guarantee that all threads will see the updated shared variable value ?
According to the perlthrtut doc quoted above, there is no such guarantee.
Now the question: What can be done programmatically to guarantee that?
You ask
Is there a guarantee that all threads will see the updated shared variable value ?
Yes. :shared is that guarantee. The value will be updated safely, consistently, and freshly.
The problem is simply that, without other synchronization, you don't know the order of these updates.
According to the perlthrtut doc quoted above, there is no such guarantee.
You didn't read far enough. :)
The very next section in perlthrtut explains the kind of pitfalls you do face with Perl threads: data races, which is to say, application-logic races concerning shared data. Again, the shared data will be consistent and fresh and immune to corruption from (more-or-less) atomic Perl opcodes. However, the high-level Perl operations you perform on that shared data are not guaranteed to be atomic. $shared_var++, for instance, might be more than one atomic operation.
(If I may hazard a guess, you are perhaps thinking too much about other languages' lower level threading interfaces with their cache inconsistencies, torn words, reordered instructions, and lions and tigers and bears. Perl's model takes care of those low-level concerns for you.)
Using :shared on a variable causes all threads to reference it at the same physical memory address, so it doesn't matter which processor/core/hyper-thread they happen to be executing on. The perlthrtut's talk of guarantees is in reference to race conditions; in short, you need to take into account that shared variables can be modified by any thread at any time. If this is a problem, you'll need to use the synchronization functions (e.g. lock() and cond_wait()) to control access.
You seem to be confused as to what :shared does. It makes it so a variable is shared by all threads.
A variable is indeed guaranteed to have the value it has, no matter which thread accesses it. That's a tautology, so nothing needs to be done programmatically to guarantee it.

Can threads safely read variables set by VCL events?

Is it safe for a thread to READ a variable set by a Delphi VCL event?
When a user clicks on a VCL TCheckbox, the main thread sets a boolean to the checkbox's Checked state.
CheckboxState := CheckBox1.Checked;
At any time, a thread reads that variable
if CheckBoxState then ...
It doesn't matter if the thread "misses" a change to the boolean, because the thread checks the variable in a loop as it does other things. So it will see the state change eventually...
Is this safe? Or do I need special code? Is surrounding the read and write of the variable (in the thread and main thread respectively) with critical code calls necessary and sufficient?
As I said, it doesn't matter if the thread gets the "wrong" value, but I keep thinking that there might be a low-level problem if one thread tries to read a variable while the main thread is in the middle of writing it, or vice versa.
My question is similar to this one: Cross thread reading of a variable whose value is not considered important.
(Also related to my previous question: Using EnterCriticalSection in Thread to update VCL label)
This is safe, for three reasons:
Only one thread writes to the variable.
The variable is only one byte, so there is no way to read an inconsistent value. It will be read either as True or as False. There can't be alignment issues with Delphi boolean values.
The Delphi compiler does not do extensive checks on whether a variable is actually written to, and does not "optimize away" any code based on that. Non-local variables will always be read; there is no need for a volatile specifier.
Having said that, if you are really unsure about this you could use an integer value instead of the boolean, and use the InterlockedExchange() function to write to the variable. This is overkill here, but it's a good technique to know about, because for single machine word sized values it may eliminate the need for locks.
You can also replace the boolean by a proper synchronization primitive, like an event, and have the thread block on that - this would help you eliminate busy loops in the thread.
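For comparison, a minimal sketch of the InterlockedExchange() idea in C++ terms rather than Delphi (std::atomic's exchange plays the same role; the names are illustrative):

#include <atomic>

std::atomic<int> checkbox_state{0};   // stands in for the Delphi boolean

// Main thread, on the checkbox event: publish the new state atomically.
void on_checkbox_changed(bool checked) {
    checkbox_state.exchange(checked ? 1 : 0);   // like InterlockedExchange()
}

// Worker thread: poll the state inside its loop.
bool is_checked() {
    return checkbox_state.load() != 0;
}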
In your case (the Checked property) a read operation is atomic, so it is safe. It is the same as with the TThread.Terminated property; simple read and write operations on properly aligned bytes, words and doublewords are atomic. You can check the Intel documentation for more information:
CHAPTER 8 - MULTIPLE-PROCESSOR MANAGEMENT
8.1.1 Guaranteed Atomic Operations
The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:
Reading or writing a byte
Reading or writing a word aligned on a 16-bit boundary
Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:
Reading or writing a quadword aligned on a 64-bit boundary
16-bit accesses to uncached memory locations that fit within a 32-bit data bus
For the example you give, it will be safe. Technically, the main issue arises in cases where your variable is larger than a machine word, and thus you might get the high word and low word out of sync. For small values, though, this is not an issue.
If you think it is a potential problem, for example when using a pointer, then the thing to use is a TCriticalSection to control reads and writes of the item. This is fast enough for all practical situations, and it ensures you are 100% safe.
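In C++ terms the same pattern is a mutex guarding every read and write of the larger item; TCriticalSection plays this role in Delphi (the struct here is a made-up example):

#include <mutex>

struct BigState { int values[64]; };   // too large to read or write atomically

std::mutex state_mutex;                // the TCriticalSection equivalent
BigState shared_state{};

void writer_update(int i, int v) {
    std::lock_guard<std::mutex> guard(state_mutex);  // enter/leave the section
    shared_state.values[i] = v;
}

int reader_get(int i) {
    std::lock_guard<std::mutex> guard(state_mutex);
    return shared_state.values[i];
}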

Should access to a shared resource be locked by a parent thread before spawning a child thread that accesses it?

If I have the following pseudocode:
sharedVariable = somevalue;
CreateThread(threadWhichUsesSharedVariable);
Is it theoretically possible for a multicore CPU to execute code in threadWhichUsesSharedVariable() which reads the value of sharedVariable before the parent thread writes to it? For full theoretical avoidance of even the remote possibility of a race condition, should the code look like this instead:
sharedVariableMutex.lock();
sharedVariable = somevalue;
sharedVariableMutex.unlock();
CreateThread(threadWhichUsesSharedVariable);
Basically I want to know if the spawning of a thread explicitly linearizes the CPU at that point, and is guaranteed to do so.
I know that the overhead of thread creation probably takes enough time that this would never matter in practice, but the perfectionist in me is afraid of the theoretical race condition. In extreme conditions, where some threads or cores might be severely lagged and others are running fast and efficiently, I can imagine that it might be remotely possible for the order of execution (or memory access) to be reversed unless there was a lock.
I would say that your pseudocode is safe on any correctly functioning multiprocessor system. The C++ compiler cannot generate a call to CreateThread() before sharedVariable has received a correct value unless it can prove to itself that doing so is safe. You are guaranteed that your single-threaded code executes equivalently to a completely non-reordered linear execution path. Any system that "time warps" the thread creation ahead of the variable assignment is seriously broken.
I don't think declaring sharedVariable as volatile does anything useful in this case.
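In C++11 terms this guarantee is explicit: the completion of the std::thread constructor synchronizes with the start of the new thread, so a plain write made before construction is visible inside it. A minimal sketch:

#include <thread>

int sharedVariable = 0;   // plain, non-atomic

int main() {
    sharedVariable = 42;  // happens-before the new thread's first statement
    std::thread t([] {
        int v = sharedVariable;  // guaranteed to see 42
        (void)v;
    });
    t.join();
}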
Given your example, if you were using Java, then the answer would be "No". In Java it is not possible for the thread to spawn and read your value before the assignment operation is complete. In some other languages this might be a different story.
"Variables shared between multiple threads (e.g., instance variables of objects) have atomic assignment guaranteed by the Java language specification for all data types except longs and doubles... If a method consists solely of a single variable access or assignment, there is no need to make it synchronized for thread-safety, and every reason not to do so for performance."
If your double or long is declared volatile, then you are also guaranteed that the assignment is an atomic operation.
Update:
Your example is going to work in C++ just as it works in Java. Theoretically, there is no way for the thread spawn to begin or complete before the assignment, even with out-of-order execution.
Note that your example is VERY specific; in any other case it is recommended that you ensure the shared resource is protected properly. The new C++ standard is coming out with a lot of atomic support, so you could declare your variable as atomic and the assignment operation will be visible to all threads without the need for locking. CAS (compare-and-swap) is your next best option.
