Is it ok to have multiple threads writing the same values to the same variables? - multithreading

I understand about race conditions and how with multiple threads accessing the same variable, updates made by one can be ignored and overwritten by others, but what if each thread is writing the same value (not different values) to the same variable; can even this cause problems? Could this code:
GlobalVar.property = 11;
(assuming that property will never be assigned anything other than 11), cause problems if multiple threads execute it at the same time?

The problem comes when you read that state back, and do something about it. Writing is a red herring - it is true that as long as this is a single word most environments guarantee the write will be atomic, but that doesn't mean that a larger piece of code that includes this fragment is thread-safe. Firstly, presumably your global variable contained a different value to begin with - otherwise if you know it's always the same, why is it a variable? Second, presumably you eventually read this value back again?
The issue is that presumably, you are writing to this bit of shared state for a reason - to signal that something has occurred? This is where it falls down: when you have no locking constructs, there is no implied order of memory accesses at all. It's hard to point to what's wrong here because your example doesn't actually contain the use of the variable, so here's a trivialish example in neutral C-like syntax:
int x = 0, y = 0;
//thread A does:
x = 1;
y = 2;
if (y == 2)
print(x);
//thread B does, at the same time:
if (y == 2)
print(x);
Thread A will always print 1, but it's completely valid for thread B to print 0. The order of operations in thread A is only required to be observable from code executing in thread A - thread B is allowed to see any combination of the state. The writes to x and y may not actually happen in order.
This can happen even on single-processor systems, where most people do not expect this kind of reordering - your compiler may reorder it for you. On SMP even if the compiler doesn't reorder things, the memory writes may be reordered between the caches of the separate processors.
If that doesn't seem to answer it for you, include more detail of your example in the question. Without the use of the variable it's impossible to definitively say whether such a usage is safe or not.

It depends on the work actually done by that statement. There can still be some cases where Something Bad happens - for example, if a C++ class has overloaded the = operator, and does anything nontrivial within that statement.
I have accidentally written code that did something like this with POD types (builtin primitive types), and it worked fine -- however, it's definitely not good practice, and I'm not confident that it's dependable.
Why not just lock the memory around this variable when you use it? In fact, if you somehow "know" this is the only write statement that can occur at some point in your code, why not just use the value 11 directly, instead of writing it to a shared variable?
(edit: I guess it's better to use a constant name instead of the magic number 11 directly in the code, btw.)
If you're using this to figure out when at least one thread has reached this statement, you could use a semaphore that starts at 1, and is decremented by the first thread that hits it.

I would expect the result to be undetermined. As in it would vary from compiler to complier, langauge to language and OS to OS etc. So no, it is not safe
WHy would you want to do this though - adding in a line to obtain a mutex lock is only one or two lines of code (in most languages), and would remove any possibility of problem. If this is going to be two expensive then you need to find an alternate way of solving the problem

In General, this is not considered a safe thing to do unless your system provides for atomic operation (operations that are guaranteed to be executed in a single cycle).
The reason is that while the "C" statement looks simple, often there are a number of underlying assembly operations taking place.
Depending on your OS, there are a few things you could do:
Take a mutual exclusion semaphore (mutex) to protect access
in some OS, you can temporarily disable preemption, which guarantees your thread will not swap out.
Some OS provide a writer or reader semaphore which is more performant than a plain old mutex.

Here's my take on the question.
You have two or more threads running that write to a variable...like a status flag or something, where you only want to know if one or more of them was true. Then in another part of the code (after the threads complete) you want to check and see if at least on thread set that status... for example
bool flag = false
threadContainer tc
threadInputs inputs
check(input)
{
...do stuff to input
if(success)
flag = true
}
start multiple threads
foreach(i in inputs)
t = startthread(check, i)
tc.add(t) // Keep track of all the threads started
foreach(t in tc)
t.join( ) // Wait until each thread is done
if(flag)
print "One of the threads were successful"
else
print "None of the threads were successful"
I believe the above code would be OK, assuming you're fine with not knowing which thread set the status to true, and you can wait for all the multi-threaded stuff to finish before reading that flag. I could be wrong though.

If the operation is atomic, you should be able to get by just fine. But I wouldn't do that in practice. It is better just to acquire a lock on the object and write the value.

Assuming that property will never be assigned anything other than 11, then I don't see a reason for assigment in the first place. Just make it a constant then.
Assigment only makes sense when you intend to change the value unless the act of assigment itself has other side effects - like volatile writes have memory visibility side-effects in Java. And if you change state shared between multiple threads, then you need to synchronize or otherwise "handle" the problem of concurrency.
When you assign a value, without proper synchronization, to some state shared between multiple threads, then there's no guarantees for when the other threads will see that change. And no visibility guarantees means that it it possible that the other threads will never see the assignt.
Compilers, JITs, CPU caches. They're all trying to make your code run as fast as possible, and if you don't make any explicit requirements for memory visibility, then they will take advantage of that. If not on your machine, then somebody elses.

Related

If one thread writes to a location and another thread is reading, can the second thread see the new value then the old?

Start with x = 0. Note there are no memory barriers in any of the code below.
volatile int x = 0
Thread 1:
while (x == 0) {}
print "Saw non-zer0"
while (x != 0) {}
print "Saw zero again!"
Thread 2:
x = 1
Is it ever possible to see the second message, "Saw zero again!", on any (real) CPU? What about on x86_64?
Similarly, in this code:
volatile int x = 0.
Thread 1:
while (x == 0) {}
x = 2
Thread 2:
x = 1
Is the final value of x guaranteed to be 2, or could the CPU caches update main memory in some arbitrary order, so that although x = 1 gets into a CPU's cache where thread 1 can see it, then thread 1 gets moved to a different cpu where it writes x = 2 to that cpu's cache, and the x = 2 gets written back to main memory before x = 1.
Yes, it's entirely possible. The compiler could, for example, have just written x to memory but still have the value in a register. One while loop could check memory while the other checks the register.
It doesn't happen due to CPU caches because cache coherency hardware logic makes the caches invisible on all CPUs you are likely to actually use.
Theoretically, the write race you talk about could happen due to posted write buffering and read prefetching. Miraculous tricks were used to make this impossible on x86 CPUs to avoid breaking legacy code. But you shouldn't expect future processors to do this.
Leaving aside for a second tricks done by the compiler (even ones allowed by language standards), I believe you're asking how the micro-architecture could behave in such scenario. Keep in mind that the code would most likely expand into a busy wait loop of cmp [x] + jz or something similar, which hides a load inside it. This means that [x] is likely to live in the cache of the core running thread 1.
At some point, thread 2 would come and perform the store. If it resides on a different core, the line would first be invalidated completely from the first core. If these are 2 threads running on the same physical core - the store would immediately affect all chronologically younger loads.
Now, the most likely thing to happen on a modern out-of-order machine is that all the loads in the pipeline at this point would be different iterations of the same first loop (since any branch predictor facing so many repetitive "taken" resolution is likely to assume the branch will continue being taken, until proven wrong), so what would happen is that the first load to encounter the new value modified by the other thread will cause the matching branch to simply flush the entire pipe from all younger operations, without the 2nd loop ever having a chance to execute.
However, it's possible that for some reason you did get to the 2nd loop (let's say the predictor issue a not-taken prediction just at the right moment when the loop condition check saw the new value) - in this case, the question boils down to this scenario:
Time -->
----------------------------------------------------------------
thread 1
cmp [x],0 execute
je ... execute (not taken)
...
cmp [x],0 execute
jne ... execute (not taken)
Can_We_Get_Here:
...
thread2
store [x],1 execute
In other words, given that most modern CPUs may execute instructions out of order, can a younger load be evaluated before an older one to the same address, allowing the store (from another thread) to change the value so it may be observed inconsistently by the loads.
My guess is that the above timeline is quite possible given the nature of out-of-order execution engines today, as they simply arbitrate and perform whatever operation is ready. However, on most x86 implementations there are safeguards to protect against such a scenario, since the memory ordering rules strictly say -
8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations
Such mechanisms may detect this scenario and flush the machine to prevent the stale/wrong values becoming visible. So The answer is - no, it should not be possible, unless of course the software or the compiler change the nature of the code to prevent the hardware from noticing the relation. Then again, memory ordering rules are sometimes flaky, and i'm not sure all x86 manufacturers adhere to the exact same wording, but this is a pretty fundamental example of consistency, so i'd be very surprised if one of them missed it.
The answer seems to be, "this is exactly the job of the CPU cache coherency." x86 processors implement the MESI protocol, which guarantee that the second thread can't see the new value then the old.

about mutexes and deadlocks

I have the following code:
pthread_mutex lock_row[M], lock_culm[M];
FUNCTION SIGNATURE (..., int i, int j, ...) {
pthread_mutex_lock(&lock_row[i]);
pthread_mutex_lock(&lock_culm[j]);
...CRITICAL CODE...
pthread_mute_unlock(&lock_row[j]);
pthread_mute_unlock(&lock_row[i]);
}
Can I get a deadlock between the first lock to the second? Let's say if we have a context switch after the first row, and other thread tries to lock something again? I don't really get this I would like to understand this a little further.
Besides the probable typo when you try to unlock sth twice, this example will never deadlock. Context switches between the two lock-calls pose no threat to the mechanism involved here. Think of it as a getting a higher level of allowance. With each lock gained, this process or thread is allowed to do more. Each locking is a gate which might hold the process up until no other lock-holder prevents the entering of the higher level. Whatever happens between the two lockings does not matter as long as it does not change that level of allowance.
pthread_mutex_lock(&lock_row[i]);
pthread_mutex_lock(&lock_culm[j]);
This is fine as long as all of your code takes these locks in this order - the lock_row lock first, then the lock_culm lock second. If another part of the code takes these same locks in the opposite order, then it can deadlock.
For this reason it is usual in complex programs to define the locking order - a global ordering of all the locks in the program, defining the order in which they should be taken.

Why is threading dangerous?

I've always been told to puts locks around variables that multiple threads will access, I've always assumed that this was because you want to make sure that the value you are working with doesn't change before you write it back
i.e.
mutex.lock()
int a = sharedVar
a = someComplexOperation(a)
sharedVar = a
mutex.unlock()
And that makes sense that you would lock that. But in other cases I don't understand why I can't get away with not using Mutexes.
Thread A:
sharedVar = someFunction()
Thread B:
localVar = sharedVar
What could possibly go wrong in this instance? Especially if I don't care that Thread B reads any particular value that Thread A assigns.
It depends a lot on the type of sharedVar, the language you're using, any framework, and the platform. In many cases, it's possible that assigning a single value to sharedVar may take more than one instruction, in which case you may read a "half-set" copy of the value.
Even when that's not the case, and the assignment is atomic, you may not see the latest value without a memory barrier in place.
MSDN Magazine has a good explanation of different problems you may encounter in multithreaded code:
Forgotten Synchronization
Incorrect Granularity
Read and Write Tearing
Lock-Free Reordering
Lock Convoys
Two-Step Dance
Priority Inversion
The code in your question is particularly vulnerable to Read/Write Tearing. But your code, having neither locks nor memory barriers, is also subject to Lock-Free Reordering (which may include speculative writes in which thread B reads a value that thread A never stored) in which side-effects become visible to a second thread in a different order from how they appeared in your source code.
It goes on to describe some known design patterns which avoid these problems:
Immutability
Purity
Isolation
The article is available here
The main problem is that the assignment operator (operator= in C++) is not always guaranteed to be atomic (not even for primitive, built in types). In plain English, that means that assignment can take more than a single clock cycle to complete. If, in the middle of that, the thread gets interrupted, then the current value of the variable might be corrupted.
Let me build off of your example:
Lets say sharedVar is some object with operator= defined as this:
object& operator=(const object& other) {
ready = false;
doStuff(other);
if (other.value == true) {
value = true;
doOtherStuff();
} else {
value = false;
}
ready = true;
return *this;
}
If thread A from your example is interrupted in the middle of this function, ready will still be false when thread B starts to run. This could mean that the object is only partially copied over, or is in some intermediate, invalid state when thread B attempts to copy it into a local variable.
For a particularly nasty example of this, think of a data structure with a removed node being deleted, then interrupted before it could be set to NULL.
(For some more information regarding structures that don't need a lock (aka, are atomic), here is another question that talks a bit more about that.)
This could go wrong, because threads can be suspended and resumed by the thread scheduler, so you can't be sure about the order these instructions are executed. It might just as well be in this order:
Thread B:
localVar = sharedVar
Thread A:
sharedVar = someFunction()
In which case localvar will be null or 0 (or some completeley unexpected value in an unsafe language), probably not what you intended.
Mutexes actually won't fix this particular issue by the way. The example you supply does not lend itself well for parallelization.

Simple POSIX threads question

I have this POSIX thread:
void subthread(void)
{
while(!quit_thread) {
// do something
...
// don't waste cpu cycles
if(!quit_thread) usleep(500);
}
// free resources
...
// tell main thread we're done
quit_thread = FALSE;
}
Now I want to terminate subthread() from my main thread. I've tried the following:
quit_thread = TRUE;
// wait until subthread() has cleaned its resources
while(quit_thread);
But it does not work! The while() clause does never exit although my subthread clearly sets quit_thread to FALSE after having freed its resources!
If I modify my shutdown code like this:
quit_thread = TRUE;
// wait until subthread() has cleaned its resources
while(quit_thread) usleep(10);
Then everything is working fine! Could someone explain to me why the first solution does not work and why the version with usleep(10) suddenly works? I know that this is not a pretty solution. I could use semaphores/signals for this but I'd like to learn something about multithreading, so I'd like to know why my first solution doesn't work.
Thanks!
Without a memory fence, there is no guarantee that values written in one thread will appear in another. Most of the pthread primitives introduce a barrier, as do several system calls such as usleep. Using a mutex around both the read and write introduces a barrier, and more generally prevents multi-byte values being visible in partially written state.
You also need to separate the idea of asking a thread to stop executing, and reporting that it has stopped, and appear to be using the same variable for both.
What's most likely to be happening is that your compiler is not aware that quit_thread can be changed by another thread (because C doesn't know about threads, at least at the time this question was asked). Because of that, it's optimising the while loop to an infinite loop.
In other words, it looks at this code:
quit_thread = TRUE;
while(quit_thread);
and thinks to itself, "Hah, nothing in that loop can ever change quit_thread to FALSE, so the coder obviously just meant to write while (TRUE);".
When you add the call to usleep, the compiler has another think about it and assumes that the function call may change the global, so it plays it safe and doesn't optimise it.
Normally you would mark the variable as volatile to stop the compiler from optimising it but, in this case, you should use the facilities provided by pthreads and join to the thread after setting the flag to true (and don't have the sub-thread reset it, do that in the main thread after the join if it's necessary). The reason for that is that a join is likely to be more efficient than a continuous loop waiting for a variable change since the thread doing the join will most likely not be executed until the join needs to be done.
In your spinning solution, the joining thread will most likely continue to run and suck up CPU grunt.
In other words, do something like:
Main thread Child thread
------------------- -------------------
fStop = false
start Child Initialise
Do some other stuff while not fStop:
fStop = true Do what you have to do
Finish up and exit
join to Child
Do yet more stuff
And, as an aside, you should technically protect shared variables with mutexes but this is one of the few cases where it's okay, one-way communication where half-changed values of a variable don't matter (false/not-false).
The reason you normally mutex-protect a variable is to stop one thread seeing it in a half-changed state. Let's say you have a two-byte integer for a count of some objects, and it's set to 0x00ff (255).
Let's further say that thread A tries to increment that count but it's not an atomic operation. It changes the top byte to 0x01 but, before it gets a chance to change the bottom byte to 0x00, thread B swoops in and reads it as 0x01ff.
Now that's not going to be very good if thread B want to do something with the last element counted by that value. It should be looking at 0x0100 but will instead try to look at 0x01ff, the effect of which will be wrong, if not catastrophic.
If the count variable were protected by a mutex, thread B wouldn't be looking at it until thread A had finished updating it, hence no problem would occur.
The reason that doesn't matter with one-way booleans is because any half state will also be considered as true or false so, if thread A was halfway between turning 0x0000 into 0x0001 (just the top byte), thread B would still see that as 0x0000 (false) and keep going (until thread A finishes its update next time around).
And if thread A was turning the boolean into 0xffff, the half state of 0xff00 would still be considered true by thread B so it would do its thing before thread A had finished updating the boolean.
Neither of those two possibilities is bad simply because, in both, thread A is in the process of changing the boolean and it will finish eventually. Whether thread B detects it a tiny bit earlier or a tiny bit later doesn't really matter.
The while(quite_thread); is using the value quit_thread was set to on the line before it. Calling a function (usleep) induces the compiler to reload the value on each test.
In any case, this is the wrong way to wait for a thread to complete. Use pthread_join instead.
You're "learning" multhithreading the wrong way. The right way is to learn to use mutexes and condition variables; any other solution will fail under some circumstances.

When should the Win32 InterlockedExchange function be used?

I came across the function InterlockedExchange and was wondering when I should use this function. In my opinion, setting a 32 Bit value on an x86 processor should always be atomic?
In the case where I want to use the function, the new value does not depend on the old value (it is not an increment operation).
Could you provide an example where this method is mandatory (I'm not looking for InterlockedCompareExchange)
InterlockedExchange is both a write and a read -- it returns the previous value.
This is necessary to ensure another thread didn't write a different value just after you did. For example, say you're trying to increment a variable. You can read the value, add 1, then set the new value with InterlockedExchange. The value returned by InterlockedExchange must match the value you originally read, otherwise another thread probably incremented it at the same time, and you need to loop around and try again.
As well as writing the new value, InterlockedExchange also reads and returns the previous value; this whole operation is atomic. This is useful for lock-free algorithms.
(Incidentally, 32-bit writes are not guaranteed to be atomic. Consider the case where the write is unaligned and straddles a cache boundary, for instance.)
In a multi-processor or multi-core machine each core has it's own cache - so each core has each own potentially different "view" of what the content of the system memory is.
Thread synchronization mechanisms take care of synchronizing between cores, for more information look at http://blogs.msdn.com/oldnewthing/archive/2008/10/03/8969397.aspx or google for acquire and release semantics
Setting a 32-bit value is atomic, but only if you're setting a literal.
b = a is 2 operations:
mov eax,dword ptr [a]
mov dword ptr [b],eax
Theoretically there could be some interruption between the first and second operation.
Writing a value is never atomic by default. When you write a value to a variable, several machine instructions are generated. With modern, preemptive OSes, the OS might switch to another thread between the individual operations of the write.
This is even more a problem on multi-processor machines, where several threads could be executing at the same time, and trying to write to a single memory location simultaneously.
Interlocked operations avoid this by using specialized instructions to make the write (x86 has dedicated instructions for this kind of situation), which do the read-modify-write in one instruction. These instructions also lock the memory bus of all processors, to ensure that no other executing thread could be writing to the value at the same time.
InterlockedExchange makes sure that the change of a variable and the return of its original value are not interrupted by other threads.
So, if 'i' is an int, these calls (taken individually) do not need InterlockedExchange around 'i':
a = i;
i = 9;
i = a;
i = a + 9;
a = i + 9;
if(0 == i)
None of these statements rely upon BOTH the initial AND final values of 'i'. But these following calls DO need InterlockedExchange around 'i':
a = i++; //a = InterlockedExchange(&i, i + 1);
Without it, two threads running through this same code might get the same value of 'i' assigned to 'a' or 'a' may unexpectedly skip two or more numbers.
if(0 == i++) //if(0 == InterlockedExchange(&i, i + 1))
Two threads may both execute the code that is only supposed to happen once.
etc.
wow, so many conflicting answers. Hard to sift through who's right, who's wrong, and what information is misleading.
I'm unsure of the answer too, given the above half-answers, but I think it works like this, I may be wrong, and it will be interesting to find out if I am:
32-bit read & writes ARE atomic, but depending on your code, that may not mean much.
don't worry about non-aligned read/writes. ALL 32-bit writes to a 32-bit variable have to be aligned or the machine page-faults.
don't worry about a write wrapping around the end of a cached page, that can't happen.
If you need to write-then-read on one thread, and you're writing on another thread, then you need to use InterlockedExchange. If you're simply reading the value on one thread, and writing it on another, then you don't need to use it, but those values may be wiggly because of multithreading.

Resources