Multithreading: Conflicting operations by threads vs data race

Is there a difference between a thread conflict and a data race?
From what I've learnt, conflicting operations occur when two threads access the same memory location and at least one of the accesses is a write.
Here is what Wikipedia has to say about a data race / race condition.
How are the two different?

I have finally found a good answer to this question.
TL;DR:
Conflicting operations:
- involve multiple threads,
- accessing the same memory location,
- where at least one of the accesses is a write.
Data race: unordered conflicting operations.
LONG VERSION:
I will explain with an example how conflicting operations arise and how to tell whether they are data-race free.
Consider Thread 1 and Thread 2, and a shared flag done:
AtomicBoolean done = new AtomicBoolean(false);
int x = 0;

Thread 1:
    x = f();
    done.set(true);

Thread 2:
    while (!done.get()) {
        /* spin */
    }
    y = g(x);
Here, done.set(true) / done.get() and x = f() / y = g(x) are the conflicting pairs. However, the language's memory model defines two relations: synchronizes-with and happens-before. Because done is atomic, the write done.set(true) synchronizes with the done.get() that reads the value it wrote, and that synchronizes-with edge gives us a happens-before ordering between the pair.
Now, x = f() happens before done.set(true) in Thread 1 (program order), done.set(true) happens before the done.get() that observes true, and done.get() happens before y = g(x) in Thread 2 (program order again). Since happens-before is transitive, x = f() happens before y = g(x).
Thus the above example is ordered and consequently data-race free.
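To make this concrete, here is a minimal, runnable Java sketch of the same flag handoff. The names are stand-ins for the f and g above (here they just produce and consume an int):

import java.util.concurrent.atomic.AtomicBoolean;

public class FlagHandoff {
    static final AtomicBoolean done = new AtomicBoolean(false);
    static int x = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            x = 42;          // stands in for x = f(): a plain, non-atomic write
            done.set(true);  // atomic write: synchronizes-with the read in t2
        });
        Thread t2 = new Thread(() -> {
            while (!done.get()) { /* spin until the flag is set */ }
            int y = x + 1;   // stands in for y = g(x): guaranteed to see x = 42
            System.out.println("y = " + y);
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}

Because of the happens-before chain through done, the read of x in t2 can never observe the stale initial value 0.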


Is this a race condition? If not, can I ever write a program like this (is this program WRONG)?

I'm really new to multi-threaded programming and really confused about the definitions.
Say I have two threads and shared variables
x, y, z = 0;

Thread 1:
    lock(x);
    lock(y);
    x = y + 1;
    unlock(x);
    unlock(y);

Thread 2:
    lock(x);
    lock(z);
    z = x + 1;
    unlock(x);
    unlock(z);
You can see that the final value of z depends on which thread executes first: if Thread 1 runs before Thread 2, z = 2; if Thread 2 runs first, z = 1. According to many existing answers, e.g. "Is this a race condition?", I believe many people think it's not. But a result that is unpredictable and depends on thread scheduling still sounds weird to me. Is this a semantic mistake? Do people ever write multi-threaded programs like this? Thanks for any comments!
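For concreteness, the scenario can be written as runnable Java. This is a sketch, using one ReentrantLock per variable to mirror the pseudo-code's lock(x)/lock(y): every access to shared state is lock-protected, so there is no data race, yet z still comes out as either 1 or 2 depending on scheduling. That is a race condition in the broad sense, but not a data race.

import java.util.concurrent.locks.ReentrantLock;

public class OrderDependent {
    static int x = 0, y = 0, z = 0;
    static final ReentrantLock lockX = new ReentrantLock();
    static final ReentrantLock lockY = new ReentrantLock();
    static final ReentrantLock lockZ = new ReentrantLock();

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            lockX.lock();
            lockY.lock();
            try {
                x = y + 1;
            } finally {
                lockY.unlock();
                lockX.unlock();
            }
        });
        Thread t2 = new Thread(() -> {
            lockX.lock();
            lockZ.lock();
            try {
                z = x + 1;
            } finally {
                lockZ.unlock();
                lockX.unlock();
            }
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("z = " + z); // 1 or 2, depending on which thread ran first
    }
}

Note that both threads acquire lockX first, so this particular lock ordering cannot deadlock.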

Having thread-local arrays in cython so that I can resize them?

I have an interval-treeish algorithm I would like to run in parallel for many queries using threads. Problem is that then each thread would need its own array, since I cannot know in advance how many hits there will be.
There are other questions like this, and the solution suggested is always to have an array of size (K, t), where K is the output length and t is the number of threads. This does not work for me, as K might be different for each thread, and each thread might need to resize the array to fit all the results it gets.
Pseudocode:
for i in prange(len(starts)):
    qs, qe, qx = starts[i], ends[i], index[i]
    results = t.search(qs, qe)
    if len(results) + nfound < len(output):
        # add results to output
    else:
        # resize array
        # then add results
The usual pattern is that every thread gets its own container, which is a trade-off between speed/complexity and memory overhead:
there is no need to lock access to this container, because only one thread accesses it;
there is much less overhead compared to "own container for every task (i.e. every i-value)".
After the parallel section, the data must either be collected into a final container in a post-processing step (which could also happen in parallel), or the subsequent algorithms must be able to handle a collection of containers.
Here is an example using a C++ vector (which already has memory management and automatic resizing built in):
%%cython -+ -c=/openmp --link-args=/openmp
from cython.parallel import prange, threadid
from libcpp.vector cimport vector
cimport openmp

def calc_in_parallel(N):
    cdef int i, k, tid
    cdef int n = N
    cdef vector[vector[int]] vecs
    # every thread gets its own container
    vecs.resize(openmp.omp_get_max_threads())
    for i in prange(n, nogil=True):
        tid = threadid()
        for k in range(i):
            # use the container of this thread
            vecs[tid].push_back(k)  # dummy for calculation
    return vecs
Using omp_get_max_threads() for the number of threads will overestimate the real number of threads in many cases. It is probably more robust to set the number of threads explicitly in prange, i.e.
...
NUM_THREADS = 2
vecs.resize(NUM_THREADS)
for i in prange(n, nogil=True, num_threads=NUM_THREADS):
    ...
A similar approach can be applied in pure C, but more boilerplate code (manual memory management) will be needed in that case.
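For comparison, here is the same per-thread-container pattern sketched in Java. All names are illustrative, and a static split of the i-values stands in for prange's scheduling; the point is that each worker appends only to its own list, so no locking is needed until the merge after join():

import java.util.ArrayList;
import java.util.List;

public class PerThreadContainers {
    public static void main(String[] args) throws InterruptedException {
        final int n = 8, numThreads = 2;
        // One result list per thread: no locking needed while the threads run.
        List<List<Integer>> perThread = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) perThread.add(new ArrayList<>());

        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int tid = t;
            workers[t] = new Thread(() -> {
                List<Integer> mine = perThread.get(tid);   // container owned by this thread
                for (int i = tid; i < n; i += numThreads)  // static split of the i-values
                    for (int k = 0; k < i; k++)
                        mine.add(k);                       // dummy result, mirrors push_back above
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();

        // Post-processing step: merge the per-thread containers into one.
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> part : perThread) merged.addAll(part);
        System.out.println(merged.size());
    }
}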

Signal-handling between threads

I'm new to concurrent programming, and I came across this question that I can't really understand what's wrong with. Consider the following pseudo-code, where x is a shared variable initialized to 0, c is a condition variable/semaphore, and each thread executes:
    signal(c)
    wait(c)
    x = x + 1
    signal(c)
What are the possible results for x when running 2 threads concurrently on a single processor?
After both threads have executed, either everything went fine and x = 2, or both x = x + 1 lines are executed at the same time, in which case both threads read the initial value of x as 0 and x = 1.
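The same behaviour can be reproduced with a runnable Java sketch, assuming signal/wait map onto a counting semaphore's release/acquire (an assumption; the pseudo-code could equally mean a condition variable):

import java.util.concurrent.Semaphore;

public class SignalDemo {
    static int x = 0;
    static final Semaphore c = new Semaphore(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable body = () -> {
            c.release();                 // signal(c)
            try {
                c.acquire();             // wait(c)
            } catch (InterruptedException e) {
                return;
            }
            x = x + 1;                   // unprotected read-modify-write
            c.release();                 // signal(c)
        };
        Thread t1 = new Thread(body);
        Thread t2 = new Thread(body);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("x = " + x);  // usually 2, but 1 if the increments interleave
    }
}

The semaphore never blocks both threads here (each release precedes an acquire), so the only hazard left is the unprotected x = x + 1.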

executing a command on a variable used by multiple threads

Given the threading scenario where two threads execute the computation
    x = x + 1
where x is a shared variable, what are the possible results? Describe why each result could happen.
This is a textbook problem from my OS book, and I was curious whether I need more information to answer it, such as what x is initialized to and whether the threads execute this command repeatedly or just once. My original answer was that there are two possible results, depending on the order in which the OS schedules the threads.
This is a rather simple task, so there probably isn't too much that could go wrong.
The only issue I can immediately think of is one thread using an old value of x in its calculation.
e.g., starting with x = 2:
1) thread A reads x = 2
2) thread B reads x = 2
3) thread A writes x = 2 + 1, so x = 3
4) thread B writes x = 2 (its stale value of x) + 1, so x = 3 when it should be 4
This would be even more apparent if more than one thread read the value before the first thread wrote.
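A minimal Java sketch of this lost update (the class name is illustrative; repeating the increment many times makes the unlucky interleaving easy to observe):

public class LostUpdate {
    static int x = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable inc = () -> {
            for (int i = 0; i < 100_000; i++) {
                x = x + 1;   // read-modify-write, not atomic
            }
        };
        Thread a = new Thread(inc);
        Thread b = new Thread(inc);
        a.start(); b.start();
        a.join(); b.join();
        // Rarely 200000: updates are lost whenever both threads read the same stale value.
        System.out.println("x = " + x);
    }
}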

Do atomic operations work the same across processes as they do across threads?

Obviously, atomic operations make sure that different threads don't clobber a value. But is this still true across processes, when using shared memory? Even if the processes happen to be scheduled by the OS to run on different cores? Or across different distinct CPUs?
Edit: Also, if it's not safe, is it not safe even on an operating system like Linux, where processes and threads are the same from the scheduler's point of view?
tl;dr: Read the fine print in the documentation of the atomic operations. Some will be atomic by design but may trip over certain variable types. In general, though, an atomic operation will maintain its contract between different processes just as it does between threads.
An atomic operation really only ensures that you won't have an inconsistent state if called by two entities simultaneously. For example, an atomic increment that is called by two different threads or processes on the same integer will always behave like so:
x = initial value (zero for the sake of this discussion)
Entity A increments x and returns the result to itself: result = x = 1.
Entity B increments x and returns the result to itself: result = x = 2.
where A and B indicate the first and second thread or process that makes the call.
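For the atomic case, here is a minimal in-process Java sketch using AtomicInteger (the cross-process variant needs OS-level shared memory, which plain Java does not show as directly):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicIncrement {
    static final AtomicInteger x = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable entity = () -> {
            int result = x.incrementAndGet();  // atomic read-modify-write
            System.out.println("got " + result);
        };
        Thread a = new Thread(entity);
        Thread b = new Thread(entity);
        a.start(); b.start();
        a.join(); b.join();
        // One thread always observes 1 and the other 2; they can never both observe 1.
        System.out.println("final x = " + x.get());
    }
}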
A non-atomic operation can result in inconsistent or generally crazy results due to race conditions, incomplete writes to the address space, etc. For example, you can easily see this:
x = initial value = zero again.
Entity A calls x = x + 1. To evaluate x + 1, A checks the value of x (zero) and adds 1.
Entity B calls x = x + 1. To evaluate x + 1, B checks the value of x (still zero) and adds 1.
Entity B (by luck) finishes first and assigns the result of x + 1 = 1 (step 3) to x. x is now 1.
Entity A finishes second and assigns the result of x + 1 = 1 (step 2) to x. x is now 1.
Note the race condition as entity B races past A and completes the expression first.
Now imagine if x were a 64-bit double that is not ensured to have atomic assignments. In that case you could easily see something like this:
A 64 bit double x = 0.
Entity A tries to assign 0x1122334455667788 to x. The first 32 bits are assigned first, leaving x with 0x1122334400000000.
Entity B races in and assigns 0xffeeddccbbaa9988 to x. By chance, both 32 bit halves are updated and x is now = 0xffeeddccbbaa9988.
Entity A completes its assignment with the second half and x is now = 0xffeeddcc55667788.
These non-atomic assignments are some of the most hideous concurrent bugs you'll ever have to diagnose.
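Java makes this concrete: the language spec (JLS 17.7) explicitly permits a write to a non-volatile long or double to be split into two 32-bit writes. Below is a sketch that hunts for a torn value; on common 64-bit JVMs you will likely see no tearing (64-bit stores happen to be atomic there), and because x is non-volatile, whether the reader observes updates at all is itself JVM-dependent:

public class TornWrites {
    // Non-volatile long: the JLS allows this 64-bit write to be split
    // into two separate 32-bit writes.
    static long x = 0L;

    public static void main(String[] args) {
        Thread a = new Thread(() -> { while (true) x = 0x1122334455667788L; });
        Thread b = new Thread(() -> { while (true) x = -1L; }); // all bits set
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        for (long i = 0; i < 100_000_000L; i++) {
            long seen = x;
            if (seen != 0L && seen != -1L && seen != 0x1122334455667788L) {
                // A mix of the two writes: a value neither thread ever wrote.
                System.out.println("torn value: 0x" + Long.toHexString(seen));
                return;
            }
        }
        System.out.println("no tearing observed (expected on 64-bit JVMs)");
    }
}

Declaring x as volatile (or using AtomicLong) removes the permission to tear and restores atomic 64-bit assignment.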
