How to block a thread using semaphores with an OR primitive?

I'm using an example to evaluate my question further. Consider three threads: T_1, T_2 and T_3.
If T_2 cannot execute until T_1 has finished its main code segment, then I simply place a semwait() at the beginning of T_2 and a semsignal() on the same semaphore at the end of T_1.
If T_2 cannot execute until T_1 AND T_3 have finished their main code segments, then I simply make T_2 wait for two semsignal() calls, one at the end of T_1 and one at the end of T_3.
But what if T_2 cannot execute until T_1 OR T_3 has finished its main code segment? If I make it wait on some OR expression, it will likely block on the first operand without even testing the second one, in case T_1 does not allow it to run but T_3 does. So my question is: how can I make a thread wait for such a condition?

Can you do it like this?

T2:
    ...
    sem_wait(s1)            // blocks until the first of T1/T3 signals

T1 and T3, at the end of their main segments:
    ...
    mutex_lock(m1)
    if (!signaled) {        // only the first thread to finish signals
        sem_signal(s1)
        signaled = true
    }
    mutex_unlock(m1)
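
Fleshing that idea out, here is a minimal runnable sketch in C with POSIX semaphores and pthreads (the helper name signal_once and the thread bodies are illustrative additions, not from the original answer):

#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

static sem_t s1;                               /* T2 blocks on this; initial count 0 */
static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static bool signaled = false;                  /* ensures s1 is posted only once */

/* called by T1 and T3 at the end of their main code segments */
static void signal_once(void)
{
    pthread_mutex_lock(&m1);
    if (!signaled) {
        sem_post(&s1);                         /* wake T2 on the first finisher only */
        signaled = true;
    }
    pthread_mutex_unlock(&m1);
}

static void *t1_or_t3(void *arg)
{
    /* ... main code segment ... */
    signal_once();
    return NULL;
}

static void *t2(void *arg)
{
    sem_wait(&s1);                             /* proceeds as soon as T1 OR T3 is done */
    /* ... T2's code ... */
    return NULL;
}

int main(void)
{
    pthread_t a, b, c;
    sem_init(&s1, 0, 0);
    pthread_create(&c, NULL, t2, NULL);
    pthread_create(&a, NULL, t1_or_t3, NULL);
    pthread_create(&b, NULL, t1_or_t3, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_join(c, NULL);
    sem_destroy(&s1);
    return 0;
}

Note that the signaled flag is only needed if a stray extra count on s1 would matter later (for example, if T2 loops and waits again); for a single wait, letting both T1 and T3 post the semaphore would also work.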


How to ensure the comparison result still holds in multi-threading?

Suppose there are 3 threads.
Threads 1 and 2 will atomically increase or decrease a global variable X.

thread 1:
    atomic_increase(X)

thread 2:
    atomic_decrease(X)

Thread 3 will check whether X is greater than some predefined value and act accordingly.

thread 3:
    if( X > 5 ) {... logic 1 ...}
    else {... logic 2 ....}

I think the atomic_xxx operations are not enough. They can only synchronize the modifications between threads 1 and 2. What if X is changed by thread 1 or 2 after thread 3 finishes the comparison and enters logic 1?
Do I have to use a mutex to synchronize all 3 threads when modifying or reading X?
Update: BTW, logic 1 and logic 2 don't modify X.
In short, yes: reads also need to be synchronized in some way, otherwise the risk of inconsistent reads is real. A read performed between the read and the write of atomic_increase will be inconsistent.
However, if logic 1 or logic 2 does anything to X, your problems don't stop there. In that case you need the concept of a transaction, which starts with a read (the X > 5 check) and ends with a write (logic 1 or logic 2).
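To sketch that transaction idea in code, here is a minimal C version using a pthread mutex (the names x_lock and check_and_act are illustrative; thread 2's atomic_decrease would take the same lock):

#include <pthread.h>

static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;
static int X = 0;

void atomic_increase(void)      /* thread 1 */
{
    pthread_mutex_lock(&x_lock);
    X = X + 1;
    pthread_mutex_unlock(&x_lock);
}

void check_and_act(void)        /* thread 3 */
{
    pthread_mutex_lock(&x_lock);   /* the read and the dependent action form one unit */
    if (X > 5) {
        /* logic 1: X cannot change until we unlock */
    } else {
        /* logic 2 */
    }
    pthread_mutex_unlock(&x_lock);
}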
Yes, and the answer is a happens-before link. Let's say Thread-1 has started executing the atomic_increase method. It will hold the lock and enter the synchronized block to update X.
private void atomic_increase() {
    synchronized (lock) {
        X = X + 1; // <-- Thread-1 entered synchronized block, yet to update variable X
    }
}
Now, for Thread-3 to run its logic, it needs to read the variable X, and if that read is not synchronized (on the same monitor), it can return an old value, since X may not have been updated by Thread-1 yet.
private void runLogic() {
    if (X > 5) { // <-- reading X here can be inconsistent: no
                 //     happens-before between atomic_increase and runLogic
    } else {
    }
}
We could have prevented this by maintaining a happens-before link between the atomic operation and the runLogic method. If runLogic is synchronized (on the same monitor), then it has to wait until X has been updated by Thread-1, so we are guaranteed to get the last updated value of X.
private void runLogic() {
    synchronized (lock) {
        if (X > 5) { // <-- reading X here will be consistent: there is a
                     //     happens-before between atomic_increase and runLogic
        } else {
        }
    }
}
The answer depends on what your application does. If neither logic 1 nor logic 2 modifies X, it is quite possible that there is no need for additional synchronization (besides using an atomic_load to read X).
I assume you use intrinsics for atomic operations, and not simply an increment inside a mutex (or inside a synchronized block in Java). E.g. in Java there is an AtomicInteger class with methods such as incrementAndGet and get. If you use them, there is probably no need for additional synchronization, but it depends on what you actually want to achieve with logic 1 or logic 2.
If you want to, e.g., display a message when X > 5, then you can do that. By the time the message is displayed the value of X may have already changed, but the fact remains that the message was triggered by X being greater than 5 at some point in time.
In other words, without additional synchronization you only have the guarantee that logic 1 will be called if X becomes greater than 5, but there is no guarantee that this will remain true during the execution of logic 1. That may or may not be acceptable for you.
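To make the last point concrete, here is a minimal C11 sketch of the lock-free variant (C chosen to match the question's atomic_increase/atomic_load pseudocode; the function names are illustrative stubs):

#include <stdatomic.h>

static atomic_int X = 0;

void thread1_step(void) { atomic_fetch_add(&X, 1); }  /* atomic_increase(X) */
void thread2_step(void) { atomic_fetch_sub(&X, 1); }  /* atomic_decrease(X) */

void thread3_step(void)
{
    /* atomic_load returns a consistent snapshot of X, but only a snapshot: */
    /* X may change again immediately after the comparison.                 */
    if (atomic_load(&X) > 5) {
        /* logic 1: X was > 5 at the moment of the load */
    } else {
        /* logic 2 */
    }
}

If logic 1 must run while X > 5 still holds, this is not enough, and you are back to the mutex-based transaction sketched earlier in this thread.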

Does the main thread wait for the fiber returned by ConcurrentEffect, or not?

We are doing IO operations that we want to run in a separate thread, and the main thread should not wait for this operation.

def seperateThread(action: F[Unit]): F[Unit] =
    ConcurrentEffect[F].start(action).void

If I call this function like below:

for {
    _ <- service.seperateThread(request, languageId, cacheItinerary, slices, pricing)
} yield {}

will it run the seperateThread operation in a different fiber and return F[Unit] immediately, or wait for the operation to complete?
Starting a fiber is a non-blocking operation, so the application flow will move on to the next instruction right away.
In order to wait for the operation running in another fiber to complete, you need to invoke join on the fiber object. You can't do that in your implementation, since you've called void and thus discarded the returned reference to the fiber.
If you change your method like this:

def seperateThread[F[_]: ConcurrentEffect: Functor: Sync](action: F[Unit]): F[Fiber[F, Unit]] =
    ConcurrentEffect[F].start(action)

then you'd be able to use the reference to the created fiber to join:

for {
    fiber  <- ConcurrentEffect[IO].start(IO(println("Hello from another fiber!")))
    // _ <- do some more operations in parallel ...
    result <- fiber.join // here you can access the value returned by the fiber
                         // (in your case it's Unit, so you can just ignore it)
} yield result

Using the fiber's start directly is not advised in most cases, since it can lead to resource leaks. You should consider using background instead, which creates a Resource that will automatically cancel and clean up the fiber at the end of processing.

Acquire/Release semantics

In the answer StoreStore reordering happens when compiling C++ for x86, @Peter Cordes wrote:
For Acquire/Release semantics to give you the ordering you want, the last store has to be the release-store, and the acquire-load has to be the first load. That's why I made y a std::atomic, even though you're setting x to 0 or 1 more like a flag.
I would like to ask some questions to better understand it.
I have also read http://preshing.com/20120913/acquire-and-release-semantics. That article contains the following example:
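(The snippet below is reconstructed from the linked article: A is a plain int and Ready is a std::atomic<int>, both initially 0.)

// Thread 1:
A = 42;
Ready.store(1, std::memory_order_release);

// Thread 2:
int r1 = Ready.load(std::memory_order_acquire);
int r2 = A;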
And it is written that it is ensured that r2 == 42. I don't understand why. To my eye the following is possible:
1. Thread 2 executes the first line. It is atomic and memory_order_acquire, so it must be executed before the following memory operations.
2. Now, Thread 2 executes the second line: int r2 = A, and r2 equals 0.
3. Then Thread 1 executes its code.
Why am I wrong?
The complete quote is:
If we let both threads run and find that r1 == 1, that serves as
confirmation that the value of A assigned in Thread 1 was passed
successfully to Thread 2. As such, we are guaranteed that r2 == 42.
The acquire-release semantics only guarantee that:
A = 42 doesn't happen after Ready = 1 in thread 1;
r2 = A doesn't happen before r1 = Ready in thread 2.
So the value of r1 has to be checked in thread 2 to be sure that A has been written by thread 1. The scenario in the question can indeed happen, but r1 will be 0 in that case.

Multithread+Recursion strategies

I am just starting to learn the ins and outs of multithreaded programming and have a few basic questions that, once answered, should keep me occupied for quite some time. I understand that multithreading loses its effectiveness once you have created more threads than there are cores (due to context switching and cache flushing). With that understood, I can think of two ways to employ multithreading in a recursive function... but am not quite sure what the common way to approach the problem is. One seems much more complicated, perhaps with a higher payoff... but that's what I hope you will be able to tell me.
Below is pseudo-code for two different methods of multithreading a recursive function. I have used the terminology of merge sort for simplicity, but it's not that important. It is easy to see how to generalize the methods to other problems. Also, I will personally be employing these methods using the pthreads library in C, so the thread syntax mildly reflects this.
Method 1:
main ()
{
    A = array of length N
    NUM_CORES = get number of functional cores
    chunk[NUM_CORES] = array of indices partitioning A into (N / NUM_CORES) sized chunks
    thread_id[NUM_CORES] = array of thread ids
    thread[NUM_CORES] = array of thread type

    // start NUM_CORES threads, one working on each chunk of A
    for i = 0 to (NUM_CORES - 1) {
        thread_id[i] = thread_start(thread[i], MergeSort, chunk[i])
    }

    // wait for all threads to finish
    // merge chunks appropriately
    exit
}

MergeSort ( chunk )
{
    MergeSort ( lowerSubChunk )
    MergeSort ( higherSubChunk )
    Merge(lowerSubChunk, higherSubChunk)
}
// Merge(,) not shown
Method 2:
main ()
{
    A = array of length N
    NUM_CORES = get number of functional cores
    chunk = indices 0 and N
    thread_id[NUM_CORES] = array of thread ids
    thread[NUM_CORES] = array of thread type

    // lock variable aka mutex
    THREADS_IN_USE = 1
    MergeSort( chunk )
    exit
}

MergeSort ( chunk )
{
    lock THREADS_IN_USE
    if ( THREADS_IN_USE < NUM_CORES ) {
        FREE_CORE = find index of unused core
        thread_id[FREE_CORE] = thread_start(thread[FREE_CORE], MergeSort, lowerSubChunk)
        THREADS_IN_USE++
        unlock THREADS_IN_USE

        MergeSort( higherSubChunk )

        // wait for thread_id[FREE_CORE] and current thread to finish
        lock THREADS_IN_USE
        THREADS_IN_USE--
        unlock THREADS_IN_USE

        Merge(lowerSubChunk, higherSubChunk)
    }
    else {
        unlock THREADS_IN_USE
        MergeSort( lowerSubChunk )
        MergeSort( higherSubChunk )
        Merge(lowerSubChunk, higherSubChunk)
    }
}
// Merge(,) not shown
Visually, one can think of the differences between these two methods as follows:
Method 1: creates NUM_CORES separate recursion trees, each one having a single core traversing it.
Method 2: creates a single recursion tree but has all cores traversing it. In particular, whenever there is a free core, it is set to work on the "left child subtree" of the first node where MergeSort is called after the core is freed.
The problem with Method 1 is that if the running time of the recursive function varies with the distribution of values within each initial subchunk (i.e. chunk[i]), one thread could finish much faster than the others, leaving a core sitting idle while the rest finish. With merge sort this is not likely to be the case, since the work of MergeSort happens in Merge, whose runtime isn't affected much by the distribution of values in the (sorted) subchunks. However, with a more involved recursive function, the running time on one subchunk could be much longer!
With Method 2 it is possible to have the same problem. Again, with merge sort it's not clear, since the running time for each subchunk is likely to be similar, and the line //wait for thread_id[FREE_CORE] and current thread to finish would also require one core to wait for another. However, with Method 2 all calls to Merge run as soon as possible, as opposed to Method 1, where one must wait for NUM_CORES calls to MergeSort to finish and then do NUM_CORES - 1 merges afterward (although you can multithread this as well... to an extent).
Are both of these methods used in practice? Are there situations where one is more beneficial than the other? Is this the correct way to implement Method 2, even though the syntax might not be completely correct? (In this case, is THREADS_IN_USE a semaphore?)
Thanks so much for your help!
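
For reference, the skeleton of Method 1 in actual C with pthreads might look roughly like this (a sketch, not a drop-in implementation: the Chunk struct and worker names are illustrative, and sorting/merging are left as stubs, as in the pseudocode above):

#include <pthread.h>

#define NUM_CORES 4                       /* stand-in for querying functional cores */
#define N 1024

typedef struct { int *data; int lo, hi; } Chunk;

static void merge_sort(int *data, int lo, int hi)
{
    /* recursive MergeSort on data[lo..hi) as in the pseudocode; not shown */
}

static void *worker(void *arg)
{
    Chunk *c = arg;
    merge_sort(c->data, c->lo, c->hi);    /* each thread sorts its own chunk */
    return NULL;
}

int main(void)
{
    static int A[N];
    pthread_t tid[NUM_CORES];
    Chunk chunk[NUM_CORES];

    /* partition A into NUM_CORES equal chunks and start one thread per chunk */
    for (int i = 0; i < NUM_CORES; i++) {
        chunk[i].data = A;
        chunk[i].lo = i * (N / NUM_CORES);
        chunk[i].hi = (i + 1) * (N / NUM_CORES);
        pthread_create(&tid[i], NULL, worker, &chunk[i]);
    }

    /* wait for all threads to finish */
    for (int i = 0; i < NUM_CORES; i++)
        pthread_join(tid[i], NULL);

    /* final merge of the NUM_CORES sorted chunks not shown, as in the pseudocode */
    return 0;
}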

Thread and Synchronization

I'm confused about how threads and synchronization work. I am working through a sample problem that is described like so:
There are two threads: P and Q. The variable, counter, is shared by both threads.
Modification of counter in one thread is visible to the other thread. The
increment instruction adds one to the variable, storing the new value.
1 global integer counter = 0
2
3 thread P()
4 incr(counter)
5 print counter
6 end
7
8 thread Q()
9 print counter
10 incr(counter)
11 print counter
12 incr(counter)
13 end
There are three print statements that output the value of counter. In the output
list below, indicate whether the given output is possible and if it is, give
the interleaving instructions (using thread and line numbers) of P and Q that
can lead to the output.
The example asks whether the output 122 is possible, and says it can be produced by P4, Q9, Q10, P5, Q11, Q12. I can't wrap my head around how this works.
Assume thread P starts first and increments "counter" by one. Then it's suspended and thread Q starts, reads "counter" and prints its value ("1"). Next thread Q increments "counter", which is now "2". Then thread Q gets suspended and thread P continues. It now reads "counter" and prints its value ("2"). Thread P terminates. Thread Q continues, reads and prints "counter" ("2"). It then increments "counter" by one.
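Written out against the line numbers from the exercise, that interleaving is:

P4:  counter = 1
Q9:  prints 1
Q10: counter = 2
P5:  prints 2
Q11: prints 2
Q12: counter = 3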
The output therefore is: "122"
That's one possible sequence of execution. Generally speaking, you can never tell when a thread gets suspended and when it continues; that's the whole point of this exercise. By adding synchronization mechanisms (which this example completely lacks) you can regain control over the sequence of execution.
