Executing threads concurrently - multithreading

I am currently studying for my upcoming midterm and am confused about this question. I have answered it, but I am not completely sure my answer is correct:
Given two threads:
Initially x = 0
Thread 1: x = x+1
Thread 2: x = x + 2
What are the possible values of x in these cases?
My answer to this is three possible values:
If the two do not run at exactly the same time but sequentially, x can be 3. If they run at the same time, then we get two more values, 1 and 2. Am I correct in this thinking? Since these are threads in the same process, I am thinking they share memory, which is also part of my reasoning.
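To make the interleaving concrete, here is a minimal C/pthreads sketch of the scenario; the function names and the final printf are my own additions, not part of the exercise:

#include <pthread.h>
#include <stdio.h>

int x = 0;  /* shared between the threads, since they are in one process */

/* Each assignment compiles to a separate load, add, and store, so the
 * two threads' steps can interleave (formally this is a data race; it
 * is shown only to illustrate the possible schedules). */
void *add1(void *arg) { x = x + 1; return NULL; }
void *add2(void *arg) { x = x + 2; return NULL; }

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, add1, NULL);
    pthread_create(&t2, NULL, add2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("x = %d\n", x);  /* 3 if sequential; 1 or 2 if the loads interleave */
    return 0;
}

Since each assignment is a separate load, add, and store, the schedule decides whether one thread's store overwrites the other's, which is where the values 1 and 2 come from.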

Difference between mutex/spinlock/semaphore in this use case?

I'm not sure if my answers to the following exercise are correct:
Suppose the program is run with two threads. Further suppose that the
following sequence of events occurs:
Time  Thread 0                     Thread 1
0     pthread_mutex_lock(&mut0)    pthread_mutex_lock(&mut1)
1     pthread_mutex_lock(&mut1)    pthread_mutex_lock(&mut0)
b. Would this be a problem if the program used busy-waiting (with two flag
variables) instead of mutexes?
c. Would this be a problem if the program used semaphores instead of mutexes?
Aren't the outcomes for these exactly the same, in that they'd both result in a deadlock?
For b) if we use a spinlock, the outcome would look something like this, right? (Let flag0 = flag1 = 1)
Time  Thread 0            Thread 1
0     flag0--;            flag1--;
1     while(flag1 == 0);  while(flag0 == 0);
For c) I'm assuming they're referring to a binary semaphore, which is effectively the same thing as a mutex, so it would also result in a deadlock.
Is there a mistake in my reasoning? My answers seem suspiciously simple, but I don't see how spinlocks or semaphores would change anything.
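For reference, here is roughly what the exercise's timeline looks like as real code; this is a sketch I wrote around the mut0/mut1 names from the question, not the book's actual program:

#include <pthread.h>

pthread_mutex_t mut0 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mut1 = PTHREAD_MUTEX_INITIALIZER;

void *thread0(void *arg) {
    pthread_mutex_lock(&mut0);  /* time 0: thread 0 holds mut0 */
    pthread_mutex_lock(&mut1);  /* time 1: blocks, thread 1 holds mut1 */
    pthread_mutex_unlock(&mut1);
    pthread_mutex_unlock(&mut0);
    return NULL;
}

void *thread1(void *arg) {
    pthread_mutex_lock(&mut1);  /* time 0: thread 1 holds mut1 */
    pthread_mutex_lock(&mut0);  /* time 1: blocks, thread 0 holds mut0 */
    pthread_mutex_unlock(&mut0);
    pthread_mutex_unlock(&mut1);
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, thread0, NULL);
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_join(t0, NULL);  /* never returns under the time-0/time-1 schedule */
    pthread_join(t1, NULL);
    return 0;
}

Under that schedule each thread holds one mutex and blocks forever on the other, which is the deadlock the exercise is after.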

Result of 100 concurrent threads, each incrementing variable to 100

I'm writing to ask about this question from 'The Little Book of Semaphores' by Allen B. Downey.
Puzzle: Suppose that 100 threads run the following program concurrently (if you are not familiar with Python, the for loop runs the update 100 times):
for i in range(100):
    temp = count
    count = temp + 1
What is the largest possible value of count after all threads have completed? What is the smallest possible value? Hint: the first question is easy; the second is not.
My understanding is that count is a variable shared by all threads, and that its initial value is 0.
I believe that the largest possible value is 10,000, which occurs when there is no interleaving between threads.
I believe that the smallest possible value is 100. If line 2 is executed for each thread, they will each have a value of temp = 0. If line 3 is then executed for each thread, they will each set count = 1. If the same behaviour occurs in each iteration, the final value of count will be 100.
Is this correct, or is there another execution path that can result in a value smaller than 100 for count?
The worst case that I can think of will leave count equal to two. It's extremely unlikely that this would ever happen in practice, but in theory, it's possible. I'll need to talk about Thread A, Thread B, and 98 other threads:
Thread A reads count as zero, but then it is preempted before it can do anything else,
Thread B is allowed to run 99 iterations of its loop, and 98 other threads all run to completion before thread A finally is allowed to run again,
Thread A writes 1 to count before—are you ready to believe this?—it gets preempted again!
Thread B starts its 100th iteration. It gets as far as reading count as 1 (just now written by thread A) before thread A finally comes roaring back to life and runs to completion,
Thread B is last to cross the finish line after it writes 2 to count.
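For anyone who wants to experiment, here is a C/pthreads rendering of Downey's Python loop. It is my own translation, and in practice it will print something near 10,000 rather than the pathological minimum, since schedules like the one above are astronomically unlikely:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 100
#define NITERS   100

int count = 0;  /* shared by all threads, initially 0 */

void *worker(void *arg) {
    for (int i = 0; i < NITERS; i++) {
        int temp = count;   /* "line 2": read the shared counter */
        count = temp + 1;   /* "line 3": write back; any updates made by
                               other threads in between are lost */
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("count = %d\n", count);  /* at most 10,000; in theory as low as 2 */
    return 0;
}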

Where exactly is the synchronization point when using semaphores

I have a question regarding the actual synchronization points in the following C-like pseudocode examples. In our slides, the synchronization point is shown to occur at the point indicated below.
Two process 2 way synchronization, x and y = 0 to start
Process 1
signal(x);
//Marked as sync point
wait(y);
Process 2
signal(y);
//This arrow isn't as exact but appears to be near the middle again.
wait(x);
Now, for just two-process two-way sync this seems to make sense. However, when expanding this to three-process three-way sync, the logic seems to break down. There are no arrows given in the slide deck.
3 Process 3 Way Synchronization (S0, S1, S2 = 0 to start)
Process 0
signal(S0);
signal(S0);
wait(S1);
wait(S2);
Process 1
signal(S1);
signal(S1);
wait(S0);
wait(S2);
Process 2
signal(S2);
signal(S2);
wait(S0);
wait(S1);
Now I find that the sync point couldn't actually be between the signals and the waits. For example:
Let's say Process 0 runs first and signals S0 once, so S0 = 1. Before the second signal(S0) can run, the process is interrupted and Process 1 runs next. Say only one signal(S1) runs before it too is interrupted, so S1 = 1. Now Process 2 runs next. Both of its signal(S2) calls run, so S2 = 2. It is not interrupted, so it continues: wait(S0) runs and decrements S0 to 0, but Process 2 keeps running because S0's value is not negative. Then wait(S1) runs and the same thing happens with S1.
At this point Process 2 is done running, yet Process 0 and Process 1 have not finished their signals. If the sync point were truly between the signals and the waits, then this solution to three-process three-way sync would be incorrect.
A similar issue can arise in the solution for three-process three-way synchronization that allows each process to run more than one instance of itself at a time. That slide is attached, but I will not explain why the "middle" point in the process can't be the sync point, as I already have a huge wall of text.
Please let me know which way is correct; no amount of googling has given me an answer. I will include all relevant slides.
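For what it's worth, here is the three-process pattern rendered as a runnable C sketch, with threads standing in for processes and POSIX semaphores; the names are mine. Under POSIX semantics sem_wait simply blocks while the count is zero, which matches the behaviour described above:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t S[3];  /* S0, S1, S2, all initialized to 0 */

void *proc(void *arg) {
    int i = *(int *)arg;
    sem_post(&S[i]);            /* signal(Si), once for each other process */
    sem_post(&S[i]);
    sem_wait(&S[(i + 1) % 3]);  /* wait for one signal from each of the */
    sem_wait(&S[(i + 2) % 3]);  /* other two processes                  */
    /* Reaching here only guarantees that every process has executed at
     * least its FIRST signal; the others may not have finished both. */
    printf("process %d is past its waits\n", i);
    return NULL;
}

int main(void) {
    pthread_t t[3];
    int id[3] = {0, 1, 2};
    for (int i = 0; i < 3; i++) sem_init(&S[i], 0, 0);
    for (int i = 0; i < 3; i++) pthread_create(&t[i], NULL, proc, &id[i]);
    for (int i = 0; i < 3; i++) pthread_join(t[i], NULL);
    return 0;
}

This seems to support the walkthrough above: passing both waits only guarantees that every process has executed at least its first signal, not that all three processes are simultaneously "between" their signals and waits.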

Parallel processing - Connected Data

Problem
Summary: Apply a function f in parallel to each element of an array, where f is NOT thread-safe.
I have a set of elements E to process, let's say a queue of them.
I want to process all these elements in parallel using the same function f(E).
Now, ideally I could use a map-based parallel pattern, but the problem has the following constraints:
Each element is a pair of two objects (E = (A, B)).
Two elements may share an object (E1 = (A1, B1); E2 = (A1, B2)).
The function f cannot process two elements that share an object, so E1 and E2 cannot be processed in parallel.
What is the right way of doing this?
My thoughts are as follows:
1. Trivial thought: keep a set of active As and Bs, and start processing an element only when no other thread is already using its A or its B.
So, when you give an element to a thread, you add its A and B to the active set.
Pick the first element; if its objects are not in the active set, spawn a new thread, otherwise push it to the back of the queue of elements.
Do this until the queue is empty.
Will this cause a deadlock? Ideally, when the processing of one element is over, some other elements should become available, right?
2. The other thought is to build a graph of these connected objects.
Each node represents an object (an A or a B), and each element is an edge connecting its A and B. Then somehow process the data such that we know the elements being processed never overlap.
Questions
How can we best achieve this?
Is there a standard pattern for doing this?
Is there a problem with these approaches?
Not necessary, but if you could point out the TBB methods to use, that would be great.
The "best" approach depends on a lot of factors here:
How many elements "E" do you have and how much work is needed for f(E). --> Check if it's really worth it to work the elements in parallel (if you need a lot of locking and don't have much work to do, you'll probably slow down the process by working in parallel)
Is there any possibility to change the design that can make f(E) multi-threading safe?
How many elements "A" and "B" are there? Is there any logic to which elements "E" share specific versions of A and B? --> If you can sort the elements E into separate lists where each A and B only appears in a single list, then you can process these lists parallel without any further locking.
If there are many different A's and B's and you don't share too many of them, you may want to do a trivial approach where you just lock each "A" and "B" when entering and wait until you get the lock.
Whenever you do "lock and wait" with multiple locks it's very important that you always take the locks in the same order (e.g. always A first and B second) because otherwise you may run into deadlocks. This locking order needs to be observed everywhere (a single place in the whole application that uses a different order can cause a deadlock)
Edit: Also, if you do "try lock" you need to ensure that the order is always the same. Otherwise you can cause a livelock:
thread 1 locks A
thread 2 locks B
thread 1 tries to lock B and fails
thread 2 tries to lock A and fails
thread 1 releases lock A
thread 2 releases lock B
Goto 1 and repeat...
The chances that this actually repeats endlessly are relatively slim, but it should be avoided anyway.
Edit 2: In principle, I guess I'd just split E(Ax, Bx) into different lists based on Ax (e.g. one list for all Es that share the same A). Then process these lists in parallel, with locking of "B" (there you can still "TryLock" and continue if the required B is already in use).
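To illustrate the locking-order rule from above, a minimal C sketch; the object type and function are hypothetical, not from the question:

#include <pthread.h>

/* A hypothetical element type: each shared object is guarded by its own mutex. */
typedef struct { pthread_mutex_t m; /* ...object data... */ } object;

void process_element(object *a, object *b) {
    /* Fixed global order: the A-side lock always before the B-side lock. */
    pthread_mutex_lock(&a->m);
    pthread_mutex_lock(&b->m);
    /* f(E) runs here: no other thread holds either a or b */
    pthread_mutex_unlock(&b->m);
    pthread_mutex_unlock(&a->m);
}

int main(void) {
    object a = { PTHREAD_MUTEX_INITIALIZER };
    object b = { PTHREAD_MUTEX_INITIALIZER };
    process_element(&a, &b);
    return 0;
}

Because every thread acquires the A-side lock before the B-side lock, a cycle of threads each holding one lock and waiting for the other can never form.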

Reusable Barrier Algorithm

I'm looking into the Reusable Barrier algorithm from the book "The Little Book Of Semaphores" (archived here).
The puzzle is on page 31 (Basic Synchronization Patterns/Reusable Barrier), and I have come up with a 'solution' (or not) which differs from the solution from the book (a two-phase barrier).
This is my 'code' for each thread:
# n = 4 threads running
# semaphore: max count n, initialized to 0
# mutex, initially unowned
start:
    mutex.wait()
    counter = counter + 1
    if counter == n:
        semaphore.signal(4)  # add 4 at once
        counter = 0
    mutex.release()
    semaphore.wait()
    # critical section
    semaphore.release()
    goto start
This does seem to work; I've even inserted different sleep timers into different sections of the threads, and they still wait for all the threads to arrive before continuing each and every loop. Am I missing something? Is there a condition under which this will fail?
I've implemented this using the Windows library Semaphore and Mutex functions.
Update:
Thank you to starblue for the answer. It turns out that if, for whatever reason, a thread is slow between mutex.release() and semaphore.wait(), any thread that arrives at semaphore.wait() after a full loop will be able to go through again, since one of the N signals will still be unused.
And having put a Sleep command in thread number 3, I got a result where one can see that thread 3 missed a turn the first time (with thread 1 having done 2 turns) and then caught up on the second turn (which was in fact its 1st turn).
Thanks again to everyone for the input.
One thread could run several times through the barrier while some other thread doesn't run at all.
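For comparison, here is roughly the book's two-phase barrier rendered as a runnable C sketch; this is from memory of Downey's solution, so treat it as a sketch rather than a verbatim copy. The second turnstile is what prevents a fast thread from lapping the others:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N      4
#define ROUNDS 3

int count = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
sem_t turnstile;   /* initialized to 0: closed until the last thread arrives */
sem_t turnstile2;  /* initialized to 1: open until the last thread leaves    */

void barrier(void) {
    /* Phase 1: wait until all N threads have arrived. */
    pthread_mutex_lock(&mutex);
    if (++count == N) {
        sem_wait(&turnstile2);  /* close the second turnstile */
        sem_post(&turnstile);   /* open the first */
    }
    pthread_mutex_unlock(&mutex);
    sem_wait(&turnstile);       /* each thread passes and re-opens it */
    sem_post(&turnstile);

    /* Phase 2: wait until all N threads have left, so nobody can lap. */
    pthread_mutex_lock(&mutex);
    if (--count == 0) {
        sem_wait(&turnstile);   /* close the first turnstile again */
        sem_post(&turnstile2);  /* open the second */
    }
    pthread_mutex_unlock(&mutex);
    sem_wait(&turnstile2);      /* each thread passes and re-opens it */
    sem_post(&turnstile2);
}

void *worker(void *arg) {
    int id = *(int *)arg;
    for (int r = 0; r < ROUNDS; r++) {
        barrier();
        printf("thread %d finished round %d\n", id, r);
    }
    return NULL;
}

int main(void) {
    pthread_t t[N];
    int id[N];
    sem_init(&turnstile, 0, 0);
    sem_init(&turnstile2, 0, 1);
    for (int i = 0; i < N; i++) { id[i] = i; pthread_create(&t[i], NULL, worker, &id[i]); }
    for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
    return 0;
}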
