These two threads run concurrently in shared memory (all variables are
shared between the two threads):
Thread A
for (i = 0; i < 5; i++) {
    x = x + 1;
}
Thread B
for (j = 0; j < 5; j++) {
    x = x + 2;
}
Assume a single-processor system.
a- Give a concise proof that x ≤ 15 when both threads have completed.
b- Suppose we replace x = x + 2 in Thread B with x = x - 1. What are the possible values of x?
I do not understand this question. I googled it and found an answer, but I still cannot follow it.
I would like some explanation of it.
If the threading works perfectly, the highest value 'x' can have is 15. This all depends on the scheduler of the operating system.
Note that I am assuming the initial value of x is 0!
Let's say that Thread A and Thread B are serialized.
The value of x after Thread A is complete will be 5.
i | x
-------
0 | 1
1 | 2
2 | 3
3 | 4
4 | 5
The value of x going into Thread B will be 5, resulting in a final value of 15:
j | x
-------
0 | 7
1 | 9
2 | 11
3 | 13
4 | 15
Now, things typically don't happen this way. A thread will read the current value of x, do its addition on a private copy, and then write the modified value back into memory. The following can happen:
Thread A reads the value 'x' as 0
Thread B reads the value 'x' as 0
Thread A adds 1 to x making its local copy of x, 1
Thread B adds 2 to x making its local copy of x, 2
Thread A writes its modified value of x as 1
Thread B writes its modified value of x as 2 (overwriting Thread A's modification)
Therefore, x will be no more than 15 but, depending on the scheduler, may be less!
a)
The statement x = x + 1 is not a single instruction at the machine level; it consists of a sequence: read x, add 1 to it, then write the result back to x's memory location. Hence two threads may read the same value of x.
If both threads read the same value of x, update it, and write it back, one of the updates is lost, which causes x < 15.
b)
For the same reason, if Thread B's statement is x = x - 1, the final value of x can range from -5 (all of Thread A's increments are lost) to 5 (all of Thread B's decrements are lost), with 0 being the fully serialized result.
Try this and you will learn more!
Compile your C code with the -S option; it will produce the generated assembly for your program. There you can see that x = x + 1 is not a single instruction, and a switch between threads can happen before x = x + 1 completes, hence it is not atomic.
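To make the lost update concrete, here is a small deterministic simulation (a sketch, not real threading): each iteration of x = x + k is split into an explicit load step and a store step, and a chosen interleaving is replayed by hand.

```python
# Deterministic replay of interleavings of the non-atomic updates
# x = x + 1 (Thread A) and x = x + 2 (Thread B), starting from x = 0.
def run(schedule, add={'A': 1, 'B': 2}):
    """schedule: list of (thread_id, step) pairs, step is 'load' or 'store'."""
    x = 0
    regs = {}                         # each thread's private register copy of x
    for tid, step in schedule:
        if step == 'load':
            regs[tid] = x             # read x into the register
        else:
            x = regs[tid] + add[tid]  # add, then write the register back
    return x

# Fully serialized: every load is immediately followed by its store.
serial = [(t, s) for t in 'AB' for _ in range(5) for s in ('load', 'store')]
print(run(serial))   # 15

# A lost update: A and B both read x == 0; B's store overwrites A's +1.
racy = [('A', 'load'), ('B', 'load'), ('A', 'store'), ('B', 'store')]
racy += [(t, s) for t in 'AB' for _ in range(4) for s in ('load', 'store')]
print(run(racy))     # 14
```

For part b, the same harness with add={'A': 1, 'B': -1} lets you search for the extreme interleavings by hand.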
You are working at the cash counter at a fun-fair, and you have different types of coins available to you in infinite quantities. The value of each coin is already given. Can you determine the number of ways of making change for a particular number of units using the given types of coins?
counter = 0

def helper(n, c):
    # n is the amount whose change is to be made
    # c is a list of available coins
    global counter
    if n == 0:
        counter += 1
        return
    if len(c) == 0:
        return
    if n >= c[0]:
        helper(n - c[0], c)
    helper(n, c[1:])

def getWays(n, c):
    helper(n, c)
    print(counter)
    return counter
Let n be the amount of currency units to return as change. You wish to find N(n), the number of possible ways to return change.
One easy solution would be to first choose the "first" coin you give (let's say it has value c), then notice that N(n) is the sum of all the values N(n-c) for every possible c. Since this appears to be a recursive problem, we need some base cases. Typically, we'll have N(1) = 1 (one coin of value one).
Let's do an example: 3 can be returned as "1 plus 1 plus 1" or as "2 plus 1" (assuming coins of value one and two exist). Therefore, N(3)=2.
However, if we apply the previous algorithm, it will compute N(3) to be 3.
+------------+-------------+------------+
| First coin | Second coin | Third coin |
+------------+-------------+------------+
| 2 | 1 | |
+------------+-------------+------------+
| | 2 | |
| 1 +-------------+------------+
| | 1 | 1 |
+------------+-------------+------------+
Indeed, notice that returning 3 units as "2 plus 1" or as "1 plus 2" is counted as two different solutions by our algorithm, whereas they are the same.
We therefore need to apply an additional restriction to avoid such duplicates. One possible solution is to order the coins (for example by decreasing value). We then impose the following restriction: if at a given step we returned a coin of value c0, then at the next step, we may only return coins of value c0 or less.
This leads to the following induction relation (noting c0 the value of the coin returned in the last step): N(n) is the sum of all the values of N(n-c) for all possible values of c less than or equal to c0.
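A memoized sketch of this restricted recursion, using a coin index i so that once a coin has been skipped it may never be used again (the function and variable names here are my own, not from the original code):

```python
from functools import lru_cache

def count_ways(n, coins):
    """Number of ways to make n from coins, ignoring order.
    Only coins[i:] may be used after the first i coins are skipped,
    which enforces the 'value c0 or less' restriction from the text."""
    coins = tuple(sorted(coins, reverse=True))

    @lru_cache(maxsize=None)
    def ways(amount, i):
        if amount == 0:
            return 1                             # one way: return no more coins
        if i == len(coins):
            return 0                             # no coins left, amount > 0
        total = ways(amount, i + 1)              # never use coins[i] again
        if coins[i] <= amount:
            total += ways(amount - coins[i], i)  # use coins[i] once more
        return total

    return ways(n, 0)

print(count_ways(3, [1, 2]))  # 2: "1 plus 1 plus 1" and "2 plus 1"
```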
Happy coding :)
def exp3(a, b):
    if b == 1:
        return a
    if (b % 2) * 2 == b:
        return exp3(a * a, b / 2)
    else:
        return a * exp3(a, b - 1)
This is a recursive exponentiator program.
Question 1:
If b is even, it will execute (b%2)*2 == b. If b is odd, it will execute a*exp3(a,b-1). There is no problem when I run my program. But if b is 4, (4%2)*2 = 0, and 0 is not equal to b, so I can't understand how the even case is ever reached.
Question 2:
I want to calculate the number of steps in the program. According to my textbook, I can get the following formula:
b even t(b) = 6 + t(b/2)
b odd t(b) = 6 + t(b-1)
Why is the first number 6? How can I get the number 3 in the beginning?
Your (b%2)*2 == b test is never true. I think you want b % 2 == 0 to test if b is even. The code still gets the right answer because the other recursive case (intended only for odd b values) works for even ones too (it's just less efficient).
As for your other question, I have no idea where the 6 is coming from either. It depends a lot on what you're counting as a "step". Usually it's most useful to discuss performance in terms of "Big-O" values rather than specific numbers.
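Since the even-number test never fires, here is a corrected sketch of the exponentiation-by-squaring version (note b % 2 == 0 for the evenness test, and // so the exponent stays an integer in Python 3):

```python
def exp3(a, b):
    """Fast exponentiation: O(log b) multiplications instead of O(b)."""
    if b == 1:
        return a
    if b % 2 == 0:                  # b is even: square the base, halve b
        return exp3(a * a, b // 2)  # // keeps b an int in Python 3
    return a * exp3(a, b - 1)       # b is odd: peel off one factor of a

print(exp3(2, 10))  # 1024
```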
I have n elements (e.g. A, B, C and D) and need to do calculations between all of those.
Calculation 1 = A with B
Calculation 2 = A with C
Calculation 3 = A with D
Calculation 4 = B with C
Calculation 5 = B with D
Calculation 6 = C with D
In reality there are more than 1000 elements and I want to parallelise the process.
Note that I can't access an element from 2 threads simultaneously. This for example makes it impossible to do Calculation 1 and Calculation 2 at the same time because they both use the element A.
Edit: I could access an element from 2 threads, but it makes everything very slow if I just split up the calculations and rely on locks for thread safety.
Is there already a distribution algorithm for this kind of problem?
It seems like a lot of people must have had the same problem, but I couldn't find anything on the great internet. ;)
Single thread example code:
for (int i = 0; i < elementCount; i++)
{
    for (int j = i + 1; j < elementCount; j++)
    {
        Calculate(element[i], element[j]);
    }
}
You can apply the round-robin tournament algorithm, which organizes all N*(N-1)/2 possible pairs.
All set elements (players) form two rows; each column is a pair in the current round. The first element stays fixed while the others are shifted cyclically.
So you can run up to N/2 threads to compute the first set of pairs, then rotate the indexes and continue.
Excerpt from wiki:
The circle method is the standard algorithm to create a schedule for a round-robin tournament. All competitors are assigned to numbers, and then paired in the first round:
Round 1. (1 plays 14, 2 plays 13, ... )
1 2 3 4 5 6 7
14 13 12 11 10 9 8
then fix one of the competitors in the first or last column of the table (number one in this example) and rotate the others clockwise one position
Round 2. (1 plays 13, 14 plays 12, ... )
1 14 2 3 4 5 6
13 12 11 10 9 8 7
Round 3. (1 plays 12, 13 plays 11, ... )
1 13 14 2 3 4 5
12 11 10 9 8 7 6
until you end up almost back at the initial position
Round 13. (1 plays 2, 3 plays 14, ... )
1 3 4 5 6 7 8
2 14 13 12 11 10 9
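The circle method above can be sketched as a generator of rounds; within a round no element appears twice, so each round's pairs can be computed in parallel without locks (this is an illustrative sketch, not code from the answer):

```python
def round_robin_rounds(players):
    """Yield rounds of pairs via the circle method; within one round,
    no player appears in more than one pair."""
    players = list(players)
    if len(players) % 2:
        players.append(None)        # a 'bye' slot if the count is odd
    n = len(players)
    for _ in range(n - 1):
        # pair the two rows: first vs last, second vs second-to-last, ...
        yield [(players[i], players[n - 1 - i]) for i in range(n // 2)
               if None not in (players[i], players[n - 1 - i])]
        # keep players[0] fixed, rotate everyone else by one position
        players = [players[0], players[-1]] + players[1:-1]

for rnd in round_robin_rounds("ABCD"):
    print(rnd)   # 3 rounds covering all 6 pairs exactly once
```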
It is simple enough to prove that there is no way to distribute your calculations so that collisions never occur (unless you manually order the computations and place round boundaries, as @Mbo suggests). In other words, there is no distribution among multiple threads that will let you never lock.
Proof :
Suppose, as required, that every computation involving data object A happens on a given thread T (the only way to make sure you never lock on A).
Then thread T has to handle at least one pair containing each of the other objects (B, C, D) in the input list.
By the same requirement, T must then also handle everything object-B related. And C. And D. So everything.
Therefore, only T can work.
QED. There is no possible parallelization that will never lock.
Way around #1 : map/reduce
That said... this is a typical case of divide and conquer. You are right that simple additions can require critical-section locks, without the order of execution mattering. That is because your critical operation (addition) has a nice property, associativity: A+(B+C) = (A+B)+C, on top of being commutative.
In other words, this operation is a candidate for a (parallel-friendly) reduce operation.
So the key here is probably :
Emit a stream of all interesting pairs
Map each pair to one or more partial results
Group each partial result by its master object (A, B, C)
Reduce each group by combining the partial results
A sample (pseudo) code
static class Data { int i = 0; }
static class Pair { Data d1; Data d2; }
static class PartialComputation { Data d; int sum; }

Data[] data = ...
// Something like: IntStream.range(0, data.length - 1).boxed()
//     .flatMap(i -> IntStream.range(i + 1, data.length)
//         .mapToObj(j -> new Pair(data[i], data[j])))
Stream<Pair> allPairs = ...
allPairs
    // Map each pair, in parallel, to partial results keyed by the original data objects
    .flatMap(pair -> Stream.of(
        new PartialComputation(pair.d1, pair.d1.i + pair.d2.i),
        new PartialComputation(pair.d2, pair.d2.i + pair.d1.i)))
    // Regroup by the original data object, then reduce each group by summing
    .collect(Collectors.groupingByConcurrent(
        pc -> pc.d,
        Collectors.reducing(0, pc -> pc.sum, Integer::sum)));
Way around 2 : trust the implementations
Fact is, uncontended locks in Java have gotten cheap. On top of that, pure locking sometimes has better alternatives, such as the atomic types in Java (e.g. AtomicLong if you are summing), which use CAS instead of locking and can be faster (see the Java Concurrency in Practice book for hard numbers).
The fact is, if you have 1000 to 10k elements (which translates to millions of pairs) and, say, 8 CPUs, the contention (the probability that at least 2 of your 8 threads are processing the same element at the same time) is pretty low. I would rather measure it first-hand than say upfront "I cannot afford the locks", especially if the operation can be implemented using atomic types.
I was struggling to see how this function worked. For the nth number it should calculate the sum of the previous three elements.
f' :: Integer -> Integer
f' = helper 0 0 1
  where
    helper a b c 0 = a
    helper a b c n = helper b c (a+b+c) (n-1)
Thanks for your time
Perhaps the part that you're missing is that
f' = helper 0 0 1
is the same thing as
f' x = helper 0 0 1 x
Otherwise, see Dave's answer.
It's a fairly simple recursive function. When called with three elements (I'm guessing seeds for the sequence) and a number of terms, it calls itself, cycling the seed left by one and adding the new term (a+b+c). When the "number of steps remaining" counter reaches 0, the edge case kicks in and just returns the current sequence value. This value is passed back up all the function calls, giving the final output.
The f' function provides a simple wrapper around the helper function (which does the work I described above), providing a standard seed and passing the requested term as the 4th parameter (MathematicalOrchid explains this nicely).
Say it is called as f' 5.
Below is the sequence in which it will be evaluated:
iteration 1: helper 0 0 1 5
iteration 2: helper 0 1 (0+0+1) 4
iteration 3: helper 1 1 (0+1+1) 3
iteration 4: helper 1 2 (1+1+2) 2
iteration 5: helper 2 4 (1+2+4) 1
iteration 6: helper 4 7 (2+4+7) 0 => 4
It is like a Fibonacci sequence, but for 3 numbers, not 2:
F'_n = F'_{n-1} + F'_{n-2} + F'_{n-3}
where Fibonacci sequence is
F_n = F_{n-1} + F_{n-2}
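The same computation can be written iteratively in Python (my own transcription of the Haskell helper, with the same seeds a = 0, b = 0, c = 1):

```python
def f(n):
    """nth term of the 3-term recurrence F'_n = F'_{n-1} + F'_{n-2} + F'_{n-3},
    seeded with 0, 0, 1, exactly like the Haskell helper."""
    a, b, c = 0, 0, 1
    for _ in range(n):
        a, b, c = b, c, a + b + c   # shift the window left, append the sum
    return a

print([f(n) for n in range(8)])  # [0, 0, 1, 1, 2, 4, 7, 13]
print(f(5))                      # 4, matching the trace above
```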
I have a global shared variable that is updated 5 times by each of the 5 threads spawned. As per my understanding, the increment operation consists of 3 instructions:
load reg, M
inc reg
store reg, M
So I want to ask: in this scenario, what would be the maximum and minimum values given arbitrary interleaving of the 5 threads?
As I see it, the maximum value will be 25 (I am 100% sure that it can't be more than 25) and the minimum value is 5. But I am not so sure about the minimum. Can it be less than 5 in some arbitrary interleaving?
Any inputs will be greatly appreciated.
/* Global variable */
int var = 0;

/* Thread function */
void thread_func()
{
    for (int c = 0; c < 5; c++)
        var++;
}
Given your definition of increment, I agree with your max of 25.
However, I believe the min can be 2 under the following scenario. I've named the 5 threads A, B, C, D and E.
A loads 0
C, D, E run to completion
B runs through 4 of its 5 iterations.
A increments 0 to 1 and stores the result (1).
B loads 1
A runs to completion
B increments 1 to 2 and stores 2.
If I use the same logic given by jtdubs, the minimum value should be 1 in the following case.
Lets use the same naming of 5 threads as A, B, C, D, and E.
A loads 0
B, C, D, E run to completion, incrementing var to 20 (5 increments by each of the 4 threads).
A increments 0 to 1 and stores the result, 1.
I agree with a minimum of 2 (not 1).
The minimum equals 1 solution ignores the fact that A still hasn't run to completion after it stores 1 in the shared memory.
With no other thread left to "interfere", thread A must still run through the remaining 4 iterations ending with the result 5.
What the minimum of 2 solution enables is an end-game between the two remaining threads A and B, after all other threads have finished running, leading to the minimum possible outcome.
B "wastes" 4 iterations only to load 1 again, increment it and store 2 after A has run to completion.
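The minimum-of-2 interleaving can be checked with a small deterministic simulation (a sketch: each var++ is split into its load and store, with the increment applied to the thread's private register):

```python
def run(schedule):
    """schedule: list of thread ids; a thread's successive events alternate
    load, store, load, store, ... (each load/store pair is one var++)."""
    var = 0
    reg = {}
    loading = {t: True for t in schedule}
    for t in schedule:
        if loading[t]:
            reg[t] = var        # load reg, M
        else:
            var = reg[t] + 1    # inc reg; store reg, M
        loading[t] = not loading[t]
    return var

# A loads 0; C, D, E run to completion; B does 4 full iterations;
# A stores 1; B loads 1; A finishes its remaining 4 iterations; B stores 2.
schedule = (['A'] + ['C'] * 10 + ['D'] * 10 + ['E'] * 10
            + ['B'] * 8 + ['A'] + ['B'] + ['A'] * 8 + ['B'])
print(run(schedule))  # 2

# Fully serialized, the same 5 threads x 5 increments give 25.
serial = [t for t in 'ABCDE' for _ in range(10)]
print(run(serial))    # 25
```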