I have the following Interview question:
class someClass
{
    int sum = 0;

    public void foo()
    {
        for (int i = 0; i < 100; i++)
        {
            sum++;
        }
    }
}
There are two parallel threads running through the foo method.
The value of sum at the end will vary from 100 to 200.
The question is why. As I understand it, only one thread gets the CPU at a time and threads get preempted while running. At what point can the preemption cause sum not to reach 200?
The increment isn't atomic. It reads the value, then writes out the incremented value. In between these two operations, the other thread can interact with the sum in complex ways.
The range of values is not in fact 100 to 200. This range is based on the incorrect assumption that threads either take turns or they perform each read simultaneously. There are many more possible interleavings, some of which yield markedly different values. The worst case is as follows (x represents the implicit temporary used in the expression sum++):
Thread A                  Thread B
----------------          ----------------
x ← sum (0)
                          x ← sum (0)
                          x + 1 → sum (1)
                          x ← sum (1)
                          x + 1 → sum (2)
                          ⋮
                          x ← sum (98)
                          x + 1 → sum (99)
x + 1 → sum (1)
                          x ← sum (1)
x ← sum (1)
x + 1 → sum (2)
⋮
x ← sum (99)
x + 1 → sum (100)
                          x + 1 → sum (2)
Thus the lowest possible value is 2. In simple terms, the two threads undo each other's hard work. You can't go below 2 because thread B can't feed a zero to thread A (it can only output an incremented value), and thread A must in turn increment the 1 that thread B gave it.
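To make this concrete, here is a minimal Java sketch (not part of the original question; the iteration count is raised to 1,000,000 because with only 100 iterations the race is rarely observable in practice) that reproduces the lost updates:

// Hypothetical demo: two threads increment a shared counter without
// synchronization, so some updates are routinely lost.
class RaceDemo
{
    static int sum = 0;                      // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException
    {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                sum++;                       // read-modify-write, not atomic
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        // Typically prints something well below 2,000,000.
        System.out.println(sum);
    }
}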
The line sum++ is subject to a race condition.
Both threads can read the value of sum as, say, 0; then each increments its value to 1 and stores that value back. This means that the value of sum will be 1 instead of 2.
Continue like that and you will typically get a result between 100 and 200 (though, as the interleaving above shows, even lower values are possible).
Most CPUs have multiple cores now. But if we lock a mutex at the beginning of foo() and unlock it after the for loop finishes, then the two threads, even when running on different cores, will still yield the answer 200.
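For instance, a sketch of that fix in Java, where synchronized on the instance plays the role of the mutex described above:

class someClass
{
    int sum = 0;

    // The intrinsic lock (monitor) acts as the mutex: the whole loop of one
    // thread runs before the other thread's loop can start, so the final
    // value is the full 200.
    public synchronized void foo()
    {
        for (int i = 0; i < 100; i++)
        {
            sum++;
        }
    }
}

A finer-grained alternative would be to make sum an AtomicInteger and call incrementAndGet() inside the loop.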
Problem: choose an element from the array that maximizes the sum obtained after XOR-ing it with every element of the array.
Input for problem statement:
N=3
A=[15,11,8]
Output:
11
Explanation:
Choosing 15 gives (15^15) + (15^11) + (15^8) = 0 + 4 + 7 = 11, which is the maximum.
My code for the brute-force approach:
def compute(N, A):
    ans = 0
    for i in A:
        xor_sum = 0
        for j in A:
            xor_sum += (i ^ j)
        if xor_sum > ans:
            ans = xor_sum
    return ans
The above approach gives the correct answer, but I want to optimize it to run in O(n) time. Please help me with this.
If your integers have a fixed (constant) number of bits c, then it should be possible, because O(c) = O(1). For simplicity I assume unsigned integers and an odd n. If n is even, we sometimes have to check both paths in the tree (see the solution below). You can adapt the algorithm to cover even n and negative numbers.
1. Find the maximum in the array of length n. O(n)
2. If max == 0, return 0 (the array contains only zeros).
3. Find the position p of the most significant bit of max. O(c) = O(1)

   p = -1
   while (max != 0)
       p++
       max /= 2

   So 1 << p gives a mask for the highest set bit.
4. Build a tree whose leaves are the numbers and where every level corresponds to a bit position: from the root there is an edge to the left if some number has bit p set, and an edge to the right if some number has bit p not set; on the next level there is an edge to the left if bit p - 1 is set and an edge to the right if bit p - 1 is not set, and so on. This can be done in O(cn) = O(n).
5. Go through the array and count how many times the bit at each position i (i from 0 to p) is set, giving a sum array. O(cn) = O(n)
6. Assign the root of the tree to a node x. Now, for each i from p down to 0, do the following:
   if x has only one edge, x becomes its only child node;
   else if sum[i] > n / 2, x becomes its right child node;
   else x becomes its left child node.
   In this step we choose the path through the tree that gives us the most ones when XOR-ing. O(cn) = O(n)
7. XOR all the elements in the array with the value of x and sum them up to get the result. Actually, you could have built the result already in the previous step, by adding (n - sum[i]) * (1 << i) to the result when going left and sum[i] * (1 << i) when going right. O(n)
All the steps run sequentially and each is O(n), so overall the algorithm is also O(n).
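For reference, here is a sketch of the same bit-counting idea in Java (the question's code is Python, but it translates directly). Instead of building the tree, it simply evaluates every element of the array in O(c) using the per-bit counts, which is still O(cn) = O(n) overall; the class name, method name and 30-bit assumption are illustrative only:

// Sketch of an O(c * n) = O(n) approach using per-bit counts.
// Assumes non-negative ints that fit in BITS bits.
class XorSum
{
    static final int BITS = 30;

    static long maxXorSum(int[] a)
    {
        int n = a.length;
        int[] ones = new int[BITS];          // ones[b] = how many elements have bit b set
        for (int v : a) {
            for (int b = 0; b < BITS; b++) {
                if ((v >> b & 1) == 1) ones[b]++;
            }
        }
        long best = 0;
        for (int v : a) {                    // evaluate each candidate element in O(c)
            long total = 0;
            for (int b = 0; b < BITS; b++) {
                // XOR with v flips bit b of every element iff v has bit b set.
                long setAfterXor = ((v >> b & 1) == 1) ? (n - ones[b]) : ones[b];
                total += setAfterXor << b;
            }
            best = Math.max(best, total);
        }
        return best;
    }

    public static void main(String[] args)
    {
        System.out.println(maxXorSum(new int[]{15, 11, 8}));  // prints 11
    }
}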
I'm new to concurrent programming and I came across this question, which I can't quite make sense of. Consider the following pseudo-code, where x is a shared variable initialized to 0 and c is a condition variable:
signal(c)
wait(c)
x = x + 1
signal(c)
What are the possible results for x when running 2 threads concurrently on a single processor?
After executing both threads, either everything went fine and x = 2, or the two x = x + 1 lines were interleaved so that both threads read the initial value of x as 0, and then x = 1.
Given the threading scenario where two threads execute the computation:
x = x + 1 where x is a shared variable
What are the possible results and describe why your answer could happen.
This is a textbook problem from my OS book, and I was curious whether I need more information to answer it, such as what x is initialized to and whether the threads execute this statement repeatedly or just once. My original answer was that there could be two possible results, depending on the order in which the OS schedules the threads.
This is a rather simple task, so there probably isn't too much that could go wrong.
The only issue I can immediately think of is one thread using an old value of x in its calculation, e.g.:
start with x = 2
1) thread A reads x = 2
2) thread B reads x = 2
3) thread A writes x = 2 + 1, so x = 3
4) thread B writes x = 2 (the old value of x) + 1, so x = 3 when it should be 4
This would be even more apparent if more than one thread reads the value before the first thread writes.
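A hypothetical Java sketch of exactly that interleaving; the sleep is artificial and is only there to force both threads to read before either one writes:

// Hypothetical demo: each thread does x = x + 1 exactly once, but the read and
// the write are deliberately separated so both threads read the old value.
// Starting from x = 2, this almost always prints 3 instead of 4.
class StaleReadDemo
{
    static volatile int x = 2;

    public static void main(String[] args) throws InterruptedException
    {
        Runnable incrementOnce = () -> {
            int local = x;                   // read (both threads see 2)
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            x = local + 1;                   // write back a stale result
        };
        Thread a = new Thread(incrementOnce);
        Thread b = new Thread(incrementOnce);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(x);               // almost always 3, not 4
    }
}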
My book presents a simple example which I'm a bit confused about:
It says, "consider the following program, and assume that the fine-grained atomic actions are reading and writing the variables:"
int y = 0, z = 0;
co x = y+z; // y=1; z=2; oc;
"If x = y + z is implemented by loading a register with y and then adding z to it, the final value of x can be 0,1,2, or 3. "
2? How does 2 work?
Note: co ... oc delimits a block of concurrent processes, and // separates the parallel-running statements.
In your program there are two parallel sequences:
Sequence 1: x = y+z;
Sequence 2: y=1; z=2;
The operations of sequence 1 are:
y Copy the value of y into a register.
+ z Add the value of z to the value in the register.
x = Copy the value of the register into x.
The operations of sequence 2 are:
y=1; Set the value of y to 1.
z=2; Set the value of z to 2.
These two sequences are running at the same time, though the steps within a sequence must occur in order. Therefore, you can get an x value of '2' in the following sequence:
y=0
z=0
y Copy the value of y into a register. (register value is now '0')
y=1; Set the value of y to 1. (has no effect on the result, we've already copied y to the register)
z=2; Set the value of z to 2.
+ z Add the value of z to the value in the register. (register value is now '2')
x = Copy the value of the register into x. (the value of x is now '2')
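Here is an illustrative Java simulation of the two sequences (not from the book; volatile is used only so the reader thread actually sees the writer's stores). Repeating the experiment records which final values of x show up:

import java.util.TreeSet;

// Illustrative simulation: one thread computes x = y + z as "load y, add z,
// store x", the other runs "y = 1; z = 2;". In principle 0, 1, 2 and 3 are all
// reachable; in practice you will mostly see 0 and 3, because 1 and 2 need the
// writer to slip in between the reader's two loads.
class InterleavingDemo
{
    static volatile int x, y, z;

    public static void main(String[] args) throws InterruptedException
    {
        TreeSet<Integer> observed = new TreeSet<>();
        for (int trial = 0; trial < 100_000; trial++) {
            x = 0; y = 0; z = 0;
            Thread reader = new Thread(() -> {
                int register = y;   // load y into a "register"
                register += z;      // add z
                x = register;       // store into x
            });
            Thread writer = new Thread(() -> { y = 1; z = 2; });
            reader.start();
            writer.start();
            reader.join();
            writer.join();
            observed.add(x);
        }
        System.out.println(observed);   // typically [0, 3]; 1 and 2 are rare
    }
}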
Since they are assumed to run in parallel, I think an even simpler case could be y=0, z=2 when the assignment x = y + z occurs.
Obviously, atomic operations make sure that different threads don't clobber a value. But is this still true across processes, when using shared memory? Even if the processes happen to be scheduled by the OS to run on different cores? Or across different distinct CPUs?
Edit: Also, if it's not safe, is it not safe even on an operating system like Linux, where processes and threads are the same from the scheduler's point of view?
tl;dr: Read the fine print in the documentation of the atomic operations. Some will be atomic by design but may trip over certain variable types. In general, though, an atomic operation will maintain its contract between different processes just as it does between threads.
An atomic operation really only ensures that you won't have an inconsistent state if called by two entities simultaneously. For example, an atomic increment that is called by two different threads or processes on the same integer will always behave like so:
x = initial value (zero for the sake of this discussion)
Entity A increments x and returns the result to itself: result = x = 1.
Entity B increments x and returns the result to itself: result = x = 2.
where A and B indicate the first and second thread or process that makes the call.
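In Java terms, for example, AtomicInteger.incrementAndGet() is the in-process version of such an atomic increment, performing the whole read-modify-write as one indivisible step (a sketch, not tied to any particular platform):

import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the same two-entity increment done with an atomic primitive. No
// interleaving of the two calls can produce a lost update, so the final value
// is always 2; the two threads see 1 and 2 in some order.
class AtomicIncrementDemo
{
    static final AtomicInteger x = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException
    {
        Thread a = new Thread(() -> System.out.println("A got " + x.incrementAndGet()));
        Thread b = new Thread(() -> System.out.println("B got " + x.incrementAndGet()));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("final x = " + x.get());   // always 2
    }
}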
A non-atomic operation can result in inconsistent or generally crazy results due to race conditions, incomplete writes to the address space, etc. For example, you can easily see this:
x = initial value = zero again.
Entity A calls x = x + 1. To evaluate x + 1, A checks the value of x (zero) and adds 1.
Entity B calls x = x + 1. To evaluate x + 1, B checks the value of x (still zero) and adds 1.
Entity B (by luck) finishes first and assigns the result of x + 1 = 1 (step 3) to x. x is now 1.
Entity A finishes second and assigns the result of x + 1 = 1 (step 2) to x. x is now 1.
Note the race condition as entity B races past A and completes the expression first.
Now imagine if x were a 64-bit double that is not ensured to have atomic assignments. In that case you could easily see something like this:
A 64 bit double x = 0.
Entity A tries to assign 0x1122334455667788 to x. The first 32 bits are assigned first, leaving x with 0x1122334400000000.
Entity B races in and assigns 0xffeeddccbbaa9988 to x. By chance, both 32 bit halves are updated and x is now = 0xffeeddccbbaa9988.
Entity A completes its assignment with the second half and x is now = 0xffeeddcc55667788.
These non-atomic assignments are some of the most hideous concurrent bugs you'll ever have to diagnose.
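As a concrete, Java-flavoured illustration of that last point: the Java memory model explicitly allows a non-volatile long or double write to be split into two 32-bit halves, while volatile fields and the atomic classes are always written as a single unit (a sketch; the field names are made up):

import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: a plain long field may be written in two 32-bit halves on
// some 32-bit JVMs, so a concurrent reader can observe a "torn" value that is
// half old and half new. A volatile long or an AtomicLong is always written
// and read as a single unit.
class TornWriteSketch
{
    long plain;                         // writes MAY tear (implementation-dependent)
    volatile long safe;                 // writes never tear
    final AtomicLong alsoSafe = new AtomicLong();

    void writer()
    {
        plain = 0x1122334455667788L;    // a reader might see e.g. 0x1122334400000000
        safe  = 0x1122334455667788L;    // a reader sees either the old or the new value
        alsoSafe.set(0x1122334455667788L);
    }

    public static void main(String[] args)
    {
        TornWriteSketch s = new TornWriteSketch();
        s.writer();
        System.out.println(Long.toHexString(s.safe));
    }
}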