In the answer StoreStore reordering happens when compiling C++ for x86
#Peter Cordes has written
For Acquire/Release semantics to give you the ordering you want, the
last store has to be the release-store, and the acquire-load has to be
the first load. That's why I made y a std::atomic, even though you're
setting x to 0 or 1 more like a flag.
I would like to ask some questions to better understand it.
I have read http://preshing.com/20120913/acquire-and-release-semantics as well. And this article contains:
And it is written that it is ensured that r2 == 42. I don't understand why. On my eye it is possible:
1. Thread2 executed the first line. It is atomic and it is memory_order_acquire so it must be executed before following memory operations.
Now, Thread2 executes the second line: int r2 = A and r2 equals to 0.
Then, Thread1 will execute his code.
Why am I wrong?
The complete quote is:
If we let both threads run and find that r1 == 1, that serves as
confirmation that the value of A assigned in Thread 1 was passed
successfully to Thread 2. As such, we are guaranteed that r2 == 42.
The aquire-release semantics only guarantee that
A = 42 doesn't happen after Ready = 1 in thread 1
r2 = A doesn't happen before r1 = Ready in thread 2
So the value of r1 has to be checked in thread 2 to be sure that A has been written by thread 1. The scenario in the question can indeed happen, but r1 will be 0 in that case.
Related
Suppose there are 3 threads,
Thread 1 and 2 will increase or decrease a global variable X atomically.
thread 1:
atomic_increase(X)
thread 2:
atomic_decrease(X)
Thread 3 will check if the X is greater than some predefined value and do things accordingly.
thread 3:
if( X > 5 ) {... logic 1 ...}
else {... logic 2 ....}
I think the atomic_xxx operations are not enough. They can only synchronize the modifications between thread 1 and 2.
What if X is changed by thread 1 or 2 after thread 3 finishes the comparison and enters logic 1.
Do I have to use a mutex to synchronize all the 3 threads when modifying or reading the X?
ADD 1
BTW, logic 1 and logic 2 don't modify the X.
In short yes, reads also need to be synchronized in some way, otherwise the risk of inconsistent reads is real. A read performed between the read and write of atomic_increase will be inconsistent.
However if logic 1 or logic 2 do stuff to X, your problems doesn't seem to stop right there. I think then you need the concept of a transaction, where it starts with a read (the X > 5 thing) and then ends with a write (logic 1 or logic 2).
Yes, And the Answer is happens before link, Lets say Thread-1 started executing atomic_increase method. It will hold the lock and enter the synchronized block to update X.
private void atomic_increase() {
synchronized (lock) {
X = X + 1; // <-- Thread-1 entered synchronized block, yet to update variable X
}
}
Now, for Thread-3 to run the logic, it needs to read the variable X, and if it is not synchronized (on the same monitor), the variable X read can be an old value since it may not yet updated by Thread-1.
private void runLogic() {
if (X > 5) { // <-- Reading X here, can be inconsistent no
happens-before between atomic_increase and runLogic
} else {
}
}
We could have prevented this by maintaining a happens-before link between atomic operation and run_logic method. If the runLogic is synchronized (on the same monitor) , then it would have to wait until the variable X to be updated by the Thread-1. So we are guaranteed to get the last updated value of X
private void runLogic() {
synchronized (lock) {
if (X > 5) { // <-- Reading X here, will be consistent, since there
is happens-before between atomic_increase and runLogic
} else {
}
}
}
The answer depends on what your application does. If neither logic 1 nor logic 2 modifies X, it is quite possible that there is no need for additional synchronization (besides using an atomic_load to read X).
I assume you use intrinsics for atomic operations, and not simply an increment in a mutex (or in a synchronized block in Java). E.g. in Java there is an AtomicInteger class with methods such as 'incrementAndGet' and 'get'. If you use them, there is probably no need for additional synchronization, but it depends what you actually want to achieve with logic 1 or logic 2.
If you want to e.g. display a message when X > 5, then you can do it. By the time the message is displayed the value of X may have already changed, but it remains the fact, that the message was triggered by X being greater than 5 for at least some time.
In other words, without additional synchronization, you have only the guarantee that logic 1 will be called if X becomes greater than 5, but there is no guarantee that it will remain so during execution of logic 1. It may be ok for you, or not.
I understand that, for 2 always blocks with the same trigger, their order of evaluation is completely unpredictable.
However, suppose I have:
always #(a) begin : blockX
c = 0;
d = a + 2;
if(c != 1) e = 2;
end
always #(a) begin : blockY
e = 3;
end
always #(d) begin : blockZ
c = 1;
e = 1;
end
Suppose block X evaluates first. Does changing d in blockX immediately jump to blockZ? If not, when is blockZ evaluated with respect to blockY?
My programmer's instinct thinks of the sequence of events as a stack, where evaluating blockX is like a function call to blockZ and I immediately jump there in the code, then finish evaluating blockX.
However, because we call the active events queue, well, a queue, this suggests blockZ is enqueued at the back of the active events queue, and I'm 100% guaranteed it will be evaluated last (unless there are other triggered always blocks).
There's also the intermediate possibility, where it's neither first nor last but is also evaluated in a random and unpredictable order.
So in this example, are 1, 2, or 3 all possible final values for e, depending on how the compiler is feeling at run time?
Additionally, while I understand, of course, this represents awful style, where might I find the specification for this kind of behvaior?
Always blocks are not function calls. See a recent answer I just gave for a similar question. These blocks are concurrent processes. The LRM only guarentees the ordering of statements within a begin/end block. There is no defined ordering between concurrently executing begin/end blocks (See Section 4.7 Nondeterminism in the 1800-2012 LRM) So a simulator is free to interleave the statements in any way as long as it honors the order within a single block.
So you are correct that e could have the final values 1, 2 or 3 depending on how a simulator decides to implement and optimize your code.
This program claims to solve makeWater() synchronization problem. However, I could not understand how. I am new in semaphores. I would appriciate if you can help me to understand this code.
So you need to make H2O (2Hs and one O) combinations out of number of simultaneously running H-threads and O-threads.
The thing is one 'O' needs two 'H' s. And no sharings between two different water molecules.
So assume number of O and H threads start their processes.
No O thread can go beyond P(o_wait) because o-wait is locked, and should wait.
One random lucky H thread(say H*-1) can go pass P(mutex)(now mutex = 0 and count = 1) and and will go inside if(count%2 == 1), then up-count 'mutex' (now mutex = 1) and block in P(h_wait). (This count actually refers to H count)
Because 'mutex' was up-counted another random H-thread(H*-2) will start to go pass P(mutex) (Now mutex = 0 and count =2). But now the count is even -> hence it goes inside else. Then it will V(o_wait)(now o_wait = 1) and stuck in P(h_wait).
Now H*-1 is still at the previous position inside if block. But because o_wait is up-counted to 1, a lucky O thread(O*) can continue its process. It will do two V(h_wait) s (Now o_wait = 0, h_wait = 2), so that 2 previous H threads can continue(No any other, now h_wait = 0). So all 3 (2 Hs and O) can finish its process, while H*-2 is up-counting the 'mutex' (now mutex = 1).
Now the final values of global variables after completion one molecule, mutex = 1, h_wait = 0 and o_wait = 0, so exactly the initial status. Now the previous process will happen again and again, hence H2O molecules will be created.
I think you get clear with it. Please raise questions if any. :))
I am not familiar with multi-thread and locks and atomic/nonatomic operations.
Recently I saw an interview question as below.
Put f1 and f2 in two separate threads and run them at the same time, when both of them return, what is the value of a?
int a = 2, b = 0, c = 0
func f1()
{
a = a * 2
a = b
}
func f2()
{
c = a + 11
a = c
}
I tried to implement the above code in objective c environment and what I got is a = 11. I'm not sure if this is right since what I did is put f1 in main queue and put f2 in a dispatch global queue and ran it async which could be incorrect.
If someone could give an answer and explain the process based on the level of register accessing, CPU processing, memory usage, that would be great.
The answer is - the result of A is random. It can be anything. Since access to A is not atomic and there is no synchronization, different threads might see a different value for a depending on random factors. If you manage to make a unaligned and run it on X86, you might even see a non-value for a.
I'm confused about how threads and synchronization works. I am working through a sample problem that is described like so:
There are two threads: P and Q. The variable, counter, is shared by both threads.
Modification of counter in one thread is visible to the other thread. The
increment instruction adds one to the variable, storing the new value.
1 global integer counter = 0
2
3 thread P()
4 incr(counter)
5 print counter
6 end
7
8 thread Q()
9 print counter
10 incr(counter)
11 print counter
12 incr(counter)
13 end
There are three print statements that output the value of counter. In the output
list below, indicate whether the given output is possible and if it is, give
the interleaving instructions (using thread and line numbers) of P and Q that
can lead to the output.
The example has output 122 is it possible? which can be produced by P4, Q9, Q10, P5, Q11, Q12. I can't wrap my head around how this works.
Assume thread P starts first and increments "counter" by one. Then it's suspended and thread Q starts, reads "counter" and prints its value ("1"). Next thread Q increments "counter", which is now "2". Then thread Q gets suspended and thread P continues. It now reads "counter" and prints its value ("2"). Thread P terminates. Thread Q continues, reads and prints "counter" ("2"). It then increments "counter" by one.
The output therefore is: "122"
That's one possible sequence of execution. Generally speaking you can never tell when a thread gets suspended and when it continues, that's the whole point of this exercise. By adding synchronization mechanisms (which this example is completely lacking) you can get control over the sequence of execution again.