Semaphores: is p3 blocked forever?

I have the following code and want to know what printf prints.
I worked it out myself, but I am not sure about my answer.
Initial values: d = 0, A = 1, B = 1, C = 0 (A, B and C are semaphores).
p1                p2                p3
while(1) {        while(1) {        while(1) {
    P(A); P(B);       P(A); P(B);       P(C); P(C);
    d = 2*d;          d = d+1;          printf("%d\n",d);
    V(C);             V(C);             V(A); V(B);
}                 }                 }
My attempt:
C = 0, so p3 is blocked at the start.
Starting from process p1: after its first iteration A = 0, B = 0, d = 0 and C = 1.
p2 blocks because A = 0 and B = 0. p3 passes its first P(C), making C = 0, and then blocks on the second P(C).
With A = 0, B = 0 and C = 0, a deadlock occurs and printf does not print anything. Is this correct?

You are right: the processes deadlock and nothing is printed.
At the deadlock, p3 is waiting at its second P(C). Either p1 or p2 has completed exactly one iteration and the other has completed none, and both p1 and p2 are waiting at P(A). The value of d depends on the race between p1 and p2 to enter the first iteration: d = 0 if p1 won (2*0), d = 1 if p2 won (0+1).
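The claim above can be checked mechanically. The following is a minimal sketch, assuming each P/V and each statement on d is one indivisible step; the names PROCS, successors and find_deadlocks are made up for this illustration. It enumerates every reachable interleaving of the three processes and collects the states in which nobody can move.

```python
from collections import deque

# Each process is a cyclic list of steps acting on the semaphores and on d.
PROCS = (
    (("P", "A"), ("P", "B"), ("set", lambda d: 2 * d), ("V", "C")),     # p1
    (("P", "A"), ("P", "B"), ("set", lambda d: d + 1), ("V", "C")),     # p2
    (("P", "C"), ("P", "C"), ("print", None), ("V", "A"), ("V", "B")),  # p3
)


def successors(state):
    """Yield every state reachable by letting one unblocked process take one step."""
    pcs, sems, d, printed = state
    for i, prog in enumerate(PROCS):
        kind, arg = prog[pcs[i]]
        new_sems, new_d, new_printed = dict(sems), d, printed
        if kind == "P":
            if new_sems[arg] == 0:
                continue  # this process is blocked on the semaphore
            new_sems[arg] -= 1
        elif kind == "V":
            new_sems[arg] += 1
        elif kind == "set":
            new_d = arg(d)
        else:  # "print"
            new_printed = printed + (d,)
        new_pcs = pcs[:i] + ((pcs[i] + 1) % len(prog),) + pcs[i + 1:]
        yield (new_pcs, tuple(sorted(new_sems.items())), new_d, new_printed)


def find_deadlocks():
    """Breadth-first search over all interleavings; return states with no successor."""
    start = ((0, 0, 0), (("A", 1), ("B", 1), ("C", 0)), 0, ())
    seen, todo, deadlocks = {start}, deque([start]), []
    while todo:
        state = todo.popleft()
        nxt = list(successors(state))
        if not nxt:
            deadlocks.append(state)
        for n in nxt:
            if n not in seen:
                seen.add(n)
                todo.append(n)
    return deadlocks


if __name__ == "__main__":
    for pcs, sems, d, printed in find_deadlocks():
        print("deadlock with d =", d, "and printed =", printed)
```

Under these assumptions the search finds two deadlock states, one with d = 0 and one with d = 1, and in both the list of printed values is empty.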


Can I determine the result of a data race without reading the value?

I'm trying to better understand lock-free programming:
Suppose we have two threads in a data race:
// Thread 1
x = 1
// Thread 2
x = 2
Is there a lock-free way a third thread can know the result of the race without being able to read x?
Suppose thread 3 consumes a lock-free queue, and the code is:
// Thread 1
x = 1
// Thread 2
x = 2
Then the operations could be ordered as:
x = 1
x = 2
x = 1
x = 2
So having a lock-free queue alone would not suffice for thread 3 to know the value of x after the race.
If you know the value of x before the race began, the following code using atomic Read-Modify-Write operations should do the job.
// Notes:
// x == 0 (and winner == 0) before the race
// x and winner are both atomic, t is local to each thread
// atomic_swap swaps the content of two variables atomically,
// meaning that no other thread can interfere with this operation

// thread-1:
t = 1;
atomic_swap(x, t);
if (t != 0) {
    // x was non-zero when thread-1 called the swap operation
    // --> thread-2 was faster, so thread-1's value is the one left in x
    winner = 1;
}

// thread-2:
t = 2;
atomic_swap(x, t);
if (t != 0) {
    // x was non-zero when thread-2 called the swap operation
    // --> thread-1 was faster, so thread-2's value is the one left in x
    winner = 2;
}

// thread-3:
while (winner == 0) {}
print("Winner is " + winner);
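A minimal sketch of the same protocol in Python, to show the logic only. Python has no built-in atomic exchange, so the hypothetical AtomicInt class below emulates one with a lock; that means this sketch is not itself lock-free. Thread 3's spin loop is replaced by joining the two writers.

```python
import threading


class AtomicInt:
    """Stand-in for an atomic integer. The lock only emulates a hardware exchange."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def swap(self, new):
        """Store new and return the previous value as one indivisible step."""
        with self._lock:
            old, self._value = self._value, new
            return old

    def load(self):
        with self._lock:
            return self._value


def run_race():
    x = AtomicInt(0)       # known to be 0 before the race
    winner = AtomicInt(0)  # 0 means "not decided yet"

    def writer(my_id):
        old = x.swap(my_id)
        if old != 0:
            # The other thread wrote first, so this thread's value stays in x.
            winner.swap(my_id)

    threads = [threading.Thread(target=writer, args=(i,)) for i in (1, 2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winner.load(), x.load()


if __name__ == "__main__":
    w, final_x = run_race()
    print("Winner is", w, "and x is", final_x)
```

Exactly one of the two swaps observes a non-zero old value, so winner always ends up equal to the final content of x, without any thread ever reading x directly except for this check.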

How to model a queue in Promela?

Ok, so I'm trying to model a CLH-RW lock in Promela.
The way the lock works is simple, really:
The queue consists of a tail, to which both readers and writers enqueue a node containing a single bool succ_must_wait. They do so by creating a new node and CAS-ing it with the tail.
The tail thereby becomes the node's predecessor, pred.
Then they spin-wait on pred.succ_must_wait until it is false.
Readers first increment a reader counter ncritR and then set their own flag to false, allowing multiple readers in the critical section at the same time. Releasing a read lock simply means decrementing ncritR again.
Writers wait until ncritR reaches zero, then enter the critical section. They do not set their flag to false until the lock is released.
I'm kind of struggling to model this in promela, though.
My current attempt (see below) tries to make use of arrays, where each node basically consists of a number of array entries.
This fails because let's say A enqueues itself, then B enqueues itself. Then the queue will look like this:
S <- A <- B
Where S is a sentinel node.
The problem now is that when A runs to completion and re-enqueues, the queue will look like
S <- A <- B <- A'
In actual execution, this is absolutely fine because A and A' are distinct node objects. And since A.succ_must_wait will have been set to false when A first released the lock, B will eventually make progress, and therefore A' will eventually make progress.
What happens in the array-based promela model below, though, is that A and A' occupy the same array positions, causing B to miss the fact that A has released the lock, thereby creating a deadlock where B is (wrongly) waiting for A' instead of A and A' is waiting (correctly) for B.
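The aliasing described above can be replayed outside of Spin. This is a minimal sketch, assuming the interleaving "A releases, then A re-enqueues before B looks at its predecessor"; the function names are made up for illustration. Each function returns whether B and A' still see a set flag on their predecessor.

```python
class Node:
    """A queue node. A freshly created node starts with succ_must_wait set."""

    def __init__(self, must_wait=True):
        self.succ_must_wait = must_wait


def with_distinct_nodes():
    """Real lock: A and A' are different objects."""
    sentinel = Node(must_wait=False)
    a = Node()                  # A enqueues behind the sentinel
    a_pred = sentinel
    b = Node()                  # B enqueues behind A
    b_pred = a
    a.succ_must_wait = False    # A releases the lock
    a2 = Node()                 # A re-enqueues as a brand new node A'
    a2_pred = b
    return b_pred.succ_must_wait, a2_pred.succ_must_wait


def with_shared_slots():
    """Array model: A and A' share index 1."""
    S, A, B = 0, 1, 2
    succ_must_wait = [False, False, False]
    pred = [0, 0, 0]
    tail = S
    succ_must_wait[A] = True; pred[A] = tail; tail = A   # A enqueues
    succ_must_wait[B] = True; pred[B] = tail; tail = B   # B enqueues
    succ_must_wait[A] = False                            # A releases
    succ_must_wait[A] = True; pred[A] = tail; tail = A   # A' overwrites A's slot
    return succ_must_wait[pred[B]], succ_must_wait[pred[A]]


if __name__ == "__main__":
    print("distinct nodes (B waits, A' waits):", with_distinct_nodes())
    print("shared slots   (B waits, A' waits):", with_shared_slots())
```

With distinct nodes only A' has to wait; with shared slots both B and A' wait on each other, which is the circular wait reported by the model.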
A possible "solution" to this could be to have A wait until B acknowledges the release. But that would not be true to how the lock works.
Another "solution" would be to wait for a CHANGE in pred.succ_must_wait, where a release would increment succ_must_wait, rather than reset it to 0.
But I'm intending to model a version of the lock, where pred may change (i.e. where a node may be allowed to disregard some of its predecessors), and I'm not entirely convinced something like the increasing version wouldn't cause an issue with this change.
So what's the "smartest" way to model an implicit queue like this in promela?
/* CLH-RW Lock */
/*pid: 0 = init, 1-2 = reader, 3-4 = writer*/

ltl liveness{
    ([]<> reader[1]#progress_reader)
    && ([]<> reader[2]#progress_reader)
    && ([]<> writer[3]#progress_writer)
    && ([]<> writer[4]#progress_writer)
}

bool initialised = 0;
byte ncritR;
byte ncritW;
byte tail;
bool succ_must_wait[5];
byte pred[5];

init{
    assert(_pid == 0);
    ncritR = 0;
    ncritW = 0;
    /*sentinel node*/
    tail =0;
    pred[0] = 0;
    succ_must_wait[0] = 0;
    initialised = 1;
}

active [2] proctype reader()
{
    assert(_pid >= 1);
    (initialised == 1)
    do
    :: else ->
        succ_must_wait[_pid] = 1;
        atomic {
            pred[_pid] = tail;
            tail = _pid;
        }
        (succ_must_wait[pred[_pid]] == 0)
        ncritR++;
        succ_must_wait[_pid] = 0;
        atomic {
            /*freeing previous node for garbage collection*/
            pred[_pid] = 0;
        }
progress_reader:
        assert(ncritR >= 1);
        assert(ncritW == 0);
        ncritR--;
        atomic {
            /*necessary to model the fact that the next access creates a new queue node*/
            if
            :: tail == _pid -> tail = 0;
            :: else ->
                skip;
            fi
        }
    od
}

active [2] proctype writer()
{
    assert(_pid >= 1);
    (initialised == 1)
    do
    :: else ->
        succ_must_wait[_pid] = 1;
        atomic {
            pred[_pid] = tail;
            tail = _pid;
        }
        (succ_must_wait[pred[_pid]] == 0)
        (ncritR == 0)
        atomic {
            /*freeing previous node for garbage collection*/
            pred[_pid] = 0;
        }
        ncritW++;
progress_writer:
        assert(ncritR == 0);
        assert(ncritW == 1);
        ncritW--;
        succ_must_wait[_pid] = 0;
        atomic {
            /*necessary to model the fact that the next access creates a new queue node*/
            if
            :: tail == _pid -> tail = 0;
            :: else ->
                skip;
            fi
        }
    od
}
First of all, a few notes:
You don't need to initialize your variables to 0 since, according to the docs, "the default initial value of all variables is zero."
You don't need to enclose a single instruction inside an atomic {} statement, since any elementary statement is executed atomically. For better efficiency of the verification process, whenever possible, you should use d_step {} instead. Here you can find a related stackoverflow Q/A on the topic.
init {} is guaranteed to have _pid == 0 when one of the two following conditions holds:
no active proctype is declared
init {} is declared before any other active proctype appearing in the source code
Active processes, including init {}, are spawned in order of appearance inside the source code. All other processes are spawned in order of appearance of the corresponding run ... statement.
I identified the following issues on your model:
The instruction pred[_pid] = 0 is useless, because that memory location is only read after the assignment pred[_pid] = tail.
When you release the successor of a node, you only set succ_must_wait[_pid] to 0; you don't invalidate the node instance on which your successor is waiting. This is the problem that you identified in your question but were unable to solve. The solution I propose is to add the following code:
pid j;
for (j: 1..4) {
    if
    :: pred[j] == _pid -> pred[j] = 0;
    :: else -> skip;
    fi
}
This should be enclosed in an atomic {} block.
You correctly set tail back to 0 when the node that has just left the critical section is also the last node in the queue, and you correctly enclose this operation in an atomic {} block. However, just before this atomic {} block is entered, some other process that was still idle may execute the initial atomic block and copy the current value of tail, which corresponds to the node that has just expired, into its own pred[_pid] memory location. If the node that has just exited the critical section now attempts to join the queue once again, setting its own succ_must_wait[_pid] to 1, you get another instance of circular wait among processes. The correct approach is to merge this part with the code releasing the successor.
The following inline function can be used to release the successor of a given node:
inline release_succ(i)
{
    d_step {
        pid j;
        for (j: 1..4) {
            if
            :: pred[j] == i ->
                pred[j] = 0;
            :: else ->
                skip;
            fi
        }
        succ_must_wait[i] = 0;
        if
        :: tail == _pid -> tail = 0;
        :: else -> skip;
        fi
    }
}
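To see why redirecting the successor's pred to the sentinel removes the aliasing, here is a minimal Python sketch of the same release logic. The helper names enqueue and replay and the dictionary holding tail are assumptions made for this illustration. It replays the scenario in which A releases and immediately re-enqueues.

```python
def enqueue(i, pred, succ_must_wait, queue):
    """Join the queue: raise our own flag and remember the old tail as predecessor."""
    succ_must_wait[i] = True
    pred[i] = queue["tail"]
    queue["tail"] = i


def release_succ(i, pred, succ_must_wait, queue):
    """Python rendering of the inline above; in the model this is one d_step."""
    for j in range(1, len(pred)):
        if pred[j] == i:
            pred[j] = 0              # the successor now waits on the sentinel
    succ_must_wait[i] = False
    if queue["tail"] == i:
        queue["tail"] = 0


def replay():
    A, B = 1, 2
    pred = [0, 0, 0]
    succ_must_wait = [False, False, False]   # index 0 is the sentinel
    queue = {"tail": 0}
    enqueue(A, pred, succ_must_wait, queue)
    enqueue(B, pred, succ_must_wait, queue)
    release_succ(A, pred, succ_must_wait, queue)   # A leaves the critical section
    enqueue(A, pred, succ_must_wait, queue)        # A' reuses A's slot
    b_must_wait = succ_must_wait[pred[B]]
    a_must_wait = succ_must_wait[pred[A]]
    return b_must_wait, a_must_wait, pred


if __name__ == "__main__":
    print(replay())
```

After the release, B's predecessor is the sentinel, whose flag is never set, so B no longer depends on what A' later writes into the shared slot.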
The complete model follows:
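byte ncritR;
byte ncritW;
byte tail;
bool succ_must_wait[5];
byte pred[5];

inline release_succ(i)
{
    d_step {
        pid j;
        for (j: 1..4) {
            if
            :: pred[j] == i ->
                pred[j] = 0;
            :: else ->
                skip;
            fi
        }
        succ_must_wait[i] = 0;
        if
        :: tail == _pid -> tail = 0;
        :: else -> skip;
        fi
    }
}

/* declared first so that it takes _pid 0; readers get 1-2, writers 3-4 */
init { skip }

active [2] proctype reader()
{
loop:
    succ_must_wait[_pid] = 1;
    d_step {
        pred[_pid] = tail;
        tail = _pid;
    }
trying:
    (succ_must_wait[pred[_pid]] == 0)
    ncritR++;
    /* readers release their successor as soon as they are in */
    release_succ(_pid);
    // critical section
progress_reader:
    assert(ncritR > 0);
    assert(ncritW == 0);
    ncritR--;
    goto loop;
}

active [2] proctype writer()
{
loop:
    succ_must_wait[_pid] = 1;
    d_step {
        pred[_pid] = tail;
        tail = _pid;
    }
trying:
    (succ_must_wait[pred[_pid]] == 0) && (ncritR == 0)
    ncritW++;
    // critical section
progress_writer:
    assert(ncritR == 0);
    assert(ncritW == 1);
    ncritW--;
    /* writers release their successor only when they leave */
    release_succ(_pid);
    goto loop;
}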
I added the following properties to the model:
p0: the writer with _pid equal to 4 goes through its progress state infinitely often, provided that it is given the chance to execute some instruction infinitely often:
ltl p0 {
    ([]<> (_last == 4)) ->
    ([]<> writer[4]#progress_writer)
}
This property should be true.
p1: there is never more than one reader in the critical section:
ltl p1 {
    ([] (ncritR <= 1))
}
Obviously, we expect this property to be false in a model that matches your specification.
p2: there is never more than one writer in the critical section:
ltl p2 {
    ([] (ncritW <= 1))
}
This property should be true.
p3: there isn't any node that is the predecessor of two other nodes at the same time, unless such node is node 0:
ltl p3 {
    [] (
        (((pred[1] != 0) && (pred[2] != 0)) -> (pred[1] != pred[2])) &&
        (((pred[1] != 0) && (pred[3] != 0)) -> (pred[1] != pred[3])) &&
        (((pred[1] != 0) && (pred[4] != 0)) -> (pred[1] != pred[4])) &&
        (((pred[2] != 0) && (pred[3] != 0)) -> (pred[2] != pred[3])) &&
        (((pred[2] != 0) && (pred[4] != 0)) -> (pred[2] != pred[4])) &&
        (((pred[3] != 0) && (pred[4] != 0)) -> (pred[3] != pred[4]))
    )
}
This property should be true.
p4: it is always true that whenever writer with _pid equal to 4 tries to access the critical section then it will eventually get there:
ltl p4 {
    [] (writer[4]#trying -> <> writer[4]#progress_writer)
}
This property should be true.
The outcome of the verification matches our expectations:
~$ spin -search -ltl p0 -a clhrw_lock.pml
Full statespace search for:
never claim + (p0)
assertion violations + (if within scope of claim)
acceptance cycles + (fairness disabled)
invalid end states - (disabled by never claim)
State-vector 68 byte, depth reached 3305, errors: 0
~$ spin -search -ltl p1 -a clhrw_lock.pml
Full statespace search for:
never claim + (p1)
assertion violations + (if within scope of claim)
acceptance cycles + (fairness disabled)
invalid end states - (disabled by never claim)
State-vector 68 byte, depth reached 1692, errors: 1
~$ spin -search -ltl p2 -a clhrw_lock.pml
Full statespace search for:
never claim + (p2)
assertion violations + (if within scope of claim)
acceptance cycles + (fairness disabled)
invalid end states - (disabled by never claim)
State-vector 68 byte, depth reached 3115, errors: 0
~$ spin -search -ltl p3 -a clhrw_lock.pml
Full statespace search for:
never claim + (p3)
assertion violations + (if within scope of claim)
acceptance cycles + (fairness disabled)
invalid end states - (disabled by never claim)
State-vector 68 byte, depth reached 3115, errors: 0
~$ spin -search -ltl p4 -a clhrw_lock.pml
Full statespace search for:
never claim + (p4)
assertion violations + (if within scope of claim)
acceptance cycles + (fairness disabled)
invalid end states - (disabled by never claim)
State-vector 68 byte, depth reached 3115, errors: 0

How is it possible for some threads to never execute?

int x = 0; // global shared variable
T1: for (i = 0; i < 100; i++) x++;
T2: x++; // no loop, just one increment
T1 and T2 are separate threads. I am told the final value of x can be anything from 1 to 101. How is this possible? In particular, I am wondering how it could end up being just 1.
Obviously, something fails in the execution sequence, but I'm wondering what.
x++ is not an atomic operation (at least in most languages). It actually works like this:
    tmp = x;
    tmp = tmp + 1;
    x = tmp;
Now assume the following execution order:
    T2: tmp = x; // tmp is 0
    T1: run all loop iterations, finally x is 100
    T2: tmp = tmp+1; x = tmp; // x is 1
To get any other number, imagine this order instead:
    T1: started loop, at some point x is 45
    T2: tmp = x; // tmp is 45
    T1: finished loop, x is 100
    T2: tmp = tmp+1; x = tmp; // x is 46
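The two schedules above generalise. The following is a minimal, deterministic sketch rather than a real multithreaded run. The function final_x and its parameters are made up for illustration, and for simplicity each T1 iteration is treated as indivisible while T2's load and store are placed after a chosen number of T1 iterations.

```python
def final_x(load_after, store_after, n=100):
    """Replay one schedule.

    T2 executes `tmp = x` after `load_after` iterations of T1 and
    `x = tmp + 1` after `store_after` iterations (load_after <= store_after).
    """
    x = 0
    tmp = None
    for done in range(n + 1):
        if done == load_after:
            tmp = x              # T2: tmp = x
        if done == store_after:
            x = tmp + 1          # T2: x = tmp + 1, possibly overwriting T1's work
        if done < n:
            x += 1               # one iteration of T1
    return x


if __name__ == "__main__":
    print(final_x(0, 100))     # first schedule from the answer
    print(final_x(45, 100))    # second schedule from the answer
    outcomes = {final_x(k, m) for k in range(101) for m in range(k, 101)}
    print(min(outcomes), max(outcomes), len(outcomes))
```

Even under this simplified model every value from 1 to 101 is reachable, including the surprising 1.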
A related cause of this behavior is memory caching. Since threads can execute on independent CPUs, each may work on its own copy of x, and the following situation is possible:
T1: loads x value
T2: loads x value
T1: runs loop 10 times (T1_x=10)
T2: increments (T2_x=1)
T1: saves value 10 to memory
T2: saves value 1 to memory
This is why you need thread synchronization. You can read more here: Mutex example / tutorial?
Thank you @Lashane for the correction.

Where is the race condition?

I had a question on a test recently that basically said to make 3 concurrent processes execute some block of code in order.
Example of execution order in case that did not make sense:
For my answer I wrote this pseudo-ish code
shared s[2] = {-1,-1};
void Process1(){
    if(s[0] < 0 && s[1] < 0){
        s[0] = 1;
    }
}
void Process2(){
    if(s[0] > 0 && s[1] < 0){
        s[1] = 1;
    }
}
void Process3(){
    int i = 0;
    if(s[1] > 0 && s[0] > 0){
        s[0] = -1;
        s[1] = -1;
    }
}
My teacher wrote race condition and circled the last line in the if statement on Process3 and drew an arrow to the conditional statement in process2.
I am having trouble seeing how this could cause a race condition. I am sure it is obvious but I just can't see it.
Consider the following order of events:
After some time, s = [1, 1].
Process2 is in the middle of evaluating the expression in its if statement: it has just found s[0] > 0 to be true and is about to evaluate s[1] < 0.
Process3 then resets s to [-1, -1].
Process2 evaluates the rest of the expression, finds s[1] < 0 to be true, and enters its block even though Process1 has not yet run in this round.
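The interleaving above can be replayed step by step. This is a minimal single-threaded sketch in which the schedule is hard-coded rather than produced by real concurrency; the function names are made up for illustration.

```python
def check_without_interference(s):
    """Process2's condition when nothing runs between its two reads."""
    return s[0] > 0 and s[1] < 0


def check_with_process3_in_between():
    """Process2's condition when Process3 resets s between the two reads."""
    s = [1, 1]
    left = s[0] > 0      # Process2 reads s[0]: true
    s[0] = -1            # Process3 runs its whole block here
    s[1] = -1
    right = s[1] < 0     # Process2 reads s[1]: now also true
    return left and right, s


if __name__ == "__main__":
    print(check_without_interference([1, 1]))
    print(check_with_process3_in_between())
```

Read in one go, the condition is false for s = [1, 1]; split around Process3's reset, both halves are true, so Process2 proceeds while s says it is Process1's turn.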

Go thread deadlock error - what is the correct way to use go routines?

I am writing a program that calculates a Riemann sum based on user input. The program will split the function into 1000 rectangles (yes I know I haven't gotten that math in there yet) and sum them up and return the answer. I am using go routines to compute the 1000 rectangles but am getting an
fatal error: all goroutines are asleep - deadlock!
What is the correct way to handle multiple go routines? I have been looking around and haven't seen an example that resembles my case? I'm new and want to adhere to standards. Here is my code (it is runnable if you'd like to see what a typical use case of this is - however it does break)
package main

import "fmt"
import "time"

//Data type to hold 'part' of function; ie. "4x^2"
type Pair struct {
	coef, exp int
}

//Calculates the y-value of a 'part' of the function and writes this to the channel
func calc(c *chan float32, p Pair, x float32) {
	val := x
	//Raise our x value to the power, contained in 'p'
	for i := 1; i < p.exp; i++ {
		val = val * val
	}
	//Read existing answer from channel
	ans := <-*c
	//Write new value to the channel
	*c <- float32(ans + (val * float32(p.coef)))
}

var c chan float32    //Channel
var m map[string]Pair //Map to hold function 'parts'

func main() {
	c = make(chan float32, 1001) //Buffered at 1001
	m = make(map[string]Pair)
	var counter int
	var temp_coef, temp_exp int
	var check string
	var up_bound, low_bound float32
	var delta float32
	counter = 1
	check = "default"
	//Loop through as long as we have no more function 'parts'
	for check != "n" {
		fmt.Print("Enter the coefficient for term ", counter, ": ")
		fmt.Scanln(&temp_coef)
		fmt.Print("Enter the exponent for term ", counter, ": ")
		fmt.Scanln(&temp_exp)
		fmt.Print("Do you have more terms to enter (y or n): ")
		fmt.Scanln(&check)
		//Put data into our map
		m[string(counter)] = Pair{temp_coef, temp_exp}
		counter++
	}
	fmt.Print("Enter the lower bound: ")
	fmt.Scanln(&low_bound)
	fmt.Print("Enter the upper bound: ")
	fmt.Scanln(&up_bound)
	//Calculate the delta; ie. our x delta for the riemann sum
	delta = (float32(up_bound) - float32(low_bound)) / float32(1000)
	//Make our go routines here to add
	for i := low_bound; i < up_bound; i = i + delta {
		//'counter' is indicative of the number of function 'parts' we have
		for j := 1; j < counter; j++ {
			//Go routines made here
			go calc(&c, m[string(j)], i)
		}
	}
	//Wait for the go routines to finish
	time.Sleep(5000 * time.Millisecond)
	//Read the result?
	ans := <-c
	fmt.Print("Answer: ", ans)
}
It deadlocks because both calc() and main() read from the channel before anyone gets to write to it.
So every (non-main) goroutine ends up blocking at:
ans := <-*c
waiting for some other goroutine to put a value into the channel. Therefore none of them reaches the next line, where they would actually write to the channel. And the main() routine will block at:
ans := <-c
Everyone is waiting = deadlock
Using buffered channels
Your solution should have the calc() function only write to the channel, while main() reads from it in a for-range loop, summing up the values coming from the goroutines.
You will also need a way for main() to know when no more values will arrive, perhaps by using a sync.WaitGroup (maybe not the best fit, since main isn't supposed to wait but rather to sum things up) or an ordinary counter.
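The shape of that fix is language-independent. Here is a minimal sketch in Python, with queue.Queue standing in for the buffered channel and threads standing in for goroutines; the names calc and sum_terms are made up for illustration, and this is not the Go code itself. Workers only ever write, and the main thread uses an ordinary counter to know how many results to read.

```python
import queue
import threading


def calc(results, coef, exp, x):
    """Worker: compute one term coef * x**exp and only WRITE it to the queue."""
    results.put(coef * x ** exp)


def sum_terms(terms, xs):
    """Start one worker per (term, x) pair and sum whatever they produce."""
    results = queue.Queue()
    workers = [
        threading.Thread(target=calc, args=(results, coef, exp, x))
        for x in xs
        for (coef, exp) in terms
    ]
    for w in workers:
        w.start()
    total = 0.0
    # An ordinary counter: exactly one result is expected per worker started.
    for _ in range(len(workers)):
        total += results.get()
    return total


if __name__ == "__main__":
    # 4x^2 evaluated at x = 1 and x = 2
    print(sum_terms([(4, 2)], [1.0, 2.0]))
```

Because no worker ever reads from the queue, nobody waits for a value that has not been written yet, and the reader stops after the expected number of results instead of blocking forever.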
Using shared memory
Sometimes a channel is not necessarily what you need. A shared value that you update with the sync/atomic package (note that atomic add doesn't work on floats) or protect with a sync.Mutex works fine too.
