Semaphore value greater than initialized value - Linux

I am working with semaphores on Linux. Can the semaphore value ever be incremented beyond its initial value? If so, when can that happen?
For example, say the semaphore value is initialized to 1.
If I call up(sem) twice in a row, will the value of the semaphore increment beyond 1?
void x(void)
{
    struct semaphore sem1;

    sema_init(&sem1, 1);
    down(&sem1);
    {
        /* ... some code implementation ... */
    }
    up(&sem1); /* I understand this increments the value back to 1. */
    up(&sem1);
    /* What exactly does this statement do to the semaphore?
       Will it increment the value to 2? If so, what is the meaning
       of this statement? */
}

Yes, it will increment it to 2. The effect is that the next two down calls on the semaphore will run without blocking. The general use case of semaphores is to protect a pool of resources: if there is 1 resource, the maximum expected value of the semaphore is 1; if there are 2 resources, the maximum expected value is 2; and so on. So whether incrementing the semaphore to 2 is correct depends on the context. If only 1 process should get past the semaphore at any given time, then incrementing to 2 is a bug in the code. If 2 or more processes are allowed, then incrementing to 2 is fine.
This is a simplified explanation. For more details, look up "counting semaphores". The other type of semaphore, which you may be thinking of, is the "binary semaphore", which is either 0 or 1.
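To make the resource-pool idea concrete, here is a minimal user-space sketch using POSIX semaphores rather than the in-kernel API from the question (an assumption made so the example is runnable; the counting behaviour is the same, and the names POOL_SIZE and worker are invented for illustration):

#include <pthread.h>
#include <semaphore.h>

#define POOL_SIZE   2          /* two interchangeable resources */
#define NUM_WORKERS 4

static sem_t pool;             /* counting semaphore; max expected value 2 */

static void *worker(void *arg)
{
    (void)arg;
    sem_wait(&pool);           /* acquire a resource; blocks if none free */
    /* ... use the resource ... */
    sem_post(&pool);           /* release it; paired with the wait above,
                                  so the value never exceeds POOL_SIZE */
    return NULL;
}

int main(void)
{
    pthread_t t[NUM_WORKERS];

    sem_init(&pool, 0, POOL_SIZE);   /* initial value = number of resources */
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&pool);
    return 0;
}

An unmatched extra sem_post here would push the value to 3 and let three workers into a two-resource pool, which is exactly the kind of bug described above.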

Related

Result of 100 concurrent threads, each incrementing variable to 100

I'm writing to ask about this question from 'The Little Book of Semaphores' by Allen B. Downey.
Question from 'The Little Book of Semaphores'
Puzzle: Suppose that 100 threads run the following program concurrently. (If you are not familiar with Python: the for loop runs the update 100 times.)
for i in range(100):
    temp = count
    count = temp + 1
What is the largest possible value of count after all threads have completed? What is the smallest possible value? Hint: the first question is easy; the second is not.
My understanding is that count is a variable shared by all threads, and that its initial value is 0.
I believe that the largest possible value is 10,000, which occurs when there is no interleaving between threads.
I believe that the smallest possible value is 100. If line 2 is executed for each thread, they will each have a value of temp = 0. If line 3 is then executed for each thread, they will each set count = 1. If the same behaviour occurs in each iteration, the final value of count will be 100.
Is this correct, or is there another execution path that can result in a value smaller than 100 for count?
The worst case that I can think of will leave count equal to two. It's extremely unlikely that this would ever happen in practice, but in theory, it's possible. I'll need to talk about Thread A, Thread B, and 98 other threads:
1. Thread A reads count as zero, but then it is preempted before it can do anything else.
2. Thread B is allowed to run 99 iterations of its loop, and the 98 other threads all run to completion, before thread A is finally allowed to run again.
3. Thread A writes 1 to count before (are you ready to believe this?) it gets preempted again!
4. Thread B starts its 100th iteration. It gets as far as reading count as 1 (just now written by thread A) before thread A finally comes roaring back to life and runs to completion.
5. Thread B is last to cross the finish line after it writes 2 to count.
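If you want to watch the race happen, here is a C translation of the puzzle (an illustration only: a data race on count is undefined behaviour in C, so treat this as a sketch of the scheduling argument rather than a well-defined program; the helper run is an invented name):

#include <pthread.h>
#include <stdio.h>

static int count = 0;              /* shared by all threads, initially 0 */

static void *run(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100; i++) {
        int temp = count;          /* line 2: read the shared variable */
        count = temp + 1;          /* line 3: write it back; another thread
                                      may have updated count in between */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[100];

    for (int i = 0; i < 100; i++)
        pthread_create(&t[i], NULL, run, NULL);
    for (int i = 0; i < 100; i++)
        pthread_join(t[i], NULL);
    /* any value from 2 to 10000 is consistent with the argument above;
       in practice you will usually see something close to 10000 */
    printf("count = %d\n", count);
    return 0;
}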

Handling multiple wait operations before entering a critical section in an operating system?

I am stuck on a counting semaphore problem in my operating systems course.
S is a semaphore initialized to 5.
counter = 0 (shared variable)
Assume that the increment operation in line #7 is not atomic.
Now,
1.  int counter = 0;
2.  Semaphore S = init(5);
3.  void parop(void)
4.  {
5.      wait(S);
6.      wait(S);
7.      counter++;
8.      signal(S);
9.      signal(S);
10. }
If five threads execute the function parop concurrently, which of the following program behavior(s) is/are possible?
A. The value of counter is 5 after all the threads successfully complete the execution of parop
B. The value of counter is 1 after all the threads successfully complete the execution of parop
C. The value of counter is 0 after all the threads successfully complete the execution of parop
D. There is a deadlock involving all the threads
What I have understood so far is that the answer includes A and D. A is possible because the threads may execute one after another, say T1 -> T2 -> T3 -> T4 -> T5, in which case the final value saved will be 5 (so A is one of the correct options).
D is possible because all five threads may execute line 5 before any of them reaches line 6; the semaphore's value then drops from 5 to 0, and every thread blocks at line 6, which is a deadlock.
Now, can anyone please help me understand why B is another correct answer?
Thanks in advance. Any help will be highly appreciated.
Imagine thread 1 gets to line 7 before any other thread, and line 7 is implemented as three instructions:
7_1: load  counter, %r0
7_2: add   $1, %r0
7_3: store %r0, counter
For some reason (e.g. an interrupt, or being preempted), thread 1 stops at instruction 7_2; at that point it has loaded the value 0 into register %r0.
Next, threads 2..5 all run through this sequence, leaving counter at, say, 4.
Thread 1 is then rescheduled, increments %r0 to the value 1, and stores it into counter, overwriting the 4.
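Here is a runnable sketch of the exercise, assuming POSIX semaphores as a stand-in for the pseudocode's Semaphore/wait/signal (sem_t, sem_wait, sem_post); the unprotected counter++ is left racy on purpose:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t S;
static int counter = 0;

static void *parop(void *arg)
{
    (void)arg;
    sem_wait(&S);      /* line 5: wait(S) */
    sem_wait(&S);      /* line 6: wait(S); if all five threads pass line 5
                          before any reaches here, all block: option D */
    counter++;         /* line 7: non-atomic load/add/store */
    sem_post(&S);      /* line 8: signal(S) */
    sem_post(&S);      /* line 9: signal(S) */
    return NULL;
}

int main(void)
{
    pthread_t t[5];

    sem_init(&S, 0, 5);                  /* semaphore initialized to 5 */
    for (int i = 0; i < 5; i++)
        pthread_create(&t[i], NULL, parop, NULL);
    for (int i = 0; i < 5; i++)
        pthread_join(t[i], NULL);
    printf("counter = %d\n", counter);   /* 1 through 5 are all possible */
    return 0;
}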

Not understanding semaphore on low level

Just watched a video on semaphores and tried digging for more information. Not quite sure how a semaphore works on an assembly level.
P(s):
    s = s - 1
    if (s < 0) { wait on s }

CRITICAL SECTION

V(s):
    s = s + 1
    if (threads are waiting on s) { wake one }
I understand the concept behind these functions; however, I am having trouble wrapping my head around this.
Say s = 1, and you have 2 threads, Thread One and Thread Two:
Thread One          Thread Two
load s              load s
subtract s, 1       subtract s, 1
save s              save s
Then there is a context switch between the subtract and the save in each thread, so both end up saving s = 0. Won't both threads see s as 0 and enter the critical section? I am not sure how one thread gains exclusive access if, at the assembly level, a context switch can happen so that both threads see s = 0.
The key thing is that the increment and decrement must use atomic instructions in some way. On x86, there is a form of the add instruction which, combined with the lock prefix, lets you perform an addition to a memory location atomically. Because it is a single instruction, a context switch can't happen during its execution, and the lock prefix means the CPU ensures that no other accesses appear to happen during the increment.
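For instance, with GCC or Clang you can ask for such an atomic addition directly (a sketch assuming the __atomic builtins, which on x86 typically compile down to a lock-prefixed add):

#include <stdio.h>

int main(void)
{
    int s = 1;
    /* one indivisible read-modify-write of s; no other thread's access
       can be interleaved inside it */
    __atomic_fetch_add(&s, -1, __ATOMIC_SEQ_CST);
    printf("s = %d\n", s);   /* prints 0 */
    return 0;
}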
If an atomic add is not available, then there are other options. One common one is an atomic compare-and-swap instruction. Found on most systems supporting parallel or concurrent code, it is an instruction that takes an old and a new value, and if the memory location is equal to the old value, sets it to the new one. This can be used in a loop to implement an atomic add:
l:
    load r0, s      ; r0 = current value of s
    mov  r1, r0     ; keep a copy of what we read
    add  r0, -1     ; r0 = value - 1
    cas  s, r1, r0  ; if s still equals r1, set s = r0
    jmpf l          ; the cas failed: another thread changed s, so retry
This loads the value, then subtracts 1 from a copy of it. We then attempt to store the lower value back; if s has changed in the meantime, the compare-and-swap fails and we start again.
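The same loop can be expressed in C, assuming the GCC/Clang __atomic builtins (atomic_decrement is an invented helper name, just for illustration):

#include <stdbool.h>

/* Decrement *s by 1 with a compare-and-swap retry loop; this mirrors
   the pseudo-assembly above. */
static void atomic_decrement(int *s)
{
    int old = __atomic_load_n(s, __ATOMIC_SEQ_CST);     /* load r0, s */
    int desired;
    do {
        desired = old - 1;                              /* add r0, -1 */
        /* if the exchange fails, 'old' is refreshed with the current
           value of *s and we go around again (jmpf l) */
    } while (!__atomic_compare_exchange_n(s, &old, desired, false,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
}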

Difference between mutexes and memory coherence?

I know about memory coherence protocols for multi-core architectures. MSI for example allows at most one core to hold a cache line in M state with both read and write access enabled. S state allows multiple sharers of the same line to only read the data. I state allows no access to the currently acquired cache line. MESI extends that by adding an E state which allows only one sharer to read, allowing an easier transition to M state if there are no other sharers.
From what I wrote above, I understand that when we write this line of code as part of a multi-threaded (pthreads) program:
// temp_sum is a thread local variable
// sum is a global shared variable
sum = sum + temp_sum;
It should allow one thread to access sum in M state, invalidating all other sharers; then when another thread reaches the same line, it will request M, again invalidating the current sharers, and so on. But in fact this doesn't happen unless I add a mutex:
pthread_mutex_lock(&locksum);
// temp_sum is a thread local variable
// sum is a global shared variable
sum = sum + temp_sum;
pthread_mutex_unlock(&locksum);
This is the only way to make this work correctly. Now, why do we have to supply these mutexes? Why isn't this handled by memory coherence directly? Why do we need mutexes or atomic instructions?
Although your line of code sum = sum + temp_sum; may seem trivially simple in C, it is not an atomic operation. It loads the value of sum from memory into a register, performs arithmetic on it (adding the value of temp_sum), then writes the result back to memory (wherever sum is stored).
Even though only one thread can read or write sum from memory at a time, there is still an opportunity for a synchronization problem. A second thread could modify sum in memory while the first is manipulating the value in a register. Then the first thread will write what it thinks is the updated value (the result of arithmetic) back to memory, overwriting whatever the second put there. It is this transient copy in a register that introduces the issue. There is more to the notion of "the value of a variable" than whatever currently resides in memory.
For example, suppose sum is initially 4. Two threads want to add 1 to it. The first thread loads the 4 from memory into a register, and adds 1 to make 5. But before this first thread can store the result back to memory, a second thread loads the 4, adds 1, and writes a 5 back to memory. The first thread then continues and stores its result (5) back to the same memory location. Both threads are convinced that they have done their duty and correctly updated the sum. The problem is that sum is 5 and not 6 as it should be.
The mutex ensures that only one thread will load, modify, and store sum at a time. Any second thread will have to wait (be blocked) until the first has finished.
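If the critical section really is just this one addition, an atomic read-modify-write is an alternative to the mutex. A minimal sketch, assuming C11 <stdatomic.h> is available and sum is an integer (add_partial is a hypothetical helper, for illustration):

#include <stdatomic.h>

atomic_int sum;                 /* global shared variable, now atomic */

void add_partial(int temp_sum)
{
    /* one indivisible load-add-store; no second thread can slip in
       between the read and the write */
    atomic_fetch_add(&sum, temp_sum);
}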

pthreads: If I increment a global from two different threads, can there be sync issues?

Suppose I have two threads A and B that are both incrementing a global variable "count". Each thread runs a for loop like this one:
for(int i=0; i<1000; i++)
    count++; //alternatively, count = count + 1;
i.e. each thread increments count 1000 times, and let's say count starts at 0. Can there be sync issues in this case? Or will count correctly equal 2000 when the execution is finished? I guess since the statement "count = count + 1" may break down into TWO assembly instructions, there is potential for the other thread to be swapped in between these two instructions? Not sure. What do you think?
Yes there can be sync issues in this case. You need to either protect the count variable with a mutex, or use a (usually platform specific) atomic operation.
Example using pthread mutexes
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

for(int i=0; i<1000; i++) {
    pthread_mutex_lock(&mutex);
    count++;
    pthread_mutex_unlock(&mutex);
}
Using atomic ops
There is a prior discussion of platform specific atomic ops here:
UNIX Portable Atomic Operations
If you only need to support GCC, this approach is straightforward. If you're supporting other compilers, you'll probably have to make some per-platform decisions.
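For example, with the legacy GCC __sync builtins the loop body needs no mutex at all (a sketch; the linked question covers the portability trade-offs):

for(int i=0; i<1000; i++)
    __sync_fetch_and_add(&count, 1);   /* atomic increment of count */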
Count clearly needs to be protected with a mutex or other synchronization mechanism.
At a fundamental level, the count++ statement breaks down to:
load count into register
increment register
store count from register
A context switch could occur before/after any of those steps, leading to situations like:
Thread 1: load count into register A (value = 0)
Thread 2: load count into register B (value = 0)
Thread 1: increment register A (value = 1)
Thread 1: store count from register A (value = 1)
Thread 2: increment register B (value = 1)
Thread 2: store count from register B (value = 1)
As you can see, both threads completed one iteration of the loop, but the net result is that count was only incremented once.
You probably would also want to make count volatile to force loads & stores to go to memory, since a good optimizer would likely keep count in a register unless otherwise told.
Also, I would suggest that if this is all the work that's going to be done in your threads, performance will dramatically drop from all the mutex locking/unlocking required to keep it consistent. Threads should have much bigger work units to perform.
Yes, there can be sync problems.
As an example of the possible issues, there is no guarantee that an increment itself is an atomic operation.
In other words, if one thread reads the value for increment then gets swapped out, the other thread could come in and change it, then the first thread will write back the wrong value:
+-----+
| 0 | Value stored in memory (0).
+-----+
| 0 | Thread 1 reads value into register (r1 = 0).
+-----+
| 0 | Thread 2 reads value into register (r2 = 0).
+-----+
| 1 | Thread 2 increments r2 and writes back.
+-----+
| 1 | Thread 1 increments r1 and writes back.
+-----+
So you can see that, even though both threads have tried to increment the value, it's only increased by one.
This is just one of the possible problems. It may also be that the write itself is not atomic and one thread may update only part of the value before being swapped out.
If you have atomic operations that are guaranteed to work in your implementation, you can use them. Otherwise, use mutexes: that's what pthreads provides for synchronisation (and guarantees to work), so it is the safest approach.
I guess since the statement "count = count + 1" may break down into TWO assembly instructions, there is potential for the other thread to be swapped in between these two instructions? Not sure. What do you think?
Don't think like this. You're writing C code and pthreads code. You don't have to ever think about assembly code to know how your code will behave.
The pthreads standard does not define the behavior when one thread accesses an object while another thread is, or might be, modifying it. So unless you're writing platform-specific code, you should assume this code can do anything -- even crash.
The obvious pthreads fix is to use mutexes. If your platform has atomic operations, you can use those.
I strongly urge you not to delve into detailed discussions about how it might fail or what the assembly code might look like. Regardless of what you might or might not think compilers or CPUs might do, the behavior of the code is undefined. And it's too easy to convince yourself you've covered every way you can think of that it might fail and then you miss one and it fails.
