Shared semaphores in Linux do not post

I'm having a problem trying to implement a solution to the producer-consumer problem in linux.
I'm supposed to use a shared semaphore to synchronize across different processes.
I understand the concept of signalling alternate semaphores like this (I'm also using sem_open() to make the semaphores shared across the two processes):
process 1
----------------------------------------------
sem_t *s1 = sem_open("/s1", O_CREAT, 0666, 0);
sem_t *s2 = sem_open("/s2", O_CREAT, 0666, BUFFSIZE);
sem_wait(s2);
/* do stuff */
printf("This is process 1!\n");
sem_post(s1);
process 2
----------------------------------------------
sem_t *s1 = sem_open("/s1", 0);
sem_t *s2 = sem_open("/s2", 0);
sem_wait(s1);
/* do stuff */
printf("This is process 2!\n");
sem_post(s2);
The problem I'm having is that both processes are deadlocked. From what I understand, process one should enter the critical section first (since the semaphore has an initial value of BUFFSIZE), then signal s1 so process 2 can proceed.
This isn't the case; both processes just sit there blankly, with no output to the screen.
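For comparison, here is a minimal sketch of the same handshake in a single program that forks. It rests on an assumption the question does not confirm: that the deadlock comes from named semaphores left over from an earlier run. A named semaphore persists after the process exits and keeps its old value, and sem_open() with O_CREAT ignores the initial value if the name already exists. The sketch therefore unlinks both names first, creates them with O_EXCL, and checks for SEM_FAILED. The helper name run_handshake and the BUFFSIZE value are made up for illustration.

```c
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUFFSIZE 4 /* assumed value; the question does not give one */

/* Hypothetical helper: runs the two-process handshake once.
 * Returns 0 if both sides finished, -1 on any failure. */
int run_handshake(void)
{
    /* Remove leftovers: a stale semaphore keeps its old (possibly 0) value. */
    sem_unlink("/s1");
    sem_unlink("/s2");

    sem_t *s1 = sem_open("/s1", O_CREAT | O_EXCL, 0666, 0);
    sem_t *s2 = sem_open("/s2", O_CREAT | O_EXCL, 0666, BUFFSIZE);
    if (s1 == SEM_FAILED || s2 == SEM_FAILED) {
        perror("sem_open");
        return -1;
    }

    fflush(stdout); /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) { /* "process 2": waits until process 1 has posted s1 */
        sem_wait(s1);
        printf("This is process 2!\n");
        fflush(stdout);
        sem_post(s2);
        _exit(0);
    }

    /* "process 1": s2 starts at BUFFSIZE, so this does not block */
    sem_wait(s2);
    printf("This is process 1!\n");
    fflush(stdout);
    sem_post(s1);

    int status = 0;
    waitpid(pid, &status, 0);
    sem_close(s1);
    sem_close(s2);
    sem_unlink("/s1");
    sem_unlink("/s2");
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

With two separately started programs, the same idea applies: only the creating process should pass O_CREAT and an initial value, and the other should check the result of sem_open() in case it ran before the semaphores existed.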

Related

How does pthread mutex unlock work? And do threads come up at the same time?

I wanna ask you some basic thing but it really bothers me a lot.
I'm currently studying 'pthread mutex' for system programming and as far as I know, when 'pthread_mutex_lock' is called only current thread is executed not any others. Can I think like this?
And when it comes to 'pthread_mutex_unlock', when this function is called, does the current thread pass the lock permission to others and wait until some other thread calls unlock function again? Or does every thread including current thread execute simultaneously until one of them calls lock function?
Here's the code I was studying:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
enum { STATE_A, STATE_B } state = STATE_A;
pthread_cond_t condA = PTHREAD_COND_INITIALIZER;
pthread_cond_t condB = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void *threadA(void *arg)
{
printf("A start\n");
int i = 0, rValue, loopNum;
while(i<3)
{
pthread_mutex_lock(&mutex);
while(state != STATE_A)
{
printf("a\n");
pthread_cond_wait(&condA, &mutex);
}
pthread_mutex_unlock(&mutex);
pthread_cond_signal(&condB);
for(loopNum = 1; loopNum <= 5; loopNum++)
{
printf("Hello %d\n", loopNum);
}
pthread_mutex_lock(&mutex);
state = STATE_B;
printf("STATE_B\n");
pthread_cond_signal(&condB);
pthread_mutex_unlock(&mutex);
i++;
}
return 0;
}
void *threadB(void *arg)
{
printf("B start\n");
int n = 0, rValue;
while(n<3)
{
pthread_mutex_lock(&mutex);
while (state != STATE_B)
{
printf("b\n");
pthread_cond_wait(&condB, &mutex);
}
pthread_mutex_unlock(&mutex);
printf("Goodbye\n");
pthread_mutex_lock(&mutex);
state = STATE_A;
printf("STATE_A\n");
pthread_cond_signal(&condA);
pthread_mutex_unlock(&mutex);
n++;
}
return 0;
}
int main(int argc, char *argv[])
{
pthread_t a, b;
pthread_create(&a, NULL, threadA, NULL);
pthread_create(&b, NULL, threadB, NULL);
pthread_join(a, NULL);
pthread_join(b, NULL);
}
I kind of modified some of the original parts to make sure what's going on in this code such as adding printf("A start\n"), printf("a\n") so on.
And here are some outputs:
Output 1
B start
b
A start
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5
b
STATE_B
a
Goodbye
STATE_A
b
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5
b
STATE_B
a
Goodbye
STATE_A
b
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5
b
STATE_B
Goodbye
STATE_A
Output 2
B start
b
A start
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5
STATE_B
a
Goodbye
STATE_A
b
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5
STATE_B
a
Goodbye
STATE_A
b
Hello 1
Hello 2
Hello 3
Hello 4
Hello 5
STATE_B
Goodbye
STATE_A
So I learned that when threads are called, they are called simultaneously. Based on this logic, I added the 'printf("A start\n")' and 'printf("B start\n")' in the beginning of the each thread function 'threadA() and threadB()'. But always 'printf("B start\n")' comes up first. If they are called at the same time, don't they have to come up alternatively, at least randomly?
Also after the first 'Hello' loop, I'm assuming 'Goodbye' message always should be earlier than 'a' since I guess the 'pthread_mutex_unlock' in ThreadA calls ThreadB and waits until ThreadB calls unlock function. I want to know how this code works.
I'm guessing I would be totally wrong and misunderstood a lot of parts since I'm a newbie in this field. But wanna get the answer. Thank you for reading this :)
> when 'pthread_mutex_lock' is called only current thread is executed
> not any others. Can I think like this?
I guess you can think like that, but you'll be thinking incorrectly. pthread_mutex_lock() doesn't cause only the calling thread to execute. Rather, it does one of two things:
If the mutex wasn't already locked, it locks the mutex and returns immediately.
If the mutex was already locked, it puts the calling thread to sleep, to wait until the mutex has become unlocked. Only after pthread_mutex_lock() has successfully acquired the lock, will pthread_mutex_lock() return.
Note that in both cases, the promise that pthread_mutex_lock() makes to the calling thread is this: when pthread_mutex_lock() returns zero/success, the mutex will be locked and the calling thread will be the owner of the lock. (The other possibility is that pthread_mutex_lock() will return a nonzero error number indicating an error condition, but that's uncommon in practice so I won't dwell on it.)
> when it comes to 'pthread_mutex_unlock', does the current thread pass
> the lock permission to others and wait until some other thread calls
> unlock function again?
The first thing to clarify is that pthread_mutex_unlock() never waits for anything; unlike pthread_mutex_lock(), pthread_mutex_unlock() always returns immediately.
So what does pthread_mutex_unlock() do?
Unlocks the mutex (note that the mutex must have already been locked by a previous call to pthread_mutex_lock() in the same thread. If you call pthread_mutex_unlock() on a mutex without having previously called pthread_mutex_lock() to acquire that same mutex, then your program is buggy and won't work correctly)
Notifies the OS's thread-scheduler (through some mechanism that is deliberately left undocumented, since as a user of the pthreads library you don't need to know or care how it is implemented) that the mutex is now unlocked. Upon receiving that notification, the OS will check to see what other threads (if any) are blocked inside their own call to pthread_mutex_lock(), waiting to acquire this mutex, and if there are any, it will wake up one of those threads so that that thread may acquire the lock and its pthread_mutex_lock() call can then return. All that may happen before or after your thread's call to pthread_mutex_unlock() returns; the exact order of execution is indeterminate and doesn't really matter.
> I guess the 'pthread_mutex_unlock' in ThreadA calls ThreadB and waits
> until ThreadB calls unlock function.
pthread_mutex_unlock() does no such thing. In general, threads don't/can't call functions in other threads. For what pthread_mutex_unlock() does do, see my description above.
pthread_mutex_lock() doesn't mean only one thread will execute - it just means that any other thread that also tries to call pthread_mutex_lock() on the same mutex object will be suspended until the first thread releases the lock with pthread_mutex_unlock().
If the other threads aren't trying to lock the same mutex, they can continue running simultaneously.
If multiple threads have tried to lock the same mutex while it is locked by the first thread, then when the mutex is released by the first thread with pthread_mutex_unlock() only one of them will be able to proceed (and then when that thread itself calls pthread_mutex_unlock(), another waiting thread will be able to proceed and so on).
Note that a thread waiting for a mutex to be unlocked will not necessarily start executing immediately upon the mutex being unlocked.
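The behaviour described in these answers can be seen in a small sketch. The names (demo_mutex, run_mutex_demo, trylock_while_held) are invented for illustration. Four threads run concurrently and only serialize at the lock, and a second function uses pthread_mutex_trylock() to show that a mutex held by another thread is simply reported as busy rather than stopping every other thread.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static long demo_counter = 0;

/* Each worker runs freely; it only sleeps if another thread holds the mutex. */
static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&demo_mutex);   /* returns once this thread owns the lock */
        demo_counter++;
        pthread_mutex_unlock(&demo_mutex); /* never waits for anyone */
    }
    return NULL;
}

/* Starts four workers and returns the final counter value. */
long run_mutex_demo(void)
{
    pthread_t t[4];
    demo_counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, demo_worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}

/* Tries to take the mutex without blocking and stores the result code. */
static void *try_worker(void *arg)
{
    int rc = pthread_mutex_trylock(&demo_mutex);
    if (rc == 0)
        pthread_mutex_unlock(&demo_mutex);
    *(int *)arg = rc;
    return NULL;
}

/* Holds the mutex while another thread calls trylock; returns that thread's result. */
int trylock_while_held(void)
{
    int rc = 0;
    pthread_t t;
    pthread_mutex_lock(&demo_mutex);
    pthread_create(&t, NULL, try_worker, &rc);
    pthread_join(t, NULL); /* the other thread ran to completion while we held the lock */
    pthread_mutex_unlock(&demo_mutex);
    return rc;
}
```

run_mutex_demo() should return 400000 because every increment happens under the lock, and trylock_while_held() should return EBUSY: the second thread kept running even though the first thread held the mutex.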

Non-repeatable affinity for pthreads

I am trying to measure the time it takes for a thread from creation to actually start.
Using POSIX thread on a Debian 6.0 machine with 32-cores (no hyper-threading) and calling pthread_attr_setaffinity_np function to set the affinity.
In a loop, I am creating the threads, waiting for them to finish, repeatedly.
So, my code looks like the following (thread 0 is running this).
for (ni=0; ni<n; ni++)
{
pthread_t *thrds;
pthread_attr_t attr;
cpu_set_t cpuset;
ths = 1; // thread starts from 1
thrds = malloc(sizeof(pthread_t)*nt); // thrds[0] not used
assert(!pthread_attr_init(&attr));
for (i=ths; i<nt; i++)
{
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
CPU_ZERO(&cpuset);
CPU_SET(i, &cpuset); // setting i as the affinity for thread i
assert(!pthread_attr_setaffinity_np(&attr,
sizeof(cpu_set_t), &cpuset));
assert(!pthread_create(thrds+i, &attr, DoWork, i));
}
pthread_attr_destroy(&attr);
DoWork(0);
for (i=ths; i<nt; i++)
{
pthread_join(thrds[i], NULL);
}
if (thrds) free(thrds);
}
Inside the thread function, I am calling sched_getcpu() to verify that the affinity is working. The problem is that this verification passes only on the first iteration of the loop. On the second iteration, thrds[1] gets the affinity of nt-1 (instead of 1), and so on.
Can anyone please explain why? And/or how to fix it?
NOTE: I found a workaround: if I put the master thread to sleep for 1 second after the joins finish at each iteration, the affinity works correctly. But the required sleep duration could differ on other machines, so I still need a real fix for the issue.

Providing Concurrency Between Pthreads

I am working on multithread programming and I am stuck on something.
In my program there are two tasks and two types of robots for carrying out the tasks:
Task 1 requires any two types of robot and
task 2 requires 2 robot1 type and 2 robot2 type.
Total number of robot1 and robot2 and pointers to these two types are given for initialization. Threads share these robots and robots are reserved until a thread is done with them.
The actual task is done in the doTask1(robot **) function, which takes a pointer to a robot pointer as its parameter, so I need to pass it the robots that I reserved. I want to provide concurrency; obviously, if I lock everything it will not be concurrent. robot1 is of type Robot **. Since it is used by all threads, another thread can overwrite robot1 before one thread calls doTask1 or finishes with it, which changes the robots that thread is working on. I know this is because robot1 is shared by all threads. Could you explain how I can solve this problem? I don't want to pass any arguments to the thread start routine.
rsc is my struct to hold number of robots and pointers that are given in an initialization function.
void *task1(void *arg)
{
int tid;
tid = *((int *) arg);
cout << "TASK 1 with thread id " << tid << endl;
pthread_mutex_lock (&mutexUpdateRob);
while (rsc->totalResources < 2)
{
pthread_cond_wait(&noResource, &mutexUpdateRob);
}
if (rsc->numOfRobotA > 0 && rsc->numOfRobotB > 0)
{
rsc->numOfRobotA --;
rsc->numOfRobotB--;
robot1[0] = &rsc->robotA[counterA];
robot1[1] = &rsc->robotB[counterB];
counterA ++;
counterB ++;
flag1 = true;
rsc->totalResources -= 2;
}
pthread_mutex_unlock (&mutexUpdateRob);
doTask1(robot1);
pthread_mutex_lock (&mutexUpdateRob);
if(flag1)
{
rsc->numOfRobotA ++;
rsc->numOfRobotB++;
rsc->totalResources += 2;
}
if (rsc->totalResources >= 2)
{
pthread_cond_signal(&noResource);
}
pthread_mutex_unlock (&mutexUpdateRob);
pthread_exit(NULL);
}
If robots are global resources, threads should not dispose of them. It should be the duty of the main thread exit (or cleanup) function.
Also, there should be a way for threads to locate the robots unambiguously, and to lock their use.
The robot1 array seems to store the robots, and it seems to be a global array. However:
its access is not protected by a mutex (pthread_mutex_t), it seems now that you've taken care of that.
Also, the code in task1 is always modifying entries 0 and 1 of this array. If two threads or more execute that code, the entries will be overwritten. I don't think that it is what you want. How will that array be used afterwards?
In fact, why does this array need to be global?
The bottom line is this: as long as this array is shared by threads, they will have problems working concurrently. Think about it this way:
You have two companies using robots to work, but they're using the same truck (robot1) to move the robots around. How are these two companies supposed to function properly, and efficiently with only one truck?
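One way to give each "company" its own truck is to keep the reserved robots in an array that is local to the thread, and to touch the shared pool only while holding the mutex. The following is a minimal sketch of that idea, not the asker's code: robot_t, pool, reserve_two and the other names are hypothetical, and the work done by doTask1 is replaced by a comment.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_ROBOTS 4
#define NUM_THREADS 8

typedef struct { int id; int in_use; } robot_t; /* hypothetical robot type */

static robot_t pool[NUM_ROBOTS];
static int free_count = NUM_ROBOTS;
static int completed = 0;
static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_cond = PTHREAD_COND_INITIALIZER;

/* Reserve two free robots into the caller's own array. */
static void reserve_two(robot_t *mine[2])
{
    pthread_mutex_lock(&pool_mutex);
    while (free_count < 2)
        pthread_cond_wait(&pool_cond, &pool_mutex);
    int got = 0;
    for (int i = 0; i < NUM_ROBOTS && got < 2; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            mine[got++] = &pool[i];
        }
    }
    free_count -= 2;
    pthread_mutex_unlock(&pool_mutex);
}

/* Give both robots back and wake any threads waiting for resources. */
static void release_two(robot_t *mine[2])
{
    pthread_mutex_lock(&pool_mutex);
    mine[0]->in_use = 0;
    mine[1]->in_use = 0;
    free_count += 2;
    completed++;
    pthread_cond_broadcast(&pool_cond);
    pthread_mutex_unlock(&pool_mutex);
}

static void *task(void *arg)
{
    (void)arg;
    robot_t *mine[2]; /* local to this thread, so no other thread can overwrite it */
    reserve_two(mine);
    /* the real work, e.g. doTask1(mine), would run here without the mutex held */
    release_two(mine);
    return NULL;
}

/* Runs NUM_THREADS tasks over NUM_ROBOTS robots; returns how many tasks finished. */
int run_robot_demo(void)
{
    pthread_t t[NUM_THREADS];
    for (int i = 0; i < NUM_ROBOTS; i++) {
        pool[i].id = i;
        pool[i].in_use = 0;
    }
    free_count = NUM_ROBOTS;
    completed = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&t[i], NULL, task, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);
    return completed;
}
```

Because the mutex is released while the task itself runs, up to two tasks can work at the same time with four robots, and the in_use flag replaces the shared counters that made slot selection ambiguous.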

Native mutex implementation

So, in a moment of curiosity, I started wondering how Windows and Linux actually implement the mutex. I've implemented this synchronizer in a hundred different ways, on many different architectures, but never thought about how it is really implemented in a full-size OS. For example, in the ARM world I built some of my synchronizers by disabling interrupts, but I always thought that wasn't a really good way to do it.
I tried to "swim" through the Linux kernel, but I couldn't find anything that satisfied my curiosity. I'm not an expert in threading, but I have a solid grasp of the basic and intermediate concepts.
So does anyone know how a mutex is implemented?
A quick look at code apparently from one Linux distribution seems to indicate that it is implemented using an interlocked compare and exchange. So, in some sense, the OS isn't really implementing it since the interlocked operation is probably handled at the hardware level.
Edit As Hans points out, the interlocked exchange does the compare and exchange in an atomic manner. Here is documentation for the Windows version. For fun, I just now wrote a small test to show a really simple example of creating a mutex like that. This is a simple acquire and release test.
#include <windows.h>
#include <assert.h>
#include <stdio.h>
struct homebrew {
LONG *mutex;
int *shared;
int mine;
};
#define NUM_THREADS 10
#define NUM_ACQUIRES 100000
DWORD WINAPI SomeThread( LPVOID lpParam )
{
struct homebrew *test = (struct homebrew*)lpParam;
while ( test->mine < NUM_ACQUIRES ) {
// Test and set the mutex. If it currently has value 0, then it
// is free. Setting 1 means it is owned. This interlocked function does
// the test and set as an atomic operation
if ( 0 == InterlockedCompareExchange( test->mutex, 1, 0 )) {
// this thread now owns the mutex. Increment the shared variable
// without an atomic increment (relying on mutex ownership to protect it)
(*test->shared)++;
test->mine++;
// Release the mutex with an interlocked store, so the increment above
// is visible to other threads before the mutex appears free
InterlockedExchange( test->mutex, 0 );
}
}
return 0;
}
int main( int argc, char* argv[] )
{
LONG mymutex = 0; // zero means free (not owned)
int shared = 0;
HANDLE threads[NUM_THREADS];
struct homebrew test[NUM_THREADS];
int i;
// Initialize each thread's structure. All share the same mutex and a shared
// counter
for ( i = 0; i < NUM_THREADS; i++ ) {
test[i].mine = 0; test[i].shared = &shared; test[i].mutex = &mymutex;
}
// create the threads and then wait for all to finish
for ( i = 0; i < NUM_THREADS; i++ )
threads[i] = CreateThread(NULL, 0, SomeThread, &test[i], 0, NULL);
for ( i = 0; i < NUM_THREADS; i++ )
WaitForSingleObject( threads[i], INFINITE );
// Verify all increments occurred atomically
printf( "shared = %d (%s)\n", shared,
shared == NUM_THREADS * NUM_ACQUIRES ? "correct" : "wrong" );
for ( i = 0; i < NUM_THREADS; i++ ) {
if ( test[i].mine != NUM_ACQUIRES ) {
printf( "Thread %d cheated. Only %d acquires.\n", i, test[i].mine );
}
}
}
If I comment out the call to the InterlockedCompareExchange call and just let all threads run the increments in a free-for-all fashion, then the results do result in failures. Running it 10 times, for example, without the interlocked compare call:
shared = 748694 (wrong)
shared = 811522 (wrong)
shared = 796155 (wrong)
shared = 825947 (wrong)
shared = 1000000 (correct)
shared = 795036 (wrong)
shared = 801810 (wrong)
shared = 790812 (wrong)
shared = 724753 (wrong)
shared = 849444 (wrong)
The curious thing is that one time the results showed no incorrect count. That might be because there is no "everyone start now" synchronization; maybe all threads started and finished in order in that case. But when I have the InterlockedCompareExchange call in place, it runs without failure (or at least it ran 100 times without failure ... that doesn't prove I didn't write a subtle bug into the example).
Here is the discussion from the people who implemented it ... very interesting as it shows the tradeoffs ..
Several posts from Linus T ... of course
In earlier days pre-POSIX etc I used to implement synchronization by using a native mode word (e.g. 16 or 32 bit word) and the Test And Set instruction lurking on every serious processor. This instruction guarantees to test the value of a word and set it in one atomic instruction. This provides the basis for a spinlock and from that a hierarchy of synchronization functions could be built. The simplest is of course just a spinlock which performs a busy wait, not an option for more than transitory sync'ing, then a spinlock which drops the process time slice at each iteration for a lower system impact. Notional concepts like Semaphores, Mutexes, Monitors etc can be built by getting into the kernel scheduling code.
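As a rough modern equivalent of the test-and-set spinlock described above, here is a sketch using C11 atomic_flag, whose test-and-set operation is guaranteed to be atomic. It follows the "drop the time slice at each iteration" variant by calling sched_yield() while the lock is taken. The function names are invented for illustration, and this is not how any particular kernel or pthreads library implements its mutex.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_flag spin = ATOMIC_FLAG_INIT;
static long spin_counter = 0;

/* Test-and-set until the flag was previously clear, yielding between attempts. */
static void spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
        sched_yield(); /* give up the time slice instead of a pure busy wait */
}

/* Clearing the flag with release ordering publishes the protected writes. */
static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&spin, memory_order_release);
}

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 50000; i++) {
        spin_lock();
        spin_counter++; /* plain increment, protected by the spinlock */
        spin_unlock();
    }
    return NULL;
}

/* Runs four workers and returns the final counter value. */
long run_spin_demo(void)
{
    pthread_t t[4];
    spin_counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, spin_worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return spin_counter;
}
```

A full mutex differs mainly in what happens on contention: instead of spinning or yielding, the waiting thread is put to sleep by the scheduler and woken when the lock is released.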
As I recall the prime usage was to implement message queues to permit multiple clients to access a database server. Another was a very early real time car race result and timing system on a quite primitive 16 bit machine and OS.
These days I use Pthreads and Semaphores and Windows Events/Mutexes (mutices?) etc and don't give a thought as to how they work, although I must admit that having been down in the engine room does give one an intuitive feel for better and more efficient multiprocessing.
In the Windows world:
Before Windows Vista, the mutex was implemented with a compare-exchange (CAS) that changes the state of the mutex from Empty to BeingUsed. For any other thread that then waits on the mutex, the CAS obviously fails, and the thread must be added to the mutex's queue for later notification. Those queue operations (add/remove/check) were protected by a common lock in the Windows kernel.
After Windows XP, the mutex started to use a spin lock for performance reasons, making it a self-sufficient object.
In the Unix world I didn't get much further, but it is probably very similar to Windows 7.
Finally, for kernels that run on a single processor, the best way is to disable interrupts when entering the critical section and re-enable them when exiting.

Linux synchronization with FIFO waiting queue

Are there locks in Linux where the waiting queue is FIFO? This seems like such an obvious thing, and yet I just discovered that pthread mutexes aren't FIFO, and semaphores apparently aren't FIFO either (I'm working on kernel 2.4 (homework))...
Does Linux have a lock with FIFO waiting queue, or is there an easy way to make one with existing mechanisms?
Here is a way to create a simple queueing "ticket lock", built on pthreads primitives. It should give you some ideas:
#include <pthread.h>
typedef struct ticket_lock {
pthread_cond_t cond;
pthread_mutex_t mutex;
unsigned long queue_head, queue_tail;
} ticket_lock_t;
#define TICKET_LOCK_INITIALIZER { PTHREAD_COND_INITIALIZER, PTHREAD_MUTEX_INITIALIZER }
void ticket_lock(ticket_lock_t *ticket)
{
unsigned long queue_me;
pthread_mutex_lock(&ticket->mutex);
queue_me = ticket->queue_tail++;
while (queue_me != ticket->queue_head)
{
pthread_cond_wait(&ticket->cond, &ticket->mutex);
}
pthread_mutex_unlock(&ticket->mutex);
}
void ticket_unlock(ticket_lock_t *ticket)
{
pthread_mutex_lock(&ticket->mutex);
ticket->queue_head++;
pthread_cond_broadcast(&ticket->cond);
pthread_mutex_unlock(&ticket->mutex);
}
If you are asking what I think you are asking the short answer is no. Threads/processes are controlled by the OS scheduler. One random thread is going to get the lock, the others aren't. Well, potentially more than one if you are using a counting semaphore but that's probably not what you are asking.
You might want to look at pthread_setschedparam but it's not going to get you where I suspect you want to go.
You could probably write something but I suspect it will end up being inefficient and defeat using threads in the first place since you will just end up randomly yielding each thread until the one you want gets control.
Chances are good you are just thinking about the problem in the wrong way. You might want to describe your goal and get better suggestions.
I had a similar requirement recently, except dealing with multiple processes. Here's what I found:
If you need 100% correct FIFO ordering, go with caf's pthread ticket lock.
If you're happy with 99% and favor simplicity, a semaphore or a mutex can do really well actually.
Ticket lock can be made to work across processes:
You need to use shared memory, a process-shared mutex and condition variable, and handle processes dying with the mutex locked (-> robust mutex) ... which is a bit overkill here: all I need is that the different instances don't get scheduled at the same time and that the order is mostly fair.
Using a semaphore:
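#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
/* fail() is an error-reporting helper that is not shown here */
static sem_t *sem = NULL;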
void fifo_init()
{
sem = sem_open("/server_fifo", O_CREAT, 0600, 1);
if (sem == SEM_FAILED) fail("sem_open");
}
void fifo_lock()
{
int r;
struct timespec ts;
if (clock_gettime(CLOCK_REALTIME, &ts) == -1) fail("clock_gettime");
ts.tv_sec += 5; /* 5s timeout */
while ((r = sem_timedwait(sem, &ts)) == -1 && errno == EINTR)
continue; /* Restart if interrupted */
if (r == 0) return;
if (errno == ETIMEDOUT) fprintf(stderr, "timeout ...\n");
else fail("sem_timedwait");
}
void fifo_unlock()
{
/* If we somehow end up with more than one token, don't increment the semaphore... */
int val;
if (sem_getvalue(sem, &val) == 0 && val <= 0)
if (sem_post(sem)) fail("sem_post");
usleep(1); /* Yield to other processes */
}
Ordering is almost 100% FIFO.
Note: This is with a 4.4 Linux kernel, 2.4 might be different.

Resources