According to my understanding, a semaphore should be usable across related processes without it being placed in shared memory. If so, why does the following code deadlock?
#include <iostream>
#include <semaphore.h>
#include <sys/wait.h>

using namespace std;

static int MAX = 100;

int main(int argc, char* argv[]) {
    int retval;
    sem_t mutex;
    cout << sem_init(&mutex, 1, 0) << endl;
    pid_t pid = fork();
    if (0 == pid) {
        // sem_wait(&mutex);
        cout << endl;
        for (int i = 0; i < MAX; i++) {
            cout << i << ",";
        }
        cout << endl;
        sem_post(&mutex);
    } else if (pid > 0) {
        sem_wait(&mutex);
        cout << endl;
        for (int i = 0; i < MAX; i++) {
            cout << i << ",";
        }
        cout << endl;
        // sem_post(&mutex);
        wait(&retval);
    } else {
        cerr << "fork error" << endl;
        return 1;
    }
    // sem_destroy(&mutex);
    return 0;
}
When I run this on Gentoo/Ubuntu Linux, the parent hangs. Apparently, it never receives the post from the child. Uncommenting sem_destroy does no good. Am I missing something?
Update 1:
This code works
mutex = (sem_t *) mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if (mutex == MAP_FAILED) {
    perror("mmap");
    exit(1);
}
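For completeness, a minimal self-contained sketch of this fix (the essential parts are pshared = 1 and the MAP_SHARED mapping; error handling kept short):

#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main() {
    // Place the semaphore in shared memory so parent and child use the same object.
    sem_t *sem = (sem_t *) mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                                MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    if (sem == MAP_FAILED) { perror("mmap"); exit(1); }
    sem_init(sem, 1, 0);   // pshared = 1, initial value 0
    if (fork() == 0) {     // child: do the work, then post
        sem_post(sem);
        _exit(0);
    }
    sem_wait(sem);         // parent: returns once the child posts
    wait(NULL);
    sem_destroy(sem);
    munmap(sem, sizeof(sem_t));
    return 0;
}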
Thanks,
Nilesh.
The wording in the manual page is kind of ambiguous.
If pshared is nonzero, then the semaphore is shared between processes,
and should be located in a region of shared memory.
Since a child created by fork(2) inherits its parent's memory
mappings, it can also access the semaphore.
Yes, but it still has to be in a shared region. Otherwise the memory is simply duplicated with the usual copy-on-write (CoW), and the parent and child each end up posting and waiting on their own private copy of the semaphore.
You can solve this in at least two ways:
Use sem_open("my_sem", ...)
Use shm_open and mmap to create a shared region
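A minimal sketch of the first option, assuming a named semaphore "/my_sem" (POSIX requires the leading slash; error handling kept short):

#include <fcntl.h>
#include <semaphore.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main() {
    // O_CREAT creates the named semaphore if it does not exist; initial value 0.
    sem_t *sem = sem_open("/my_sem", O_CREAT, 0644, 0);
    if (sem == SEM_FAILED) { perror("sem_open"); exit(1); }
    if (fork() == 0) {
        sem_post(sem);     // child signals...
        _exit(0);
    }
    sem_wait(sem);         // ...and the parent waits for it
    wait(NULL);
    sem_close(sem);
    sem_unlink("/my_sem"); // remove the name once both sides are done
    return 0;
}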
An excellent article on this topic, for future passers-by:
http://blog.superpat.com/2010/07/14/semaphores-on-linux-sem_init-vs-sem_open/
Related
I am looking at multithreading and have written a basic producer/consumer. I have two issues with the code below:
1) Even with the consumer sleep time set lower than the producer sleep time, the producer still seems to execute more quickly.
2) In the consumer I have duplicated code for the case where the producer has finished adding to the queue but there are still elements left in it.
Any advice on a better way to structure the code?
#include <iostream>
#include <queue>
#include <mutex>
#include <thread>
#include <chrono>

class App {
private:
    std::queue<int> m_data;
    bool m_bFinished;
    std::mutex m_Mutex;
    int m_ConsumerSleep;
    int m_ProducerSleep;
    int m_QueueSize;
public:
    App(int &MaxQueue) : m_bFinished(false), m_ConsumerSleep(1), m_ProducerSleep(5), m_QueueSize(MaxQueue) {}
    void Producer() {
        for (int i = 0; i < m_QueueSize; ++i) {
            std::lock_guard<std::mutex> guard(m_Mutex);
            m_data.push(i);
            std::cout << "Producer Thread, queue size: " << m_data.size() << std::endl;
            std::this_thread::sleep_for(std::chrono::seconds(m_ProducerSleep));
        }
        m_bFinished = true;
    }
    void Consumer() {
        while (!m_bFinished) {
            if (m_data.size() > 0) {
                std::lock_guard<std::mutex> guard(m_Mutex);
                std::cout << "Consumer Thread, queue element: " << m_data.front() << " size: " << m_data.size() << std::endl;
                m_data.pop();
            }
            else {
                std::cout << "No elements, skipping" << std::endl;
            }
            std::this_thread::sleep_for(std::chrono::seconds(m_ConsumerSleep));
        }
        while (m_data.size() > 0) {
            std::lock_guard<std::mutex> guard(m_Mutex);
            std::cout << "Emptying remaining elements " << m_data.front() << std::endl;
            m_data.pop();
            std::this_thread::sleep_for(std::chrono::seconds(m_ConsumerSleep));
        }
    }
};

int main()
{
    int QueueElements = 10;
    App app(QueueElements);
    std::thread consumer_thread(&App::Consumer, &app);
    std::thread producer_thread(&App::Producer, &app);
    producer_thread.join();
    consumer_thread.join();
    std::cout << "loop exited" << std::endl;
    return 0;
}
You should use a condition_variable; don't coordinate threads with sleep.
Main scheme:
The producer pushes a value under the lock and signals the condition_variable.
The consumer waits on the condition variable under the lock, checking a predicate to guard against spurious wakeups.
My version:
#include <iostream>
#include <queue>
#include <mutex>
#include <thread>
#include <condition_variable>
#include <atomic>

class App {
private:
    std::queue<int> m_data;
    std::atomic_bool m_bFinished;
    std::mutex m_Mutex;
    std::condition_variable m_cv;
    int m_QueueSize;
public:
    App(int MaxQueue)
        : m_bFinished(false)
        , m_QueueSize(MaxQueue)
    {}
    void Producer()
    {
        for (int i = 0; i < m_QueueSize; ++i)
        {
            {
                std::unique_lock<std::mutex> lock(m_Mutex);
                m_data.push(i);
                std::cout << "Producer Thread, queue size: " << m_data.size() << std::endl;
            }
            m_cv.notify_one();
        }
        {
            // Set the flag under the lock so the consumer cannot miss the wakeup.
            std::unique_lock<std::mutex> lock(m_Mutex);
            m_bFinished = true;
        }
        m_cv.notify_one();
    }
    void Consumer()
    {
        do
        {
            std::unique_lock<std::mutex> lock(m_Mutex);
            // The predicate form of wait() loops internally, which protects
            // against spurious wakeups.
            m_cv.wait(lock, [&](){ return !m_data.empty() || m_bFinished.load(); });
            while (!m_data.empty()) // consume all elements from the queue
            {
                std::cout << "Consumer Thread, queue element: " << m_data.front() << " size: " << m_data.size() << std::endl;
                m_data.pop();
            }
        } while (!m_bFinished);
    }
};

int main()
{
    int QueueElements = 10;
    App app(QueueElements);
    std::thread consumer_thread(&App::Consumer, &app);
    std::thread producer_thread(&App::Producer, &app);
    producer_thread.join();
    consumer_thread.join();
    std::cout << "loop exited" << std::endl;
    return 0;
}
Also note that it's better to use an atomic for the end flag when dealing with concurrent threads: a plain bool written by one thread and read by another without synchronization is a data race, and the compiler or the hardware may keep the stale value cached so the change is never seen by the consumer thread. Atomics come with memory-ordering guarantees that make the update visible to other threads.
You can also take a look at the std::memory_order page.
First, you should use a condition variable instead of a delay on the consumer. This way, the consumer thread only wakes up when the queue is not empty and the producer notifies it.
That said, the reason your producer runs more often is the delay on the producer thread: it is executed while holding the mutex, so the consumer can never run until the delay is over. You should release the mutex before calling sleep_for:
for (int i = 0; i < m_QueueSize; ++i) {
    /* Introduce a scope to release the mutex before sleeping */
    {
        std::lock_guard<std::mutex> guard(m_Mutex);
        m_data.push(i);
        std::cout << "Producer Thread, queue size: " << m_data.size() << std::endl;
    } // Mutex is released here
    std::this_thread::sleep_for(std::chrono::seconds(m_ProducerSleep));
}
Is there an alternative way to be sure that the threads are ready to receive the broadcast signal? I want to replace the sleep(1) calls in main.
#include <iostream>
#include <pthread.h>
#include <unistd.h>
#include <cstdint>

#define NUM 4

using namespace std;

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_t tid[NUM];

void *threads(void *arg){
    int tid = (int)(intptr_t)arg; // thread index smuggled through the void* argument
    while(true){
        pthread_mutex_lock(&mutex);
        pthread_cond_wait(&cond, &mutex);
        //do some work
        cout << "Thread: " << tid << endl;
        pthread_mutex_unlock(&mutex);
    }
}

int main(){
    for(int i = 0; i < NUM; i++){
        pthread_create(&tid[i], NULL, threads, (void*)(intptr_t)i);
    }
    sleep(1);
    pthread_cond_broadcast(&cond);
    sleep(1);
    pthread_cond_broadcast(&cond);
    sleep(1);
    pthread_cond_broadcast(&cond);
    return 0;
}
I tried memory barriers before pthread_cond_wait and I thought of using a counter, but nothing has worked for me yet.
Condition variables are usually paired with a predicate. In the worker threads, check whether the predicate is already fulfilled (while holding the mutex that protects it); if so, do not wait on the condition variable. In main, acquire the mutex and change the predicate while holding it, then release the mutex and signal or broadcast on the condvar. Here is a similar question:
Synchronisation before pthread_cond_broadcast
Here is some example code:
#include <iostream>
#include <pthread.h>
#include <unistd.h>
#include <cassert>

#define NUM 4
#define SIZE 256

using std::cout;

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_t tid[NUM];
int work_available;

void *threads(void *arg)
{
    int tid = *((int*)arg);
    while (1) {
        pthread_mutex_lock(&mutex);
        while (work_available == 0) {
            // While loop since cond_wait can have spurious wakeups.
            pthread_cond_wait(&cond, &mutex);
            cout << "Worker " << tid << " woke up...\n";
            cout << "Work available: " << work_available << '\n';
        }
        if (work_available == -1) {
            cout << "Worker " << tid << " quitting\n";
            pthread_mutex_unlock(&mutex); // Easy to forget, better to use C++11 RAII mutexes.
            break;
        }
        assert(work_available > 0);
        work_available--;
        cout << "Worker " << tid << " took one item of work\n";
        pthread_mutex_unlock(&mutex);
        //do some work
        sleep(2); // simulated work
        pthread_mutex_lock(&mutex);
        cout << "Worker " << tid << " done with one item of work.\n";
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

int main()
{
    work_available = 0;
    int args[NUM];
    for (int i = 0; i < NUM; i++) {
        args[i] = i;
        pthread_create(&tid[i], NULL, threads, (void*)&args[i]);
    }
    const int MAX_TIME = 10;
    for (int i = 0; i < MAX_TIME; i++)
    {
        pthread_mutex_lock(&mutex);
        work_available++;
        cout << "Main thread, work available: " << work_available << '\n';
        pthread_mutex_unlock(&mutex);
        pthread_cond_broadcast(&cond);
        sleep(1);
    }
    pthread_mutex_lock(&mutex);
    cout << "Main signalling threads to quit\n";
    work_available = -1;
    pthread_mutex_unlock(&mutex);
    pthread_cond_broadcast(&cond);
    for (int i = 0; i < NUM; i++)
    {
        pthread_join(tid[i], NULL);
    }
    return 0;
}
As far as I know, such use of static storage within a lambda is legal. Essentially it counts the number of entries into the closure:
#include <vector>
#include <iostream>
#include <algorithm>
#include <iterator>

typedef std::pair<int,int> mypair;

std::ostream &operator<< (std::ostream &os, mypair const &data) {
    return os << "(" << data.first << ": " << data.second << ") ";
}

int main()
{
    int n;
    std::vector<mypair> v;
    std::cin >> n;
    v.resize(n); // resize, not reserve: for_each only visits elements that exist
    std::for_each(std::begin(v), std::end(v), [](mypair& x) {
        static int i = 0;
        std::cin >> x.second;
        x.first = i++;
    });
    std::for_each(std::begin(v), std::end(v), [](mypair& x) {
        std::cout << x;
    });
    return 0;
}
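Note that the static counter is shared by all invocations of the lambda and survives across calls. If per-closure state is wanted instead, a C++14 init-capture with mutable is a possible alternative (same output here; the counter lives in the closure object rather than in static storage):

std::for_each(std::begin(v), std::end(v), [i = 0](mypair& x) mutable {
    std::cin >> x.second;
    x.first = i++; // i is this closure's own counter, not a shared static
});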
Let's assume I have a container 'workers' of threads.
std::vector<std::thread> workers;
for (int i = 0; i < 5; i++) {
    workers.push_back(std::thread([i]()
    {
        std::cout << "thread #" << i << " start\n";
        doLengthyOperation();
        std::cout << "thread #" << i << " finish\n";
    }));
}
The code in doLengthyOperation() is a contained, self-sufficient operation, akin to spawning a new process.
Provided I join them using for_each, and the stored variable in question must count the number of active tasks rather than just the number of entries, what possible implementations are there for such a counter? I want to avoid relying on global variables, both so nobody else can mess with it and to allow automatic support for separate "flavors" of threads.
std::for_each(workers.begin(), workers.end(), [](std::thread &t)
{
    t.join();
});
The surrounding scope may die soon after the last thread starts, the operation may repeat, and new threads may be added to the container in the meantime; a counter that survives all that would have to be a global variable, which I want to avoid. On top of that, the whole operation is a template.
The best way to handle this is to capture an instance of std::atomic<int>, which provides a thread-safe counter. Depending on the lifetime of the lambdas and the surrounding scope, you may wish to capture by reference or by shared pointer.
To take your example:
std::vector<std::thread> workers;
auto counter = std::make_shared<std::atomic<int>>(0);
for (int i = 0; i < 5; i++) {
    workers.push_back(std::thread([counter, i]()
    {
        std::cout << "thread #" << i << " start\n";
        (*counter)++;
        doLengthyOperation();
        (*counter)--;
        std::cout << "thread #" << i << " finish\n";
    }));
}
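A hedged usage sketch: since counter is a shared_ptr, monitoring code can keep its own copy alive and poll the active count while the workers run (this assumes the threads have already started; real code would also have to account for tasks that have not begun yet):

auto snapshot = counter; // keeps the atomic alive even if the launching scope dies
while (snapshot->load() > 0) {
    std::cout << snapshot->load() << " tasks still active\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}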
While studying the possibility of improving Recoll performance by using vfork() instead of fork(), I've encountered a fork() issue which I can't explain.
Recoll repeatedly execs external commands to translate files, so that's what the sample program does: it starts threads which repeatedly execute "ls" and read back the output.
The following problem is not a "real" one, in the sense that an actual program would not do what triggers the issue. I just stumbled on it while having a look at what threads were stopped or not between fork()/vfork() and exec().
When I have one of the threads busy-looping between fork() and exec(), the other thread never completes the data reading: the last read(), which should indicate EOF, is blocked forever or until the other thread's looping ends (at which point everything resumes normally, which you can see by replacing the infinite loop with one that completes). While read() is blocked, the "ls" command has exited (ps shows it as <defunct>, a zombie).
There is a random aspect to the issue, but the sample program "succeeds" most of the time. I tested with Linux kernels 3.2.0 (Debian), 3.13.0 (Ubuntu) and 3.19 (Ubuntu). It also reproduces in a VM, but you need at least two processors; I could not make it happen with one.
Here follows the sample program; I can't see what I'm doing wrong.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <memory.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <pthread.h>
#include <iostream>

using namespace std;

struct thread_arg {
    int tnum;
    int loopcount;
    const char *cmd;
};

void* task(void *rarg)
{
    struct thread_arg *arg = (struct thread_arg *)rarg;
    const char *cmd = arg->cmd;
    for (int i = 0; i < arg->loopcount; i++) {
        pid_t pid;
        int pipefd[2];
        if (pipe(pipefd)) {
            perror("pipe");
            exit(1);
        }
        pid = fork();
        if (pid) {
            cerr << "Thread " << arg->tnum << " parent " << endl;
            if (pid < 0) {
                perror("fork");
                exit(1);
            }
        } else {
            // Child code. Either exec ls or loop (thread 1)
            if (arg->tnum == 1) {
                cerr << "Thread " << arg->tnum << " looping" << endl;
                for (;;);
                //for (int cc = 0; cc < 1000 * 1000 * 1000; cc++);
            } else {
                cerr << "Thread " << arg->tnum << " child" << endl;
            }
            close(pipefd[0]);
            if (pipefd[1] != 1) {
                dup2(pipefd[1], 1);
                close(pipefd[1]);
            }
            cerr << "Thread " << arg->tnum << " child calling exec" << endl;
            execlp(cmd, cmd, (char *)NULL);
            perror("execlp");
            _exit(255);
        }
        // Parent closes write side of pipe
        close(pipefd[1]);
        int ntot = 0, nread;
        char buf[1000];
        while ((nread = read(pipefd[0], buf, 1000)) > 0) {
            ntot += nread;
            cerr << "Thread " << arg->tnum << " nread " << nread << endl;
        }
        cerr << "Total " << ntot << endl;
        close(pipefd[0]);
        int status;
        cerr << "Thread " << arg->tnum << " waiting for process " << pid << endl;
        if (waitpid(pid, &status, 0) != -1) {
            if (status) {
                cerr << "Child exited with status " << status << endl;
            }
        } else {
            perror("waitpid");
        }
    }
    return 0;
}

int main(int, char **)
{
    int loopcount = 5;
    const char *cmd = "ls";
    cerr << "cmd [" << cmd << "]" << " loopcount " << loopcount << endl;
    const int nthreads = 2;
    pthread_t threads[nthreads];
    for (int i = 0; i < nthreads; i++) {
        struct thread_arg *arg = new struct thread_arg;
        arg->tnum = i;
        arg->loopcount = loopcount;
        arg->cmd = cmd;
        int err;
        if ((err = pthread_create(&threads[i], 0, task, arg))) {
            cerr << "pthread_create failed, err " << err << endl;
            exit(1);
        }
    }
    void *status;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(threads[i], &status);
        if (status) {
            cerr << "pthread_join: " << status << endl;
            exit(1);
        }
    }
}
What's happening is that your pipes are getting inherited by both child processes instead of just one.
What you want to do is:
Create pipe with 2 ends
fork(), child inherits both ends of the pipe
child closes the read end, parent closes the write end
...so that the child ends up with just one end of one pipe, which is dup2()'ed to stdout.
But your threads race with each other, so what can happen is this:
Thread 1 creates pipe with 2 ends
Thread 0 creates pipe with 2 ends
Thread 1 fork()s. The child process has inherited 4 file descriptors, not 2!
Thread 1's child closes the read end of the pipe that thread 1 opened, but it keeps a reference to the read end and write end of thread 0's pipe too.
Later, thread 0 waits forever: it never gets an EOF on the pipe it is reading, because the write end of that pipe is still held open by thread 1's child.
You will need to define a critical section that starts before pipe(), encloses the fork(), and ends after close() in the parent, and enter that critical section from only one thread at a time using a mutex.
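A hedged sketch of that critical section, applied to the task() function above (fork_mutex is a hypothetical global added for illustration):

static pthread_mutex_t fork_mutex = PTHREAD_MUTEX_INITIALIZER;

// Inside the loop in task():
pthread_mutex_lock(&fork_mutex);
if (pipe(pipefd)) { /* handle error */ }
pid = fork();
if (pid == 0) {
    // child: close the read end, dup2 the write end to stdout, exec as before
}
close(pipefd[1]);                    // parent closes its write end...
pthread_mutex_unlock(&fork_mutex);   // ...before any other thread may fork

This way no other thread can fork() between pipe() and the parent's close(), so no unrelated child ever inherits the write end. (On Linux, pipe2(pipefd, O_CLOEXEC) is another way to keep the descriptors from leaking, though it only helps once the child actually execs.)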
I have a hard problem here which I cannot solve and for which I can't find the right answer on the net:
I have created a detached thread with a cleanup routine. The problem is that on my iMac and on Ubuntu 9.1 (dual core) I am not able to correctly cancel the detached thread in the following code:
#include <iostream>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <time.h>

pthread_mutex_t mutex_t;
using namespace std;

static void cleanup(void *arg){
    pthread_mutex_lock(&mutex_t);
    cout << " doing clean up" << endl;
    pthread_mutex_unlock(&mutex_t);
}

static void *thread(void *aArgument)
{
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
    pthread_cleanup_push(&cleanup, NULL);
    int n = 0;
    while(1){
        pthread_testcancel();
        sched_yield();
        n++;
        pthread_mutex_lock(&mutex_t);
        cout << " Thread 2: " << n << endl; // IF I remove this endl --> IT WORKS!!??
        pthread_mutex_unlock(&mutex_t);
    }
    pthread_cleanup_pop(0);
    return NULL;
}

int main()
{
    pthread_t thread_id;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int error;
    if (pthread_mutex_init(&mutex_t, NULL) != 0) return 1;
    if (pthread_create(&thread_id, &attr, &(thread), NULL) != 0) return 1;
    pthread_mutex_lock(&mutex_t);
    cout << "waiting 1s for thread...\n" << endl;
    pthread_mutex_unlock(&mutex_t);
    int n = 0;
    while(n < 1E3){
        pthread_testcancel();
        sched_yield();
        n++;
        pthread_mutex_lock(&mutex_t);
        cout << " Thread 1: " << n << endl;
        pthread_mutex_unlock(&mutex_t);
    }
    pthread_mutex_lock(&mutex_t);
    cout << "canceling thread...\n" << endl;
    pthread_mutex_unlock(&mutex_t);
    if (pthread_cancel(thread_id) == 0)
    {
        // This doesn't wait for the thread to exit
        pthread_mutex_lock(&mutex_t);
        cout << "detaching thread...\n" << endl;
        pthread_mutex_unlock(&mutex_t);
        pthread_detach(thread_id);
        while (pthread_kill(thread_id, 0) == 0)
        {
            sched_yield();
        }
        pthread_mutex_lock(&mutex_t);
        cout << "thread is canceled";
        pthread_mutex_unlock(&mutex_t);
    }
    pthread_mutex_lock(&mutex_t);
    cout << "exit" << endl;
    pthread_mutex_unlock(&mutex_t);
    return 0;
}
When I replace the cout with printf() it works to the end ("exit"), but with cout (even locked) the executable hangs after outputting "detaching thread...".
It would be very cool to hear from a pro what the problem here is.
Why does this not work even when cout is locked by a mutex?
THE PROBLEM lies in the fact that cout contains an implicit cancellation point!
We need to code it like this:
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE,NULL);
pthread_testcancel();
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE,NULL);
and, at the beginning of the thread, call:
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE,NULL);
That ensures that pthread_testcancel() is the only effective cancellation point...
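Applied to the thread above, a hedged sketch (same names as the question's code) might look like this, with cancellation disabled everywhere except one explicit window:

static void *thread(void *aArgument)
{
    // Disable cancellation by default so cout can never cancel us mid-lock.
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
    pthread_cleanup_push(&cleanup, NULL);
    int n = 0;
    while(1){
        // The only window in which this thread can be cancelled:
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
        pthread_testcancel();
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
        sched_yield();
        n++;
        pthread_mutex_lock(&mutex_t);
        cout << " Thread 2: " << n << endl; // no longer a risk while the mutex is held
        pthread_mutex_unlock(&mutex_t);
    }
    pthread_cleanup_pop(0);
    return NULL;
}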
Try commenting out the line pthread_detach(thread_id); and run it. You are creating the thread as detached with your pthread_attr_t.
Either that, or try passing NULL instead of &attr in the pthread_create (so that the thread is not created detached) and run it.
I would guess that if the timing is right, the (already detached) thread is gone by the time the main thread attempts the pthread_detach, and you are going off into Never Never Land in pthread_detach.
Edit:
If cout has an implicit cancellation point as Gabriel points out, then most likely what happens is that the thread cancels while holding the mutex (it never makes it to pthread_mutex_unlock after the cout), and so anybody else waiting on the mutex will be blocked forever.
If the only resource you need to worry about is the mutex, you could keep track of whether or not your thread has it locked, and unlock it in the cleanup handler, assuming the cleanup runs in the same thread.
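A hedged sketch of that idea, reusing the question's mutex_t (pthread_cleanup_push/pop must be paired in the same scope):

static void unlock_mutex(void *m) {
    pthread_mutex_unlock((pthread_mutex_t *)m);
}

// In the cancellable thread:
pthread_mutex_lock(&mutex_t);
pthread_cleanup_push(unlock_mutex, &mutex_t); // runs if cout cancels us while locked
cout << " Thread 2: " << n << endl;           // potential implicit cancellation point
pthread_cleanup_pop(1);                       // pops the handler and unlocks normally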
Take a look here, page 157 on: PThreads Primer.