I'm doing a project for my OS exam. The picture is that I have a process divided into:
1 producer thread that pushes messages into a queue.
n consumer threads that pop messages from the head of the queue.
1 collector thread that tells the producer to generate another set of tasks.
My initial design in pseudocode is this (if you need real code I can make a .tar with the Makefile and all headers...).
producer
while(1) {
pushTasks(); /* push messages in the queue */
waitCollector(); /* wait on a CV the signal of the collector */
broadcastToWorkers(); /* tells to all workers to start a new elaboration */
}
all consumers do
while(1)
{
while(1)
{
popTask();
doTask();
if(message of end of stream) break;
}
signalToCollector(); /* increment a count and signal on a CV */
waitProducer(); /* wait signal from producer to restart the elaboration */
}
collector, it has to synchronize time (imagine each unit of time ends when all tasks are done)
while(1) {
doStuff();
waitWorkers(); /* each worker increments a count when it's done... */
signalToProducer(); /* tells the producer to generate new tasks*/
}
But I get a deadlock somewhere. Do you know how many mutexes and condition variables I should use? What are the conditions under which each thread should call pthread_cond_signal or pthread_cond_wait?
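For reference, a minimal sketch of one possible wiring (illustrative names only, not my real code): one mutex, two condition variables and two counters, with every wait guarded by a predicate rechecked in a while loop.
/* Illustrative sketch only: one mutex, two condition variables, two counters.
   Every pthread_cond_wait() sits in a while loop that rechecks a shared
   predicate, and every state change happens under the same mutex. */
#include <pthread.h>

#define N_WORKERS 4   /* illustrative value */

static pthread_mutex_t m        = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv_done  = PTHREAD_COND_INITIALIZER;  /* workers/collector -> producer */
static pthread_cond_t  cv_start = PTHREAD_COND_INITIALIZER;  /* producer -> workers */

static int workers_done  = 0;   /* workers that hit end-of-stream this round */
static int collector_ack = 0;   /* collector has acknowledged the finished round */
static int round_no      = 0;   /* bumped by the producer for each new batch */

void worker_round_end(void)     /* = signalToCollector() followed by waitProducer() */
{
    pthread_mutex_lock(&m);
    int my_round = round_no;
    if (++workers_done == N_WORKERS)
        pthread_cond_broadcast(&cv_done);       /* last worker wakes the collector */
    while (round_no == my_round)                /* wait for the next batch */
        pthread_cond_wait(&cv_start, &m);
    pthread_mutex_unlock(&m);
}

void collector_round(void)      /* = waitWorkers() followed by signalToProducer() */
{
    pthread_mutex_lock(&m);
    while (workers_done < N_WORKERS)
        pthread_cond_wait(&cv_done, &m);
    workers_done = 0;                           /* reset for the next round */
    collector_ack = 1;
    pthread_cond_broadcast(&cv_done);           /* wake the producer */
    pthread_mutex_unlock(&m);
}

void producer_round(void)       /* = waitCollector() followed by broadcastToWorkers() */
{
    pthread_mutex_lock(&m);
    while (!collector_ack)
        pthread_cond_wait(&cv_done, &m);
    collector_ack = 0;
    round_no++;                                 /* release the workers for the new batch */
    pthread_cond_broadcast(&cv_start);
    pthread_mutex_unlock(&m);
}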
I have a question that came up while working on some ray-tracing code.
I have created multiple threads to split up the whole image and let each thread process its allocated part. The threads work as intended. I would like to monitor the work progress in real time.
To do this, I have created one more thread to monitor the current state.
Here is the monitoring pseudo-code:
/* Global var */
int cnt = 0; // count the number of row processed
void* render_disp(void* arg){ // thread for monitoring current render-processing
/* monitoring global variable and calculate percentage to display */
double result = 100.*cnt/(h-1);
fprintf(stderr,"\r%3.2f%% of image is processed!", result);
}
void* process(void* arg){ // multiple threads work here
// Rendering process
for(........)
pthread_mutex_lock(&lock);
cnt++;
pthread_mutex_unlock(&lock);
for(........)
}
I wrote the code for the initialization of the pthreads and the mutex in the main() function.
Basically, I expect this monitoring thread to keep displaying the current state, but it seems to run only once and then quit.
How do I change this code so that the thread function keeps running until the whole rendering is finished?
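For reference, a minimal sketch of a polling monitor loop (illustrative only; it reuses the globals cnt, h and lock from the code above, and assumes usleep from <unistd.h>):
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void* render_disp(void* arg)  /* monitoring thread: poll until all rows are done */
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int local_cnt = cnt;                 /* snapshot under the workers' lock */
        pthread_mutex_unlock(&lock);

        fprintf(stderr, "\r%3.2f%% of image is processed!",
                100. * local_cnt / (h - 1));

        if (local_cnt >= h - 1)              /* all rows processed: stop monitoring */
            break;
        usleep(100000);                      /* refresh roughly 10 times per second */
    }
    return NULL;
}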
I see that when the schedule_work function is invoked it will not put the work task into the queue if it is already queued. However I want to queue the same task to be run multiple times even if it is already on the queue. How can I do this?
From workqueue.h:
/**
* schedule_work - put work task in global workqueue
* @work: job to be done
*
* Returns %false if @work was already on the kernel-global workqueue and
* %true otherwise.
*
* This puts a job in the kernel-global workqueue if it was not already
* queued and leaves it in the same position on the kernel-global
* workqueue otherwise.
*/
static inline bool schedule_work(struct work_struct *work)
Workqueue expects every work structure to represent a single "task" that needs to be run once.
So the simplest way to run a task several times is to create a new work structure every time.
Alternatively, since repeating a work item while it is running is unusual for a workqueue, you may create your own kernel thread that executes some function repeatedly:
DECLARE_WAIT_QUEUE_HEAD(repeat_wq); // Kernel thread will wait on this wait queue.
int n_works = 0; // Number of work requests to process.
// Thread function
int repeat_work(void *unused)
{
spin_lock_irq(&repeat_wq.lock); // Reuse the wait queue's spinlock for our needs
while(1) {
// Wait until work request or thread should be stopped
wait_event_interruptible_locked_irq(repeat_wq,
n_works || kthread_should_stop());
if(kthread_should_stop()) break;
spin_unlock_irq(&repeat_wq.lock);
<do the work>
// Acquire the lock for decrement count and recheck condition
spin_lock_irq(&repeat_wq.lock);
n_works--;
}
// Finally release the lock
spin_unlock_irq(&repeat_wq.lock);
return 0;
}
// Request new work.
void add_work(void)
{
unsigned long flags;
spin_lock_irqsave(&repeat_wq.lock, flags);
n_works++;
wake_up_locked(&repeat_wq);
spin_unlock_irqrestore(&repeat_wq.lock, flags);
}
Workqueues are kernel threads too, with a specific thread function kthread_worker_fn().
I am using Boost ASIO for a TCP client. For the most part ASIO is a glorified event loop for reads and writes; there is actually only one client managed by ASIO.
The architecture is like this:
The TCP server streams continuous messages. The client reads the messages, processes them and acks back with the proper code.
My code runs on the client side. There is one thread running the io_service. The io_service thread reads messages and distributes them to N worker threads using boost lockfree SPSC queues. The workers, after processing, post the replies back to the io_service thread.
The most important concern for me is the rate of reads and writes, so I am using synchronous reads and writes.
Read Code:
void read ()
{
if (_connected && !_readInProgress) {
_socket.async_read_some(boost::asio::null_buffers(),
make_boost_alloc_handler(_readAllocator,
[self = shared_from_this(), this] (ErrorType err, unsigned a)
{
connection()->handleRead(err);
_readInProgress = false;
if (err) disconnect();
else asyncRead();
}));
_readInProgress = true;
}
}
Basically I use read_some with null_buffers() and then directly use Unix system calls to read the messages. The read gives N messages, which are enqueued to the worker threads in a loop.
I want to use the boost SPSC queue in the reverse direction for writes to the socket from the workers.
Write:
// Get the queue to post writes
auto getWriteQ ()
{
static thread_local auto q =
std::make_shared< LFQType >(_epoch);
return q;
}
So each thread gets a thread-local Q using getWriteQ. The writes to the queue looks like this:
void write (Buf& buf) override
{
auto q = getWriteQ();
while (!q->enqueue(buf) && _connected);
if (!_connected) return;
_ioService.post( [self = shared_from_this(), this, q]()
{
writeHelper(q); });
}
Now this is inefficient, as we do an io_service post for each write. The write handler actually writes up to 32 messages in a single system call using sendmmsg().
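For illustration, batching already-dequeued messages into one sendmmsg() call looks roughly like this (a sketch only; the iovec representation and the flush_batch name are invented, not my real code):
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for sendmmsg() on glibc */
#endif
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Illustrative sketch: send up to 32 prepared messages with one system call. */
int flush_batch(int fd, struct iovec *msgs, unsigned int count)
{
    struct mmsghdr hdrs[32];
    if (count > 32)
        count = 32;
    memset(hdrs, 0, sizeof(hdrs));
    for (unsigned int i = 0; i < count; ++i) {
        hdrs[i].msg_hdr.msg_iov    = &msgs[i];   /* one buffer per message */
        hdrs[i].msg_hdr.msg_iovlen = 1;
    }
    return sendmmsg(fd, hdrs, count, 0);  /* returns messages sent, or -1 on error */
}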
So I am looking for help with 2 things:
Is the design any good?
Is there any foolproof way to minimize the number of posts? I was thinking of keeping an atomic enqueue count.
The writing thread does this (pseudocode):
bool post = false;
if(enqueue_count == 0) post = true
// enqueue the message
++enqueue_count
if(post)
// post the queue event
The io-service thread does this -
enqueue_count -= num_processed;
if (enqueue_count)
// repost the queue for further processing
Would this work if enqueue_count is atomic?
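In more concrete terms, the handshake above would look roughly like this (a sketch only; postToIo stands in for the _ioService.post(...) call, and the function names are illustrative):
#include <atomic>
#include <functional>

std::atomic<int> enqueue_count{0};   /* shared between workers and the io thread */

/* Worker side, called right after enqueueing one message into its SPSC queue. */
void on_enqueue(const std::function<void()>& postToIo)
{
    /* fetch_add returns the previous value: only the 0 -> 1 transition posts. */
    if (enqueue_count.fetch_add(1, std::memory_order_release) == 0)
        postToIo();
}

/* io_service side, called after draining num_processed messages from the queues. */
void on_drained(int num_processed, const std::function<void()>& postToIo)
{
    /* If more messages arrived while we were writing, schedule another pass. */
    if (enqueue_count.fetch_sub(num_processed, std::memory_order_acquire)
            != num_processed)
        postToIo();
}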
I have N threads performing various tasks, and these threads must be regularly synchronized with a thread barrier, as illustrated below with 3 threads and 8 tasks. The || indicates the temporal barrier; all threads have to wait until the completion of all 8 tasks before starting again.
Thread#1 |----task1--|---task6---|---wait-----||-taskB--| ...
Thread#2 |--task2--|---task5--|-------taskE---||----taskA--| ...
Thread#3 |-task3-|---task4--|-taskG--|--wait--||-taskC-|---taskD ...
I couldn't find a workable solution, though The Little Book of Semaphores (http://greenteapress.com/semaphores/index.html) was inspiring. I came up with a solution, shown below, which "seems" to be working using three std::atomic variables.
I am worried about my code breaking down in corner cases, hence the quoted verb. So can you share advice on how to verify such code? Do you have simpler, foolproof code available?
std::atomic<int> barrier1(0);
std::atomic<int> barrier2(0);
std::atomic<int> barrier3(0);
void my_thread()
{
while(1) {
// pop task from queue
...
// and execute task
switch(task.id()) {
case TaskID::Barrier:
barrier2.store(0);
barrier1++;
while (barrier1.load() != NUM_THREAD) {
std::this_thread::yield();
}
barrier3.store(0);
barrier2++;
while (barrier2.load() != NUM_THREAD) {
std::this_thread::yield();
}
barrier1.store(0);
barrier3++;
while (barrier3.load() != NUM_THREAD) {
std::this_thread::yield();
}
break;
case TaskID::Task1:
...
}
}
}
Boost offers a barrier implementation as an extension to the C++11 standard thread library. If using Boost is an option, you should look no further than that.
If you have to rely on standard library facilities, you can roll your own implementation based on std::mutex and std::condition_variable without too much of a hassle.
#include <condition_variable>
#include <mutex>

class Barrier {
int wait_count;
int const target_wait_count;
std::mutex mtx;
std::condition_variable cond_var;
public:
Barrier(int threads_to_wait_for)
: wait_count(0), target_wait_count(threads_to_wait_for) {}
void wait() {
std::unique_lock<std::mutex> lk(mtx);
++wait_count;
if(wait_count != target_wait_count) {
// not all threads have arrived yet; go to sleep until they do
cond_var.wait(lk,
[this]() { return wait_count == target_wait_count; });
} else {
// we are the last thread to arrive; wake the others and go on
cond_var.notify_all();
}
// note that if you want to reuse the barrier, you will have to
// reset wait_count to 0 now before calling wait again
// if you do this, be aware that the reset must be synchronized with
// threads that are still stuck in the wait
}
};
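For illustration, a minimal single-use sketch of how this class might be driven (thread count chosen arbitrarily):
#include <thread>
#include <vector>

int main() {
    constexpr int kThreads = 3;
    Barrier barrier(kThreads);                 /* the class defined above */

    std::vector<std::thread> workers;
    for (int i = 0; i < kThreads; ++i) {
        workers.emplace_back([&barrier] {
            /* ... this thread's share of the tasks ... */
            barrier.wait();                    /* block until all three have arrived */
            /* ... next phase ... */
        });
    }
    for (auto& t : workers) t.join();
}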
This implementation has the advantage over your atomics-based solution that threads waiting in condition_variable::wait should get sent to sleep by your operating system's scheduler, so you don't block CPU cores by having waiting threads spin on the barrier.
A few words on resetting the barrier: The simplest solution is to just have a separate reset() method and have the user ensure that reset and wait are never invoked concurrently. But in many use cases, this is not easy to achieve for the user.
For a self-resetting barrier, you have to consider races on the wait count: If the wait count is reset before the last thread returned from wait, some threads might get stuck in the barrier. A clever solution here is to not have the terminating condition depend on the wait count variable itself. Instead you introduce a second counter, that is only increased by the thread calling the notify. The other threads then observe that counter for changes to determine whether to exit the wait:
// (this assumes an additional counter member, e.g. unsigned int m_inter_wait_count = 0;)
void wait() {
std::unique_lock<std::mutex> lk(mtx);
unsigned int const current_wait_cycle = m_inter_wait_count;
++wait_count;
if(wait_count != target_wait_count) {
// wait condition must not depend on wait_count
cond_var.wait(lk,
[this, current_wait_cycle]() {
return m_inter_wait_count != current_wait_cycle;
});
} else {
// increasing the second counter allows waiting threads to exit
++m_inter_wait_count;
cond_var.notify_all();
}
}
This solution is correct under the (very reasonable) assumption that all threads leave the wait before the inter_wait_count overflows.
With atomic variables, using three of them for a barrier is simply overkill that only serves to complicate the issue. You know the number of threads, so you can simply atomically increment a single counter every time a thread enters the barrier, and then spin until the counter becomes greater than or equal to N. Something like this:
void barrier(int N) {
static std::atomic<unsigned int> gCounter(0);
gCounter++;
while((int)(gCounter - N) < 0) std::this_thread::yield();
}
If you don't have more threads than CPU cores and the expected waiting time is short, you might want to remove the call to std::this_thread::yield(). This call is likely to be really expensive (more than a microsecond, I'd wager, but I haven't measured it). Depending on the size of your tasks, this may be significant.
If you want to do repeated barriers, just increment the N as you go:
unsigned int lastBarrier = 0;
while(1) {
switch(task.id()) {
case TaskID::Barrier:
barrier(lastBarrier += processCount);
break;
}
}
I would like to point out that in the solution given by @ComicSansMS, wait_count should be reset to 0 before executing cond_var.notify_all(). This is because when the barrier is called a second time, the if condition will always fail if wait_count is not reset to 0.
I am writing a network application and have a problem regarding a thread race condition.
"cd" is a socket descriptor.
One of my threads retrieves the socket descriptor and sends some data through the socket.
Let's say map_sd returns 5.
However, another thread might close socket 5 and reassign another one, which will destroy the logic of the program.
// wait until there is valid descriptor mapping
while( !(cd = map_sd( sd )) ){
sleep(1);
}
// forward PAYLOAD header
if( send(cd, &payload, sizeof(PAYLOAD), MSG_NOSIGNAL) < 0 ){
printf("send fail 813\n");
}
What I want is to make the code above "atomic".
How can I do this when I am using the pthread library on Linux?
Thank you in advance.
You need a condition variable and a mutex:
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
In the thread passing cd:
pthread_mutex_lock(&lock);
/* here you pass cd through a data structure or whatever */
pthread_cond_signal(&cond);
pthread_mutex_unlock(&lock);
In the thread waiting for cd:
pthread_mutex_lock(&lock);
if (pthread_cond_wait(&cond, &lock) != 0) {
/* handle error */
}
/* here you can acquire cd */
pthread_mutex_unlock(&lock);
That should be it - you use a condition variable and a lock for exclusive access and notifying the other thread that a resource is now available. pthread_cond_wait() releases the lock to wait and will reacquire it once it has been notified by the other thread with pthread_cond_signal().
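Note that pthread_cond_wait() can also wake up spuriously, so in practice the wait is wrapped in a loop that checks a shared flag. A self-contained sketch of the full pattern (the cd_ready flag and the function names are illustrative, not from the code above):
#include <pthread.h>

pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int cd = -1;
int cd_ready = 0;                          /* predicate guarded by 'lock' */

/* thread that produces the descriptor */
void publish_cd(int new_cd)
{
    pthread_mutex_lock(&lock);
    cd = new_cd;
    cd_ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* thread that sends on the descriptor */
int acquire_cd(void)
{
    int result;
    pthread_mutex_lock(&lock);
    while (!cd_ready)                      /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);
    result = cd;
    cd_ready = 0;                          /* consume the descriptor */
    pthread_mutex_unlock(&lock);
    return result;
}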