Looking for an optimal multithreaded message queue

I want to run several threads inside a process, and I'm looking for the most efficient way to pass messages between them.
Each thread would have a shared-memory input message buffer, and other threads would write to the appropriate buffer.
Messages would have priorities. I want to manage this process myself.
Without resorting to expensive locking or synchronization, what's the best way to do this? Or is there already a well-proven library available for this? (Delphi, C, or C# is fine.)

This is hard to get right without repeating a lot of mistakes other people already made for you :)
Take a look at Intel Threading Building Blocks - the library has several well-designed queue templates (and other collections) that you can test and see which suits your purpose best.
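For instance, tbb::concurrent_priority_queue looks like a close match for the prioritized per-thread inbox described in the question. A minimal sketch (the Msg type and its priorities are placeholders of mine, not part of the question):

#include <tbb/concurrent_priority_queue.h>
#include <string>

struct Msg {
    int priority;
    std::string text;
};

struct MsgCompare {
    bool operator()(const Msg &a, const Msg &b) const {
        return a.priority < b.priority;   // higher priority pops first
    }
};

int main() {
    tbb::concurrent_priority_queue<Msg, MsgCompare> inbox;
    inbox.push(Msg{1, "low"});
    inbox.push(Msg{5, "high"});

    Msg m;
    while (inbox.try_pop(m)) {   // non-blocking; returns false when empty
        // process m.text; "high" comes out before "low"
    }
}

The queue handles the synchronisation internally, so no explicit locking is needed around push/try_pop.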

If you are going to work with multiple threads, it is hard to avoid synchronisation. Fortunately it is not very hard.
For a single process, a Critical Section is frequently the best choice. It is fast and easy to use. For simplicity, I normally wrap it in a class to handle initialisation and cleanup.
#include <Windows.h>

class CTkCritSec
{
public:
    CTkCritSec(void)
    {
        ::InitializeCriticalSection(&m_critSec);
    }
    ~CTkCritSec(void)
    {
        ::DeleteCriticalSection(&m_critSec);
    }
    void Lock()
    {
        ::EnterCriticalSection(&m_critSec);
    }
    void Unlock()
    {
        ::LeaveCriticalSection(&m_critSec);
    }
private:
    CRITICAL_SECTION m_critSec;
};
You can make usage even simpler with an "autolock" class that locks on construction and unlocks on destruction:
class CTkAutoLock
{
public:
    CTkAutoLock(CTkCritSec &lock)
        : m_lock(lock)
    {
        m_lock.Lock();
    }
    ~CTkAutoLock()
    {
        m_lock.Unlock();
    }
private:
    CTkCritSec &m_lock;
};
Anywhere you want to lock something, instantiate an autolock. When the function finishes, it will unlock. Also, if there is an exception, it will automatically unlock (giving exception safety).
Now you can make a simple message queue out of a std::priority_queue:
#include <queue>
#include <deque>
#include <functional>
#include <string>

struct CMsg
{
    CMsg(const std::string &s, int n = 1)
        : nPriority(n), sText(s)
    {
    }
    int nPriority;
    std::string sText;

    // Note: for std::binary_function the argument types come first, the
    // result type last.
    struct Compare : public std::binary_function<const CMsg *, const CMsg *, bool>
    {
        bool operator () (const CMsg *p0, const CMsg *p1) const
        {
            return p0->nPriority < p1->nPriority;
        }
    };
};
class CMsgQueue :
    private std::priority_queue<CMsg *, std::deque<CMsg *>, CMsg::Compare>
{
public:
    void Push(CMsg *pJob)
    {
        CTkAutoLock lk(m_critSec);
        push(pJob);
    }
    CMsg *Pop()
    {
        CTkAutoLock lk(m_critSec);   // CRITICAL_SECTION is re-entrant for the
        CMsg *pJob(NULL);            // same thread, so the nested lock in
        if (!Empty())                // Empty() below is safe
        {
            pJob = top();
            pop();
        }
        return pJob;
    }
    bool Empty()
    {
        CTkAutoLock lk(m_critSec);
        return empty();
    }
private:
    CTkCritSec m_critSec;
};
The content of CMsg can be anything you like. Note that CMsgQueue inherits privately from std::priority_queue. That prevents raw access to the queue without going through our (synchronised) methods.
Assign a queue like this to each thread and you are on your way.
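For instance, a consumer loop might look like the sketch below (WorkerProc and the Sleep-based polling are my own illustration; a production version would block on an event or semaphore instead of polling):

DWORD WINAPI WorkerProc(LPVOID param)
{
    CMsgQueue *pQueue = static_cast<CMsgQueue *>(param);
    for (;;)
    {
        CMsg *pMsg = pQueue->Pop();   // highest-priority message, or NULL
        if (pMsg)
        {
            // ... process pMsg->sText ...
            delete pMsg;              // the queue stores raw pointers; the consumer frees them
        }
        else
        {
            ::Sleep(1);               // nothing queued; yield briefly
        }
    }
    return 0;
}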
Disclaimer The code here was slapped together quickly to illustrate a point. It probably has errors and needs review and testing before being used in production.

Related

How to move/swap a std::vector efficiently and thread safe?

Imagine a thread that continuously writes to a vector of strings, which is collected every now and then by another thread (see code).
#include <string>
#include <vector>
#include <chrono>
#include <thread>
#include <iostream>
#include <cassert>

// some public vector being filled by one and consumed by another thread
static std::vector<std::string> buffer;

// continuously writes data to buffer (has to be fast)
static const auto filler(std::thread([] {
    for (size_t i = 0;; ++i) {
        buffer.push_back(std::to_string(i));
    }
}));

// returns collected data and clears the buffer being written to
std::vector<std::string> fetch() {
    return std::move(buffer);
}

// continuously fetch buffered data and process it (can be slow)
int main() {
    size_t expected{};
    for (;;) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        const auto fetched(fetch());
        for (auto &&e : fetched) {
            size_t read(std::stoi(e));
            std::cout << read << " " << expected << std::endl;
            assert(read == expected);
            ++expected;
        }
    }
}
The provided example generally does what I want it to do, but it crashes because it's not thread safe. Obvious approaches would be
to secure the shared vector using a lock_guard
to use two buffers and an atomic pointer
to use a thread-safe vector implementation.
The provided scenario seems very simple to me. I don't think I need a thread-safe vector, because that would cover a lot more scenarios at the cost of performance.
Using a mutex or swapping between two instances of the vector seems plausible to me, but I wonder if there is some solution made specifically to 'atomically grab all data and leave an empty container'.
Maybe there's an obvious solution and it's just time for me to go to bed?
Important note: this question is somewhat academic, since performance is not (necessarily) a real issue here. The provided example gets throttled by about 15%, but there is hardly any 'real' work being done. I think in a real-world example the benefit would be about 2-5%.
First of all, I would not recommend having a non-const static variable, so I propose encapsulating the vector in a class with the following interface:
class ValuesHolder
{
public:
    void push_back(std::string value);
    std::vector<std::string> take();
};
The second note, about 'atomically grab all data and leave an empty container': you could pull off this trick by swapping pointers, but the main issue is that push_back has to be synchronized with it (the vector must not be moved while a push_back is executing). Otherwise there may be issues with the following interleaving:

Thread 1                              Thread 2
auto values = holder.take();          // push_back starts before take
for (const auto& value : values)      // but the value is inserted during the iteration
{...}
So the first option is just to lock during both calls:

#include <mutex>
#include <string>
#include <utility>
#include <vector>

class ValuesHolder
{
public:
    void push_back(std::string value)
    {
        std::lock_guard<std::mutex> lock(mut);
        values.push_back(std::move(value));
    }
    std::vector<std::string> take()
    {
        std::lock_guard<std::mutex> lock(mut);
        // std::exchange guarantees the member is left empty; a plain
        // std::move would leave it in a valid but unspecified state
        return std::exchange(values, {});
    }
private:
    std::mutex mut;
    std::vector<std::string> values;
};
Otherwise, you could switch from std::vector to a lock-free stack container. However, the performance should be measured carefully, since the number of allocations increases, so performance can end up worse.
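To illustrate that lock-free alternative, here is a minimal sketch of my own (a Treiber-style stack, not production code): take_all() grabs the whole list in one atomic exchange and leaves the container empty, which is exactly the 'atomically grab all data' operation asked about. The per-push node allocation is the cost mentioned above.

#include <atomic>
#include <string>

class LockFreeStringStack {
    struct Node { std::string value; Node *next; };
    std::atomic<Node *> head{nullptr};
public:
    void push(std::string v) {
        Node *n = new Node{std::move(v), head.load(std::memory_order_relaxed)};
        // on failure, n->next is refreshed with the current head; retry
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
        }
    }
    // Detach the whole list at once; the caller walks and deletes the nodes.
    // Note the result is in LIFO order, unlike the vector version.
    Node *take_all() {
        return head.exchange(nullptr, std::memory_order_acquire);
    }
};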

compare and swap using atomic_compare_exchange_weak

In this code, is std::swap thread safe so that it can be called from two execution threads at the same time, or do I need to use atomic_compare_exchange_weak() instead of swap()?
How do I know whether this will work on all CPUs? I am happy if it just works on Intel CPUs.
#include <utility>

class resource {
    int x = 0;
};

class foo
{
public:
    foo() : p{new resource{}}
    { }
    foo(const foo &other) : p{new resource{*(other.p)}}
    { }
    foo(foo &&other) : p{other.p}
    {
        other.p = nullptr;
    }
    foo &operator=(foo other)
    {
        swap(*this, other);
        return *this;
    }
    virtual ~foo()
    {
        delete p;
    }
    friend void swap(foo &first, foo &second)
    {
        using std::swap;
        swap(first.p, second.p);
    }
private:
    resource *p;
};
I understand it is overkill to swap a pointer, but this might be good practice.
is std::swap thread safe so it can be called from two execution threads at the same time
std::swap is thread-safe as long as different threads pass different objects to it. Otherwise a data race arises.
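To sketch what a thread-safe alternative could look like (my own illustration, not part of the question's class): making the pointer member itself a std::atomic turns the hand-over into a single indivisible read-modify-write, whereas std::swap performs separate reads and writes that can interleave. Note this only exchanges one pointer atomically; swapping the contents of two such holders in one indivisible step is still not possible with a plain exchange.

#include <atomic>

class resource {
    int x = 0;
};

class atomic_holder
{
public:
    atomic_holder() : p{new resource{}} { }
    ~atomic_holder() { delete p.load(); }

    // Install a new resource and receive the previous one, atomically.
    resource *replace(resource *fresh)
    {
        return p.exchange(fresh);
    }
private:
    std::atomic<resource *> p;
};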

An Efficient Non-Enforcing, Verifying, Mutex

Class foo has a method bar. According to some synchronization protocol, the bar method of a specific foo object will be called by at most one thread at any point in time.
I'd like to add a very lightweight verification_mutex to verify this / debug synchronization abuses. It will be used similarly to a regular mutex:
class foo {
public:
    void bar() {
        std::lock_guard<verification_mutex> lk{m};
        ...
    }
private:
    mutable verification_mutex m;
};
however, it will not necessarily lock or unlock anything itself. Rather, it will just throw if simultaneous multithreaded access is detected. The point is to keep its runtime footprint as low as possible (including its effect on surrounding code, e.g., through memory barriers).
Here are three options for implementing verification_mutex:
A wrapper around std::mutex, but with lock implemented as a check that try_lock succeeded (this is just to convey the idea; it is clearly not very fast)
An atomic variable noting the current "locking" thread id, with atomic exchange operations (see implementation sketch below).
Same as 2, but without atomics.
Are these correct or incorrect (in particular 2, and especially 3)? How will they affect performance (especially of surrounding code)? Is there an altogether superior alternative?
Edit: The answer by @SergeyA below is fine, but I'm particularly curious about the memory barriers. A solution not utilizing them would be great, as would an answer giving some intuitive explanation of why any solution omitting them would necessarily fail.
Implementation Sketch
#include <atomic>
#include <thread>
#include <functional>
#include <stdexcept>

class verification_mutex {
public:
    verification_mutex() : m_holder{0} {}
    void lock() {
        if (m_holder.exchange(get_this_thread_id()) != 0)
            throw std::logic_error("lock");
    }
    void unlock() {
        if (m_holder.exchange(0) != get_this_thread_id())
            throw std::logic_error("unlock");
    }
    bool try_lock() {
        lock();
        return true;
    }
private:
    static inline std::size_t get_this_thread_id() {
        return std::hash<std::thread::id>()(std::this_thread::get_id());
    }
private:
    std::atomic_size_t m_holder;
};
Option 3 is not viable. You need a memory barrier when reading/writing a variable from multiple threads.
Of all options, an atomic boolean variable would be the fastest, since it won't require context switches (mutexes might). Something like this:

#include <atomic>
#include <stdexcept>

class verifying_mutex {
    std::atomic<bool> locked{false};
public:
    void lock() {
        // compare_exchange_strong takes `expected` by reference,
        // so a named variable is required rather than a literal
        bool expected = false;
        if (!locked.compare_exchange_strong(expected, true))
            throw std::runtime_error("Incorrect mt-access pattern");
    }
    void unlock() {
        locked = false;
    }
};

On a side note, your original version of lock used the thread id, which would slow you down unnecessarily. Do not do that.

How to join a thread in Linux kernel?

The main question is: how can we wait for a thread in the Linux kernel to complete? I have seen a few posts concerned with the proper way of handling threads in the Linux kernel, but I'm not sure how we can wait in the main thread for a single thread to complete (suppose we need thread[3] to be done before proceeding):
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/kthread.h>
#include <linux/slab.h>

/* kthread functions must return int and take a void* argument */
int func(void *arg) {
    // doing something
    return 0;
}

int init_module(void) {
    struct task_struct *thread[5];
    int i;
    for (i = 0; i < 5; i++) {
        thread[i] = kthread_run(func, NULL, "my_thread%d", i);
    }
    return 0;
}

void cleanup_module(void) {
    printk("cleaning up!\n");
}
void cleanup_module(void) {
printk("cleaning up!\n");
}
AFAIK there is no equivalent of pthread_join() in the kernel. Also, I feel like your pattern (starting a bunch of threads and waiting for only one of them) is not really common in the kernel. That being said, the kernel does have a few synchronization mechanisms that may be used to accomplish your goal.
Note that those mechanisms will not guarantee that the thread has finished; they will only let the main thread know that it has finished doing the work it was supposed to do. It may still take some time to really stop the thread and free all its resources.
Semaphores
You can create a locked semaphore, then call down() in your main thread. This will put it to sleep. Then you up() this semaphore inside your thread just before exiting. Something like:
struct semaphore sem;

int func(void *arg) {
    struct semaphore *sem = (struct semaphore *)arg; /* you could use the global instead */
    // do something
    up(sem);
    return 0;
}

int init_module(void) {
    // some initialization
    sema_init(&sem, 0); /* starts locked; the older init_MUTEX_LOCKED() was removed from modern kernels */
    kthread_run(&func, (void *)&sem, "sem_thread");
    down(&sem); // this will block until the thread runs up()
    return 0;
}
This should work, but it is not the most optimal solution. I mention it because it's a well-known pattern that is also used in userspace. Kernel semaphores are optimized for the case where the semaphore is mostly available, while this pattern is contended by design, so a mechanism optimized for exactly this case was created.
Completions
You can declare completions using:
struct completion comp;
init_completion(&comp);
or:
DECLARE_COMPLETION(comp);
Then you can use wait_for_completion(&comp); instead of down() to wait in the main thread, and complete(&comp); instead of up() in your thread.
Here's the full example:
DECLARE_COMPLETION(comp);

#define N 5

struct my_data {
    int id;
    struct completion *comp;
};

int func(void *arg) {
    struct my_data *data = (struct my_data *)arg;
    // doing something
    if (data->id == 3)
        complete(data->comp);
    return 0;
}

int init_module(void) {
    struct my_data *data = kmalloc(sizeof(struct my_data) * N, GFP_KERNEL);
    int i;
    if (!data)
        return -ENOMEM;
    // some initialization
    for (i = 0; i < N; i++) {
        data[i].comp = &comp;
        data[i].id = i;
        kthread_run(func, (void *)&data[i], "my_thread%d", i);
    }
    wait_for_completion(&comp); // this will block until some thread runs complete()
    return 0;
}
Multiple threads
I don't really see why you would start 5 identical threads and only want to wait for the 3rd one, but of course you could send different data to each thread, with a field describing its id, and then call up() or complete() only if this id equals 3. That's shown in the completion example. There are other ways to do this; this is just one of them.
Word of caution
Go read some more about those mechanisms before using any of them. There are some important details I did not write about here. Also those examples are simplified and not tested, they are here just to show the overall idea.
kthread_stop() is the kernel's way of waiting for a thread to end.
Aside from waiting, kthread_stop() also sets the should_stop flag for the waited-for thread and wakes it up if needed. It is useful for threads which repeat some action indefinitely.
As for single-shot tasks, it is usually simpler to use work items (workqueues) for them instead of kthreads.
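For illustration, a minimal sketch of that workqueue alternative (untested and simplified; my own example, in the style of the module code above):

#include <linux/workqueue.h>

static void my_work_fn(struct work_struct *work)
{
    /* the one-shot task */
}

static DECLARE_WORK(my_work, my_work_fn);

int init_module(void)
{
    schedule_work(&my_work);   /* run it on the shared system workqueue */
    flush_work(&my_work);      /* wait until my_work_fn has finished */
    return 0;
}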
EDIT:
Note: kthread_stop() can be called only while the kthread's task_struct structure has not been freed.
Either the thread function should return only after it finds kthread_should_stop() returning true, or get_task_struct() should be called before starting the thread (and put_task_struct() after kthread_stop()).
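A minimal sketch of that pattern (untested and simplified; names are my own):

static struct task_struct *t;

static int my_thread_fn(void *arg)
{
    while (!kthread_should_stop()) {   /* loop until kthread_stop() is called */
        /* repeat some action */
        schedule();
    }
    return 0;
}

int init_module(void)
{
    t = kthread_run(my_thread_fn, NULL, "my_thread");
    if (IS_ERR(t))
        return PTR_ERR(t);
    get_task_struct(t);   /* pin the task_struct so kthread_stop() remains safe */
    return 0;
}

void cleanup_module(void)
{
    kthread_stop(t);      /* sets should_stop, wakes the thread, waits for it */
    put_task_struct(t);   /* drop our reference */
}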

Signals in Linux

I am studying signals from the O'Reilly book. I came across this:
#include <signal.h>

typedef void (*sighandler_t)(int);   /* function pointer returning void; uses typedef */
sighandler_t signal(int signo, sighandler_t handler);

Later on in the code, he just uses:

void sigint_handler(int signo)   /* a normal function returning void */
{
}
Can typedef be applied to functions?
I want to know how it works.
Can typedef be applied to functions?
Yes.....
I want to know how it works
As in the example you have read, the syntax is rather obscure (after 25 years of C I still have to think about it), but it is quite straightforward. Passing and storing pointers to functions is greatly simplified if you use typedefs.
I suggest you either take a detour and learn about pointers to functions and typedefs of them, or take it as read for now and return to pointers to functions later, as you cannot be a C programmer and avoid them.
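To illustrate (my own minimal example, mirroring the book's sighandler_t):

#include <stdio.h>

typedef void (*handler_t)(int);   /* names a pointer-to-function type */

void my_handler(int signo)        /* an ordinary function with a matching signature */
{
    printf("got signal %d\n", signo);
}

int main(void)
{
    handler_t h = my_handler;     /* the function name decays to a pointer */
    h(42);                        /* calls my_handler through the pointer */
    return 0;
}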
A signal is much like an interrupt: when one is generated at user level, a call is made into the OS kernel, which acts accordingly. To show how to install a signal handler, here is an example:
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>   /* for sleep() */

void sig_handler1(int num)
{
    printf("You are here becoz of signal:%d\n", num);
    signal(SIGQUIT, SIG_DFL);   /* restore default handling for SIGQUIT */
}

void sig_handler(int num)
{
    printf("\nHi! You are here becz of signal:%d\n", num);
}

int main()
{
    signal(SIGINT, sig_handler1);
    signal(SIGQUIT, sig_handler);
    while (1)
    {
        printf("Hello\n");
        sleep(2);
    }
}
After running this code, if you press Ctrl+C, the message "You are here becoz of signal:2" will be shown instead of the process quitting, because we have changed the action for that signal: Ctrl+C raises SIGINT, which is a catchable (maskable) signal.
To learn more about signals and the types of signals, with examples, follow this link:
http://www.firmcodes.com/signals-in-linux/
