How to move/swap a std::vector efficiently and thread safe?

Imagine a thread which continuously writes to a vector of strings which is being collected every now and then by another thread (see code).
#include <string>
#include <vector>
#include <chrono>
#include <thread>
#include <iostream>
#include <cassert>
// some public vector being filled by one and consumed by another
// thread
static std::vector<std::string> buffer;
// continuously writes data to buffer (has to be fast)
static const auto filler(std::thread([] {
for (size_t i = 0;; ++i) {
buffer.push_back(std::to_string(i));
}
}));
// returns collected data and clears the buffer being written to
std::vector<std::string> fetch() {
return std::move(buffer);
}
// continuously fetch buffered data and process it (can be slow)
int main() {
size_t expected{};
for(;;) {
std::this_thread::sleep_for(std::chrono::seconds(1));
const auto fetched(fetch());
for (auto && e : fetched) {
size_t read(std::stoi(e));
std::cout << read << " " << expected << std::endl;
assert(read == expected);
++expected;
}
}
}
The provided example generally does what I want it to do, but it crashes because it's not thread safe. Obvious approaches would be:
1. securing the shared vector with a lock_guard,
2. using two buffers and an atomic pointer,
3. using a thread-safe vector implementation.
The provided scenario seems very simple to me. I don't think I need a thread safe vector because that would cover a lot more scenarios at the cost of performance.
Using a mutex or swapping between two instances of the vector seem plausible to me but I wonder if there is some solution specially made to 'atomically grab all data and leave an empty container'.
Maybe there's an obvious solution and it's just time to go to bed for me?
Important note: this question is somewhat academic, since performance is not (necessarily) a real issue here. The provided example gets throttled by about 15%, but there is hardly any 'real' work being done. I think in a real-world example the benefit would be about 2-5%.

First of all, I would not recommend having a non-const static variable. So I propose encapsulating the vector in a class with the following interface:
class ValuesHolder
{
public:
void push_back(std::string value);
std::vector<std::string> take();
};
The second note, about 'atomically grab all data and leave an empty container': you could do this trick by swapping pointers, but the main issue is that push_back has to be synchronized with it (the vector must not be moved while a push_back is executing). Otherwise the following interleaving can go wrong:
Thread 1                                 Thread 2
                                         // push_back starts before take
auto values = holder.take();
for (const auto& value : values)         // but the value is inserted during the iteration
{...}
So the first option is just to lock during both calls:
#include <mutex>
#include <string>
#include <utility>
#include <vector>
class ValuesHolder
{
public:
    void push_back(std::string value)
    {
        std::lock_guard<std::mutex> lock(mut);
        values.push_back(std::move(value));
    }
    // Hands everything collected so far to the caller while holding the lock.
    std::vector<std::string> take()
    {
        std::lock_guard<std::mutex> lock(mut);
        return std::move(values);
    }
private:
    std::mutex mut;
    std::vector<std::string> values;
};
Otherwise you could switch from std::vector to a lock-free stack container. However, the performance should be measured carefully, since the number of allocations can increase and the result may well be slower.
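To make the locking option concrete, here is a minimal sketch of a producer and a consumer sharing such a holder. The names SwappingHolder and produce_and_collect are hypothetical and not part of the original answer. As a variation, take() swaps with a local empty vector, so the member is explicitly left empty rather than relying on the moved-from state.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Same interface as ValuesHolder; take() swaps instead of moving.
class SwappingHolder {
public:
    void push_back(std::string value) {
        std::lock_guard<std::mutex> lock(mut_);
        values_.push_back(std::move(value));
    }
    std::vector<std::string> take() {
        std::vector<std::string> result;
        std::lock_guard<std::mutex> lock(mut_);
        result.swap(values_);  // values_ is now empty, result owns the data
        return result;
    }
private:
    std::mutex mut_;
    std::vector<std::string> values_;
};

// Hypothetical driver: one producer writes 0..count-1 while the caller
// drains the holder and checks that nothing is lost or reordered.
inline std::size_t produce_and_collect(std::size_t count) {
    SwappingHolder holder;
    std::thread producer([&holder, count] {
        for (std::size_t i = 0; i < count; ++i)
            holder.push_back(std::to_string(i));
    });
    std::size_t expected = 0;
    while (expected < count) {
        for (const auto& e : holder.take()) {
            assert(std::stoull(e) == expected);
            ++expected;
        }
        std::this_thread::yield();  // be polite while the buffer refills
    }
    producer.join();
    return expected;
}
```

The consumer only holds the lock for the duration of the swap, so the slow processing happens outside the critical section.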

Related

An Efficient Non-Enforcing, Verifying, Mutex

Class foo has a method bar. According to some synchronization protocol, the bar method of a specific foo object, will be only called by at most one thread at any point in time.
I'd like to add a very lightweight verification_mutex to verify this / debug synchronization abuses. It will be used similarly to a regular mutex:
class foo {
public:
void bar() {
std::lock_guard<verification_mutex> lk{m};
...
}
private:
mutable verification_mutex m;
};
however, it will not in itself necessarily lock or unlock anything. Rather, it will just throw if multithreaded simultaneous access is detected. The point is to make its runtime footprint as low as possible (including its effect on other code, e.g., through memory barriers).
Here are three options for implementing verification_mutex:
1. A wrapper around std::mutex, but with lock implemented by a check that try_lock succeeded (this is just to get the idea; clearly not very fast).
2. An atomic variable noting the current "locking" thread id, with atomic exchange operations (see implementation sketch below).
3. Same as 2, but without atomics.
Are these correct or incorrect (in particular, 2 and esp. 3)? How will they affect performance (esp. of surrounding code)? Is there an altogether superior alternative?
Edit: The answer by @SergeyA below is fine, but I'm particularly curious about the memory barriers. A solution not utilizing them would be great, as would an answer giving some intuitive explanation of why any solution omitting them would necessarily fail.
Implementation Sketch
#include <atomic>
#include <thread>
#include <functional>
#include <stdexcept>
class verification_mutex {
public:
verification_mutex() : m_holder{0}{}
void lock() {
if(m_holder.exchange(get_this_thread_id()) != 0)
throw std::logic_error("lock");
}
void unlock() {
if(m_holder.exchange(0) != get_this_thread_id())
throw std::logic_error("unlock");
}
bool try_lock() {
lock();
return true;
}
private:
static inline std::size_t get_this_thread_id() {
return std::hash<std::thread::id>()(std::this_thread::get_id());
}
private:
std::atomic_size_t m_holder;
};
Option 3 is not viable: reading and writing a plain variable from multiple threads without synchronization is a data race, so you need an atomic (or a memory barrier).
Of all the options, an atomic boolean variable would be the fastest, since it won't require context switches (mutexes might). Something like this:
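#include <atomic>
#include <stdexcept>
class verifying_mutex {
    std::atomic<bool> locked{false};
public:
    void lock() {
        bool expected = false; // compare_exchange_strong needs an lvalue here
        if (!locked.compare_exchange_strong(expected, true))
            throw std::runtime_error("Incorrect mt-access pattern");
    }
    void unlock() {
        locked = false;
    }
};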
On a side note, your original version of lock used the thread id, which would slow you down unnecessarily. Do not do this.
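As a rough illustration of how such a verifying mutex behaves with std::lock_guard, here is a minimal sketch. The names checking_mutex and detects_overlap are hypothetical. Using acquire ordering on lock and release ordering on unlock is an assumption of this sketch. Whether weaker orderings would suffice is the open question raised above, and the sketch does not settle it.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>

// Never blocks: it only reports overlapping "critical sections".
class checking_mutex {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // exchange returns the previous value; true means someone holds it
        if (locked_.exchange(true, std::memory_order_acquire))
            throw std::logic_error("concurrent access detected");
    }
    void unlock() noexcept {
        locked_.store(false, std::memory_order_release);
    }
};

// Simulates a protocol violation on one thread: a second lock() while the
// first is still held. Returns true if the violation was reported.
inline bool detects_overlap() {
    checking_mutex m;
    std::lock_guard<checking_mutex> outer{m};
    try {
        m.lock();
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

Because the lock_guard constructor throws in the violating caller, its destructor never runs, so only the legitimate holder ever calls unlock().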

Danger of using std::sort with vector of shared_ptr in multi-threaded world

This is a simplified version (C++11) of an issue I ran into when I upgraded an app to the multithreaded world. Essentially, I have a vector of shared_ptr and I call std::sort on it. I can understand that it is dangerous for multiple threads to sort it at the same time, since the first sort may have to move elements around. But here I already have a sorted vector. Calling std::sort on it shouldn't cause any trouble (or so I thought, as nothing needs to move), yet it crashes randomly. (As for why I call std::sort on a sorted container: in the original code the data is unsorted, but that doesn't seem to matter for the end result.) Here is the sample code:
#include <algorithm>
#include <iostream>
#include <thread>
#include <vector>
#include <boost/shared_ptr.hpp>
const int MAX = 4;
#define LOOP_COUNT 200
struct Container {
int priority;
Container(int priority_)
: priority( priority_)
{}
};
struct StrategySorter {
int operator()( const boost::shared_ptr<Container>& v1_,
const boost::shared_ptr<Container>& v2_ )
{
return v1_->priority > v2_->priority;
}
};
std::vector<boost::shared_ptr<Container>> _creators;
void func() {
for(int i=0; i < LOOP_COUNT; ++i) {
std::sort( _creators.begin(), _creators.end(), StrategySorter() );
}
}
int main()
{
int priority[] = {100, 245, 312, 423, 597, 656, 732 };
size_t size = sizeof(priority)/sizeof(int);
for(int i=0; i < size; ++i)
{
_creators.push_back(boost::shared_ptr<Container>(new Container(priority[i])));
}
std::thread t[MAX];
for(int i=0;i < MAX; i++)
{
t[i] = std::thread(func);
}
for(int i=0;i < MAX; i++)
{
t[i].join();
}
}
Error :
../boost_1_56_0/include/boost/smart_ptr/shared_ptr.hpp:648: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = Container; typename boost::detail::sp_member_access<T>::type = Container*]: Assertion `px != 0' failed.
Having raw pointers doesn't crash it, so it's specific to shared_ptr.
Protecting std::sort with mutex is preventing crash.
I am not able to understand why this scenario should result into inconsistent behavior.
When more than one thread accesses the same data without synchronisation and at least one of them performs a modifying operation, it is a data race and, as such, undefined behaviour. Anything can happen.
std::sort requires mutable iterators to operate, so it is by definition a modifying operation. Applying it concurrently to overlapping ranges without synchronisation is therefore a data race (and thus UB).
There is no guarantee that a sort that ends up not moving elements will not write.
It could want to move pivots around, sort some stuff backwards in an intermediate stage, or even call swap(a,a) without a self-check optimization (as the check might be more expensive than the swap).
In any case, an operation that is UB unless it happens to do nothing is a horrible operation to invoke.
Here is a sort guaranteed to do nothing if nothing is to be done:
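#include <algorithm>
#include <iterator>
template<class C, class Cmp>
void my_sort( C& c, Cmp cmp ) {
    using std::begin; using std::end;
    // is_sorted only reads, so an already sorted range is never written to
    if (std::is_sorted( begin(c), end(c), cmp ))
        return;
    std::sort( begin(c), end(c), cmp );
}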
but I wouldn't use it.
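The question notes that protecting std::sort with a mutex prevents the crash. A minimal sketch of that approach follows, assuming std::shared_ptr in place of boost::shared_ptr. The names Item, SharedItems and sort_from_threads are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct Item { int priority; };
using ItemPtr = std::shared_ptr<Item>;

struct ByPriorityDesc {
    bool operator()(const ItemPtr& a, const ItemPtr& b) const {
        return a->priority > b->priority;
    }
};

// Every access to the vector goes through the same mutex.
class SharedItems {
public:
    void add(int priority) {
        std::lock_guard<std::mutex> lock(mut_);
        items_.push_back(std::make_shared<Item>(Item{priority}));
    }
    void sort() {
        std::lock_guard<std::mutex> lock(mut_);
        std::sort(items_.begin(), items_.end(), ByPriorityDesc{});
    }
    std::vector<int> priorities() {
        std::lock_guard<std::mutex> lock(mut_);
        std::vector<int> out;
        for (const auto& p : items_) out.push_back(p->priority);
        return out;
    }
private:
    std::mutex mut_;
    std::vector<ItemPtr> items_;
};

// Several threads sort the same container repeatedly, as in the question.
inline std::vector<int> sort_from_threads(int threads, int loops) {
    SharedItems items;
    for (int p : {100, 245, 312, 423, 597, 656, 732}) items.add(p);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&items, loops] {
            for (int i = 0; i < loops; ++i) items.sort();
        });
    for (auto& th : pool) th.join();
    return items.priorities();
}
```

With the lock in place only one thread at a time can observe the intermediate states of the sort, which is what the unsynchronised version got wrong.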

passing std::string to native thread

I need to pass binary data to a native thread.
I am using std::string to hold the binary data. I came up with an idea for passing a std::string to a native thread, and I want to know whether it is safe.
#include <Windows.h>
#include <iostream>
#include <string>
using namespace std;
DWORD WINAPI MyThreadFunction(LPVOID lpParam)
{
string binaryDataInThread = string(*(string*)lpParam); // copy data to current thread
while (true)
{// do some stuff with binaryDataInThread
cout << "thread binaryData size: " << binaryDataInThread.size() << endl;
Sleep(1000);
}
return 0;
}
int main(int argc, char *argv[])
{
{
string binaryDataInMain;
for (int i = 0; i < 500; i++)
binaryDataInMain += (char)i;
CloseHandle(CreateThread(NULL, NULL, MyThreadFunction, &binaryDataInMain, NULL, NULL));
Sleep(1000); // wait for thread to copy data
} // destroy binaryDataInMain
system("pause");
ExitProcess(0);
}
The size of binaryDataInThread is 500, so all the binary data was passed successfully. But is it safe?
No, your method is not safe, although it is unlikely to fail. Under odd circumstances, even the one second you wait for the thread to copy the data may not be enough.
Suggestions:
Use C++ threads.
If for some odd reason you can't use C++ threads, fix those reasons!
If you pass data to a thread that you want to avoid sharing, allocate the data dynamically using new. Of course, the thread will then be responsible for releasing the data using delete!
If you want to share data with a thread, synchronize access to it using the various synchronization primitives available (mutexes, events etc.).
As an alternative, use atomic operations on primitive types. However, lock-free programming is hard and correct lock-free programming is even harder. This is not for beginners!
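As a minimal sketch of the first suggestion, the data can be handed to a std::thread by value, so the new thread owns its own copy and no sleep is needed. The names worker and pass_by_value are hypothetical, and the promise/future pair is only there to report back what the thread received.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <thread>
#include <utility>

// The worker owns `data`; `done` reports the size it received.
inline void worker(std::string data, std::promise<std::size_t> done) {
    done.set_value(data.size());
}

inline std::size_t pass_by_value() {
    std::promise<std::size_t> done;
    std::future<std::size_t> result = done.get_future();
    std::thread t;
    {
        std::string binary;
        for (int i = 0; i < 500; ++i)
            binary += static_cast<char>(i);
        // std::thread stores its own copies of the arguments before the
        // constructor returns; std::move avoids duplicating the buffer.
        t = std::thread(worker, std::move(binary), std::move(done));
    }  // `binary` is destroyed here; the worker is unaffected
    std::size_t n = result.get();
    t.join();
    return n;
}
```

Unlike the CreateThread version, nothing here depends on timing: the string passed to the thread never refers to the local variable in the enclosing scope.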

How to join a thread in Linux kernel?

The main question is: how can we wait for a thread in the Linux kernel to complete? I have seen a few posts about the proper way of handling threads in the Linux kernel, but I'm not sure how the main thread can wait for a single thread to complete (suppose we need thread[3] to be done before proceeding):
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/kthread.h>
#include <linux/slab.h>
void *func(void *arg) {
// doing something
return NULL;
}
int init_module(void) {
struct task_struct* thread[5];
int i;
for(i=0; i<5; i++) {
thread[i] = kthread_run(func, (void*) arg, "Creating thread");
}
return 0;
}
void cleanup_module(void) {
printk("cleaning up!\n");
}
AFAIK there is no equivalent of pthread_join() in the kernel. Also, I feel like your pattern (starting a bunch of threads and waiting for only one of them) is not really common in the kernel. That being said, the kernel does have a few synchronization mechanisms that may be used to accomplish your goal.
Note that these mechanisms do not guarantee that the thread has finished; they only let the main thread know that the thread has finished the work it was supposed to do. It may still take some time to actually stop the thread and free all its resources.
Semaphores
You can create a locked semaphore, then call down in your main thread. This will put it to sleep. Then you will up this semaphore inside of your thread just before exiting. Something like:
#include <linux/kthread.h>
#include <linux/semaphore.h>
static struct semaphore sem;
static int func(void *arg) {
    struct semaphore *sem = arg; // you could use the global instead
    // do something
    up(sem);
    return 0;
}
int init_module(void) {
    // some initialization
    sema_init(&sem, 0); // starts locked; init_MUTEX_LOCKED() no longer exists in newer kernels
    kthread_run(func, &sem, "Creating thread");
    down(&sem); // this will block until the thread runs up()
    return 0;
}
This should work, but it is not the optimal solution. I mention it because it's a known pattern that is also used in userspace. Kernel semaphores are optimized for the case where the semaphore is usually available, whereas here the waiter almost always has to block. A similar mechanism optimized for this case was therefore created.
Completions
You can declare completions using:
struct completion comp;
init_completion(&comp);
or:
DECLARE_COMPLETION(comp);
Then you can use wait_for_completion(&comp); instead of down() to wait in main thread and complete(&comp); instead of up() in your thread.
Here's the full example:
#include <linux/completion.h>
#include <linux/errno.h>
#include <linux/kthread.h>
#include <linux/slab.h>
#define N 5 // number of threads, as in the question
static DECLARE_COMPLETION(comp);
struct my_data {
    int id;
    struct completion *comp;
};
static int func(void *arg) {
    struct my_data *data = arg;
    // doing something
    if (data->id == 3)
        complete(data->comp);
    return 0;
}
int init_module(void) {
    struct my_data *data;
    int i;
    data = kmalloc(sizeof(*data) * N, GFP_KERNEL);
    if (!data)
        return -ENOMEM;
    // some initialization
    for (i = 0; i < N; i++) {
        data[i].comp = &comp;
        data[i].id = i;
        kthread_run(func, &data[i], "my_thread%d", i);
    }
    wait_for_completion(&comp); // this will block until thread 3 runs complete()
    // data must stay allocated while the threads use it; freeing is omitted here
    return 0;
}
Multiple threads
I don't really see why you would start 5 identical threads and only want to wait for the 3rd one, but of course you could send different data to each thread, with a field describing its id, and then call up or complete only if this id equals 3. That's shown in the completion example. There are other ways to do this; this is just one of them.
Word of caution
Go read some more about those mechanisms before using any of them. There are some important details I did not write about here. Also those examples are simplified and not tested, they are here just to show the overall idea.
kthread_stop() is the kernel's way of waiting for a thread to end.
Aside from waiting, kthread_stop() also sets the should_stop flag for the awaited thread and wakes it up if needed. It is useful for threads that repeat some action indefinitely.
As for single-shot tasks, it is usually simpler to use work items (workqueues) for them instead of kthreads.
EDIT:
Note: kthread_stop() may be called only while the thread's task_struct has not been freed.
Either the thread function should return only after kthread_should_stop() has returned true, or get_task_struct() should be called before the thread is started (and put_task_struct() should be called after kthread_stop()).

Looking for an optimum multithread message queue

I want to run several threads inside a process. I'm looking for the most efficient way of being able to pass messages between the threads.
Each thread would have a shared memory input message buffer. Other threads would write the appropriate buffer.
Messages would have priority. I want to manage this process myself.
Without getting into expensive locking or synchronizing, what's the best way to do this? Or is there already a well proven library available for this? (Delphi, C, or C# is fine).
This is hard to get right without repeating a lot of mistakes other people already made for you :)
Take a look at Intel Threading Building Blocks - the library has several well-designed queue templates (and other collections) that you can test and see which suits your purpose best.
If you are going to work with multiple threads, it is hard to avoid synchronisation. Fortunately it is not very hard.
For a single process, a Critical Section is frequently the best choice. It is fast and easy to use. For simplicity, I normally wrap it in a class to handle initialisation and cleanup.
#include <Windows.h>
class CTkCritSec
{
public:
CTkCritSec(void)
{
::InitializeCriticalSection(&m_critSec);
}
~CTkCritSec(void)
{
::DeleteCriticalSection(&m_critSec);
}
void Lock()
{
::EnterCriticalSection(&m_critSec);
}
void Unlock()
{
::LeaveCriticalSection(&m_critSec);
}
private:
CRITICAL_SECTION m_critSec;
};
You can make it even simpler with an "autolock" class that locks and unlocks it for you.
class CTkAutoLock
{
public:
CTkAutoLock(CTkCritSec &lock)
: m_lock(lock)
{
m_lock.Lock();
}
virtual ~CTkAutoLock()
{
m_lock.Unlock();
}
private:
CTkCritSec &m_lock;
};
Anywhere you want to lock something, instantiate an autolock. When the function finishes, it will unlock. Also, if there is an exception, it will automatically unlock (giving exception safety).
Now you can make a simple message queue out of an std priority queue
#include <queue>
#include <deque>
#include <functional>
#include <string>
struct CMsg
{
    CMsg(const std::string &s, int n=1)
        : nPriority(n), sText(s) // same order as the member declarations
    {
    }
    int nPriority;
    std::string sText;
    // Orders messages so that the highest priority ends up on top of the queue.
    // (std::binary_function is not needed here; it was removed in C++17.)
    struct Compare
    {
        bool operator () (const CMsg *p0, const CMsg *p1) const
        {
            return p0->nPriority < p1->nPriority;
        }
    };
};
class CMsgQueue :
private std::priority_queue<CMsg *, std::deque<CMsg *>, CMsg::Compare >
{
public:
void Push(CMsg *pJob)
{
CTkAutoLock lk(m_critSec);
push(pJob);
}
CMsg *Pop()
{
CTkAutoLock lk(m_critSec);
CMsg *pJob(NULL);
if (!empty()) // the lock is already held, so query the base class directly
{
pJob = top();
pop();
}
return pJob;
}
bool Empty()
{
CTkAutoLock lk(m_critSec);
return empty();
}
private:
CTkCritSec m_critSec;
};
The content of CMsg can be anything you like. Note that CMsgQueue inherits privately from std::priority_queue. That prevents raw access to the queue without going through our (synchronised) methods.
Assign a queue like this to each thread and you are on your way.
Disclaimer The code here was slapped together quickly to illustrate a point. It probably has errors and needs review and testing before being used in production.
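For comparison, the same idea can be sketched portably with std::mutex and std::lock_guard in place of the critical-section wrappers. The names Message, LowerPriority and MessageQueue are hypothetical, and storing messages by value instead of by pointer is a simplification made for this sketch.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Message {
    int priority;
    std::string text;
};

// priority_queue keeps the "largest" element on top, so comparing by
// priority puts the highest-priority message first.
struct LowerPriority {
    bool operator()(const Message& a, const Message& b) const {
        return a.priority < b.priority;
    }
};

class MessageQueue {
public:
    void push(Message m) {
        std::lock_guard<std::mutex> lock(mut_);
        queue_.push(std::move(m));
    }
    // Returns false if the queue is empty; otherwise copies the
    // highest-priority message into `out` and removes it.
    bool try_pop(Message& out) {
        std::lock_guard<std::mutex> lock(mut_);
        if (queue_.empty())
            return false;
        out = queue_.top();  // top() returns a const reference
        queue_.pop();
        return true;
    }
private:
    std::mutex mut_;
    std::priority_queue<Message, std::vector<Message>, LowerPriority> queue_;
};
```

Combining the emptiness check and the removal in one try_pop call means the caller never has to act on a separate Empty() result that may already be stale.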

Resources