I am writing a heavily multi-threaded (>170 threads) C++11 program. Each thread logs information into one file shared by all threads. For performance reasons I want to create a log thread that writes the information to the global file via fprintf(). What I cannot figure out is how to organize the structure into which the worker threads write their information so that the log thread can then read it.
Why don't I just call sprintf() in each worker thread and hand the output buffer to the log thread? For the formatted output into the log file I use a locale in the fprintf() calls that is different from the one used in the rest of the thread. I would therefore have to switch locales constantly and lock/guard the xprintf() calls to keep the two kinds of locale-dependent output apart.
In the log thread I have one locale setting used for the whole output, while the worker threads each have their own locale version.
Another reason for the log thread is that I have to "group" the output; otherwise the information from each worker thread would not appear as one block:
Wrong:
Information A Thread #1
Information A Thread #2
Information B Thread #1
Information B Thread #2
Correct:
Information A Thread #1
Information B Thread #1
Information A Thread #2
Information B Thread #2
In order to achieve this grouping I would have to guard the output in each worker thread, which slows down thread execution.
How can I save a va_list into a structure such that it can be read by the log thread and passed on to vfprintf()?
I don't see how this could be done easily using the legacy C vprintf with va_lists. Since you want to pass things around between threads, sooner or later you will need to use the heap in some way.
Below is a solution that uses Boost.Format for the formatting and Boost.Variant for parameter passing. The example is complete and working if you concatenate the following code blocks in order. If you compile with GCC, you need to pass the -pthread linker flag. You'll also need the two Boost libraries, but they are header-only. Here are the headers we will use:
#include <condition_variable>
#include <functional>
#include <iostream>
#include <list>
#include <locale>
#include <mutex>
#include <random>
#include <string>
#include <thread>
#include <utility>
#include <vector>

#include <boost/format.hpp>
#include <boost/variant.hpp>
First, we need some mechanism to asynchronously execute some tasks, in this case printing our logging messages. Since the concept is general, I use an "abstract" base class Spooler for this. Its code is based on Herb Sutter's talk "Lock-Free Programming (or, Juggling Razor Blades)" at CppCon 2014 (part 1, part 2). I won't go into detail about this code because it is mostly scaffolding not directly related to your question, and I assume you already have this piece of functionality in place. My Spooler uses a std::list protected by a std::mutex as a task queue. It might be worthwhile to consider using a lock-free data structure instead.
class Spooler
{

private:

  bool done_ {};
  std::list<std::function<void(void)>> queue_ {};
  std::mutex mutex_ {};
  std::condition_variable condvar_ {};
  std::thread worker_ {};

public:

  Spooler() : worker_ {[this](){ work(); }}
  {
  }

  ~Spooler()
  {
    auto poison = [this](){ done_ = true; };
    this->submit(std::move(poison));
    if (this->worker_.joinable())
      this->worker_.join();
  }

protected:

  void
  submit(std::function<void(void)> task)
  {
    // This is basically a push_back but avoids potentially blocking
    // calls while in the critical section.
    decltype(this->queue_) tmp {std::move(task)};
    {
      std::unique_lock<std::mutex> lck {this->mutex_};
      this->queue_.splice(this->queue_.cend(), tmp);
    }
    this->condvar_.notify_all();
  }

private:

  void
  work()
  {
    do
      {
        std::unique_lock<std::mutex> lck {this->mutex_};
        while (this->queue_.empty())
          this->condvar_.wait(lck);
        const auto task = std::move(this->queue_.front());
        this->queue_.pop_front();
        lck.unlock();
        task();
      }
    while (!this->done_);
  }

};
From the Spooler, we now derive a Logger that (privately) inherits its asynchronous capabilities from the Spooler and adds the logging-specific functionality. It has only one function member, log, which takes as parameters a format string and zero or more arguments to format into it, passed as a std::vector of boost::variants.
Unfortunately, this limits us to a fixed set of types we can support, but that shouldn't be a big problem since the C printf doesn't support arbitrary types either. For the sake of this example, I'm only using int and double, but you can extend the list with std::string, void * pointers or whatever you need.
The log function constructs a lambda expression that creates a boost::format object, feeds it all the arguments and then writes the result to std::clog (or wherever you want the formatted message to go).
The constructor of boost::format has an overload that accepts the format string and a locale. You might be interested in this one since you have mentioned setting a custom locale. The usual constructor only takes a single argument, the format string.
Note how all formatting and outputting is done on the spooler's thread.
class Logger : Spooler
{

public:

  void
  log(const std::string& fmt,
      const std::vector<boost::variant<int, double>>& args)
  {
    auto task = [fmt, args](){
      boost::format msg {fmt, std::locale {"C"}};  // your locale here
      for (const auto& arg : args)
        msg % arg;                   // feed the next argument
      std::clog << msg << std::endl; // print the formatted message
    };
    this->submit(std::move(task));
  }

};
This is all it takes. We can now use the Logger as in the following example. It is important that all worker threads are join()ed before the Logger is destroyed, or it won't process all messages.
int
main()
{
  Logger logger {};
  std::vector<std::thread> threads {};
  std::random_device rnddev {};
  for (int i = 0; i < 4; ++i)
    {
      const auto seed = rnddev();
      auto task = [&logger, i, seed](){
        std::default_random_engine rndeng {seed};
        std::uniform_real_distribution<double> rnddist {0.0, 0.5};
        for (double p = 0.0; p < 1.0; p += rnddist(rndeng))
          logger.log("thread #%d is %6.2f %% done", {i, 100.0 * p});
        logger.log("thread #%d has completed its work", {i});
      };
      threads.emplace_back(std::move(task));
    }
  for (auto& thread : threads)
    thread.join();
}
Possible output:
thread #1 is 0.00 % done
thread #0 is 0.00 % done
thread #0 is 26.84 % done
thread #0 is 76.15 % done
thread #3 is 0.00 % done
thread #0 has completed its work
thread #3 is 34.70 % done
thread #3 is 78.92 % done
thread #3 is 91.89 % done
thread #3 has completed its work
thread #1 is 26.98 % done
thread #1 is 73.84 % done
thread #1 has completed its work
thread #2 is 0.00 % done
thread #2 is 10.17 % done
thread #2 is 29.85 % done
thread #2 is 79.03 % done
thread #2 has completed its work
Related
I have a query on timeout callbacks and GMainContext; it is really confusing to me.
Suppose I have the code below (a bit incomplete, just for demonstration). I use plain pthreads to create a thread. Within the thread I run GLib functionality and create a GMainContext (stored in l_app.context).
I then create a source to run the function check_cmd iteratively at about a one-second interval. This callback (or could we call it a thread?) checks for commands from other threads (pthreads, not shown here, that update the command status). From here onwards, there are two specific commands:
One to start a looping function
The other to end the looping function
I have thought of and tried two ways to create this function and set it to run iteratively:
To create another timeout
Using the same method as for creating check_cmd
To me, both are essentially the same method. When I tried them, plan A (as I called it) did not work, but plan B actually ran at least once. So I would like to know how to fix them...
Or maybe I should use g_source_add_child_source() instead?
In summary, my questions are:
When you create a new context and push it to become the default context, do all subsequent functions that require a main context refer to this context?
In a nutshell, how do you add new sources when a loop is already running, as in my case?
Lastly, is it okay to quit the main loop from within the callback you have created?
Here is my pseudocode
#include <glib.h>
#include <dirent.h>
#include <errno.h>
#include <pthread.h>

#define PLAN_A 0

typedef struct
{
    GMainContext *context;
    GMainLoop *loop;
} _App;

static _App l_app;

guint gID;

gboolean
time_cycle(gpointer udata)
{
    g_print("I AM THREADING");
    return TRUE;
}

gboolean
check_cmd_session(gpointer udata)
{
    while (alive) /// alive is a boolean value shared with other threads (not shown)
    {
        if (start)
        {
            /// PLAN A
            //// which context does this add to ??
#if PLAN_A
            gID = g_timeout_add_seconds(10, (GSourceFunc)time_cycle, NULL);
#else
            /// or should I use PLAN B
            GSource *source = g_timeout_source_new(1000);
            g_source_set_callback(source,
                                  (GSourceFunc)time_cycle,
                                  NULL,
                                  NULL);
            gID = g_source_attach(source, l_app.context);
#endif
        }
        else
        {
#if PLAN_A
            g_source_remove(gID);
#endif
        }
    }
    g_main_loop_quit(l_app.loop);
    return FALSE;
}

void*
liveService(Info *info)
{
    l_app.context = g_main_context_new();
    g_main_context_push_thread_default(l_app.context);
    GSource *source = g_timeout_source_new(1000);
    g_source_set_callback(source,
                          (GSourceFunc)check_cmd_session,
                          NULL,
                          NULL);
    /// make it run
    g_source_attach(source, l_app.context);
    g_main_loop_run(l_app.loop);
    pthread_exit(NULL);
}

int main()
{
    pthread_t tid[2];
    int thread_counter = 0;
    int err = pthread_create(&(tid[thread_counter]), NULL, &liveService, &info);
    if (err != 0)
    {
        printf("\n can't create live thread :[%s]", strerror(err));
    }
    else
    {
        printf("--> Thread for Live created successfully\n");
        thread_counter++;
    }

    /**** other threads are built, not shown here */

    for (int i = 0; i < 2; i++)
    {
        printf("Joining the %d threads \n", i);
        pthread_join(tid[i], NULL);
    }
    return 0;
}
In summary, my questions are:
When you create a new context and push it to become the default context, do all subsequent functions that require a main context refer to this context?
Functions that are documented as using the thread-default main context will use the GMainContext which has been most recently pushed with g_main_context_push_thread_default().
Functions that are documented as using the global default main context will not. They will use the GMainContext which is created at init time and which is associated with the main thread.
g_timeout_add_seconds() is documented as using the global default main context. So you need to go with plan B if you want the timeout source to be attached to a specific GMainContext.
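For example, plan B with correct use of the returned source ID might look like this (a sketch; l_app.context is the context created in your thread):
GSource *source = g_timeout_source_new_seconds(10);
g_source_set_callback(source, (GSourceFunc)time_cycle, NULL, NULL);
gID = g_source_attach(source, l_app.context); /* the ID comes from attach, not set_callback */
g_source_unref(source); /* the context now owns a reference */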
In a nutshell, how do you add new sources when a loop is already running, as in my case?
g_source_attach() works fine while a main context is being iterated; you can add new sources at any time.
Lastly, is it okay to quit the main loop from within the callback you have created?
Yes, g_main_loop_quit() can be called at any point.
From your code, it looks like you’re not creating a new GMainLoop for each GMainContext and are instead assuming that one GMainLoop will somehow work with all GMainContexts in the process. That’s not correct. If you’re going to use GMainLoop, you need to create a new one for each GMainContext you create.
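A sketch of the thread function with a loop created for its own context (names taken from your pseudocode):
l_app.context = g_main_context_new();
g_main_context_push_thread_default(l_app.context);
l_app.loop = g_main_loop_new(l_app.context, FALSE); /* one GMainLoop per GMainContext */
/* ... create and attach sources to l_app.context as before ... */
g_main_loop_run(l_app.loop);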
All other things aside, you might find it easier to use GLib’s threading functions rather than using pthread directly. GLib’s threading functions are portable to other platforms and a little bit easier to use. Given that you’re already linking to libglib, using them would cost nothing extra.
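For example, starting and joining the thread with GLib might look like this (a sketch; liveService would need the GThreadFunc signature, i.e. gpointer (*)(gpointer)):
GThread *live_thread = g_thread_new("live-service", (GThreadFunc)liveService, &info);
/* ... */
g_thread_join(live_thread);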
I have a question that came up while doing ray-tracing work.
I have created multiple threads that split up the image and process their allocated parts. The threads work as intended. I would like to monitor the work progress in real time.
To do this, I created one more thread to monitor the current state.
Here is the monitoring pseudo-code:
/* Global var */
int cnt = 0; // counts the number of rows processed

void* render_disp(void* arg){ // thread for monitoring current render progress
    /* read the global variable and calculate the percentage to display */
    double result = 100.*cnt/(h-1);
    fprintf(stderr, "\r%3.2f%% of image is processed!", result);
}

void* process(void* arg){ // multiple threads work here
    // Rendering process
    for(........)
    {
        pthread_mutex_lock(&lock);
        cnt++;
        pthread_mutex_unlock(&lock);
        for(........)
    }
}
I wrote the code for the initialization of the pthreads and the mutex in the main() function.
I expected this monitoring thread to keep displaying the current state, but it seems to be called only once and then quits.
How do I change this code so that the thread function keeps running until the whole rendering is finished?
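One way is to put the display logic in a loop that runs until all rows are processed. A minimal sketch (the polling interval and the h-1 completion condition are assumptions based on the formula above):
void* render_disp(void* arg){ // monitor until the whole image is processed
    int local = 0;
    do {
        pthread_mutex_lock(&lock);
        local = cnt;                 // snapshot the shared counter under the lock
        pthread_mutex_unlock(&lock);
        double result = 100. * local / (h - 1);
        fprintf(stderr, "\r%3.2f%% of image is processed!", result);
        usleep(100000);              // poll roughly every 100 ms (needs <unistd.h>)
    } while (local < h - 1);         // assumes h-1 processed rows mean "done"
    return NULL;
}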
I wrote a simple multi-threaded application in OMNeT++ that does not call any OMNeT++ API in the worker thread, and it works as expected. I know that OMNeT++ does not support multi-threaded applications by design, but I was wondering if there is any mechanism I can use to bridge between my worker thread and my code in the main simulation thread.
More specifically, I am saving some data in a vector in the worker thread and I want to signal the code in the simulation thread to consume it (a producer/consumer scenario). Is there any way to achieve this?
Do I need to design my own event scheduler?
METHOD 1
The simplest way to achieve your goal is to use a self-message in the simulation thread plus a small modification of the worker thread. The worker thread modifies a common variable (visible to both threads), and the self-message periodically checks the state of this variable.
Sample code for this idea:
// common variable; std::atomic so the cross-thread accesses are well defined
std::atomic<bool> vectorReady {false};

// worker thread
if (someCondition) {
    vectorReady = true;
}

// simulation thread
void someclass::handleMessage(cMessage *msg) {
    if (msg->isSelfMessage()) {
        if (vectorReady) {
            vectorReady = false;
            // read the vector data
        }
        scheduleAt(simTime() + somePeriod, msg);
    }
}
Where to declare the common variable depends on how you create and start the worker thread.
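For example, if the worker is a std::thread started from the module, the flag could live next to it as a member. A sketch under that assumption (the class name, lambda body and 1-second poll period are illustrative, not from the original code):
#include <atomic>
#include <thread>
#include <omnetpp.h>
using namespace omnetpp;

class Worker : public cSimpleModule {
    std::atomic<bool> vectorReady {false}; // common flag, one per module
    std::thread worker;

    virtual void initialize() override {
        // start the worker thread; it sets vectorReady when data is available
        worker = std::thread([this]() { /* produce data, then vectorReady = true; */ });
        scheduleAt(simTime() + 1.0, new cMessage("poll"));
    }
    virtual void handleMessage(cMessage *msg) override {
        if (msg->isSelfMessage()) {
            if (vectorReady.exchange(false)) {
                // consume the vector data here
            }
            scheduleAt(simTime() + 1.0, msg);
        }
    }
    virtual void finish() override {
        if (worker.joinable())
            worker.join(); // don't leak the thread at the end of the simulation
    }
};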
METHOD 2
The other way is to create your own scheduler and add a check just before every event. By default OMNeT++ uses the cSequentialScheduler scheduler. It has the method takeNextEvent(), which is called to obtain the next event. You can create a derived class and override this method, for example:
// cThreadScheduler.h
#include <omnetpp.h>

using namespace omnetpp;

class cThreadScheduler : public cSequentialScheduler {
  public:
    virtual cEvent *takeNextEvent() override;
};

// cThreadScheduler.cc
#include <atomic>
#include "cThreadScheduler.h"

extern std::atomic<bool> vectorReady; // the common variable from METHOD 1

Register_Class(cThreadScheduler);

cEvent* cThreadScheduler::takeNextEvent() {
    if (vectorReady) {
        vectorReady = false;
        // read the vector data
    }
    return cSequentialScheduler::takeNextEvent();
}
In omnetpp.ini add a line:
scheduler-class = "cThreadScheduler"
I have N threads performing various tasks, and these threads must be synchronized regularly with a thread barrier, as illustrated below with 3 threads and 8 tasks. The || indicates the temporal barrier; all threads have to wait until the completion of all 8 tasks before starting again.
Thread#1 |----task1--|---task6---|---wait-----||-taskB--| ...
Thread#2 |--task2--|---task5--|-------taskE---||----taskA--| ...
Thread#3 |-task3-|---task4--|-taskG--|--wait--||-taskC-|---taskD ...
I couldn't find a workable solution, though The Little Book of Semaphores (http://greenteapress.com/semaphores/index.html) was inspiring. I came up with the solution shown below, which "seems" to be working, using three std::atomic variables.
I am worried about my code breaking down in corner cases, hence the quoted verb. Can you share advice on how to verify such code? Do you have a simpler, foolproof version available?
std::atomic<int> barrier1(0);
std::atomic<int> barrier2(0);
std::atomic<int> barrier3(0);

void my_thread()
{
    while (1) {
        // pop task from queue
        ...
        // and execute task
        switch (task.id()) {
        case TaskID::Barrier:
            barrier2.store(0);
            barrier1++;
            while (barrier1.load() != NUM_THREAD) {
                std::this_thread::yield();
            }
            barrier3.store(0);
            barrier2++;
            while (barrier2.load() != NUM_THREAD) {
                std::this_thread::yield();
            }
            barrier1.store(0);
            barrier3++;
            while (barrier3.load() != NUM_THREAD) {
                std::this_thread::yield();
            }
            break;
        case TaskID::Task1:
            ...
        }
    }
}
Boost offers a barrier implementation as an extension to the C++11 standard thread library. If using Boost is an option, you should look no further than that.
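For illustration, a minimal sketch of the Boost barrier in your task loop (note that Boost.Thread is a compiled library, unlike the header-only parts of Boost):
#include <boost/thread/barrier.hpp>

boost::barrier bar(NUM_THREAD); // shared by all worker threads

void my_thread()
{
    while (1) {
        // pop task from queue and execute it
        switch (task.id()) {
        case TaskID::Barrier:
            bar.wait(); // blocks until NUM_THREAD threads arrive, then auto-resets
            break;
        // ...
        }
    }
}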
If you have to rely on standard library facilities, you can roll your own implementation based on std::mutex and std::condition_variable without too much of a hassle.
class Barrier {
    int wait_count;
    int const target_wait_count;
    std::mutex mtx;
    std::condition_variable cond_var;

public: // the constructor and wait() must be accessible to users of the class
    Barrier(int threads_to_wait_for)
        : wait_count(0), target_wait_count(threads_to_wait_for) {}

    void wait() {
        std::unique_lock<std::mutex> lk(mtx);
        ++wait_count;
        if (wait_count != target_wait_count) {
            // not all threads have arrived yet; go to sleep until they do
            cond_var.wait(lk,
                [this]() { return wait_count == target_wait_count; });
        } else {
            // we are the last thread to arrive; wake the others and go on
            cond_var.notify_all();
        }
        // note that if you want to reuse the barrier, you will have to
        // reset wait_count to 0 now before calling wait again
        // if you do this, be aware that the reset must be synchronized with
        // threads that are still stuck in the wait
    }
};
This implementation has the advantage over your atomics-based solution that threads waiting in condition_variable::wait should get sent to sleep by your operating system's scheduler, so you don't block CPU cores with waiting threads spinning on the barrier.
A few words on resetting the barrier: The simplest solution is to have a separate reset() method and have the user ensure that reset and wait are never invoked concurrently. But in many use cases this is not easy for the user to achieve.
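Such a reset() could be as simple as the following sketch, with the caveat that the caller must guarantee no thread is still inside wait() when it runs:
void reset() {
    std::lock_guard<std::mutex> lk(mtx);
    wait_count = 0;
}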
For a self-resetting barrier, you have to consider races on the wait count: if the wait count is reset before the last thread has returned from wait, some threads might get stuck in the barrier. A clever solution is to not have the terminating condition depend on the wait count variable itself. Instead you introduce a second counter that is only increased by the thread doing the notify. The other threads then observe that counter for changes to determine whether to exit the wait:
void wait() {
    std::unique_lock<std::mutex> lk(mtx);
    // m_inter_wait_count is an additional unsigned int member, initialized to 0
    unsigned int const current_wait_cycle = m_inter_wait_count;
    ++wait_count;
    if (wait_count != target_wait_count) {
        // wait condition must not depend on wait_count
        cond_var.wait(lk,
            [this, current_wait_cycle]() {
                return m_inter_wait_count != current_wait_cycle;
            });
    } else {
        // increasing the second counter allows waiting threads to exit
        ++m_inter_wait_count;
        cond_var.notify_all();
    }
}
This solution is correct under the (very reasonable) assumption that all threads leave the wait before the inter_wait_count overflows.
With atomic variables, using three of them for a barrier is simply overkill that only serves to complicate the issue. You know the number of threads, so you can atomically increment a single counter every time a thread enters the barrier and then spin until the counter becomes greater than or equal to N. Something like this:
void barrier(int N) {
    static std::atomic<unsigned int> gCounter {0}; // brace-init: std::atomic is not copyable
    gCounter++;
    while ((int)(gCounter - N) < 0) std::this_thread::yield();
}
If you don't have more threads than CPU cores and a short expected waiting time, you might want to remove the call to std::this_thread::yield(). This call is likely to be really expensive (more than a microsecond, I'd wager, but I haven't measured it). Depending on the size of your tasks, this may be significant.
If you want to do repeated barriers, just increment the N as you go:
unsigned int lastBarrier = 0;
while (1) {
    switch (task.id()) {
    case TaskID::Barrier:
        barrier(lastBarrier += processCount); // processCount: presumably the number of threads
        break;
    }
}
I would like to point out that in the solution given by @ComicSansMS, wait_count should be reset to 0 before executing cond_var.notify_all(). Otherwise, when the barrier is used a second time, the if condition will always fail.
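Applied to the self-resetting variant above, the else branch would become:
} else {
    wait_count = 0;          // reset before waking the others
    ++m_inter_wait_count;    // allows waiting threads to exit their predicate
    cond_var.notify_all();
}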
I have a problem understanding how the WinAPI condition variables work.
More specifically, I want a couple of threads waiting on a condition. Then I want to use the WakeAllConditionVariable() call to wake all the threads so that they can do work. Beyond the fact that I just want the threads started, there isn't any other prerequisite for them to start working (like you would have in an n-producer/n-consumer scenario).
Here's the code so far:
#define MAX_THREADS 4
CONDITION_VARIABLE start_condition;
SRWLOCK cond_rwlock;
bool wake_all;
__int64 start_times[MAX_THREADS];
Main thread:
int main()
{
    HANDLE h_threads[MAX_THREADS];
    int tc;
    for (tc = 0; tc < MAX_THREADS; tc++)
    {
        DWORD tid;
        h_threads[tc] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)thread_routine,
                                     (void*)tc, 0, &tid);
        if (h_threads[tc] == NULL)
        {
            cout << "Error while creating thread with index " << tc << endl;
            continue;
        }
    }

    InitializeSRWLock(&cond_rwlock);
    InitializeConditionVariable(&start_condition);

    AcquireSRWLockExclusive(&cond_rwlock);
    // set the flag to true, then wake all threads
    wake_all = true;
    WakeAllConditionVariable(&start_condition);
    ReleaseSRWLockExclusive(&cond_rwlock);

    WaitForMultipleObjects(tc, h_threads, TRUE, INFINITE);
    return 0;
}
And here is the code for the thread routine:
DWORD thread_routine(PVOID p_param)
{
    int t_index = (int)(p_param);

    AcquireSRWLockShared(&cond_rwlock);
    // main thread sets wake_all to true and calls WakeAllConditionVariable()
    // so this thread should start doing the work (?)
    while (!wake_all)
        SleepConditionVariableSRW(&start_condition, &cond_rwlock, INFINITE,
                                  CONDITION_VARIABLE_LOCKMODE_SHARED);

    QueryPerformanceCounter((LARGE_INTEGER*)&start_times[t_index]);

    // do the actual thread related work here

    return 0;
}
This code does not do what I would expect it to do. Sometimes just one thread finishes the job, sometimes two or three, but never all of them. The main function never gets past the WaitForMultipleObjects() call.
I'm not exactly sure what I've done wrong, but I would assume a synchronization issue somewhere?
Any help would be appreciated. (Sorry if I re-posted an older topic in different dressing.)
You initialize the cond_rwlock and start_condition variables too late. Move that code up, before you start the threads. A thread is likely to start running right away, especially on a multi-core machine.
Also, test the return values of API functions. You don't know why it doesn't work because you never check for failure.
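A sketch of main() with the fixed ordering (the thread-creation loop is elided here, as in the original):
int main()
{
    // initialize synchronization objects before any thread can touch them
    InitializeSRWLock(&cond_rwlock);
    InitializeConditionVariable(&start_condition);
    wake_all = false;

    HANDLE h_threads[MAX_THREADS];
    int tc;
    for (tc = 0; tc < MAX_THREADS; tc++)
    {
        // CreateThread as before, checking the returned handle
    }

    AcquireSRWLockExclusive(&cond_rwlock);
    wake_all = true;
    WakeAllConditionVariable(&start_condition);
    ReleaseSRWLockExclusive(&cond_rwlock);

    WaitForMultipleObjects(tc, h_threads, TRUE, INFINITE);
    return 0;
}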