Thread Communication

Is there any tool available for tracing communication among threads:
1. running in a single process
2. running in different processes (IPC)

I am presuming you need to trace this for debugging. Under normal circumstances it's hard to do without custom-written code. For a similar problem that I faced, I had a per-processor tracing buffer, which briefly recorded the time and the interesting operation performed by the running thread. The log was a circular trace that stored records like this:
struct trace_data {
    int op;
    void *data;
    struct time t;
    union {
        struct {
            int op1_field1;
            int op1_field2;
        } d1;
        struct {
            int op2_field1;
            int op2_field2;
        } d2;
    } u;
};
The trace log was an array of these structures, 1024 long, one for each processor. Each thread traced its operations as well as the time, to determine the causality of events. Which fields of the "union" were used to store data depended upon the operation being performed. The meaning of the "data" pointer depended upon the "op" as well. When the program crashed, I'd open the core in gdb, and I had a gdb script which would go through the logs on each processor and print out the ops and their corresponding data, to reconstruct the history of events.
For different processes you could do such logging to a file instead, one per process. This example is in C, but you can do this in whatever language you want to use, as long as you can figure out the CPU id on which the thread is currently running.
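As a rough sketch of the recording side, building on the trace_data layout above (sched_getcpu() is Linux-specific; NCPUS and read_time() are hypothetical stand-ins):
#include <sched.h>   /* sched_getcpu(), Linux-specific */

#define NCPUS     64     /* upper bound on processors, illustrative */
#define TRACE_LEN 1024

struct trace_data trace_log[NCPUS][TRACE_LEN]; /* one ring per processor */
static int next_slot[NCPUS];                   /* write cursor per ring */

void trace_op(int op, void *data)
{
    int cpu = sched_getcpu();   /* ring of the CPU we are running on */
    struct trace_data *e = &trace_log[cpu][next_slot[cpu]];
    next_slot[cpu] = (next_slot[cpu] + 1) % TRACE_LEN; /* circular wrap */
    e->op = op;
    e->data = data;
    e->t = read_time();         /* hypothetical timestamp helper */
    /* Note: preemption between sched_getcpu() and the stores can race;
       the original design relied on recording being per-processor. */
}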

You might be looking for something like the Intel Thread Checker as long as you're using pthreads in (1).
For communication between different processes (2), you can use Aspect-Oriented Programming (AOP) if you have the source code, or write your own wrapper for the IPC functions and LD_PRELOAD it.
Edit: Whoops, you said tracing, not checking.
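For the LD_PRELOAD route in (2), a minimal interposer might look like this; it wraps write() purely as a stand-in for whatever IPC call you want to trace:
// trace_write.cpp
// Build: g++ -shared -fPIC trace_write.cpp -o trace_write.so -ldl
// Run:   LD_PRELOAD=./trace_write.so ./your_program
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

extern "C" ssize_t write(int fd, const void *buf, size_t count)
{
    // Look up the real write() the first time through.
    static ssize_t (*real_write)(int, const void *, size_t) =
        (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");
    if (fd != 2) // avoid recursing on our own stderr logging
        fprintf(stderr, "[trace] write(fd=%d, %zu bytes)\n", fd, count);
    return real_write(fd, buf, count);
}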

It will depend very much on the operating system and development environment that you are using. If you're using Visual Studio, look at the tools in Visual Studio 2010.


Process vs Thread: Looking for best explanation with example (C#)

Apologies for posting this question here; I read a few similar threads, but things are still not clear.
As we know, both processes and threads are independent sequences of execution. The typical difference is that threads (of the same process) run in a shared memory space, while processes run in separate memory spaces. (quoted from this answer)
The above explanation is not enough to visualize what is actually going on. It would help if someone explained what a process is, with an example, and how it differs from a thread, with an example.
Suppose I start MS Paint or an accounting program. Can we say that the accounting program is a process? I guess not. An accounting app may have multiple processes, and each process can start multiple threads.
I want to visualize which part can be called a process when we run an application. So please explain, with examples, how to picture this, and also how a process and a thread are not the same. Thanks.
Suppose I start MS Paint or an accounting program. Can we say that the accounting program is a process?
Yes. Or rather, the currently running instance of it is.
I guess not. An accounting app may have multiple processes, and each process can start multiple threads.
It is possible for a process to start another process, but it is relatively unusual with windowed software.
The process is a given executable; a windowed application, a console application and a background application would each involve a running process.
Process refers to the space in which the application runs. With a simple process like NotePad if you open it twice so that you have two NotePad windows open, you have two NotePad processes. (This is also true of more complicated processes, but note that some do their own work to keep things down to one, so e.g. if you have Firefox open and you run Firefox again there will briefly be two Firefox processes but the second one will tell the first to open a new window before exiting and the number of processes returns to one; having a single process makes communication within that application simpler for reasons we'll get to now).
Now each process will have to have at least one thread. This thread contains information about just what it is trying to do (typically in a stack, though that is certainly not the only possible approach). Consider this simple C# program:
using System;

class Program
{
    static int DoAdd(int a, int b)
    {
        return a + b;
    }

    static void Main()
    {
        int x = 2;
        int y = 3;
        int z = DoAdd(x, y);
        Console.WriteLine(z);
    }
}
With this simple program, first 2 and 3 are stored in places in the stack (corresponding with the labels x and y). Then they are pushed onto the stack again and the thread moves to DoAdd. In DoAdd these are popped and added, and the result pushed to the stack. Then this is stored in the stack (corresponding with the label z). Then that is pushed again and the thread moves to Console.WriteLine. That does its thing and the thread moves back to Main. Then it leaves and the thread dies. As it is the only foreground thread running, its death leads to the process also ending.
(I'm simplifying here, and I don't think there's a need to nitpick all of those simplifications right now; I'm just presenting a reasonable mental model).
There can be more than one thread. For example:
using System;
using System.Threading;

class Program
{
    static int DoAdd(int a, int b)
    {
        return a + b;
    }

    static void PrintTwoMore(object num)
    {
        Thread.Sleep(new Random().Next(0, 500));
        Console.WriteLine(DoAdd(2, (int)num));
    }

    static void Main()
    {
        for (int i = 0; i != 10; ++i)
            new Thread(PrintTwoMore).Start(i);
    }
}
Here the first thread creates ten more threads. Each of these pause for a different length of time (just to demonstrate that they are independent) and then do a similar task to the first example's only thread.
The first thread dies upon creating the 10th new thread and setting it going. The last of these 10 threads to be running will be the last foreground thread and so when it dies so does the process.
Each of these threads can "see" the same methods and can "see" any data that is stored in the application, though there are limits on how likely they are to stamp over each other, which I won't get into now.
A process can also start a new process and communicate with it. This is very common in command-line programs, but less so in windowed programs. In the case of windowed programs it's also more common on *nix than on Windows.
One example of this would be when Geany does a find-in-directory operation. Geany doesn't have its own find-in-directory functionality but rather runs the program grep and then interprets the results. So we start with one process (Geany) with its own threads running then one of those threads causes the grep program to run, which means we've also got a grep process running with its threads. Geany's threads and grep's threads cannot communicate to each other as easily as threads in the same process can, but when grep outputs results the thread in Geany can read that output and use that to display those results.

How do I Yield() to another thread in a Win8 C++/Xaml app?

Note: I'm using C++, not C#.
I have a bit of code that does some computation, and several bits of code that use the result. The bits that use the result are already in tasks, but the original computation is not -- it's actually in the callstack of the main thread's App::App() initialization.
Back in the olden days, I'd use:
while (!computationIsFinished())
    std::this_thread::yield(); // or the like, depending on API
Yet this doesn't seem to exist for Windows Store apps (aka WinRT, pka Metro-style). I can't use a continuation because the bits that use the results are unconnected to where the original computation takes place -- in addition to that computation not being a task anyway.
Searching found Concurrency::Context::Yield(), but Context appears not to exist for Windows Store apps.
So... say I'm in a task on the background thread. How do I yield? Especially, how do I yield in a while loop?
First of all, doing expensive computations in a constructor is not usually a good idea. Even less so when it's the "App" class. Also, doing heavy work in the main (ASTA) thread is pretty much forbidden in the WinRT model.
You can use concurrency::task_completion_event<T> to interface code that isn't task-oriented with other pieces of dependent work.
E.g. in the long serial piece of code:
...
task_completion_event<ComputationResult> tce;
task<ComputationResult> computationTask(tce);
// This task is now tied to the completion event.
// Pass it along to interested parties.
try
{
    auto result = DoExpensiveComputations();
    // Successfully complete the task.
    tce.set(result);
}
catch(...)
{
    // On failure, propagate the exception to continuations.
    tce.set_exception(std::current_exception());
}
...
Should work well, but again, I recommend breaking out the computation into a task of its own, and would probably start by not doing it during construction... surely an anti-pattern for a responsive UI. :)
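On the consuming side, the unconnected bits of code can attach continuations to computationTask once they get hold of it. A minimal sketch (UseResult is a hypothetical stand-in):
// Somewhere in the code that depends on the result:
computationTask.then([](ComputationResult result)
{
    // Runs once tce.set() fires; an exception passed to
    // tce.set_exception() is rethrown when the continuation
    // observes the result.
    UseResult(result);
});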
Qt simply uses Sleep(0) in their WinRT yield implementation.

How do two or more threads share memory on the heap that they have allocated?

As the title says, how do two or more threads share memory on the heap that they have allocated? I've been thinking about it and I can't figure out how they can do it. Here is my understanding of the process, presumably I am wrong somewhere.
Any thread can add or remove a given number of bytes on the heap by making a system call which returns a pointer to this data, presumably by writing to a register which the thread can then copy to the stack.
So two threads A and B can allocate as much memory as they want. But I don't see how thread A could know where the memory that thread B has allocated is located. Nor do I know how either thread could know where the other thread's stack is located. Multi-threaded programs share the heap and, I believe, can access one another's stack but I can't figure out how.
I tried searching for this question but only found language-specific versions that abstract away the details.
Edit:
I am trying not to be language- or OS-specific, but I am using Linux and am looking at it from a low-level perspective, assembly I guess.
My interpretation of your question: How can thread A get to know a pointer to the memory B is using? How can they exchange data?
Answer: They usually start with a common pointer to a common memory area. That allows them to exchange other data including pointers to other data with each other.
Example:
Main thread allocates some shared memory and stores its location in p
Main thread starts two worker threads, passing the pointer p to them
The workers can now use p and work on the data pointed to by p
And in a real language (C#) it looks like this:
// start function ThreadProc and pass someData to it
new Thread(ThreadProc).Start(someData);
Threads usually do not access each other's stacks. Everything starts from one pointer passed to the thread procedure.
Creating a thread is an OS function. It works like this:
The application calls the OS using the standard ABI/API
The OS allocates stack memory and internal data structures
The OS "forges" the first stack frame: It sets the instruction pointer to ThreadProc and "pushes" someData onto the stack. I say "forge" because this first stack frame does not arise naturally but is created by the OS artificially.
The OS schedules the thread. ThreadProc does not know it has been set up on a fresh stack. All it knows is that someData is at the usual stack position where it would expect it.
And that is how someData arrives in ThreadProc. This is the way the first, initial data item is shared. Steps 1-3 are executed synchronously by the parent thread. Step 4 happens on the child thread.
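A minimal C++ sketch of this pattern with std::thread, which hides the OS calls described above (SharedData and worker are illustrative names):
#include <iostream>
#include <thread>
#include <vector>

struct SharedData {
    std::vector<int> values;   // lives on the heap, owned by main
};

// The thread procedure: the one pointer everything starts from.
void worker(SharedData* p)
{
    p->values.push_back(42);   // both threads reach the same heap object
}

int main()
{
    SharedData* p = new SharedData;  // main thread allocates the shared data
    std::thread t(worker, p);        // pass the pointer to the new thread
    t.join();                        // join also orders the memory accesses
    std::cout << p->values.size() << '\n';
    delete p;
}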
A really short answer from a bird's-eye view (1000 miles above):
Threads are execution paths of the same process, and the heap actually belongs to the process (and as a result is shared by the threads). Each thread just needs its own stack to function as a separate unit of work.
Threads can share memory on a heap if they both use the same heap. By default most languages/frameworks have a single default heap that code can use to allocate memory from. In unmanaged languages you generally make explicit calls to allocate heap memory, e.g. malloc in C. In managed languages heap allocation is usually automatic, and how allocation is done depends on the language, usually through the new operator; but that depends slightly on context. If you provide the OS or language context you're asking about, I might be able to provide more detail.
A thread shares with the other threads belonging to the same process its code section, data section, and other operating-system resources such as open files and signals.
The part you are missing is static memory containing static variables.
This memory is allocated when the program is started and assigned known addresses (determined at link time). All threads can access this memory without exchanging any data at runtime, because the addresses are effectively hardcoded.
A simple example might look like this:
// Global variable.
std::atomic<int> common_var;

void thread1() {
    common_var = compute_some_value();
}

void thread2() {
    do_something();
    int current_value = common_var;
    do_more();
}
And of course the global value may be a pointer, that can be used to exchange heap memory. The producer allocates some objects, the consumer takes and uses them.
// Global variables.
std::atomic<bool> produced;
SomeData* data_pointer;

void producer_thread() {
    while (true) {
        if (!produced) {
            SomeData* new_data = new SomeData();
            data_pointer = new_data;
            // Let the other thread know there is something to read.
            produced = true;
        }
    }
}

void consumer_thread() {
    while (true) {
        if (produced) {
            SomeData* my_data = data_pointer;
            data_pointer = nullptr;
            // Let the other thread know we took the data.
            produced = false;
            do_something_with(my_data);
            delete my_data;
        }
    }
}
Please note: these are not examples of good concurrent code, but they show the general idea without too much clutter.

OpenGL Rendering in a secondary thread

I'm writing a 3D model viewer application as a hobby project, and also as a test platform to try out different rendering techniques. I'm using SDL to handle window management and events, and OpenGL for the 3D rendering. The first iteration of my program was single-threaded, and ran well enough. However, I noticed that the single-threaded program caused the system to become very sluggish/laggy. My solution was to move all of the rendering code into a different thread, thereby freeing the main thread to handle events and prevent the app from becoming unresponsive.
This solution worked only intermittently: the program frequently crashed due to a changing (and, to my mind, bizarre) set of errors coming mainly from the X window system. This led me to question my initial assumption that as long as all of my OpenGL calls took place in the thread where the context was created, everything should still work out. After spending the better part of a day searching the internet for an answer, I am thoroughly stumped.
More succinctly: Is it possible to perform 3D rendering using OpenGL in a thread other than the main thread? Can I still use a cross-platform windowing library such as SDL or GLFW with this configuration? Is there a better way to do what I'm trying to do?
So far I've been developing on Linux (Ubuntu 11.04) using C++, although I am also comfortable with Java and Python if there is a solution that works better in those languages.
UPDATE: As requested, some clarifications:
When I say "The system becomes sluggish" I mean interacting with the desktop (dragging windows, interacting with the panel, etc) becomes much slower than normal. Moving my application's window takes time on the order of seconds, and other interactions are just slow enough to be annoying.
As for interference with a compositing window manager... I am using the GNOME shell that ships with Ubuntu 11.04 (staying away from Unity for now...) and I couldn't find any options to disable desktop effects such as there was in previous distributions. I assume this means I'm not using a compositing window manager...although I could be very wrong.
I believe the "X errors" are server errors due to the error messages I'm getting at the terminal. More details below.
The errors I get with the multi-threaded version of my app:
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0"
after 73 requests (73 known processed) with 0 events remaining.
X Error of failed request: BadColor (invalid Colormap parameter)
Major opcode of failed request: 79 (X_FreeColormap)
Resource id in failed request: 0x4600001
Serial number of failed request: 72
Current serial number in output stream: 73
Game: ../../src/xcb_io.c:140: dequeue_pending_request: Assertion `req == dpy->xcb->pending_requests' failed.
Aborted
I always get one of the three errors above; which one I get varies, apparently at random, which (to my eyes) would appear to confirm that my issue does in fact stem from my use of threads. Keep in mind that I'm learning as I go along, so there is a very good chance that in my ignorance I've done something rather stupid along the way.
SOLUTION: For anyone who is having a similar issue, I solved my problem by moving my call to SDL_Init(SDL_INIT_VIDEO) to the rendering thread, and locking the context initialization using a mutex. This ensures that the context is created in the thread that will be using it, and it prevents the main loop from starting before initialization tasks have finished. A simplified outline of the startup procedure:
1) Main thread initializes struct which will be shared between the two threads, and which contains a mutex.
2) Main thread spawns render thread and sleeps for a brief period (1-5ms), giving the render thread time to lock the mutex. After this pause, the main thread blocks while trying to lock the mutex.
3) Render thread locks mutex, initializes SDL's video subsystem and creates OpenGL context.
4) Render thread unlocks mutex and enters its "render loop".
5) The main thread is no longer blocked, so it locks and unlocks the mutex before finishing its initialization step.
Be sure to read the answers and comments; there is a lot of useful information there.
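For illustration, a condensed sketch of that startup sequence (using std::thread and std::mutex for brevity; SDL_SetVideoMode is the SDL 1.2 call that creates the GL context, and the short sleep mirrors the race-prone handshake described in step 2):
#include <SDL/SDL.h>
#include <chrono>
#include <mutex>
#include <thread>

std::mutex init_mutex;

void render_thread()
{
    init_mutex.lock();                           // step 3: block main until init is done
    SDL_Init(SDL_INIT_VIDEO);                    // video subsystem init in the render thread
    SDL_SetVideoMode(800, 600, 32, SDL_OPENGL);  // creates the GL context in this thread
    init_mutex.unlock();                         // step 4: release and enter the render loop
    for (;;) {
        // ... draw, SDL_GL_SwapBuffers(), ...
    }
}

int main()
{
    std::thread renderer(render_thread);         // step 2: spawn the render thread
    std::this_thread::sleep_for(std::chrono::milliseconds(5)); // let it lock first
    init_mutex.lock();                           // blocks until the context exists
    init_mutex.unlock();
    // ... event loop on the main thread ...
    renderer.join();                             // unreachable in this sketch
    return 0;
}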
As long as the OpenGL context is touched from only one thread at a time, you should not run into any problems. You said even your single threaded program made your system sluggish. Does that mean the whole system or only your own application? The worst that should happen in a single threaded OpenGL program is, that processing user inputs for that one program gets laggy but the rest of the system is not affected.
If you use some compositing window manager (Compiz, KDE4 kwin), please try out what happens if you disable all compositing effects.
When you say X errors, do you mean client-side errors or errors reported in the X server log? The latter case should not happen, because the X server must be able to cope with any kind of malformed X command stream and at most emit a warning. If it (the X server) crashes, this is a bug and should be reported to X.org.
If your program crashes, then there's something wrong in its interaction with X; in that case please provide us with the error output in its variations.
What I did in a similar situation was to keep my OpenGL calls in the main thread but move the vertex arrays preparation to a separate thread (or threads).
Basically, if you manage to separate the cpu intensive stuff from the OpenGL calls you don't have to worry about the unfortunately dubious OpenGL multithreading.
It worked out beautifully for me.
Just in case: the X server has its own sync subsystem.
Try the following while drawing:
man XInitThreads - for initialization
man XLockDisplay/XUnlockDisplay - for drawing (not sure about event processing)
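The call pattern would look roughly like this (a sketch; display stands for your Display pointer):
#include <X11/Xlib.h>

// Must be the very first Xlib call the program makes.
XInitThreads();

// Later, around drawing done from a non-main thread:
XLockDisplay(display);   // serialize our access to the X connection
// ... GLX / rendering calls that touch the display ...
XUnlockDisplay(display);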
I was getting one of your errors:
../../src/xcb_io.c:140: dequeue_pending_request: Assertion `req ==
dpy->xcb->pending_requests' failed. Aborted
and a whole host of different ones as well. It turns out that SDL_PollEvent needs a pointer to initialized memory. So this fails:
SDL_Event *event;
SDL_PollEvent(event);
while this works:
SDL_Event event;
SDL_PollEvent(&event);
In case anyone else runs across this from google.
This is half an answer and half a question.
Rendering in SDL in a separate thread is possible. It usually works on any OS. What you need to do is make sure you make the GL context current when the render thread takes over. At the same time, before you do so, you need to release it from the main thread, e.g.:
Called from the main thread:
void Renderer::Init()
{
#ifdef _WIN32
    m_CurrentContext = wglGetCurrentContext();
    m_CurrentDC = wglGetCurrentDC();
    // release current context
    wglMakeCurrent( nullptr, nullptr );
#endif
#ifdef __linux__
    if (!XInitThreads())
    {
        THROW( "XLib is not thread safe." );
    }
    SDL_SysWMinfo wm_info;
    SDL_VERSION( &wm_info.version );
    if ( SDL_GetWMInfo( &wm_info ) ) {
        Display *display = wm_info.info.x11.gfxdisplay;
        m_CurrentContext = glXGetCurrentContext();
        ASSERT( m_CurrentContext, "Error! No current GL context!" );
        glXMakeCurrent( display, None, nullptr );
        XSync( display, false );
    }
#endif
}
Called from the render thread:
void Renderer::InitGL()
{
    // This is important! Our renderer runs its own render thread.
    // All OpenGL calls must be made from this thread from here on.
#ifdef _WIN32
    wglMakeCurrent(m_CurrentDC, m_CurrentContext);
#endif
#ifdef __linux__
    SDL_SysWMinfo wm_info;
    SDL_VERSION( &wm_info.version );
    if ( SDL_GetWMInfo( &wm_info ) ) {
        Display *display = wm_info.info.x11.gfxdisplay;
        Window window = wm_info.info.x11.window;
        glXMakeCurrent( display, window, m_CurrentContext );
        XSync( display, false );
    }
#endif
    // Init GLEW - we need this to use OGL extensions (e.g. for VBOs)
    GLenum err = glewInit();
    ASSERT( GLEW_OK == err, "Error: %s\n", glewGetErrorString(err) );
}
The risk here is that SDL does not have a native MakeCurrent() function, unfortunately. So we have to poke around a little in SDL internals (1.2; 1.3 might have solved this by now).
And one problem remains: for some reason, I run into a problem when SDL is shutting down. Maybe someone can tell me how to safely release the context when the thread terminates.
C++, SDL, OpenGL:
on the main thread:
SDL_CreateWindow( );
SDL_CreateSemaphore( );
SDL_SemWait( );
on the render thread (created via SDL_CreateThread( run, "rendererThread", (void*)this )):
SDL_GL_CreateContext( )
// initialize the rest of OpenGL and GLEW
SDL_SemPost( ) // unlock the previously created semaphore
P.S.: SDL_CreateThread( ) only takes functions as its first parameter, not methods. If you want to use a method, simulate one by making a friend function of your class; that way it has method-like access to the class while still being usable as the entry point for SDL_CreateThread( ).
P.P.S.: Inside the run( void* data ) function created for the thread, the (void*)this argument is important; to re-obtain "this" inside the function you need a line like ClassName* me = (ClassName*)data;
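A sketch of that trampoline pattern (the answer suggests a friend function; the static member function shown here achieves the same thing, and all names are illustrative):
#include <SDL.h>

class Renderer {
public:
    void Run();   // the actual render loop; uses members freely

    // SDL-compatible entry point: plain function signature, no 'this'.
    static int ThreadEntry(void* data)
    {
        Renderer* me = static_cast<Renderer*>(data); // re-obtain 'this'
        me->Run();
        return 0;
    }
};

// Usage (SDL2-style signature):
// Renderer renderer;
// SDL_Thread* t = SDL_CreateThread(Renderer::ThreadEntry, "rendererThread", &renderer);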

Is there some kind of incompatibility with Boost::thread() and Nvidia CUDA?

I'm developing a generic streaming CUDA kernel execution framework that allows parallel data copying and execution on the GPU.
Currently I'm calling the CUDA kernels within a C++ static function wrapper, so I can call the kernels from a .cpp file (not .cu), like this:
//kernels.cu:
//kernel definition
__global__ void kernelCall_kernel( dataRow* in, dataRow* out, void* additionalData )
{
    //Do something
}

//kernel handler, so I can compile this .cu and link it with the main project and call it within a .cpp file
extern "C" void kernelCall( dataRow* in, dataRow* out, void* additionalData )
{
    int blocksize = 256;
    dim3 dimBlock(blocksize);
    dim3 dimGrid(ceil(tableSize/(float)blocksize)); // tableSize defined elsewhere
    kernelCall_kernel<<<dimGrid,dimBlock>>>(in, out, additionalData);
}
If I call the handler as a normal function, the data printed is right.
//streamProcessing.cpp
//allocations and definitions of data omitted
//copy data to GPU
cudaMemcpy(data_d, data_h, tableSize, cudaMemcpyHostToDevice);
//call:
kernelCall(data_d, result_d, NULL);
//copy data back
cudaMemcpy(result_h, result_d, resultSize, cudaMemcpyDeviceToHost);
//show result:
printTable(result_h, resultSize); // this just iterates over the data and shows it
But to allow parallel copying and execution of data on the GPU, I need to create a thread, so I call it from a new boost::thread:
//allocations, definitions of data, copy data to GPU omitted
//call:
boost::thread* kernelThreadOwner = new boost::thread(kernelCall, data_d, result_d, NULL);
kernelThreadOwner->join();
//copy data back and print omitted
I just get garbage when printing the result at the end.
Currently I'm just using one thread for testing purposes, so there should not be much difference between calling it directly and creating a thread. I have no clue why calling the function directly gives the right result while creating a thread does not. Is this a problem with CUDA and Boost? Am I missing something? Thank you in advance.
The problem is that (pre CUDA 4.0) CUDA contexts are tied to the thread in which they were created. When you are using two threads, you have two contexts. The context that the main thread is allocating and reading from, and the context in which the thread that runs the kernel operates, are not the same. Memory allocations are not portable between contexts. They are effectively separate memory spaces inside the same GPU.
If you want to use threads in this way, you either need to refactor things so that only one thread "talks" to the GPU and communicates with the parent via CPU memory, or use the CUDA context migration API, which allows a context to be moved from one thread to another (via cuCtxPushCurrent and cuCtxPopCurrent). Be aware that context migration isn't free and there is latency involved, so if you plan to migrate contexts around frequently, you might find it more efficient to change to a different design which preserves context-thread affinity.
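A rough sketch of what that migration looks like with the driver API (error checking and the signaling between the two threads are omitted; the function names are illustrative):
#include <cuda.h>   // CUDA driver API

CUcontext ctx;      // shared handle; the context was created by the parent thread

void parent_release_context()
{
    // Detach the context from the calling (parent) thread.
    cuCtxPopCurrent(&ctx);
    // ... signal the worker thread that the context is available ...
}

void worker_acquire_and_run()
{
    // Attach the same context to this thread; allocations made by the
    // parent are visible here because it is the *same* context.
    cuCtxPushCurrent(ctx);
    // ... launch kernels, copy memory, etc. ...
    cuCtxPopCurrent(&ctx);  // hand the context back when done
}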
