Segmentation fault when running ccall using Threads.@threads - multithreading

I am using ccall from Julia (1.2.0) to call a C function that I wrote, inside a loop that runs on multiple cores:
Threads.@threads for i in 1:10
    ccall((:myfunction, "myclib"), (...), input[i])
end
This implementation runs fine without Threads.@threads but crashes with a segmentation fault when I use Threads.@threads, and I have no idea why. I checked all the C functions that I am using in myclib and they are all thread safe.
My question is the following: in this kind of implementation, are the C functions running on different threads independent, or are they linked somehow? For example, do they share global variables? Does the stack memory limit apply to each thread independently, or to the memory used by all threads together?
Thanks for your help,
Dylan

I solved the problem by removing all the global variables from myfunction in C. However, I don't really understand how global variables behave when a C function is called via ccall from multiple threads.
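Threads.@threads runs the loop body on several OS threads inside one process, and ccall runs the C function on whichever thread made the call. The usual C rules therefore apply: global and static variables exist once per process and are shared by all threads, while local variables live on each thread's own stack (every thread has its own stack with its own size limit). A minimal sketch of the difference, written in C++ with extern "C" functions of the kind a library like myclib might export; uses_global, uses_local and run_threads are hypothetical names, not part of the question's code:

```cpp
#include <thread>
#include <vector>

// One instance per process: every thread that calls into the library
// reads and writes this same object.
static int g_scratch = 0;

// One instance per thread: each thread gets its own private copy.
static thread_local int t_scratch = 0;

// UNSAFE if called from several threads at once: they all share g_scratch,
// so another thread may overwrite it between the store and the load (data race).
extern "C" int uses_global(int x) {
    g_scratch = x;
    return g_scratch * 2;
}

// Safe: the state is either a local (on the calling thread's stack)
// or thread_local, so no other thread can touch it.
extern "C" int uses_local(int x) {
    int scratch = x;       // local variable, private to this call
    t_scratch = scratch;   // thread-local variable, private to this thread
    return t_scratch * 2;
}

// Calls uses_local from n threads; each thread writes only its own slot.
std::vector<int> run_threads(int n) {
    std::vector<int> results(n);
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&results, i] { results[i] = uses_local(i); });
    for (auto &t : workers) t.join();
    return results;
}
```

Calling uses_global from several threads at once would be a data race, which is consistent with the crash disappearing once the globals were removed. If a C function really needs scratch state that outlives a call, thread-local storage (C11 `_Thread_local`) keeps it private to each thread.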

Related

How to pass a shared value to processes that have jit/njit functions which read and modify the shared value?

I am trying to have an integer value that is shared by a multiprocess program, where each process has a jit function that reads and modifies the value.
I came across multiprocessing.Manager().Value, which passes a shared value to each process, but numba.jit does not accept this type.
Is there any way to work around it?
import numba
import multiprocessing
@numba.jit()
def jj(o, ii):
    print(o.value)
    o.value = ii
    print(o.value)
if __name__ == '__main__':
    o = multiprocessing.Manager().Value('i', 0, lock=False)
    y1 = multiprocessing.Process(target=jj, args=(o, 10))
    y1.daemon = True
    y2 = multiprocessing.Process(target=jj, args=(o, 20))
    y2.daemon = True
    y1.start()
    y2.start()
    y1.join()
    y2.join()
You cannot modify a CPython object from an njit function, so the function will get (almost) no benefit from Numba (the only optimization Numba could apply is loop-lifting, and it cannot be used here anyway). What you are trying to achieve is not possible with multiprocessing plus njit-compiled Numba functions. Numba can be fast because it operates on native types rather than CPython types, but multiprocessing's managers operate only on CPython types. You can use Numba's very experimental objmode scope to execute pure Python inside a Numba function, but be aware that this is slow (and it currently sometimes just crashes).
Another big issue is that shared CPython objects are protected by the global interpreter lock (GIL), which basically prevents any parallel speed-up inside a process (except for IO-bound code or similar). The GIL is designed to protect the interpreter from race conditions on the internal state of objects. AFAIK, managers can transfer pure-Python objects between processes thanks to pickling (which is slow), but using lock=False is unsafe and can also cause a race condition (not at the interpreter level, thanks to the GIL).
Note that the Numba function has to be recompiled in each process, which is slow (caching can help subsequent runs, but not the first one, because of concurrent compilation in multiple processes).

Printing to the Terminal from (parallel) Threads (Common Lisp)

In one of Timmy Jose's blog posts at https://z0ltan.wordpress.com/2016/09/02/basic-concurrency-and-parallelism-in-common-lisp-part-3-concurrency-using-bordeaux-and-sbcl-threads/ he gives an example of the wrong way to print to the top level from inside a thread (using Bordeaux Threads as an example, although I am using Lparallel):
(defun print-message-top-level-wrong ()
  (bt:make-thread
   (lambda ()
     (format *standard-output* "Hello from thread!")))
  nil)
(print-message-top-level-wrong) -> NIL
The explanation is that "The same code would have run fine if we had not run it in a separate thread. What happens is that each thread has its own stack where the variables are rebound. In this case, even for *standard-output*, which being a global variable, we would assume should be available to all threads, is rebound inside each thread!"
And this is exactly what happens if the function is run in Allegro CL. However, in SBCL the function does print the intended output at the terminal. Does this mean *standard-output* is not being rebound in SBCL? In general, is there a cross-platform way to print to *standard-output* from inside a thread?
In a multithreaded situation printing to the terminal should normally be coordinated to avoid potentially printing from several streams at the same time. But there don't seem to be any functions like atomic-format or atomic-print available. Is there a straightforward way to avoid printing interference when there are multiple threads (assuming that locks/mutexes are too expensive to use for each individual printing operation)?
If you actually have a global binding (a binding in the global environment), it does work for all threads; see the documentation for bt:make-thread. Only dynamic (re-)bindings are thread-local. Implementations differ in how/when they bind those streams; sometimes the binding that is actually in effect for user programs is global, sometimes not.
I like to use some sort of queue or channel to coordinate output where necessary; I have not yet run into situations where the locking overhead was prohibitive.
Maybe you could try something with optimistic locking, but I don't know which libraries exist for that (some Lisp implementations do have CAS operations that could be used). This should be independent of the parallelism library used.
EDIT: Just found in the SBCL manual: sb-concurrency has lock-free queues and mailboxes.
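The queue idea from this answer does not depend on the language. Below is a minimal sketch in C++ rather than Common Lisp, assuming a single dedicated printer thread: workers push complete lines onto a mutex-protected queue, and only the printer writes to the stream, so output from different threads cannot interleave mid-line. OutputQueue and demo_output are hypothetical names; in SBCL, the sb-concurrency queues or mailboxes mentioned above could presumably take the place of OutputQueue.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Workers push whole lines; a single consumer pops and prints them.
class OutputQueue {
public:
    void push(std::string line) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(line));
        }
        cv_.notify_one();
    }
    // Signals that no more lines will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_one();
    }
    // Blocks until a line is available; returns false once closed and drained.
    bool pop(std::string &out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool closed_ = false;
};

// Runs n workers that each emit one line. Only the printer thread touches
// `sink` (a stand-in for std::cout), so lines never interleave.
std::string demo_output(int n) {
    OutputQueue queue;
    std::ostringstream sink;
    std::thread printer([&] {
        std::string line;
        while (queue.pop(line)) sink << line << '\n';
    });
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&queue, i] {
            queue.push("Hello from thread " + std::to_string(i));
        });
    for (auto &w : workers) w.join();
    queue.close();
    printer.join();
    return sink.str();
}
```

Each worker still takes the lock, but only for the short push; the actual I/O happens on the printer thread, so a slow terminal does not block the workers while they hold the lock.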

C++11 thread safe singleton using lambda and call_once: main function (g++, clang++, Ubuntu 14.04)

All!
I am new to C++11 and many of its features.
I am looking for a C++11 (non boost) implementation of a thread safe singleton, using lambda and call_once (Sorry... I have no rights to include the call_once tag in the post).
I have investigated quite a lot (I am using g++ (4.8, 5.x, 6.2), clang++3.8, Ubuntu 14.04, trying to avoid using boost), and I have found the following links:
http://www.nuonsoft.com/blog/2012/10/21/implementing-a-thread-safe-singleton-with-c11/comment-page-1/
http://silviuardelean.ro/2012/06/05/few-singleton-approaches/ (which seems to be very similar to the previous one, but it is more complete, and provides at the end its own implementation).
But I am facing problems with the implementations mentioned: either I am writing the main function incorrectly (probable) or there are mistakes in the posted code (less probable), and I get various compile and/or link errors.
The same happens with the following code, which according to the comments should compile (although it uses neither a lambda nor call_once):
How to ensure std::call_once really is only called once (in this case it compiles fine, but throws the following error at runtime):
terminate called after throwing an instance of 'std::system_error'
what(): Unknown error -1
Aborted (core dumped)
So, could you please help me with the correct way to call getInstance() in the main function, so as to get one (and only one) object, and then with how to call other functions that I might include in the Singleton (something like Singleton::getInstance()->myFx(x, y, z);)?
(Note: I have also found several references on StackOverflow that are considered "thread safe", but there are similar implementations in other StackOverflow posts and elsewhere on the Internet that are not considered "thread safe"; here are a few examples of both (these do not use a lambda)):
Thread-safe singleton in C++11
c++ singleton implementation STL thread safe
Thread safe singleton in C++
Thread safe singleton implementation in C++
Thread safe lazy construction of a singleton in C++
Finally, I would very much appreciate recommendations for the best books on these subjects. Thanks in advance!!
I just ran across this issue. In my case, I needed to add -lpthread to my compilation options.
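For the call_once-plus-lambda version the question asks about, here is a minimal sketch, assuming g++ or clang++ on Linux built with -pthread (consistent with the -lpthread fix above; without it, some older libstdc++ setups reportedly throw std::system_error from call_once, matching the "Unknown error -1" message in the question). myFx is a hypothetical member standing in for the question's example, and getInstance returns a reference rather than a pointer, so the call becomes Singleton::getInstance().myFx(x, y, z).

```cpp
#include <memory>
#include <mutex>

// Lazily created singleton guarded by std::call_once and a lambda.
// Build with: g++ -std=c++11 -pthread ...
class Singleton {
public:
    static Singleton &getInstance() {
        // The lambda runs exactly once, even if several threads race here.
        std::call_once(flag_, [] { instance_.reset(new Singleton); });
        return *instance_;
    }

    // Hypothetical member function, standing in for the question's myFx.
    int myFx(int x, int y, int z) const { return x + y + z; }

    Singleton(const Singleton &) = delete;
    Singleton &operator=(const Singleton &) = delete;

private:
    Singleton() = default;
    static std::once_flag flag_;
    static std::unique_ptr<Singleton> instance_;
};

// Static member definitions (in a .cpp file if the class lives in a header).
std::once_flag Singleton::flag_;
std::unique_ptr<Singleton> Singleton::instance_;
```

In main, Singleton::getInstance().myFx(1, 2, 3) creates the object on first use and returns the same instance on every later call, from any thread.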
Implementing a singleton with a static variable, as suggested e.g. by Thread safe singleton implementation in C++, is thread safe with C++11. Since C++11 the initialization of a block-scope static variable is defined to happen on exactly one thread, and no other thread will proceed until that initialization is complete. (I can also back that up with problems we recently had on an embedded platform: we used call_once to implement a singleton, and it worked after we returned to the "classic" singleton implementation with the static variable.)
ISO/IEC 14882:2011 defines in §3.6.2 e. g. that
Static initialization shall be performed before any dynamic initialization takes place.
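and as part of §6.7:
The zero-initialization (8.5) of all block-scope variables with static
storage duration (3.7.1) or thread storage duration (3.7.2) is
performed before any other initialization takes place.
The same section also contains the sentence that actually gives the thread-safety guarantee:
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.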
(See also this answer)
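The "classic" function-local static singleton recommended here can be sketched as follows; MeyersSingleton and all_threads_share_instance are hypothetical names. Only the construction is guaranteed thread safe by the rule quoted above; member functions that modify shared state would still need their own synchronization.

```cpp
#include <thread>
#include <vector>

class MeyersSingleton {
public:
    static MeyersSingleton &getInstance() {
        // Block-scope static: since C++11 it is initialized exactly once,
        // and concurrent callers wait until initialization has finished.
        static MeyersSingleton instance;
        return instance;
    }
    MeyersSingleton(const MeyersSingleton &) = delete;
    MeyersSingleton &operator=(const MeyersSingleton &) = delete;

private:
    MeyersSingleton() = default;
};

// Calls getInstance() from n threads and checks they all got the same object.
bool all_threads_share_instance(int n) {
    std::vector<const MeyersSingleton *> seen(n, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = &MeyersSingleton::getInstance(); });
    for (auto &t : threads) t.join();
    for (const MeyersSingleton *p : seen)
        if (p != seen[0]) return false;
    return true;
}
```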
A very good book I can recommend is "C++ Concurrency in Action" by A. Williams. (As part of Chapter 3 call_once and the Singleton pattern is discussed - that is why I know that the "classic Singleton" is thread safe since C++11.)

concurrence problems in c++11

Recently I have learned about the multithreading library in C++11. Consider this situation: there is a global variable int x=0, and two separate threads run on two separate cores. Can the two threads write to the memory of x simultaneously? For example, if thread #1 sets x=0x0000 and thread #2 sets x=0xffff, could x end up with an invalid value such as 0x00ff?
I have tested it on x86-64 Linux (and Windows) with g++, clang and MSVC, and the answer is no: the value of x is always 0x0000 or 0xffff. Is the assignment atomic, or is this just a coincidence?
Can someone help me about this?
Theoretically speaking, you absolutely can end up with 0x00ff, or even 0xabcd. If two threads access the same non-atomic object, at least one of them modifies it, and the accesses are not synchronized, that is a data race and the behavior of the program is undefined.
Whether this can happen in practice depends on the compiler, the OS and the hardware architecture; the probability is low, but it can still happen.
Use std::atomic<int> instead of int
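A minimal sketch of that suggestion, assuming two threads that each store one of the question's values: with std::atomic<int> every store and load is indivisible, so the final value is one of the two values actually written and never a mix of their bytes. race_two_writers is a hypothetical name.

```cpp
#include <atomic>
#include <thread>

// Shared variable; with a plain int this program would contain a data race.
std::atomic<int> x{0};

int race_two_writers() {
    std::thread t1([] { x.store(0x0000); });
    std::thread t2([] { x.store(0xffff); });
    t1.join();
    t2.join();
    // Either 0x0000 or 0xffff, depending on which store happened last.
    return x.load();
}
```

Which of the two values wins is still unspecified; atomicity only rules out torn values, not the race over ordering.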

Multithreading (pthreads)

I'm working on a project where I need to make a program run on multiple threads. However, I'm running into a bit of an issue.
In my program, I have a helper function called 'func_call'.
If I use this in my code:
func_call((void*) &my_pixels);
The program runs fine.
However, if I try to create a thread, and then run the function on that, the program runs into a segmentation fault.
pthread_t thread;
pthread_create (&thread, NULL, (void*)&func_call, (void*) &my_pixels);
I've included pthread.h in my program. Any ideas what might be wrong?
You are not handling data in a thread safe manner:
the thread copies data from the thread argument, which is a pointer to the main thread's my_pixels variable; the main thread may exit, making my_pixels invalid.
the thread uses scene, main thread calls free_scene() on it, which I imagine makes it invalid
the thread calls printf(), the main thread closes stdout (kind of unusual itself)
the thread updates the picture array, the main thread accesses picture to output data from it
It looks like you should just wait for the thread to finish its work after creating it - call pthread_join() to do that.
For a single thread, that would seem to be pointless (you've just turned a multi-threaded program into a single threaded program). But on the basis of code that's commented out, it looks like you're planning to start up several threads that work on chunks of the data. So, when you get to the point of trying that again, make sure you join all the threads you start. As long as the threads don't modify the same data, it'll work. Note that you'll need to use separate my_pixels instances for each thread (make an array of them, just like you did with pthreads), or some threads will likely get parameters that are intended for a different thread.
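A sketch of the multi-threaded version this answer describes, assuming the work can be split into disjoint index ranges: each thread receives its own argument struct, writes only its own range, and the main thread joins every started thread before reading the result. It is written in C++ to match the other sketches here, but the pthread_create/pthread_join calls are the same in C; ChunkArgs, render_chunk and render are hypothetical, and pixels[i] = i * i stands in for the real per-pixel work.

```cpp
#include <pthread.h>
#include <vector>

// Per-thread argument: one instance per thread, so no thread
// reads parameters intended for another.
struct ChunkArgs {
    int *pixels;  // shared output buffer
    int begin;    // first index this thread owns
    int end;      // one past the last index this thread owns
};

// Exact signature pthread_create expects: void *(*)(void *).
static void *render_chunk(void *arg) {
    ChunkArgs *a = static_cast<ChunkArgs *>(arg);
    for (int i = a->begin; i < a->end; ++i)
        a->pixels[i] = i * i;  // ranges are disjoint: no two threads write the same element
    return nullptr;
}

// Splits `count` elements among `nthreads` threads and waits for all of them.
std::vector<int> render(int count, int nthreads) {
    std::vector<int> pixels(count, 0);
    std::vector<ChunkArgs> args(nthreads);  // must outlive the threads
    std::vector<pthread_t> threads(nthreads);
    int started = 0;
    for (int t = 0; t < nthreads; ++t) {
        args[t] = ChunkArgs{pixels.data(), count * t / nthreads, count * (t + 1) / nthreads};
        if (pthread_create(&threads[t], nullptr, render_chunk, &args[t]) != 0)
            break;  // stop creating threads; only join the ones that started
        ++started;
    }
    for (int t = 0; t < started; ++t)
        pthread_join(threads[t], nullptr);  // read `pixels` only after this
    return pixels;
}
```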
Without knowing what func_call does, it is difficult to give you an answer. Nevertheless, here are a few possibilities:
Does func_call use some sort of global state? If so, check that it is initialized properly when accessed from within the thread. The order of execution of threads is not the same on every run.
Not knowing your operating system (AIX/Linux/Solaris etc.), it is difficult to answer this, but please check your compilation options.
Please provide the signal trapped and at least a few lines of the stack trace for all the threads. One thing you can check yourself is to print the threads' stack traces (using threads/thread or pthread and thread current <x>, depending on the debugger) and see whether there is common data being accessed. Most likely the segfault occurred when one thread tried to read another thread's (uncommitted) change.
Hope that helps.
Edit:
After checking your code, I think the problem is the global picture array. You seem to be modifying that in the thread function without any guards. You loop using px and py and all the threads will have the same px and py and will try to write into the picture array at the same time. Please try to modify your code to prevent multiple threads from stepping on each other's data modifications.
Is func_call a function, or a function pointer? If it's a function pointer, there is your problem: you took the address of a function pointer and then cast it.
People are guessing because you've provided only a fraction of the program, which mentions names like func_call with no declaration in scope.
Your compiler must be giving you diagnostics about this program, because you're passing a (void *) expression to a function pointer parameter.
Define your thread function in a way that is compatible with pthread_create, and then just call it without any casts.
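A minimal sketch of that advice, assuming a hypothetical Pixels struct in place of the question's my_pixels: func_call takes void * and returns void *, so it can be passed to pthread_create directly, and the object pointer converts to void * without a cast. Written in C++ like the other sketches here; in C the static_cast would simply be an implicit conversion.

```cpp
#include <pthread.h>

// Hypothetical stand-in for the question's my_pixels data.
struct Pixels {
    int count;
};

// Matches pthread_create's start_routine type, so no cast is needed.
void *func_call(void *arg) {
    Pixels *p = static_cast<Pixels *>(arg);
    p->count += 1;
    return nullptr;
}

int run_once() {
    Pixels my_pixels{41};
    pthread_t thread;
    if (pthread_create(&thread, nullptr, func_call, &my_pixels) != 0)
        return -1;
    // Wait so my_pixels stays valid until the thread has finished with it.
    pthread_join(thread, nullptr);
    return my_pixels.count;
}
```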
