A shared library is loaded through LD_PRELOAD, and the constructor of that same library calls dlopen("libc.so.6").
The problem is that dlopen takes forever. Debugging shows the following:
dlopen calls __dlopen, which calls calloc, then an unknown function ??, and at last __GI___pthread_mutex_lock.
I tried providing unlimited resources before the dlopen call, since I suspected a resource limit, but that doesn't solve the problem.
The problem only happens if LD_PRELOAD is set with the shared library mentioned above and the target application is Firefox on Linux; any other application works without problems (dlopen doesn't block)!
When does dlopen block?
When it needs a lock that is not available for some reason.
debugging shows
You need more debugging. dlopen calls calloc, which requires a malloc lock. Nothing special about that.
It must be that some other thread is holding this malloc lock, and is waiting for your LD_PRELOADed library to finish its initialization (thus creating a deadlock). You should be able to find that other thread with (gdb) thread apply all where.
It may also matter what functions you are trying to interpose in your LD_PRELOADed library.
Related
I'm using a shared library that creates worker threads during initialization. The app is linked with uClibc. When it returns from main(), it crashes in __pthread_cond_wait() or similar, in a worker thread that the shared library doesn't properly stop from its cleanup() code. The main() thread's stack at the crash is:
#0 _dl_munmap from uClibc.so
#1 _dl_fini
#2 __GI_exit
#3 __uClibc_main
Since I don't have source for the shared library, I can't fix the worker cleanup code, but my question is:
Why are threads still running (and crashing) once uClibc starts unloading shared libraries? I assume it's unloading them, given the _dl_munmap stack entry above. Is there a way to make sure all threads are paused/stopped when main() exits?
Why are threads still running
Because you (or the shared library you link against) left them running.
Is there a way to make sure all threads are paused/stopped when main() exits
Yes: you need to arrange for the threads to terminate. Without access to the shared library's source you can't really do that; your only other choice is to call _exit (which should not run any cleanup) instead of exit (or instead of returning from main).
What happens if some thread is executing some code from a .so and main thread tries to dlclose it?
I am getting a segmentation fault while unloading a shared object. Is that expected?
dlclose() calls munmap() on the memory segment that the thread is executing, and therefore reading. Any read from such unmapped memory leads to this fault, so it is expected and proper behavior.
Currently I'm trying to understand what happens when a shared library spawns a thread, which does not terminate and the shared library is then unloaded.
What happens to the thread if the parent does not wait for the thread to exit?
Does the thread die or does it remain in the running state?
If it remains running, how can the parent detect that the library is being unloaded and somehow terminate the thread?
Thanks for any help.
I assume the shared library is some plugin dynamically loaded at runtime using dlopen(3) and later explicitly unloaded using dlclose.
The dlopen and dlclose functions are internally using a reference counter and they are mmap(2)-ing (for dlopen) and munmap-ing (for dlclose) some segments inside the ELF shared object when appropriate (i.e. when the ref counter crosses the 0 border).
If a thread is running some function inside the dlclose-d shared library, the code of that function becomes munmap-ed, and as soon as you jump to (or return into) that function, you get a SIGBUS, SIGILL, or SIGSEGV signal.
So you don't want that munmap to happen; hence you could:
avoid calling dlclose; this works very well in practice (unless you have a long-running server program), because mmap mostly consumes address space for the read-only text segments of the shared objects. As my manydl.c demonstrates, you can dlopen hundreds of thousands of shared objects on a desktop without reaching serious limits.
or pass RTLD_NODELETE to dlopen asking it to never unmap the library
Alternatively, use some facility (e.g. destructor-attributed functions in the shared library) or convention (perhaps atexit(3)?) to make sure that the thread has ended before dlclose.
The shared library is loaded into the process, so a thread it spawns runs in the process address space.
The thread will keep running unless it is told to exit; when the process exits, the thread is terminated with it.
Since the shared library spawned the thread, the library should also provide a function that tells the thread to exit, so the process can call it to stop the thread before unloading the library.
When I load a shared library dynamically, for example with dlopen on linux, do I have to worry about the visibility of the loaded library between processors, or will it be automatically fenced/ensured safe?
For example, say I have this function in the loaded library:
char const * get_string()
{ return "literal"; }
In the main program, using such a string-literal pointer is safe between multiple threads, as they are all guaranteed to see its initial value. However, I'm wondering how the rules of "initial values" really apply to a loaded library (as the standard doesn't say much about it).
Say that I load the library, then immediately call the get_string function. I pass the pointer to another thread via a non-memory-sequenced atomic (relaxed in C++11 parlance). Can the other thread use this pointer safely without having to issue any load fence or other synchronization instruction?
My assumption is that it is safe. Perhaps because the new library is loaded into new pages, the other core cannot have them cached yet, and thus cannot have a stale view of them?
I would like some kind of authoritative reference as part of the answer if possible, or a technical description of how it is made thread-safe by default; or of course a refutation if it isn't thread-safe on its own.
Your question is: will dlopen() load all my library code properly before returning? Yes, it will; otherwise you'd have the same problem with only a single thread. It would be very difficult to handle if dlopen completed asynchronously and you had to sleep until it finished. It also performs various checks and initializes whatever needs initializing before you have a chance to get the function pointer you are looking for. That means that if you get that pointer, everything is there, and you can use it directly from any thread.
Now of course, you need to pass that pointer with the usual thread safety, but I assume you know how.
Please be aware that static initialization and modules don't play well together (see all the other questions on SO about that subject).
Your comment about cores is strange. Cores don't load memory; they prefetch it into their caches, but that's not a problem, just a bit slow.
I'll expand on what Basile said. I followed up with glibc and found that dlopen there does indeed use mmap. All guarantees of memory visibility come from the mmap system call; dlopen itself doesn't make any additional guarantees.
Users of mmap generally assume that it will map memory correctly across all processors at the point of its return such that visibility is not a concern. This does not appear to be an explicit guarantee, but the OS would probably be unusable without such a guarantee. There is also no known system where this doesn't work as expected.
I have GDB attached to a deadlocked application written with pthreads. There are ~10 threads that are all blocked, and I'd like to know which locks are held by which threads. This is possible in WinDbg using SOS.dll; is this possible in GDB?
On at least one flavor of Linux, the C++11 std::mutex has a member called __owner which contains the thread id of the thread that currently has the mutex locked. Using "info threads" in gdb shows the thread numbers used by gdb along with the thread ids (see the "LWP" number), allowing you to switch to that thread ("thread N") and then examine the call stack ("backtrace").
It's not GDB you should be asking about, but rather the specific pthread library and OS you are using.
The pthread library implements mutexes in cooperation with the kernel via some set of system calls. If its implementation of mutexes embeds something to tie the last thread holding the mutex into the mutex data structure, then you can use GDB to get at that information.
It is possible your kernel tracks that information. Under Mac OS X, for example, the collection of GDB scripts bundled with the kernel debugging kit, kgmacros, includes a command showallmtx that will do exactly what you want. The catch: to use it, you have to be debugging the machine's kernel at the time, which means you need to be doing the debugging using a different machine.
Of course, you might have a /dev/kmem device file, which would let you poke around in the kernel's memory and access the necessary data structure, provided you can locate it.
But this all really depends on your system - your pthread library and OS kernel - not on GDB.
You could also try creating mutexes of type PTHREAD_MUTEX_ERRORCHECK; this causes pthread_mutex_lock() to return EDEADLK instead of deadlocking. You can then break when that occurs and root around in your non-deadlocked process.
GDB could display this information, but this functionality hasn't been implemented: it requires cooperation between the debugger and the thread library, through the libthread_db library.
DBX under Solaris, at least, implements this feature correctly (both are from Sun, which helps!); look for the Locks part.