uClibc shared libs unloading during exit() - linux

I'm using a shared library that creates worker threads during initialization. The app is linked with uClibc. When it returns from main() it crashes in __pthread_cond_wait() or similar from a worker thread that the shared lib doesn't properly stop from its cleanup() code. The main() thread stack when it crashes is:
#0 _dl_munmap from uClibc.so
#1 _dl_fini
#2 __GI_exit
#3 __uClibc_main
Since I don't have source for the shared library I can't fix the worker cleanup code, but my question is:
Why are threads still running (crashing) once uClibc starts unloading shared libs? I assume the unloading is what the _dl_munmap stack entry above shows. Is there a way to make sure all threads are paused/stopped when main() exits?

Why are threads still running
Because you (or the shared library you link against) left them running.
Is there a way to make sure all threads are paused/stopped when main() exits
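Yes: you need to arrange for the threads to terminate. Without access to the shared library source, you can't really do that; your only other choice is to call _exit instead of exit (or instead of returning from main). _exit terminates the process without running any exit-time cleanup, so the _dl_fini step seen in your stack trace never runs. Note that it also skips flushing stdio buffers, so flush any output you care about first.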
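A minimal sketch of the _exit approach, assuming a POSIX system. The names cleanup_handler and exit_status_when_skipping_cleanup are hypothetical, and the atexit handler only stands in for the exit-time cleanup that crashes in the question. The demo runs in a forked child so the caller can observe the exit status.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for exit-time cleanup (hypothetical). If it ever ran,
 * the child would report status 99 instead of 7. */
static void cleanup_handler(void)
{
    _exit(99);
}

/* Forks a child that registers the handler and then leaves with _exit().
 * Returns the child's exit status, or -1 on error. */
int exit_status_when_skipping_cleanup(void)
{
    fflush(NULL);                 /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        atexit(cleanup_handler);
        fflush(NULL);             /* _exit() does not flush stdio, so do it by hand */
        _exit(7);                 /* exit(7) would run cleanup_handler first */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a real program the last two statements of main() would simply be fflush(NULL) followed by _exit(status) in place of return status.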

Related

How can one implement pthread_detach on Linux?

pthread_detach marks a thread so that when it terminates, its resources are automatically released without requiring the parent thread to call pthread_join. How can it do this? From the perspective of Linux in particular, there are two resources in particular I am curious about:
As an implementation detail, I would expect that if a wait system call is not performed on the terminated thread, then the thread would become a zombie. I assume that the pthread library's solution to this problem does not involve SIGCHLD, because (I think) it still works regardless of what action the program has specified to occur when SIGCHLD is received.
Threads are created using the clone system call. The caller must allocate memory to serve as the child thread's stack area before calling clone. Elsewhere on Stack Overflow, it was recommended that the caller use mmap to allocate the stack for the child. How can the stack be unmapped after the thread exits?
It seems to me that pthread_detach must somehow provide solutions to both of these problems, otherwise, a program that spawns and detaches many threads would eventually lose the ability to continue spawning new threads, even though the detached threads may have terminated already.
The pthreads library (on Linux, NPTL) provides a wrapper around lower-level primitives such as clone(2). When a thread is created with pthread_create, the library allocates the stack (unless the caller supplied one) and stores that information plus any other metadata in a structure. The function passed to clone is a wrapper function, which then calls the user-provided start function. When the user-provided start function returns, cleanup happens. Finally, an internal function called __exit_thread is called to make a system call to exit the thread.
When such a thread is detached, it still returns from the user-provided start function and calls the cleanup code as before, except that the stack and metadata are freed as part of this cleanup, since nobody is waiting for the thread to complete. For a joinable thread, that release would instead be handled by pthread_join.
If a thread is killed or exits without having run, then the cleanup is handled by the next pthread_create call, which will call any cleanup handlers yet to be run.
The reason a SIGCHLD is not sent to the parent nor is wait(2) required is because the CLONE_THREAD flag to clone(2) is used. The manual page says the following about this flag:
A new thread created with CLONE_THREAD has the same parent process as the process that made the clone call (i.e., like CLONE_PARENT), so that calls to getppid(2) return the same value for all of the threads in a thread group. When a CLONE_THREAD thread terminates, the thread that created it is not sent a SIGCHLD (or other termination) signal; nor can the status of such a thread be obtained using wait(2). (The thread is said to be detached.)
As you noted, this is required for the expected POSIX semantics to occur.
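From the caller's side, the behaviour described above can be observed with a short sketch. It assumes a POSIX threads implementation; the names short_task and spawn_detached are hypothetical. It creates many detached threads one after another and never calls pthread_join, which only keeps working if each finished thread's resources are released automatically.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int finished;

/* Trivial thread body: record that it ran, then return. */
static void *short_task(void *arg)
{
    (void)arg;
    atomic_fetch_add(&finished, 1);
    return NULL;
}

/* Creates `count` detached threads, one at a time, and returns how many
 * ran to completion, or -1 if creating or detaching a thread failed. */
int spawn_detached(int count)
{
    atomic_store(&finished, 0);
    for (int i = 0; i < count; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, short_task, NULL) != 0)
            return -1;
        if (pthread_detach(t) != 0)   /* nobody will join this thread */
            return -1;
        while (atomic_load(&finished) <= i)
            sched_yield();            /* wait until this thread has done its work */
    }
    return atomic_load(&finished);
}
```

Passing an attribute object prepared with pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) to pthread_create is an alternative to calling pthread_detach afterwards.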

When does dlopen block?

A shared library is loaded through LD_PRELOAD, and the constructor of that library calls dlopen("libc.so.6").
The problem is that dlopen takes forever. Debugging shows the following:
dlopen calls __dlopen, which calls calloc, then an unknown function (??), and finally __GI___pthread_mutex_lock.
I suspected a resource limit, so I provided unlimited resources before calling dlopen, but that doesn't solve the problem.
The problem only happens when LD_PRELOAD is set to the shared library mentioned above and the target application is Firefox on Linux. Any other application works without problems (dlopen doesn't block)!
When does dlopen block?
When it needs a lock that is not available for some reason.
Debugging shows
You need more debugging. The dlopen calls calloc which requires a malloc lock. Nothing special about that.
It must be that some other thread is holding this malloc lock, and is waiting for your LD_PRELOADed library to finish its initialization (thus creating a deadlock). You should be able to find that other thread with (gdb) thread apply all where.
It may also matter what functions you are trying to interpose in your LD_PRELOADed library.

Is it safe to kill thread using only stack variables?

I have a Fortran subroutine. It runs for quite a long time once started.
Now I want to write a program that calls the Fortran subroutine from C++ in a thread.
The thread should be stopped (or canceled) when the user requests it.
But the subroutine does not support any way to terminate the calculation while it is running.
As far as I know, the subroutine uses only stack variables (no allocation).
The subroutine is provided as a static library for Windows (a .lib file).
In this case, may I assume that killing the subroutine's thread does not cause any problems such as resource leaks?
FYI, here's the running environment:
OS: Windows 7 64bit or above
Compiler: MSVC 2015 for C++, Intel Parallel Studio for fortran
In general it's not safe -- there are other resources that the thread could acquire besides memory. For example, it could lock a mutex, and if you killed the thread while the mutex was locked, the mutex would remain locked forever, with the likely result that other threads would deadlock waiting forever to acquire the mutex. If you really have no way to get the thread to exit cleanly/voluntarily, then the only safe approach is to spawn a child process and run the routine inside the child process. You can safely kill the child process if you have to, because the OS will automatically clean up any resources that were allocated by the child process.
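The child-process pattern can be sketched as follows. The question targets Windows, where CreateProcess and TerminateProcess would play these roles; the sketch below uses the POSIX equivalents (fork, kill, waitpid) purely to illustrate the pattern, and long_running_routine is a hypothetical stand-in for the Fortran subroutine.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the long computation that cannot be interrupted (hypothetical). */
void long_running_routine(void)
{
    for (;;)
        pause();
}

/* Runs the routine in a child process, cancels it, and reaps the child.
 * Returns 1 if the child was terminated by SIGKILL, 0 if it ended some
 * other way, and -1 on error. */
int run_and_cancel(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        long_running_routine();
        _exit(0);
    }
    /* A real program would wait here for the user's cancel request. */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Because the routine runs in its own process, any locks or memory it held disappear with that process. The cost is that results have to be sent back to the parent through a pipe, a file, or shared memory.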

What is the difference between exit() and exit_group()?

What is the difference between exit() and exit_group()? Should any process that has multiple threads use exit_group instead of exit?
To answer the question of why I ask: we have a process with around forty threads. When a thread locks up, we automatically exit the process and then restart it, and we print the backtrace of the thread that was locked up. We want to know whether calling exit in this case is any different from calling exit_group.
From the docs: This system call is equivalent to exit(2) except that it terminates not only the calling thread, but all threads in the calling process's thread group. However, what is the difference between exiting the process and exiting all the threads? Isn't exiting the process the same as exiting all the threads?
All thread libraries I know (e.g. recent glibc or musl-libc) are using the low-level clone(2) system call for their thread implementations (and some C libraries are even using clone to fork a process).
clone is a difficult Linux syscall. Unless you are a thread library implementor, you should not use it directly but only through library functions (e.g. pthread_create(3)); see also futex(7), which is used by the pthread_mutex* functions.
The clone syscall is used to create tasks: either threads (sharing address space in a multi-threaded process) or processes.
The exit_group syscall is related to exiting these tasks.
In short, you'll never use exit_group or clone directly. Your libc does that for you. So don't worry about exit_group or _Exit; you should use only the standard library function exit(3), which notably deals with handlers registered through atexit(3) and on_exit(3) and flushes <stdio.h> buffers. In the rare cases where you don't want that to happen, use _exit(2) (but you probably don't need that).
Of course, if you are reimplementing your own libc from scratch, you need to care about exit_group and clone; otherwise you don't.
If you care about gory implementation details, dive into the source code of your libc. Details may be libc-version, kernel-version, and compiler specific!
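For readers who want to see the kernel-level difference anyway, here is an illustrative, Linux-only sketch. It is for experimentation, not a recommendation to issue these syscalls in application code. The names worker and child_status and the status values 5 and 9 are arbitrary choices for the demo. The raw exit syscall ends only the calling thread, so the process lives on until the worker ends it, whereas exit_group ends every thread at once.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Second thread: if the process is still alive after a short delay,
 * end it with status 9 (libc's _exit terminates the whole process). */
static void *worker(void *arg)
{
    (void)arg;
    usleep(100000);
    _exit(9);
    return NULL;
}

/* Forks a child whose main thread issues either the raw exit_group or the
 * raw exit syscall with status 5. Returns the child's exit status. */
int child_status(int use_exit_group)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            _exit(1);
        if (use_exit_group)
            syscall(SYS_exit_group, 5);  /* all threads stop; status is 5 */
        else
            syscall(SYS_exit, 5);        /* only this thread stops */
        _exit(2);                        /* not reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With these assumptions, child_status(1) returns 5 and child_status(0) returns 9, because in the second case the worker outlives the main thread and decides the process's exit status.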

Thread lifetime in linux

Currently I'm trying to understand what happens when a shared library spawns a thread, which does not terminate and the shared library is then unloaded.
What happens to the thread if the parent does not wait for the thread to exit?
Does the thread die or does it remain in the running state?
If it does then how can the parent detect when it's being unloaded and somehow terminate the thread?
Thanks for any help.
I assume the shared library is some plugin dynamically loaded at runtime using dlopen(3) and later explicitly unloaded using dlclose.
The dlopen and dlclose functions are internally using a reference counter and they are mmap(2)-ing (for dlopen) and munmap-ing (for dlclose) some segments inside the ELF shared object when appropriate (i.e. when the ref counter crosses the 0 border).
If a thread is running some function inside the dlclose-d shared library, the code of that function becomes munmap-ed, and as soon as you jump (or return) into that function, you get a SIGBUS, SIGILL or SIGSEGV signal.
So you don't want that munmap to happen; hence you could:
avoid calling dlclose; this works very well in practice (unless you have a server program), because mmap consumes mostly address space for text read-only segments of shared object. As my manydl.c demonstrates, you can dlopen hundreds of thousands of shared objects on a desktop without reaching serious limits.
or pass RTLD_NODELETE to dlopen asking it to never unmap the library
Alternatively, use some facilities (i.e. destructor-attributed functions in the shared library) or conventions (perhaps atexit(3)?) to be sure that the thread has ended before dlclose
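The RTLD_NODELETE option from the list above might look like this in practice. This is a small sketch; load_plugin_pinned is a hypothetical helper name, and the path is supplied by the caller.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Opens a shared object so that a later dlclose() will not unmap it.
 * Returns the handle, or NULL if the object could not be loaded. */
void *load_plugin_pinned(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_NODELETE);
    if (handle == NULL)
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return handle;
}
```

With this flag the code of the library stays mapped, so a thread still executing inside it after dlclose does not fault on unmapped text. The thread itself keeps running, though, so this avoids the crash rather than stopping the thread. On older glibc versions the program may need to be linked with -ldl.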
A shared library is loaded into the process, so a thread it spawns runs in the process's address space.
The thread keeps running unless it is told to exit, and it is terminated when the process exits.
Since the shared library spawns the thread, it is better for the library to also provide a function that tells the thread to exit, so that the process can call it to stop the thread before unloading the library.
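A sketch of what such a library could look like, assuming GCC or Clang (for the destructor attribute) and POSIX threads. All mylib_* names are hypothetical, and start and stop are assumed to be called from a single controlling thread. The worker waits on a condition variable, the stop function signals it and joins it, and a destructor calls the same stop function so the thread is gone before the library is unmapped.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;
static pthread_t worker;
static bool running;          /* touched only by the controlling thread */
static bool stop_requested;   /* protected by `lock` once the worker exists */

/* Worker loop: a real library would do its work here between checks. */
static void *worker_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!stop_requested)
        pthread_cond_wait(&wake, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the worker if it is not already running. Returns 0 on success. */
int mylib_start(void)
{
    if (running)
        return 0;
    stop_requested = false;   /* safe: no worker thread exists yet */
    if (pthread_create(&worker, NULL, worker_main, NULL) != 0)
        return -1;
    running = true;
    return 0;
}

/* Asks the worker to exit and waits until it has done so. Idempotent. */
int mylib_stop(void)
{
    if (!running)
        return 0;
    pthread_mutex_lock(&lock);
    stop_requested = true;
    pthread_cond_signal(&wake);
    pthread_mutex_unlock(&lock);
    pthread_join(worker, NULL);
    running = false;
    return 0;
}

int mylib_is_running(void)
{
    return running;
}

/* Runs on dlclose() of the last reference and at process exit. */
__attribute__((destructor)) static void mylib_on_unload(void)
{
    mylib_stop();
}
```

The host program would call mylib_stop() before dlclose(); the destructor is only a safety net for callers that forget.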
