Making a clone'd thread pthread compatible - multithreading

I am programming in C on Linux x86-64. I'm using a library which creates a number of threads via a raw clone system call rather than using pthread_create. These threads run low-level code internal to the library.
I would like to hook one of these threads to introspect its behavior. Hooking the code is easy enough, but I've discovered that I can't call almost anything in libc because the thread state is not configured. pthread_create normally inserts a bunch of data into the thread-local storage area indexed by fs:. Some of that data, for example, is essential to libc's function, such as the function pointer encryption key (pointer_guard) and locale pointer.
So my question is: can I upgrade a clone'd thread to a full pthread via any mechanism? If not, is there any way that I can call C functions from a clone'd thread (such as printf, toupper, etc. which require libc's thread-local data)?

Some of that data, for example, is essential to libc's function, such as the function pointer encryption key (pointer_guard) and locale pointer.
Correct. Don't forget about errno, which is also in there.
can I upgrade a clone'd thread to a full pthread via any mechanism?
No.
is there any way that I can call C functions from a clone'd thread
No.
If you have sources to the library, it should be relatively easy to replace direct clone calls with pthread_create.
If you do not, but the library is available in archive form, you may be able to use obcopy --rename-symbol to redirect its clone calls to a replacement (e.g. my_clone), which can then create a new thread via pthread_create and invoke the target function in that thread. Whether this will succeed greatly depends on how much the library cares about details of the clone.
It's also probably not worth the trouble.
A better alternative may be to implement the introspection without calling into libc. Since your printf and toupper probably only need to deal with ASCII and C locale, it's not hard to implement limited versions of these functions and use direct system calls to write the output.

Related

How do I check if a given operation (or system call) is atomic on Linux?

I want to find a reliable way (other than reading the kernel source code) to check if a given operation (or system call) is atomic (in the sense that other process can only see the state before or after that operation, but not something in between) on Linux. The goal of this is to avoid using unnecessary locks for some operations if the kernel already does that for me.
So far I can only find resources like this about this topic, which is by no means authoritative or exhaustive. Also, the Linux man pages contains little information about this. For example, for most functions mentioned in the above link, I don't find anything about their atomicity in the man pages.
Could anyone tell me if there is a standard or official documentation which provides this information? Any help would be much appreciated.
I think POSIX thread-safe functions are a good starting point. Thread-safe functions are functions that will give the same results when called from different threads. This is not at all the same as being atomic, but at least it gives a hint about which functions certainly are not atomic.
POSIX.1-2001 and POSIX.1-2008 require that all functions specified in the standard shall be thread-safe, except for a specific set of functions (most of which are implemented in the standard library and not in the kernel).
As an example of a function that is thread-safe but not atomic, consider fwrite(). fwrite() will write to a per-process buffer under pthread locks, so it is thread-safe. However, the buffer may be flushed in separate write() chunks, so other processes don't see it as an atomic write.

shared libraries (dlopen) and thread-safety of library static pointers

When I load a shared library dynamically, for example with dlopen on linux, do I have to worry about the visibility of the loaded library between processors, or will it be automatically fenced/ensured safe?
For example, say I have this function in the loaded library:
char const * get_string()
{ return "literal"; }
In the main program using such a string-literal pointer is safe between multiple threads as they are all guaranteed to see its initial value. However, I'm wondering how the rules of "initial values" really apply to a loaded library (as the standard doesn't deal much with it.
Say that I load the library, then immediately call the get_string function. I pass the pointer to another thread via a non-memory sequenced atomic (relaxed in C++11 parlance). Can the other thread use this pointer safely without having to issue any load fence or other syncronization instruction?
My assumption is that it is safe. Perhaps because the new library will be loaded into new pages the other core cannot have them loaded yet, and thus cannot have old visibility on them?
I would like some kind of authorative reference as part of the answer if possible. Or a technical description of how it is made thread-safe by default. Or of course a refutation if it isn't thread-safe on its own.
Your question is : will dlopen() load all my lib code properly before returning ? Yes it will. Otherwise you'd have the problem with only a single thread. It would be very difficult to handle if you had to sleep before dlopen completes asynchronously. It will also perform various checks and initialize what needs to be before you have a chance to get the function pointer you are looking for. That means that if you get that pointer, everything is here, you can use directly in any thread.
Now of course, you need to pass that pointer with the usual thread safety, but I assume you know how.
Please be aware that static initialization and modules don't play well together (see all the other questions on SO about that subject).
Your comment on cores is strange. Cores don't load memory. They prefetch it in their cache, but that's not a problem, just a bit slow.
I'll expand on what Basile said. I followed up with glibc and found out dlopen there does in deed use mmap. All guarantees of memory visibility are assumed from the mmap system call, dlopen itself doesn't make any additional guarantees.
Users of mmap generally assume that it will map memory correctly across all processors at the point of its return such that visibility is not a concern. This does not appear to be an explicit guarantee, but the OS would probably be unusable without such a guarantee. There is also no known system where this doesn't work as expected.

Win32 alternative for the destructor in pthread_keycreate (when I cannot control dllmain)

In pthreads, you can associate a destructor function with each per-thread storage slot. When a thread dies, if the slot is non-0, the destructor is called.
In a Win32 DLL, the DLLMain function, called at thread exit, can do the same thing.
What can I do in code that lives in a purely static library?
This is a hard problem, and requires sticking callbacks in special locations. Luckily for you, it is solved in Boost.Thread. Use boost::this_thread::at_thread_exit, or boost::thread_specific_ptr
Windows support Thread Local Storage (TLS) in a DLL. It can be very practical if you want have some memory blocks per thread with the unique value (unique per thread). Inside of any other function from the DLL you can get the value which correspond to the current thread in very easy way. It is very useful in some scenarios. Look at here for more details.
I dun't use pthreads myself, but I suppose that per-thread storage slot introduced to make the work with TLS more comfortable.
UPDATED: From your comment I see that you misunderstand my answer. I'm not a POSIX developer, I develop in Win32 only and your question is about WIn32 API possibilities of per-thread allocation and deallocations. I try to explain the possibilities and you can decide yourself which one are better for your specific scenarios.
The equivalent of pthread_XXX functions in Win32 are following:
pthread_key_create TlsAlloc
pthread_setspecific TlsSetValue
pthread_getspecific TlsGetValue
pthread_key_delete TlsFree
I not recommended you to use the construct __declspec(thread), which is more compiler specific.
The example Using Thread Local Storage shows how to use thread local storage (TLS) without DLLs, but I personally like and use TLS only in DLL.
The destructor parameter of the pthread_key_create function has no analog in Win32, but I don't see here any problem. All C/C++ compilers support __try {/**/} __finally {/**/} construct of the Structured Exception Handling so you can use it in the body of your thread function and implement in the way any Cleaning up Resources exacly like you can do this in the main thread.
I find pity that you not included in your question an example which shows how you typically use destructor of the pthread_key_create function. I find that examples can clear much things without a lot of words. So if my answer do not help you we can better explain all in examples: you write an example and probably short comment what it should do and I could write the same code using Win32 API only in C or C++.

What interprocess locking calls should I monitor?

I'm monitoring a process with strace/ltrace in the hope to find and intercept a call that checks, and potentially activates some kind of globally shared lock.
While I've dealt with and read about several forms of interprocess locking on Linux before, I'm drawing a blank on what to calls to look for.
Currently my only suspect is futex() which comes up very early on in the process' execution.
Update0
There is some confusion about what I'm after. I'm monitoring an existing process for calls to persistent interprocess memory or equivalent. I'd like to know what system and library calls to look for. I have no intention call these myself, so naturally futex() will come up, I'm sure many libraries will implement their locking calls in terms of this, etc.
Update1
I'd like a list of function names or a link to documentation, that I should monitor at the ltrace and strace levels (and specifying which). Any other good advice about how to track and locate the global lock in mind would be great.
If you can start monitored process in valgrind, then there are two projects:
http://code.google.com/p/data-race-test/wiki/ThreadSanitizer
and Helgrind
http://valgrind.org/docs/manual/hg-manual.html
Helgrind is aware of all the pthread
abstractions and tracks their effects
as accurately as it can. On x86 and
amd64 platforms, it understands and
partially handles implicit locking
arising from the use of the LOCK
instruction prefix.
So, this tools can detect even atomic memory accesses. And they will check pthread usage
flock is another good one
There are many system calls can be used for locking: flock, fcntl, and even create.
When you are using pthreads/sem_* locks they may be executed in user space so you'll never
see them in strace as futex is called only for pending operations. Like when you actually
need to wait.
Some operations can be done in user space only - like spinlocks - you'll never see them
unless they do some waits for timer - backoff so you may see only stuff like nanosleep when one lock waits for other.
So there is no "generic" way to trace them.
on systems with glibc ~ >= 2.5 (glibc + nptl) you can use process shared
semaphores (last parameter to sem_init), more precisely, posix unnamed semaphores
posix mutexes (with PTHREAD_PROCESS_SHARED to pthread_mutexattr_setpshared)
posix named semaphores (got from sem_open/sem_unlink)
system v (sysv) semaphores: semget, semop
On older systems with glibc 2.2, 2.3 with linuxthreads or on embedded systems with uClibc you can use ONLY system v (sysv) semaphores for iterprocess communication.
upd1: any IPC and socker must be checked.

Is it safe to call a dll function from multiple threads in a single application?

I am writing a server application in Delphi 2009 that implements several types of authentication. Each authentication method is stored in a separate dll. The first time an authentication method is used the appropriate dll is loaded. The dll is only released when the application closes.
Is it safe to access the dlls without any form of synchronisation between the server threads (connections)?
Short answer:
Yes, it is generally possible to call a DLL function from multiple threads, since every thread has it's own stack and calling a DLL function is more or less the same as calling any other function of your own code.
Long answer:
If it is actually possible depends on the DLL functions using a shared mutable state or not.
For example if you do something like this:
DLL_SetUser(UserName, Password)
if DLL_IsAuthenticated then
begin
...
end;
Then it is most certainly not safe to be used from different threads. In this example you can't guarantee that between DLL_SetUser and DLL_IsAuthenticated no other thread makes a different call to DLL_SetUser.
But if the DLL functions do not depend on some kind of predefined state, i.e. all necessary parameters are available at once and all other configuration is the same for all threads, you can assume it'll work.
if DLL_IsAuthenticated(UserName, Password) then
begin
...
end;
But be careful: It might be possible that a DLL function looks atomic, but internally uses something, which isn't. If for example if the DLL creates a temporary file with always the same name or it accesses a database which can only handle one request at a time, it counts as a shared state. (Sorry, I couldn't think of better examples)
Summary:
If the DLL vendors say, that their DLLs are thread-safe I'd use them from multiple threads without locking. If they are not - or even if the vendors don't know - you should play it safe and use locking.
At least until you run into performance problems. In that case you could try creating multiple applications/processes which wrap your DLL calls and use them as proxies.
For your DLLs to be thread-safe you need to protect all shared data structures that multiple threads in one process could access concurrently - there is no difference here between writing code for a DLL vs. writing code for an executable.
For multiple processes there is no risk in concurrent access, as each process gets its own data segment for the DLL, so variables with the same name are actually different when seen from different processes. It is actually much more difficult to provide data in a DLL that is the same from different processes, you would basically need to implement the same things you would use for data exchange between processes.
Note that a DLL is special in that you get notifications when a process or a thread attaches to or detaches from a DLL. See the documentation for the DllMain Callback Function for an explanation, and this article for an example how to use this in a Delphi-written DLL. So if your threads are not completely independent from each other (and no shared data is write-accessed), then you will need some shared data structures with synchronized access. The various notifications may help you with properly setting up any data structures in your DLL.
If your DLLs allow for completely independent execution of the exported functions, also do check out the threadvar thread-specific variables. Note that for them initialization and finalization sections are not usable, but maybe the thread notifications can help you there as well.
Here's the thing - you cannot assume that the DLL's are thread safe if you don't have control over the source code (or documentation stating it is) so therefore you should assume it is not
If you talking about Win32 DLLs, they're meant to be safe if called by multiple threads, applications. I don't know what your DLL does, but if your DLL is using a lockable resource like a file or a port, then there could be trouble based on the implementation inside the DLL.
I am not familiar with the working of Delphi 2009 authentication DLLs. Maybe you should add that information to the headline (that you're talking specifically about the Delphi 2009 DLLs)

Resources