I read that pthread is C library and is not compatible with C++ object model, especially when talking about exception handling.
So I wish to know on linux system, how gcc/clang implements std::thread, is it calling some linux native functions/kernel apis or something?
Also, how is thread_local implemented, and how is it related to __thread?
I read that pthread is C library and is not compatible with C++ object model, especially when talking about exception handling.
This information is inaccurate.
how gcc/clang implements std::thread
They call a platform-specific thread creation function. On Linux it is pthread_create. You can call this function directly.
When a thread throws an exception and it is not caught std::terminate is called.
Note that your application must be compiled and linked with -pthread flag (using -lpthread is unnecessary and insufficient with both C and C++).
I read that pthread is C library and is not compatible with C++ object model, especially when talking about exception handling.
There's a statement in the neighborhood of this that is true, but this statement as written is not true.
There are two facts here.
If YOU call pthreads functions yourself, it is indeed just a C library, and you had better make sure you do everything correctly in regards to exception safety. If you pass function pointers to pthread_create_... and those functions will throw exceptions... your program can have big problems. That should be obvious, it will be true whenever you talk to a C library from C++.
That does not mean it is impossible to use such a library with a C++ program!
pthread does not actually need to know about any of your objects, or any of their ctors or dtors, or any of that, in order to make your program multithreaded. All it needs to spawn a thread, is a function pointer, and that function pointer will have a completely C-compatible signature.
When the C++ compiler calls pthreads functions in order to implement std::thread, the compiler is going to emit code that talks to pthread correctly. If it uses pthread in an illegal way to implement your C++ program, it's a bug in the compiler or standard library.
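A minimal sketch of what "talking to pthread correctly" can look like when you call pthread_create yourself: the function pointer handed to pthreads has C linkage and never lets an exception escape into the C library. The names Task, task_trampoline, run_on_pthread and demo_exception_is_captured are hypothetical, chosen only for this illustration; this is not how any particular standard library implements std::thread.

```cpp
#include <pthread.h>

#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical work item: the callable to run plus a slot for a captured exception.
struct Task {
    std::function<void()> body;
    std::exception_ptr error;
};

// C-compatible entry point: this is the only thing pthread_create ever sees.
extern "C" void* task_trampoline(void* arg) {
    Task* task = static_cast<Task*>(arg);
    try {
        task->body();
    } catch (...) {
        // Do not let the exception unwind through pthreads; store it instead.
        task->error = std::current_exception();
    }
    return nullptr;
}

// Runs the task on a new pthread and waits for it to finish.
bool run_on_pthread(Task& task) {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, task_trampoline, &task) != 0) return false;
    return pthread_join(tid, nullptr) == 0;
}

// Shows that an exception thrown in the thread can be rethrown by the caller.
bool demo_exception_is_captured() {
    Task task{[] { throw std::runtime_error("boom"); }, nullptr};
    if (!run_on_pthread(task) || !task.error) return false;
    try {
        std::rethrow_exception(task.error);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Build with -pthread. The design choice here is to move the exception across the thread boundary as a std::exception_ptr rather than let it propagate out of the thread function.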
Use ldd myExecutable on compiler output to find out.
Both libstdc++ and libc++ apparently use pthreads, but they are not required to do that. Evidence of it can be found in native_handle methods documentation here and here. The documents say:
Accesses the native handle of *this.
The meaning and the type of the result of this function is implementation-defined. On a POSIX system, this may be a value of type pthread_cond_t*. On a Windows system, this may be a PCONDITION_VARIABLE.
and
Returns the implementation defined underlying thread handle.
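One hedged way to check this on your own toolchain, assuming a POSIX system where std::thread::native_handle() returns a pthread_t (true for libstdc++ and libc++ on Linux as far as I know, but not required by the standard). The function name native_handle_is_pthread_id is hypothetical.

```cpp
#include <pthread.h>

#include <future>
#include <thread>

// Returns true if the handle reported by std::thread is the same pthread ID
// that the thread itself sees through pthread_self().
bool native_handle_is_pthread_id() {
    std::promise<pthread_t> id_from_thread;
    std::promise<void> release;
    std::future<pthread_t> id = id_from_thread.get_future();
    std::future<void> may_exit = release.get_future();

    std::thread t([&] {
        id_from_thread.set_value(pthread_self());
        may_exit.wait();  // stay alive so the thread ID remains valid while compared
    });

    // Assumption: native_handle_type is pthread_t on this platform.
    bool same = pthread_equal(t.native_handle(), id.get()) != 0;

    release.set_value();
    t.join();
    return same;
}
```

Build with -pthread. If this does not compile, the implementation is using something other than pthread_t as its native handle.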
I am programming in C on Linux x86-64. I'm using a library which creates a number of threads via a raw clone system call rather than using pthread_create. These threads run low-level code internal to the library.
I would like to hook one of these threads to introspect its behavior. Hooking the code is easy enough, but I've discovered that I can't call almost anything in libc because the thread state is not configured. pthread_create normally inserts a bunch of data into the thread-local storage area indexed by fs:. Some of that data, for example, is essential to libc's function, such as the function pointer encryption key (pointer_guard) and locale pointer.
So my question is: can I upgrade a clone'd thread to a full pthread via any mechanism? If not, is there any way that I can call C functions from a clone'd thread (such as printf, toupper, etc. which require libc's thread-local data)?
Some of that data, for example, is essential to libc's function, such as the function pointer encryption key (pointer_guard) and locale pointer.
Correct. Don't forget about errno, which is also in there.
can I upgrade a clone'd thread to a full pthread via any mechanism?
No.
is there any way that I can call C functions from a clone'd thread
No.
If you have sources to the library, it should be relatively easy to replace direct clone calls with pthread_create.
If you do not, but the library is available in archive form, you may be able to use objcopy --redefine-sym to redirect its clone calls to a replacement (e.g. my_clone), which can then create a new thread via pthread_create and invoke the target function in that thread. Whether this will succeed greatly depends on how much the library cares about details of the clone.
It's also probably not worth the trouble.
A better alternative may be to implement the introspection without calling into libc. Since your printf and toupper probably only need to deal with ASCII and C locale, it's not hard to implement limited versions of these functions and use direct system calls to write the output.
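A rough sketch of that approach, assuming Linux x86-64 as in the question. The helper names ascii_toupper, format_u64, raw_write and raw_puts_upper are hypothetical. None of them touch errno, the locale pointer, or any other libc thread-local data; raw_write issues the write system call (number 1 on x86-64) directly. If the hook is compiled with stack protection enabled, the compiler may still read the canary through fs:, so building the hook with -fno-stack-protector may be needed.

```cpp
#include <cstddef>

// ASCII-only replacement for toupper(): no locale lookup, no TLS access.
inline char ascii_toupper(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// Writes value in decimal into buf (no NUL terminator).
// Returns the number of characters written, or 0 if cap is too small.
inline std::size_t format_u64(unsigned long long value, char* buf, std::size_t cap) {
    char tmp[20];  // 2^64 - 1 has 20 decimal digits
    std::size_t n = 0;
    do {
        tmp[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    if (n > cap) return 0;
    for (std::size_t i = 0; i < n; ++i) buf[i] = tmp[n - 1 - i];
    return n;
}

#if defined(__linux__) && defined(__x86_64__)
// Direct write(2) system call. Returns the byte count, or a negative errno
// value on failure (errno itself is never touched).
inline long raw_write(int fd, const void* buf, std::size_t len) {
    long ret;
    __asm__ __volatile__("syscall"
                         : "=a"(ret)
                         : "0"(1L), "D"(static_cast<long>(fd)), "S"(buf), "d"(len)
                         : "rcx", "r11", "memory");
    return ret;
}

// Upper-cases text (truncated to fit the buffer) and writes it to stderr.
inline long raw_puts_upper(const char* text) {
    char buf[128];
    std::size_t n = 0;
    while (text[n] != '\0' && n < sizeof buf - 1) {
        buf[n] = ascii_toupper(text[n]);
        ++n;
    }
    buf[n++] = '\n';
    return raw_write(2, buf, n);
}
#endif
```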
I've been looking for information about a relationship between the linking process and the calls but I didn't find any useful information. I would be very grateful if someone could help me.
Not really clear what you are asking.
The "linking process" connects up references between different compilation units and objects. When those references are function calls, it links them up so code in one compilation unit calls the code in another.
System calls happen in some code. When the system call is in a library and the library function is called from another compilation unit (common case), it links up the call, so that the code in your program calls the library which does the system call.
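To illustrate the common case on Linux, here is a small sketch. The call to getpid() is an ordinary function reference that the linker resolves to the C library, and the library then performs the system call. syscall(SYS_getpid) asks the same library to issue the system call by number. The function name wrapper_matches_raw_syscall is hypothetical.

```cpp
#include <sys/syscall.h>
#include <unistd.h>

// Compares the result of the linked library wrapper with the result of
// requesting the same system call by number.
bool wrapper_matches_raw_syscall() {
    pid_t via_wrapper = getpid();            // resolved by the linker to libc
    long via_syscall = syscall(SYS_getpid);  // same kernel entry, by number
    return via_syscall == static_cast<long>(via_wrapper);
}
```

In both cases the linker's only job is to connect your call site to the library code; the transition into the kernel happens inside that library code at run time.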
I derived a TMyThread object from TThread in Delphi, and in TMyThread.Execute, it will invoke a DLL written by Visual C++. In that case, must the DLL also compiled with the Multi-thread library and support multi-thread as well?
Older versions of the MSVC runtime come in both multi-threaded and single-threaded variants. The difference is that the single-threaded variant does not protect against potential race conditions. So, if the code that calls into the MSVC runtime does so from more than one thread, the single-threaded runtime cannot be safely used.
The scenario that you describe has only a single thread executing code inside your MSVC DLL. In which case the single-threaded MSVC runtime is safe to use. It does not matter that the host executable is multi-threaded. All that counts is whether multiple threads call into the MSVC runtime attached to your MSVC DLL.
MSVC stopped shipping separate single-threaded and multi-threaded runtimes many releases ago. One wonders whether or not it makes a difference to your application. Can you detect any performance difference between the two runtime options? If not, then it would make sense to me to use the multi-threaded runtime. Choosing the single-threaded runtime is just storing up a potential debugging headache when you forget about this in a future change to the code and introduce extra threads to your MSVC DLL.
The C++ DLL should be MT, if you intend to use it MT. If you intend to use it from only one single thread of your application, then you don't have to do that. But you should clearly document this as soon as you have the slightest doubt that there could be a thread conflict, e.g. with data structures internally managed within the DLL. Or use MT anyway, take care of proper locking and forget about it. (My previous Delphi statement still stands true).
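As a sketch of what "proper locking" might look like on the C++ side, assuming the DLL keeps some internal state that several caller threads may reach. The names next_ticket and hammer and the counter itself are hypothetical stand-ins, and the export decoration a real DLL would need (for example __declspec(dllexport)) is left out so the snippet stays portable.

```cpp
#include <mutex>
#include <thread>
#include <vector>

namespace {
std::mutex g_lock;   // guards all state owned by the library
long g_counter = 0;  // stand-in for a data structure managed inside the DLL
}  // namespace

// Function the host application would call; safe from any thread.
extern "C" long next_ticket() {
    std::lock_guard<std::mutex> guard(g_lock);
    return ++g_counter;
}

// Calls next_ticket() from several threads at once and returns how many
// tickets those threads were issued in total.
long hammer(int threads, int calls_per_thread) {
    long first = next_ticket();
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([calls_per_thread] {
            for (int j = 0; j < calls_per_thread; ++j) next_ticket();
        });
    }
    for (auto& t : pool) t.join();
    long last = next_ticket();
    return last - first - 1;
}
```

With the lock in place, no ticket is lost or issued twice regardless of how many host threads call in.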
When I load a shared library dynamically, for example with dlopen on linux, do I have to worry about the visibility of the loaded library between processors, or will it be automatically fenced/ensured safe?
For example, say I have this function in the loaded library:
char const * get_string()
{ return "literal"; }
In the main program using such a string-literal pointer is safe between multiple threads as they are all guaranteed to see its initial value. However, I'm wondering how the rules of "initial values" really apply to a loaded library (as the standard doesn't deal much with it).
Say that I load the library, then immediately call the get_string function. I pass the pointer to another thread via a non-memory sequenced atomic (relaxed in C++11 parlance). Can the other thread use this pointer safely without having to issue any load fence or other synchronization instruction?
My assumption is that it is safe. Perhaps because the new library will be loaded into new pages the other core cannot have them loaded yet, and thus cannot have old visibility on them?
I would like some kind of authoritative reference as part of the answer if possible. Or a technical description of how it is made thread-safe by default. Or of course a refutation if it isn't thread-safe on its own.
Your question is: will dlopen() load all my lib code properly before returning? Yes, it will. Otherwise you'd have the problem even with a single thread. It would be very difficult to handle if you had to sleep until dlopen completed asynchronously. It will also perform various checks and initialize what needs to be initialized before you have a chance to get the function pointer you are looking for. That means that once you have that pointer, everything is in place and you can use it directly in any thread.
Now of course, you need to pass that pointer with the usual thread safety, but I assume you know how.
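One conventional way to pass the pointer, sketched here with a release store and an acquire load rather than the relaxed ordering the question asks about. This is the conservative choice and does not settle whether relaxed would also be enough. The get_string below is a local stand-in for the function obtained from the loaded library, and g_published and consumer_sees_string are hypothetical names.

```cpp
#include <atomic>
#include <cstring>
#include <thread>

// Stand-in for the function that would come from the dlopen'ed library.
const char* get_string() { return "literal"; }

std::atomic<const char*> g_published{nullptr};

// Publishes the pointer from one thread and reads it from another.
bool consumer_sees_string() {
    g_published.store(nullptr, std::memory_order_relaxed);
    bool ok = false;

    std::thread consumer([&ok] {
        const char* p = nullptr;
        // The acquire load pairs with the release store below.
        while ((p = g_published.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        ok = std::strcmp(p, "literal") == 0;
    });

    // Everything done before this store (including loading the library, in
    // the real scenario) is visible to a thread that observes the pointer.
    g_published.store(get_string(), std::memory_order_release);

    consumer.join();
    return ok;
}
```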
Please be aware that static initialization and modules don't play well together (see all the other questions on SO about that subject).
Your comment on cores is strange. Cores don't load memory. They prefetch it in their cache, but that's not a problem, just a bit slow.
I'll expand on what Basile said. I followed up with glibc and found out that dlopen there does indeed use mmap. All guarantees of memory visibility are assumed from the mmap system call; dlopen itself doesn't make any additional guarantees.
Users of mmap generally assume that it will map memory correctly across all processors at the point of its return such that visibility is not a concern. This does not appear to be an explicit guarantee, but the OS would probably be unusable without such a guarantee. There is also no known system where this doesn't work as expected.
In pthreads, you can associate a destructor function with each per-thread storage slot. When a thread dies, if the slot is non-0, the destructor is called.
In a Win32 DLL, the DllMain function, called at thread exit, can do the same thing.
What can I do in code that lives in a purely static library?
This is a hard problem, and requires sticking callbacks in special locations. Luckily for you, it is solved in Boost.Thread. Use boost::this_thread::at_thread_exit, or boost::thread_specific_ptr
Windows supports Thread Local Storage (TLS) in a DLL. It can be very practical if you want to have memory blocks that hold a value unique to each thread. Inside any other function of the DLL you can easily get the value that corresponds to the current thread. It is very useful in some scenarios. Look here for more details.
I don't use pthreads myself, but I suppose that the per-thread storage slot was introduced to make working with TLS more comfortable.
UPDATED: From your comment I see that you misunderstood my answer. I'm not a POSIX developer; I develop in Win32 only, and your question is about the Win32 API possibilities for per-thread allocation and deallocation. I try to explain the possibilities, and you can decide for yourself which one is better for your specific scenario.
The Win32 equivalents of the pthread_XXX functions are the following:
pthread_key_create -> TlsAlloc
pthread_setspecific -> TlsSetValue
pthread_getspecific -> TlsGetValue
pthread_key_delete -> TlsFree
I do not recommend using the __declspec(thread) construct, which is more compiler-specific.
The example Using Thread Local Storage shows how to use thread local storage (TLS) without DLLs, but I personally like and use TLS only in DLL.
The destructor parameter of the pthread_key_create function has no analog in Win32, but I don't see any problem here. All C/C++ compilers support the __try {/**/} __finally {/**/} construct of Structured Exception Handling, so you can use it in the body of your thread function and implement any Cleaning up Resources exactly as you would in the main thread.
It is a pity that your question does not include an example showing how you typically use the destructor of the pthread_key_create function. I find that examples can clear up many things without a lot of words. So if my answer does not help, we can explain everything better with examples: you write an example, with perhaps a short comment on what it should do, and I could write the same code using only the Win32 API in C or C++.
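For reference, a minimal sketch of the pthreads pattern the question describes: a key is created with a destructor, each thread stores a heap-allocated value in its slot, and the destructor is called for that value when the thread exits. All names here (g_key, g_destroyed, slot_destructor, worker, run_workers) are hypothetical and not taken from the question.

```cpp
#include <pthread.h>

#include <atomic>

static pthread_key_t g_key;
static std::atomic<int> g_destroyed{0};

// Called by pthreads at thread exit for every non-null slot value.
extern "C" void slot_destructor(void* value) {
    delete static_cast<int*>(value);
    g_destroyed.fetch_add(1);
}

// Thread body: store a per-thread value and simply return.
extern "C" void* worker(void*) {
    pthread_setspecific(g_key, new int(42));
    return nullptr;  // slot_destructor runs as this thread exits
}

// Starts `count` threads one after another and returns how many times the
// destructor ran, or -1 if a pthreads call failed.
int run_workers(int count) {
    g_destroyed = 0;
    if (pthread_key_create(&g_key, slot_destructor) != 0) return -1;
    int result = 0;
    for (int i = 0; i < count; ++i) {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, worker, nullptr) != 0) {
            result = -1;
            break;
        }
        pthread_join(tid, nullptr);
    }
    pthread_key_delete(g_key);
    return result == 0 ? g_destroyed.load() : -1;
}
```

Build with -pthread. Nothing in worker frees the value explicitly; the cleanup happens only because the key was created with a destructor.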