Pthread library initialization - Linux

I have created wrappers around the pthread functions using dlopen and dlsym in order to debug and profile some issues occurring in my application. The profiler passes all of the unit tests. Unfortunately, it appears that I have bypassed some library initialization, because getenv now returns NULL for every input. If I remove the profiler, getenv works correctly again.
I believe the issue is that the linker has not pulled in libpthread, because it does not see any symbols requested from that library at link time. I have looked through the glibc source but have not found an obvious run-time initialization function that I can load and execute with dlsym.
I have stepped into getenv and found that __environ == NULL with the overrides compiled in, even though environ contains the proper values. __environ has the proper values after removing the profiler.
Also, getenv appears to work with the pthread overrides on Ubuntu 10.04 with glibc 2.11. Unfortunately, upgrading is not an appealing option because of existing product distribution.
Linux kernel 2.6.31
glibc 2.5
My init code:
// (dlfcn.h provides dlopen/dlsym; the real_* pointers, their typedefs, and the
// ptd_*/init_heap helpers are declared elsewhere in the profiler.)
#include <pthread.h>
#include <dlfcn.h>

inline int init_pthreads_debugger(void)
{
    static int recursion = 0;
    if (!real_pthread_create)
    {
        // We know that we are single threaded here, because we override
        // pthread_create and call this function. Therefore, recursion does
        // not have to be guarded.
        if (recursion)
        {
            return 0;
        }
        recursion = 1;
        init_heap();
        void *handle = dlopen("libpthread.so.0", RTLD_NOW);
        real_pthread_cond_timedwait = (real_pthread_cond_timedwait_t)dlsym(handle, "pthread_cond_timedwait");
        // more pthread initialization functions here.
        // do me last to make sure any recursion in dlsym/dlopen is caught
        real_pthread_create = (real_pthread_create_t)dlsym(handle, "pthread_create");
        recursion = 0;
    }
    return 1;
}

// an example override
int pthread_cond_timedwait(pthread_cond_t *c, pthread_mutex_t *m, const struct timespec *t)
{
    if (!init_pthreads_debugger()) return 0; // no thread, no sync needed.
    int ret;
    int condition_count;
    ptd_note_unblock((void *)m, &condition_count);
    ret = real_pthread_cond_timedwait(c, m, t);
    ptd_note_block((void *)m, &condition_count);
    return ret;
}
Thanks for any help.

I have created wrappers around the pthread functions using dlopen and dlsym
I suspect that you are attempting to build a library interposer, similar to this one.
This approach is very unlikely to succeed in general for pthread functions, because both dlopen and dlsym themselves call pthread_mutex_lock and pthread_mutex_unlock (and possibly others), as does the dynamic loader itself.
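For reference, here is a minimal sketch (not the original poster's code) of what such an interposer typically looks like on Linux, using dlsym(RTLD_NEXT, ...) instead of a separate dlopen of libpthread.so.0. The recursion problem described above still applies, because dlsym itself may take locks; the build and preload commands in the comment are illustrative.

// build roughly as: g++ -shared -fPIC interpose.cpp -o libinterpose.so -ldl
// and load with:    LD_PRELOAD=./libinterpose.so ./app        (illustrative names)
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <pthread.h>

typedef int (*real_pthread_create_t)(pthread_t *, const pthread_attr_t *,
                                     void *(*)(void *), void *);

extern "C" int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                              void *(*start_routine)(void *), void *arg)
{
    static real_pthread_create_t real_create = 0;
    if (!real_create) {
        // RTLD_NEXT resolves the next occurrence of the symbol in search order
        // (i.e. the real libpthread), with no explicit dlopen() needed.
        real_create = (real_pthread_create_t)dlsym(RTLD_NEXT, "pthread_create");
    }
    // ... profiling hooks would go here ...
    return real_create(thread, attr, start_routine, arg);
}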

Related

C++ - User-Level Threads - sigaction by SIGVTALRM

I've found an evil bug in my user-level threads library.
My scheduler is actually a singleton class that initializes a signal timer this way:
sigAlarm_ is a member field of the scheduler, of type struct sigaction.
This is the relevant part of the scheduler initialization:
sigAlarm_.sa_handler = timerHandlerGlobal; // Assign the first field of sigAlarm (sa_handler) as needed, others zeroed
if (sigaction(SIGVTALRM, &sigAlarm_, nullptr) != 0) { uthreadSystemError("sigaction"); }
Now, this timerHandlerGlobal is a static function and not a member function of the scheduler, as C++ doesn't permit passing member functions this way.
Now, when I terminate the main thread of the library (which actually runs the scheduler), I invoke std::exit(1), which cleans up the resources.
When I run my tests with ASan (AddressSanitizer), in some executions it gets into timerHandlerGlobal while the scheduler is already nullptr!
I've been on this for two days now, trying to find the cause.
I see that if I add this ugly guard, no problem appears under ASan:
void timerHandlerGlobal(int signo)
{
    if (scheduler_manager)
    {
        scheduler_manager->timerHandler(signo);
    }
}
But why, after the scheduler has invoked std::exit(1), is the sigaction.sa_handler (which is timerHandlerGlobal) still being run?
Please tell me you know why; I just want to get rid of this awful condition.
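For what it's worth, one way to avoid the guard entirely, assuming the SIGVTALRM ticks come from setitimer(ITIMER_VIRTUAL, ...) (which is not shown in the post), is to disarm the timer and detach the handler before calling std::exit. A rough sketch, with the shutdown_scheduler name made up for illustration:

#include <csignal>
#include <cstdlib>
#include <sys/time.h>

void shutdown_scheduler()
{
    // Disarm the virtual timer so no new SIGVTALRM is generated.
    struct itimerval off = {};
    setitimer(ITIMER_VIRTUAL, &off, nullptr);

    // Ignore any SIGVTALRM that is already pending, so timerHandlerGlobal
    // cannot run again while std::exit tears the scheduler down.
    struct sigaction sa = {};
    sa.sa_handler = SIG_IGN;
    sigaction(SIGVTALRM, &sa, nullptr);

    std::exit(1);
}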

VC++ native mutex heap corruption

I have a native C++ library that is used by a managed C++ application. The native library is compiled with no CLR support, and the managed C++ application is compiled with it (the /clr compiler option).
When I use a std::mutex in the native library, I get a heap corruption when the owning native class is deleted. The use of <mutex> is blocked when compiling as managed C++, so I'm guessing that could be part of the reason.
The minimal native class that demonstrates the issue is:
Header:
#pragma once
#include <stdio.h>

#ifndef __cplusplus_cli
#include <mutex>
#endif

namespace MyNamespace {
    class SomeNativeLibrary
    {
    public:
        SomeNativeLibrary();
        ~SomeNativeLibrary();
        void DoSomething();

#ifndef __cplusplus_cli
        std::mutex aMutex;
#endif
    };
}
Implementation:
#include "SomeNativeLibrary.h"
namespace MyNamespace {
SomeNativeLibrary::SomeNativeLibrary()
{}
SomeNativeLibrary::~SomeNativeLibrary()
{}
void SomeNativeLibrary::DoSomething(){
printf("I did something.\n");
}
}
Managed C++ Console Application:
int main(array<System::String ^> ^args)
{
    Console::WriteLine(L"Unit Test Console:");
    MyNamespace::SomeNativeLibrary *someNativelib = new MyNamespace::SomeNativeLibrary();
    someNativelib->DoSomething();
    delete someNativelib;
    getchar();
    return 0;
}
The heap corruption debug error occurs when the attempt is made to delete the someNativeLib pointer.
Is there anything I can do to use a std::mutex safely in the native library, or is there an alternative I could use? In my live code the mutex is used to ensure that only a single thread accesses a std::vector.
The solution was to use a CRITICAL_SECTION as the lock instead. It's actually more efficient than a mutex in my case anyway since the lock is only for threads in the same process.
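For illustration, here is a minimal sketch of what such a CRITICAL_SECTION-based lock might look like; the Add method, the member names, and the std::vector<int> are made up, not taken from the post. Because CRITICAL_SECTION is a plain Win32 struct, the same class definition compiles identically with and without /clr:

#include <windows.h>
#include <vector>

namespace MyNamespace {
    class SomeNativeLibrary
    {
    public:
        SomeNativeLibrary()  { InitializeCriticalSection(&lock_); }
        ~SomeNativeLibrary() { DeleteCriticalSection(&lock_); }

        void Add(int value)
        {
            EnterCriticalSection(&lock_);   // only threads of this process contend here
            data_.push_back(value);
            LeaveCriticalSection(&lock_);
        }

    private:
        CRITICAL_SECTION lock_;             // plain C struct, same size under /clr
        std::vector<int> data_;
    };
}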
Not sure whether you read your own post, but there is a clue in your code:
#ifndef __cplusplus_cli
std::mutex aMutex;
#endif
The member 'aMutex' is compiled only if the macro '__cplusplus_cli' is undefined.
So the moment you include that header in managed C++, the member vanishes from the class definition.
So, for starters, your native project and your managed project disagree about the class definition, which mostly ends in an access violation when something writes to a location beyond the class memory (the member that does not exist in the CLI version, if it is instantiated there).
Or, to put it simply, heap corruption in managed code.
So what you have done is a no-go, ever!
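Another common way out, not mentioned in this answer, is the pimpl idiom: hide the std::mutex behind an opaque pointer so both compilations see exactly the same class layout. A rough sketch (the Impl name is illustrative):

// SomeNativeLibrary.h - identical for native and /clr translation units
#pragma once
#include <memory>

namespace MyNamespace {
    class SomeNativeLibrary
    {
    public:
        SomeNativeLibrary();
        ~SomeNativeLibrary();
        void DoSomething();

    private:
        struct Impl;                  // defined only in the native .cpp
        std::unique_ptr<Impl> impl;   // same size in every build
    };
}

// SomeNativeLibrary.cpp - compiled without /clr, so <mutex> is fine here
#include <mutex>

struct MyNamespace::SomeNativeLibrary::Impl
{
    std::mutex aMutex;
};

MyNamespace::SomeNativeLibrary::SomeNativeLibrary() : impl(new Impl) {}
MyNamespace::SomeNativeLibrary::~SomeNativeLibrary() {}
void MyNamespace::SomeNativeLibrary::DoSomething() { /* lock impl->aMutex as needed */ }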
But I was actually amazed that you managed to include the native library and successfully compile both projects. I have to ask how you hacked the project properties to manage that; or you may just have found yet another bug ;)
About the question, for posterity:
Yes, CRITICAL_SECTION helps, and yes, it is faster than a mutex since it is implemented within a single process (and some versions of it even with hardware support). It has also seen plenty of changes since its introduction, along with some nasty deadlock issues.
Example: https://microsoft.public.win32.programmer.kernel.narkive.com/xS8sFPCG/criticalsection-deadlock-with-owningthread-of-zero
which ended up killing the entire OS. So lock only the very small piece of code that actually accesses the data, and release the lock immediately.
As a replacement, you could just use plain "C" kernel mutexes or events if you are not planning cross-platform support (Linux/iOS/Android/Win/MCU/...).
There are a ton of other replacements coming from the Windows kernel.
// mutex
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createmutexw
HANDLE hMutex = CreateMutex(NULL, TRUE, _T("MutexName"));
NOTE: mutex names are subject to the 'Global\' vs. 'Local\' namespace rules.
// or event
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createeventw
HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, _T("EventName"));
To set or clear/reset the event state, just call SetEvent(hEvent) or ResetEvent(hEvent).
To wait for the signaled (set) state, simply call
DWORD nret = WaitForSingleObject(hEvent, INFINITE);
INFINITE (defined as -1) waits forever; it can be replaced with an actual timeout in milliseconds, and the return value should then be checked against WAIT_OBJECT_0. If it is not WAIT_OBJECT_0, it is most likely WAIT_TIMEOUT.
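A compact, self-contained example of the pattern (the 5-second timeout is illustrative):

#include <windows.h>

int main()
{
    HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, NULL); // manual-reset, initially non-signaled

    // ... normally another thread would eventually call SetEvent(hEvent) ...
    SetEvent(hEvent);

    DWORD nret = WaitForSingleObject(hEvent, 5000);       // wait up to 5 seconds
    if (nret == WAIT_OBJECT_0) {
        // the event was signaled within the timeout
    } else if (nret == WAIT_TIMEOUT) {
        // the timeout elapsed without a signal
    }

    CloseHandle(hEvent);
    return 0;
}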
There is a sea of synchronization functions/techniques, depending on what you actually need to do:
https://learn.microsoft.com/en-us/windows/win32/sync/synchronization-functions
But use them with a sanity check.
You might just be fixing the wrong thing in the end; each solution depends on the actual problem. The moment you think it is too complex, assume it is the wrong solution. The simplest solutions are usually the best, but always verify and validate.

Thread local boost fast_pool_allocator

I have a multithreaded (Cilk) program where each thread uses a temporary
std::set. There are a lot of allocations on these std::sets, so I am
trying to use a pool allocator, namely boost::fast_pool_allocator:
using allocator = boost::fast_pool_allocator< SGroup::type >;
using set = std::set<SGroup::type, std::less<SGroup::type>, allocator>;
But now the performance is much worse because of concurrent access to the
allocator. One crucial fact is that the sets are never shared among the
threads, so I could use thread-local allocators. However, as shown in the
code above, I am not constructing allocator objects but only passing template
parameters to std::set.
So here is my question: is it possible to construct multiple
boost::fast_pool_allocator instances and use them as thread-local pool allocators?
Edit : I removed stupid std::pair allocations.
EDIT
Mmm. I had an answer here that I pieced together from things I remembered seeing. However, upon further inspection, it looks like all the allocators actually work with singleton pools that are never thread safe without synchronization. In fact, the null_mutex is likely in a detail namespace for this very reason: it only makes sense to use it if you know the program doesn't use threads (well, outside the main thread) at all.
Aside from this apparent debacle, you could probably use object_pool directly. But it's not an allocator, so it wouldn't serve you for your container example.
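If you do go the object_pool route, here is a rough sketch of the idea (the Node type is made up, and as said above this does not plug into std::set as an allocator):

#include <boost/pool/object_pool.hpp>

struct Node { int value; };

void thread_function()
{
    // one pool per thread, so the pool itself needs no locking
    static thread_local boost::object_pool<Node> pool;

    Node* n = pool.construct();   // allocate and construct from the thread's pool
    n->value = 42;
    pool.destroy(n);              // destroy and return the memory to the pool
}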
Original Answer Text:
You can pass an allocator instance at construction:
#include <boost/pool/pool.hpp>
#include <boost/pool/pool_alloc.hpp>
#include <boost/thread.hpp>
#include <set>
struct SGroup
{
    int data;
    typedef int type;
};

using allocator = boost::fast_pool_allocator<SGroup::type>;
using set = std::set<SGroup::type, std::less<SGroup::type>, allocator>;

void thread_function()
{
    allocator alloc; // thread local
    set myset(set::key_compare(), alloc);
    // do stuff
}

int main()
{
    boost::thread_group group;
    for (int i = 0; i < 10; ++i)
        group.create_thread(thread_function);
    group.join_all();
}
Let me read the docs on how to disable thread-awareness on the allocator :)
Found it in an example:
typedef boost::fast_pool_allocator<SGroup::type,
        boost::default_user_allocator_new_delete,
        boost::details::pool::null_mutex> allocator;
The example in boost/libs/pool/example/time_pool_alloc.hpp should help you get started benchmarking the difference(s) in performance.

Mutex not initialized

Does anyone know whether, when creating a mutex, it is mandatory to initialize it, or can I lock it directly without calling pthread_mutex_init?
I have written a sample application that simulates a deadlock, just to make sure the mutexes work, and I have declared two mutexes (to create the deadlock) in the following way:
static pthread_mutex_t fastmutex1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fastmutex2 = PTHREAD_MUTEX_INITIALIZER;
The deadlock works perfectly, which makes sense since the mutexes are initialized with some defaults.
On the other hand when doing the exact same thing with this:
static pthread_mutex_t fastmutex1;
static pthread_mutex_t fastmutex2;
I expected that not to work, but the deadlock appeared in exactly the same way as in the previous example.
By the way, I am running this on Linux kernel 2.6.18.
Thanks for the help.
According to this documentation (and everything else I've ever read or personally done with pthreads):
Mutex variables must be declared with type pthread_mutex_t, and must be initialized before they can be used.
I suspect anything else is going to trigger undefined behavior.
On my Debian/Sid/AMD64 system, /usr/include/pthread.h contains
# define PTHREAD_MUTEX_INITIALIZER \
{ { 0, 0, 0, 0, 0, 0, { 0, 0 } } }
This means that (on my system) a pthread_mutex_t is validly initialized to all zeros. And a static variable is initialized (in C) to all zeros, which happens to be the same thing at runtime (and explains the behavior you got).
However, there is no guarantee that PTHREAD_MUTEX_INITIALIZER will stay the same, or that it is all zeros on other systems. So you had better explicitly initialize a static pthread_mutex_t variable with it.
PTHREAD_MUTEX_INITIALIZER is usually used for statically allocated mutexes (so syntax 1 is the right way to go).
Various implementations (AIX, Linux, Solaris) happen to behave the way you observed.
In all other cases you should, by default, initialize the mutex explicitly, as in
pthread_mutex_init(&mutex, 0);
which initializes it with default attributes (equivalent to PTHREAD_MUTEX_INITIALIZER).
Keep in mind that error checking can be done (and is actually done this way in the STL) later, when trying to acquire the mutex:
static int e = pthread_mutex_lock(&mutex);
if (e) {
    throw std::string("Everything is crazy in here");
}
since the return value will be EINVAL if the mutex has not been initialized.
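For completeness, here is a minimal sketch contrasting the two initialization styles and checking the return value of pthread_mutex_init (the error handling shown is only illustrative):

#include <pthread.h>
#include <cstdio>
#include <cstring>

// A mutex with static storage can use the static initializer, as in the question:
//     static pthread_mutex_t fastmutex1 = PTHREAD_MUTEX_INITIALIZER;
// Any other mutex should be initialized (and destroyed) explicitly:
int main()
{
    pthread_mutex_t m;
    int e = pthread_mutex_init(&m, NULL);      // default attributes
    if (e != 0) {
        std::fprintf(stderr, "pthread_mutex_init: %s\n", std::strerror(e));
        return 1;
    }

    pthread_mutex_lock(&m);
    /* ... critical section ... */
    pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);                 // pair every init with a destroy
    return 0;
}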

Thread specific memory with language features

Are there languages that support process-common memory in one address space and thread-specific memory in another address space using language features, rather than a mechanism like function calls?
process int x;
thread int y;
ThreadStatic attribute in C#
The Visual C++ compiler allows the latter through the nonstandard __declspec(thread) extension - however, it is severely limited, since it isn't supported in dynamically loaded DLLs.
The first is mostly supported through an extern declaration - unless dynamically linked libraries come into play (which is probably the scenario you are looking for).
I am not aware of any environment that makes this as simple as you describe.
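For reference, the compiler-specific spellings mentioned above can be wrapped in a small macro; this is only a sketch, and the THREAD_LOCAL name is made up:

#if defined(_MSC_VER)
  #define THREAD_LOCAL __declspec(thread)
#elif defined(__GNUC__)
  #define THREAD_LOCAL __thread
#endif

int x;               // one copy shared by the whole process
THREAD_LOCAL int y;  // one copy per thread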
C++0x adds the "thread_local" storage specifier, so in namespace (or global) scope your example would be
int x; // normal process-wide global variable
thread_local int y; // per-thread global variable
You can also use thread_local with static when declaring class members or local variables in a function:
class Foo {
    static thread_local int x;
};

void f() {
    static thread_local int x;
}
Unfortunately, this doesn't appear to be one of the C++0x features supported by Visual Studio 2010 or planned GCC releases.
