Thread specific memory with language features - multithreading

Are there languages that support process-common memory and thread-specific memory as distinct storage classes, using language features rather than a mechanism like function calls? For example:
process int x;
thread int y;

C# offers this through the [ThreadStatic] attribute (applied to static fields).

The Visual C++ compiler allows the latter through the nonstandard __declspec(thread) extension; however, it is severely limited, since it isn't supported in dynamically loaded DLLs.
The first is mostly supported through an extern declaration - unless dynamically linked libraries come into play (which is probably the scenario you are looking for).
I am not aware of any environment that makes this as simple as you describe.

C++0x adds the "thread_local" storage specifier, so in namespace (or global) scope your example would be
int x; // normal process-wide global variable
thread_local int y; // per-thread global variable
You can also use thread_local with static when declaring class members or local variables in a function:
class Foo {
    static thread_local int x;
};

void f() {
    static thread_local int x;
}
Unfortunately, this doesn't appear to be one of the C++0x features supported by Visual Studio 2010 or planned GCC releases.

Related

VC++ native mutex heap corruption

I have a native C++ library that is used by a managed C++ application. The native library is compiled with no CLR support, and the managed C++ application with it (the /clr compiler option).
When I use a std::mutex in the native library, I get heap corruption when the owning native class is deleted. Including <mutex> is blocked by managed C++ (it cannot be compiled with /clr), so I'm guessing that could be part of the reason.
The minimal native class that demonstrates the issue is:
Header:
#pragma once
#include <stdio.h>

#ifndef __cplusplus_cli
#include <mutex>
#endif

namespace MyNamespace {

class SomeNativeLibrary
{
public:
    SomeNativeLibrary();
    ~SomeNativeLibrary();
    void DoSomething();

#ifndef __cplusplus_cli
    std::mutex aMutex;
#endif
};

}
Implementation:
#include "SomeNativeLibrary.h"

namespace MyNamespace {

SomeNativeLibrary::SomeNativeLibrary()
{}

SomeNativeLibrary::~SomeNativeLibrary()
{}

void SomeNativeLibrary::DoSomething() {
    printf("I did something.\n");
}

}
Managed C++ Console Application:
int main(array<System::String ^> ^args)
{
    Console::WriteLine(L"Unit Test Console:");
    MyNamespace::SomeNativeLibrary *someNativelib = new MyNamespace::SomeNativeLibrary();
    someNativelib->DoSomething();
    delete someNativelib;
    getchar();
    return 0;
}
The heap corruption debug error occurs when the attempt is made to delete the someNativelib pointer.
Is there anything I can do to use a std::mutex safely in the native library, or is there an alternative I could use? In my live code the mutex is used to ensure that only a single thread accesses a std::vector.
The solution was to use a CRITICAL_SECTION as the lock instead. It's actually more efficient than a mutex in my case anyway since the lock is only for threads in the same process.
Not sure whether you re-read your own post, but there is a clue in your code:

#ifndef __cplusplus_cli
    std::mutex aMutex;
#endif

The member aMutex compiles only when the condition __cplusplus_cli is undefined. So the moment you included that header in managed C++, the member vanished from the class definition.
Your native project and managed project therefore have mismatched class definitions. For beginners: that mostly ends in an access violation when something writes to a location beyond the class memory (a member that does not exist in the CLI version, if the object is instantiated there), or, to put it simply, heap corruption in managed code.
So what you have done is a no-go, always!
I was actually amazed that you managed to include the native lib and successfully compile both projects. I must ask how you hacked the project properties to manage that. Or you may just have found yet another bug ;)
About the question, for posterity:
Yes, CRITICAL_SECTION helps, and yes, it is faster than a mutex, since it is implemented within a single process and the uncontended path is a user-mode interlocked operation. It has also seen plenty of changes since its introduction, including some nasty deadlock issues; one example:
https://microsoft.public.win32.programmer.kernel.narkive.com/xS8sFPCG/criticalsection-deadlock-with-owningthread-of-zero
ended up killing the entire OS. So lock only the very small piece of code that actually accesses the data, and exit the lock immediately.
As a replacement, you could just use a plain "C" kernel mutex or events if you are not planning cross-platform support (Linux/iOS/Android/Win/MCU/...).
There are a ton of other replacements available from the Windows kernel.
// mutex
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createmutexw
HANDLE hMutex = CreateMutex(NULL, TRUE, _T("MutexName"));
NOTE: mutex naming rules apply ('Global\' vs 'Local\' namespace prefixes).
// or event
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createeventw
HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, _T("EventName"));
To either set or clear/reset the event state, just call SetEvent(hEvent) or ResetEvent(hEvent).
To wait for the signaled (set) state, again simply:
DWORD nret = WaitForSingleObject(hEvent, INFINITE);
INFINITE (defined as -1) waits forever; it can be replaced with an actual millisecond timeout, in which case the return value should be checked against WAIT_OBJECT_0. If it is not WAIT_OBJECT_0, it is most likely WAIT_TIMEOUT.
There is a sea of synchronization functions/techniques; which one to use depends on what you actually need to do:
https://learn.microsoft.com/en-us/windows/win32/sync/synchronization-functions
But use them with a sanity check: you might end up fixing the wrong thing, since each solution depends on the actual problem. The moment you think it is getting too complex, suspect that it is the wrong solution; the simplest solutions are usually the best, but always verify and validate.

Thread local boost fast_pool_allocator

I have a multithreaded (Cilk) program where each thread uses a temporary std::set. There are a lot of allocations in these std::sets, so I am trying to use a pool allocator, namely boost::fast_pool_allocator:

using allocator = boost::fast_pool_allocator<SGroup::type>;
using set = std::set<SGroup::type, std::less<SGroup::type>, allocator>;

But now the performance is much worse because of concurrent access to the allocator. One crucial fact is that the sets are never communicated among the threads, so I could use thread-local allocators. However, as the code above shows, I am not constructing allocator objects but passing template parameters to the std::set type.
So here is my question: is it possible to construct multiple boost::fast_pool_allocator instances and use them as thread-local pool allocators?
Edit: I removed stupid std::pair allocations.
EDIT
Mmm. I had an answer here that I pieced together from things I remembered seeing. However, upon further inspection it looks like all the allocators actually work with Singleton Pools that are never thread-safe without synchronization. In fact, the null_mutex is likely in a detail namespace for this very reason: it only makes sense to use it if you know the program doesn't use threads (well, outside the main thread) at all.
Aside from this apparent debacle, you could probably use object_pool directly. But it's not an allocator, so it wouldn't serve you for your container example.
Original Answer Text:
You can pass an allocator instance at construction:
#include <boost/pool/pool.hpp>
#include <boost/pool/pool_alloc.hpp>
#include <boost/thread.hpp>
#include <set>

struct SGroup
{
    int data;
    typedef int type;
};

using allocator = boost::fast_pool_allocator<SGroup::type>;
using set = std::set<SGroup::type, std::less<SGroup::type>, allocator>;

void thread_function()
{
    allocator alloc; // thread local
    set myset(set::key_compare(), alloc);
    // do stuff
}

int main()
{
    boost::thread_group group;
    for (int i = 0; i < 10; ++i)
        group.create_thread(thread_function);
    group.join_all();
}
Let me read the docs on how to disable thread-awareness on the allocator :)
Found it in an example:
typedef boost::fast_pool_allocator<SGroup::type,
        boost::default_user_allocator_new_delete,
        boost::details::pool::null_mutex> allocator;
The example in boost/libs/pool/example/time_pool_alloc.hpp should help you get started benchmarking the difference(s) in performance

Dynamic TLS in C++11

I'm writing a C++11 class Foo, and I want to give each instance its own thread-local storage of type Bar. That is to say, I want one Bar to be allocated per thread and per Foo instance.
If I were using pthreads, Foo would have a nonstatic member of type pthread_key_t, which Foo's constructor would initialize with pthread_key_create() and Foo's destructor would free with pthread_key_delete(). Or if I were writing for Microsoft Windows only, I could do something similar with TlsAlloc() and TlsFree(). Or if I were using Boost.Thread, Foo would have a nonstatic member of type boost::thread_specific_ptr.
In reality, however, I am trying to write portable C++11. C++11's thread_local keyword does not apply to nonstatic data members. So it's fine if you want one Bar per thread, but not if you want one Bar per thread per Foo.
So as far as I can tell, I need to define a thread-local map from Foos to Bars, and then deal with the question of how to clean up appropriately whenever a Foo is destroyed. But before I undertake that, I'm posting here in the hope that someone will stop me and say "There's an easier way."
(Btw, the reason I'm not using either pthread_key_create() or boost::thread_specific_ptr is because, if I understand correctly, they assume that all threads will be spawned using pthreads or Boost.Thread respectively. I don't want to make any assumptions about how the users of my code will spawn threads.)
You would like Foo to contain a thread_local variable of type Bar. Since, as noted, thread_local cannot apply to a data member, we have to do something more indirect. The underlying behavior will be for N instances of Bar to exist for each instance of Foo, where N is the number of threads in existence.
Here is a somewhat inefficient way of doing it. With more code, it could be made faster. Basically, each Foo will contain a TLS map.
#include <unordered_map>

class Bar { /* ... */ };

class Foo {
private:
    static thread_local std::unordered_map<Foo*, Bar> tls;
public:
    // All internal member functions must use this too.
    Bar *get_bar() {
        auto I = tls.find(this);
        if (I != tls.end())
            return &I->second;
        auto II = tls.emplace(this, Bar()); // could use std::piecewise_construct here...
        return &II.first->second;           // II is a pair<iterator, bool>
    }
};

// Out-of-line definition required for the static thread_local member:
thread_local std::unordered_map<Foo*, Bar> Foo::tls;

Pthread library initialization

I have created wrappers around the pthread functions using dlopen and dlsym, in order to debug and profile some issues occurring in my application. The profiler passes all of the unit tests. Unfortunately, it appears that I have bypassed some library initialization, because getenv now returns NULL for every input. If I remove the profiler, getenv operates properly again.
I believe that the issue is that the linker has not pulled in libpthread, because it does not see any symbols requested from the library at link time. I have looked through the glibc source but have not found an obvious run-time initialization function that I could load and execute with dlsym.
I have stepped into getenv and found that __environ == NULL with the overrides compiled in, even though environ contains the proper values; __environ has the proper values after removing the profiler.
Also, getenv appears to work with the pthread overrides on Ubuntu 10.04 with glibc 2.11. Unfortunately, upgrading is not an appealing option due to existing product distribution.
Linux 2.6.31, glibc 2.5
My init code:
inline int init_pthreads_debugger(void)
{
    static int recursion = 0;
    if (!real_pthread_create)
    {
        // We know that we are single-threaded here, because we override
        // pthread_create and call this function. Therefore, recursion
        // does not have to be guarded.
        if (recursion)
        {
            return 0;
        }
        recursion = 1;
        init_heap();
        void *handle = dlopen("libpthread.so.0", RTLD_NOW);
        real_pthread_cond_timedwait = (real_pthread_cond_timedwait_t)dlsym(handle, "pthread_cond_timedwait");
        // more pthread initialization functions here.
        // Do me last, to make sure any recursion in dlsym/dlopen is caught.
        real_pthread_create = (real_pthread_create_t)dlsym(handle, "pthread_create");
        recursion = 0;
    }
    return 1;
}

// an example override
int pthread_cond_timedwait(pthread_cond_t *c, pthread_mutex_t *m, const struct timespec *t)
{
    if (!init_pthreads_debugger()) return 0; // no thread, no sync needed.
    int ret;
    int condition_count;
    ptd_note_unblock((void *)m, &condition_count);
    ret = real_pthread_cond_timedwait(c, m, t);
    ptd_note_block((void *)m, &condition_count);
    return ret;
}
thanks for any help.
I have created wrappers around the pthread functions using dlopen and dlsym
I suspect that you are attempting to build a library interposer, similar to this one.
This approach is very unlikely to succeed in general for pthread functions, because both dlopen and dlsym themselves call pthread_mutex_lock and pthread_mutex_unlock (and possibly others), as does the dynamic loader itself.

Using structures to set the functions

While writing kernel modules/drivers, structures are often initialized so that their members point to specific functions. As a beginner, could someone explain the importance of this idiom?
I came across struct file_operations while writing a character device driver.
I also found that even though the functions are referenced, they are not always implemented in the same file. Could anyone help with that too? For example, in the kernel source file kernel/dma.c, even though
static const struct file_operations proc_dma_operations = {
    .open    = proc_dma_open,
    .read    = seq_read,
    .llseek  = seq_lseek,
    .release = single_release,
};
all four operations are assigned, only proc_dma_open is implemented in that file.
The functions seq_read, seq_lseek and single_release are declared in the kernel source file linux-3.1.6/include/linux/seq_file.h and defined in the kernel source file linux-3.1.6/fs/seq_file.c. They are probably common to many file operations.
If you ever played with object-oriented languages like C++, think of file_operations as a base class, and your functions as being implementations of its virtual methods.
Pointers to functions are a very powerful tool in the C language that allows for run-time redirection of function calls. Most if not all operating systems have a similar mechanism; for example, the infamous INT 21h functions 25h/35h in old MS-DOS allowed TSR programs to exist.
In C, you can assign the address of a function to a variable and then call the function through that variable. The target function can be changed either at init time, based on some parameters, or at runtime, based on some behavior.
Here is an example:

#include <stdio.h>

int fn(int a)
{
    /* ... */
    return a;
}

int (*dynamic_fn)(int);  /* pointer to a function taking int, returning int */

int main(void)
{
    dynamic_fn = &fn;       /* or simply: dynamic_fn = fn; */
    int i = dynamic_fn(0);  /* calls fn through the pointer */
    printf("%d\n", i);
    return 0;
}
When the pointer "lives" in a structure that can be passed to system calls, this is a very powerful feature that allows hooks into system functions.
In object oriented languages, the same kind of behavior can be achieved by using reflection to instantiate classes dynamically.
