VC++ native mutex heap corruption - visual-c++

I have a native c++ library that gets used by a managed C++ application. The native library is compiled with no CLR support and the managed C++ application with it (/CLR compiler option).
When I use a std::mutex in the native library I get a heap corruption when the owning native class is deleted. The use of mutex.h is blocked by managed C++ so I'm guessing that could be part of the reason.
The minimal native class that demonstrates the issue is:
Header:
#pragma once
#include <stdio.h>
#ifndef __cplusplus_cli
#include <mutex>
#endif
namespace MyNamespace {
class SomeNativeLibrary
{
public:
SomeNativeLibrary();
~SomeNativeLibrary();
void DoSomething();
#ifndef __cplusplus_cli
std::mutex aMutex;
#endif
};
}
Implementation:
#include "SomeNativeLibrary.h"
namespace MyNamespace {
SomeNativeLibrary::SomeNativeLibrary()
{}
SomeNativeLibrary::~SomeNativeLibrary()
{}
void SomeNativeLibrary::DoSomething(){
printf("I did something.\n");
}
}
Managed C++ Console Application:
int main(array<System::String ^> ^args)
{
Console::WriteLine(L"Unit Test Console:");
MyNamespace::SomeNativeLibrary *someNativelib = new MyNamespace::SomeNativeLibrary();
someNativelib->DoSomething();
delete someNativelib;
getchar();
return 0;
}
The heap corruption debug error occurs when the attempt is made to delete the someNativeLib pointer.
Is there anything I can do to use a std::mutex safely in the native library or is there an alternative I could use? In my live code the mutex is used for is to ensure that only a single thread accesses a std::vector.

The solution was to use a CRITICAL_SECTION as the lock instead. It's actually more efficient than a mutex in my case anyway since the lock is only for threads in the same process.

Not sure were you reading your own post but there is a clue in your code:
#ifndef __cplusplus_cli
std::mutex aMutex;
#endif
Member 'aMutex' compiles only if compile condition '__cplusplus_cli' is undefined.
So the moment you included that header in Managed C++ it vanished from definition.
So your Native project and Managed project have mismatch in class definition for beginners == mostly ends in Access Violation if attempted to write to location beyond class memory (non existing member in CLI version if instantiated there).
Or just HEAP CORRUPTION in managed code to put it simply.
So what you have done is no go, ever!
But I was actually amazed that you managed to include native lib, and successfully compile both projects. I must ask how did you hack project properties to manage that. Or you just may have found yet another bug ;)
About question: for posterity
Yes CRITICAL_SECTION helps, Yes it's more faster from mutex since it is implemented in single process and some versions of it even in hardware (!). Also had plenty of changes since it's introduction and some nasty DEAD-LOCKS issues.
example: https://microsoft.public.win32.programmer.kernel.narkive.com/xS8sFPCG/criticalsection-deadlock-with-owningthread-of-zero
end up just killing entire OS. So lock only very small piece of code that actually only accesses the data, and exit locks immediately.
As replacement, you could just use plain "C" kernel mutex or events if not planning cross-platform support (Linux/iOS/Android/Win/MCU/...).
There is a ton of other replacements coming from Windows kernel.
// mutex
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createmutexw
HANDLE hMutex = CreateMutex(NULL, TRUE, _T("MutexName"));
NOTE: mutex name rules 'Global' vs 'Local'.
// or event
https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createeventw
HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, _T("EventName"));
to either set, or clear/reset event state, just call SetEvent(hEvent), or ResetEvent(hEvent).
To wait for signal (set) again simply
int nret = WaitForSingleObject(hEvent, -1);
INFINITE define (-1) wait for infinity, a value which can be replaced with actual milliseconds timeout, and then return value could be evaluated against S_OK. If not S_OK it's most likely TIMEOUT.
There is a sea of synchronization functions/techniques depending on which you actually need to do !?:
https://learn.microsoft.com/en-us/windows/win32/sync/synchronization-functions
But use it with sanity check.
Since you might be just trying to fix wrong thing in the end, each solution depends on the actual problem. And the moment you think it's too complex know it's wrong solution, simplest solutions were always best, but always verify and validate.

Related

Memory leak if VCL Exception object is thrown with Message

I find it hard to believe, but code that throws a VCL Exception somehow leaks memory.
Have created a MVE to convince myself that this is really happening.
So here's a basic C++ console application, using VCL, that just repeatedly throws the same exception and tries to catch it.
#include <vcl.h>
#include <windows.h>
#pragma hdrstop
#pragma argsused
#include <tchar.h>
#include <stdio.h>
int _tmain(int argc, _TCHAR* argv[])
{
while (true){
try {
throw Exception(L"This is my Exception Message");
} catch (const Exception & e) {
}
}
return 0;
}
When you run this code outside the debugger, it leaks like a seave.
If you run this code under the debugger, it leaks at a slower rate.
If you instead pass an integer (i.e. throw Exception(42)), there is still a leak.
At this point I was hoping to avoid the complicated dance that UnicodeString performs.
The question is: why does this leak?
Have I missed something or am I using Exception the wrong way?
Found this to happen at least with XE7.
With XE11, the leak only occurs if the exception is thrown from a subroutine.
(these are the only versions available to me).
We have the JCL library installed, if that's a factor.
In my experience, exceptions often lead to destructors not being called for local variables (even for non-VCL classes). In this case, it seems destructor isn't even called for Exception class itself.
Possible solution is to update C++ Builder and stop using the classic compiler (Project Options -> C++ Compiler).
The exact memory leak in your question is reported fixed in version 10.3.0 (RSP-11916), and reported broken again (still open) in version 10.3.2 (RSP-27271). Another report says this only occurs in modern versions when optimizations are not enabled (RSP-30118).

vector::empty() function doesn't work correctly in release mode

#include<iostream>
#include<vector>
#include<thread>
#include<string>
using namespace std;
vector<string> s;
void add()
{
while(true)
{
getchar();
s.push_back("added");
}
}
void show()
{
while(true)
{
//cout<<"";
while(!s.empty())
{
cout<<(*s.begin())<<endl;
s.erase(s.begin());
}
}
}
int main()
{
thread one(add);
thread two(show);
one.join();
two.join();
}
In debug mode there is no such a problem. In release mode if the comment line is uncommented it works again. But with just like this, there is a problem. What is the problem?
std::vector (as any other std:: container) is not generally thread-safe. It means that concurrent modifying access to the same vector from multiple thread is generally not supported. What that means is that while you can call non-modifying functions of the vector from many threads at the same time (for instance, you can call begin() and end() with no problems), modification functions should have exclusive access to the vector object. To achieve this exclusivity, you need to use thread-synchronization primitives to 'signal' your intention to obtain exclusive access to the vector, perform your modification and than 'signal' that exclusive access is no longer need.
Note, this is not enough to perform that sort of routine when you modify (insert) data to the vector. You will also have to do the same dance when you read data from the vector, since modifications need exclusive access, and even the read will violate this exclusivity. The non-technical term I've used here, 'signalling', has a technical counterpart - it is called critical section. Here we say that you 'enter critical section' and 'leave critical section'.
There a more than one way to enter and leave critical section. The stapples of this are so-called mutexes, and they should be enough for your learning. Just keep in mind there are other ways as well, which you'll learn in the due course.

Thread local boost fast_pool_allocator

I've a multithreaded (Cilk) program where each thread use a temporary
std::set. There are a lot of allocations on these std::set so that I'm
trying to use some pool allocators namely boost::fast_pool_allocator:
using allocator = boost::fast_pool_allocator< SGroup::type >;
using set = std::set<SGroup::type, std::less<SGroup::type>, allocator>;
But now the performances are much worse because of concurrent access to the
allocator. One crucial fact is that the sets are never communicated among the
threads so that I can use a thread local allocators. However, as shown in the
previous code, I'm not constructing allocator objects but passing template
parameters to the std::set constructor.
So here is my question: is it possible to construct multiple
boost::fast_pool_allocator to use them as thread local pool allocator ?
Edit : I removed stupid std::pair allocations.
EDIT
Mmm. I had an answer here that I pieced together from things I remembered seeing. However, upon further inspection it looks like all the allocators actually work with Singleton Pools that are never thread safe without synchronization. In fact, the null_mutex is likely in a detail namespace for this very reason: it only makes sense to use it if you know the program doesn't use threads (well, outisde the main thread) at all.
Aside from this apparent debacle, you could probably use object_pool directly. But it's not an allocator, so it wouldn't serve you for your container example.
Original Answer Text:
You can pass an allocator instance at construction:
#include <boost/pool/pool.hpp>
#include <boost/pool/pool_alloc.hpp>
#include <boost/thread.hpp>
#include <set>
struct SGroup
{
int data;
typedef int type;
};
using allocator = boost::fast_pool_allocator<SGroup::type>;
using set = std::set<SGroup::type, std::less<SGroup::type>, allocator>;
void thread_function()
{
allocator alloc; // thread local
set myset(set::key_compare(), alloc);
// do stuff
}
int main()
{
boost::thread_group group;
for (int i = 0; i<10; ++i)
group.create_thread(thread_function);
group.join_all();
}
Let me read the docs on how to disable thread-awareness on the allocator :)
Found it in an example:
typedef boost::fast_pool_allocator<SGroup::type,
boost::default_user_allocator_new_delete,
boost::details::pool::null_mutex> allocator;
The example in boost/libs/pool/example/time_pool_alloc.hpp should help you get started benchmarking the difference(s) in performance

Use of Tcl in c++ multithreaded application

I am facing crash while trying to create one tcl interpreter per thread. I am using TCL version 8.5.9 on linux rh6. It crashes in different functions each time seems some kind of memory corruption. Going through net it seems a valid approach. Has anybody faced similar issue? Does multi-threaded use of Tcl need any kind of special support?
Here is the following small program causing crash with tcl version 8.5.9.
#include <tcl.h>
#include <pthread.h>
void* run (void*)
{
Tcl_Interp *interp = Tcl_CreateInterp();
sleep(1);
Tcl_DeleteInterp(interp);
}
main ()
{
pthread_t t1, t2;
pthread_create(&t1, NULL, run, NULL);
pthread_create(&t2, NULL, run, NULL);
pthread_join (t1, NULL);
pthread_join (t2, NULL);
}
The default Tcl library isn't built thread enabled. (well, not with 8.5.9 afaik, 8.6 is).
So did you check that your tcl lib was built thread enabled?
If you have a tclsh built against the lib, you can simply run:
% parray ::tcl_platform
::tcl_platform(byteOrder) = littleEndian
::tcl_platform(machine) = intel
::tcl_platform(os) = Windows NT
::tcl_platform(osVersion) = 6.2
::tcl_platform(pathSeparator) = ;
::tcl_platform(platform) = windows
::tcl_platform(pointerSize) = 4
::tcl_platform(threaded) = 1
::tcl_platform(wordSize) = 4
If ::tcl_platform(threaded) is 0, your build isn't thread enabled. You would need to build a version with thread support by passing --enable-threads to the configure script.
Did you use the correct defines to declare you want the thread enabled Macros from tcl.h?
You should add -DTCL_THREADS to your compiler invocation, otherwise the locking macros are compiled as no-ops.
You need to use a thread-enabled build of the library.
When built without thread-enabling, Tcl internally uses quite a bit of global static data in places like memory management. It's pretty pervasive. While it might be possible to eventually make things work (provided you do all the initialisation and setup within a single thread) it's going to be rather unadvisable. That things crash in strange ways in your case isn't very surprising at all.
When you use a thread-enabled build of Tcl, all that global static data is converted to either thread-specific data or to appropriate mutex-guarded global data. That then allows Tcl to be used from many threads at once. However, a particular Tcl_Interp is bound to the thread that created it (as it uses lots of thread-specific data). In your case, that will be no problem; your interpreters are happily per-thread entities.
(Well, provided you also add a call to initialise the Tcl library itself, which only needs to be done once. Put Tcl_FindExecutable(NULL); inside main() before you create any of those threads.)
Tcl 8.5 defaulted to not being thread-enabled on Unix for backward-compatibility reasons — on Windows and Mac OS X it was thread-enabled due to the different ways they handle low-level events — but this was changed in 8.6. I don't know how to get a thread-enabled build on RH6 (other than building it yourself from source, which should be straight-forward).

Double-Checked Locking Pattern in C++11?

The new machine model of C++11 allows for multi-processor systems to work reliably, wrt. to reorganization of instructions.
As Meyers and Alexandrescu pointed out the "simple" Double-Checked Locking Pattern implementation is not safe in C++03
Singleton* Singleton::instance() {
if (pInstance == 0) { // 1st test
Lock lock;
if (pInstance == 0) { // 2nd test
pInstance = new Singleton;
}
}
return pInstance;
}
They showed in their article that no matter what you do as a programmer, in C++03 the compiler has too much freedom: It is allowed to reorder the instructions in a way that you can not be sure that you end up with only one instance of Singleton.
My question is now:
Do the restrictions/definitions of the new C++11 machine model now constrain the sequence of instructions, that the above code would always work with a C++11 compiler?
How does a safe C++11-Implementation of this Singleton pattern now looks like, when using the new library facilities (instead of the mock Lock here)?
If pInstance is a regular pointer, the code has a potential data race -- operations on pointers (or any builtin type, for that matter) are not guaranteed to be atomic (EDIT: or well-ordered)
If pInstance is an std::atomic<Singleton*> and Lock internally uses an std::mutex to achieve synchronization (for example, if Lock is actually std::lock_guard<std::mutex>), the code should be data race free.
Note that you need both explicit locking and an atomic pInstance to achieve proper synchronization.
Since static variable initialization is now guaranteed to be threadsafe, the Meyer's singleton should be threadsafe.
Singleton* Singleton::instance() {
static Singleton _instance;
return &_instance;
}
Now you need to address the main problem: there is a Singleton in your code.
EDIT: based on my comment below: This implementation has a major drawback when compared to the others. What happens if the compiler doesn't support this feature? The compiler will spit out thread unsafe code without even issuing a warning. The other solutions with locks will not even compile if the compiler doesn't support the new interfaces. This might be a good reason not to rely on this feature, even for things other than singletons.
C++11 doesn't change the meaning of that implementation of double-checked locking. If you want to make double-checked locking work you need to erect suitable memory barriers/fences.

Resources