I am getting the following memory leak report. It is probably caused by std::string.
How can I avoid it?
PLK: 23 bytes potentially leaked at 0xeb68278
* Suppressed in /vobs/ubtssw_brrm/test/testcases/.purify [line 3]
* This memory was allocated from:
malloc [/vobs/ubtssw_brrm/test/test_build/linux-x86/rtlib.o]
operator new(unsigned) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
operator new(unsigned) [/vobs/ubtssw_brrm/test/test_build/linux-x86/rtlib.o]
std::string<char, std::char_traits<char>, std::allocator<char>>::_Rep::_S_create(unsigned, unsigned, std::allocator<char> const&) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
std::string<char, std::char_traits<char>, std::allocator<char>>::_Rep::_M_clone(std::allocator<char> const&, unsigned) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
std::string<char, std::char_traits<char>, std::allocator<char>>::string<char, std::char_traits<char>, std::allocator<char>>(std::string<char, std::char_traits<char>, std::allocator<char>> const&) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
uec_UEDir::getEntryToUpdateAfterInsertion(rcapi_ImsiGsmMap const&, rcapi_ImsiGsmMap&, std::_Rb_tree_iterator<std::pair<std::string<char, std::char_traits<char>, std::allocator<char>> const, UEDirData >>&) [/vobs/ubtssw_brrm/uectrl/linux-x86/../src/uec_UEDir.cc:2278]
uec_UEDir::addUpdate(rcapi_ImsiGsmMap const&, LocalUEDirInfo&, rcapi_ImsiGsmMap&, int, unsigned char) [/vobs/ubtssw_brrm/uectrl/linux-x86/../src/uec_UEDir.cc:282]
ucx_UEDirHandler::addUpdateUEDir(rcapi_ImsiGsmMap, UEDirUpdateType, acap_PresenceEvent) [/vobs/ubtssw_brrm/ucx/linux-x86/../src/ucx_UEDirHandler.cc:374]
I once had a case where Valgrind indicated I had leaks in std::string, but I couldn't see how. It turned out that I was leaking another object that held strings by value, but Valgrind correctly also caught the leaked string memory (which was the vast majority being leaked). I suspect that uec_UEDir isn't managing its strings correctly or is being leaked itself. I actually ended up finding my problem by very careful code inspection.
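The situation described above can be reproduced in a few lines. This is only a sketch: `Entry`, `leaky_lookup` and `safe_lookup` are hypothetical names invented for illustration, not taken from uec_UEDir, and `std::unique_ptr` assumes a C++11 compiler. It shows that when an object holding a string by value is never deleted, the string's buffer (allocated in the copy constructor, as in the trace) is lost along with it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical record that holds a string by value.
struct Entry {
    static int live;               // number of Entry objects currently alive
    std::string imsi;
    explicit Entry(std::string id) : imsi(std::move(id)) { ++live; }
    ~Entry() { assert(live > 0); --live; }
};
int Entry::live = 0;

// Leaks: the Entry is never deleted, so the buffer owned by e->imsi is never
// freed either. A leak checker can report that buffer under std::string's
// copy constructor even though std::string itself is not at fault.
std::size_t leaky_lookup(const std::string& id) {
    Entry* e = new Entry(id);      // the string copy is allocated here
    return e->imsi.size();         // missing delete: e and e->imsi are lost
}

// Same logic with ownership expressed through std::unique_ptr: the Entry and
// the string it owns are released when e goes out of scope.
std::size_t safe_lookup(const std::string& id) {
    std::unique_ptr<Entry> e(new Entry(id));
    return e->imsi.size();
}
```

After a call to `safe_lookup`, `Entry::live` is back at its previous value; after `leaky_lookup` it stays one higher, which is the kind of imbalance to look for during code inspection.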
I am using TMinuit in a loop to scan some upper-limit maps, and I am running into a memory problem. The only thing created within the loop is the TMinuit object, via "TMinuit * minuit = new TMinuit(n_params);". It is deleted at the end of the loop with "delete minuit". I ran valgrind and it reports something concerning Minuit (just a snippet here), but honestly, I don't understand the output. My guess was that "delete minuit" would free all of the memory. Obviously, that's not all. Any suggestions? :-)
Valgrind output is here:
==17564== 46,053,008 (4,227,048 direct, 41,825,960 indirect) bytes in 25,161 blocks are definitely lost in loss record 11,738 of 11,738
==17564== at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17564== by 0x52D77A8: TStorage::ObjectAlloc(unsigned long) (TStorage.cxx:330)
==17564== by 0x403601B: ???
==17564== by 0x4036064: ???
==17564== by 0x914984F: TClingCallFunc::exec(void*, void*) (TClingCallFunc.cxx:1776)
==17564== by 0x914A28F: operator() (functional:2267)
==17564== by 0x914A28F: TClingCallFunc::exec_with_valref_return(void*, cling::Value*) (TClingCallFunc.cxx:1998)
==17564== by 0x914AC58: TClingCallFunc::ExecInt(void*) (TClingCallFunc.cxx:2095)
==17564== by 0x53468A8: TMethodCall::Execute(void*, long&) (TMethodCall.cxx:457)
==17564== by 0x17DDFE20: Execute (TMethodCall.h:136)
==17564== by 0x17DDFE20: ExecPluginImpl<int, double*, double*> (TPluginManager.h:162)
==17564== by 0x17DDFE20: ExecPlugin<int, double*, double*> (TPluginManager.h:174)
==17564== by 0x17DDFE20: TMinuit::mnplot(double*, double*, char*, int, int, int) (TMinuit.cxx:6085)
==17564== by 0x17DE3C18: TMinuit::mnscan() (TMinuit.cxx:6803)
==17564== by 0x17DF744D: TMinuit::mnexcm(char const*, double*, int, int&) (TMinuit.cxx:2977)
==17564== by 0x17DD9235: TMinuit::mncomd(char const*, int&) (TMinuit.cxx:1382)
==17564== by 0x178CA910: ULcoh(int, int) (in /mnt/scr1/user/j_blom02/analysis/phikk/ul/ulmaps_C.so)
==17564== by 0x178CADA4: ulmaps(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, int) (in /mnt/scr1/user/j_blom02/analysis/phikk/ul/ulmaps_C.so)
==17564== by 0x4032084: ???
==17564== by 0x918588B: cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) [clone .part.290] [clone .constprop.445] (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x918A362: cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x918A60B: cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x9217886: cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x90FB3D9: HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) (TCling.cxx:2060)
==17564== by 0x911033D: TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) (TCling.cxx:2177)
==17564== by 0x91022A2: TCling::ProcessLineSynch(char const*, TInterpreter::EErrorCode*) (TCling.cxx:3053)
==17564== by 0x5272649: TApplication::ExecuteFile(char const*, int*, bool) (TApplication.cxx:1157)
==17564== by 0x52735F5: TApplication::ProcessLine(char const*, bool, int*) (TApplication.cxx:1002)
==17564== by 0x4E4A183: TRint::ProcessLineNr(char const*, char const*, int*) (TRint.cxx:756)
==17564== by 0x4E4B956: TRint::Run(bool) (TRint.cxx:416)
==17564== by 0x400999: main (rmain.cxx:30)
I've been working on optimising a proprietary algorithm that sorts vectors of string codes based on internal indexing criteria. The codes are 1 to 32 chars long. The algorithm performs a std::swap to move each string into its new location.
std::vector<std::string> temp_container(sorting_size+1,"");
for(auto &index_seed : all_seeds)
{
    for(int i=sorting_size;i>=0;--i)
    {
        size_t index = // new index based on seed
        std::swap(temp_container[index],original[i]);
    }
    original.swap(temp_container);
}
std::swap is overloaded for std::string and should perform a move on the underlying string, but profiling seems to show the swap performing a copy rather than a move. The report from perf shows:
- 56.95% std::swap<char, std::char_traits<char>, std::allocator<char> >
- 51.14% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::swap
- 17.71% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_is_local
- 8.77% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_local_data
- 7.49% std::pointer_traits<char const*>::pointer_to
- 4.41% std::addressof<char const>
1.83% std::__addressof<char const>
0.94% std::__addressof<char const>
3.48% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data
2.12% std::pointer_traits<char const*>::pointer_to
+ 4.99% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_set_length
+ 3.76% __gnu_cxx::__alloc_traits<std::allocator<char>, char>::_S_on_swap
3.62% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::length
3.10% std::char_traits<char>::copy
3.09% __memmove_avx_unaligned_erms
2.59% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_local_data
1.48% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_length
1.01% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_get_allocator
0.58% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data
1.21% std::char_traits<char>::copy
0.90% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_set_length
0.90% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::length
0.70% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_is_local
0.69% __gnu_cxx::__alloc_traits<std::allocator<char>, char>::_S_on_swap
0.67% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_get_allocator
0.53% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_length
I have tried using the std::string swap member function and forcing a move, i.e.
// tried this
temp_container[index].swap(original[i]);
// and this
temp_container[index] = std::move(original[i]);
But they made no difference. Benchmarking consistently shows the algorithm taking over 2 secs to sort approximately 400k code strings.
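One possible reading of the perf report, offered as an assumption rather than a confirmed diagnosis: libstdc++'s std::__cxx11::basic_string keeps strings of up to 15 chars in a buffer inside the string object itself, and two such strings cannot be swapped by exchanging pointers, so their characters have to be copied. The _M_is_local and char_traits::copy entries would be consistent with that. The sketch below uses a hypothetical helper, `swap_exchanges_buffers`, to observe whether a swap handed over the buffers or had to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true when swapping a and b simply exchanged their character
// buffers (the cheap, pointer-only case). Returns false when at least one
// string lived in the object's internal small-string buffer, in which case
// the characters themselves were copied.
bool swap_exchanges_buffers(std::string a, std::string b) {
    const char* a_before = a.data();
    const char* b_before = b.data();
    const std::size_t a_size = a.size();
    a.swap(b);
    assert(b.size() == a_size);   // contents are exchanged in either case
    return a.data() == b_before && b.data() == a_before;
}
```

With a standard library that uses the small-string optimisation, short codes such as "abc" and "xyz" yield false, while two 100-char strings yield true. For codes of 1 to 32 chars, a mix of both cases would be expected.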
Changing the vectors to use string_views initially gave much better results
std::vector<std::string_view> temp_container(sorting_size+1,"");
for(auto &index_seed : all_seeds)
{
    for(int i=sorting_size;i>=0;--i)
    {
        size_t index = // new index based on seed
        // original is now also vector of string_view
        std::swap(temp_container[index],original[i]);
    }
    original.swap(temp_container);
}
Benchmarking shows the string_view version performing the same sort in under 1 sec.
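For reference, a minimal sketch of what one pass of the string_view approach amounts to. The seed-based index computation is not shown in the question, so `new_index` here is a hypothetical stand-in for it, and `scatter_views` is an invented name. The views only borrow the characters, so the owning strings must stay alive and unmodified for as long as the views are in use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Places a view of storage[i] at position new_index[i].
// new_index is a hypothetical stand-in for the seed-based index; it is
// assumed to be a permutation of 0 .. storage.size()-1.
// The returned views point into storage, so storage must outlive them.
std::vector<std::string_view> scatter_views(
        const std::vector<std::string>& storage,
        const std::vector<std::size_t>& new_index) {
    assert(new_index.size() == storage.size());
    std::vector<std::string_view> out(storage.size());
    for (std::size_t i = 0; i < storage.size(); ++i) {
        assert(new_index[i] < out.size());
        out[new_index[i]] = storage[i];   // copies a pointer and a length only
    }
    return out;
}
```

Each assignment moves 16 bytes on a typical 64-bit platform regardless of the code's length, but reading the characters later means following the pointer back into the original storage.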
However, when I compile the algorithm with optimisation -O{1,2 or 3}, the string_view version is two times slower than the std::string version.
Checking for cache misses
perf stat -e task-clock,cycles,instructions,cache-references,cache-misses ./CodeSortingAlgo
for the std::string non optimised version:
2,756.98 msec task-clock # 1.147 CPUs utilized
8,024,349,488 cycles # 2.911 GHz
13,979,212,612 instructions # 1.74 insn per cycle
3,560,486 cache-references # 1.291 M/sec
1,881,425 cache-misses # 52.842 % of all cache refs
2.404464923 seconds time elapsed
2.734367000 seconds user
0.024020000 seconds sys
and for the string_view version:
1,266.53 msec task-clock # 1.354 CPUs utilized
3,586,135,363 cycles # 2.831 GHz
4,780,766,035 instructions # 1.33 insn per cycle
7,747,467 cache-references # 6.117 M/sec
6,172,017 cache-misses # 79.665 % of all cache refs
0.935125202 seconds time elapsed
1.243645000 seconds user
0.023916000 seconds sys
Running the same for the version of algo compiled with -O2
std::string version:
281.92 msec task-clock # 1.214 CPUs utilized
794,130,273 cycles # 2.817 GHz
1,166,108,846 instructions # 1.47 insn per cycle
18,772,676 cache-references # 66.589 M/sec
3,797,519 cache-misses # 20.229 % of all cache refs
0.232186807 seconds time elapsed
0.258991000 seconds user
0.023906000 seconds sys
string_view version
393.30 msec task-clock # 1.137 CPUs utilized
1,124,532,667 cycles # 2.859 GHz
609,130,643 instructions # 0.54 insn per cycle
12,675,992 cache-references # 32.230 M/sec
6,753,795 cache-misses # 53.280 % of all cache refs
0.345989809 seconds time elapsed
0.366555000 seconds user
0.027590000 seconds sys
Why are std::swap and std::string::swap copying rather than moving? Am I missing something? Have I read the perf report wrong?
Why is the cache performance of std::string_view so bad? Is it because, rather than swapping the len and ptr, it is following the ptr and then swapping?
Why is the optimiser not able to optimise the string_view version to the same level as the string version?
The compiler is gcc version 8.3.0.
Previously I had made a Card.IO binding manually. It was compiling, so was the project that used it, but it would crash too often.
Now I'm trying to recreate the binding using ObjectiveSharpie from scratch. The binding project compiles but when I reference it from another project I get compiler errors shown below.
It was showing AVFoundation framework items as "Undefined symbols..." so under IOS Build options in Xamarin I have "-cxx" option. I tried various combinations of ways to add frameworks to the project:
- modified [assembly: LinkWith(..., Frameworks="...")] in CardIO binding project. This results in unrecognized CardIO namespace in my main project
- added -gcc_flags "-framework ...". This results in "framework not found" message from the compiler
I finally resolved the AVFoundation errors by adding a using AVFoundation directive and then requesting the type name of one of its classes so the linker doesn't optimize it away; an ugly hack in my book.
In the error below it looks like there are more frameworks missing. On top of that, I believe the std::... ones should have been resolved by using the "-cxx" parameter.
I'm out of ideas how to make this binding compile and work properly.
Undefined symbols for architecture armv7:
"_AudioServicesPlayAlertSound", referenced from:
-[CardIOCameraViewController vibrate] in libCardIO.a(CardIOCameraViewController.o)
"_CMGetAttachment", referenced from:
-[CardIOVideoStream captureOutput:didOutputSampleBuffer:fromConnection:] in libCardIO.a(CardIOVideoStream.o)
"_CMSampleBufferGetImageBuffer", referenced from:
-[CardIOVideoFrame process] in libCardIO.a(CardIOVideoFrame.o)
"_CVPixelBufferGetBaseAddressOfPlane", referenced from:
+[CardIOIplImage imageFromYCbCrBuffer:plane:] in libCardIO.a(CardIOIplImage.o)
"_CVPixelBufferGetBytesPerRowOfPlane", referenced from:
+[CardIOIplImage imageFromYCbCrBuffer:plane:] in libCardIO.a(CardIOIplImage.o)
"_CVPixelBufferGetHeightOfPlane", referenced from:
+[CardIOIplImage imageFromYCbCrBuffer:plane:] in libCardIO.a(CardIOIplImage.o)
"_CVPixelBufferGetWidthOfPlane", referenced from:
+[CardIOIplImage imageFromYCbCrBuffer:plane:] in libCardIO.a(CardIOIplImage.o)
"_CVPixelBufferLockBaseAddress", referenced from:
-[CardIOVideoFrame process] in libCardIO.a(CardIOVideoFrame.o)
"_CVPixelBufferUnlockBaseAddress", referenced from:
-[CardIOVideoFrame process] in libCardIO.a(CardIOVideoFrame.o)
"_OBJC_CLASS_$_EAGLContext", referenced from:
objc-class-ref in libCardIO.a(CardIOGPURenderer.o)
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::find(char const*, unsigned long, unsigned long) const", referenced from:
cv::CommandLineParser::CommandLineParser(int, char const* const*, char const*)in libCardIO.a(cmdparser.o)
(anonymous namespace)::split_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)in libCardIO.a(cmdparser.o)
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::find(char, unsigned long) const", referenced from:
cv::CommandLineParser::CommandLineParser(int, char const* const*, char const*)in libCardIO.a(cmdparser.o)
(anonymous namespace)::del_space(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)in libCardIO.a(cmdparser.o)
cv::CommandLineParser::printParams() in libCardIO.a(cmdparser.o)
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::rfind(char, unsigned long) const", referenced from:
(anonymous namespace)::del_space(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)in libCardIO.a(cmdparser.o)
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::compare(char const*) const", referenced from:
cv::CommandLineParser::CommandLineParser(int, char const* const*, char const*)in libCardIO.a(cmdparser.o)
cv::CommandLineParser::printParams() in libCardIO.a(cmdparser.o)
bool cv::CommandLineParser::get<bool>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool)in libCardIO.a(cmdparser.o)
"std::__1::__vector_base_common<true>::__throw_length_error() const", referenced from:
This was an ugly one, but an easy fix in the end. In the target's build settings, in "Other Linker Flags," I have the following:
-lstdc++ -ObjC -lc++
That is, libCardIO is requiring both -lstdc++ and -lc++.
Also, make sure you have set "Link With Standard Libraries" to "Yes".
After a lot of guesswork, trial and error, and searching for each particular error in the output, I came up with a solution that works. Add this attribute to the .linkwith.cs file in the binding project:
[assembly: LinkWith ("libCardIO.a", IsCxx=true, LinkTarget= LinkTarget.ArmV7 | LinkTarget.ArmV7s | LinkTarget.Simulator, ForceLoad = true
,Frameworks = "AVFoundation AudioToolbox CoreMedia CoreVideo OpenGLES MobileCoreServices"
,LinkerFlags = "-ObjC -lc++")]
I had assumed that adding
-cxx -gcc_flags "-lstdc++"
to the main project's iOS build should do the same thing, but that wasn't the case: with or without these additional compiler parameters, the binding wasn't building.
If you need a copy of the binding project just ask.
If I have an ELF file, how can I list each function it imports from a shared library (".so"), along with the shared library that provides that function?
This works nicely for me:
nm -uC test
E.g. on the code from this other answer I just wrote:
g++ -O0 -I ~/custom/boost/ test.cpp -o test
nm -uC test
The output is
w _Jv_RegisterClasses
U _Unwind_Resume@@GCC_3.0
U std::string::compare(std::string const&) const@@GLIBCXX_3.4
U std::allocator<char>::allocator()@@GLIBCXX_3.4
U std::allocator<char>::~allocator()@@GLIBCXX_3.4
U std::ostream::operator<<(std::ostream& (*)(std::ostream&))@@GLIBCXX_3.4
U std::ostream::operator<<(int)@@GLIBCXX_3.4
U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)@@GLIBCXX_3.4
U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)@@GLIBCXX_3.4
U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string()@@GLIBCXX_3.4
U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()@@GLIBCXX_3.4
U std::ios_base::Init::Init()@@GLIBCXX_3.4
U std::ios_base::Init::~Init()@@GLIBCXX_3.4
U std::__throw_bad_alloc()@@GLIBCXX_3.4
U std::_Rb_tree_decrement(std::_Rb_tree_node_base const*)@@GLIBCXX_3.4
U std::_Rb_tree_decrement(std::_Rb_tree_node_base*)@@GLIBCXX_3.4
U std::_Rb_tree_increment(std::_Rb_tree_node_base const*)@@GLIBCXX_3.4
U std::_Rb_tree_increment(std::_Rb_tree_node_base*)@@GLIBCXX_3.4
U std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@@GLIBCXX_3.4
U std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)@@GLIBCXX_3.4
U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@@GLIBCXX_3.4
U operator delete(void*)@@GLIBCXX_3.4
U operator new(unsigned long)@@GLIBCXX_3.4
U __cxa_atexit@@GLIBC_2.2.5
U __cxa_begin_catch@@CXXABI_1.3
U __cxa_end_catch@@CXXABI_1.3
U __cxa_rethrow@@CXXABI_1.3
w __gmon_start__
U __gxx_personality_v0@@CXXABI_1.3
U __libc_start_main@@GLIBC_2.2.5
U memmove@@GLIBC_2.2.5
w pthread_cancel
I'm somewhat aware of the deficiency that this doesn't say which shared object should fulfil the dependency, but I guess a little join on the output of nm for those libraries should take you a long way.
Drop the -C flag to prevent name demangling. This can be very effective if you intend to cross-reference the data. Use c++filt to demangle the names later if you want to present them in a user-friendly fashion.
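The "little join" mentioned above could look roughly like the following. This is a sketch under assumptions, not a finished tool: it presumes the input lines were captured beforehand (for example from `nm -u` on the executable and `nm -D --defined-only` on each candidate library, both without -C), and the names `symbol_name` and `resolve_imports` are invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Extracts the symbol name from one line of nm output and drops any
// "@VERSION" or "@@VERSION" suffix. Assumes mangled names (no -C), so the
// name is the last whitespace-separated token on the line.
std::string symbol_name(const std::string& nm_line) {
    std::istringstream in(nm_line);
    std::string token, last;
    while (in >> token) last = token;
    return last.substr(0, last.find('@'));
}

// Maps each undefined symbol of the executable to a library whose
// defined-symbol listing contains it. Symbols no library defines are omitted.
std::map<std::string, std::string> resolve_imports(
        const std::vector<std::string>& undefined_lines,
        const std::map<std::string, std::vector<std::string>>& defined_by_library) {
    std::map<std::string, std::string> owner;   // symbol -> library
    for (const auto& lib : defined_by_library)
        for (const auto& line : lib.second)
            owner.emplace(symbol_name(line), lib.first);  // keeps first match

    std::map<std::string, std::string> result;
    for (const auto& line : undefined_lines) {
        const std::string name = symbol_name(line);
        assert(name.find('@') == std::string::npos);  // version suffix removed
        if (name.empty()) continue;                   // blank input line
        const auto it = owner.find(name);
        if (it != owner.end()) result[name] = it->second;
    }
    return result;
}
```

Because the version suffix is discarded, a symbol exported under several versions is treated as one name. When more than one library defines the same symbol, the library whose name sorts first is reported, which may differ from what the dynamic linker actually picks.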
I have a video retrieval system which consumes a lot of memory during the retrieval process. I know the TBB scalable allocator releases freed memory to a memory pool and does not return it to the OS. Does this mean the pool will hold on to that previously allocated memory all the time, so that when other threads need memory it may cause memory exhaustion?
I am using 2 machines with 24 cores and 47G of memory. My program has 24 threads; each thread handles one retrieval task and uses the TBB scalable allocator for memory allocation, but it still gets a bad_alloc exception. I also used valgrind to detect memory leaks and got the report below, which seems to show only "still reachable" problems caused by the TBB scalable allocator and no other memory leaks. Can anybody show me how to solve this problem?
==1224== HEAP SUMMARY:
==1224== in use at exit: 147,456 bytes in 9 blocks
==1224== total heap usage: 10 allocs, 1 frees, 148,480 bytes allocated
==1224==
==1224== Thread 1:
==1224== 16,384 bytes in 1 blocks are still reachable in loss record 1 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E26C49: rml::internal::MemoryPool::getEmptyBlock(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27676: rml::internal::internalPoolMalloc(rml::MemoryPool*, unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x458922: __gnu_cxx::new_allocator<__gnu_cxx::_Hashtable_node<std::pair<unsigned int const, s_Keypoint*> > >::allocate(unsigned long, void const*) (new_allocator.h:88)
==1224== by 0x458947: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_get_node() (hashtable.h:297)
==1224== by 0x458963: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_new_node(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:605)
==1224== by 0x458ABC: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::insert_equal_noresize(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:783)
==1224==
==1224== 16,384 bytes in 1 blocks are still reachable in loss record 2 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2A690: rml::internal::mallocLargeObject(rml::internal::ExtMemoryPool*, unsigned long, unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x4553AC: __gnu_cxx::new_allocator<s_Keypoint*>::allocate(unsigned long, void const*) (new_allocator.h:88)
==1224== by 0x4553D4: std::_Vector_base<s_Keypoint*, std::allocator<s_Keypoint*> >::_M_allocate(unsigned long) (stl_vector.h:127)
==1224== by 0x455C33: std::vector<s_Keypoint*, std::allocator<s_Keypoint*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<s_Keypoint**, std::vector<s_Keypoint*, std::allocator<s_Keypoint*> > >, s_Keypoint* const&) (vector.tcc:275)
==1224== by 0x455E87: std::vector<s_Keypoint*, std::allocator<s_Keypoint*> >::push_back(s_Keypoint* const&) (stl_vector.h:610)
==1224== by 0x45711C: DirectHash::getNeighbors1(std::vector<s_Keypoint*, std::allocator<s_Keypoint*> >&, unsigned int) (directhash.cpp:157)
==1224==
==1224== 49,152 bytes in 3 blocks are still reachable in loss record 3 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E26C49: rml::internal::MemoryPool::getEmptyBlock(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27676: rml::internal::internalPoolMalloc(rml::MemoryPool*, unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x458922: __gnu_cxx::new_allocator<__gnu_cxx::_Hashtable_node<std::pair<unsigned int const, s_Keypoint*> > >::allocate(unsigned long, void const*) (new_allocator.h:88)
==1224== by 0x458947: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_get_node() (hashtable.h:297)
==1224== by 0x458963: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_new_node(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:605)
==1224== by 0x458A42: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::insert_equal_noresize(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:776)
==1224==
==1224== 65,536 bytes in 4 blocks are still reachable in loss record 4 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E26C49: rml::internal::MemoryPool::getEmptyBlock(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27676: rml::internal::internalPoolMalloc(rml::MemoryPool*, unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x453A97: readKeysFromFile(char const*, int) (keypoint.cpp:329)
==1224== by 0x45D929: KeypointDB::Add(char const*) (keypointdb.cpp:201)
==1224== by 0x44A264: MRSystem::MRServer::AddFingerPrint(std::string) (mrserver.cpp:68)
==1224== by 0x445D68: MRSystem::Slave::ConstructHashTable() (Slave.cpp:242)
==1224==
==1224== LEAK SUMMARY:
==1224== definitely lost: 0 bytes in 0 blocks
==1224== indirectly lost: 0 bytes in 0 blocks
==1224== possibly lost: 0 bytes in 0 blocks
==1224== still reachable: 147,456 bytes in 9 blocks
==1224== suppressed: 0 bytes in 0 blocks
==1224==
==1224== For counts of detected and suppressed errors, rerun with: -v
==1224== Use --track-origins=yes to see where uninitialised values come from
==1224== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 4 from 4)
In versions prior to 4.0, memory blocks used by tbbmalloc to allocate "small" (<8K) objects were only available for reuse by the thread that requested them from the OS. So the situation described in the question - "a thread will have those previous allocated memory in its pool all the time and when other threads need memory it may cause a memory exhaust" - was possible.
Since v4.0 (released in 2011), the TBB memory allocator may return all memory back to the OS. So the described problem is not relevant anymore. If someone still uses an older version of tbbmalloc and experiences the described problem, the solution is to upgrade the allocator.