Undefined behaviour in NVidia's Vulkan driver for Linux?

I have the VK_LAYER_KHRONOS_validation layer enabled and get no validation errors, but when I run my app under Valgrind I get the following messages:
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E273BAF: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E274182: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x2193CB74: DispatchCmdPipelineBarrier(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, unsigned int, VkMemoryBarrier const*, unsigned int, VkBufferMemoryBarrier const*, unsigned int, VkImageMemoryBarrier const*) (layer_chassis_dispatch.cpp:3336)
==119404== by 0x218B389F: vulkan_layer_chassis::CmdPipelineBarrier(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, unsigned int, VkMemoryBarrier const*, unsigned int, VkBufferMemoryBarrier const*, unsigned int, VkImageMemoryBarrier const*) (chassis.cpp:3533)
==119404== by 0x5206EC: ImGui_ImplVulkan_CreateFontsTexture(VkCommandBuffer_T*) (imgui_impl_vulkan.cpp:683)
==119404== by 0x4DE854: Renderer::UploadImguiImages() (Renderer.cpp:169)
==119404== by 0x494E47: main (main.cpp:437)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E26FF73: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E273DF7: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E274182: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x2193CB74: DispatchCmdPipelineBarrier(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, unsigned int, VkMemoryBarrier const*, unsigned int, VkBufferMemoryBarrier const*, unsigned int, VkImageMemoryBarrier const*) (layer_chassis_dispatch.cpp:3336)
==119404== by 0x218B389F: vulkan_layer_chassis::CmdPipelineBarrier(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, unsigned int, VkMemoryBarrier const*, unsigned int, VkBufferMemoryBarrier const*, unsigned int, VkImageMemoryBarrier const*) (chassis.cpp:3533)
==119404== by 0x4FD6A7: Util::TransitionImageLayout(VkCommandBuffer_T*, VkImage_T*, VkImageLayout, VkImageLayout) (Utilities.cpp:136)
==119404== by 0x4D85B4: Image::SendToGPU(VkDevice_T*, PhysicalDeviceInfo const*, VkCommandBuffer_T*) (Image.cpp:214)
==119404== by 0x4E0C9E: Renderer::SendDataToGPU(entt::basic_registry<entt::entity>*, VkCommandBuffer_T*) (Renderer.cpp:575)
==119404== by 0x4E134D: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:637)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E26FF73: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E220970: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E227314: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x218BA58C: UnknownInlinedFun (layer_chassis_dispatch.cpp:3463)
==119404== by 0x218BA58C: vulkan_layer_chassis::CmdBeginRenderPass(VkCommandBuffer_T*, VkRenderPassBeginInfo const*, VkSubpassContents) (chassis.cpp:3698)
==119404== by 0x4E14DB: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:656)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E273BAF: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E274182: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E221561: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E225127: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E22184E: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x218B5B24: UnknownInlinedFun (layer_chassis_dispatch.cpp:3480)
==119404== by 0x218B5B24: vulkan_layer_chassis::CmdEndRenderPass(VkCommandBuffer_T*) (chassis.cpp:3739)
==119404== by 0x4E1862: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:684)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E26FF73: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E27167D: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E28EB9F: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E28ECF4: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E273BF4: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E274182: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E221561: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E225127: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E22184E: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x218B5B24: UnknownInlinedFun (layer_chassis_dispatch.cpp:3480)
==119404== by 0x218B5B24: vulkan_layer_chassis::CmdEndRenderPass(VkCommandBuffer_T*) (chassis.cpp:3739)
==119404== by 0x4E1862: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:684)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E26FF73: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E220970: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E22185B: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x218B5B24: UnknownInlinedFun (layer_chassis_dispatch.cpp:3480)
==119404== by 0x218B5B24: vulkan_layer_chassis::CmdEndRenderPass(VkCommandBuffer_T*) (chassis.cpp:3739)
==119404== by 0x4E1862: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:684)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
Is this undefined behaviour in NVidia's driver, or is there something else I need to check for?
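For reference, every report above has its innermost frames inside libnvidia-glcore.so, so one sanity check is to suppress the driver-internal reports and see whether anything attributable to my own code remains. A minimal Memcheck suppression sketch (the suppression name is arbitrary; running with --gen-suppressions=all prints exact blocks to copy instead):

{
   nvidia-glcore-uninitialised-cond
   Memcheck:Cond
   obj:*libnvidia-glcore.so*
}

Passing a file containing this via --suppressions= hides the driver-internal "Conditional jump or move depends on uninitialised value(s)" reports without affecting ones that originate in application code.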
Update: I made an MCVE; Valgrind's leak summary for it is shown after the code.
#include <iostream>
#include <vulkan/vulkan.h>
int main(int argc, char** argv) {
    VkApplicationInfo app_info = {};
    app_info.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app_info.apiVersion = VK_MAKE_API_VERSION(0, 1, 3, 224);

    VkInstanceCreateInfo instance_create_info = {};
    instance_create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    instance_create_info.pApplicationInfo = &app_info;

    VkInstance instance = nullptr;
    vkCreateInstance(&instance_create_info, nullptr, &instance);
    vkDestroyInstance(instance, nullptr);
    std::cout << "done\n";
}
==215482== LEAK SUMMARY:
==215482== definitely lost: 81,640 bytes in 36 blocks
==215482== indirectly lost: 183,467 bytes in 1,170 blocks
==215482== possibly lost: 0 bytes in 0 blocks
==215482== still reachable: 165,087 bytes in 2,123 blocks
==215482== suppressed: 0 bytes in 0 blocks
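If the MCVE's leaks also all originate inside the NVIDIA libraries (the per-record stacks are not shown above, so this is an assumption), they can be hidden the same way with a leak suppression; a rough sketch, with the object pattern being a guess:

{
   nvidia-driver-instance-leak
   Memcheck:Leak
   match-leak-kinds: all
   ...
   obj:*nvidia*
}

With that in place, anything left in the definitely/indirectly lost counts would be coming from application code rather than the driver.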

Related

Node.js crashes with a native exception

I am getting unexpected exits (exit code 1) and stalls (i.e. promises never fulfilled) in a Node application. Most of the time this happens there is no visible error; however, I have seen the following stack trace on one occasion:
#
# Fatal error in , line 0
# Check failed: fixed_size_above_fp + (stack_slots * kSystemPointerSize) - CommonFrameConstants::kFixedFrameSizeAboveFp + outgoing_size == result.
#
#
#
#FailureMessage Object: 0x7ffeefbfc9e0
1: 0x100120e62 node::NodePlatform::GetStackTracePrinter()::$_3::__invoke() [/usr/local/bin/node]
2: 0x10103af53 V8_Fatal(char const*, ...) [/usr/local/bin/node]
3: 0x10030a633 v8::internal::Deoptimizer::Deoptimizer(v8::internal::Isolate*, v8::internal::JSFunction, v8::internal::DeoptimizeKind, unsigned int, unsigned long, int) [/usr/local/bin/node]
4: 0x10030839e v8::internal::Deoptimizer::New(unsigned long, v8::internal::DeoptimizeKind, unsigned int, unsigned long, int, v8::internal::Isolate*) [/usr/local/bin/node]
5: 0x34a8fad8216d
6: 0x1006c5e23 v8::internal::NativeRegExpMacroAssembler::Execute(v8::internal::String, int, unsigned char const*, unsigned char const*, int*, int, v8::internal::Isolate*, v8::internal::JSRegExp) [/usr/local/bin/node]
7: 0x1006c5d66 v8::internal::NativeRegExpMacroAssembler::Match(v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>, int*, int, int, v8::internal::Isolate*) [/usr/local/bin/node]
8: 0x1006d22ad v8::internal::RegExpImpl::IrregexpExecRaw(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>, int, int*, int) [/usr/local/bin/node]
9: 0x1006d2967 v8::internal::RegExpGlobalCache::FetchNext() [/usr/local/bin/node]
10: 0x100725666 v8::internal::Runtime_RegExpExecMultiple(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
11: 0x100a81fb9 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit [/usr/local/bin/node]
12: 0x100ad1c5a Builtins_RegExpReplace [/usr/local/bin/node]
13: 0x100a72e52 Builtins_StringPrototypeReplace [/usr/local/bin/node]
I'm not quite sure what to make of this, as the exits appear to happen at different points. Does the above seem likely to be a memory problem?

How to Set the Number of Threads on PyTorch Hosted on AWS Lambda

I'm trying to set the number of threads via torch.set_num_threads(multiprocessing.cpu_count()) to speed up inference on AWS Lambda. However, it gives me the following stack trace.
2022-03-28T16:48:11.625-07:00 Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:11.625-07:00 Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:11.625-07:00 Error in cpuinfo: failed to parse both lists of possible and present processors 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:11.801-07:00 /var/lang/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:11.801-07:00 warn(f"Failed to load image Python extension: {e}") 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 terminate called after throwing an instance of 'c10::Error' 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 what(): [enforce fail at ThreadPool.cc:44] cpuinfo_initialize(). cpuinfo initialization failed 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x50 (0x7faf9dc2a0 in /var/lang/lib/python3.9/site-packages/torch/lib/libc10.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, char const*, void const*) + 0x50 (0x7faf9dc440 in /var/lang/lib/python3.9/site-packages/torch/lib/libc10.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #2: <unknown function> + 0x1d6cb7c (0x7fb17abb7c in /var/lang/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #3: <unknown function> + 0x1d6fa34 (0x7fb17aea34 in /var/lang/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #4: at::set_num_threads(int) + 0x2c (0x7fb02d082c in /var/lang/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #5: <unknown function> + 0x498864 (0x7fb52ff864 in /var/lang/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 <omitting python frames> 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 Fatal Python error: Aborted
Does anyone know how to fix this? For context, I'm deploying a Docker image that runs the PyTorch model, but the one line mentioned above results in this error.
I expected the program to speed up. Instead, it crashed without a helpful stack trace.

Investigating a memory leak when using ROOT's TMinuit with Valgrind

I am using TMinuit in a loop to scan some upper-limit maps and I am running into a memory problem. The only thing created within the loop is the TMinuit object, via "TMinuit * minuit = new TMinuit(n_params);", and it is deleted at the end of the loop with "delete minuit" (a minimal sketch of this loop structure follows the Valgrind output below). I ran Valgrind and it reports something concerning Minuit (just a snippet here), but honestly I don't understand the output. My assumption was that "delete minuit" frees the memory; obviously that's not all of it. Any suggestions? :-)
Valgrind output is here:
==17564== 46,053,008 (4,227,048 direct, 41,825,960 indirect) bytes in 25,161 blocks are definitely lost in loss record 11,738 of 11,738
==17564== at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17564== by 0x52D77A8: TStorage::ObjectAlloc(unsigned long) (TStorage.cxx:330)
==17564== by 0x403601B: ???
==17564== by 0x4036064: ???
==17564== by 0x914984F: TClingCallFunc::exec(void*, void*) (TClingCallFunc.cxx:1776)
==17564== by 0x914A28F: operator() (functional:2267)
==17564== by 0x914A28F: TClingCallFunc::exec_with_valref_return(void*, cling::Value*) (TClingCallFunc.cxx:1998)
==17564== by 0x914AC58: TClingCallFunc::ExecInt(void*) (TClingCallFunc.cxx:2095)
==17564== by 0x53468A8: TMethodCall::Execute(void*, long&) (TMethodCall.cxx:457)
==17564== by 0x17DDFE20: Execute (TMethodCall.h:136)
==17564== by 0x17DDFE20: ExecPluginImpl<int, double*, double*> (TPluginManager.h:162)
==17564== by 0x17DDFE20: ExecPlugin<int, double*, double*> (TPluginManager.h:174)
==17564== by 0x17DDFE20: TMinuit::mnplot(double*, double*, char*, int, int, int) (TMinuit.cxx:6085)
==17564== by 0x17DE3C18: TMinuit::mnscan() (TMinuit.cxx:6803)
==17564== by 0x17DF744D: TMinuit::mnexcm(char const*, double*, int, int&) (TMinuit.cxx:2977)
==17564== by 0x17DD9235: TMinuit::mncomd(char const*, int&) (TMinuit.cxx:1382)
==17564== by 0x178CA910: ULcoh(int, int) (in /mnt/scr1/user/j_blom02/analysis/phikk/ul/ulmaps_C.so)
==17564== by 0x178CADA4: ulmaps(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, int) (in /mnt/scr1/user/j_blom02/analysis/phikk/ul/ulmaps_C.so)
==17564== by 0x4032084: ???
==17564== by 0x918588B: cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) [clone .part.290] [clone .constprop.445] (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x918A362: cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x918A60B: cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x9217886: cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x90FB3D9: HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) (TCling.cxx:2060)
==17564== by 0x911033D: TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) (TCling.cxx:2177)
==17564== by 0x91022A2: TCling::ProcessLineSynch(char const*, TInterpreter::EErrorCode*) (TCling.cxx:3053)
==17564== by 0x5272649: TApplication::ExecuteFile(char const*, int*, bool) (TApplication.cxx:1157)
==17564== by 0x52735F5: TApplication::ProcessLine(char const*, bool, int*) (TApplication.cxx:1002)
==17564== by 0x4E4A183: TRint::ProcessLineNr(char const*, char const*, int*) (TRint.cxx:756)
==17564== by 0x4E4B956: TRint::Run(bool) (TRint.cxx:416)
==17564== by 0x400999: main (rmain.cxx:30)
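For reference, a minimal sketch of the loop structure described above, rewritten with RAII so the TMinuit object cannot outlive a loop iteration; the function name, the fit body, and the number of scan points are placeholders, not taken from the original code:

#include <memory>
#include "TMinuit.h"   // ROOT

void scan_upper_limit_map(int n_points, int n_params) {
    for (int i = 0; i < n_points; ++i) {
        auto minuit = std::make_unique<TMinuit>(n_params);
        // ... set the FCN, define parameters, run e.g. minuit->mncomd("SCAN", ierr) ...
        // The unique_ptr destroys the TMinuit here, which is equivalent to the
        // explicit "delete minuit" at the end of the loop in the original code.
    }
}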

Does the TBB scalable allocator worsen memory fragmentation?

I have a video retrieval system which consumes a lot of memory during the retrieval process. I know the TBB scalable allocator releases freed memory to a memory pool and does not return it to the OS. Does this mean the pool keeps that previously allocated memory forever, so that when other threads need memory it may cause memory exhaustion?
I am using two machines, each with 24 cores and 47 GB of memory. My programme has 24 threads; each thread handles one retrieval task and uses the TBB scalable allocator for memory allocation, but it still gets a bad_alloc exception. I also ran Valgrind to detect memory leaks and got the report below, which seems to show only "still reachable" blocks caused by the TBB scalable allocator and no other leaks. Can anybody show me how to solve this problem?
==1224== HEAP SUMMARY:
==1224== in use at exit: 147,456 bytes in 9 blocks
==1224== total heap usage: 10 allocs, 1 frees, 148,480 bytes allocated
==1224==
==1224== Thread 1:
==1224== 16,384 bytes in 1 blocks are still reachable in loss record 1 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E26C49: rml::internal::MemoryPool::getEmptyBlock(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27676: rml::internal::internalPoolMalloc(rml::MemoryPool*, unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x458922: __gnu_cxx::new_allocator<__gnu_cxx::_Hashtable_node<std::pair<unsigned int const, s_Keypoint*> > >::allocate(unsigned long, void const*) (new_allocator.h:88)
==1224== by 0x458947: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_get_node() (hashtable.h:297)
==1224== by 0x458963: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_new_node(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:605)
==1224== by 0x458ABC: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::insert_equal_noresize(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:783)
==1224==
==1224== 16,384 bytes in 1 blocks are still reachable in loss record 2 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2A690: rml::internal::mallocLargeObject(rml::internal::ExtMemoryPool*, unsigned long, unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x4553AC: __gnu_cxx::new_allocator<s_Keypoint*>::allocate(unsigned long, void const*) (new_allocator.h:88)
==1224== by 0x4553D4: std::_Vector_base<s_Keypoint*, std::allocator<s_Keypoint*> >::_M_allocate(unsigned long) (stl_vector.h:127)
==1224== by 0x455C33: std::vector<s_Keypoint*, std::allocator<s_Keypoint*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<s_Keypoint**, std::vector<s_Keypoint*, std::allocator<s_Keypoint*> > >, s_Keypoint* const&) (vector.tcc:275)
==1224== by 0x455E87: std::vector<s_Keypoint*, std::allocator<s_Keypoint*> >::push_back(s_Keypoint* const&) (stl_vector.h:610)
==1224== by 0x45711C: DirectHash::getNeighbors1(std::vector<s_Keypoint*, std::allocator<s_Keypoint*> >&, unsigned int) (directhash.cpp:157)
==1224==
==1224== 49,152 bytes in 3 blocks are still reachable in loss record 3 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E26C49: rml::internal::MemoryPool::getEmptyBlock(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27676: rml::internal::internalPoolMalloc(rml::MemoryPool*, unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x458922: __gnu_cxx::new_allocator<__gnu_cxx::_Hashtable_node<std::pair<unsigned int const, s_Keypoint*> > >::allocate(unsigned long, void const*) (new_allocator.h:88)
==1224== by 0x458947: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_get_node() (hashtable.h:297)
==1224== by 0x458963: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_new_node(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:605)
==1224== by 0x458A42: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::insert_equal_noresize(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:776)
==1224==
==1224== 65,536 bytes in 4 blocks are still reachable in loss record 4 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E26C49: rml::internal::MemoryPool::getEmptyBlock(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27676: rml::internal::internalPoolMalloc(rml::MemoryPool*, unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x453A97: readKeysFromFile(char const*, int) (keypoint.cpp:329)
==1224== by 0x45D929: KeypointDB::Add(char const*) (keypointdb.cpp:201)
==1224== by 0x44A264: MRSystem::MRServer::AddFingerPrint(std::string) (mrserver.cpp:68)
==1224== by 0x445D68: MRSystem::Slave::ConstructHashTable() (Slave.cpp:242)
==1224==
==1224== LEAK SUMMARY:
==1224== definitely lost: 0 bytes in 0 blocks
==1224== indirectly lost: 0 bytes in 0 blocks
==1224== possibly lost: 0 bytes in 0 blocks
==1224== still reachable: 147,456 bytes in 9 blocks
==1224== suppressed: 0 bytes in 0 blocks
==1224==
==1224== For counts of detected and suppressed errors, rerun with: -v
==1224== Use --track-origins=yes to see where uninitialised values come from
==1224== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 4 from 4)
In versions prior to 4.0, memory blocks used by tbbmalloc to allocate "small" (<8 KB) objects were only available for reuse by the thread that originally requested them from the OS. So the situation described in the question (a thread holding previously allocated memory in its pool indefinitely, causing memory exhaustion when other threads need memory) was possible.
Since v4.0 (released in 2011), the TBB memory allocator can return memory back to the OS, so the described problem is no longer relevant. If someone still uses an older version of tbbmalloc and experiences this problem, the solution is to upgrade the allocator.
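As a complement to upgrading: if I remember correctly, tbbmalloc 4.2 and later also expose an explicit command for flushing the allocator's cached buffers back to the OS, which can be called between large retrieval tasks. A hedged sketch (check your TBB version's scalable_allocator.h for the exact constants before relying on this):

#include <tbb/scalable_allocator.h>

// Ask tbbmalloc to release cached, unused memory back to the OS.
// Returns TBBMALLOC_OK on success or TBBMALLOC_NO_EFFECT if there was
// nothing to clean; requires a TBB version that provides this command.
void flush_tbbmalloc_caches() {
    scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, nullptr);
}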

How to avoid the following Purify-detected memory leak in C++?

I am getting the following memory leak. It is probably being caused by std::string.
How can I avoid it?
PLK: 23 bytes potentially leaked at 0xeb68278
* Suppressed in /vobs/ubtssw_brrm/test/testcases/.purify [line 3]
* This memory was allocated from:
malloc [/vobs/ubtssw_brrm/test/test_build/linux-x86/rtlib.o]
operator new(unsigned) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
operator new(unsigned) [/vobs/ubtssw_brrm/test/test_build/linux-x86/rtlib.o]
std::string<char, std::char_traits<char>, std::allocator<char>>::_Rep::_S_create(unsigned, unsigned, std::allocator<char> const&) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
std::string<char, std::char_traits<char>, std::allocator<char>>::_Rep::_M_clone(std::allocator<char> const&, unsigned) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
std::string<char, std::char_traits<char>, std::allocator<char>>::string<char, std::char_traits<char>, std::allocator<char>>(std::string<char, std::char_traits<char>, std::allocator<char>> const&) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
uec_UEDir::getEntryToUpdateAfterInsertion(rcapi_ImsiGsmMap const&, rcapi_ImsiGsmMap&, std::_Rb_tree_iterator<std::pair<std::string<char, std::char_traits<char>, std::allocator<char>> const, UEDirData >>&) [/vobs/ubtssw_brrm/uectrl/linux-x86/../src/uec_UEDir.cc:2278]
uec_UEDir::addUpdate(rcapi_ImsiGsmMap const&, LocalUEDirInfo&, rcapi_ImsiGsmMap&, int, unsigned char) [/vobs/ubtssw_brrm/uectrl/linux-x86/../src/uec_UEDir.cc:282]
ucx_UEDirHandler::addUpdateUEDir(rcapi_ImsiGsmMap, UEDirUpdateType, acap_PresenceEvent) [/vobs/ubtssw_brrm/ucx/linux-x86/../src/ucx_UEDirHandler.cc:374]
I once had a case where Valgrind indicated I had leaks in std::string, but I couldn't see how. It turned out that I was leaking another object that held strings by value, and Valgrind correctly also caught the leaked string memory (which was the vast majority of what was leaked). I suspect that uec_UEDir isn't managing its strings correctly, or is itself being leaked. I ended up finding my problem by very careful code inspection.
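To illustrate that failure mode with a contrived sketch (the type and values here are invented, not taken from the question's code): when an object that holds std::string members by value is itself leaked, the tool attributes the strings' heap buffers to std::string internals, which is exactly what the _Rep::_S_create frames in the report above look like.

#include <string>

struct DirEntry {
    std::string imsi;   // held by value, so its buffer is owned by this object
    std::string info;
};

void leak_owner() {
    // The DirEntry itself is leaked...
    DirEntry* entry = new DirEntry{"123456789012345", "subscriber data"};
    // ...so the heap blocks its strings allocated are reported as leaked too,
    // with stacks ending inside std::string, even though the real bug is the
    // missing delete (or, better, not using a raw owning pointer at all).
    (void)entry;
}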
