Does the TBB scalable allocator cause memory fragmentation? - multithreading

I have a video retrieval system which consumes a lot of memory during the retrieval process. I know the TBB scalable allocator releases freed memory to a memory pool and does not return it to the OS. Does this mean the pool will keep that previously allocated memory all the time, so that when other threads need memory it may cause memory exhaustion?
I am using two machines, each with 24 cores and 47 GB of memory. My program has 24 threads; each thread handles one retrieval task and uses the TBB scalable allocator for memory allocation, but it still gets a bad_alloc exception. I also used Valgrind to detect memory leaks and got the report below, which seems to show only "still reachable" blocks caused by the TBB scalable allocator and no other memory leaks. Can anybody show me how to solve this problem?
==1224== HEAP SUMMARY:
==1224== in use at exit: 147,456 bytes in 9 blocks
==1224== total heap usage: 10 allocs, 1 frees, 148,480 bytes allocated
==1224==
==1224== Thread 1:
==1224== 16,384 bytes in 1 blocks are still reachable in loss record 1 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E26C49: rml::internal::MemoryPool::getEmptyBlock(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27676: rml::internal::internalPoolMalloc(rml::MemoryPool*, unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x458922: __gnu_cxx::new_allocator<__gnu_cxx::_Hashtable_node<std::pair<unsigned int const, s_Keypoint*> > >::allocate(unsigned long, void const*) (new_allocator.h:88)
==1224== by 0x458947: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_get_node() (hashtable.h:297)
==1224== by 0x458963: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_new_node(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:605)
==1224== by 0x458ABC: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::insert_equal_noresize(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:783)
==1224==
==1224== 16,384 bytes in 1 blocks are still reachable in loss record 2 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2A690: rml::internal::mallocLargeObject(rml::internal::ExtMemoryPool*, unsigned long, unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x4553AC: __gnu_cxx::new_allocator<s_Keypoint*>::allocate(unsigned long, void const*) (new_allocator.h:88)
==1224== by 0x4553D4: std::_Vector_base<s_Keypoint*, std::allocator<s_Keypoint*> >::_M_allocate(unsigned long) (stl_vector.h:127)
==1224== by 0x455C33: std::vector<s_Keypoint*, std::allocator<s_Keypoint*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<s_Keypoint**, std::vector<s_Keypoint*, std::allocator<s_Keypoint*> > >, s_Keypoint* const&) (vector.tcc:275)
==1224== by 0x455E87: std::vector<s_Keypoint*, std::allocator<s_Keypoint*> >::push_back(s_Keypoint* const&) (stl_vector.h:610)
==1224== by 0x45711C: DirectHash::getNeighbors1(std::vector<s_Keypoint*, std::allocator<s_Keypoint*> >&, unsigned int) (directhash.cpp:157)
==1224==
==1224== 49,152 bytes in 3 blocks are still reachable in loss record 3 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E26C49: rml::internal::MemoryPool::getEmptyBlock(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27676: rml::internal::internalPoolMalloc(rml::MemoryPool*, unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x458922: __gnu_cxx::new_allocator<__gnu_cxx::_Hashtable_node<std::pair<unsigned int const, s_Keypoint*> > >::allocate(unsigned long, void const*) (new_allocator.h:88)
==1224== by 0x458947: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_get_node() (hashtable.h:297)
==1224== by 0x458963: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::_M_new_node(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:605)
==1224== by 0x458A42: __gnu_cxx::hashtable<std::pair<unsigned int const, s_Keypoint*>, unsigned int, __gnu_cxx::hash<unsigned int>, std::_Select1st<std::pair<unsigned int const, s_Keypoint*> >, std::equal_to<unsigned int>, std::allocator<s_Keypoint*> >::insert_equal_noresize(std::pair<unsigned int const, s_Keypoint*> const&) (hashtable.h:776)
==1224==
==1224== 65,536 bytes in 4 blocks are still reachable in loss record 4 of 4
==1224== at 0x4A0610C: malloc (vg_replace_malloc.c:195)
==1224== by 0x4E285C6: rml::internal::getRawMemory(unsigned long, bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AB2B: rml::internal::BackRefMaster::findFreeBlock() (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E2AE49: rml::internal::BackRefIdx::newBackRef(bool) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E26C49: rml::internal::MemoryPool::getEmptyBlock(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27676: rml::internal::internalPoolMalloc(rml::MemoryPool*, unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4E27825: scalable_malloc (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc.so.2)
==1224== by 0x4C21278: operator new(unsigned long) (in /home/is_admin/tbb40_233oss/build/linux_intel64_gcc_cc4.1.2_libc2.5_kernel2.6.18_release/libtbbmalloc_proxy.so.2)
==1224== by 0x453A97: readKeysFromFile(char const*, int) (keypoint.cpp:329)
==1224== by 0x45D929: KeypointDB::Add(char const*) (keypointdb.cpp:201)
==1224== by 0x44A264: MRSystem::MRServer::AddFingerPrint(std::string) (mrserver.cpp:68)
==1224== by 0x445D68: MRSystem::Slave::ConstructHashTable() (Slave.cpp:242)
==1224==
==1224== LEAK SUMMARY:
==1224== definitely lost: 0 bytes in 0 blocks
==1224== indirectly lost: 0 bytes in 0 blocks
==1224== possibly lost: 0 bytes in 0 blocks
==1224== still reachable: 147,456 bytes in 9 blocks
==1224== suppressed: 0 bytes in 0 blocks
==1224==
==1224== For counts of detected and suppressed errors, rerun with: -v
==1224== Use --track-origins=yes to see where uninitialised values come from
==1224== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 4 from 4)
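(For context on how the allocator is wired in: the traces above show libtbbmalloc_proxy replacing the global operator new. An equivalent explicit approach is to parameterize containers with tbb::scalable_allocator; the sketch below assumes an element type named like the s_Keypoint in the traces.)

#include <vector>
#include <tbb/scalable_allocator.h>

struct s_Keypoint;  // stand-in for the keypoint type seen in the traces

int main() {
    // Same allocation pattern as in the traces, but routing container
    // allocations through tbbmalloc explicitly instead of via the proxy.
    std::vector<s_Keypoint*, tbb::scalable_allocator<s_Keypoint*>> keys;
    keys.reserve(1024);  // this allocation goes through scalable_malloc
    return 0;
}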

In versions prior to 4.0, memory blocks used by tbbmalloc to allocate "small" (<8K) objects were only available for reuse by the thread that originally requested them from the OS. So the situation described in the question, where a thread keeps previously allocated memory in its pool indefinitely and other threads that need memory may run into exhaustion, was possible.
Since v4.0 (released in 2011), the TBB memory allocator may return all memory back to the OS, so the described problem is no longer relevant. If someone still uses an older version of tbbmalloc and experiences this problem, the solution is to upgrade the allocator.
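In addition, newer versions of tbbmalloc expose scalable_allocation_command, which lets an application explicitly ask the allocator to clean its internal buffers and return free memory to the OS. A minimal sketch, assuming TBB 4.2 or later (check your tbb/scalable_allocator.h for availability):

#include <tbb/scalable_allocator.h>

void release_cached_memory() {
    // Ask tbbmalloc to flush its internal buffers and release unused
    // memory back to the OS where possible.
    scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, 0);
}

Calling this between retrieval tasks may reduce the allocator's steady-state footprint, at the cost of re-requesting memory from the OS on the next burst of allocations.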

Related

Undefined behaviour in NVidia's Vulkan driver for Linux?

I have the VK_LAYER_KHRONOS_validation layer activated and don't get any validation errors, but when I run my app under Valgrind I get the following messages.
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E273BAF: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E274182: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x2193CB74: DispatchCmdPipelineBarrier(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, unsigned int, VkMemoryBarrier const*, unsigned int, VkBufferMemoryBarrier const*, unsigned int, VkImageMemoryBarrier const*) (layer_chassis_dispatch.cpp:3336)
==119404== by 0x218B389F: vulkan_layer_chassis::CmdPipelineBarrier(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, unsigned int, VkMemoryBarrier const*, unsigned int, VkBufferMemoryBarrier const*, unsigned int, VkImageMemoryBarrier const*) (chassis.cpp:3533)
==119404== by 0x5206EC: ImGui_ImplVulkan_CreateFontsTexture(VkCommandBuffer_T*) (imgui_impl_vulkan.cpp:683)
==119404== by 0x4DE854: Renderer::UploadImguiImages() (Renderer.cpp:169)
==119404== by 0x494E47: main (main.cpp:437)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E26FF73: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E273DF7: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E274182: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x2193CB74: DispatchCmdPipelineBarrier(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, unsigned int, VkMemoryBarrier const*, unsigned int, VkBufferMemoryBarrier const*, unsigned int, VkImageMemoryBarrier const*) (layer_chassis_dispatch.cpp:3336)
==119404== by 0x218B389F: vulkan_layer_chassis::CmdPipelineBarrier(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, unsigned int, VkMemoryBarrier const*, unsigned int, VkBufferMemoryBarrier const*, unsigned int, VkImageMemoryBarrier const*) (chassis.cpp:3533)
==119404== by 0x4FD6A7: Util::TransitionImageLayout(VkCommandBuffer_T*, VkImage_T*, VkImageLayout, VkImageLayout) (Utilities.cpp:136)
==119404== by 0x4D85B4: Image::SendToGPU(VkDevice_T*, PhysicalDeviceInfo const*, VkCommandBuffer_T*) (Image.cpp:214)
==119404== by 0x4E0C9E: Renderer::SendDataToGPU(entt::basic_registry<entt::entity>*, VkCommandBuffer_T*) (Renderer.cpp:575)
==119404== by 0x4E134D: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:637)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E26FF73: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E220970: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E227314: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x218BA58C: UnknownInlinedFun (layer_chassis_dispatch.cpp:3463)
==119404== by 0x218BA58C: vulkan_layer_chassis::CmdBeginRenderPass(VkCommandBuffer_T*, VkRenderPassBeginInfo const*, VkSubpassContents) (chassis.cpp:3698)
==119404== by 0x4E14DB: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:656)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E273BAF: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E274182: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E221561: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E225127: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E22184E: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x218B5B24: UnknownInlinedFun (layer_chassis_dispatch.cpp:3480)
==119404== by 0x218B5B24: vulkan_layer_chassis::CmdEndRenderPass(VkCommandBuffer_T*) (chassis.cpp:3739)
==119404== by 0x4E1862: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:684)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E26FF73: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E27167D: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E28EB9F: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E28ECF4: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E273BF4: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E274182: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E221561: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E225127: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E22184E: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x218B5B24: UnknownInlinedFun (layer_chassis_dispatch.cpp:3480)
==119404== by 0x218B5B24: vulkan_layer_chassis::CmdEndRenderPass(VkCommandBuffer_T*) (chassis.cpp:3739)
==119404== by 0x4E1862: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:684)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
==119404== Conditional jump or move depends on uninitialised value(s)
==119404== at 0x1E26FF73: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E220970: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x1E22185B: ??? (in /usr/lib64/libnvidia-glcore.so.515.57)
==119404== by 0x218B5B24: UnknownInlinedFun (layer_chassis_dispatch.cpp:3480)
==119404== by 0x218B5B24: vulkan_layer_chassis::CmdEndRenderPass(VkCommandBuffer_T*) (chassis.cpp:3739)
==119404== by 0x4E1862: Renderer::Render(entt::basic_registry<entt::entity>*, ImDrawData*) (Renderer.cpp:684)
==119404== by 0x4950A0: main (main.cpp:477)
==119404==
Is this undefined behaviour in NVidia's driver, or is there something else I need to check for?
Update: I made an MCVE...
#include <iostream>
#include <vulkan/vulkan.h>

int main(int argc, char** argv) {
    VkApplicationInfo app_info = {};
    // Note: this must be VK_STRUCTURE_TYPE_APPLICATION_INFO, not
    // VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO.
    app_info.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app_info.apiVersion = VK_MAKE_API_VERSION(0, 1, 3, 224);

    VkInstanceCreateInfo instance_create_info = {};
    instance_create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    instance_create_info.pApplicationInfo = &app_info;

    VkInstance instance = nullptr;
    if (vkCreateInstance(&instance_create_info, nullptr, &instance) != VK_SUCCESS) {
        std::cerr << "vkCreateInstance failed\n";
        return 1;
    }
    vkDestroyInstance(instance, nullptr);
    std::cout << "done\n";
}
Even this minimal program produces the following leak summary under Valgrind:
==215482== LEAK SUMMARY:
==215482== definitely lost: 81,640 bytes in 36 blocks
==215482== indirectly lost: 183,467 bytes in 1,170 blocks
==215482== possibly lost: 0 bytes in 0 blocks
==215482== still reachable: 165,087 bytes in 2,123 blocks
==215482== suppressed: 0 bytes in 0 blocks

How to Set the Number of Threads on PyTorch Hosted on AWS Lambda

I'm trying to set the number of threads via torch.set_num_threads(multiprocessing.cpu_count()) to speed up inference on AWS Lambda. However, it gives me the following stack trace.
2022-03-28T16:48:11.625-07:00 Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:11.625-07:00 Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:11.625-07:00 Error in cpuinfo: failed to parse both lists of possible and present processors 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:11.801-07:00 /var/lang/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:11.801-07:00 warn(f"Failed to load image Python extension: {e}") 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 terminate called after throwing an instance of 'c10::Error' 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 what(): [enforce fail at ThreadPool.cc:44] cpuinfo_initialize(). cpuinfo initialization failed 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x50 (0x7faf9dc2a0 in /var/lang/lib/python3.9/site-packages/torch/lib/libc10.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, char const*, void const*) + 0x50 (0x7faf9dc440 in /var/lang/lib/python3.9/site-packages/torch/lib/libc10.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #2: <unknown function> + 0x1d6cb7c (0x7fb17abb7c in /var/lang/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #3: <unknown function> + 0x1d6fa34 (0x7fb17aea34 in /var/lang/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #4: at::set_num_threads(int) + 0x2c (0x7fb02d082c in /var/lang/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 frame #5: <unknown function> + 0x498864 (0x7fb52ff864 in /var/lang/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 <omitting python frames> 2022/03/28/[$LATEST]5935023a5e514dbd8cc15036a23cedfe
2022-03-28T16:48:14.200-07:00 Fatal Python error: Aborted
Does anyone know how to fix this? For context, I'm deploying a Docker image that runs the PyTorch model, and the one line mentioned above results in this error.
I expected the program to speed up. Instead, it crashed without a helpful stack trace.

Calling a function with node-ffi-napi (always) leaks memory

(Original question involving CGO was deleted and replaced with a simpler demonstration that shows what I think to be the core problem, the old question can still be accessed in the edit history.)
Use case: I am using the npm package ffi-napi to call a native function (nothing surprising).
Problem: each call, even of a void() function, leaks memory, which seems, well, a bit problematic.
The simplest repro I came up with:
// Run with: node --expose-gc repro.js (global.gc() requires --expose-gc)
const ffi = require('ffi-napi');

const lib = ffi.Library(null, {
  'atoi': ['int', ['string']]
});

let i = 0;
while (++i) {
  lib.atoi('1234');
  global.gc();
  if (i % 100000 == 0) {
    console.log(`iter. #${i} RSS MBs used: ${process.memoryUsage().rss / 1024 / 1024}`);
  }
}
Outputs:
iter. #100000 RSS MBs used: 304.8359375
iter. #200000 RSS MBs used: 509.48828125
iter. #300000 RSS MBs used: 755.44140625
iter. #400000 RSS MBs used: 905.8203125
iter. #500000 RSS MBs used: 1123.90234375
iter. #600000 RSS MBs used: 1370.88671875
iter. #700000 RSS MBs used: 1612.45703125
-- CUT --
iter. #6200000 RSS MBs used: 11568.64453125
iter. #6300000 RSS MBs used: 11794.2109375
iter. #6400000 RSS MBs used: 11905.625
iter. #6500000 RSS MBs used: 12026.44921875
<--- Last few GCs --->
[1217086:0x5627be68a030] 168590 ms: Mark-sweep 2003.3 (2070.7) -> 1992.3 (2072.7) MB, 3546.0 / 170.8 ms (average mu = 0.101, current mu = 0.024) allocation failure scavenge might not succeed
[1217086:0x5627be68a030] 171579 ms: Mark-sweep 2005.6 (2072.7) -> 1994.6 (2074.7) MB, 2907.8 / 231.8 ms (average mu = 0.068, current mu = 0.027) allocation failure scavenge might not succeed
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 0x7fe054be1679]
Security context: 0x146fda0db989 <JSObject>
1: debug [0x11881d6991b9] [/home/morgan/Documents/Code/nogame/node_modules/debug/src/common.js:~64] [pc=0x553685c70a5](this=0x036c76042a69 <JSGlobal Object>)
2: arguments adaptor frame: 1->0
3: ./apps/server/src/main.ts [0x1cd68b8fffa9] [/home/morgan/Documents/Code/nogame/dist/apps/server/main.js:~94] [pc=0x553685d6a90](this=0x23bff30191f9 <Objec...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0x7fe053f2da9c node::Abort() [/lib/x86_64-linux-gnu/libnode.so.72]
2: 0x7fe053e61872 [/lib/x86_64-linux-gnu/libnode.so.72]
3: 0x7fe054107aea v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/lib/x86_64-linux-gnu/libnode.so.72]
4: 0x7fe054107dcb v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/lib/x86_64-linux-gnu/libnode.so.72]
5: 0x7fe0542b59b9 [/lib/x86_64-linux-gnu/libnode.so.72]
6: 0x7fe0542c5da7 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/lib/x86_64-linux-gnu/libnode.so.72]
7: 0x7fe0542c6a8f v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/lib/x86_64-linux-gnu/libnode.so.72]
8: 0x7fe0542c8d7c v8::internal::Heap::AllocateRawWithLightRetry(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/lib/x86_64-linux-gnu/libnode.so.72]
9: 0x7fe0542c8de8 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/lib/x86_64-linux-gnu/libnode.so.72]
10: 0x7fe05428d7ef v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/lib/x86_64-linux-gnu/libnode.so.72]
11: 0x7fe0545df559 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/lib/x86_64-linux-gnu/libnode.so.72]
12: 0x7fe054be1679 [/lib/x86_64-linux-gnu/libnode.so.72]
[1] 1217086 IOT instruction node --expose-gc dist/apps/server/main.js
NB: The same problem occurs with ffi-cross.
Valgrind log here for 5000 iterations: https://pastebin.com/p3Uv4WwR.
The leaking stack appears clearly:
==1223109== 400,160 bytes in 5,002 blocks are definitely lost in loss record 61 of 61
==1223109== at 0x4A3F723: operator new(unsigned long) (vg_replace_malloc.c:417)
==1223109== by 0x5428CE9: napi_create_reference (in /usr/lib/x86_64-linux-gnu/libnode.so.72)
==1223109== by 0xDACDEFF: (anonymous namespace)::InstanceData::GetBufferData(napi_value__*) (in /home/morgan/Documents/Code/redacted/node_modules/ref-napi/prebuilds/linux-x64/node.napi.node)
==1223109== by 0xDCE372F: FFI::FFI::FFICall(Napi::CallbackInfo const&) (in /home/morgan/Documents/Code/redacted/node_modules/ffi-napi/build/Release/ffi_bindings.node)
==1223109== by 0xDCE6B0D: Napi::details::CallbackData<void (*)(Napi::CallbackInfo const&), void>::Wrapper(napi_env__*, napi_callback_info__*) (in /home/morgan/Documents/Code/redacted/node_modules/ffi-napi/build/Release/ffi_bindings.node)
==1223109== by 0x5421127: ??? (in /usr/lib/x86_64-linux-gnu/libnode.so.72)
==1223109== by 0x56AFB48: v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) (in /usr/lib/x86_64-linux-gnu/libnode.so.72)
==1223109== by 0x56AFEF4: ??? (in /usr/lib/x86_64-linux-gnu/libnode.so.72)
==1223109== by 0x56B0749: ??? (in /usr/lib/x86_64-linux-gnu/libnode.so.72)
==1223109== by 0x56B101C: v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) (in /usr/lib/x86_64-linux-gnu/libnode.so.72)
==1223109== by 0x6123778: ??? (in /usr/lib/x86_64-linux-gnu/libnode.so.72)
==1223109== by 0x60A8F03: ??? (in /usr/lib/x86_64-linux-gnu/libnode.so.72)
I will post an issue on the library's GitHub, but any help would be greatly appreciated.

Investigating memory leak while using TMinuit ROOT with valgrind

I am using TMinuit in a loop to scan some upper-limit maps, and I am running into a memory problem. The only thing created within the loop is the TMinuit object, via "TMinuit * minuit = new TMinuit(n_params);", and it is deleted at the end of the loop with "delete minuit". I ran Valgrind and it reports something concerning Minuit (just a snippet here), but honestly, I don't understand that output. My assumption was that "delete minuit" frees the memory; obviously, that's not all of it. Any suggestions? :-)
Valgrind output is here:
==17564== 46,053,008 (4,227,048 direct, 41,825,960 indirect) bytes in 25,161 blocks are definitely lost in loss record 11,738 of 11,738
==17564== at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17564== by 0x52D77A8: TStorage::ObjectAlloc(unsigned long) (TStorage.cxx:330)
==17564== by 0x403601B: ???
==17564== by 0x4036064: ???
==17564== by 0x914984F: TClingCallFunc::exec(void*, void*) (TClingCallFunc.cxx:1776)
==17564== by 0x914A28F: operator() (functional:2267)
==17564== by 0x914A28F: TClingCallFunc::exec_with_valref_return(void*, cling::Value*) (TClingCallFunc.cxx:1998)
==17564== by 0x914AC58: TClingCallFunc::ExecInt(void*) (TClingCallFunc.cxx:2095)
==17564== by 0x53468A8: TMethodCall::Execute(void*, long&) (TMethodCall.cxx:457)
==17564== by 0x17DDFE20: Execute (TMethodCall.h:136)
==17564== by 0x17DDFE20: ExecPluginImpl<int, double*, double*> (TPluginManager.h:162)
==17564== by 0x17DDFE20: ExecPlugin<int, double*, double*> (TPluginManager.h:174)
==17564== by 0x17DDFE20: TMinuit::mnplot(double*, double*, char*, int, int, int) (TMinuit.cxx:6085)
==17564== by 0x17DE3C18: TMinuit::mnscan() (TMinuit.cxx:6803)
==17564== by 0x17DF744D: TMinuit::mnexcm(char const*, double*, int, int&) (TMinuit.cxx:2977)
==17564== by 0x17DD9235: TMinuit::mncomd(char const*, int&) (TMinuit.cxx:1382)
==17564== by 0x178CA910: ULcoh(int, int) (in /mnt/scr1/user/j_blom02/analysis/phikk/ul/ulmaps_C.so)
==17564== by 0x178CADA4: ulmaps(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, int) (in /mnt/scr1/user/j_blom02/analysis/phikk/ul/ulmaps_C.so)
==17564== by 0x4032084: ???
==17564== by 0x918588B: cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) [clone .part.290] [clone .constprop.445] (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x918A362: cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x918A60B: cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x9217886: cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) (in /mnt/scr1/user/bes3/root/build_v6_14_08/lib/libCling.so)
==17564== by 0x90FB3D9: HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) (TCling.cxx:2060)
==17564== by 0x911033D: TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) (TCling.cxx:2177)
==17564== by 0x91022A2: TCling::ProcessLineSynch(char const*, TInterpreter::EErrorCode*) (TCling.cxx:3053)
==17564== by 0x5272649: TApplication::ExecuteFile(char const*, int*, bool) (TApplication.cxx:1157)
==17564== by 0x52735F5: TApplication::ProcessLine(char const*, bool, int*) (TApplication.cxx:1002)
==17564== by 0x4E4A183: TRint::ProcessLineNr(char const*, char const*, int*) (TRint.cxx:756)
==17564== by 0x4E4B956: TRint::Run(bool) (TRint.cxx:416)
==17564== by 0x400999: main (rmain.cxx:30)
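One thing worth trying, since the question states that only the TMinuit object is created inside the loop: construct the fitter once, outside the loop, and reset it between iterations instead of re-allocating it. A sketch under that assumption (the CLEAR command resets Minuit's parameter list; my_fcn and the per-point scan setup are hypothetical placeholders, not taken from the question's code):

#include "TMinuit.h"

void scan_upper_limit_maps(int n_params, int n_points) {
    TMinuit minuit(n_params);          // constructed once, outside the loop
    // minuit.SetFCN(my_fcn);          // hypothetical FCN for the fit
    int ierr = 0;
    for (int i = 0; i < n_points; ++i) {
        minuit.mncomd("CLEAR", ierr);  // reset parameters between scans
        // ... redefine parameters and run the SCAN for this map point ...
    }
}   // the single TMinuit is destroyed here; no per-iteration new/delete

If the loss record persists with this pattern, the leak is inside the scan/plot path itself (the trace points at TMinuit::mnplot via the plugin manager) rather than in the lifetime management of the TMinuit objects.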

How to avoid the following purify detected memory leak in C++?

I am getting the following memory leak. It is probably being caused by std::string.
How can I avoid it?
PLK: 23 bytes potentially leaked at 0xeb68278
* Suppressed in /vobs/ubtssw_brrm/test/testcases/.purify [line 3]
* This memory was allocated from:
malloc [/vobs/ubtssw_brrm/test/test_build/linux-x86/rtlib.o]
operator new(unsigned) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
operator new(unsigned) [/vobs/ubtssw_brrm/test/test_build/linux-x86/rtlib.o]
std::string<char, std::char_traits<char>, std::allocator<char>>::_Rep::_S_create(unsigned, unsigned, std::allocator<char> const&) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
std::string<char, std::char_traits<char>, std::allocator<char>>::_Rep::_M_clone(std::allocator<char> const&, unsigned) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
std::string<char, std::char_traits<char>, std::allocator<char>>::string<char, std::char_traits<char>, std::allocator<char>>(std::string<char, std::char_traits<char>, std::allocator<char>> const&) [/vobs/MontaVista/Linux/montavista/pro/devkit/x86/586/target/usr/lib/libstdc++.so.6]
uec_UEDir::getEntryToUpdateAfterInsertion(rcapi_ImsiGsmMap const&, rcapi_ImsiGsmMap&, std::_Rb_tree_iterator<std::pair<std::string<char, std::char_traits<char>, std::allocator<char>> const, UEDirData >>&) [/vobs/ubtssw_brrm/uectrl/linux-x86/../src/uec_UEDir.cc:2278]
uec_UEDir::addUpdate(rcapi_ImsiGsmMap const&, LocalUEDirInfo&, rcapi_ImsiGsmMap&, int, unsigned char) [/vobs/ubtssw_brrm/uectrl/linux-x86/../src/uec_UEDir.cc:282]
ucx_UEDirHandler::addUpdateUEDir(rcapi_ImsiGsmMap, UEDirUpdateType, acap_PresenceEvent) [/vobs/ubtssw_brrm/ucx/linux-x86/../src/ucx_UEDirHandler.cc:374]
I once had a case where Valgrind indicated I had leaks in std::string, but I couldn't see how. It turned out that I was leaking another object that held strings by value, and Valgrind correctly also flagged the leaked string memory (which was the vast majority of what was leaked). I suspect that uec_UEDir isn't managing its strings correctly, or is itself being leaked. In my case, I ended up finding the problem through very careful code inspection.
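To make that failure mode concrete, here is a minimal sketch (hypothetical types, not the question's code) of how leaking an object that holds a std::string by value shows up as a "leaked" string allocation in Purify or Valgrind:

#include <string>

struct Entry {
    std::string key;  // held by value: its heap buffer lives and dies with Entry
};

void leak_owner() {
    Entry* e = new Entry{"some reasonably long key string"};
    (void)e;
    // 'e' is never deleted, so the tool reports the string's internal
    // buffer allocation as leaked even though std::string itself is fine.
    // The fix is to manage the owner, e.g. delete it or hold it in a
    // std::unique_ptr<Entry>.
}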
