GDB debug output for multi-thread program - linux

All,
I am debuging a 24-thread program with GDB, now I have find which line in the code the error occurs, but I cannot tell what the error is from the output of GDB. The followsing line of code leads to the error, it's just a normal insertion to a map structure.
current_node->children.insert(std::pair<string, ComponentTrieNode*>(comps[j], temp_node));
I used GDB to find out in which thread the error happens and switched to that thread, the backtrace command shows the function calls in the stack. (The last several lines try to print the value of some variables in a function, but failed.)
What should I do to clear know what error is happening?
[root#localhost nameComponentEncoding]# gdb NCE_david
GNU gdb (GDB) Fedora (7.2.90.20110429-36.fc15)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david...done.
(gdb) r /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
Starting program: /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffd2bf5700 (LWP 13129)]
[New Thread 0x7fffd23f4700 (LWP 13130)]
[New Thread 0x7fffd1bf3700 (LWP 13131)]
[New Thread 0x7fffd13f2700 (LWP 13132)]
[New Thread 0x7fffd0bf1700 (LWP 13133)]
[New Thread 0x7fffd03f0700 (LWP 13134)]
[New Thread 0x7fffcfbef700 (LWP 13135)]
[New Thread 0x7fffcf3ee700 (LWP 13136)]
[New Thread 0x7fffcebed700 (LWP 13137)]
[New Thread 0x7fffce3ec700 (LWP 13138)]
[New Thread 0x7fffcdbeb700 (LWP 13139)]
[New Thread 0x7fffcd3ea700 (LWP 13140)]
[New Thread 0x7fffccbe9700 (LWP 13141)]
[New Thread 0x7fffcc3e8700 (LWP 13142)]
[New Thread 0x7fffcbbe7700 (LWP 13143)]
[New Thread 0x7fffcb3e6700 (LWP 13144)]
[New Thread 0x7fffcabe5700 (LWP 13145)]
[New Thread 0x7fffca3e4700 (LWP 13146)]
[New Thread 0x7fffc9be3700 (LWP 13147)]
[New Thread 0x7fffc93e2700 (LWP 13148)]
[New Thread 0x7fffc8be1700 (LWP 13149)]
[New Thread 0x7fffc83e0700 (LWP 13150)]
[New Thread 0x7fffc7bdf700 (LWP 13151)]
this is thread 1
this is thread 7
this is thread 14
this is thread 18
this is thread 2
this is thread 19
this is thread 6
this is thread 8
this is thread 24
base: 64312646
this is thread 11
this is thread 5
this is thread 12
this is thread 13
this is thread 3
this is thread 15
this is thread 16
this is thread 17
this is thread 4
this is thread 20
this is thread 21
this is thread 22
this is thread 23
this is thread 9
this is thread 10
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc8be1700 (LWP 13149)]
std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=#0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126 __x->_M_right = __y->_M_left;
(gdb) info threads
Id Target Id Frame
24 Thread 0x7fffc7bdf700 (LWP 13151) "NCE_david" compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/char_traits.h:257
(... other 22 threads not listed)
2 Thread 0x7fffd2bf5700 (LWP 13129) "NCE_david" compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/char_traits.h:257
1 Thread 0x7ffff7fe57a0 (LWP 13126) "NCE_david" strtok () at ../sysdeps/x86_64/strtok.S:76
(gdb) thread 22
[Switching to thread 22 (Thread 0x7fffc8be1700 (LWP 13149))]
#0 std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=#0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126 __x->_M_right = __y->_M_left;
(gdb) bt
#0 std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=#0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
#1 0x0000003cdd26e848 in std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x7fffc0005ba0, __p=<optimized out>, __header=...)
at ../../../../libstdc++-v3/src/tree.cc:266
#2 0x00000000004029ca in std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*>, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> > >::_M_insert_ (this=0x608108, __x=<optimized out>, __p=0x16cd3e30, __v=...)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_pair.h:87
#3 0x0000000000402b7d in std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*>, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> > >::_M_insert_unique (this=0x608108, __v=...)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_tree.h:1281
#4 0x000000000040444c in insert (__x=..., this=0x608108) at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_map.h:518
#5 ComponentTrie::add_prefix (this=0x7fffffffe2e0, prefix_input=<optimized out>, port=10) at ComponentTrie_david.cpp:112
#6 0x0000000000401c3b in main._omp_fn.0 () at NameComponentEncoding_david.cpp:277
#7 0x0000003cd2607fea in gomp_thread_start (xdata=<optimized out>) at ../../../libgomp/team.c:115
#8 0x0000003cd0607cd1 in start_thread (arg=0x7fffc8be1700) at pthread_create.c:305
#9 0x0000003cd02dfd3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) p 'ComponentTrie::add_prefix(char*, int)'::comps[j]
No symbol "comps" in specified context.
(gdb) p 'ComponentTrie::add_prefix(char*, int)'::prefix
No symbol "prefix" in specified context.
Edit: I have run the code with valgrind --tool=memcheck, the following is the result.
[root#localhost nameComponentEncoding]# valgrind --tool=memcheck ./NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
(... many lines omitted)
==13261==
==13261== Thread 11:
==13261== Invalid read of size 1
==13261== at 0x3CD02849BC: strtok (strtok.S:141)
==13261== by 0x40426A: ComponentTrie::add_prefix(char*, int) (ComponentTrie_david.cpp:99)
==13261== by 0x40242C: main._omp_fn.0 (NameComponentEncoding_david.cpp:531)
==13261== by 0x3CD2607FE9: gomp_thread_start (team.c:115)
==13261== by 0x3CD0607CD0: start_thread (pthread_create.c:305)
==13261== by 0x3CD02DFD3C: clone (clone.S:115)
==13261== Address 0x234422c02 is not stack'd, malloc'd or (recently) free'd
==13261==
==13261== Invalid read of size 1
==13261== at 0x3CD02849EC: strtok (strtok.S:167)
==13261== by 0x40426A: ComponentTrie::add_prefix(char*, int) (ComponentTrie_david.cpp:99)
==13261== by 0x40242C: main._omp_fn.0 (NameComponentEncoding_david.cpp:531)
==13261== by 0x3CD2607FE9: gomp_thread_start (team.c:115)
==13261== by 0x3CD0607CD0: start_thread (pthread_create.c:305)
==13261== by 0x3CD02DFD3C: clone (clone.S:115)
==13261== Address 0x234422c02 is not stack'd, malloc'd or (recently) free'd
==13261==
Insertion and lookup cost time(us): 994669532 67108864 14.821731 0.067469
component number:4849478, state number: 2545847
Parallel threads:24
==13261==
==13261== HEAP SUMMARY:
==13261== in use at exit: 4,239,081,584 bytes in 76,746,193 blocks
==13261== total heap usage: 80,050,114 allocs, 3,303,921 frees, 4,323,622,103 bytes allocated
==13261==
==13261== LEAK SUMMARY:
==13261== definitely lost: 0 bytes in 0 blocks
==13261== indirectly lost: 0 bytes in 0 blocks
==13261== possibly lost: 4,111,951,106 bytes in 74,746,429 blocks
==13261== still reachable: 127,130,478 bytes in 1,999,764 blocks
==13261== suppressed: 0 bytes in 0 blocks
==13261== Rerun with --leak-check=full to see details of leaked memory
==13261==
==13261== For counts of detected and suppressed errors, rerun with: -v
==13261== Use --track-origins=yes to see where uninitialised values come from
==13261== ERROR SUMMARY: 45 errors from 30 contexts (suppressed: 6 from 6)

We know that the program is segfaulting on this line:
current_node->children.insert(std::pair<string, ComponentTrieNode*>(comps[j], temp_node));
From the stack trace, we know that the segfault happens deep in the red black tree implementation of std::map:
#0 std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=#0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126 __x->_M_right = __y->_M_left;
This implies that:
The segfault could be caused by:
evaluating __x->_M_right
evaluating __y->_M_left
storing the right hand side to the left hand side of __x->_M_right = __y->_M_left
std::map::insert() being called implies that the segfault was NOT caused while building the arguments to the call. In particular comps[j] is not out of bounds.
This leads me to think that your heap was already corrupted by previous memory operation errors by this time and that the crash in std::map::insert() is a symptom and not a cause.
Run your program under the Valgrind memcheck tool:
$ valgrind --tool=memcheck /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
and carefully read Valgrind's output afterwards to find the first memory error in your program.
Valgrind is implemented as a virtual CPU, so your program would slow down by a factor of ~30. This is time consuming but should allow you to make progress in troubleshooting the problem.
In addition to Valgrind, you might also want to try enabling debug mode for the libstdc++ containers:
To use the libstdc++ debug mode, compile your application with the compiler flag -D_GLIBCXX_DEBUG. Note that this flag changes the sizes and behavior of standard class templates such as std::vector, and therefore you can only link code compiled with debug mode and code compiled without debug mode if no instantiation of a container is passed between the two translation units.
If your program uses no external libraries then rebuilding the whole thing with -D_GLIBCXX_DEBUG added to CXXFLAGS in the Makefile should work. Otherwise you'd need to know whether C++ containers are passed between components compiled with and without the debug flag.
Valgrind Log Review
I'm surprised that you're using strtok() in a multi-threaded program. Is ComponentTrie::add_prefix() never called from two threads concurrently? While fixing the invalid read by inspecting how strtok() is used on ComponentTrie_david.cpp:99, you might want to replace strtok() with strtok_r() as well.
Concurrent Access to STL Containers
The standard C++ containers are explicitly documented to not do thread synchronization:
The user code must guard against concurrent function calls which access any particular library object's state when one or more of those accesses modifies the state. An object will be modified by invoking a non-const member function on it or passing it as a non-const argument to a library function. An object will not be modified by invoking a const member function on it or passing it to a function as a pointer- or reference-to-const. Typically, the application programmer may infer what object locks must be held based on the objects referenced in a function call and whether the objects are accessed as const or non-const.
(That's from the GNU libstdc++ documentation but the C++11 standard essentially specifies the same behavior) Concurrent modifications of std::map and other containers is a serious error and likely the culprit that caused the crash. Guard each container with their own pthread_mutex_t or use the OpenMP synchronization mechanisms.

Related

A question about malloc implementation in glibc

I was reading source code of glibc.
In function void *__libc_malloc(size_t bytes):
void *__libc_malloc(size_t bytes) {
mstate ar_ptr;
void *victim;
_Static_assert(PTRDIFF_MAX <= SIZE_MAX / 2, "PTRDIFF_MAX is not more than half of SIZE_MAX");
if (!__malloc_initialized) ptmalloc_init();
...
}
It shows that if the first thread was created, it calls ptmalloc_init(), and links thread_arena with main_arena, and sets __malloc_initialized to true.
On the other hand, the second thread was blocked by the following code in ptmalloc_init():
static void ptmalloc_init(void) {
if (__malloc_initialized) return;
__malloc_initialized = true;
thread_arena = &main_arena;
malloc_init_state(&main_arena);
...
Thus the thread_arena of the second thread is NULL, and it has to mmap() additional arena.
My question is:
It seems possible to cause race condition because there's no any lock with __malloc_initialized, and thread_arenas of the first thread and second thread may both link with main_arena, why not use lock to protect __malloc_initialized?
It seems possible to cause race condition because there's no any lock with __malloc_initialized
It is impossible1 for a program to create a second running thread without having called an allocation routine (and therefore ptmalloc_init) while it was still single-threaded.
Because of that, ptmalloc_init can assume that it runs while there is only a single thread.
1Why is it impossible? Because creating a thread itself calls calloc.
For example, in this program:
#include <pthread.h>
void *fn(void *p) { return p; }
int main()
{
pthread_t tid;
pthread_create(&tid, NULL, fn, NULL);
pthread_join(tid, NULL);
return 0;
}
ptmalloc_init is called here (only a single thread exists at that point):
Breakpoint 2, ptmalloc_init () at /usr/src/debug/glibc-2.34-42.fc35.x86_64/malloc/arena.c:283
283 if (__malloc_initialized)
(gdb) bt
#0 ptmalloc_init () at /usr/src/debug/glibc-2.34-42.fc35.x86_64/malloc/arena.c:283
#1 __libc_calloc (n=17, elem_size=16) at malloc.c:3526
#2 0x00007ffff7fdd6c3 in calloc (b=16, a=17) at ../include/rtld-malloc.h:44
#3 allocate_dtv (result=result#entry=0x7ffff7dae640) at ../elf/dl-tls.c:375
#4 0x00007ffff7fde0e2 in __GI__dl_allocate_tls (mem=mem#entry=0x7ffff7dae640) at ../elf/dl-tls.c:634
#5 0x00007ffff7e514e5 in allocate_stack (stacksize=<synthetic pointer>, stack=<synthetic pointer>,
pdp=<synthetic pointer>, attr=0x7fffffffde30)
at /usr/src/debug/glibc-2.34-42.fc35.x86_64/nptl/allocatestack.c:429
#6 __pthread_create_2_1 (newthread=0x7fffffffdf58, attr=0x0, start_routine=0x401136 <fn>, arg=0x0)
at pthread_create.c:648
#7 0x0000000000401167 in main () at p.c:7
GLIBC's dynamic memory allocator is designed to deliver performances in both mono-threaded and multi-threaded programs. Several mutexes are used instead of having a centralized unique one which would at the end serialize every concurrent accesses to the dynamic memory allocator. The concept of arenas protected by one mutex has been introduced to have a kind of reserved memory area for each thread. Hence, the threads can access the memory allocator data structures in parallel as long as they use different arenas.
The main goal is to avoid as much as possible the contention on the mutexes.
The initialization step is critical because the main arena must be set up once. The __malloc_initialized global variable is a flag to prevent multiple initializations. Of course, in a multi-threaded environment, the latter should be protected by a mutex because checking the value of a variable is not multi-thread safe. But doing this would break the main design principle consisting to avoid a centralized mutex which would somehow serialize the execution of the concurrent threads during the process life time.
So, the unprotected __malloc_initialized is a trade-off that works as long as the first access to the memory allocator is done in mono-threaded mode.
Under Linux, a process starts mono-threaded (the main thread). With dynamically and statically linked programs, the GLIBC library has an initialization entry point (CSU = C Start Up) called __libc_start_main()_ defined in csu/libc-start.c in the library's source tree. It performs many initializations before calling the main() function. This is where a first call to the dynamic allocator occurs to initialize the main arena.
Let's look at the following program which does not explicitly call any service from the dynamic memory allocator and does not create any thread:
#include <unistd.h>
int main(void)
{
pause();
return 0;
}
Let's compile it and run it with gdb and a breakpoint on malloc():
$ gcc -g mm.c -o mm
$ gdb ./mm
[...]
(gdb) br malloc
Function "malloc" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (malloc) pending.
(gdb) run
Starting program: /.../mm
Breakpoint 1, malloc (n=1441) at dl-minimal.c:49
49 dl-minimal.c: No such file or directory.
(gdb) where
#0 malloc (n=1441) at dl-minimal.c:49
#1 0x00007ffff7fec5e5 in calloc (nmemb=<optimized out>, size=size#entry=1) at dl-minimal.c:103
#2 0x00007ffff7fdc284 in _dl_new_object (realname=realname#entry=0x7ffff7ff4342 "", libname=libname#entry=0x7ffff7ff4342 "", type=type#entry=0, loader=loader#entry=0x0,
mode=mode#entry=536870912, nsid=nsid#entry=0) at dl-object.c:89
#3 0x00007ffff7fd1d2f in dl_main (phdr=0x555555554040, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:1330
#4 0x00007ffff7febc4b in _dl_sysdep_start (start_argptr=start_argptr#entry=0x7fffffffdf70, dl_main=dl_main#entry=0x7ffff7fd15e0 <dl_main>) at ../elf/dl-sysdep.c:252
#5 0x00007ffff7fd104c in _dl_start_final (arg=0x7fffffffdf70) at rtld.c:449
#6 _dl_start (arg=0x7fffffffdf70) at rtld.c:539
#7 0x00007ffff7fd0108 in _start () from /lib64/ld-linux-x86-64.so.2
#8 0x0000000000000001 in ?? ()
#9 0x00007fffffffe2e2 in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb)
The above display shows that even if malloc() is not called explicitly in the main program, the GLIBC's internals call at least once the memory allocator triggering the initialization of the main arena.
We may consequently wonder why we need to check the __malloc_initialized variable during the process life time after the internal initialization step. The GLIBC initialization sets up various internal modules (main stack, pthreads...) and some of them may call the dynamic memory allocator. Hence __malloc_initialized is here to allow calling the allocator at any time during the initialization step. And, if the allocator is not needed because of some specific esoteric configuration, then it will not be initialized at all.

How can I get rid of the seg fault?

I have a list of function pointers called tasks_ready_master. The pointers point to functions (tasks) defined in a seperate module. I want to execute them in parallel using threads. Each thread has a queue called "thread_queue" of capacity 1. This queue will contain the task that should be executed by the thread. Once it is done, the task is retired from the queue. We have also a queue where we put all the tasks (called "master _queue"). This is my implementation for the execution subroutine:
subroutine master_worker_execution(self,var,tasks_ready_master,first_task,last_task)
type(tcb),dimension(20)::tasks_ready_master !< the master array of tasks
integer::i_task !< the task counter
type(tcb)::self !< self
integer,intent(in)::first_task,last_task
type(variables),intent(inout)::var !< the variables
!OpenMP variables
integer::num_thread !< the rank of the thread
integer:: OMP_GET_THREAD_NUM !< function to get the rank of the thread
type(QUEUE_STRUCT),pointer:: thread_queue
type(QUEUE_STRUCT),pointer::master_queue
logical::success
integer(kind = OMP_lock_kind) :: lck !< a lock
call OMP_init_lock(lck) !< lock initialization
!$OMP PARALLEL PRIVATE(i_task,num_thread,thread_queue) &
!$OMP SHARED(tasks_ready_master,self,var,master_queue,lck)
num_thread=OMP_GET_THREAD_NUM() !< the rank of the thread
!$OMP MASTER
call queue_create(master_queue,last_task-first_task+1) !< create the master queue
do i_task=first_task,last_task
call queue_append_data(master_queue,tasks_ready_master(i_task),success) !< add the list elements to the queue (full queue)
end do
!$OMP END MASTER
!$OMP BARRIER
if (num_thread .ne. 0) then
do while (.not. queue_empty(master_queue)) !< if the queue is not empty
call queue_create(thread_queue,1) !< create a thread queue of capacity 1
call OMP_set_lock(lck) !< set the lock
call queue_append_data(thread_queue,master_queue%data(1),success) !< add the first element of the list to the thread queue
call queue_retrieve_data(master_queue) !< retire the first element of the master queue
call OMP_unset_lock(lck) !< unset the lock
call thread_queue%data(1)%f_ptr(self,var) !< execute the one and only element of the thread queueu
call queue_retrieve_data(thread_queue) !< retire the element
end do
end if
!$OMP MASTER
call queue_destroy(master_queue) !< destory the master queue
!$OMP END MASTER
call queue_destroy(thread_queue) !< destroy the thread queue
!$OMP END PARALLEL
call OMP_destroy_lock(lck) !< destroy the lock
end subroutine master_worker_execution
The problem is that I get a segmentation fault:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f30fd3ca700 in ???
#0 0x7f30fd3ca700 in ???
#1 0x7f30fd3c98a5 in ???
#1 0x7f30fd3c98a5 in ???
#2 0x7f30fd06920f in ???
#2 0x7f30fd06920f in ???
#3 0x56524a0f1d08 in __master_worker_MOD_master_worker_execution._omp_fn.0
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/master_worker.f90:70
#4 0x7f30fd230a85 in ???
#3 0x56524a0f1ad7 in __queue_MOD_queue_destroy
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/queue.f90:64
#4 0x56524a0f1d94 in __master_worker_MOD_master_worker_execution._omp_fn.0
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/master_worker.f90:81
#5 0x7f30fd227e75 in ???
#6 0x56524a0f1f68 in __master_worker_MOD_master_worker_execution
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/master_worker.f90:54
#7 0x56524a0f29b5 in __app_management_MOD_management
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/app_management_without_t.f90:126
#8 0x56524a0f579b in hecese
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/program_hecese.f90:398
#9 0x56524a0ed26e in main
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/program_hecese.f90:13
Erreur de segmentation (core dumped)
I tried to retire the while loop and it works (no seg fault). I don't understand where the mistake came from.
While debugging with gdb, it guides me to the line where we use queue_append_data and queue_retrieve_data.
This is the ouput I get when I use valgrind:
==13100== Memcheck, a memory error detector
==13100== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13100== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==13100== Command: ./output_hecese_omp
==13100==
==13100== Thread 3:
==13100== Jump to the invalid address stated on the next line
==13100== at 0x0: ???
==13100== by 0x10EB64: __master_worker_MOD_master_worker_execution._omp_fn.0 (master_worker.f90:73)
==13100== by 0x4C8BA85: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==13100== by 0x4F1D608: start_thread (pthread_create.c:477)
==13100== by 0x4DD7292: clone (clone.S:95)
==13100== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==13100==
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x4888700 in ???
#1 0x48878a5 in ???
#2 0x4cfb20f in ???
#3 0x0 in ???
==13100==
==13100== Process terminating with default action of signal 11 (SIGSEGV)
==13100== at 0x4CFB169: raise (raise.c:46)
==13100== by 0x4CFB20F: ??? (in /usr/lib/x86_64-linux-gnu/libc-2.31.so)
==13100==
==13100== HEAP SUMMARY:
==13100== in use at exit: 266,372 bytes in 121 blocks
==13100== total heap usage: 194 allocs, 73 frees, 332,964 bytes allocated
==13100==
==13100== LEAK SUMMARY:
==13100== definitely lost: 29,280 bytes in 3 blocks
==13100== indirectly lost: 2,416 bytes in 2 blocks
==13100== possibly lost: 912 bytes in 3 blocks
==13100== still reachable: 233,764 bytes in 113 blocks
==13100== suppressed: 0 bytes in 0 blocks
==13100== Rerun with --leak-check=full to see details of leaked memory
==13100==
==13100== For lists of detected and suppressed errors, rerun with: -s
==13100== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)

The futex facility returned an unexpected error code?

Two threads in same process using rwlock object stored in shared memory encounter crash during pthreads stress test. I spent a while trying to find memory corruption or deadlock but nothing so far. is this just an less than optimal way of informing me I have created a deadlock? Any pointers on tools/methods for debugging this?
Thread 5 "tms_test" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff28a7700 (LWP 3777)]
0x00007ffff761e428 in __GI_raise (sig=sig#entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff761e428 in __GI_raise (sig=sig#entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff762002a in __GI_abort () at abort.c:89
#2 0x00007ffff76607ea in __libc_message (do_abort=do_abort#entry=1, fmt=fmt#entry=0x7ffff77776cc "%s") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff766080e in __GI___libc_fatal (message=message#entry=0x7ffff79c4ae0 "The futex facility returned an unexpected error code.") at ../sysdeps/posix/libc_fatal.c:185
#4 0x00007ffff79be7e5 in futex_fatal_error () at ../sysdeps/nptl/futex-internal.h:200
#5 futex_wait (private=, expected=, futex_word=0x7ffff7f670d9) at ../sysdeps/unix/sysv/linux/futex-internal.h:77
#6 futex_wait_simple (private=, expected=, futex_word=0x7ffff7f670d9) at ../sysdeps/nptl/futex-internal.h:135
#7 __pthread_rwlock_wrlock_slow (rwlock=0x7ffff7f670cd) at pthread_rwlock_wrlock.c:67
#8 0x00000000004046e3 in _memstat (offset=0x7fffdc0b11a5, func=0x0, lineno=0, size=134, flag=1 '\001') at tms_mem.c:107
#9 0x000000000040703b in TmsMemReallocExec (in=0x7fffdc0abb81, size=211, func=0x43f858 "_malloc_thread", lineno=478) at tms_mem.c:390
#10 0x000000000042a008 in _malloc_thread (arg=0x644c11) at tms_test.c:478
#11 0x000000000041a1d6 in _threadStarter (arg=0x644c51) at tms_mem.c:2384
#12 0x00007ffff79b96ba in start_thread (arg=0x7ffff28a7700) at pthread_create.c:333
#13 0x00007ffff76ef82d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb)
It's pretty hard to debug something what is not documented well. I was trying to find any helpful information about "The futex facility returned an unexpected error code" but it seems that it isn't specified in futex documentation.
In my case this message was generated by sem_wait(sem), where sem wasn't valid sem_t pointer. I was accidentally overwriting it (the memory pointed by sem) with some random integers after initializing sem with sem_init(sem,1,1).
Try checking if you are passing valid pointer to locking function.
I was getting this error when i declared sem_t mutex as local variable.

OpenCV app compiled with C++11 creates extra thread

I'm debugging a OpenCV app compiled with C++11 (I use OpenCV 2.4.10). The app has two threads that do some image processing on the CPU (no GPU functions used but I also included libopencv_gpu.so in the linked libraries).
Using gdb I noticed that instead of just two threads (the main process thread and another thread created by the main process thread) I found 3 threads running:
(gdb) info threads
Id Target Id Frame
78 Thread 0x7fffe2ff5700 (LWP 20531) "app_name" 0x00007ffff5bb2f3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
2 Thread 0x7fffe3c42700 (LWP 20454) "app_name" 0x00007ffff5bdf12d in poll () at ../sysdeps/unix/syscall-template.S:81
* 1 Thread 0x7ffff7fab800 (LWP 20450) "app_name" 0x00007ffff5bb2f3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
Thread 1 and 78 (using gdb ID) are executing my code. I added a sleep call in each one so I can make sure that those are my threads.
Thread 2 (using gdb ID) is created before entering the main function of the main process I believe. As far as I could debug this, thread with ID 2 just calls poll() function all the time.
I'm new to gdb and maybe you can tell me how to find out who creates this thread and what is it's purpose? Is this OpenCV related or C++11? When I compile the same app using Opencv4Tegra and run it on a Tegra K1 board, thread number 2 does not exist.
EDIT
This is the backtrace when creating thread number 2. It seems that libusb creates this but I don't know why yet:
(gdb) backtrace
#0 __pthread_create_2_1 (newthread=0x7fffea79c438, attr=0x0, start_routine=0x7fffea5941c0, arg=0x0) at pthread_create.c:466
#1 0x00007fffea5943df in ?? () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#2 0x00007fffea5926a5 in ?? () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#3 0x00007fffea58b715 in libusb_init () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#4 0x00007ffff2f06a0e in ?? () from /usr/lib/x86_64-linux-gnu/libdc1394.so.22
#5 0x00007ffff2ef5465 in dc1394_new () from /usr/lib/x86_64-linux-gnu/libdc1394.so.22
#6 0x00007ffff6f615e9 in CvDC1394::CvDC1394() () from /usr/local/lib/libopencv_highgui.so.2.4
#7 0x00007ffff6f373f0 in _GLOBAL__sub_I_cap_dc1394_v2.cpp () from /usr/local/lib/libopencv_highgui.so.2.4
#8 0x00007ffff7dea13a in call_init (l=<optimized out>, argc=argc#entry=3, argv=argv#entry=0x7fffffffdcd8, env=env#entry=0x7fffffffdcf8) at dl-init.c:78
#9 0x00007ffff7dea223 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=<optimized out>) at dl-init.c:36
#10 _dl_init (main_map=0x7ffff7ffe1c8, argc=3, argv=0x7fffffffdcd8, env=0x7fffffffdcf8) at dl-init.c:126
#11 0x00007ffff7ddb30a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
(gdb) quit

Printing thread hierarchy with GDB

I have a multithreaded program that I'm trying to debug. When I run info thread in GDB, I get the following:
(gdb) info thread
Id Target Id Frame
8 Thread 0x7fffe77fd700 (LWP 17425) "SocketWriter" 0x00007ffff7bc9b2f in pthread_cond_wait##GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
7 Thread 0x7fffe73fc700 (LWP 17426) "SocketWriter" 0x00007ffff7bc9b2f in pthread_cond_wait##GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
6 Thread 0x7fffe7fff700 (LWP 17423) "SocketReader" 0x00007ffff7bcc66d in read () from /usr/lib/libpthread.so.0
5 Thread 0x7fffe7bfe700 (LWP 17424) "SocketReader" 0x00007ffff7bcc66d in read () from /usr/lib/libpthread.so.0
* 4 Thread 0x7ffff4810700 (LWP 17422) "unittest" 0x00007ffff7bcc38c in __lll_lock_wait () from /usr/lib/libpthread.so.0
3 Thread 0x7ffff4c11700 (LWP 17421) "receiver" 0x00007ffff7bcc38c in __lll_lock_wait () from /usr/lib/libpthread.so.0
2 Thread 0x7ffff5a3b700 (LWP 17420) "unittest" 0x00007ffff634e553 in select () from /usr/lib/libc.so.6
1 Thread 0x7ffff7fc9780 (LWP 17419) "unittest" 0x00007ffff7bc9b2f in pthread_cond_wait##GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
It would be excellent if I could make GDB display the parent/child relationships between the threads, something like the following:
(gdb) info thread
Id Target Id Frame
1 Thread 0x7ffff7fc9780 (LWP 17419) "unittest" 0x00007ffff7bc9b2f in pthread_cond_wait##GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
3 Thread 0x7ffff4c11700 (LWP 17421) "receiver" 0x00007ffff7bcc38c in __lll_lock_wait () from /usr/lib/libpthread.so.0
8 Thread 0x7fffe77fd700 (LWP 17425) "SocketWriter" 0x00007ffff7bc9b2f in pthread_cond_wait##GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
6 Thread 0x7fffe7fff700 (LWP 17423) "SocketReader" 0x00007ffff7bcc66d in read () from /usr/lib/libpthread.so.0
2 Thread 0x7ffff5a3b700 (LWP 17420) "unittest" 0x00007ffff634e553 in select () from /usr/lib/libc.so.6
5 Thread 0x7fffe7bfe700 (LWP 17424) "SocketReader" 0x00007ffff7bcc66d in read () from /usr/lib/libpthread.so.0
* 4 Thread 0x7ffff4810700 (LWP 17422) "unittest" 0x00007ffff7bcc38c in __lll_lock_wait () from /usr/lib/libpthread.so.0
7 Thread 0x7fffe73fc700 (LWP 17426) "SocketWriter" 0x00007ffff7bc9b2f in pthread_cond_wait##GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
For example, thread 3 is the parent of threads 8, 6, and 2, and thread 1 is the parent of everything.
Does such functionality exist? I have not seen reference to it, if it does.
gdb doesn't print this information because it doesn't exist in your program -- there is no way for gdb to discover it once the threads have been created.
There are maybe two ways it could be done.
First, you could set a breakpoint on the thread-creation function and record the information. This is readily done from Python. Then you can write a new command, also in Python, to format the output the way you like.
The problem with this approach is that it won't work if you "attach" to a running program. It will be too late to capture the information.
Another method is if you have extra information available in your program that describes the hierarchy. Then you can write a new command in Python that extracts this information to display things as you like.

Resources