How can I get rid of the seg fault? - multithreading

I have a list of function pointers called tasks_ready_master. The pointers point to functions (tasks) defined in a separate module. I want to execute them in parallel using threads. Each thread has a queue of capacity 1 called "thread_queue", which holds the task that the thread should execute. Once the task is done, it is removed from the queue. We also have a queue that holds all the tasks (called "master_queue"). This is my implementation of the execution subroutine:
subroutine master_worker_execution(self,var,tasks_ready_master,first_task,last_task)
type(tcb),dimension(20)::tasks_ready_master !< the master array of tasks
integer::i_task !< the task counter
type(tcb)::self !< self
integer,intent(in)::first_task,last_task
type(variables),intent(inout)::var !< the variables
!OpenMP variables
integer::num_thread !< the rank of the thread
integer:: OMP_GET_THREAD_NUM !< function to get the rank of the thread
type(QUEUE_STRUCT),pointer:: thread_queue
type(QUEUE_STRUCT),pointer::master_queue
logical::success
integer(kind = OMP_lock_kind) :: lck !< a lock
call OMP_init_lock(lck) !< lock initialization
!$OMP PARALLEL PRIVATE(i_task,num_thread,thread_queue) &
!$OMP SHARED(tasks_ready_master,self,var,master_queue,lck)
num_thread=OMP_GET_THREAD_NUM() !< the rank of the thread
!$OMP MASTER
call queue_create(master_queue,last_task-first_task+1) !< create the master queue
do i_task=first_task,last_task
call queue_append_data(master_queue,tasks_ready_master(i_task),success) !< add the list elements to the queue (full queue)
end do
!$OMP END MASTER
!$OMP BARRIER
if (num_thread .ne. 0) then
do while (.not. queue_empty(master_queue)) !< if the queue is not empty
call queue_create(thread_queue,1) !< create a thread queue of capacity 1
call OMP_set_lock(lck) !< set the lock
call queue_append_data(thread_queue,master_queue%data(1),success) !< add the first element of the list to the thread queue
call queue_retrieve_data(master_queue) !< retire the first element of the master queue
call OMP_unset_lock(lck) !< unset the lock
call thread_queue%data(1)%f_ptr(self,var) !< execute the one and only element of the thread queue
call queue_retrieve_data(thread_queue) !< retire the element
end do
end if
!$OMP MASTER
call queue_destroy(master_queue) !< destroy the master queue
!$OMP END MASTER
call queue_destroy(thread_queue) !< destroy the thread queue
!$OMP END PARALLEL
call OMP_destroy_lock(lck) !< destroy the lock
end subroutine master_worker_execution
The problem is that I get a segmentation fault:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f30fd3ca700 in ???
#0 0x7f30fd3ca700 in ???
#1 0x7f30fd3c98a5 in ???
#1 0x7f30fd3c98a5 in ???
#2 0x7f30fd06920f in ???
#2 0x7f30fd06920f in ???
#3 0x56524a0f1d08 in __master_worker_MOD_master_worker_execution._omp_fn.0
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/master_worker.f90:70
#4 0x7f30fd230a85 in ???
#3 0x56524a0f1ad7 in __queue_MOD_queue_destroy
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/queue.f90:64
#4 0x56524a0f1d94 in __master_worker_MOD_master_worker_execution._omp_fn.0
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/master_worker.f90:81
#5 0x7f30fd227e75 in ???
#6 0x56524a0f1f68 in __master_worker_MOD_master_worker_execution
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/master_worker.f90:54
#7 0x56524a0f29b5 in __app_management_MOD_management
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/app_management_without_t.f90:126
#8 0x56524a0f579b in hecese
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/program_hecese.f90:398
#9 0x56524a0ed26e in main
at /home/hakim/stage_hecese_HPC/OpenMP/hecese_OMP/program_hecese.f90:13
Erreur de segmentation (core dumped)
When I remove the while loop, it works (no seg fault). I don't understand where the mistake comes from.
When I debug with gdb, it points me to the lines that call queue_append_data and queue_retrieve_data.
This is the output I get when I use valgrind:
==13100== Memcheck, a memory error detector
==13100== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13100== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==13100== Command: ./output_hecese_omp
==13100==
==13100== Thread 3:
==13100== Jump to the invalid address stated on the next line
==13100== at 0x0: ???
==13100== by 0x10EB64: __master_worker_MOD_master_worker_execution._omp_fn.0 (master_worker.f90:73)
==13100== by 0x4C8BA85: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==13100== by 0x4F1D608: start_thread (pthread_create.c:477)
==13100== by 0x4DD7292: clone (clone.S:95)
==13100== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==13100==
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x4888700 in ???
#1 0x48878a5 in ???
#2 0x4cfb20f in ???
#3 0x0 in ???
==13100==
==13100== Process terminating with default action of signal 11 (SIGSEGV)
==13100== at 0x4CFB169: raise (raise.c:46)
==13100== by 0x4CFB20F: ??? (in /usr/lib/x86_64-linux-gnu/libc-2.31.so)
==13100==
==13100== HEAP SUMMARY:
==13100== in use at exit: 266,372 bytes in 121 blocks
==13100== total heap usage: 194 allocs, 73 frees, 332,964 bytes allocated
==13100==
==13100== LEAK SUMMARY:
==13100== definitely lost: 29,280 bytes in 3 blocks
==13100== indirectly lost: 2,416 bytes in 2 blocks
==13100== possibly lost: 912 bytes in 3 blocks
==13100== still reachable: 233,764 bytes in 113 blocks
==13100== suppressed: 0 bytes in 0 blocks
==13100== Rerun with --leak-check=full to see details of leaked memory
==13100==
==13100== For lists of detected and suppressed errors, rerun with: -s
==13100== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)
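A note on the loop in the subroutine above: queue_empty(master_queue) is tested outside the lock, so two threads can both see one remaining task, and the slower one then reads from an already empty queue. That would be consistent with the "jump to the invalid address 0x0" that valgrind reports. In addition, thread 0 never calls queue_create for its private thread_queue, yet every thread calls queue_destroy(thread_queue), which would be consistent with the backtrace frame inside queue_destroy. The following is only a minimal sketch in C++ with OpenMP pragmas, not the Fortran code from the question: Task, run_all and the std::deque are hypothetical stand-ins for the tcb type and the queue module. It shows the emptiness test and the removal done inside the same critical section.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one task (the f_ptr of a tcb in the question).
using Task = std::function<void()>;

// Executes every queued task exactly once and returns how many were run.
// The "is it empty?" test and the pop happen in the SAME critical section,
// so no thread can pop from a queue that another thread has just emptied.
int run_all(std::deque<Task> master_queue) {
    int executed = 0;
    #pragma omp parallel shared(master_queue, executed)
    {
        for (;;) {
            Task task;  // default-constructed: holds no callable
            #pragma omp critical(master_queue_access)
            {
                if (!master_queue.empty()) {
                    task = std::move(master_queue.front());
                    master_queue.pop_front();
                }
            }
            if (!task) break;  // nothing left: this thread is done
            task();            // run the task outside the critical section
            #pragma omp atomic
            ++executed;
        }
    }
    return executed;
}
```

Compile with -fopenmp (GCC) for the loop to run on several threads; without that flag the pragmas are ignored and the same code runs serially. The tasks themselves must not write to the same data without synchronisation.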

Segmentation fault - invalid memory reference (Conditional jump or move depends on uninitialised value(s))

Here is the code I am trying to execute:
SUBROUTINE GRAD(tasklist_GRAD,ww,pas,cpt ,nb_element,cpt1,dt,dx,p_element,u_prime,u_prime_moins,u_prime_plus,&
&taux,grad_x_u,grad_t_u,grad_x_f,grad_t_f,ax_plus,ax_moins,ux_plus,ux_moins,sm,flux,tab0,tab)
INTEGER ::i,j,k,ff,pas
INTEGER,intent(inout)::cpt,cpt1,nb_element,ww
real*8 :: dt,dx
integer ,allocatable, dimension(:),intent(inout) ::p_element
REAL*8 ,allocatable, dimension(:),intent(inout) :: u_prime,u_prime_moins, u_prime_plus,taux,grad_x_u,&
&grad_t_u,grad_t_f,grad_x_f,flux,sm
real*8,allocatable,dimension(:),intent(inout) :: ax_plus,ax_moins,ux_moins,ux_plus
REAL*8 ,allocatable, dimension(:,:),intent(inout) ::tab0,tab
integer::num_thread,nthreads
integer, external :: OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS
type(tcb),dimension(20)::tasklist_GRAD,tasks_ready_master
integer,allocatable,dimension(:)::threads_list
integer,dimension(30)::threads_list_all
integer,dimension(3)::threads_list_part1
integer::threads_list_part2
integer,dimension(16)::threads_list_part3
type(tcb)::self
!-----------Computation of the x gradients
Choisircese: select case (ww)
case(0) ! Old CESE
tasklist_GRAD(1)%f_ptr => u_prime_1 !1
tasklist_GRAD(2)%f_ptr => u_prime_droite_1 !2
tasklist_GRAD(3)%f_ptr => u_prime_gauche_1 !3
!$OMP PARALLEL PRIVATE(num_thread,nthreads) &
!$OMP SHARED(tasklist_GRAD,threads_list,threads_list_all,tasks_ready_master) &
!$OMP SHARED(threads_list_part1,threads_list_part2,threads_list_part3)
num_thread=OMP_GET_THREAD_NUM()
nthreads=OMP_GET_NUM_THREADS()
!Thread Application Master
!$OMP SINGLE
if (num_thread==1) then
do ff=1,3
if (associated(tasklist_GRAD(ff)%f_ptr) .eqv. .true. ) then
tasks_ready_master(ff) = tasklist_GRAD(ff)
end if
end do
do ff=1,3
if (associated(tasks_ready_master(ff)%f_ptr) .eqv. .true.) then
tasks_ready_master(ff)%state=STATE_READY
end if
end do
end if
!$OMP END SINGLE
!Thread Master
!$OMP SINGLE
if (num_thread==0) then
allocate(threads_list(nthreads-2))
do ff=1,nthreads-2
threads_list(ff)=ff+1
end do
do ff=1,3,nthreads-2
if (tasks_ready_master(ff)%state==STATE_READY) then
threads_list_all(ff:ff+nthreads-3)=threads_list(:)
end if
end do
threads_list_part1=threads_list_all(1:3)
end if
!$OMP END SINGLE
!Threads workers
do ff=1,3
if (num_thread==threads_list_part1(ff)) then
!$OMP TASK
call tasks_ready_master(ff)%f_ptr(self,ww,pas,cpt ,nb_element,cpt1,dt,dx,p_element,u_prime,u_prime_moins,&
&u_prime_plus,taux,grad_x_u,grad_t_u,grad_x_f,grad_t_f,ax_plus,ax_moins,ux_plus,ux_moins,sm,flux,tab0,tab)
!$OMP END TASK
end if
end do
!$OMP END PARALLEL
if(pas.eq.2)then
u_prime(2) = tab0(2,2)+dt/2.0d0*grad_x_u(2)!d_t_u(2)
u_prime(cpt-1) = tab0(cpt-1,2)+dt/2.0d0*grad_t_u(cpt-1)
u_prime(1) = tab0(1,2)+dt/2.0d0*grad_t_u(1)
u_prime(cpt) = tab0(cpt,2)+dt/2.0d0*grad_t_u(cpt)
u_prime_plus(1)= (u_prime(2)-tab(1,2))/(dx/2.0d0)
u_prime_moins(1)=-(u_prime(1)-tab(1,2))/(dx/2.0d0)
u_prime_plus(cpt)= (u_prime(cpt)-tab(cpt,2))/(dx/2.0d0)
u_prime_moins(cpt)= -(u_prime(cpt-1)-tab(cpt,2))/(dx/2.0d0)
end if
end select choisircese
END SUBROUTINE GRAD
The code is quite long so I only posted the case 0 (a sufficient part to understand the whole subroutine).
In order to compile, I do:
gfortran -fopenmp -O3 -g HECESE_openmp.f90
In order to execute, I do (I have fixed the number of threads previously):
./a.out
The error I get is:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f0eae2d6700 in ???
#0 0x7f0eae2d6700 in ???
#0 0x7f0eae2d6700 in ???
#1 0x7f0eae2d58a5 in ???
#1 0x7f0eae2d58a5 in ???
#2 0x7f0eadf7520f in ???
#3 0x0 in ???
Erreur de segmentation (core dumped)
So I decided to run the program under valgrind:
valgrind --track-origins=yes ./a.out
and the errors I get are:
==10923== Thread 6:
==10923== Conditional jump or move depends on uninitialised value(s)
==10923== at 0x1153DE: __procedures_MOD_grad._omp_fn.4 (HECESE_openmp.f90:702)
==10923== by 0x4C81A85: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x4F13608: start_thread (pthread_create.c:477)
==10923== by 0x4DCD292: clone (clone.S:95)
==10923== Uninitialised value was created by a stack allocation
==10923== at 0x10B22A: __procedures_MOD_grad (HECESE_openmp.f90:482)
==10923==
==10923== Thread 1:
==10923== Conditional jump or move depends on uninitialised value(s)
==10923== at 0x1153DE: __procedures_MOD_grad._omp_fn.4 (HECESE_openmp.f90:702)
==10923== by 0x4C78E75: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x10BF35: __procedures_MOD_grad (HECESE_openmp.f90:659)
==10923== by 0x1123CA: MAIN__ (HECESE_openmp.f90:1065)
==10923== by 0x113E45: main (HECESE_openmp.f90:723)
==10923== Uninitialised value was created by a stack allocation
==10923== at 0x10B22A: __procedures_MOD_grad (HECESE_openmp.f90:482)
==10923==
==10923== Thread 8:
==10923== Jump to the invalid address stated on the next line
==10923== at 0x0: ???
==10923== by 0x4C7BD7A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x4C846A7: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x1153A4: __procedures_MOD_grad._omp_fn.4 (HECESE_openmp.f90:674)
==10923== by 0x4C81A85: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x4F13608: start_thread (pthread_create.c:477)
==10923== by 0x4DCD292: clone (clone.S:95)
==10923== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10923==
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
==10923== Thread 5:
==10923== Jump to the invalid address stated on the next line
==10923== at 0x0: ???
==10923== by 0x4C7BD7A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x4C846A7: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x4C81A91: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x4F13608: start_thread (pthread_create.c:477)
==10923== by 0x4DCD292: clone (clone.S:95)
==10923== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10923==
==10923== Thread 1:
==10923== Jump to the invalid address stated on the next line
==10923== at 0x0: ???
==10923== by 0x4C7BD7A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x4C846A7: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x4C8304C: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923== by 0x10BF35: __procedures_MOD_grad (HECESE_openmp.f90:659)
==10923== by 0x1123CA: MAIN__ (HECESE_openmp.f90:1065)
==10923== by 0x113E45: main (HECESE_openmp.f90:723)
==10923== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10923==
#0 0x487e700 in ???
#1 0x487d8a5 in ???
#2 0x4cf120f in ???
#3 0x0 in ???
==10923==
==10923== Process terminating with default action of signal 11 (SIGSEGV)
==10923== at 0x4CF1169: raise (raise.c:46)
==10923== by 0x4CF120F: ??? (in /usr/lib/x86_64-linux-gnu/libc-2.31.so)
==10923==
==10923== HEAP SUMMARY:
==10923== in use at exit: 263,919 bytes in 122 blocks
==10923== total heap usage: 195 allocs, 73 frees, 330,511 bytes allocated
==10923==
==10923== LEAK SUMMARY:
==10923== definitely lost: 0 bytes in 0 blocks
==10923== indirectly lost: 0 bytes in 0 blocks
==10923== possibly lost: 3,344 bytes in 11 blocks
==10923== still reachable: 260,575 bytes in 111 blocks
==10923== suppressed: 0 bytes in 0 blocks
==10923== Rerun with --leak-check=full to see details of leaked memory
==10923==
==10923== For lists of detected and suppressed errors, rerun with: -s
==10923== ERROR SUMMARY: 165 errors from 5 contexts (suppressed: 0 from 0)
Erreur de segmentation (core dumped)
Can you please help me find out where all these errors come from? It was okay until I added the !$OMP SINGLE and removed the !$OMP BARRIER.
Consider
!$OMP SINGLE
if (num_thread==0) then
...
threads_list_part1=threads_list_all(1:3)
end if
!$OMP END SINGLE
!Threads workers
do ff=1,3
if (num_thread==threads_list_part1(ff)) then
The first thread to reach this code enters the single block. All other threads skip it and wait at the implicit barrier at its end until that thread has finished its work. The array threads_list_part1 gets initialised if, and only if, the thread that enters the block happens to be thread number 0; if any other thread enters, the if test fails and nothing is initialised. You have no guarantee which thread enters the block, so what you are seeing is a thread other than thread 0 reaching the single block first. Probable solution: remove the if (num_thread==0) then test, and likewise the if (num_thread==1) then test in the single block before it.
That said, having seen what you are doing, a more idiomatic OpenMP approach might be to use parallel sections; this is about the first time I have seen a case where that might be a sensible thing to do.
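The suggested fix can be sketched as follows. This is a minimal illustration in C++ with OpenMP pragmas rather than the Fortran from the question; the function name, the table size and the first_worker parameter are assumptions made for the example. Whichever thread executes the single block fills the shared table unconditionally, and the implicit barrier at the end of single ensures that every thread sees the filled table afterwards.

```cpp
#include <array>
#include <cassert>

// Returns true if every thread observed a fully initialised table.
// first_worker is a hypothetical parameter: the thread number assigned
// to the first slot of the table.
bool all_threads_see_initialised_table(int first_worker) {
    std::array<int, 3> threads_list_part1{};  // shared, zero-initialised
    int mismatches = 0;                       // shared error counter

    #pragma omp parallel shared(threads_list_part1, mismatches)
    {
        // No test on the thread number here: exactly one thread, whichever
        // arrives first, executes the block.
        #pragma omp single
        {
            for (int ff = 0; ff < 3; ++ff)
                threads_list_part1[ff] = first_worker + ff;
        }
        // Implicit barrier at the end of single: the table is now ready.

        for (int ff = 0; ff < 3; ++ff) {
            if (threads_list_part1[ff] != first_worker + ff) {
                #pragma omp atomic
                ++mismatches;
            }
        }
    }
    return mismatches == 0;
}
```

If the work really must be done by thread 0, the master construct can be used instead, but it has no implicit barrier, so an explicit barrier is then needed before other threads read the table.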

Memory leaks in pthread even if the state is detached

I am learning pthreads programming.
As I understand it, a thread can be in one of two states:
1. Joinable
2. Detached
For a joinable thread, we need to call pthread_join to free its resources (stack), whereas for a detached thread there is no need to call pthread_join and the resources are freed when the thread exits.
I wrote a sample program to observe this behavior:
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>   /* declares sleep() */
void *threadFn(void *arg)
{
    pthread_detach(pthread_self());   /* mark this thread as detached */
    sleep(1);
    printf("Thread Fn\n");
    pthread_exit(NULL);
}
int main(int argc, char *argv[])
{
    pthread_t tid;
    int ret = pthread_create(&tid, NULL, threadFn, NULL);
    if (ret != 0) {
        perror("Thread Creation Error\n");
        exit(1);
    }
    printf("After thread created in Main\n");
    pthread_exit(NULL);   /* main exits first; the process ends when the last thread does */
}
When I check for memory leaks with valgrind, it reports a leak of 272 bytes. Can you explain why the leak happens here?
$valgrind --leak-check=full ./app
==38649==
==38649== HEAP SUMMARY:
==38649== in use at exit: 272 bytes in 1 blocks
==38649== total heap usage: 7 allocs, 6 frees, 2,990 bytes allocated
==38649==
==38649== 272 bytes in 1 blocks are possibly lost in loss record 1 of 1
==38649== at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==38649== by 0x40134A6: allocate_dtv (dl-tls.c:286)
==38649== by 0x40134A6: _dl_allocate_tls (dl-tls.c:530)
==38649== by 0x4E44227: allocate_stack (allocatestack.c:627)
==38649== by 0x4E44227: pthread_create@@GLIBC_2.2.5 (pthread_create.c:644)
==38649== by 0x108902: main (2.c:18)
==38649==
==38649== LEAK SUMMARY:
==38649== definitely lost: 0 bytes in 0 blocks
==38649== indirectly lost: 0 bytes in 0 blocks
==38649== possibly lost: 272 bytes in 1 blocks
==38649== still reachable: 0 bytes in 0 blocks
==38649== suppressed: 0 bytes in 0 blocks
==38649==
==38649== For counts of detected and suppressed errors, rerun with: -v
==38649== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Your expectation is correct: there shouldn't be any leak in the main thread once you call pthread_exit.
However, what you observe is a quirk of the implementation you're using (most likely glibc). The glibc pthreads implementation caches the stacks it allocates for threads, so that a previously allocated stack can be reused whenever possible.
Valgrind simply reports what it "sees" (something was allocated but not deallocated). It's not a real leak, so you don't need to worry about it.
If you "reverse" the logic (so that the main thread is the last thread to exit), you wouldn't see the report, because the initially allocated stack space is properly freed by the main thread. Either way, this isn't a real leak and you can safely ignore it.
You can also set up a suppression file so that Valgrind doesn't complain about this (which tells Valgrind "I know this isn't a real leak, so don't report it"), such as:
{
Pthread_Stack_Leaks_Ignore
Memcheck:Leak
fun:calloc
fun:allocate_dtv
fun:_dl_allocate_tls
fun:allocate_stack
fun:pthread_create*
}
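The "reversed" arrangement mentioned above, where the main thread is the last one to finish, can be sketched as below. This is an illustrative C++ sketch using the same POSIX calls, not the program from the question; thread_fn and run_and_join are hypothetical names, and whether Valgrind then reports no "possibly lost" block depends on the glibc version in use.

```cpp
#include <cassert>
#include <pthread.h>

// Worker: writes a value through the pointer it receives, then returns.
static void* thread_fn(void* arg) {
    *static_cast<int*>(arg) = 42;
    return nullptr;
}

// Creates a joinable thread and waits for it, so the calling (main) thread
// outlives the worker. Returns 0 on success, otherwise the error number
// returned by pthread_create or pthread_join (these functions return the
// error code directly instead of setting errno).
int run_and_join(int* out) {
    pthread_t tid;
    int ret = pthread_create(&tid, nullptr, thread_fn, out);
    if (ret != 0) return ret;
    return pthread_join(tid, nullptr);  // releases the worker's resources
}
```

With older glibc versions the program has to be linked with -pthread.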

Memory Leaks with OpenMP Tasks and the Intel Compiler Suite

I am trying to parallelize an algorithm using DAG scheduling via OpenMP tasks. Many of my programs are killed by the Linux kernel for running out of memory after calls to the parallelized code, although the memory I allocate is only 1% of the server's main memory. This happens only if I use the Intel compilers from 2015, 2016, or even the new 2017 edition.
Here is a small example that builds the same task dependency graph as the crashing algorithm:
PROGRAM OMP_TASK_PROBLEM
IMPLICIT NONE
INTEGER M, N
PARAMETER(M = 256, N=256)
DOUBLE PRECISION X(M,N)
X(1:M,1:N) = 0.0D0
CALL COMPUTE_X(M, N, X, M)
! WRITE(*,*) X (1:M, 1:N)
END PROGRAM
SUBROUTINE COMPUTE_X(M,N, X, LDX)
IMPLICIT NONE
INTEGER M, N, LDX
DOUBLE PRECISION X(LDX, N)
INTEGER K, L, KOLD, LOLD
!$omp parallel default(shared)
!$omp master
L = 1
DO WHILE ( L .LE. N )
K = M
DO WHILE (K .GT. 0)
IF ( K .EQ. M .AND. L .EQ. 1) THEN
!$omp task depend(out:X(K,L)) firstprivate(K,L) default(shared)
X(K,L) = 0
!$omp end task
ELSE IF ( K .EQ. M .AND. L .GT. 1) THEN
!$omp task depend(out:X(K,L)) depend(in:X(K,LOLD)) firstprivate(K,L,LOLD) default(shared)
X(K,L) = 1 + X(K,LOLD)
!$omp end task
ELSE IF ( K .LT. M .AND. L .EQ. 1) THEN
!$omp task depend(out:X(K,L)) depend(in:X(KOLD,L)) firstprivate(K,L,KOLD) default(shared)
X(K,L) = 2 + X(KOLD,L)
!$omp end task
ELSE
!$omp task depend(out:X(K,L)) depend(in:X(KOLD,L),X(K,LOLD)) firstprivate(K,L,KOLD, LOLD) default(shared)
X(K,L) = X(KOLD, L) + X(K,LOLD)
!$omp end task
END IF
KOLD = K
K = K - 1
END DO
LOLD = L
L = L + 1
END DO
!$omp end master
!$omp taskwait
!$omp end parallel
END SUBROUTINE
After compiling it using ifort -qopenmp -g omp_test.f90 and running it via valgrind it reports:
==21071== LEAK SUMMARY:
==21071== definitely lost: 0 bytes in 0 blocks
==21071== indirectly lost: 0 bytes in 0 blocks
==21071== possibly lost: 118,489,088 bytes in 113 blocks
==21071== still reachable: 1,286 bytes in 35 blocks
==21071== suppressed: 0 bytes in 0 blocks
where the --leak-check=full option shows the following details:
==21071== 1,048,576 bytes in 1 blocks are possibly lost in loss record 22 of 26
==21071== at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==21071== by 0x516A807: bget(kmp_info*, long) (kmp_alloc.c:741)
==21071== by 0x516A52D: ___kmp_fast_allocate (kmp_alloc.c:2012)
==21071== by 0x51CD917: __kmp_task_alloc (kmp_tasking.c:991)
==21071== by 0x51CD886: __kmpc_omp_task_alloc (kmp_tasking.c:1131)
==21071== by 0x4034FA: compute_x_ (omp_task_problem.f90:31)
==21071== by 0x51DBCB2: __kmp_invoke_microtask (in /scratch/software/intel-2016/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64_lin/libiomp5.so)
==21071== by 0x51AA436: __kmp_invoke_task_func (kmp_runtime.c:7058)
==21071== by 0x51AB60A: __kmp_fork_call (kmp_runtime.c:2397)
==21071== by 0x5184517: __kmpc_fork_call (kmp_csupport.c:339)
==21071== by 0x403353: compute_x_ (omp_task_problem.f90:24)
==21071== by 0x402F4C: MAIN__ (omp_task_problem.f90:10)
==21071==
==21071== 24,117,248 bytes in 23 blocks are possibly lost in loss record 23 of 26
==21071== at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==21071== by 0x516A807: bget(kmp_info*, long) (kmp_alloc.c:741)
==21071== by 0x516A52D: ___kmp_fast_allocate (kmp_alloc.c:2012)
==21071== by 0x51CB5E0: __kmpc_omp_task_with_deps (kmp_taskdeps.cpp:452)
==21071== by 0x403D54: compute_x_ (omp_task_problem.f90:43)
==21071== by 0x51DBCB2: __kmp_invoke_microtask (in /scratch/software/intel-2016/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64_lin/libiomp5.so)
==21071== by 0x51AA436: __kmp_invoke_task_func (kmp_runtime.c:7058)
==21071== by 0x51AB60A: __kmp_fork_call (kmp_runtime.c:2397)
==21071== by 0x5184517: __kmpc_fork_call (kmp_csupport.c:339)
==21071== by 0x403353: compute_x_ (omp_task_problem.f90:24)
==21071== by 0x402F4C: MAIN__ (omp_task_problem.f90:10)
==21071== by 0x402E1D: main (in ./a.out)
...
The line numbers of the compute_x_ function in the backtrace correspond to the !$omp task statements. These memory leaks accumulate rapidly until the program crashes. After changing the call of COMPUTE_X to
DO J = 1, 10000
CALL COMPUTE_X(M,N, X, M)
END DO
one can even use top to watch the memory requirements of the application explode.
With gcc-6.2, valgrind instead reports:
==21246== LEAK SUMMARY:
==21246== definitely lost: 0 bytes in 0 blocks
==21246== indirectly lost: 0 bytes in 0 blocks
==21246== possibly lost: 8,640 bytes in 15 blocks
==21246== still reachable: 4,624 bytes in 4 blocks
==21246== suppressed: 0 bytes in 0 blocks
==21246==
where the leaks come only from the first initialization of the OpenMP runtime system.
So my question is: why does the Intel compiler / Intel OpenMP runtime system produce these leaks, or alternatively, is there an error in the way I have designed the task parallelism?

GDB debug output for multi-thread program

All,
I am debugging a 24-thread program with GDB. I have found the line of code where the error occurs, but I cannot tell from the GDB output what the error is. The following line of code leads to the error; it is just a normal insertion into a map structure.
current_node->children.insert(std::pair<string, ComponentTrieNode*>(comps[j], temp_node));
I used GDB to find out in which thread the error happens and switched to that thread; the backtrace command shows the function calls on the stack. (In the last several lines I try to print the values of some variables in a function, but this fails.)
What should I do to find out exactly what error is happening?
[root@localhost nameComponentEncoding]# gdb NCE_david
GNU gdb (GDB) Fedora (7.2.90.20110429-36.fc15)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david...done.
(gdb) r /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
Starting program: /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffd2bf5700 (LWP 13129)]
[New Thread 0x7fffd23f4700 (LWP 13130)]
[New Thread 0x7fffd1bf3700 (LWP 13131)]
[New Thread 0x7fffd13f2700 (LWP 13132)]
[New Thread 0x7fffd0bf1700 (LWP 13133)]
[New Thread 0x7fffd03f0700 (LWP 13134)]
[New Thread 0x7fffcfbef700 (LWP 13135)]
[New Thread 0x7fffcf3ee700 (LWP 13136)]
[New Thread 0x7fffcebed700 (LWP 13137)]
[New Thread 0x7fffce3ec700 (LWP 13138)]
[New Thread 0x7fffcdbeb700 (LWP 13139)]
[New Thread 0x7fffcd3ea700 (LWP 13140)]
[New Thread 0x7fffccbe9700 (LWP 13141)]
[New Thread 0x7fffcc3e8700 (LWP 13142)]
[New Thread 0x7fffcbbe7700 (LWP 13143)]
[New Thread 0x7fffcb3e6700 (LWP 13144)]
[New Thread 0x7fffcabe5700 (LWP 13145)]
[New Thread 0x7fffca3e4700 (LWP 13146)]
[New Thread 0x7fffc9be3700 (LWP 13147)]
[New Thread 0x7fffc93e2700 (LWP 13148)]
[New Thread 0x7fffc8be1700 (LWP 13149)]
[New Thread 0x7fffc83e0700 (LWP 13150)]
[New Thread 0x7fffc7bdf700 (LWP 13151)]
this is thread 1
this is thread 7
this is thread 14
this is thread 18
this is thread 2
this is thread 19
this is thread 6
this is thread 8
this is thread 24
base: 64312646
this is thread 11
this is thread 5
this is thread 12
this is thread 13
this is thread 3
this is thread 15
this is thread 16
this is thread 17
this is thread 4
this is thread 20
this is thread 21
this is thread 22
this is thread 23
this is thread 9
this is thread 10
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc8be1700 (LWP 13149)]
std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126 __x->_M_right = __y->_M_left;
(gdb) info threads
Id Target Id Frame
24 Thread 0x7fffc7bdf700 (LWP 13151) "NCE_david" compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/char_traits.h:257
(... other 22 threads not listed)
2 Thread 0x7fffd2bf5700 (LWP 13129) "NCE_david" compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/char_traits.h:257
1 Thread 0x7ffff7fe57a0 (LWP 13126) "NCE_david" strtok () at ../sysdeps/x86_64/strtok.S:76
(gdb) thread 22
[Switching to thread 22 (Thread 0x7fffc8be1700 (LWP 13149))]
#0 std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126 __x->_M_right = __y->_M_left;
(gdb) bt
#0 std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
#1 0x0000003cdd26e848 in std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x7fffc0005ba0, __p=<optimized out>, __header=...)
at ../../../../libstdc++-v3/src/tree.cc:266
#2 0x00000000004029ca in std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*>, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> > >::_M_insert_ (this=0x608108, __x=<optimized out>, __p=0x16cd3e30, __v=...)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_pair.h:87
#3 0x0000000000402b7d in std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*>, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> > >::_M_insert_unique (this=0x608108, __v=...)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_tree.h:1281
#4 0x000000000040444c in insert (__x=..., this=0x608108) at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_map.h:518
#5 ComponentTrie::add_prefix (this=0x7fffffffe2e0, prefix_input=<optimized out>, port=10) at ComponentTrie_david.cpp:112
#6 0x0000000000401c3b in main._omp_fn.0 () at NameComponentEncoding_david.cpp:277
#7 0x0000003cd2607fea in gomp_thread_start (xdata=<optimized out>) at ../../../libgomp/team.c:115
#8 0x0000003cd0607cd1 in start_thread (arg=0x7fffc8be1700) at pthread_create.c:305
#9 0x0000003cd02dfd3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) p 'ComponentTrie::add_prefix(char*, int)'::comps[j]
No symbol "comps" in specified context.
(gdb) p 'ComponentTrie::add_prefix(char*, int)'::prefix
No symbol "prefix" in specified context.
Edit: I ran the code with valgrind --tool=memcheck; the result is below.
[root@localhost nameComponentEncoding]# valgrind --tool=memcheck ./NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
(... many lines omitted)
==13261==
==13261== Thread 11:
==13261== Invalid read of size 1
==13261== at 0x3CD02849BC: strtok (strtok.S:141)
==13261== by 0x40426A: ComponentTrie::add_prefix(char*, int) (ComponentTrie_david.cpp:99)
==13261== by 0x40242C: main._omp_fn.0 (NameComponentEncoding_david.cpp:531)
==13261== by 0x3CD2607FE9: gomp_thread_start (team.c:115)
==13261== by 0x3CD0607CD0: start_thread (pthread_create.c:305)
==13261== by 0x3CD02DFD3C: clone (clone.S:115)
==13261== Address 0x234422c02 is not stack'd, malloc'd or (recently) free'd
==13261==
==13261== Invalid read of size 1
==13261== at 0x3CD02849EC: strtok (strtok.S:167)
==13261== by 0x40426A: ComponentTrie::add_prefix(char*, int) (ComponentTrie_david.cpp:99)
==13261== by 0x40242C: main._omp_fn.0 (NameComponentEncoding_david.cpp:531)
==13261== by 0x3CD2607FE9: gomp_thread_start (team.c:115)
==13261== by 0x3CD0607CD0: start_thread (pthread_create.c:305)
==13261== by 0x3CD02DFD3C: clone (clone.S:115)
==13261== Address 0x234422c02 is not stack'd, malloc'd or (recently) free'd
==13261==
Insertion and lookup cost time(us): 994669532 67108864 14.821731 0.067469
component number:4849478, state number: 2545847
Parallel threads:24
==13261==
==13261== HEAP SUMMARY:
==13261== in use at exit: 4,239,081,584 bytes in 76,746,193 blocks
==13261== total heap usage: 80,050,114 allocs, 3,303,921 frees, 4,323,622,103 bytes allocated
==13261==
==13261== LEAK SUMMARY:
==13261== definitely lost: 0 bytes in 0 blocks
==13261== indirectly lost: 0 bytes in 0 blocks
==13261== possibly lost: 4,111,951,106 bytes in 74,746,429 blocks
==13261== still reachable: 127,130,478 bytes in 1,999,764 blocks
==13261== suppressed: 0 bytes in 0 blocks
==13261== Rerun with --leak-check=full to see details of leaked memory
==13261==
==13261== For counts of detected and suppressed errors, rerun with: -v
==13261== Use --track-origins=yes to see where uninitialised values come from
==13261== ERROR SUMMARY: 45 errors from 30 contexts (suppressed: 6 from 6)
We know that the program is segfaulting on this line:
current_node->children.insert(std::pair<string, ComponentTrieNode*>(comps[j], temp_node));
From the stack trace, we know that the segfault happens deep in the red black tree implementation of std::map:
#0 std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126 __x->_M_right = __y->_M_left;
This implies two things.
First, the segfault could be caused by evaluating __x->_M_right, by evaluating __y->_M_left, or by storing the right-hand side into the left-hand side of __x->_M_right = __y->_M_left.
Second, the fact that std::map::insert() was called at all implies that the segfault was NOT caused while building the arguments to the call. In particular, comps[j] is not out of bounds.
This leads me to think that your heap had already been corrupted by earlier memory errors by the time of the crash, and that the crash in std::map::insert() is a symptom rather than the cause.
Run your program under the Valgrind memcheck tool:
$ valgrind --tool=memcheck /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
and carefully read Valgrind's output afterwards to find the first memory error in your program.
Valgrind is implemented as a virtual CPU, so your program will slow down by a factor of ~30. This is time-consuming, but it should allow you to make progress in troubleshooting the problem.
In addition to Valgrind, you might also want to try enabling debug mode for the libstdc++ containers:
To use the libstdc++ debug mode, compile your application with the compiler flag -D_GLIBCXX_DEBUG. Note that this flag changes the sizes and behavior of standard class templates such as std::vector, and therefore you can only link code compiled with debug mode and code compiled without debug mode if no instantiation of a container is passed between the two translation units.
If your program uses no external libraries then rebuilding the whole thing with -D_GLIBCXX_DEBUG added to CXXFLAGS in the Makefile should work. Otherwise you'd need to know whether C++ containers are passed between components compiled with and without the debug flag.
Valgrind Log Review
I'm surprised that you're using strtok() in a multi-threaded program: it keeps its parsing position in hidden static state, so it is not safe to call from several threads at once. Are you sure that ComponentTrie::add_prefix() is never called from two threads concurrently? While fixing the invalid read by inspecting how strtok() is used at ComponentTrie_david.cpp:99, you should replace strtok() with strtok_r() as well.
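A minimal sketch of the strtok_r() replacement suggested above. The helper name split_components and the "/" delimiter are assumptions made for illustration; the real tokenizing code in ComponentTrie_david.cpp is not shown in the question.

```cpp
#include <string.h>   // strtok_r (POSIX)
#include <string>
#include <vector>

// Hypothetical helper, not taken from ComponentTrie_david.cpp: splits a
// '/'-separated name into components without using any shared state.
std::vector<std::string> split_components(const std::string& name) {
    // strtok_r writes NUL bytes into its input, so tokenize a private,
    // NUL-terminated copy rather than the caller's buffer.
    std::vector<char> buf(name.begin(), name.end());
    buf.push_back('\0');

    std::vector<std::string> comps;
    char* save = nullptr;  // per-call parsing position; strtok() keeps this in a static
    for (char* tok = strtok_r(buf.data(), "/", &save); tok != nullptr;
         tok = strtok_r(nullptr, "/", &save)) {
        comps.emplace_back(tok);
    }
    return comps;
}
```

Because the parsing position lives in the local variable save, two threads can tokenize different strings at the same time without interfering with each other. Copying the input also avoids modifying a buffer that another thread might still be reading.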
Concurrent Access to STL Containers
The standard C++ containers are explicitly documented to not do thread synchronization:
The user code must guard against concurrent function calls which access any particular library object's state when one or more of those accesses modifies the state. An object will be modified by invoking a non-const member function on it or passing it as a non-const argument to a library function. An object will not be modified by invoking a const member function on it or passing it to a function as a pointer- or reference-to-const. Typically, the application programmer may infer what object locks must be held based on the objects referenced in a function call and whether the objects are accessed as const or non-const.
(That's from the GNU libstdc++ documentation, but the C++11 standard specifies essentially the same behavior.) Concurrent modification of std::map and other containers is a serious error and is likely the culprit behind the crash. Guard each container with its own pthread_mutex_t, or use the OpenMP synchronization mechanisms.
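As a rough illustration of the OpenMP option, the sketch below serializes inserts into a shared std::map with a named critical section. The Node struct, the fill_children function and the key format are hypothetical stand-ins, since the real ComponentTrieNode is not shown in the question.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a trie node's child table.
struct Node {
    std::map<std::string, int> children;
};

// Inserts keys "c0" .. "c<n-1>" into one shared map from an OpenMP loop
// and returns the final number of entries.
std::size_t fill_children(Node& node, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        // Building the key touches only thread-local data, so it stays
        // outside the critical section.
        std::string key = "c" + std::to_string(i);

        // Only one thread at a time may modify the shared std::map.
        #pragma omp critical(children_update)
        {
            node.children.insert(std::make_pair(key, i));
        }
    }
    return node.children.size();
}
```

A named critical section is a single program-wide lock, so it serializes inserts into every map it protects. If that becomes a bottleneck, a lock per container (an omp_lock_t or the pthread_mutex_t mentioned above) lets threads working on different nodes proceed in parallel.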

alsa - mem leak?

I've been chasing a memory leak (reported by 'valgrind --leak-check=yes') and it appears to be coming from ALSA. This code has been in the free world for some time so I'm guessing that it's something I'm doing wrong.
#include <stdio.h>
#include <stdlib.h>
#include <alsa/asoundlib.h>
int main (int argc, char *argv[])
{
snd_ctl_t *handle;
int err = snd_ctl_open( &handle, "hw:1", 0 );
printf( "snd_ctl_open: %d\n", err );
err = snd_ctl_close(handle);
printf( "snd_ctl_close: %d\n", err );
}
The output looks like this:
[root@aeolus alsa]# valgrind --leak-check=yes ./test2
==16296== Memcheck, a memory error detector
==16296== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==16296== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==16296== Command: ./test2
==16296==
snd_ctl_open: 0
snd_ctl_close: 0
==16296==
==16296== HEAP SUMMARY:
==16296== in use at exit: 22,912 bytes in 1,222 blocks
==16296== total heap usage: 1,507 allocs, 285 frees, 26,236 bytes allocated
==16296==
==16296== 4 bytes in 2 blocks are possibly lost in loss record 1 of 62
==16296== at 0x4007100: malloc (vg_replace_malloc.c:270)
==16296== by 0x340F7F: strdup (in /lib/libc-2.5.so)
==16296== by 0x624C6B5: ??? (in /lib/libasound.so.2.0.0)
==16296== by 0x624CA5B: ??? (in /lib/libasound.so.2.0.0)
==16296== by 0x624CD81: ??? (in /lib/libasound.so.2.0.0)
==16296== by 0x624F311: snd_config_update_r (in /lib/libasound.so.2.0.0)
==16296== by 0x624FAD7: snd_config_update (in /lib/libasound.so.2.0.0)
==16296== by 0x625DA22: snd_ctl_open (in /lib/libasound.so.2.0.0)
==16296== by 0x804852F: main (test2.cpp:9)
and continues for some pages to
==16296== 2,052 bytes in 57 blocks are possibly lost in loss record 62 of 62
==16296== at 0x4005EB4: calloc (vg_replace_malloc.c:593)
==16296== by 0x624A268: ??? (in /lib/libasound.so.2.0.0)
==16296== by 0x624A38F: ??? (in /lib/libasound.so.2.0.0)
==16296== by 0x624CA33: ??? (in /lib/libasound.so.2.0.0)
==16296== by 0x624CCC9: ??? (in /lib/libasound.so.2.0.0)
==16296== by 0x624CD81: ??? (in /lib/libasound.so.2.0.0)
==16296== by 0x624F311: snd_config_update_r (in /lib/libasound.so.2.0.0)
==16296== by 0x624FAD7: snd_config_update (in /lib/libasound.so.2.0.0)
==16296== by 0x625DA22: snd_ctl_open (in /lib/libasound.so.2.0.0)
==16296== by 0x804852F: main (test2.cpp:9)
==16296==
==16296== LEAK SUMMARY:
==16296== definitely lost: 0 bytes in 0 blocks
==16296== indirectly lost: 0 bytes in 0 blocks
==16296== possibly lost: 22,748 bytes in 1,216 blocks
==16296== still reachable: 164 bytes in 6 blocks
==16296== suppressed: 0 bytes in 0 blocks
==16296== Reachable blocks (those to which a pointer was found) are not shown.
==16296== To see them, rerun with: --leak-check=full --show-reachable=yes
==16296==
==16296== For counts of detected and suppressed errors, rerun with: -v
==16296== ERROR SUMMARY: 56 errors from 56 contexts (suppressed: 19 from 8)
This came about as I'm using ALSA in a project and started seeing this huge leak...or at least the report of said leak.
So the question is: is it me, ALSA or valgrind that's having issues here?
http://git.alsa-project.org/?p=alsa-lib.git;a=blob;f=MEMORY-LEAK;hb=HEAD says:
Memory leaks - really?
----------------------
Note that some developers are thinking that the ALSA library has some memory
leaks. Sure, it can be truth, but before contacting us, please, be sure that
these leaks are not forced.
The biggest reported leak is that the global configuration is cached for
next usage. If you do not want this feature, simply, call
snd_config_update_free_global() after all snd_*_open*() calls. This function
will free the cache.
"The biggest reported leak is that the global configuration is cached for next usage.
If you do not want this feature, simply call snd_config_update_free_global() after all snd_*_open*() calls.
This function will free the cache." <---- Following this advice as written, Valgrind still detects leaks.
This can be fixed if you call snd_config_update_free_global() after snd_pcm_close(handle) instead (for the program in the question, that would presumably be after snd_ctl_close(handle)).
Perhaps this will work (source):
diff --git a/src/pcm/pcm.c b/src/pcm/pcm.c
--- a/src/pcm/pcm.c
+++ b/src/pcm/pcm.c
@@ -2171,7 +2171,12 @@ static int snd_pcm_open_conf(snd_pcm_t **pcmp, const char *name,
if (open_func) {
err = open_func(pcmp, name, pcm_root, pcm_conf, stream, mode);
if (err >= 0) {
- (*pcmp)->open_func = open_func;
+ if ((*pcmp)->open_func) {
+ /* only init plugin (like empty, asym) */
+ snd_dlobj_cache_put(open_func);
+ } else {
+ (*pcmp)->open_func = open_func;
+ }
err = 0;
} else {
snd_dlobj_cache_put(open_func);
I tried it myself, but to no avail. My core temperature rises by ~10 °F, most likely due to a similar memory leak. Here's some of what valgrind gave me, even after applying the patch above:
==869== 16,272 bytes in 226 blocks are possibly lost in loss record 103 of 103
==869== at 0x4C28E48: calloc (vg_replace_malloc.c:566)
==869== by 0x5066E61: _snd_config_make (in /usr/lib64/libasound.so.2)
==869== by 0x5066F58: _snd_config_make_add (in /usr/lib64/libasound.so.2)
==869== by 0x50673B9: parse_value (in /usr/lib64/libasound.so.2)
==869== by 0x50675DE: parse_array_def (in /usr/lib64/libasound.so.2)
==869== by 0x5067680: parse_array_defs (in /usr/lib64/libasound.so.2)
==869== by 0x5067A8E: parse_def (in /usr/lib64/libasound.so.2)
==869== by 0x5067BC7: parse_defs (in /usr/lib64/libasound.so.2)
==869== by 0x5067A6F: parse_def (in /usr/lib64/libasound.so.2)
==869== by 0x5067BC7: parse_defs (in /usr/lib64/libasound.so.2)
==869== by 0x5067A6F: parse_def (in /usr/lib64/libasound.so.2)
==869== by 0x5067BC7: parse_defs (in /usr/lib64/libasound.so.2)
The number of bytes lost just keeps going up and up.

Resources