Open MPI and Valgrind - openmpi

I just used valgrind to test an example provided in openmpi-1.4/example:
mpirun.openmpi --np 2 valgrind --log-file=output.dat --leak-check=full --tool=memcheck ./ring_c
Then I found the following in output.dat:
==30450== Syscall param writev(vector[...]) points to uninitialised byte(s)
==30450== at 0x54DC150: __writev_nocancel (syscall-template.S:81)
==30450== by 0x7E3B312: mca_oob_tcp_msg_send_handler (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==30450== by 0x7E3C50A: mca_oob_tcp_peer_send (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==30450== by 0x7E40266: mca_oob_tcp_send_nb (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==30450== by 0x7C2FFB7: orte_rml_oob_send (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
==30450== by 0x7C30637: orte_rml_oob_send_buffer (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
==30450== by 0x824CBAE: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
==30450== by 0x4E900FB: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==30450== by 0x4EA8499: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==30450== by 0x4009AD: main (ring_c.c:19)
==30450== Address 0x65c0321 is 161 bytes inside a block of size 256 alloc'd
==30450== at 0x4C2DEAE: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30450== by 0x4F1E619: opal_dss_buffer_extend (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==30450== by 0x4F1E9D0: opal_dss_copy_payload (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==30450== by 0x4EFA3DD: orte_grpcomm_base_pack_modex_entries (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==30450== by 0x824CA8F: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
==30450== by 0x4E900FB: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==30450== by 0x4EA8499: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.1.0.8)
==30450== by 0x4009AD: main (ring_c.c:19)
==30450== HEAP SUMMARY:
==30450== in use at exit: 298,974 bytes in 1,482 blocks
==30450== total heap usage: 7,740 allocs, 6,258 frees, 13,223,431 bytes allocated
... ... ...
==30450== LEAK SUMMARY:
==30450== definitely lost: 51,132 bytes in 69 blocks
==30450== indirectly lost: 14,378 bytes in 39 blocks
==30450== possibly lost: 0 bytes in 0 blocks
==30450== still reachable: 233,464 bytes in 1,374 blocks
==30450== suppressed: 0 bytes in 0 blocks
==30450== Reachable blocks (those to which a pointer was found) are not shown.
==30450== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==30450==
==30450== For counts of detected and suppressed errors, rerun with: -v
==30450== Use --track-origins=yes to see where uninitialized values come from
==30450== ERROR SUMMARY: 63 errors from 63 contexts (suppressed: 0 from 0)
According to the memcheck results, the example leaks memory.
Since the example is provided by the openmpi-1.4 developers, does this mean that every program using openmpi-1.4 as a library will leak memory?
Fred

For performance reasons, OpenMPI is not valgrind-clean. However, as per the FAQ, a suppression file is provided:
mpirun -np 2 valgrind --suppressions=$PREFIX/share/openmpi/openmpi-valgrind.supp

Related

Why does valgrind point to a memory leak at libc-start.c?

After building an app, it keeps crashing due to some memory leak. The beginning of the valgrind report reads:
==70588==
==70588== HEAP SUMMARY:
==70588== in use at exit: 215,842 bytes in 2,327 blocks
==70588== total heap usage: 77,289 allocs, 74,962 frees, 7,513,045 bytes allocated
==70588==
==70588== 20 bytes in 1 blocks are definitely lost in loss record 182 of 510
==70588== at 0x4849D8C: malloc (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==70588== by 0x15481F: ??? (in /usr/bin/tcsh)
==70588== by 0x15C313: ??? (in /usr/bin/tcsh)
==70588== by 0x117EAB: ??? (in /usr/bin/tcsh)
==70588== by 0x492C08F: (below main) (libc-start.c:308)
A similar message is repeated for several records, but the strange thing is that it always points to libc-start.c. That doesn't provide much insight into which piece of the app is causing the memory leak (the app itself has tens of thousands of lines with a mix of C/Fortran and many internal dependencies). Any suggestions on what might be the root of the problem, or on what to look at, would be welcome.

Memory leak with allocatable array in fortran 2008 [duplicate]

I am using gfortran 8.4 in Ubuntu with a deferred-length character variable as in the following example:
PROGRAM test
IMPLICIT NONE
CHARACTER(LEN=:),ALLOCATABLE :: str
str = '10'
END PROGRAM test
I compile it using:
gfortran-8 test.f90 -o test -O0
When running the program using Valgrind I get a memory leak:
==29119== HEAP SUMMARY:
==29119== in use at exit: 2 bytes in 1 blocks
==29119== total heap usage: 22 allocs, 21 frees, 13,522 bytes allocated
==29119==
==29119== LEAK SUMMARY:
==29119== definitely lost: 2 bytes in 1 blocks
==29119== indirectly lost: 0 bytes in 0 blocks
==29119== possibly lost: 0 bytes in 0 blocks
==29119== still reachable: 0 bytes in 0 blocks
==29119== suppressed: 0 bytes in 0 blocks
However, if I compile the program with:
gfortran-8 test.f90 -o test -O1
Valgrind reports:
==29130== HEAP SUMMARY:
==29130== in use at exit: 0 bytes in 0 blocks
==29130== total heap usage: 21 allocs, 21 frees, 13,520 bytes allocated
==29130==
==29130== All heap blocks were freed -- no leaks are possible
I do not understand why I am getting this memory leak when no optimization is applied at compile time. Thanks in advance.
All variables declared in the main program or as module variables implicitly have the SAVE attribute. Saved variables are not automatically deallocated. The Fortran standard does not mandate deallocation of allocatable variables at the end of the program. They will be reclaimed by your OS anyway.
You can deallocate your variables manually or, if you wish to get automatic deallocation, you can move that logic - and the allocatable variables - into a subroutine that is called from the main program. That way the local allocatable variables of that subroutine will be deallocated when the subroutine finishes.
Alternatively, you can also create a block using block and end block and declare the allocatable variables inside the block with all that it brings. They will be deallocated when the execution of the block is finished.
Technically what happens is that code generated by the compiler for your program does not maintain the pointers inside the allocatable descriptors until the moment valgrind would like to see them for them to be "still reachable". That is a technicality that you do not have to worry about.
It might not be perfectly nice to let the OS do the memory cleanup for a variable that has a lifetime until the end of a program, but it's still valid.
To avoid these false positive leaks in valgrind it is sufficient to enclose your code in a scope contained in the main program using the block construct.

Decoding output from Valgrind

I'm trying to understand the output from Valgrind having executed it as follows:
valgrind --leak-check=yes "someprogram"
The output is here:
==30347==
==30347== HEAP SUMMARY:
==30347== in use at exit: 126,188 bytes in 2,777 blocks
==30347== total heap usage: 4,562 allocs, 1,785 frees, 974,922 bytes allocated
==30347==
==30347== LEAK SUMMARY:
==30347== definitely lost: 0 bytes in 0 blocks
==30347== indirectly lost: 0 bytes in 0 blocks
==30347== possibly lost: 0 bytes in 0 blocks
==30347== still reachable: 126,188 bytes in 2,777 blocks
==30347== suppressed: 0 bytes in 0 blocks
==30347== Reachable blocks (those to which a pointer was found) are not shown.
==30347== To see them, rerun with: --leak-check=full --show-reachable=yes
==30347==
==30347== For counts of detected and suppressed errors, rerun with: -v
==30347== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
According to the output, there are no lost bytes, but there still seem to be reachable blocks. So do I have a memory leak?
No.
You are most concerned with unreachable blocks. What you are seeing here is that there are active variables that are still "pointing" at reachable blocks of memory. They are still in scope.
An unreachable block would be, for instance, memory that you have allocated dynamically, used for a period of time and then all of the references to it have gone out of scope even though the program is still executing. Since you no longer have any handles pointing to them they are now unrecoverable, creating a memory leak.
Here is a quote from the Valgrind docs:
"still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.

Help required regarding memory leak

My application leaks about 10 MB of memory when the first timeout occurs. I am using the Linux timer functions (timer_create etc.).
Subsequent timeouts cause no issue. I suspect a problem with the Linux timers.
I debugged it with valgrind and purify, but neither tool helped. Both report only a few KB of leaked memory, yet my application's memory grows by 10 MB on the first timeout.
If anybody has faced this problem before, please help me.
To find out which bits of your code are causing the leak (if any), compile your code with debug symbols (i.e. include the -g flag if you're using gcc), then run your program via valgrind.
valgrind --leak-check=full ./your_program
The run will take a little longer than usual, but when your program ends, the output from valgrind should tell you how much memory you've leaked and where the culprits are.
Sample output:
==10934== HEAP SUMMARY:
==10934== in use at exit: 10 bytes in 10 blocks
==10934== total heap usage: 10 allocs, 0 frees, 10 bytes allocated
==10934==
==10934== 10 bytes in 10 blocks are definitely lost in loss record 1 of 1
==10934== at 0x4024F20: malloc (vg_replace_malloc.c:236)
==10934== by 0x8048402: main (a.c:8)
==10934==
==10934== LEAK SUMMARY:
==10934== definitely lost: 10 bytes in 10 blocks
==10934== indirectly lost: 0 bytes in 0 blocks
==10934== possibly lost: 0 bytes in 0 blocks
==10934== still reachable: 0 bytes in 0 blocks
==10934== suppressed: 0 bytes in 0 blocks
Update
Since you're already using valgrind, perhaps you could try the Massif tool that comes with it. It should be able to paint a more accurate picture of memory usage (compared to simply watching top).
Check out this tutorial to see how it can be used. You may need some additional options to get a sensible graph depending on the runtime and mem usage of your program. Some useful options are described a few pages later in the tutorial.
Good luck.
