Valgrind hangs when reading large HDF5 dataset in Fortran - linux

I have an application written in Fortran which makes use of parallel HDF5 for input / output.
A matching post-processing code is used to read its output, in the form of a *.h5 file, and process it.
When I try to use valgrind to check for memory leaks, however, it stalls when reading large datasets.
More precisely, the stall occurs at a call to H5Dread_f for large datasets, for example 1069120 doubles (where doubles are defined as H5kind_to_type(REAL64,H5_REAL_KIND)), whereas smaller datasets are read without problems.
I tried recompiling the HDF5 library using --enable-using-memchecker, as described here, but it didn't help.
Does anybody have more experience with this?

I found the cause / solution: Due to an error of mine, the chunk size I used in these HDF5 routines was often only 1 byte, which is of course much too low.
Fixing this also makes valgrind much faster and usable.
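To get a rough feel for why a tiny chunk size hurts, the following back-of-the-envelope sketch counts how many chunks the 1069120-double dataset from the question splits into. It is only arithmetic, not HDF5 code: treating the faulty setting as one element per chunk, and picking 1 MiB as the comparison chunk size, are both assumptions made for illustration.

```python
def chunk_count(n_elements, chunk_elements):
    """Number of chunks needed to hold n_elements (ceiling division)."""
    return -(-n_elements // chunk_elements)


N_DOUBLES = 1069120      # dataset size quoted in the question
BYTES_PER_DOUBLE = 8     # REAL64

# Assumption: the faulty setting behaves like one element per chunk.
tiny = chunk_count(N_DOUBLES, 1)

# Assumption: a hypothetical 1 MiB chunk, i.e. 131072 doubles per chunk.
one_mib = chunk_count(N_DOUBLES, 1024 * 1024 // BYTES_PER_DOUBLE)

print(tiny, one_mib)  # 1069120 9
```

Each chunk is located and read as a separate unit, so roughly a million chunks instead of nine means far more bookkeeping, and all of that work is slowed down further when running under valgrind.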

Related

How do I read a large .conll file with python?

My attempt to read a very large file with pyconll and conllu keeps running into memory errors. The file is 27 GB in size and even using iterators to read it does not help. I'm using Python 3.7.
Both pyconll and conllu have iterative versions that use less memory at any given moment. If you call pyconll.load_from_file, it will try to read and parse the entire file into memory, and your machine likely has much less than 27 GB of RAM. Instead, use pyconll.iter_from_file. This reads the sentences one by one and uses minimal memory, and you can extract what's needed from those sentences piecemeal.
If you need to do some larger processing that requires having all information at once, it's a bit outside the scope of either of these libraries to support that type of scenario.
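The streaming idea can be sketched in plain Python without either library. This is not how pyconll or conllu are implemented; iter_sentences and count_tokens are hypothetical helpers, and the sketch assumes a tab-separated file in which sentences are separated by blank lines and comment lines start with "#".

```python
import os
import tempfile


def iter_sentences(path):
    """Yield one sentence at a time as a list of token rows (lists of fields)."""
    sentence = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:              # the file is read lazily, line by line
            line = line.rstrip("\n")
            if not line:                 # a blank line ends the current sentence
                if sentence:
                    yield sentence
                    sentence = []
            elif not line.startswith("#"):   # skip comment lines
                sentence.append(line.split("\t"))
    if sentence:                         # last sentence without a trailing blank line
        yield sentence


def count_tokens(path):
    """Aggregate over the whole file while holding only one sentence in memory."""
    return sum(len(sentence) for sentence in iter_sentences(path))


# Tiny demo file standing in for the 27 GB input.
sample = "# text = Hello world\n1\tHello\thello\n2\tworld\tworld\n\n1\tBye\tbye\n\n"
with tempfile.NamedTemporaryFile("w", suffix=".conll", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(sample)
demo_total = count_tokens(tmp.name)
os.remove(tmp.name)
print(demo_total)  # 3
```

Only the running total and the current sentence are alive at any moment, which is the same property that makes the iterative library calls usable on files larger than RAM.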

python: will memory_profiler affect runtime?

I am evaluating tools for profiling my Python program. One of the interesting tools is memory_profiler. Before moving forward, I want to know whether memory_profiler affects runtime. The reason I am asking is that memory_profiler outputs a lot of memory usage measurements, so I suspect it might affect runtime.
Thanks
Derek
It depends how you are using memory_profiler. This can be used in two different ways:
To get memory usage line-by-line (run with python -m memory_profiler my_script.py). This needs to get memory information (from the OS) for every line executed within the profiled function. How this affects run-time depends on the number of lines in the function: if it has a lot of lines with fast execution times, it can incur a significant overhead. On the other hand, if the function to profile has few lines, and each line takes significant computing time, then the overhead will be negligible.
To get memory as a function of time (run with mprof run my_script.py and plot with mprof plot). In this case the function that collects the memory usage runs in a different process from the one that runs your script, hence the overhead is minimal (unless you are using all CPUs).
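For the line-by-line mode, a minimal sketch of a script to profile might look like this. The function build_squares is a made-up example, and the fallback decorator is an assumption added only so the script still runs when memory_profiler is not installed.

```python
try:
    # Third-party package; provides the @profile decorator.
    from memory_profiler import profile
except ImportError:
    # Assumption for this sketch: fall back to a no-op decorator.
    def profile(func):
        return func


@profile
def build_squares(n):
    squares = [i * i for i in range(n)]   # allocation that shows up in the report
    total = sum(squares)
    del squares                           # release the list before returning
    return total


result = build_squares(1000)
print(result)  # 332833500
```

With the package installed, each executed line of build_squares gets its own memory reading, which is where the per-line overhead described above comes from; with the fallback, the function runs at normal speed and nothing is reported.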

Minimizing IOPS with Fortran

I have a Fortran program that writes out a large amount of ASCII data, one line at a time, and there is some concern from system admins (and evidence from my runs) that this is adversely affecting system performance. I/O generally works better with fewer big writes than with many small writes. So, I'd like to get the program to minimize the number of IOPS by writing out bigger chunks of data without changing the format of the output file (this is a large set of software with lots of related software depending on assumed file formats). I had thought turning a loop like this:
nwrite = 100000000   ! total number of lines to write
do cnt = 1, nwrite
   write(11,'(i22,3x,f16.14)') cnt, numar(cnt)
enddo
into a loop like this:
nwrite = 100000000
nblock = 10000   ! number of lines to write in each block
do cnt = 1, nwrite/nblock
   write(11,'(i22,3x,f16.14)') (nblock*(cnt-1)+j, numar(nblock*(cnt-1)+j), j=1,nblock)
enddo
would do the trick. But I made two small scripts doing the above and they didn't show any real difference in run time. It's a fairly major time commitment to make the change in the actual code, so I'd like to be fairly sure before committing to an approach. I haven't completely unrolled the loop into a single write command because that might not work well for my current problem, though approaches that do this are also welcome.
Can anyone confirm whether the above code would reduce the actual number of write commands or what else might achieve what I'm looking for? Thanks in advance.
Based on input from other users and further reading: Fortran leaves control over this to the compiler, so the behavior is compiler-dependent. Buffered writes are the default behavior for the Portland Group Fortran compiler, and it looks to be the same for GFortran. Intel does not buffer files by default. For the Intel compiler, adding the option -assume buffered_io will make file I/O buffered by default.
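The effect of runtime buffering can be illustrated outside Fortran. The sketch below is a Python stand-in, not a statement about any Fortran runtime: CountingRaw is a hypothetical sink that counts how many write requests actually reach it, the 64 KiB buffer size is an arbitrary assumption, and the line format mimics '(i22,3x,f16.14)'.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw sink that counts the write requests it receives."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.nbytes = 0

    def writable(self):
        return True

    def write(self, data):
        self.calls += 1
        self.nbytes += len(data)
        return len(data)          # report that every byte was accepted


def write_lines(sink, n_lines):
    """Write one short formatted line per call, like the first Fortran loop."""
    for cnt in range(1, n_lines + 1):
        line = f"{cnt:22d}   {cnt / n_lines:16.14f}\n"
        sink.write(line.encode("ascii"))


N_LINES = 10_000

# No buffering: every line is a separate request to the sink.
unbuffered = CountingRaw()
write_lines(unbuffered, N_LINES)

# Same loop, but a buffer coalesces the small writes into a few large ones.
raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)
write_lines(buffered, N_LINES)
buffered.flush()

print(unbuffered.calls, raw.calls)
```

Both runs deliver the same bytes, but the buffered one issues only a handful of requests to the sink. That matches the observation in the question: if the runtime already buffers, restructuring the loop into larger write statements changes little.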

ImageMagick's display GPU "memory leak"?

I'm testing a CUDA app and I have run into a strange memory issue:
My program performs some image operations and displays the result using ImageMagick's display program.
The problem is that every time I run IM's display, GPU memory usage goes up, leaving less memory for GPU computation.
I'm using IM's display because I couldn't find anything else that displays an image from piped input. Any suggestions?
Anyway, why does IM's display take so much GPU memory, and why is it not freed?
Based on your question, you're attempting to display a series of files in sequence using a shell not unlike Bash after performing a set of GPU-intensive operations. You're curious why more GPU memory is being consumed with every subsequent invocation of ImageMagick display, which appears to be closing out successfully after the conclusion of each operation.
We may further theorize that you're using ImageMagick's OpenCL support for at least some of your processing. While we don't have enough information to determine what your GPU's texture buffers look like at the completion of each rendering via display, I speculate your GPU isn't freeing textures expediently, causing memory to slowly creep up.
Instead of continuing to build conjecture around this hypothesis, I will instead recommend a tool to debug your issue: gDEBugger. This should allow you to interrogate your video card to determine exactly why things are slowing down.
Best of luck with your application.
I know this is old, but we have figured out that using pipes (popen()) makes a copy of the program in memory, which also copies the program's end-of-program handlers (or whatever they are called). So when I close the program opened with popen, I also tear down all the CUDA-related contexts that are usually freed in the "background" when the program ends. Cleaning up CUDA memory after closing the popen application therefore won't work, and I think this was my memory leak and a major error in my program overall.
I hope someone will find it useful.

R stats - memory issues when allocating a big matrix / Linux

I have read several threads about memory issues in R and I can't seem to find a solution to my problem.
I am running a sort of LASSO regression on several subsets of a big dataset. For some subsets it works well, and for some bigger subsets it does not work, with errors of type "cannot allocate vector of size 1.6Gb". The error occurs at this line of the code:
example <- cv.glmnet(x=bigmatrix, y=price, nfolds=3)
It also depends on the number of variables that were included in "bigmatrix".
I tried on R and R64 for both Mac and R for PC but recently went onto a faster virtual machine on Linux thinking I would avoid any memory issues. It was better but still had some limits, even though memory.limit indicates "Inf".
Is there any way to make this work, or do I have to cut a few variables from the matrix or take a smaller subset of the data?
I have read that R looks for contiguous blocks of memory and that maybe I should pre-allocate the matrix. Any ideas?
Let me build slightly on what @richardh said. All of the data you load with R chews up RAM. So you load your main data and it uses some hunk of RAM. Then you subset the data so the subset is using a smaller hunk. Then the regression algo needs a hunk that is greater than your subset because it does some manipulations and gyrations. Sometimes I am able to better use RAM by doing the following:
1. save the initial dataset to disk using save()
2. take a subset of the data
3. rm() the initial dataset so it is no longer in memory
4. do analysis on the subset
5. save results from the analysis
6. totally dump all items in memory: rm(list=ls())
7. load the initial dataset from step 1 back into RAM using load()
8. loop steps 2-7 as needed
Be careful with step 6 and try not to shoot your eye out. That dumps EVERYTHING in R memory. If it's not been saved, it'll be gone. A more subtle approach would be to delete the big objects that you are sure you don't need and not do the rm(list=ls()).
If you still need more RAM, you might want to run your analysis in Amazon's cloud. Their High-Memory Quadruple Extra Large Instance has over 68GB of RAM. Sometimes when I run into memory constraints I find the easiest thing to do is just go to the cloud where I can be as sloppy with RAM as I want to be.
Jeremy Anglim has a good blog post that includes a few tips on memory management in R. In that blog post Jeremy links to this previous StackOverflow question which I found helpful.
I don't think this has to do with contiguous memory, but just that R by default works only in RAM (i.e., can't write to cache). Farnsworth's guide to econometrics in R mentions the filehash package to enable writing to disk, but I don't have any experience with it.
Your best bet may be to work with smaller subsets, manage memory manually by removing variables you don't need with rm (i.e., run regression, store results, remove old matrix, load new matrix, repeat), and/or getting more RAM. HTH.
Try the bigmemory package. It is very easy to use. The idea is that the data are stored in a file on disk and you create an object in R as a reference to this file. I have tested this one and it works very well.
There are some alternatives as well, like "ff". See CRAN Task View: High-Performance and Parallel Computing with R for more information.
