I am working on a very weird memory leak issue, which has led to the following problem.
I have a process running on my system whose virtual memory size increases after a certain operation is performed. To confirm that this is not a memory leak, I want to get statistics on the number of free and used pages held by the process while it is running.
I am aware of the vmstat command, which gives these statistics for the entire system. But for my confirmation I need a per-process vmstat command.
Does anyone have an idea how this can be done?
The /proc/PID/smaps file will give you exhaustive information on all regions of virtual memory held by the given process.
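For example, to total the resident memory across all mappings (a minimal sketch; substitute the real process ID for PID):
$ awk '/^Rss:/ {sum += $2} END {print sum " kB total RSS"}' /proc/PID/smaps
If you only need a summary rather than per-region detail, /proc/PID/status (fields like VmSize and VmRSS) is also handy.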
If you're coding in C/C++, a dynamic analysis tool like Valgrind could be useful: http://valgrind.org/
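A typical leak check looks like this (./myprog is a placeholder for your own binary; compile with -g so the stack traces are usable):
$ valgrind --leak-check=full ./myprog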
I have a Linux server running MongoDB, as you can see in the picture below.
The MongoDB service is using 65% of memory, but the whole-system memory usage is about 4375MB/16047MB. The whole-system memory usage and the MongoDB memory usage seem contradictory, or I'm not interpreting the results correctly.
Can you help please?
Seeing that the memory from the 3 running tasks adds up to ~88%, what you're seeing is probably a percentage of the memory that's currently in use. The missing ~12% can probably be found with the other tasks.
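To see the full breakdown rather than only top's first few rows, you can sort all processes by memory share, e.g.:
$ ps aux --sort=-%mem | head -n 10
Note that %MEM is based on resident set size, and shared or file-backed pages are counted in every process that maps them, so the per-process figures don't have to add up exactly to the system-wide "used" number.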
Since I'm fairly new to Linux and core dumps, I'm not sure what kind of information is stored in core dumps. It makes me wonder if there is a GDB command to retrieve the CPU % usage of threads from a core dump file, like the CPU % usage you get from the 'top' command. It would also be nice to get memory usage.
I'm rephrasing the question from my previous posting to stay more focused on the answer I'm looking for.
Reference: How to diagnose a Python process chewing CPU in Linux
Thanks.
No, it's not possible to obtain info about the CPU usage from a coredump.
The coredump is just a snapshot of the memory of the process at death-time. No dynamic history is available: CPU make/model/frequency, system load, number of other processes, kernel scheduling info, etc.
As a side effect, you DO get the memory usage information, as long as you know the memory available on the system that generated the coredump: since the coredump is the memory of the process, the more memory the process used, the bigger the coredump (generally speaking; there are exceptions, such as regions of memory not included in the coredump).
A core dump is a copy of the crashed process's address space (memory). You can use it to see how much memory the process was using (and you can examine all the data in its memory at the time it crashed), but it doesn't contain any information about CPU usage.
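As a rough sketch, you can gauge that footprint directly from the core file (assuming it is named core):
$ ls -lh core        # the core is roughly the size of the dumped address space
$ readelf -l core    # lists the LOAD segments; the MemSiz column shows each region's size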
For the future, you can collect this easily enough -- have your process periodically collect memory usage for each thread, and when debugging, hunt for that variable in the core.
I had a problem in which my server began failing some of its normal processes and checks because the server's memory was completely exhausted.
I looked in the logging history and found that what it killed were some Java processes.
I used the "top" command to see what processes were taking up the most memory right now (after the issue was fixed), and it was a Java process. So in essence, I can tell what processes are taking up the most memory right now.
What I want to know is if there is a way to see what processes were taking up the most memory at the time when the failures started happening? Perhaps Linux keeps track or a log of the memory usage at particular times? I really have no idea but it would be great if I could see that kind of detail.
@Andy has answered your question. However, I'd like to add that for future reference you should use a monitoring tool, something like these. They will show you what happened leading up to a crash, since you obviously cannot watch all your servers all the time. Hope it helps.
Are you saying the kernel OOM killer went off? What does the log in dmesg say? Note that you can constrain a JVM to use a fixed heap size, which means it will fail affirmatively when full instead of letting the kernel kill something else. But the general answer to your question is no: there's no way to reliably run anything at the time of an OOM failure, because the system is out of memory! At best, you can use a separate process to poll the process table and log process sizes to catch memory leak conditions, etc...
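To check whether the OOM killer was responsible, look for its trace in the kernel log (log file names vary by distro):
$ dmesg | grep -i 'killed process'
$ grep -i 'out of memory' /var/log/syslog    # or /var/log/messages on RHEL-style systems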
There is no history of memory usage in Linux by default, but you can achieve it with a simple command-line tool like sar.
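For example, with the sysstat package installed and its collector enabled, you can query past memory usage (the data directory varies by distro):
$ sar -r                        # memory utilization samples for today
$ sar -r -f /var/log/sa/sa15    # samples recorded on the 15th of the month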
Regarding your problem with memory:
If it was the OOM killer that made a mess on the machine, then you have one great option to ensure it won't happen again (after reducing the JVM heap size, of course).
By default the Linux kernel allows processes to allocate more memory than is physically available. In some cases this can lead to the OOM killer killing the most memory-hungry process when there is no memory left for kernel tasks.
This behavior is controlled by the vm.overcommit_memory sysctl parameter.
So, you can try setting vm.overcommit_memory = 2 in sysctl.conf and then running sysctl -p.
This will forbid overcommitting and make it very unlikely that the OOM killer does anything nasty. You can also think about adding a little swap space (if you don't have any already) and setting vm.swappiness to some really low value (5, for example; the default is 60). In normal operation your application then won't go into swap, but if you are really short on memory it will start using swap temporarily, and you will be able to see that with free.
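Concretely, the runtime changes look like this (persist them in /etc/sysctl.conf so they survive a reboot):
$ sysctl -w vm.overcommit_memory=2
$ sysctl -w vm.swappiness=5
Note that in mode 2 the commit limit is swap plus vm.overcommit_ratio percent of RAM (50 by default), so you may need to tune that as well.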
WARNING: this can lead to processes receiving a "Cannot allocate memory" error if your server is overloaded with memory demand. In this case:
Try to restrict memory usage by applications
Move part of them to another machine
I'm trying to run a code over ssh that works perfectly for a smaller mesh, but since the new mesh is much bigger, I used the ifort command to compile it:
ifort -mcmodel=medium -i-dynamic -o test.out *.f
and it compiles, but when I run it, the output is:
killed
I know the problem is memory-related; does anyone know if there's any way to run it?
How can I find out where in the code the memory problem comes from?
Thanks
shadi
From the ifort command line, I think you are running on Linux.
Seeing "killed" as output is generally the result of Linux's Out Of Memory killer (OOM) getting involved to prevent an impending crash (because it's common practice for applications to ask for more memory then they need requests for more memory than is currently available are accepted - check for "Out of Memory: Killed process [PID] [process name]" in the system log files). The OOM killer is generally pretty good at disposing of the application responsible for using all the memory, so the place to start is your applications memory usage.
The first thing to do is try to estimate (even if only roughly) how much memory you expect your application to use. One approach is to estimate the size of the major arrays and multiply by the number of bytes needed per element. Another approach is to think about how you would expect the memory use to grow with mesh size. You can study this by experiment (run with different mesh sizes, measure the memory use, and extrapolate) or from one measurement and knowledge of how the major arrays scale. It may be that you are asking for much more memory than you have on the machine, in which case the solution is probably to get access to a bigger computer. (Or you could try to find an alternative algorithm which uses less memory.)
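GNU time's verbose mode is an easy way to take those measurements per run (note the explicit path, so you get the external program rather than the shell built-in; test.out is your binary from above):
$ /usr/bin/time -v ./test.out    # look for 'Maximum resident set size' in the output
Running this for a few mesh sizes gives you the data points to extrapolate from.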
If there is a memory leak you should see more memory use than expected, even for the smaller mesh size. If this is the case, valgrind should help. Moving from static to dynamic storage probably isn't going to help here; I would expect a segmentation fault if you were just exceeding the available space on the stack.
Try using Valgrind. I used it to find memory leaks in my Fortran code with good success.
http://valgrind.org/
Is there a way we can record the memory footprint of a process, in a way that we can still access it after the process has finished?
The typical way I check memory footprint is this:
$ cat /proc/PID/status
But that no longer exists after the process has finished.
You can do something like:
watch 'grep VmSize /proc/PID/status >> log'
When the program ends, you'll have a list of memory footprints over time in log.
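If you also want to know when each sample was taken, a timestamped loop is a simple variant (a sketch; substitute the real PID and adjust the 2-second interval):
$ while kill -0 PID 2>/dev/null; do echo "$(date +%s) $(grep VmSize /proc/PID/status)" >> log; sleep 2; done
The kill -0 test just checks that the process still exists, so the loop stops on its own when the program exits.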
Valgrind has a memory profiler called Massif that provides detailed information about the memory usage of your program:
Massif is a heap profiler. It performs detailed heap profiling by taking regular snapshots of a program's heap. It produces a graph showing heap usage over time, including information about which parts of the program are responsible for the most memory allocations. The graph is supplemented by a text or HTML file that includes more information for determining where the most memory is being allocated. Massif runs programs about 20x slower than normal.
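Typical usage looks like this (./myprog is a placeholder for your binary; Massif writes its snapshots to massif.out.<pid>):
$ valgrind --tool=massif ./myprog
$ ms_print massif.out.<pid>    # human-readable graph and allocation tree
Leave <pid> as whatever process ID appears in the generated file name.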
You can record it using munin + a custom plugin.
This will allow you to monitor and save the needed process information, and graph it, easily.
Here's a related answer I gave at serverfault.com