Can a memory leak cause my process to get killed? - linux

This is a short description of my problem:
Context:
Hardware: Toradex Colibri VF61
Distribution: Angstrom v2014.12
Kernel release: 4.0.2-v2
Software language: Qt/C++
Problem:
I am developing an application that needs to run for at least 2 weeks on an embedded product. My problem is that my process runs for 5 days with a small memory leak, which I monitor with top, and then it gets killed.
My process was turned into a zombie, as top told me.
Attempt number 1:
I tried to correct the memory leak with Valgrind, but some of the "possibly lost" blocks are in libraries my program uses (many via malloc). Understanding all of those libraries would be a huge amount of work, and that is not the goal.
I estimate the leak at about 1% of memory lost per day, so roughly 15% over 2 weeks. This kind of leak is acceptable to me, because the process will not run beyond 2 weeks and the embedded system is dedicated to this process; I don't have any other big process running on the machine. RAM monitoring shows that the process currently takes 30% of resources, so an estimated 45% two weeks later.
Attempt number 2:
I read up on memory management under Linux and learned about the OOM killer. I deduced that the OOM killer had probably decided my process had been running for too long with a memory leak and killed it.
So I set the variable "oom_score_adj" of my process to -1000 to prevent the OOM killer from killing it, and I tried again to run for a long time with my memory leak.
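For illustration, roughly how the adjustment can be done from inside the process at startup (a minimal sketch; writing a negative value needs root or CAP_SYS_RESOURCE):

#include <fstream>
#include <iostream>

int main() {
    // Minimal sketch: lower this process's own OOM score at startup.
    // -1000 tells the kernel never to OOM-kill this process.
    std::ofstream f("/proc/self/oom_score_adj");
    if (!f) {
        std::cerr << "cannot open /proc/self/oom_score_adj\n";
        return 1;
    }
    f << -1000 << std::endl;
    return 0;
}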
But this time my process was turned into "sleeping": not killed, but unusable. The sleeping state came with an error message: "Error in './app': malloc(): memory corruption (fast) : 0x72518ddf". Note that I have zero malloc calls in my own code, only in the libraries I use.
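From what I've read, that glibc error usually means some code wrote past the end of a heap block, and the abort then surfaces at a later, unrelated malloc() or free(). An illustration of the classic pattern (not my actual code):

#include <cstdlib>

int main() {
    char *p = static_cast<char *>(std::malloc(16));
    p[16] = 'x';  // one byte past the end: glibc's heap bookkeeping is trashed
    // The corruption is typically detected later, at an unrelated call:
    char *q = static_cast<char *>(std::malloc(16));  // may abort with "malloc(): memory corruption"
    std::free(q);
    std::free(p);
    return 0;
}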
Questions:
Do you think it's possible that something like the OOM killer turned my process into a zombie because I have a memory leak and my program has been running for a long time?
Do you think it's possible that Linux put my process to sleep because the leak has filled up the memory allocated to the process?

Concerning your first question: when very little memory is left for the system, the OOM killer kills one or more processes according to their oom_score (high memory consumption, low importance to the system, ...). A killed process remains a zombie until its parent reaps it, so if the OOM killer kills a child process of your main process and the main process never calls wait() on it, that child will show up as a zombie.
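A zombie is simply a dead process whose parent has not yet collected its exit status. A minimal sketch of reaping (the fork and names are illustrative):

#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
    pid_t child = fork();
    if (child == 0)
        _exit(1);  // child dies, e.g. as if killed by the OOM killer
    int status = 0;
    waitpid(child, &status, 0);  // reap: the child no longer lingers as a zombie
    if (WIFSIGNALED(status))
        std::printf("child killed by signal %d\n", WTERMSIG(status));
    else
        std::printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}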
To your second question: Linux puts a process into the sleeping state when a resource the process needs is not available. But in your case, if there is a memory leak and the process consumes a lot of memory, the process will rather be killed than put to sleep.
Are you using UART for your Application?
By the way, there is also a Toradex Community, where engineers can directly answer your questions.
Best regards, Jaski

Related

How to set memory limit for OOM Killer for chrome?

chrome invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=300
I'm getting the above error while testing with headless chrome browser + Selenium.
This error message...
chrome invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=300
...implies that the ChromeDriver-controlled browsing context, i.e. the Chrome browser, invoked the OOM killer due to an out-of-memory condition.
Out of Memory
Out of Memory error messages can appear when you attempt to start new programs or you try to use programs that are already running, even though you still have plenty of physical and pagefile memory available.
OOM Killer
The OOM Killer, or Out Of Memory Killer, is a mechanism that the Linux kernel employs when the system is critically low on memory. This situation occurs because the Linux kernel over-allocates memory to its processes. When a process starts, it requests a block of memory from the kernel. This initial request is usually large, and the process will not immediately, or indeed ever, use all of it. The kernel, aware of this tendency of processes to request more memory than they use, over-allocates the system memory. This means that when the system has, for example, 2 GB of RAM, the kernel may allocate 2.5 GB to processes. This maximises the use of system memory by ensuring that the memory allocated to processes is actively used. Now, if enough processes begin to use all of their requested memory blocks, there will not be enough physical memory to support them all: the running processes require more memory than is physically available. This is exactly when the Linux kernel invokes the OOM Killer to review all running processes and kill one or more of them in order to free up system memory and keep the system running.
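A small illustration of that over-allocation, assuming default overcommit settings (vm.overcommit_memory = 0) and a machine with less than 1 GiB free; don't run this on a box you care about:

#include <cstdio>
#include <cstdlib>

int main() {
    const std::size_t size = 1UL << 30;  // ask the kernel for 1 GiB
    char *p = static_cast<char *>(std::malloc(size));
    if (!p) { std::puts("allocation refused up front"); return 1; }
    std::puts("allocation granted - but nothing is committed yet");
    // Only touching the pages forces the kernel to back them with real memory;
    // enough processes doing this at once is what invokes the OOM Killer.
    for (std::size_t i = 0; i < size; i += 4096)
        p[i] = 1;
    std::puts("all pages touched - 1 GiB now physically in use");
    std::free(p);
    return 0;
}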
Chrome the first victim of the OOM Killer
Surprisingly, it seems the Chrome browser client is the first victim of the OOM killer. As the Linux OOM killer kills the process with the highest score = (RSS + oom_score_adj), the Chrome tabs are killed because they have an oom_score_adj of 300 (kLowestRendererOomScore = 300 in chrome_constants.cc) as follows:
#if defined(OS_LINUX)
const int kLowestRendererOomScore = 300;
const int kHighestRendererOomScore = 1000;
#endif
Details
This is a known issue and can easily be reproduced. It has been discussed at length and breadth in oom_score_adj too high - chrome is always the first victim of the oom killer. The goal there was to adjust the OOM scoring in Chrome OS to make sure the most-recently-opened tab isn't killed, as the OOM killer appears to prefer recent processes by default. But on Linux distros that adjustment is not reflected, and you get the undesirable behavior where Chrome processes get killed over other processes that probably should have been killed instead.
Solution
Some details, in terms of the error stack trace, would have helped us suggest some changes regarding:
total-vm usage
Physical memory
Swap memory
You can find a couple of relevant discussions in:
Understanding the Linux oom-killer's logs
what does anon-rss and total-vm mean
determine vm size of process killed by oom-killer
However, there was a code review to address this issue, but the discussion still seems to be in status Assigned with Priority: 2 in:
Linux: Adjust /proc/pid/oom_adj to sacrifice plugin and renderer processes to the OOM killer
tl;dr
java.lang.OutOfMemoryError: unable to create new native thread error using ChromeDriver and Chrome through Selenium in Spring boot
Outro
Chromium OS - Design Documents - Out of memory handling
Despite 32 GB of RAM, this Chromium OOM is still happening with its latest release!
Because this issue completely freezes Xorg, the SysRq key combinations can help to recover a console terminal:
Alt + SysRq + K to kill Chromium
Consider adding sysrq_always_enabled to the kernel boot command line.

Do Linux core dumps have thread CPU usage information

Since I'm fairly new to Linux and core dumps, I'm not sure what kind of information is stored in core dumps. It makes me wonder if there is a GDB command to retrieve the CPU % usage of threads from a core dump file, like the CPU % usage you get from the 'top' command. It would also be nice to get memory usage.
I'm rephrasing the question from my previous posting to stay more focused on the answer I'm looking for.
Reference: How to diagnose a python process chewing CPU in linux
Thanks.
No, it's not possible to obtain info about the CPU usage from a coredump.
The coredump is just a snapshot of the memory of the process at the time of death. No dynamic history is available: CPU make/model/frequency, system load, number of other processes, kernel scheduling info, etc.
As a side effect, you DO get memory usage information, as long as you know the memory available on the system that generated the coredump: since the coredump is the memory of the process, the more memory the process used, the bigger the coredump (generally speaking; there are exceptions, such as regions of memory not included in the coredump).
A core dump is a copy of the crashed process's address space (memory). You can use it to see how much memory the process was using (and you can examine all the data in its memory at the time it crashed), but it doesn't contain any information about CPU usage.
For the future, you can collect this easily enough -- have your process periodically record CPU and memory usage for each thread into a global variable, and when debugging, hunt for that variable in the core.
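For example, for CPU time (a sketch, assuming Linux; stat field positions per proc(5), names like g_thread_cpu are illustrative): periodically copy each thread's utime/stime from /proc/self/task/<tid>/stat into a global array, which then survives in the core and can be inspected in GDB with print g_thread_cpu.

#include <dirent.h>
#include <cstdio>

struct ThreadCpu { long tid; unsigned long utime, stime; };
static ThreadCpu g_thread_cpu[64];  // global, so it is present in any core dump
static int g_thread_count;

void sample_thread_cpu() {
    g_thread_count = 0;
    DIR *d = opendir("/proc/self/task");
    if (!d) return;
    while (dirent *e = readdir(d)) {
        if (e->d_name[0] == '.') continue;
        char path[64];
        std::snprintf(path, sizeof path, "/proc/self/task/%s/stat", e->d_name);
        std::FILE *f = std::fopen(path, "r");
        if (!f) continue;
        ThreadCpu tc = {};
        // skip comm (assumed to contain no spaces), state, and fields 4-13,
        // then read utime (field 14) and stime (field 15) in clock ticks
        if (std::fscanf(f, "%ld %*s %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
                        &tc.tid, &tc.utime, &tc.stime) == 3 && g_thread_count < 64)
            g_thread_cpu[g_thread_count++] = tc;
        std::fclose(f);
    }
    closedir(d);
}

int main() {
    sample_thread_cpu();  // in a real program, call this from a periodic timer
    for (int i = 0; i < g_thread_count; ++i)
        std::printf("tid %ld utime %lu stime %lu\n",
                    g_thread_cpu[i].tid, g_thread_cpu[i].utime, g_thread_cpu[i].stime);
    return 0;
}

Two samples taken a known interval apart give you a top-style CPU % per thread.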

Does an Application memory leak cause an Operating System memory leak?

When we say a program leaks memory (say, a new without a delete in C++), does it really leak? I mean, when the program ends, is that memory still allocated to some non-running program and unusable, or does the OS know what memory was requested by each program and release it when the program ends? If I run that program many times, will I run out of memory?
No, in all practical operating systems, when a program exits, all its resources are reclaimed by the OS. Memory leaks become a more serious issue in programs that might continue running for an extended time and/or functions that may be called often from the same program.
On operating systems with protected memory (Mac OS 10+, all Unix clones such as Linux, and NT-based Windows systems, meaning Windows 2000 and later), the memory gets released when the program ends.
If you run any program often enough without closing earlier instances (running more and more instances at the same time), you will eventually run out of memory, memory leak or not. A program that leaks memory will obviously fill the memory faster than an identical program without the leak, but how many instances you can run before filling the memory depends far more on how much memory the program needs for normal operation than on whether it leaks. That comparison only means something between two otherwise identical programs, one with a memory leak and one without.
Memory leaks are most serious when a program runs for a very long time. The classic example is server software, such as web servers. With games or spreadsheet programs or word processors, memory leaks aren't nearly as serious, because you close those programs eventually, freeing up the memory. But of course memory leaks are nasty little beasts which should always be tackled as a matter of principle.
But as stated earlier, all modern operating systems release the memory when the program closes, so even with a memory leak, you won't fill up the memory if you're continuously opening and closing the program.
Leaked memory is returned by the OS after execution has stopped.
That's why it isn't always a big problem with desktop applications, but it is a big problem with servers and services (they tend to run for a long time).
Let's look at the following scenario:
1. Program A asks the OS for memory.
2. The OS marks block X as used by A and returns it to the program.
3. The program holds a pointer to X.
4. The program returns the memory.
5. The OS marks the block as free. Using the block now results in an access violation.
6. Program A ends and all memory used by A is marked unused.
Nothing wrong with that.
But if the memory is allocated in a loop and the delete is forgotten, you run into real problems:
1. Program A asks the OS for memory.
2. The OS marks block X as used by A and returns it to the program.
3. The program holds a pointer to X.
4. Goto 1.
If the OS runs out of memory, the program will probably crash.
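In C++ terms, the broken variant looks like this (illustrative only; it will eventually exhaust memory):

int main() {
    for (;;) {
        int *block = new int[1024];  // steps 1-3: ask, receive, hold a pointer
        block[0] = 42;
        // 'delete[] block;' is missing: the pointer is overwritten on the
        // next iteration, so this memory can never be returned until exit.
    }
}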
No. Once the OS finishes closing the program, the memory comes back (given a reasonably modern OS). The problem is with long-running processes.
When the process ends, the memory gets cleared as well. The problem is that if a program leaks memory, it requests more and more memory from the OS as it runs, and can possibly crash the OS.
It's "leaking" more in the sense that the code itself no longer has any grip on that piece of memory.
The OS can release the memory when the program ends. If a leak exists in a program, it is only an issue while the program is running. This is a problem for long-running programs such as server processes. For example, if your web browser had a memory leak and you kept it running for days, it would gradually consume more and more memory.
As far as I know, on most OSes, when a program is started it receives a defined segment of memory which is completely freed once the program ends.
Memory leaks are one of the main reasons garbage collection algorithms were invented, since, once plugged into the runtime, they become responsible for reclaiming memory that is no longer reachable by the program.
Memory leaks don't persist past the end of execution, so a "solution" to any memory leak is simply to end program execution. Obviously this is more of an issue for certain types of software: a database server that needs to go offline every 8 hours because of memory leaks is a bigger problem than a video game that needs to be restarted after 8 hours of continual play.
The term "leak" refers to the fact that over time memory consumption will grow without any increased benefit. The "leaked" memory is memory neither used by the program nor usable by the OS (and other programs).
Sadly, memory leaks are very common in unmanaged code. I have had Firefox running for a couple of days now and memory usage is 424 MB despite only having 4 tabs open. If I closed Firefox and re-opened the same tabs, memory usage would likely be under 100 MB. Thus 300+ MB has "leaked".

How to tell where the memory went in Linux

I have a long-running process that I suspect has a memory leak. I use top to monitor the memory levels of each process and nothing uses more than 15% of the total RAM. The machine has 4GB of RAM and the process starts with well over 3GB free. The process itself does very heavy, custom calculations on several MB of data. It takes a single core at 100%.
As time goes on, memory disappears but top does not blame my long running process. Instead, the "cached" and "buffers" memory increases and the "free" memory is reduced to as low as 2MB. The process eventually finishes its job and exits without issue but the memory never comes back. Should I be concerned or is this "normal"? Are there other tools besides top that can provide a deeper understanding?
Thanks.
That's normal. Your process is operating on files which are getting cached in memory. If there is "memory pressure" (demand from other programs), that cache memory will be relinquished. The first time I wrote an X widget to show how much memory was "free", it took me a while to get used to the idea that free memory does you no good: best to have it all in use doing some kind of caching until it's needed elsewhere!
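If you want to watch those pools from code rather than top, a rough sketch that reads /proc/meminfo (field names per proc(5)):

#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream f("/proc/meminfo");
    std::string line;
    while (std::getline(f, line)) {
        // "free" looks tiny, but Buffers/Cached are reclaimable on demand
        if (line.compare(0, 8, "MemFree:") == 0 ||
            line.compare(0, 8, "Buffers:") == 0 ||
            line.compare(0, 7, "Cached:") == 0)
            std::cout << line << '\n';
    }
    return 0;
}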

ARM/Linux memory leak: Can a user program retain memory after terminating?

I've got a memory leak somewhere, but it doesn't appear to be related to my program. I'm making this bold statement based on the fact that once my program terminates, whether by seg-faulting, exiting, or aborting, the memory isn't recovered. If my program were the culprit, I would assume the kernel would recover everything, but this doesn't appear to be the case.
My question is:
On a small Linux system (64 MB of RAM) running a program that uses only stack memory and a few calls to malloc(), what could cause memory to run down, and stay down, after my program terminates?
A related question is here:
This all started when the code in question was directing its stdout and stderr to a file. After a few hours it aborted with a "Segmentation Fault". A quick (naive?) look at /proc/meminfo showed that there wasn't much available memory, so I assumed something was leaking.
It appears I don't have a memory leak (see here) but it does lead me to some new questions...
It turns out that writing to block devices can use quite a pile of physical memory; my system had only 64 MB, so writing hundreds of megabytes to a USB drive increased the cached, active, and inactive memory pools quite a bit.
These memory pools are released to the free memory pool immediately when the device is unmounted.
The exact cause of my segmentation fault remains a small mystery, but I know its occurrence can be reduced by understanding virtual memory resources better, particularly around the use of block devices.
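If the bulk writes are under your control, one mitigation is to flush and then tell the kernel the written pages are no longer needed, so the cached pool doesn't balloon. A sketch (the path is hypothetical):

#include <fcntl.h>
#include <unistd.h>
#include <cstring>

int main() {
    // stands in for a file on the mounted USB drive
    int fd = open("/mnt/usb/bulk.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return 1;
    char buf[4096];
    std::memset(buf, 0xAB, sizeof buf);
    for (int i = 0; i < 1024; ++i)  // write ~4 MB
        write(fd, buf, sizeof buf);
    fsync(fd);  // make sure the data has reached the device first
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);  // then drop our cached pages
    close(fd);
    return 0;
}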
