After "OOM Killer", is there a "Resurrector"?

I understand that on Linux there is a kernel feature referred to as the "OOM Killer". When the OOM (out-of-memory) condition subsides, is there such a thing as a "Process Resurrector"?
I understand this functionality would be difficult to implement for all sorts of reasons, but is there something that comes close to it?
Edit: Example: the "Resurrector" would have a block of memory guaranteed to it for storing a limited set of process information (e.g. command line, environment, etc.) (i.e. not the whole process code and data!). Once the OOM condition is cleared, the "Resurrector" could go through the list and "resurrect" some processes.
From what I gather so far, there doesn't seem to be any functionality akin to what I am asking for.

No. Once a process is killed by the OOM Killer, it's dead. You can restart it (resources permitting), and if it's something that's managed by the system (via inittab, perhaps), it might get restarted that way.
Edit: As a thought experiment, think about what resurrecting a process would mean. Even if you could store the entire process state, you wouldn't want to, because the killed process might be the REASON for the out-of-memory condition.
So the best you could possibly do would be to store its startup state (command line, etc.). But that's no good either, because, again, that may be WHY the system ran out of memory in the first place!
Furthermore, if you resurrected a process this way, there's no telling what could go wrong. What if the process controls hardware? What if it's a process that shouldn't be run more than once? What if it was connected to a tty that isn't there anymore (because sshd was one of the processes killed)?
There's an ENORMOUS amount of context around a process that the system can't possibly be aware of. The ONLY sensible thing is what the kernel does: kill the sucker and move on.
I suppose you could imagine a hibernate-the-process-to-disk strategy, but given that we're out of memory (including swap), that means either pre-reserving some disk space or deciding to allocate disk space on the fly. Either strategy may be unable to cope with the size of the process in question.
In short: No, you don't get to come back from the OOM killer. It's a killer, you just have to deal with it.

Of course there isn't. Otherwise, where could a killed process be stored if there's no more memory to store it in? :-)
The thing is that the OOM killer only comes into play when all available memory is exhausted, both RAM and on-disk swap. If a "process resurrector" could "resurrect" a process after the condition subsides, it would have had to store the process somewhere at the moment "the killer" struck. But since the killer only starts when there is no memory available, that is impossible.
Of course you may say "save to disk", but well, swap memory is disk. If you want to limit the memory consumption of your process, use the ulimit functionality and track your memory usage manually via the ps program or the /proc filesystem. The "OOM killer" is a panic measure and isn't meant to be nice to processes.
Here is an example of what you can do with ulimit (and perhaps without it, but I can't experiment with OOM killing on my system at the moment):
#!/bin/bash
# Save whatever state we want to carry over before the child is killed.
save_something="$ENV_VARIABLE"
( # Soft-limit virtual memory to ~1 GB (the value is in kB) inside a
  # subshell, so the allocation fails and perl dies instead of
  # triggering the system-wide OOM killer.
  ulimit -Sv 1000000
  perl -e 'print "Taking all RAM!!!\n"; while (1) { $a[$i++] = $i; }'
)
echo "killed, resetting"
( # "Resurrect": restore the saved state and start the process again,
  # under the same limit.
  ulimit -Sv 1000000
  export ENV_VARIABLE="$save_something"
  perl -e 'print "Taking all RAM!!!\n"; while (1) { $a[$i++] = $i; }'
)

Related

How to stop the page cache for disk I/O in my Linux system?

Here is my system, based on Linux 2.6.32.12:
1. It contains 20 processes which occupy a lot of user CPU.
2. It needs to write data to disk at a rate of 100 MB/s, and that data will not be used again soon.
What I expect:
The system should run steadily, and disk I/O should not affect it.
My problem:
At the beginning, the system ran as I expected. But as time passed, Linux cached a lot of data for the disk I/O, which kept reducing physical memory. Eventually there was not enough memory left, and Linux started swapping my processes in and out, causing an I/O problem in which a lot of CPU time was spent on I/O.
What I have tried:
I tried to solve the problem with fsync every time I write a large block, but physical memory keeps decreasing while the cache grows.
How can I stop the page cache here? It's useless to me.
More information:
When top shows 46963m free, all is well: CPU %wa is low and vmstat shows no si or so.
When top shows 273m free, %wa is so high that it affects my processes, and vmstat shows a lot of si and so.
I'm not sure that changing something will affect overall performance.
Maybe you might use posix_fadvise(2) and sync_file_range(2) in your program (and more rarely fsync(2) or fdatasync(2) or sync(2) or syncfs(2), ...). Also look at madvise(2), mlock(2) and munlock(2), and of course mmap(2) and munmap(2). Perhaps ionice(1) could help.
In the reader process, you might perhaps use readahead(2) (perhaps in a separate thread).
Upgrading your kernel (to a 3.6 or better) could certainly help: Linux has improved significantly on these points since 2.6.32 which is really old.
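The posix_fadvise(2) approach suggested above can be sketched like this (Python used for brevity via its os wrappers; Linux-only, the file name and sizes are made up, and sync_file_range(2) is approximated with fdatasync(2) since the former has no stdlib binding):

```python
import os

CHUNK = 1 << 20                       # write in 1 MiB blocks
PATH = "/tmp/stream.dat"              # hypothetical output file

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
buf = b"\0" * CHUNK
for _ in range(8):                    # stand-in for the 100 MB/s stream
    os.write(fd, buf)

os.fdatasync(fd)                      # push dirty pages to disk first
# Tell the kernel the cached pages for this file won't be reused,
# so it can evict them instead of squeezing out other memory.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
os.close(fd)
os.unlink(PATH)
```

In a real writer you would repeat the fdatasync + fadvise step every few hundred megabytes rather than once at the end.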
To drop the page cache you can do the following:
echo 1 > /proc/sys/vm/drop_caches
drop_caches is usually 0 and can be changed as needed. As you've identified yourself that you need to free the page cache, this is how to do it. You can also take a look at dirty_writeback_centisecs (and its related tunables) (http://lxr.linux.no/linux+*/Documentation/sysctl/vm.txt#L129) to make writeback happen sooner, but note that it might have consequences, as it wakes up the kernel flusher threads to write out dirty pages. Also note dirty_expire_centisecs, which defines how long data must be dirty before it becomes eligible for writeout.

How to measure the stack size of a process?

How do I find the stack size of a process?
/proc/5848/status gives me VmStk, but this value hardly changes no matter how many while loops and how much recursion I do in my test program.
When I looked at /proc/pid/status, every process has about 136k there, and I have no idea where that value comes from.
There really is no such thing as the "stack size of a process" on Linux. Processes have a starting stack, but as you see, they rarely allocate much from the standard stack. Instead, processes just allocate generic memory from the operating system and use it as a stack. So there's no way for the OS to know -- that detail is only visible from inside the process.
A typical, modern OS may have a stack size limit of 8MB imposed by the operating system. Yet processes routinely allocate much larger objects on their stack. That's because the application is using a stack that is purely application-managed and not a stack as far as the OS is concerned.
This is always true for multi-threaded processes. For single-threaded processes, it's possible they are actually just using very, very little stack.
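To see how little of the limit the main stack actually uses, one can compare VmStk against RLIMIT_STACK; a small Linux-only sketch (field names as documented in proc(5)):

```python
import resource

def vmstk_kib():
    # VmStk in /proc/self/status is the main thread's stack, in kB.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmStk:"):
                return int(line.split()[1])
    return None

soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("VmStk:", vmstk_kib(), "kB")
print("RLIMIT_STACK soft:",
      "unlimited" if soft == resource.RLIM_INFINITY else soft // 1024, "kB")
```

On a typical system the soft limit is 8192 kB while VmStk sits around a hundred kB, which is exactly the gap the answer above describes.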
Maybe you just want to get the address map of some process. For process 1234, read sequentially the /proc/1234/maps pseudo-file. For your own process, read /proc/self/maps
Try
cat /proc/self/maps
to get a feeling of it (the above command displays the address map of the cat process executing it).
Read proc(5) man page for details.
You might also be interested by process limits, e.g. getrlimit(2) and related syscalls.
I am not sure that stack size has some precise sense, notably for multi-threaded processes.
Maybe you are interested in mmap(2)-ed segments with MAP_GROWSDOWN.
The stack size can be obtained with the pidstat command; install it with apt install sysstat:
pidstat -p 11577 -l -s

Linux memory usage history

I had a problem in which my server began failing some of its normal processes and checks because the server's memory was completely full and taken.
I looked in the logging history and found that what it killed were some Java processes.
I used the "top" command to see which processes were taking up the most memory right now (after the issue was fixed), and it was a Java process. So, in essence, I can tell which processes are taking up the most memory right now.
What I want to know is if there is a way to see what processes were taking up the most memory at the time when the failures started happening? Perhaps Linux keeps track or a log of the memory usage at particular times? I really have no idea but it would be great if I could see that kind of detail.
@Andy has answered your question. However, I'd like to add that for future reference you should use a monitoring tool. Such a tool will show you what happened around the time of a crash, since you obviously cannot watch all your servers all the time. Hope it helps.
Are you saying the kernel OOM killer went off? What does the log in dmesg say? Note that you can constrain a JVM to use a fixed heap size, which means it will fail affirmatively when full instead of letting the kernel kill something else. But the general answer to your question is no: there's no way to reliably run anything at the time of an OOM failure, because the system is out of memory! At best, you can use a separate process to poll the process table and log process sizes to catch memory leak conditions, etc...
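The polling suggestion above might look like this (a Linux /proc-based sketch; the interval and top-N count are arbitrary):

```python
import os, time

def top_rss(n=5):
    # Collect (resident-set kB, pid, name) for every numeric /proc entry.
    procs = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            name, rss = "?", 0
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.split()[1]
                    elif line.startswith("VmRSS:"):
                        rss = int(line.split()[1])   # kB
            procs.append((rss, int(pid), name))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            pass                # process exited or is unreadable; skip it
    return sorted(procs, reverse=True)[:n]

for _ in range(3):              # a real logger would loop forever, to a file
    print(time.strftime("%H:%M:%S"), top_rss())
    time.sleep(1)
```

Logging this once a minute gives you exactly the "who was big just before the OOM kill" history the question asks about.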
There is no history of memory usage in Linux by default, but you can collect one with a simple command-line tool like sar.
Regarding your problem with memory:
If it was OOM-killer that did some mess on machine, then you have one great option to ensure it won't happen again (of course after reducing JVM heap size).
By default, the Linux kernel overcommits memory, i.e. it allows processes to allocate more memory than is really available. In some cases this can lead to the OOM killer killing the most memory-hungry process when there is no memory left for kernel tasks.
This behavior is controlled by the vm.overcommit_memory sysctl parameter.
So, you can try setting vm.overcommit_memory = 2 in sysctl.conf and then running sysctl -p.
This will forbid overcommitting and make it very unlikely that the OOM killer does nasty things. You can also think about adding a bit of swap space (if you don't have it already) and setting vm.swappiness to some really low value (say 5; the default is 60), so that in the normal workflow your application won't go into swap, but if you're really short on memory it will start using swap temporarily, and you will be able to see that with free.
WARNING: this can lead to processes receiving "Cannot allocate memory" errors if your server's memory is overloaded. In that case:
Try to restrict memory usage by applications
Move some of them to another machine
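For reference, the tunables mentioned above can be read (though not set) without root straight from /proc; a tiny sketch:

```python
def sysctl(name):
    # sysctl names map to /proc/sys paths: vm.swappiness -> /proc/sys/vm/swappiness
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return f.read().strip()

print("vm.overcommit_memory =", sysctl("vm.overcommit_memory"))
print("vm.swappiness =", sysctl("vm.swappiness"))
```

Checking the current values before and after editing sysctl.conf confirms that sysctl -p actually applied the change.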

Linux: how to detect if a process is thrashing too much?

Is there a way to detect programmatically?
Also, what would be the Linux commands to detect which processes are thrashing?
I'm assuming that "thrashing" here refers to a situation where the active memory set of all processes is too big to fit into memory. In such a situation every context switch causes reading and writing to disk, and eventually the server may become so thrashed that hardware reboot is the only option to regain control of the box.
There are global counters pswpin and pswpout in /proc/vmstat; if both of them increase over some short time interval, the box is probably experiencing thrashing problems.
At the process level it's non-trivial, AFAIK. /proc/$pid/status contains some useful stuff, but no swap-in/swap-out counters. From 2.6.34 there is a VmSwap entry (the total amount of swap used), and field #12 in /proc/$pid/stat is the number of major page faults. /proc/$pid/oom_score is also worth looking into. If VmSwap is increasing, and/or the number of major page faults is increasing, and/or oom_score is spectacularly high, then the process is likely to be causing thrashing.
I wrote a script, thrash-protect - it's available at https://github.com/tobixen/thrash-protect - which attempts to figure out which processes are causing the thrashing and temporarily suspends them. It has worked out great for me and has eventually saved me from quite a few server reboots.
Update: newer versions of the kernel have useful statistics under /proc/pressure. Also, a computer set up without swap will start "thrashing" as memory fills up too, since lack of buffer space tends to cause excessive read operations on the disk.
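The pswpin/pswpout check can be sketched as follows (Linux-only; the one-second sampling interval is arbitrary):

```python
import time

def swap_counters():
    # /proc/vmstat is "name value" per line; pswpin/pswpout are
    # cumulative counts of pages swapped in and out since boot.
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            key, value = line.split()
            counters[key] = int(value)
    return counters["pswpin"], counters["pswpout"]

before = swap_counters()
time.sleep(1)                       # sampling interval; tune as needed
after = swap_counters()
thrashing = after[0] > before[0] and after[1] > before[1]
print("swap-in delta:", after[0] - before[0],
      "swap-out delta:", after[1] - before[1],
      "thrashing:", thrashing)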

Why is the process getting killed at 4GB?

I have written a program which works on a huge set of data. My CPU and OS (Ubuntu) are both 64-bit and I have 4 GB of RAM. Using "top" (the %MEM field), I saw that the process's memory consumption went up to around 87%, i.e. 3.4+ GB, and then it got killed.
I then checked how much memory a process is allowed to use via "ulimit -m", which comes out as "unlimited".
Now, since both the OS and CPU are 64-bit and a swap partition also exists, the OS should have used virtual memory, i.e. [>3.4 GB + y GB from swap space] in total, and should only have killed the process if it required still more memory.
So, I have the following questions:
How much physical memory can a process theoretically access on a 64-bit machine? My answer is 2^48 bytes.
If less than 2^48 bytes of physical memory exist, then the OS should use virtual memory, correct?
If the answer to the above is YES, then the OS should have used the swap space as well; why did it kill the process without even using it? I don't think we have to use any specific system calls while coding our program to make this happen.
Please suggest.
It's not only the data size that could be the reason. For example, run ulimit -a and check the max stack size. Have you got a kill reason? Set 'ulimit -c 20000' to get a core file; it will show you the reason when you examine it with gdb.
Check with file and ldd that your executable is indeed 64 bits.
Check also the resource limits. From inside the process, you could use getrlimit system call (and setrlimit to change them, when possible). From a bash shell, try ulimit -a. From a zsh shell try limit.
Check also that your process indeed eats the memory you believe it does consume. If its pid is 1234 you could try pmap 1234. From inside the process you could read the /proc/self/maps or /proc/1234/maps (which you can read from a terminal). There is also the /proc/self/smaps or /proc/1234/smaps and /proc/self/status or /proc/1234/status and other files inside your /proc/self/ ...
Check with free that you have the memory (and the swap space) you believe you have. You can add some temporary swap space with swapon /tmp/someswapfile (after initializing it with mkswap).
I was routinely able, a few months (and a couple of years) ago, to run a 7Gb process (a huge cc1 compilation), under Gnu/Linux/Debian/Sid/AMD64, on a machine with 8Gb RAM.
And you could try a tiny test program which, e.g., allocates several memory chunks of 32 MB each with malloc. Don't forget to write some bytes inside each chunk (at least one per megabyte).
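A rough analogue of the suggested test program (in Python rather than C/malloc; the chunk count is kept small here, raise it to probe the real limit):

```python
CHUNK = 32 * 1024 * 1024            # 32 MiB per allocation
PAGE = 4096                         # typical page size

chunks = []
for i in range(4):                  # raise this count to approach the limit
    buf = bytearray(CHUNK)
    for off in range(0, CHUNK, PAGE):
        buf[off] = 1                # touching each page forces it to be mapped
    chunks.append(buf)
print("holding", len(chunks) * CHUNK // (1024 * 1024), "MiB")
```

Touching every page matters: with overcommit, untouched allocations cost almost nothing, so a test that only calls malloc without writing will not reproduce the kill.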
standard C++ containers like std::map or std::vector are rumored to consume more memory than what we usually think.
Buy more RAM if needed. It is quite cheap these days.
Everything that can be addressed at all has to fit within the addressable range, including your graphics adapters, OS kernel, BIOS, etc., and the amount that can be addressed can't be extended by swap either.
Also worth noting that the process itself needs to be 64-bit also. And some operating systems may become unstable and therefore kill the process if you're using excessive RAM with it.
