Linux kernel module OOM when there is memory in cache [closed] - linux

Embedded system, no swap, kernel v2.6.36, memory compaction enabled.
Under heavy use, nearly all of the RAM ends up tied up in the page cache; the cache was using about 70 MB of memory. When a user-space process allocates memory there is no problem: the cache gives it up.
But there's a 3rd party device driver that seems to attempt an order-5 physically contiguous page allocation, and it fails with an OOM. A quick look at /proc/buddyinfo confirms this: no order-5 blocks are available. But as soon as I drop the cache, plenty become available and the device driver no longer OOMs.
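For reference, here is roughly how I was checking fragmentation and dropping the cache while debugging (standard /proc interfaces, nothing specific to my board):

    # Free blocks per order in each zone; counts run left to right from
    # order 0 upward, so an order-5 request needs a non-zero count at
    # order 5 or above.
    cat /proc/buddyinfo

    # Drop clean page cache plus dentries/inodes so free pages can coalesce
    # (echo 1 drops page cache only; 3 drops page cache and slab caches).
    sync
    echo 3 > /proc/sys/vm/drop_caches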
So it seems to me that user-space (virtual) memory allocation will trigger cache reclaim, but kernel physical-page allocation will not? This doesn't make sense, because then kernel modules are likely to OOM whenever memory is tied up in cache, and that behavior seems far more detrimental than the slower disk access you get from not caching.
Is there a tuning parameter to address this?
Thanks!

So here's what's going on. I still don't know why high cache use is causing kernel modules to OOM. The problem is in 3rd party code that we don't have access to, so who knows what they're doing.
One can argue whether this is by design, but if non-critical disk cache is allowed to take all available free memory and thereby cause kernel modules to OOM, then IMHO the disk cache should leave something for the kernel.
I've decided instead to limit the cache, so there is always some "truly free" memory left for kernel use, rather than depending on the "sort of free" memory tied up in cache.
There is a kernel patch that adds /proc/sys/vm/pagecache_ratio, which lets you cap how much memory the disk cache can take, but for whatever reason it was never merged into the kernel (I thought it was a good idea, especially if the disk cache can cause kernel OOMs). I didn't want to carry kernel patches for maintainability and future-proofing reasons, but if you're doing a one-shot build and don't mind patching, here's the link:
http://lwn.net/Articles/218890/
My solution: I've recompiled the kernel with cgroups enabled, and I'm using the memory controller to limit the memory usage of the group of processes responsible for most of the disk access (and hence for running up the cache). After tweaking the configuration it seems to be working fine. I'll leave my setup running the stress test over the weekend and see whether the OOM still happens.
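Roughly, the setup looks like this (a sketch of the cgroup v1 memory controller as it exists in 2.6.36; the mount point, group name, limit, and PID are placeholders from my configuration, not recommendations):

    # Mount the cgroup filesystem with the memory controller
    mkdir -p /cgroup/memory
    mount -t cgroup -o memory none /cgroup/memory

    # Create a group for the disk-heavy processes and cap its memory;
    # the limit covers page cache charged to the group, not just
    # anonymous memory
    mkdir /cgroup/memory/diskhogs
    echo 64M > /cgroup/memory/diskhogs/memory.limit_in_bytes

    # Move the offending processes into the group
    echo $SOME_PID > /cgroup/memory/diskhogs/tasks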
Edit
I guess I found my own answer. There are VM tuning parameters in /proc/sys/vm/. The tunable settings relevant to this issue are min_free_kbytes, lowmem_reserve_ratio, and extfrag_threshold.
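These can be set at runtime through sysctl. The values below are purely illustrative, not recommendations; the right numbers depend on how much RAM you have and how badly fragmentation bites:

    # Keep a larger pool of truly free pages for kernel/atomic allocations
    sysctl -w vm.min_free_kbytes=16384

    # Make the kernel more willing to compact memory rather than reclaim
    # when an allocation fails due to fragmentation (default is 500)
    sysctl -w vm.extfrag_threshold=250

    # lowmem_reserve_ratio takes one value per zone; check the current
    # contents before changing it
    cat /proc/sys/vm/lowmem_reserve_ratio

    # Persist a setting across reboots
    echo "vm.min_free_kbytes = 16384" >> /etc/sysctl.conf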

Related

Linux server's memory not released when uploading lots of document files in Liferay

I have a lot of users uploading files, and I find the memory is not released after the uploads finish. I stopped the Liferay Tomcat instance, and there are no other applications running, yet memory usage is still high. So what is holding the memory? I guess the Linux server has cached the documents. Can I get some ideas or suggestions? I want to release the memory.
Once Java has allocated memory from the OS, it'll not free it up again. This is not a feature of Liferay, but of the underlying JVM.
You can allocate less memory to Liferay (or the appserver) to begin with, but you must be sure to allocate at least enough for uploads to be processed (AFAIK the documents aren't necessarily all held in memory at the same time). You can also configure the cache sizes so that Liferay won't need to allocate more memory from the OS, at the price of more cache misses. I'm aware of several installations that accepted the (minor) impact of cache misses rather than increasing the overall memory requirements.
However, as memory is so cheap these days, many opt not to optimize this particular aspect. If you can't upgrade your hardware, though, it might be called for.
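If you do want to cap the JVM's footprint, the heap limit for a Tomcat-bundled Liferay is usually set through CATALINA_OPTS, e.g. in tomcat/bin/setenv.sh. The sizes below are placeholders; pick values that still leave room for your uploads:

    # setenv.sh -- example heap settings, sizes are illustrative only
    CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx1024m"
    export CATALINA_OPTS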

Find process responsible for memory spikes

I have a WordPress website hosted on a shared Red Hat 4.4.7-18 Linux box, and from time to time I get huge memory and I/O spikes that exceed my allowed memory limit and make the website unresponsive.
I have cPanel installed, but I can only see the spikes; there is no way to find out exactly which process is causing them.
I think this is being caused by one of my plugins and would really like to know which one. I have 15+ plugins installed, so disabling them one by one and monitoring whether the issue persists is not an option, as this apparently happens at random.
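One low-tech approach, given that the spikes are random: log the top memory consumers every few seconds and check what was running when a spike hit. A minimal sketch (the log path and interval are arbitrary):

    # Append a timestamped snapshot of the biggest memory users every 10 s
    while true; do
        date >> /tmp/mem-snapshots.log
        ps aux --sort=-%mem | head -n 15 >> /tmp/mem-snapshots.log
        sleep 10
    done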

Code/Program memory and caching

I know that data can be cached from main memory into the CPU cache, which the CPU can access faster than main memory.
I also know that each thread has its own stack memory.
So the question is: can the program/code memory be cached as well, or is caching exclusive to data blocks and not instruction blocks?
Yes, the machine instructions of the program can be cached as well. In fact, some processors have separate caches for instructions and data, while in other processors they share a single cache.

do we need to disable swap for riak?

I just found in the Riak documentation that swap can make the server unresponsive, so it has to be disabled. It also says that the Riak node should be allowed to be killed by the kernel if it uses too much RAM, and that if swap is completely disabled, Riak will simply exit. I am confused: do we have to disable swap or not?
http://docs.basho.com/riak/latest/cookbooks/Linux-Performance-Tuning/
Swap Space
Due to the heavily I/O-focused profile of Riak, swap usage
can result in the entire server becoming unresponsive. Disable swap or
otherwise implement a solution for ensuring Riak's process pages are
not swapped.
Basho recommends that the Riak node be allowed to be killed by the
kernel if it uses too much RAM. If swap is completely disabled, Riak
will simply exit when it is unable to allocate more RAM and leave a
crash dump (named erl_crash.dump) in the /var/log/riak directory which
can be used for forensics (by Basho Client Services Engineers if you
are a customer).
So no, you don't have to ... but if you don't and you use all your available RAM the machine is likely to become unresponsive.
With any (unbounded) application that performs heavy I/O where you could exhaust your system's memory that's going to be the case. Typically you would have monitoring on the machine that warned you of memory usage going past a threshold.
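If you do follow Basho's advice, disabling swap is straightforward, and even a trivial threshold check can serve as a starting point for the monitoring mentioned above (the 90% figure is arbitrary):

    # Disable swap now; also comment out the swap entries in /etc/fstab
    # so it stays off after a reboot
    swapoff -a

    # Crude check: warn if used memory exceeds 90% of total
    free | awk '/^Mem:/ { if ($3/$2 > 0.90) print "WARNING: memory above 90% used" }'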

Upgrades without reboot - what kinds of problems happen in practice? [closed]

If you update packages without rebooting, what kinds of problems can happen before the reboot? This situation comes up especially often if you use unattended-upgrade to apply security patches.
Shared objects get replaced and so it is possible for programs to get out of sync with each other.
How long can you go safely before rebooting?
Clarification:
What I meant by "can programs get out of sync with one another" is that a running binary has the earlier version of a shared object loaded, while a newly launched instance has the newer version. It seems to me that if those versions are incompatible, the two processes may not interoperate properly.
And does this happen in practice very often?
More clarification:
What I'm getting at is that installers typically restart services that depend on a shared library so that they pick up the new version of an API. If they catch all the dependencies, you are probably OK. But do people often see installers missing dependencies?
If a service is written to support all previous API versions compatibly, then this will not be an issue, but I suspect that is often not done.
If there are kernel updates, especially if there are incompatible ABI changes, I don't see how you can get all the dependencies. I was looking for experience with whether and how things "tip over" and whether people have observed this in practice, either for kernel updates or for library/package updates.
Yes, this probably should have been put into ServerFault...
After an update there can be two versions of an executable file at any moment: the one in memory and the one on disk.
When you update, the one on disk gets replaced; the one in memory is the old one. If it's a shared object, the old copy stays mapped until every application that uses it quits; if it's the kernel, it stays until you reboot.
Bluntly put, if you're updating to fix a security vulnerability, the vulnerability remains until you load the (hopefully) patched version. So if it's the kernel, you aren't safe until you reboot. If it's a shared object, restarting every application that uses it is enough, and a reboot guarantees it.
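A practical way to check whether you're still exposed after a library update is to look for processes that still have the old, now-deleted copy mapped. A small read-only sketch that only inspects /proc:

    # List processes that still map a shared object whose on-disk file
    # has been replaced (the stale mapping shows up as "(deleted)")
    for pid in /proc/[0-9]*; do
        if grep -q '\.so.*(deleted)' "$pid/maps" 2>/dev/null; then
            ps -o pid=,comm= -p "${pid#/proc/}"
        fi
    done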
Basically, I'd say it depends on the category of the vulnerability. If it's security, restart whatever is affected. Otherwise, well, unless the bug is adversely affecting you, I wouldn't worry. If it's the kernel, I always reboot.
