Cache memory occupied in RHEL - linux

I am running my app servers (one instance each of Karaf, Tomcat, Mongo and Zookeeper) in a RHEL environment and often see that (using free -m) of my total 12GB RAM almost 8GM is shown as cached. The app slows down as well. Why is this happening. I even tried to stop all of these services gracefully until i have only the Linux OS alone running on my box. Even then the cache is not freed. I have to manually free it to bring it down.
Why is the cache being accumulated like this and Does it have something to do with my application? Is it a good practise to run a chron job like this just to free the cache?

Try clearing the cache.
#sync; echo 3 > /proc/sys/vm/drop_caches

if you are talking about the "cache" from the last column here:
$ free -m
total used free shared buffers cached
Mem: 3954 3580 374 0 1 1448
then there is no reason to clear it. This cache is absolutely harmless, it retains (caches) for example previously opened files for faster access. When more memory is needed, this cache is automatically cleared. There is no reason why this cache would slow down any apps.
Update: Some apps store temporary files in memory, /dev/shm is usually a place for this, but you can check these on your system using:
$ mount|grep tmpfs
These files also show up in the cached column, but this data is not harmless in the sense that it cannot be dropped when more free memory is needed.

Related

inotifywait secretely consumes a lot of memory

On Ubuntu 14.04:
$ cat /proc/sys/fs/inotify/max_queued_events
16384
$ cat /proc/sys/fs/inotify/max_user_instances
128
$ cat /proc/sys/fs/inotify/max_user_watches
1048576
Right after computer restart I had 1GB of RAM consumed. After 20-30 minutes (having just 1 terminal opened) I had 6GB RAM used and growing, however none of the processes seemed to be using so much memory (according to htop and top). When I've killed inotifywait process memory was not freed but stopped growing. Then I've restarted PC, killed inotifywait right away and memory usage stopped at 1GB.
I have 2 hard drives, one is 1TB and second is 2TB. Was inotifywait somehow caching those or in general is it normal that it caused such behavior?
This is the Linux disk cache at work, it is not a memory leak in inotifywait or anything else.
I've accepted previous answer because it explains what's going on. However I have some second thoughts regarding this topic. What the page is basically saying is "caching doesn't occupy memory because this memory can at any point be taken back, so you should not worry, you should not panic, you should be grateful!". Well ... I'm not. I believe there should be some decent but at the same time hard limit for caching.
The idea behind this process was good -> "let's not waste user time and cache everything till we have space". However what this process is actually doing in my case is wasting my time. Currently I'm working on Linux which is running in virtual machine. Since I have a lot jira pages opened, a lot terminals in many desktops, some tools opened and etc I don't want to open all that stuff every day, so I just save virtual machine state instead of turning it off at the end of the day. Now let's say that my stuff occupies 4GB RAM. What will happen is that 4GB will be written into my hard drive after I save state and then 4GB will have to be loaded into RAM when I start virtual machine. Unfortunately that's only theory. In practice due to inotifywait which will happily fill my 32 GB RAM I have to wait 8 times longer for saving and restoring virtual machine. Yes my RAM is "fine" as the page is saying, yes I can open different app and I will not hit OOM but at the same time caching is wasting my time.
If the limit was decent let's say 1GB for caching then it would not be so painful. I think if I would have VM on HDD instead of SSD it would took forever to save the state and I would probably not use it at all.

Drop cache does not work

I am currently working on optimizing the memory management of a large program. For some pupose, I want to drop the page cache in my main memory.
I used sync && echo 3 > /proc/sys/vm/drop_caches as widely suggested by the internet, but it does not drop the cache to the level where it was before the program starts. This means there are some undroppable cache in the main memory after the program starts.
But isn't echo 3 means to free pagecache, dentries and inodes in cache memory? Is there any other kinds of cache that cannot be freed by this command?
Yes, there are some types of caches that cannot be dropped. For instance, tmpfs filesystems are stored in page cache. But these could not be flushed while in use. You can get better picture of how much memory you really have available by using free command, and checking available column. You'll notice that available memory is smaller than free + buffers + caches. Sometimes much smaller.
For more information on tmpfs using caches see this answer.
Collect output of cat /proc/vmstat before and after you issue drop cache.
It will give nr_inactive_file,nr_active_file ,nr_file_pages,nr_isolated_file. If drop cache works then total of above 4 should be less than before issuing drop cache.

Swappiness in JVM

I stumbled upon some interesting problems last week where I noticed that the one of our production servers, which has Apache HTTP Server over Tomcat running, stopped, reporting an HTTP outage.
On further investigating this issue it seemed like it was due to JVM causing memory pages to be swapped out quickly. This resulted in the swap space being completely populated, causing a memory issue the next time a page is moved to swap.
Investigating further, it seems that there is a JVM swappiness factor that's set to 60% by default with some of the our Linux distributions. Based on some research, it seems that this could be a high value for a web service that has high traffic. Our swap space was set to 2 GB.
The swap details are:
Filename Type Size Used Priority
/dev/sda3 partition 2096472 1261420 -1
From /proc/meminfo
SwapCached: 944668 kB
Our JVM properties are as follows:
-Xmx6g -Xms4g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:PermSize=512M -XX:MaxPermSize=1024M -XX:NewSize=2g -XX:MaxNewSize=2g -XX:ParallelGCThreads=8
The server runs with 12 GB RAM.
Swappiness does not go well with the JVM's GC process. Thus I tried reducing the swappiness to 0, but that did not change anything. We still see cases where the entire swap space is consumed and resulting in an OutOfMemory error.
How can I tune the JVM performance?
The swappiness factor is actually a system-wide setting in Linux, not JVM-specific.
My personal experience with some kinds of applications is that they need swap to be turned off in order to work without hiccups, period. While I can't say for sure if your application belongs to this category, I have seen apps being swapped out despite enough free RAM being available. As you pointed out, GC and swap don't mix well. Swappiness is just an indication to the OS, so its impact on how much gets swapped out may be larger or smaller depending on circumstances. My suggestion would be to try turning swap completely off.
In order to do this, you need to be root or have sudo access. Comment out the line describing swap in /etc/fstab, which will prevent swap from being turned on after a reboot. Then, in order to turn swap off for the current server run, run swapoff -a. This may take up to a few minutes if there's data in swap that needs to be pulled back into RAM. Then, check the last line of the output of free to ensure that the total size of available swap is 0. After this, observe your app in order to determine if turning swap off solved your issue.
If your physical RAM is 12 GB and your heap is set to only 6 GB (max), then you have plenty of RAM. Is this server dedicated to the Tomcat instance?
Take a look at the verbose GC log files to see what the memory usage is over time. Check your access log files to confirm whether the access requests have increased over time.

How can I limit the cache used by copying so there is still memory available for other caches?

Basic situation:
I am copying some NTFS disks in openSUSE. Each one is 2 TB. When I do this, the system runs slow.
My guesses:
I believe it is likely due to caching. Linux decides to discard useful caches (for example, KDE 4 bloat, virtual machine disks, LibreOffice binaries, Thunderbird binaries, etc.) and instead fill all available memory (24 GB total) with stuff from the copying disks, which will be read only once, then written and never used again. So then any time I use these applications (or KDE 4), the disk needs to be read again, and reading the bloat off the disk again makes things freeze/hiccup.
Due to the cache being gone and the fact that these bloated applications need lots of cache, this makes the system horribly slow.
Since it is USB, the disk and disk controller are not the bottleneck, so using ionice does not make it faster.
I believe it is the cache rather than just the motherboard going too slow, because if I stop everything copying, it still runs choppy for a while until it recaches everything.
And if I restart the copying, it takes a minute before it is choppy again. But also, I can limit it to around 40 MB/s, and it runs faster again (not because it has the right things cached, but because the motherboard busses have lots of extra bandwidth for the system disks). I can fully accept a performance loss from my motherboard's I/O capability being completely consumed (which is 100% used, meaning 0% wasted power which makes me happy), but I can't accept that this caching mechanism performs so terribly in this specific use case.
# free
total used free shared buffers cached
Mem: 24731556 24531876 199680 0 8834056 12998916
-/+ buffers/cache: 2698904 22032652
Swap: 4194300 24764 4169536
I also tried the same thing on Ubuntu, which causes a total system hang instead. ;)
And to clarify, I am not asking how to leave memory free for the "system", but for "cache". I know that cache memory is automatically given back to the system when needed, but my problem is that it is not reserved for caching of specific things.
Is there some way to tell these copy operations to limit memory usage so some important things remain cached, and therefore any slowdowns are a result of normal disk usage and not rereading the same commonly used files? For example, is there a setting of max memory per process/user/file system allowed to be used as cache/buffers?
The nocache command is the general answer to this problem! It is also in Debian and Ubuntu 13.10 (Saucy Salamander).
Thanks, Peter, for alerting us to the --drop-cache" option in rsync. But that was rejected upstream (Bug 9560 – drop-cache option), in favor of a more general solution for this: the new "nocache" command based on the rsync work with fadvise.
You just prepend "nocache" to any command you want. It also has nice utilities for describing and modifying the cache status of files. For example, here are the effects with and without nocache:
$ ./cachestats ~/file.mp3
pages in cache: 154/1945 (7.9%) [filesize=7776.2K, pagesize=4K]
$ ./nocache cp ~/file.mp3 /tmp
$ ./cachestats ~/file.mp3
pages in cache: 154/1945 (7.9%) [filesize=7776.2K, pagesize=4K]\
$ cp ~/file.mp3 /tmp
$ ./cachestats ~/file.mp3
pages in cache: 1945/1945 (100.0%) [filesize=7776.2K, pagesize=4K]
So hopefully that will work for other backup programs (rsnapshot, duplicity, rdiff-backup, amanda, s3sync, s3ql, tar, etc.) and other commands that you don't want trashing your cache.
Kristof Provost was very close, but in my situation, I didn't want to use dd or write my own software, so the solution was to use the "--drop-cache" option in rsync.
I have used this many times since creating this question, and it seems to fix the problem completely. One exception was when I am using rsync to copy from a FreeBSD machine, which doesn't support "--drop-cache". So I wrote a wrapper to replace the /usr/local/bin/rsync command, and remove that option, and now it works copying from there too.
It still uses huge amount of memory for buffers and seems to keep almost no cache, but it works smoothly anyway.
$ free
total used free shared buffers cached
Mem: 24731544 24531576 199968 0 15349680 850624
-/+ buffers/cache: 8331272 16400272
Swap: 4194300 602648 3591652
You have practically two choices:
Limit the maximum disk buffer size: the problem you're seeing is probably caused by default kernel configuration that allows using huge piece of RAM for disk buffering and, when you try to write lots of stuff to a really slow device, you'll end up lots of your precious RAM for disk caching to that slow the device.
The kernel does this because it assumes that the processes can continue to do stuff when they are not slowed down by the slow device and that RAM can be automatically freed if needed by simply writing the pages on storage (the slow USB stick - but the kernel doesn't consider the actual performance of that process). The quick fix:
# Wake up background writing process if there's more than 50 MB of dirty memory
echo 50000000 > /proc/sys/vm/dirty_background_bytes
# Limit background dirty bytes to 200 MB (source: http://serverfault.com/questions/126413/limit-linux-background-flush-dirty-pages)
echo 200000000 > /proc/sys/vm/dirty_bytes
Adjust the numbers to match the RAM you're willing to spend on disk write cache. A sensible value depends on your actual write performance, not the amount of RAM you have. You should target on having barely enough RAM for caching to allow full write performance for your devices. Note that this is a global setting, so you have to set this according to the slowest devices you're using.
Reserve a minimum memory size for each task you want to keep going fast. In practice this means creating cgroups for stuff you care about and defining the minimum memory you want to have for any such group. That way, the kernel can use the remaining memory as it sees fit. For details, see this presentation: SREcon19 Asia/Pacific - Linux Memory Management at Scale: Under the Hood
Update year 2022:
You can also try creating new file /etc/udev/rules.d/90-set-default-bdi-max_ratio-and-min_ratio.rules with the following contents:
# For every BDI device, set max cache usage to 30% and min reserved cache to 2% of the whole cache
# https://unix.stackexchange.com/a/481356/20336
ACTION=="add|change", SUBSYSTEM=="bdi", ATTR{max_ratio}="30", ATTR{min_ratio}="2"
The idea is to put limit per device for maximum cache utilization. With the above limit (30%) you can have two totally stalled devices and still have 40% of the disk cache available for the rest of the system. If you have 4 or more stalled devices in parallel, even this workaround cannot help alone. That's why I have also added minimum cache space of 2% for every device but I don't know how to check if this actually effective. I've been running with this config for about half a year and I think it's working nicely.
See https://unix.stackexchange.com/a/481356/20336 for details.
The kernel can not know that you won't use the cached data from copying again. This is your information advantage.
But you could set the swapiness to 0: sudo sysctl vm.swappiness=0. This will cause Linux to drop the cache before libraries, etc. are written to the swap.
It works nice for me too, especially very performant in combination with huge amount of RAM (16-32 GB).
It's not possible if you're using plain old cp, but if you're willing to reimplement or patch it yourself, setting posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE) on both input and output file will probably help.
posix_fadvise() tells the kernel about your intended access pattern. In this case, you'd only use the data once, so there isn't any point in caching it.
The Linux kernel honours these flags, so it shouldn't be caching the data any more.
Try using dd instead of cp.
Or mount the filesystem with the sync flag.
I'm not completely sure if these methods bypass the swap, but it may be worth giving a try.
I am copying some NTFS disks [...] the system runs slow. [...]
Since it is USB [...]
The slowdown is a known memory management issue.
Use a newer Linux Kernel. The older ones have a problem with USB data and "Transparent Huge Pages". See this LWN article. Very recently this issue was addressed - see "Memory Management" in LinuxChanges.
Ok, now that I know that you're using rsync and I could dig a bit more:
It seems that rsync is ineffective when used with tons of files at the same time. There's an entry in their FAQ, and it's not a Linux/cache problem. It's an rsync problem eating too much RAM.
Googling around someone recommended to split the syncing in multiple rsync invocations.

Linux Shared Memory allocation behaviour

First a little background: we used this tutorial to place our Minecraft world folder (which won't get bigger than 150MB or so with our settings) inside the Linux shared memory folder. We back it up every 10 minutes to the HDD using rsync. This should reduce the amount of I/O operations the HDD (a single 1TB drive) has to endure.
We set this up yesterday evening, but we forgot that every Tuesday night our backup program starts running too (which backs up the entire server to another machine on the network). Normally this isn't a problem, but this time our server went into a coma. It started swapping memory because it ran out of RAM.
Now I find this a little odd, since I'd think the shared memory would only allocate 150MB on the RAM to store the data on. With 4GB installed, you'd think that it doesn't matter that much.
My question is: does the Shared Memory actually allocate as much space on the RAM as the amount of data you put on it, or does it behave differently (like it reserves larger blocks)? I'm having a hard time finding info on the net about this.
If you could give some other tips on why the server might've freaked out, please do.
You can probably find out what you want by using
df /dev/shm
du -shc /dev/shm/minecraft/world/*
Or you can use the better road and create a separate mount, and limit it:
mkdir /tmp/minecraft
sudo mount -o size=150M,noexec,nodev -t tmpfs none /tmp/minecraft
HTH
PS: it is entirely possible to setup this mount from fstab, e.g.
none /mnt/minecraft tmpfs auto,size=400M,noexec,nodev 0 0

Resources