Drop cache does not work

I am currently working on optimizing the memory management of a large program. For some purpose, I want to drop the page cache in main memory.
I used sync && echo 3 > /proc/sys/vm/drop_caches, as widely suggested on the internet, but it does not drop the cache to the level it was at before the program started. This means some undroppable cache remains in main memory after the program starts.
But doesn't echo 3 mean to free the pagecache, dentries and inodes in cache memory? Are there any other kinds of cache that cannot be freed by this command?

Yes, there are some types of caches that cannot be dropped. For instance, files on tmpfs filesystems are stored in the page cache, but they have no backing file on disk, so they cannot be flushed while in use. You can get a better picture of how much memory you really have available by using the free command and checking the available column. You'll notice that available memory is smaller than free + buffers + cache, sometimes much smaller.
For more information on tmpfs using caches see this answer.
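To see how much of the cache figure is tmpfs/shared memory, here is a minimal sketch, assuming a POSIX shell with awk and a kernel whose /proc/meminfo has a Shmem line. The function name cached_vs_shmem and the output labels are made up for illustration. The last number is only an upper bound, since dirty, mapped or locked pages may also survive drop_caches.

```shell
# Compare the "Cached" figure with "Shmem" (tmpfs and shared memory),
# which drop_caches cannot release. Values are in kB, as in /proc/meminfo.
# cached_vs_shmem is a hypothetical helper name; the optional argument is a
# meminfo-format file and defaults to /proc/meminfo.
cached_vs_shmem() {
    awk '$1 == "Cached:" { cached = $2 }
         $1 == "Shmem:"  { shmem = $2 }
         END { printf "cached_kb=%d shmem_kb=%d droppable_at_most_kb=%d\n",
                      cached, shmem, cached - shmem }' "${1:-/proc/meminfo}"
}

# Usage:
#   cached_vs_shmem
```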

Collect the output of cat /proc/vmstat before and after you drop the caches.
It includes nr_inactive_file, nr_active_file, nr_file_pages and nr_isolated_file. If dropping the caches worked, the total of these four counters should be lower than it was before.
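A minimal sketch of that comparison, assuming a POSIX shell with awk. The function name file_cache_pages is made up for illustration. The counters overlap (nr_file_pages already covers the active and inactive file pages), so the total is only meaningful as a before/after comparison, not as an absolute size.

```shell
# Sum the four file-cache counters (in pages) from a vmstat-format file.
# file_cache_pages is a hypothetical helper name; the optional argument
# defaults to /proc/vmstat.
file_cache_pages() {
    awk '$1 == "nr_inactive_file" || $1 == "nr_active_file" ||
         $1 == "nr_file_pages"    || $1 == "nr_isolated_file" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/vmstat}"
}

# Usage (as root):
#   before=$(file_cache_pages)
#   sync && echo 3 > /proc/sys/vm/drop_caches
#   after=$(file_cache_pages)
#   echo "file cache pages: $before -> $after"
```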

Related

Prioritize write cache over read cache on Linux

My pc (with 4 GB of RAM) is running several IO bound applications, and I want to avoid as many writes as possible on my SSD.
In /etc/sysctl.conf file I have set:
vm.dirty_background_ratio = 75
vm.dirty_ratio = 90
vm.dirty_expire_centisecs = 360000
vm.swappiness = 0
And in /etc/fstab I added the commit=3600 parameter.
According to the free command, my PC usually runs with 1 GB of RAM used by applications and about 2500 MB of available RAM. So with my settings I should be able to write at least 1500-2000 MB of data without actually writing to the disk.
I have done some tests with moderate writes (300 MB - 1000 MB), and with the free and cat /proc/meminfo | grep Dirty commands I noticed that often, shortly after these writes (much sooner than dirty_expire_centisecs), the dirty bytes drop to a value close to 0.
I suspect that the subsequent read operations fill the cache until the machine is near an OOM condition and is forced to flush dirty pages, ignoring my sysctl.conf settings (correct me if my hypothesis is wrong).
So the question is: is it possible to disable only read caching (AFAIK not possible), or at least to change the pagecache replacement policy to give more priority to the write cache, so that the read cache cannot force a flush of writes (maybe by tweaking the kernel source code...)? I know that I can easily solve this problem using tmpfs or a union filesystem like AUFS or OverlayFS, but for many reasons I would like to avoid them.
Sorry for my bad English, I hope you understand my question. Thank you.

Does link/rm/mv sync dentry metadata immediately when finished?

Does link/rm/mv sync dentry metadata to permanent storage immediately when finished? If not, when?

From https://www.kernel.org/doc/Documentation/sysctl/vm.txt, I get this:
drop_caches
Writing to this will cause the kernel to drop clean caches, as well as
reclaimable slab objects like dentries and inodes. Once dropped,
their memory becomes free.
To free pagecache:
echo 1 > /proc/sys/vm/drop_caches
To free reclaimable slab objects (includes dentries and inodes):
echo 2 > /proc/sys/vm/drop_caches
To free slab objects and pagecache:
echo 3 > /proc/sys/vm/drop_caches
This is a non-destructive operation and will not free any dirty
objects. To increase the number of objects freed by this operation,
the user may run `sync' prior to writing to /proc/sys/vm/drop_caches.
This will minimize the number of dirty objects on the system and
create more candidates to be dropped.
This file is not a means to control the growth of the various kernel
caches (inodes, dentries, pagecache, etc...) These objects are
automatically reclaimed by the kernel when memory is needed elsewhere
on the system.
Use of this file can cause performance problems. Since it discards
cached objects, it may cost a significant amount of I/O and CPU to
recreate the dropped objects, especially if they were under heavy use.
Because of this, use outside of a testing or debugging environment is
not recommended.
You may see informational messages in your kernel log when this file
is used:
cat (1234): drop_caches: 3
These are informational only. They do not mean that anything is wrong
with your system. To disable them, echo 4 (bit 3) into drop_caches.

Cache memory occupied in RHEL

I am running my app servers (one instance each of Karaf, Tomcat, Mongo and Zookeeper) in a RHEL environment and often see (using free -m) that almost 8 GB of my total 12 GB of RAM is shown as cached. The app slows down as well. Why is this happening? I even tried stopping all of these services gracefully until only the Linux OS itself was running on my box. Even then the cache is not freed. I have to free it manually to bring it down.
Why is the cache accumulating like this, and does it have something to do with my application? Is it good practice to run a cron job like this just to free the cache?

Try clearing the cache.
# sync; echo 3 > /proc/sys/vm/drop_caches

If you are talking about the "cached" value in the last column here:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          3954       3580        374          0          1       1448
then there is no reason to clear it. This cache is absolutely harmless, it retains (caches) for example previously opened files for faster access. When more memory is needed, this cache is automatically cleared. There is no reason why this cache would slow down any apps.
Update: Some apps store temporary files in memory, /dev/shm is usually a place for this, but you can check these on your system using:
$ mount|grep tmpfs
These files also show up in the cached column, but this data is not harmless in the sense that it cannot be dropped when more free memory is needed.

How can I limit the cache used by copying so there is still memory available for other caches?

Basic situation:
I am copying some NTFS disks in openSUSE. Each one is 2 TB. When I do this, the system runs slow.
My guesses:
I believe it is likely due to caching. Linux decides to discard useful caches (for example, KDE 4 bloat, virtual machine disks, LibreOffice binaries, Thunderbird binaries, etc.) and instead fill all available memory (24 GB total) with stuff from the copying disks, which will be read only once, then written and never used again. So then any time I use these applications (or KDE 4), the disk needs to be read again, and reading the bloat off the disk again makes things freeze/hiccup.
Due to the cache being gone and the fact that these bloated applications need lots of cache, this makes the system horribly slow.
Since it is USB, the disk and disk controller are not the bottleneck, so using ionice does not make it faster.
I believe it is the cache rather than just the motherboard going too slow, because if I stop everything copying, it still runs choppy for a while until it recaches everything.
And if I restart the copying, it takes a minute before it is choppy again. But also, I can limit it to around 40 MB/s, and it runs faster again (not because it has the right things cached, but because the motherboard busses have lots of extra bandwidth for the system disks). I can fully accept a performance loss from my motherboard's I/O capability being completely consumed (which is 100% used, meaning 0% wasted power which makes me happy), but I can't accept that this caching mechanism performs so terribly in this specific use case.
# free
                      total       used       free     shared    buffers     cached
Mem:               24731556   24531876     199680          0    8834056   12998916
-/+ buffers/cache:             2698904   22032652
Swap:               4194300      24764    4169536
I also tried the same thing on Ubuntu, which causes a total system hang instead. ;)
And to clarify, I am not asking how to leave memory free for the "system", but for "cache". I know that cache memory is automatically given back to the system when needed, but my problem is that it is not reserved for caching of specific things.
Is there some way to tell these copy operations to limit memory usage so some important things remain cached, and therefore any slowdowns are a result of normal disk usage and not rereading the same commonly used files? For example, is there a setting of max memory per process/user/file system allowed to be used as cache/buffers?

The nocache command is the general answer to this problem! It is also in Debian and Ubuntu 13.10 (Saucy Salamander).
Thanks, Peter, for alerting us to the "--drop-cache" option in rsync. But that was rejected upstream (Bug 9560 – drop-cache option) in favor of a more general solution: the new "nocache" command, based on the rsync work with fadvise.
You just prepend "nocache" to any command you want. It also has nice utilities for describing and modifying the cache status of files. For example, here are the effects with and without nocache:
$ ./cachestats ~/file.mp3
pages in cache: 154/1945 (7.9%) [filesize=7776.2K, pagesize=4K]
$ ./nocache cp ~/file.mp3 /tmp
$ ./cachestats ~/file.mp3
pages in cache: 154/1945 (7.9%) [filesize=7776.2K, pagesize=4K]
$ cp ~/file.mp3 /tmp
$ ./cachestats ~/file.mp3
pages in cache: 1945/1945 (100.0%) [filesize=7776.2K, pagesize=4K]
So hopefully that will work for other backup programs (rsnapshot, duplicity, rdiff-backup, amanda, s3sync, s3ql, tar, etc.) and other commands that you don't want trashing your cache.

Kristof Provost was very close, but in my situation, I didn't want to use dd or write my own software, so the solution was to use the "--drop-cache" option in rsync.
I have used this many times since creating this question, and it seems to fix the problem completely. One exception was when I was using rsync to copy from a FreeBSD machine, which doesn't support "--drop-cache". So I wrote a wrapper to replace the /usr/local/bin/rsync command and remove that option, and now copying from there works too.
It still uses a huge amount of memory for buffers and seems to keep almost no cache, but it works smoothly anyway.
$ free
                      total       used       free     shared    buffers     cached
Mem:               24731544   24531576     199968          0   15349680     850624
-/+ buffers/cache:             8331272   16400272
Swap:               4194300     602648    3591652

You have practically two choices:
Limit the maximum disk buffer size: the problem you're seeing is probably caused by the default kernel configuration, which allows a huge piece of RAM to be used for disk buffering. When you try to write lots of stuff to a really slow device, you end up using lots of your precious RAM for disk caching to that slow device.
The kernel does this because it assumes that processes can continue to do stuff when they are not slowed down by the slow device, and that RAM can be freed automatically if needed by simply writing the pages to storage (the slow USB stick - but the kernel doesn't consider the actual performance of that process). The quick fix:
# Wake up the background writeback process once there is more than 50 MB of dirty memory
echo 50000000 > /proc/sys/vm/dirty_background_bytes
# Limit total dirty memory to 200 MB; writers are throttled beyond this (source: http://serverfault.com/questions/126413/limit-linux-background-flush-dirty-pages)
echo 200000000 > /proc/sys/vm/dirty_bytes
Adjust the numbers to match the RAM you're willing to spend on disk write cache. A sensible value depends on your actual write performance, not on the amount of RAM you have. You should aim to have barely enough RAM for caching to allow full write performance for your devices. Note that this is a global setting, so you have to set it according to the slowest device you're using.
Reserve a minimum memory size for each task you want to keep going fast. In practice this means creating cgroups for stuff you care about and defining the minimum memory you want to have for any such group. That way, the kernel can use the remaining memory as it sees fit. For details, see this presentation: SREcon19 Asia/Pacific - Linux Memory Management at Scale: Under the Hood
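For the first option, the advice to size the limits from write performance rather than RAM can be sketched as simple arithmetic. This is only an illustration: the helper name dirty_bytes_for and the "seconds of buffering" rule of thumb are assumptions, not something the kernel defines. The 40 MB/s figure is the throttled copy speed mentioned in the question.

```shell
# Hypothetical sizing helper: bytes needed to buffer SECONDS worth of writes
# at THROUGHPUT_MB_S (decimal megabytes per second). The "seconds of
# buffering" rule is an assumption made for this sketch.
dirty_bytes_for() {
    throughput_mb_s=$1
    seconds=$2
    echo $(( throughput_mb_s * 1000000 * seconds ))
}

# Usage (as root), e.g. a 40 MB/s USB disk with 5 s / 1 s of buffering:
#   dirty_bytes_for 40 5 > /proc/sys/vm/dirty_bytes
#   dirty_bytes_for 40 1 > /proc/sys/vm/dirty_background_bytes
```

With these example inputs the hard limit comes out at the same 200 MB used in the quick fix.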
Update year 2022:
You can also try creating new file /etc/udev/rules.d/90-set-default-bdi-max_ratio-and-min_ratio.rules with the following contents:
# For every BDI device, set max cache usage to 30% and min reserved cache to 2% of the whole cache
# https://unix.stackexchange.com/a/481356/20336
ACTION=="add|change", SUBSYSTEM=="bdi", ATTR{max_ratio}="30", ATTR{min_ratio}="2"
The idea is to put a per-device limit on maximum cache utilization. With the above limit (30%) you can have two totally stalled devices and still have 40% of the disk cache available for the rest of the system. If you have 4 or more stalled devices in parallel, even this workaround cannot help on its own. That's why I have also added a minimum cache space of 2% for every device, but I don't know how to check whether this is actually effective. I've been running with this config for about half a year and I think it's working nicely.
See https://unix.stackexchange.com/a/481356/20336 for details.
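To confirm that the udev rule was at least applied, the per-device values can be read back from sysfs. A minimal sketch, assuming the kernel exposes min_ratio and max_ratio under /sys/class/bdi; the function name show_bdi_ratios is made up for illustration. It only shows the configured limits and does not prove that they are effective under load.

```shell
# Print "device min_ratio max_ratio" for every backing device.
# show_bdi_ratios is a hypothetical helper; the optional directory argument
# defaults to /sys/class/bdi and exists so the function can be tried on a copy.
show_bdi_ratios() {
    dir=${1:-/sys/class/bdi}
    for dev in "$dir"/*; do
        # Skip entries without readable ratio files (or an unmatched glob).
        [ -r "$dev/max_ratio" ] || continue
        printf '%s %s %s\n' "${dev##*/}" \
            "$(cat "$dev/min_ratio")" "$(cat "$dev/max_ratio")"
    done
}

# Usage:
#   show_bdi_ratios
```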

The kernel can not know that you won't use the cached data from copying again. This is your information advantage.
But you could set the swappiness to 0: sudo sysctl vm.swappiness=0. This will cause Linux to drop the cache before libraries, etc. are written to swap.
It works nicely for me too, and performs especially well in combination with a huge amount of RAM (16-32 GB).

It's not possible if you're using plain old cp, but if you're willing to reimplement or patch it yourself, setting posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE) on both input and output file will probably help.
posix_fadvise() tells the kernel about your intended access pattern. In this case, you'd only use the data once, so there isn't any point in caching it.
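On Linux, however, POSIX_FADV_NOREUSE has historically been accepted but ignored, so this flag alone may not stop the data from being cached; calling posix_fadvise() with POSIX_FADV_DONTNEED on data that has already been copied is the usual way to get it out of the page cache.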

Try using dd instead of cp.
Or mount the filesystem with the sync flag.
I'm not completely sure whether these methods bypass the swap, but they may be worth a try.

I am copying some NTFS disks [...] the system runs slow. [...]
Since it is USB [...]
The slowdown is a known memory management issue.
Use a newer Linux Kernel. The older ones have a problem with USB data and "Transparent Huge Pages". See this LWN article. Very recently this issue was addressed - see "Memory Management" in LinuxChanges.

Ok, now that I know that you're using rsync and I could dig a bit more:
It seems that rsync is inefficient when used with tons of files at the same time. There's an entry in their FAQ, and it's not a Linux/cache problem. It's a problem of rsync eating too much RAM.
Googling around, I found someone recommending splitting the sync into multiple rsync invocations.

How to clean caches used by the Linux kernel

I want to force the Linux kernel to allocate more memory to applications after the cache starts taking up too much memory (as can be seen by the output of 'free').
I've run
sudo sync; sudo sysctl -w vm.drop_caches=3; free
(to free both disc dentry/inode cache and page cache) and I see that only about half of the used cache was freed - the rest remains. How can I tell what is taking up the rest of the cache and force it to be freed?

You may want to increase vfs_cache_pressure as well as set swappiness to 0.
Doing that will make the kernel reclaim cache faster, while giving processes equal or more favor when deciding what gets paged out.
You may only want to do this if processes you care about do very little disk I/O.
If a network I/O bound process has to swap in to serve requests, that's a problem and the real solution is to put it on a less competitive server.
With the default swappiness setting, the kernel is almost always going to favour keeping FS related cache in real memory.
As such, if you increase the cache pressure, be sure to equally adjust swappiness.

The contents of /proc/meminfo tell you what the kernel uses RAM for.
You can use /proc/sys/vm/vfs_cache_pressure to force the kernel to reclaim memory that is used for filesystem-related caches more lazily or eagerly.
Note that your application may only benefit from tuning this parameter if it does little or no disk I/O.
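As a rough aid for reading /proc/meminfo, the sketch below prints only the fields most relevant to cache accounting, assuming a POSIX shell with awk. The function name meminfo_fields and the choice of fields are illustrative assumptions; some fields may be missing on older kernels, in which case they are simply not printed.

```shell
# Print selected /proc/meminfo fields (values in kB) that show what the
# "cache" consists of. meminfo_fields is a hypothetical helper name; the
# optional argument is a meminfo-format file, defaulting to /proc/meminfo.
meminfo_fields() {
    awk -v want="Buffers Cached Shmem Dirty Writeback SReclaimable SUnreclaim Mlocked" '
        BEGIN {
            n = split(want, keys, " ")
            for (i = 1; i <= n; i++) pick[keys[i] ":"] = 1
        }
        ($1 in pick) { name = $1; sub(":", "", name); print name, $2 }
    ' "${1:-/proc/meminfo}"
}

# Usage:
#   meminfo_fields
#   cat /proc/sys/vm/vfs_cache_pressure   # current reclaim preference
```

Shmem and SUnreclaim are the parts that drop_caches leaves behind; SReclaimable is the slab memory that vfs_cache_pressure influences.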

You might find John Nilsson's answer to my Question useful for purging the cache in order to test whether that is related to your problem:
sync && echo 1 > /proc/sys/vm/drop_caches
Though I'm guessing the only real difference is 1 vs 3
