First a little background: we used this tutorial to place our Minecraft world folder (which won't get bigger than 150MB or so with our settings) inside the Linux shared memory folder. We back it up every 10 minutes to the HDD using rsync. This should reduce the number of I/O operations the HDD (a single 1TB drive) has to endure.
We set this up yesterday evening, but we forgot that every Tuesday night our backup program starts running too (which backs up the entire server to another machine on the network). Normally this isn't a problem, but this time our server went into a coma. It started swapping memory because it ran out of RAM.
Now I find this a little odd, since I'd expect shared memory to use only about 150MB of RAM to store the data. With 4GB installed, you'd think that wouldn't matter much.
My question is: does shared memory actually use only as much RAM as the amount of data you put in it, or does it behave differently (e.g. reserving larger blocks up front)? I'm having a hard time finding information about this online.
If you could give some other tips on why the server might've freaked out, please do.
You can probably find out what you want by using
df /dev/shm
du -shc /dev/shm/minecraft/world/*
Or, better, create a separate tmpfs mount and limit its size:
mkdir /tmp/minecraft
sudo mount -o size=150M,noexec,nodev -t tmpfs none /tmp/minecraft
HTH
PS: it is entirely possible to set up this mount from fstab, e.g.
none /mnt/minecraft tmpfs auto,size=400M,noexec,nodev 0 0
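To activate that entry without rebooting, creating the mount point and mounting it via its fstab entry should be enough; something along these lines:
sudo mkdir -p /mnt/minecraft
sudo mount /mnt/minecraft     # picks up the options from fstab
df -h /mnt/minecraft          # confirm the 400M size limit is in effect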
On Ubuntu 14.04:
$ cat /proc/sys/fs/inotify/max_queued_events
16384
$ cat /proc/sys/fs/inotify/max_user_instances
128
$ cat /proc/sys/fs/inotify/max_user_watches
1048576
Right after a computer restart I had 1GB of RAM consumed. After 20-30 minutes (with just one terminal open) I had 6GB of RAM used and growing, yet none of the processes seemed to be using that much memory (according to htop and top). When I killed the inotifywait process, the memory was not freed, but it stopped growing. Then I restarted the PC, killed inotifywait right away, and memory usage stopped at 1GB.
I have two hard drives, one 1TB and the other 2TB. Was inotifywait somehow caching them, or is it in general normal that it caused such behavior?
This is the Linux disk cache at work; it is not a memory leak in inotifywait or anything else.
I've accepted the previous answer because it explains what's going on. However, I have some second thoughts on this topic. What the page is basically saying is "caching doesn't occupy memory because this memory can at any point be taken back, so you should not worry, you should not panic, you should be grateful!". Well... I'm not. I believe there should be a reasonable, but at the same time hard, limit for caching.
The idea behind this behavior was good: "let's not waste the user's time and cache everything while we have space". However, what it is actually doing in my case is wasting my time. Currently I'm working on Linux running in a virtual machine. Since I have a lot of Jira pages open, a lot of terminals on many desktops, some tools open, etc., I don't want to open all that stuff every day, so I just save the virtual machine state instead of turning it off at the end of the day. Now let's say that my stuff occupies 4GB of RAM. Then 4GB would be written to my hard drive when I save the state, and 4GB would have to be loaded back into RAM when I start the virtual machine. Unfortunately, that's only theory. In practice, due to inotifywait, which will happily fill my 32GB of RAM, I have to wait 8 times longer to save and restore the virtual machine. Yes, my RAM is "fine" as the page says, and yes, I can open another app without hitting OOM, but at the same time caching is wasting my time.
If the limit were reasonable, let's say 1GB for caching, it would not be so painful. I think if I had the VM on an HDD instead of an SSD, saving the state would take forever and I would probably not use it at all.
I am running my app servers (one instance each of Karaf, Tomcat, Mongo and ZooKeeper) in a RHEL environment, and I often see (using free -m) that of my total 12GB of RAM almost 8GB is shown as cached. The app slows down as well. Why is this happening? I even tried stopping all of these services gracefully until only the Linux OS itself was running on my box. Even then the cache is not freed; I have to free it manually to bring it down.
Why is the cache accumulating like this, and does it have something to do with my application? Is it good practice to run a cron job like this just to free the cache?
Try clearing the cache.
# sync; echo 3 > /proc/sys/vm/drop_caches
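For reference, the value you write selects what gets dropped (this is straight from the kernel's sysctl documentation):
# echo 1 > /proc/sys/vm/drop_caches   # free the page cache only
# echo 2 > /proc/sys/vm/drop_caches   # free reclaimable slab objects (dentries and inodes)
# echo 3 > /proc/sys/vm/drop_caches   # free both, as in the command above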
If you are talking about the "cached" value in the last column here:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          3954       3580        374          0          1       1448
then there is no reason to clear it. This cache is absolutely harmless; it retains (caches), for example, previously opened files for faster access. When more memory is needed, this cache is automatically cleared. There is no reason why this cache would slow down any apps.
Update: Some apps store temporary files in memory; /dev/shm is the usual place for this, but you can check the tmpfs mounts on your system using:
$ mount|grep tmpfs
Files stored there also show up in the cached column, but unlike the page cache this data is not harmless, in the sense that it cannot simply be dropped when more free memory is needed.
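If you want per-mount numbers rather than just the list of tmpfs mounts, df can filter by filesystem type and will show how much each one is actually holding:
$ df -h -t tmpfs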
Basic situation:
I am copying some NTFS disks in openSUSE. Each one is 2 TB. When I do this, the system runs slow.
My guesses:
I believe it is likely due to caching. Linux decides to discard useful caches (for example, KDE 4 bloat, virtual machine disks, LibreOffice binaries, Thunderbird binaries, etc.) and instead fills all available memory (24 GB total) with data from the disks being copied, which will be read only once, then written, and never used again. So any time I then use these applications (or KDE 4), the disk needs to be read again, and reading the bloat off the disk again makes things freeze/hiccup.
Because the cache is gone and these bloated applications need lots of cache, the system becomes horribly slow.
Since it is USB, the disk and disk controller are not the bottleneck, so using ionice does not make it faster.
I believe it is the cache rather than just the motherboard going too slow, because if I stop everything copying, it still runs choppy for a while until it recaches everything.
And if I restart the copying, it takes a minute before it is choppy again. But also, I can limit it to around 40 MB/s, and it runs faster again (not because it has the right things cached, but because the motherboard busses have lots of extra bandwidth for the system disks). I can fully accept a performance loss from my motherboard's I/O capability being completely consumed (which is 100% used, meaning 0% wasted power which makes me happy), but I can't accept that this caching mechanism performs so terribly in this specific use case.
# free
             total       used       free     shared    buffers     cached
Mem:      24731556   24531876     199680          0    8834056   12998916
-/+ buffers/cache:    2698904   22032652
Swap:      4194300      24764    4169536
I also tried the same thing on Ubuntu, which causes a total system hang instead. ;)
And to clarify, I am not asking how to leave memory free for the "system", but for "cache". I know that cache memory is automatically given back to the system when needed, but my problem is that it is not reserved for caching of specific things.
Is there some way to tell these copy operations to limit memory usage so some important things remain cached, and therefore any slowdowns are a result of normal disk usage and not rereading the same commonly used files? For example, is there a setting of max memory per process/user/file system allowed to be used as cache/buffers?
The nocache command is the general answer to this problem! It is also in Debian and Ubuntu 13.10 (Saucy Salamander).
Thanks, Peter, for alerting us to the "--drop-cache" option in rsync. But that was rejected upstream (Bug 9560 – drop-cache option) in favor of a more general solution: the new "nocache" command, based on the rsync work with fadvise.
You just prepend "nocache" to any command you want. It also has nice utilities for describing and modifying the cache status of files. For example, here are the effects with and without nocache:
$ ./cachestats ~/file.mp3
pages in cache: 154/1945 (7.9%) [filesize=7776.2K, pagesize=4K]
$ ./nocache cp ~/file.mp3 /tmp
$ ./cachestats ~/file.mp3
pages in cache: 154/1945 (7.9%) [filesize=7776.2K, pagesize=4K]
$ cp ~/file.mp3 /tmp
$ ./cachestats ~/file.mp3
pages in cache: 1945/1945 (100.0%) [filesize=7776.2K, pagesize=4K]
So hopefully that will work for other backup programs (rsnapshot, duplicity, rdiff-backup, amanda, s3sync, s3ql, tar, etc.) and other commands that you don't want trashing your cache.
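For example, an rsync-based backup run could be wrapped like this (the paths are just placeholders):
$ nocache rsync -a /source/ /backup/destination/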
Kristof Provost was very close, but in my situation, I didn't want to use dd or write my own software, so the solution was to use the "--drop-cache" option in rsync.
I have used this many times since creating this question, and it seems to fix the problem completely. One exception is when I am using rsync to copy from a FreeBSD machine, whose rsync doesn't support "--drop-cache". So I wrote a wrapper to replace the /usr/local/bin/rsync command and strip that option, and now copying from there works too.
It still uses a huge amount of memory for buffers and seems to keep almost no cache, but it works smoothly anyway.
$ free
             total       used       free     shared    buffers     cached
Mem:      24731544   24531576     199968          0   15349680     850624
-/+ buffers/cache:    8331272   16400272
Swap:      4194300     602648    3591652
You have practically two choices:
Limit the maximum disk buffer size: the problem you're seeing is probably caused by the default kernel configuration, which allows a huge chunk of RAM to be used for disk buffering. When you try to write lots of data to a really slow device, you end up with much of your precious RAM tied up as write cache for that slow device.
The kernel does this because it assumes that processes can keep doing useful work while they are not being slowed down by the slow device, and that the RAM can be freed automatically if needed, simply by writing the pages out to storage (the slow USB stick - but the kernel doesn't consider how slow that device actually is). The quick fix:
# Wake up background writing process if there's more than 50 MB of dirty memory
echo 50000000 > /proc/sys/vm/dirty_background_bytes
# Limit background dirty bytes to 200 MB (source: http://serverfault.com/questions/126413/limit-linux-background-flush-dirty-pages)
echo 200000000 > /proc/sys/vm/dirty_bytes
Adjust the numbers to match the amount of RAM you're willing to spend on the disk write cache. A sensible value depends on your actual write performance, not on how much RAM you have: aim to have just enough RAM for caching to sustain full write performance on your devices. Note that this is a global setting, so you have to choose it according to the slowest device you're using.
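Note that these echo commands do not survive a reboot; to make the limits persistent, the usual approach is a sysctl drop-in file (the file name below is arbitrary):
# /etc/sysctl.d/99-dirty-limits.conf - same values as above, in bytes
vm.dirty_background_bytes = 50000000
vm.dirty_bytes = 200000000
and then load it without rebooting:
sudo sysctl --system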
Reserve a minimum memory size for each task you want to keep going fast. In practice this means creating cgroups for stuff you care about and defining the minimum memory you want to have for any such group. That way, the kernel can use the remaining memory as it sees fit. For details, see this presentation: SREcon19 Asia/Pacific - Linux Memory Management at Scale: Under the Hood
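A minimal sketch of that idea with cgroup v2, assuming the unified hierarchy is mounted at /sys/fs/cgroup; the group name "protected", the 2G reservation and the PID 1234 are placeholders (on a systemd machine you would normally express this as MemoryMin= on a slice or scope instead):
mkdir /sys/fs/cgroup/protected
echo 2G > /sys/fs/cgroup/protected/memory.min      # memory the kernel will try not to reclaim from this group
echo 1234 > /sys/fs/cgroup/protected/cgroup.procs  # move the process you want to keep fast into the group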
Update for 2022:
You can also try creating a new file /etc/udev/rules.d/90-set-default-bdi-max_ratio-and-min_ratio.rules with the following contents:
# For every BDI device, set max cache usage to 30% and min reserved cache to 2% of the whole cache
# https://unix.stackexchange.com/a/481356/20336
ACTION=="add|change", SUBSYSTEM=="bdi", ATTR{max_ratio}="30", ATTR{min_ratio}="2"
The idea is to put a per-device limit on maximum cache utilization. With the above limit (30%) you can have two totally stalled devices and still have 40% of the disk cache available for the rest of the system. If you have 4 or more stalled devices in parallel, even this workaround cannot help on its own. That's why I have also added a minimum cache reservation of 2% for every device, but I don't know how to check whether it is actually effective. I've been running with this config for about half a year and I think it's working nicely.
See https://unix.stackexchange.com/a/481356/20336 for details.
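To apply the rule without a reboot, it should be enough to reload udev and re-trigger the bdi devices, then check the attributes under sysfs (the device names will differ per system):
udevadm control --reload
udevadm trigger --subsystem-match=bdi --action=change
grep . /sys/class/bdi/*/max_ratio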
The kernel cannot know that you won't use the cached data from the copy again. This is your information advantage.
But you could set the swappiness to 0: sudo sysctl vm.swappiness=0. This will cause Linux to drop the cache before libraries, etc. are written to swap.
It works nicely for me too, and is especially effective in combination with a large amount of RAM (16-32 GB).
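To check the current value and make the change survive a reboot (the drop-in file name is arbitrary):
cat /proc/sys/vm/swappiness                                           # 60 is a common default
echo "vm.swappiness = 0" | sudo tee /etc/sysctl.d/99-swappiness.conf
sudo sysctl --system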
It's not possible if you're using plain old cp, but if you're willing to reimplement or patch it yourself, setting posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE) on both input and output file will probably help.
posix_fadvise() tells the kernel about your intended access pattern. In this case, you'd only use the data once, so there isn't any point in caching it.
The Linux kernel honours these flags, so it shouldn't be caching the data any more.
Try using dd instead of cp.
Or mount the filesystem with the sync flag.
I'm not completely sure whether these methods bypass the swap, but it may be worth a try.
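If you go the dd route, GNU dd has flags that keep the copy out of the page cache; a rough sketch, with placeholder device names and a block size your devices accept:
dd if=/dev/sdX of=/dev/sdY bs=1M iflag=direct oflag=direct status=progress    # bypass the page cache entirely with direct I/O
dd if=/dev/sdX of=/dev/sdY bs=1M iflag=nocache oflag=nocache status=progress  # alternative: use the cache but ask the kernel to drop those pages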
I am copying some NTFS disks [...] the system runs slow. [...]
Since it is USB [...]
The slowdown is a known memory management issue.
Use a newer Linux Kernel. The older ones have a problem with USB data and "Transparent Huge Pages". See this LWN article. Very recently this issue was addressed - see "Memory Management" in LinuxChanges.
OK, now that I know that you're using rsync, I could dig a bit more:
It seems that rsync is inefficient when used with tons of files at the same time. There's an entry in their FAQ, and it's not a Linux/cache problem; it's an rsync problem, eating too much RAM.
Googling around, someone recommended splitting the syncing into multiple rsync invocations.
I have written a program which works on a huge set of data. My CPU and OS (Ubuntu) are both 64-bit and I have 4GB of RAM. Using "top" (the %MEM field), I saw that the process's memory consumption went up to around 87%, i.e. 3.4+ GB, and then it got killed.
I then checked how much memory a process is allowed to use with "ulimit -m", which comes out as "unlimited".
Now, since both the OS and CPU are 64-bit and a swap partition also exists, the OS should have used virtual memory, i.e. [ >3.4GB + yGB of swap space ] in total, and the process should only have been killed if it required more memory than that.
So, I have the following questions:
How much memory can a process theoretically address on a 64-bit machine? My answer is 2^48 bytes.
If less than 2^48 bytes of physical memory exists, then the OS should use virtual memory, correct?
If the answer to the above question is yes, then the OS should have used swap space as well, so why did it kill the process without even using it? I don't think we have to use specific system calls while coding our program to make this happen.
Please suggest.
It's not only the data size that could be the reason. For example, run ulimit -a and check the maximum stack size. Do you have a kill reason? Set 'ulimit -c 20000' to get a core file; it will show you the reason when you examine it with gdb.
Check with file and ldd that your executable is indeed 64 bits.
Check also the resource limits. From inside the process, you could use getrlimit system call (and setrlimit to change them, when possible). From a bash shell, try ulimit -a. From a zsh shell try limit.
Check also that your process indeed eats the memory you believe it does consume. If its pid is 1234 you could try pmap 1234. From inside the process you could read the /proc/self/maps or /proc/1234/maps (which you can read from a terminal). There is also the /proc/self/smaps or /proc/1234/smaps and /proc/self/status or /proc/1234/status and other files inside your /proc/self/ ...
Check with free that you have the memory (and the swap space) you believe you have. You can add some temporary swap space with swapon /tmp/someswapfile (after initializing it with mkswap).
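A minimal sketch of adding such a temporary swap file (the size and path are just examples):
dd if=/dev/zero of=/tmp/someswapfile bs=1M count=2048   # create a 2GB file
chmod 600 /tmp/someswapfile
mkswap /tmp/someswapfile
swapon /tmp/someswapfile
swapon --show                                           # confirm it is active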
I was routinely able, a few months (and a couple of years) ago, to run a 7Gb process (a huge cc1 compilation), under Gnu/Linux/Debian/Sid/AMD64, on a machine with 8Gb RAM.
And you could try a tiny test program which, for example, allocates several memory chunks of 32MB each with malloc. Don't forget to write some bytes into each chunk (at least one per megabyte).
Standard C++ containers like std::map or std::vector are rumored to consume more memory than we usually think.
Buy more RAM if needed. It is quite cheap these days.
Literally everything has to fit into the addressable space, including your graphics adapters, OS kernel, BIOS, etc., and the amount that can be addressed can't be extended by swap either.
It's also worth noting that the process itself needs to be 64-bit. And some operating systems may become unstable, and therefore kill the process, if you're using excessive RAM with it.
I have a new 1 TB drive coming in tomorrow. What is the best way to divide this space for a development workstation?
The biggest problem I think I'm going to have is that some partitions (probably /usr) will become too small after a bit of use, while other partitions will probably be too big. The swap partition, for example, is currently 2GB (2x 1GB of RAM), but it is almost never used (only once that I know of).
If you partition your drive using LVM you won't have to worry about any individual partition running out of space in the future. Just move space around as necessary.
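A hedged sketch of what "moving space around" looks like with LVM; the volume group and logical volume names (vg0, home) are made up, and the filesystem is assumed to be ext4:
lvextend -L +20G /dev/vg0/home   # grow the logical volume by 20GB (requires free space in the volume group)
resize2fs /dev/vg0/home          # grow the ext4 filesystem to fill the new space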
My standard strategy for normal "utility" boxes is to give them a swap partition twice the size of their RAM, a 1GB /boot partition and leave the rest as one vast partition. Whilst I see why some people want a separate /var, separate /home, etc., if I only have trusted users and I'm not running some production service, I don't think the reasons I've heard to date apply. Instead, I do my best to avoid any resizing, or any partition becoming too small - which is best achieved with one huge partition.
As for the size of swap and /boot: if your machine has 4GB of memory, you may not want double that in swap, though it's still wise to have at least some. Even if you do go with double, you're using a total of 9GB, or 0.9% of your new drive. /boot can be smaller than 1GB; this is just my standard "will not become full, ever" size.
If you want a classic setup, I'd go for a 50GB "/" partition, for all your application goodness, and split the rest across users, or a full 950GB for a single user. Endless diskspace galore!
#wvdschel:
Don't create separate partitions for each user. Unused space on each partition is wasted.
Instead, create one partition for all users and use quotas if necessary to limit each user's space. It's much more flexible than partitioning or LVM.
OTOH, one huge partition is usually a bit slower, depending on the file system.
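A rough sketch of the quota approach, assuming the users' filesystem is mounted at /home with the usrquota mount option enabled; the username and limits are placeholders:
quotacheck -cum /home                          # build the initial quota files
quotaon /home                                  # switch quotas on
setquota -u alice 9000000 10000000 0 0 /home   # ~9GB soft / ~10GB hard block limits, no inode limits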
I always set up LVM on Linux, and use the following layout to start with:
/ = 10GB
swap = 4GB
/boot = 100MB
/var = 5GB
/home = 10GB OR remainder of drive.
And then, later on, if I need more space, I can simply increase /home, /var or / as needed. Since I work a lot with Xen virtual machines, I tend to leave the remaining space unallocated so that I can quickly create LVM volumes for them.
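Carving out such a volume later is a one-liner; a sketch with made-up names (volume group vg0, a 20GB volume for a new guest):
lvcreate -L 20G -n xen-guest1 vg0   # create a 20GB logical volume from the volume group's free space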
Did you know 1TB can easily take up to half an hour to fsck? Workstations usually crash and reboot more often than servers, so that can get quite annoying. Do you really need all that space?
I would go with 1 GB for /boot, 100 GB for /, and the rest for /home. 1 GB is probably too much for /boot, but it's not like you'll miss it. 100 GB might seem like a lot for everything outside /home, until you start messing around with databases and realize that MySQL keeps its databases in /var. Best to leave some room to grow in that area. The reason I recommend a separate partition for /home is that when you want to completely switch distros, or the upgrade option on your distro of choice doesn't work for whatever reason, or you just want to start from scratch with a clean system install, you can format / and /boot and leave /home with all the user data intact.
I would have two partitions: a small one (~20 GB) mounted on / to store all your programs, and a large one on /home. Many people have mentioned a partition for /boot, but that is not really necessary. If you are worried about resizing, use LVM.
I give 40GB to /, then I give the same amount as my RAM to swap, then the rest to /home.
Please tell me, what are you doing with /boot that you need more than 64MB for it? Unless you never intend to clean it, anything more is a waste of space. Kernel image + initrd + System.map won't take more than 10MB (probably less - mine weigh 5MB), and you really don't need to keep more than two spares.
And with the current prices of RAM, if you find yourself needing swap, you'll be much better off buying more memory. Reserve 1GB for swap and have something monitoring its usage (no swap at all is a bad idea, because the machine might lock up when it runs out of free memory).