How do you prevent a long-running memory-intensive tar-based backup script from getting killed?
I have a cron job that runs a command like this daily:
tar --create --verbose --preserve-permissions --gzip --file "{backup_fn}" {excludes} / 2> /var/log/backup.log
It writes to an external USB drive. Normally the generated file is about 100 GB, but since I upgraded to Ubuntu 16, the log file shows the process getting killed about 25% of the way through, presumably because it's consuming a lot of memory and/or putting the system under too much load.
How do I tell the kernel not to kill this process, or tweak it so it doesn't consume so many resources that it needs to be killed?
If you are certain that the process gets killed due to consuming too much memory, then you can try increasing the swappiness value in /proc/sys/vm/swappiness. By increasing swappiness you might be able to get away from this scenario. You can also try tuning oom_kill_allocating_task, whose default is 0: the OOM killer scans for the rogue memory-hogging task and kills that one. If you change it to 1, the OOM killer instead kills the task that triggered the out-of-memory allocation.
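Both can be tried on the fly with sysctl (the values here are just examples):
sudo sysctl vm.swappiness=80
sudo sysctl vm.oom_kill_allocating_task=1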
If none of the above works, then you can try oom_score_adj under /proc/$pid/oom_score_adj. oom_score_adj accepts values from -1000 to 1000; the lower the value, the less likely the process is to be killed by the OOM killer. If you set it to -1000, OOM killing is disabled for that process entirely. But you should know exactly what you are doing.
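For example, a wrapper around the cron job might look like this (a sketch only: it keeps the question's placeholders as shell variables, -500 is an arbitrary example value, and negative adjustments need root):
#!/bin/sh
# children inherit oom_score_adj, so the tar spawned below is covered too
echo -500 > /proc/self/oom_score_adj
tar --create --verbose --preserve-permissions --gzip --file "$backup_fn" $excludes / 2> /var/log/backup.log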
Hope this will give you some idea.
Related
I have a script that uses a lot of memory, and it either just gets "killed" or cannot be allocated more memory. When I run the same script on Windows it consumes all my memory and freezes my PC. I want the same to happen on my Linux server instead of the process being killed.
I have tried changing vm.overcommit_memory to 0, 1 and 2, but none of them work. I also tried some other things, like disabling the OOM killer in Linux, but I can't find the value vm.oom-killer. Please help.
Update:
Another solution would be to limit the memory it uses, for example to 10 GB: if it exceeds that, it shouldn't consume more, but don't kill the process; just let it finish.
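One way to approximate that is a cgroup v2 memory.high limit, e.g. via systemd-run (a sketch; the script name is hypothetical). MemoryHigh throttles the process and forces reclaim above the threshold instead of invoking the OOM killer:
systemd-run --scope -p MemoryHigh=10G ./yourscript.sh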
We're working with a reasonably busy web server. We wanted to use rsync to do some data-moving which was clearly going to hammer the magnetic disk, so we used ionice to put the rsync process in the idle class. The queues for both disks on the system (SSD+HDD) are set to use the CFQ scheduler.
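An idle-class invocation of that kind looks like this (for illustration; the paths are hypothetical):
ionice -c 3 rsync -a /srv/data/ /mnt/backup/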
The result... was that the disk was absolutely hammered and the website performance was appalling.
I've done some digging to see if any tuning might help with this.
The man page for ionice says:
Idle: A program running with idle I/O priority will only get disk time
when no other program has asked for disk I/O for a defined grace period.
The impact of an idle I/O process on normal system activity should be zero.
This "defined grace period" is not clearly explained anywhere I can find with the help of Google. One posting suggest that it's the value of fifo_expire_async but I can't find any real support for this.
However, on our system, both fifo_expire_async and fifo_expire_sync are set sufficiently long (250ms, 125ms, which are the defaults) that the idle class should actually get NO disk bandwidth at all. Even if the person who believes that the grace period is set by fifo_expire_async is plain wrong, there's not a lot of wiggle-room in the statement "The impact of an idle I/O process on normal system activity should be zero".
Clearly this is not what's happening on our machine so I am wondering if CFQ+idle is simply broken.
Has anyone managed to get it to work? Tips greatly appreciated!
Update:
I've done some more testing today. I wrote a small Python app to read random sectors from all over the disk with short sleeps in between. I ran a copy of this without ionice and set it up to perform around 30 reads per second. I then ran a second copy of the app with various ionice classes to see if the idle class did what it said on the box. I saw no difference at all between the results when I used classes 1, 2, 3 (real-time, best-effort, idle). This, despite the fact that I'm now absolutely certain that the disk was busy.
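For illustration, a rough shell equivalent of that test (a sketch, not the original Python app; device, block range and rate are assumptions, and reading the raw device needs root):
# iflag=direct bypasses the page cache so every read hits the disk;
# the modulus is the disk size in 4 KiB blocks - adjust for your drive
while true; do
    dd if=/dev/sdb of=/dev/null bs=4096 count=1 iflag=direct \
       skip=$(( (RANDOM * 32768 + RANDOM) % 100000000 )) 2>/dev/null
    sleep 0.03
done
Run one copy plainly and a second under ionice -c 3 (or -c 1 / -c 2) and compare the achieved read rates.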
Thus, I'm now certain that - at least for our setup - CFQ+idle does not work. [see Update 2 below - it's not so much "does not work" as "does not work as expected"...]
Comments still very welcome!
Update 2:
More poking about today. Discovered that when I push the I/O rate up dramatically, the idle-class processes DO in fact start to become starved. In my testing, this happened at I/O rates hugely higher than I had expected - basically hundreds of I/Os per second. I'm still trying to work out what the tuning parameters do...
I also discovered the rather important fact that async disk writes aren't included at all in the I/O prioritisation system! The ionice manpage I quoted above makes no reference to that fact, but the manpage for the syscall ioprio_set() helpfully states:
I/O priorities are supported for reads and for synchronous (O_DIRECT,
O_SYNC) writes. I/O priorities are not supported for asynchronous
writes because they are issued outside the context of the program
dirtying the memory, and thus program-specific priorities do not
apply.
This pretty significantly changes the way I was approaching the performance issues and I will be proposing an update for the ionice manpage.
Some more info on kernel and iosched settings (sdb is the HDD):
Linux 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux
/etc/debian_version = 9.3
(cd /sys/block/sdb/queue/iosched; grep . *)
back_seek_max:16384
back_seek_penalty:2
fifo_expire_async:250
fifo_expire_sync:125
group_idle:8
group_idle_us:8000
low_latency:1
quantum:8
slice_async:40
slice_async_rq:2
slice_async_us:40000
slice_idle:8
slice_idle_us:8000
slice_sync:100
slice_sync_us:100000
target_latency:300
target_latency_us:300000
AFAIK, the only way to solve your problem is using cgroup v2 (kernel 4.5 or newer). Please see the following article:
https://andrestc.com/post/cgroups-io/
Also note that you can use systemd's wrappers to configure cgroup limits on a per-service basis:
http://0pointer.de/blog/projects/resources.html
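For instance, a per-command bandwidth cap with systemd-run (a sketch; assumes a recent systemd with the cgroup v2 io controller, and the device and limits are examples):
systemd-run --scope -p IOReadBandwidthMax="/dev/sdb 10M" -p IOWriteBandwidthMax="/dev/sdb 10M" rsync -a /srv/data/ /mnt/backup/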
Add nocache to that and you're set (you can combine it with ionice and nice):
https://github.com/Feh/nocache
On Ubuntu install with:
apt install nocache
It simply bypasses the cache on I/O, and thanks to that other processes won't starve when the cache is flushed.
It's like calling the commands with O_DIRECT, so now you can limit the I/O, for example, with:
systemd-run --scope -q --nice=19 -p BlockIOAccounting=true -p BlockIOWeight=10 -p "BlockIOWriteBandwidth=/dev/sda 10M" nocache youroperation_here
I usually use it with:
nice -n 19 ionice -c 3 nocache youroperation_here
I have a Thecus N8900 NAS, which is a Linux based file server, providing files via NFS to six clients. For some reason that Thecus support has yet to explain, it runs a script that checks /proc/meminfo every 60 seconds, and if the disk cache exceeds 50% of available RAM it does an "echo 3 > /proc/sys/vm/drop_caches" command to flush the cache.
Leaving aside the issue of whether that makes sense or not, the actual "echo 3 > /proc/sys/vm/drop_caches" command can take hours to complete, which seems way too long to me.
The big problem is that when this happens, the load on the machine spikes, as does the disk utilization, making all NFS traffic crawl until the command finally completes, at which point things are responsive again.
The NAS itself has 16 gigs of RAM, 7 drives in a raid6 configuration (plus a hot spare), no drive problems at all (according to S.M.A.R.T. tests).
So the question is: what would cause the drop_caches command to take so long?
The command itself should complete almost instantaneously; it's the consequences, i.e. everything needing to be cached again, that can take a lot of time. The script doesn't make sense: if you can remove it completely, that would be a good idea. (Also, this is off topic on StackOverflow.)
Edit: does it also execute a sync before echo 3 > /proc/sys/vm/drop_caches, as in
sync; echo 3 > /proc/sys/vm/drop_caches? Because the sync operation, which flushes all pending writes to the disk, may take a while to complete. And while sync has its own performance cost, it can make some sense: in case of a sudden power failure the data has already been written to the disk, so you are safe.
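To see which step is actually slow, time them separately (run as root):
time sync
time sh -c 'echo 3 > /proc/sys/vm/drop_caches'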
I have a Linux server running a process with a large memory footprint (some sort of database engine). The memory allocated by this process is so large that part of it needs to be swapped (paged) out.
What I would like to do is to lock the memory pages of all the other processes (or a subset of the running processes) in memory, so that only the pages of the database process get swapped out. For example, I would like to make sure that I can continue to connect remotely and monitor the machine without those processes being impacted by swapping. I.e. I want sshd, X, top, vmstat, etc. to have all pages memory resident.
On Linux there are the mlock() and mlockall() system calls that seem to offer the right knob to do the pinning. Unfortunately, it seems to me that I need to make an explicit call inside every process and cannot invoke mlock() from a different process or from the parent (memory locks are not inherited across fork() or execve()).
Any help is greatly appreciated. Virtual pizza & beer offered :-).
It has been a while since I've done this so I may have missed a few steps.
Make a GDB command file that contains something like this (the argument 3 is MCL_CURRENT | MCL_FUTURE, i.e. lock both current and future pages):
call mlockall(3)
detach
Then on the command line, find the PID of the process you want to mlock. Type:
gdb --pid [PID] --batch -x [command file]
If you get fancy with pgrep that could be:
gdb --pid $(pgrep sshd) --batch -x [command file]
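A one-shot variant without a separate command file (a sketch: pgrep -o picks the oldest matching PID, since pgrep can return several, and newer GDB may require the cast because it doesn't know the function's return type):
gdb --pid "$(pgrep -o sshd)" --batch -ex 'call (int)mlockall(3)' -ex detach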
Actually locking the pages of most of the stuff on your system seems a bit crude/drastic, not to mention being such an abuse of the mechanism it seems bound to cause some other unanticipated problems.
Ideally, what you probably actually want is to control the "swappiness" of groups of processes so the database is first in line to be swapped while essential system admin tools are the last, and there is a way of doing this.
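For example, the cgroup v1 memory controller gives every group its own swappiness knob (a sketch; the group name is hypothetical, and 0 tells the kernel to avoid swapping that group's pages):
sudo cgcreate -g memory:admin_tools
echo 0 | sudo tee /sys/fs/cgroup/memory/admin_tools/memory.swappiness
Then start the tools you want to protect inside that group with cgexec.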
While searching for mlockall information I ran across this tool. You may be able to find it for your distribution. I only found the man page.
http://linux.die.net/man/8/memlockd
Nowadays, the easy and right way to tackle the problem is cgroups.
Just restrict the memory usage of the database process:
1. create a memory cgroup
sudo cgcreate -g memory:$test_db -t $User:$User -a $User:$User
2. limit the group's RAM usage to 1000M (roughly 1 GB):
echo 1000M > /sys/fs/cgroup/memory/$test_db/memory.limit_in_bytes
or, for a soft limit that is only enforced when the system is under memory pressure:
echo 1000M > /sys/fs/cgroup/memory/$test_db/memory.soft_limit_in_bytes
3. run the database program in the $test_db cgroup
cgexec -g memory:$test_db $db_program_name
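You can then watch the group's actual consumption (memory.usage_in_bytes reports current usage in bytes):
cat /sys/fs/cgroup/memory/$test_db/memory.usage_in_bytes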
Sorry for the vague question, but I've just written some PHP code that executes itself as CLI, and I'm pretty sure it's misbehaving. When I run "top" on the command line, it shows very little resource use by any individual process, but 40-98% going to iowait time (%wa). I usually have about 0.7% distributed between %us and %sy, with the remaining resources idle (somewhere between 20-50% usually).
This server is executing MySQL queries in, easily, 300x the time it takes other servers to run the same query, and it even takes what seems like forever to log on via SSH... so despite there being some idle CPU time left over, it seems clear that something very bad is happening. Whatever scripts are running are updating my MySQL database, but they seem to be getting exponentially slower than when they started.
I need some ideas to serve as launch points for me to diagnose what's going on.
Some things that I would like to know are:
How I can confirm how many scripts are actually running
Is there any way to confirm that these scripts are actually shutting down when they are through, and not just "hanging around" taking up CPU time and memory?
What kind of bottlenecks should I check to make sure I don't create too many instances of this script, so this doesn't happen again?
I realize this is probably a huge question, but I'm more than willing to follow any links provided and read up on this... I just need to know where to start looking.
High iowait means that your disk bandwidth is saturated. This might be just because you're flooding your MySQL server with too many queries, and it's maxing out the disk trying to load the data to execute them.
Alternatively, you might be running low on physical memory, causing large amounts of disk IO for swapping.
To start diagnosing, run vmstat 60 for 5 minutes and check the output - the si and so columns show swap-in and swap-out, and the bi and bo columns show other IO. (Edit your question and paste the output in for more assistance.)
High iowait may mean you have a slow/defective disk. Try checking it out with a S.M.A.R.T. disk monitor.
http://www.linuxjournal.com/magazine/monitoring-hard-disks-smart
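For example, with smartmontools (assuming the disk is /dev/sda):
sudo smartctl -H -a /dev/sda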
ps auxww | grep SCRIPTNAME
The same command answers your second question: any instances that should have exited but are still listed are "hanging around".
Why are you running more than one instance of your script to begin with?