Monitor and kill runaway processes using 100% IO? - Linux

I have a few processes that have to be run at high priority (chrt 98) and that will occasionally decide to hard-lock and peg one core at 100% (not a huge deal), but more importantly such a process will use all the IO on the system - so much that it's impossible to log into the machine via ssh to kill it, or to perform any task that isn't already loaded into RAM. If I happen to have something like htop already running, I am able to end the process fine. Is there any type of utility/way to monitor for this type of runaway process and kill anything that uses 100% of system IO for more than X amount of time? Thanks!

Can't you start the program with nice (and with a lower priority)? This way at least you should be able to ssh into the box and kill it easily.
The better solution would of course be to fix the behaviour of the offending process (details needed).
This serverfault thread also seems to contain what you ask for specifically.

Assuming that it's disk IO that the app is consuming, can you just move the filesystems it's accessing onto separate disks? That way you'll have IO to spare on the disks on which the OS is installed, and you should be able to log in and manage (i.e. kill!) the process.

As another poster said, running your process with nice is the way to go, but you did mention that you want to run it at a high priority, which is odd... Be aware that if you're running a process at the highest priority and it's pegged, your monitoring system might not even be able to kill it, unless your monitor runs at a higher priority still. Anyway...
god, as well as several other process management tools, can easily kill a process if it's misbehaving in any of several ways. The config (this snippet lives inside a God.watch do |w| ... end block) looks like this - you set checks at a particular interval, and then you can say "after five checks, nuke it if it's been above 98% CPU usage consistently":
w.interval = 30.seconds
w.restart_if do |restart|
  restart.condition(:cpu_usage) do |c|
    c.above = 98.percent
    c.times = 5
  end
end
A different take you might have a look at is chpst from the runit suite - it lets you elegantly set resource bounds on a process (though for CPU limiting, nice is still the tool I'd reach for first).
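If nothing off the shelf fits, the idea is simple enough to roll yourself. Here is a minimal sketch in C - not a production tool: it assumes per-process IO counters in /proc/<pid>/io (needs CONFIG_TASK_IO_ACCOUNTING, and root to read other users' counters), the 100 MB/s threshold and SIGKILL are placeholders, and per the caveat above the watchdog itself would have to run at a real-time priority above the runaway task (e.g. via chrt) or it may never get scheduled:

#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define THRESHOLD_BPS (100ULL << 20)  /* hypothetical: sustained 100 MB/s counts as runaway */
#define STRIKES 5                     /* consecutive seconds over threshold before killing */
#define MAX_PIDS 32768                /* default kernel.pid_max; raise if yours is larger */

static unsigned long long prev[MAX_PIDS];
static int strikes[MAX_PIDS];

/* read_bytes + write_bytes from /proc/<pid>/io, or 0 if unreadable */
static unsigned long long io_bytes(int pid)
{
    char path[64], line[128];
    unsigned long long total = 0, v;
    snprintf(path, sizeof path, "/proc/%d/io", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "read_bytes: %llu", &v) == 1)  total += v;
        if (sscanf(line, "write_bytes: %llu", &v) == 1) total += v;
    }
    fclose(f);
    return total;
}

int main(void)
{
    for (;;) {
        DIR *d = opendir("/proc");
        struct dirent *e;
        if (!d)
            return 1;
        while ((e = readdir(d)) != NULL) {
            int pid = atoi(e->d_name);  /* non-numeric entries give 0 and are skipped */
            if (pid <= 0 || pid >= MAX_PIDS)
                continue;
            unsigned long long now = io_bytes(pid);
            if (prev[pid] == 0 || now < prev[pid]) {   /* first sighting, or pid reuse */
                prev[pid] = now;
                strikes[pid] = 0;
                continue;
            }
            unsigned long long rate = now - prev[pid]; /* ~bytes/second at a 1 s period */
            prev[pid] = now;
            if (rate > THRESHOLD_BPS) {
                if (++strikes[pid] >= STRIKES) {
                    fprintf(stderr, "killing pid %d after %d s of heavy IO\n", pid, STRIKES);
                    kill(pid, SIGKILL);
                    strikes[pid] = 0;
                }
            } else {
                strikes[pid] = 0;
            }
        }
        closedir(d);
        sleep(1);  /* sample once per second */
    }
}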

Force no more than one write/sync to disk in X seconds

I'm worried because, through the disk LED and iotop, I can see quite a lot of write activity every couple of seconds, mostly coming from chromium's processes, on a completely idle system.
It doesn't make any sense at all to have such a high number of writes to disk, even less with SSDs. Reads aren't a problem for me, also because I have plenty of disk cache on my 20 GB RAM notebook.
The commit mount option (which is by default 30 s) obviously isn't the solution; I tried increasing and even decreasing it and still see one write every couple of seconds.
So is there a way to force no more than one write per arbitrary interval?
First, check that your Linux system is using the CFQ I/O scheduler; then you can use ionice to control the I/O scheduling class and priority of a program.
It supports the following three scheduling classes (quoting from the man page):
Idle : A program running with idle io priority will only get disk time when no other program has asked for disk io for a defined grace period. The impact of idle io processes on normal system activity should be zero. This scheduling class does not take a priority argument.
Best effort : This is the default scheduling class for any process that hasn't asked for a specific io priority. Programs inherit the CPU nice setting for io priorities. This class takes a priority argument from 0-7, with a lower number meaning higher priority. Programs running at the same best effort priority are served in a round-robin fashion. This is what's usually recommended for most applications.
Real time : The RT scheduling class is given first access to the disk, regardless of what else is going on in the system. Thus the RT class needs to be used with some care, as it can starve other processes. As with the best effort class, 8 priority levels are defined, denoting how big a time slice a given process will receive in each scheduling window. It should be avoided on heavily loaded systems.
ionice [options] command [args...]
ionice [options] -p PID
ionice -c1 -n0 -p PID   (real-time class, highest priority, for an already-running process)
ionice -c3 -p PID       (idle class - the one that helps de-prioritise chromium's writes)
For limiting anything beyond this, I think you should use your SAN's utilities.
Take a look at eatmydata (https://github.com/stewartsmith/libeatmydata).
It could be ok for you, but read all the docs and think twice before using it...
PSD - Profile Sync Daemon - is the specific solution for Chromium and other browsers https://wiki.archlinux.org/title/profile-sync-daemon
https://github.com/graysky2/profile-sync-daemon

These days, what are good reasons for setting thread affinity rather than leaving it to the OS?

Searching answers here for "thread affinity", I see a lot of interest in doing it but little justification for it, save possibly getting stable QueryPerformanceCounter results.
Assuming a modern OS and a modern 2-4 socket workstation/server class machine with modern 4-6 core CPUs, what good reasons would anyone have for thinking they know better than their OS's scheduler? Are there any real-world situations where taking more control of thread affinity is the right thing to do? What sort of performance benefits can be demonstrated?
The last time I saw a really good case for setting thread affinity somewhere (as in, it was backed up by concrete results showing genuine and significant improvements in system performance), it was some obscure thing to do with Win2K device drivers. But I haven't seen anything like that in years, so when someone tells me they need to control thread affinity (but not why) these days, I am deeply sceptical... but curious to be shown otherwise.
The primary reason is if you have something that depends heavily upon caching. The OS scheduler doesn't necessarily take that into account to the degree you might like.
I use it to assign threads to cores; for example, in a simulation you do the physics entirely on one core and allow the rest of the computation to be executed on another. It makes sense to be able to control this if you're in a tight environment where you know the hardware.
Of course, configuring this needs to be done per system, so by default I let the OS decide the cores on which to run, but keep the option of restricting core usage.
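For the record, pinning threads on Linux looks roughly like the sketch below (it uses the GNU-specific pthread_setaffinity_np; the physics/other split and the core numbers are purely illustrative):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void *physics(void *arg) { /* the hot, cache-sensitive loop would go here */ return NULL; }
static void *other(void *arg)   { /* everything else */ return NULL; }

/* Restrict a thread to a single core so its working set stays in that core's cache. */
static void pin(pthread_t t, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(t, sizeof set, &set);
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, physics, NULL);
    pthread_create(&b, NULL, other, NULL);
    pin(a, 0);   /* physics gets core 0 to itself... */
    pin(b, 1);   /* ...everything else runs on core 1 (build with -pthread) */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}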
In the OS kernel and sometimes in kernel mode drivers you need to perform the same action on every CPU (e.g. update a system register). You can do that in a loop in a single thread, changing the affinity on each iteration.
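A user-space analogue of that pattern, as a sketch (the kernel has its own mechanisms for cross-CPU work, so this only illustrates hopping one thread across every online CPU by changing its own affinity with sched_setaffinity):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int ncpus = (int)sysconf(_SC_NPROCESSORS_ONLN);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof set, &set) != 0) {
            perror("sched_setaffinity");
            continue;
        }
        /* the per-CPU action would go here */
        printf("now running on CPU %d\n", sched_getcpu());
    }
    return 0;
}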
For desktops it's quite unnecessary.
But I can see some applications where it would help. For example the CPU cache likes it if the app that runs on it doesn't change.
Another possibility is you have a critical task - you give it an entire CPU, and the other tasks use the rest of the CPUs.
Or the opposite: You have some low priority tasks, you put them all on one CPU, then leave the others free for more important tasks (using process priority will give you most of this benefit without having affinity, but I can imagine some memory heavy cases where it wouldn't).
I would agree it's best to leave this to the OS to figure out in most situations. However, the most common reason I have seen for setting thread affinity is when you need good cache behaviour. In multi-CPU systems, when a particular CPU caches something for itself and the same data has been cached by some other CPU, I believe it can automatically get invalidated on the other CPU. So if a particular thread keeps changing the CPU on which it executes, the cache hit rate will be too low. So in this case I guess it makes sense for the programmer to be a better judge of the CPU affinities.
I also think the point above by Ariel, about making sure a critical task constantly gets a CPU while low-priority tasks are kept out of its way, makes sense.

Two processes on two CPUs -- is it possible that they complete at exactly the same moment?

This is sort of a strange question that's been bothering me lately. In our modern world of multi-core CPUs and multi-threaded operating systems, we can run many processes with true hardware concurrency. Let's say I spawn two instances of Program A in two separate processes at the same time. Disregarding OS-level interference which may alter the execution time for either or both processes, is it possible for both of these processes to complete at exactly the same moment in time? Is there any specific hardware/operating-system mechanism that may prevent this?
Now before the pedants grill me on this, I want to clarify my definition of "exactly the same moment". I'm not talking about time in the cosmic sense, only as it pertains to the operation of a computer. So if two processes complete at the same time, that means that they complete with a time difference that is so small, the computer cannot tell the difference.
EDIT : by "OS-level interference" I mean things like interrupts, various techniques to resolve resource contention that the OS may use, etc.
Actually, thinking about time in the "cosmic sense" is a good way to think about time in a distributed system (including multi-core systems). Not all systems (or cores) advance their clocks at exactly the same rate, making it hard to actually tell which events happened first (going by wall clock time). Because of this inability to agree, systems tend to measure time by logical clocks. Two events happen concurrently (i.e., "exactly at the same time") if they are not ordered by sharing data with each other or otherwise coordinating their execution.
Also, you need to define when exactly a process has "exited". Thinking in Linux terms: is it when it prints an "exiting" message to the screen? When it returns from main()? When it executes the exit() system call? When its process state is set to "exiting" in the kernel? When the process's parent receives a SIGCHLD?
So getting back to your question (with a precise definition for "exactly at the same time"), the two processes can end (or do any other event) at exactly the same time as long as nothing coordinates their exiting (or other event). What counts as coordination depends on your architecture and its memory model, so some of the "exited" conditions listed above might always be ordered at a low level or by synchronization in the OS.
You don't even need "exactly" at the same time. Sometimes you can be close enough to seem concurrent. Even on a single core with no true concurrency, two processes could appear to exit at the same time if, for instance, two child processes exited before their parent was next scheduled. It doesn't matter which one really exited first; the parent will see that, at some instant while it wasn't running, both children died.
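That effect is easy to demonstrate. In the sketch below (the sleep lengths are arbitrary), both children die while the parent is blocked, and the parent then reaps them in a single pass - so as far as it can tell, they exited at the same time:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    for (int i = 0; i < 2; i++) {
        if (fork() == 0) {  /* child: do some "work", then exit */
            sleep(1);
            _exit(0);
        }
    }
    sleep(2);               /* parent is not scheduled while both children die */
    int n = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        n++;                /* reap every zombie in one pass */
    printf("reaped %d children in one wakeup\n", n);
    return 0;
}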
So if two processes complete at the same time, that means that they complete with a time difference that is so small, the computer cannot tell the difference.
Sure, why not? Except for shared memory (and other resources, see below), they're operating independently.
Is there any specific hardware/operating-system mechanism that may prevent this?
Anything that involves resource contention:
memory access
disk access
network access
explicit concurrency management via locks/semaphores/mutexes/etc.
To be more specific: these are separate CPU cores. That means they have computing circuitry implemented in separate logic circuits (see the Wikipedia page on multi-core processors).
The fact that each core can have its own memory cache means that it is quite possible for most of the computation to occur as interaction of each core with its own cache. Once you have that, it's just a matter of probability. That's not to say that algorithms take a nondeterministic amount of time, but their inputs may come from a probabilistic distribution, and the amount of time an algorithm takes to run is unlikely to be completely independent of its input data unless it has been carefully designed to take a constant amount of time.
Well I'm going to go with I doubt it:
Internally any sensible OS maintains a list of running processes.
It therefore seems sensible for us to define the moment that the process completes as the moment that it is removed from this list.
It also strikes me as fairly unlikely (but not impossible) that a typical OS will go to the effort to construct this list in such a way that two threads can independently remove an item from this list at exactly the same time (processes don't terminate that frequently and removing an item from a list is relatively inexpensive - I can't see any real reason why they wouldn't just lock the entire list instead).
Therefore for any two terminating processes A and B (where A terminates before B), there will always be a reasonably large time period (in a cosmic sense) where A has terminated and B has not.
That said it is of course possible to produce such a list, and so in reality it depends on the OS.
Also I don't really understand the point of this question, in particular what do you mean by
the computer cannot tell the difference
In order for the computer to tell the difference, it has to be able to check the running-process table at a point where A has terminated and B has not. If the OS schedules the removal of process B from the process table immediately after that of process A, such code may never get a chance to execute, and so by some definitions it isn't possible for the computer to tell the difference. This situation holds true even on a single-core/single-CPU machine.
Yes, without any OS scheduling interference they could finish at the same moment, provided they don't have any resource contention (shared memory, external IO, system calls). Whenever either of them holds a lock on a resource, it will force the other to stall waiting for the resource to free up.

Is it possible to "hang" a Linux box with a SCHED_FIFO process?

I want to have a real-time process take over my computer. :)
I've been playing a bit with this. I created a process which is essentially a while (1) loop (it never blocks nor yields the processor) and used schedtool to run it with the SCHED_FIFO policy (I also tried chrt). However, the process was letting other processes run as well.
Then someone told me about sched_rt_runtime_us and sched_rt_period_us. So I set the runtime to -1 in order to make the real-time process take over the processor (I also tried making both values the same), but it didn't work either.
I'm on Linux 2.6.27-16-server, in a virtual machine with just one CPU. What am I doing wrong?
Thanks,
EDIT: I don't want a fork bomb. I just want one process to run forever, without letting other processes run.
There's another protection I didn't know about.
If you have just one processor and want a SCHED_FIFO process like this (one that never blocks nor yields the processor voluntarily) to monopolize it, besides giving it a high priority (not really necessary in most cases, but doesn't hurt) you have to:
Set sched_rt_runtime_us to -1 or to the value in sched_rt_period_us
If you have group scheduling configured, set /cgroup/cpu.rt_runtime_us to -1 (assuming you mount the cgroup filesystem on /cgroup)
Apparently, I had group scheduling configured and wasn't bypassing that last protection.
If you have N processors and want N such processes to monopolize them, you just do the same but launch all of them from your shell (the shell shouldn't get stuck until you launch the last one, since up to that point it still has a processor to run on). If you want to be really sure each process will go to a different processor, set its CPU affinity accordingly.
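For reference, the spinner itself can be as small as the sketch below. Run it as root, and only after lifting the throttling as described above - with the default sched_rt_runtime_us of 950000, other tasks still get 5% of each period - and be prepared to power-cycle a single-CPU box:

#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* priority 98 is arbitrary within SCHED_FIFO's 1-99 range */
    struct sched_param sp = { .sched_priority = 98 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler (are you root?)");
        return 1;
    }
    for (;;)
        ;  /* never blocks, never yields: with RT throttling off, no SCHED_OTHER task runs on this CPU */
}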
Thanks to everyone for the replies.
I'm not sure about schedtool, but if you successfully change the scheduler using sched_setscheduler to SCHED_FIFO, then run a task which does not block, then one core will be entirely allocated to the task. If this is the only core, no SCHED_OTHER tasks will run at all (i.e. anything except a few kernel threads).
I've tried it myself.
So I speculate that either your "non blocking" task was blocking, or your schedtool program failed to change the scheduler (or changed it for the wrong task).
You can also make your process SCHED_FIFO with the highest priority, 99. The process would then run forever and won't be preempted by other user tasks.

Health check for application

I wish to know what methods exist to check the health of a process. Consider that 10000 processes are running on a system, and you have to make sure that if any of these processes goes down, it is brought back up.
Use the process ID (PID) and periodically poll whether the process is still alive or dead; if it's dead, revive it.
However, if you have 10000 processes, you will probably hit the OS's process limit first. I suggest redesigning your program so you don't need that many processes in the first place.
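The cheapest such poll is kill() with signal 0, which checks for existence without actually delivering a signal (with the usual caveat that PIDs can be recycled):

#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* 1 if pid exists (EPERM means it exists but we may not signal it), 0 if it's gone */
int alive(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}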
Re-spawning processes that go down is usually handled by having a specific launcher program exec() the program and wait for a SIGCHLD indicating that the child process has ended.
For boot-time applications (servers etc.), init daemons like upstart can do this for you automatically.
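A bare-bones launcher of that kind might look like the sketch below (no logging, and no real back-off beyond a one-second pause between restarts):

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    for (;;) {
        pid_t pid = fork();
        if (pid == 0) {            /* child: become the supervised program */
            execvp(argv[1], &argv[1]);
            _exit(127);            /* exec failed */
        }
        waitpid(pid, NULL, 0);     /* block until the child dies (SIGCHLD) */
        fprintf(stderr, "child exited, respawning\n");
        sleep(1);                  /* crude restart rate limit */
    }
}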
While others are pointing out applications that already exist (which you really should use unless you have a clear reason not to), I'll throw out a random idea for a custom solution.
If you control all N processes, then make them all share one memory area N bits large (so 10000 processes ~ 1.25 KB, not bad). When starting each process, give it a number i ranging from 0 to N-1. Every T seconds, each process sets bit i in the shared memory to 1. A monitoring process can check that all N bits are 1 every k*T seconds, resetting them all to 0 in the process (see the sketch below).
This is still O(N), which you won't avoid, but the primitives are all really fast and should scale fine up to the OS's process limit.
An alternate idea for obtaining i would be just to use the PID, but then the shared memory would have to be larger (it would probably still be OK though; the default Linux PID range, for example, is small).
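A sketch of that heartbeat scheme, simplified to one byte per process instead of one bit (POSIX shared memory; the segment name, the 5 second check period and the omitted error handling are all just for illustration; older glibc needs -lrt):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define N 10000                  /* number of supervised processes */
#define SHM_NAME "/heartbeats"   /* hypothetical segment name */

/* Worker i calls this every T seconds. */
void beat(unsigned char *hb, int i) { hb[i] = 1; }

/* The monitor checks every k*T seconds that everyone has beaten, then resets. */
int main(void)
{
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(fd, N);
    unsigned char *hb = mmap(NULL, N, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    for (;;) {
        sleep(5);                /* k*T: long enough for every worker to beat once */
        for (int i = 0; i < N; i++)
            if (hb[i] == 0)
                printf("process %d missed its heartbeat - revive it\n", i);
        memset(hb, 0, N);        /* reset the flags for the next round */
    }
}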
There is a utility called monit which does what you are looking for. But it's intended for a handful of important processes on a Linux box... and here all 10000 processes are important!!!
