Limit the number of concurrent processes spawned by incrond - linux

I'm working on a process that will eventually be resident on a CentOS (latest) virtual machine; I'm developing on Ubuntu 12.04 LTS...
So, I have incron set to monitor my drop folder with IN_CLOSE_WRITE so that when a file is written into it, a rather resource-intensive script (ImageMagick image processing) is run on the file. This all works fine; unless too many files are dropped at once. The script, as I said, is rather resource intensive, and if more than 4 or so instances run concurrently my development machine is brought to its knees (the eventual virtual machine will be beefier, but I foresee instances where perhaps HUNDREDS of files will be dropped at once!).
dangerous incrontab:
/path/to/dropfolder IN_CLOSE_WRITE bash /path/to/resourceintensivescript.sh $@/$#
So the question is: how do I limit the number of jobs spawned by incrond? I tried using GNU Parallel but couldn't figure out how to make that work...
for example:
/path/to/dropfolder IN_CLOSE_WRITE parallel --gnu -j 4 bash /path/to/resourceintensivescript.sh $@/$#
seems to do nothing :/
and:
/path/to/dropfolder IN_CLOSE_WRITE;IN_NO_LOOP bash /path/to/resourceintensivescript.sh $@/$#
ends up missing files :P
Ideas on how to deal with this?

A very basic way to do this is to simply use grep and count the processes... something like:
processName=myprocess
if [ "$(ps -ef | grep -v grep | grep "${processName}" | wc -l)" -le 4 ]
then
    : # do something
fi
With the loop suggestion:
processName=myprocess
while true
do
    if [ "$(ps -ef | grep -v grep | grep "${processName}" | wc -l)" -le 4 ]
    then
        : # do something
        break
    fi
    sleep 5
done

You can use the sem utility that comes with parallel:
/path/to/dropfolder IN_CLOSE_WRITE sem --gnu --id myjobname -j 4 /path/to/resourceintensivescript.sh $@/$#
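If you want to see the throttling behaviour outside incron first, here is a small sketch run from a shell; the dummy sleep/echo jobs are arbitrary:
# sem queues each job and returns as soon as one of the 4 slots is free,
# so at most 4 jobs ever run at the same time under this semaphore id.
for i in $(seq 1 10)
do
    sem --gnu --id myjobname -j 4 "sleep 2; echo job $i done"
done
# block until every job started under this semaphore id has finished
sem --gnu --id myjobname --wait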

Related

Bash, display processes in specific folder

I need to display processes that are running in a specific folder.
For example, there are folders "TEST" and "RUN". 3 SQL files are running from TEST, and 2 from RUN. So when I use the command ps xa, I can see all processes run from TEST and RUN together. What I want is to see only the processes run from the TEST folder, so only 3. Any commands or solutions to do this?
You can use lsof for this:
lsof | grep '/path/of/RUN'
If you want to include both RUN and TEST in the same command:
lsof | grep -E "/path/of/RUN|/path/of/TEST"
Hope it helps.
You can try fuser to see which processes have particular files open; or, on Linux, examine the /proc/12345/cwd symlink for each of the candidate processes (replace 12345 with the process id of each).
fuser TEST/*.sql
for proc in /proc/[1-9]*; do
    readlink "$proc/cwd" | grep -q TEST && echo "$proc"
done
The latter is not portable to other U*xes, though some may offer similar facilities.

Linux bash script that kills a process (not started by me) after x amount of time

I'm pretty inexperienced with Linux bash. That being said, I have a CentOS7 machine that runs a COTS application server. This application server runs other processes that sometimes hang. Since I have no control over the start of these processes, I'm looking for a script that runs every 2 minutes that kills processes of the name "spicer" that have been running for longer than 10 minutes. I've looked around and have only been able to find answers for processes that are run and owned by me.
I use the command ps -eo pid,command,etime | grep spicer to get all the spicer processes. The output of this command looks like this:
18216 spicer -l/opt/otmm-10.5/Spi 14:20
18415 spicer -l/opt/otmm-10.5/Spi 11:49
etc...
18588 grep --color=auto spicer
I don't know if there's a way to parse this directly in bash. I'm also not well-versed at all in other Linux tools. I know that awk (or gawk) could possibly help.
EDIT
I have no control over the data that the process is working on.
What about wrapping the spicer executable and starting it via the timeout command? Let's say it is installed as /usr/bin/spicer. Then issue:
cp /usr/bin/spicer{,.orig}
echo '#!/bin/bash' > /usr/bin/spicer
echo 'timeout 10m spicer.orig "$@"' >> /usr/bin/spicer
Another approach would be to create a cron job definition in /etc/cron.d/kill_spicer, like this:
* * * * * root kill $(ps --no-headers -C spicer -o pid,etimes | awk '$2>=600{print $1}')
The cron job is executed every minute and uses ps to obtain a list of spicer processes that have been running for longer than 10 minutes, then passes them to kill.
You probably even want kill -9 if the process is hanging.
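Putting those pieces together, here is a sketch of a standalone script (the file path and name are assumptions) that a */2 * * * * cron entry could call, to match the every-two-minutes requirement from the question:
#!/bin/bash
# kill_spicer.sh: kill every process named "spicer" that has been running
# for more than 600 seconds. etimes reports elapsed time in plain seconds,
# which avoids parsing the [[dd-]hh:]mm:ss format of etime.
for pid in $(ps --no-headers -C spicer -o pid,etimes | awk '$2 >= 600 {print $1}')
do
    kill "$pid"    # or kill -9 "$pid" if the process is truly hung
done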
You can use the -C option of ps to select processes by name.
ps --no-headers -C spicer -o pid,etime
Then you can use cut to filter the results, if the spacing is consistent. On my system the pid field takes up 8 characters, so I'd use
kill $(ps --no-headers -C spicer -o pid,etime | cut -c-8)
If the spacing is inconsistent (but if so, what kind of messed-up ps are you using? :-P), you can use awk '{ print $1 }' instead of cut.

SSH bash script to test if java process is running?

I need to create an SSH bash script (on Debian Linux) to test whether the 'java' process is running.
Here is how it should look:
IF the 'java' process is not running THEN run ./start.sh
To test whether the java process is running, I can use:
ps -A | grep java
This script should run every minute (I guess in a CRON job).
Regards
First of all, to run a job every minute in cron, your crontab should look like this:
* * * * * /path/to/script.sh
Next, you have a few different options for detecting a Java process.
Note that each of the following is a negation: they detect the absence of Java:
With pgrep:
if [ ! "$(pgrep java)" ] ; then
    # no java running, so start it
    ./start.sh
fi
With pidof:
if [ ! "$(pidof java)" ] ; then
    # no java running, so start it
    ./start.sh
fi
With ps and grep:
if [ ! "$(ps -A | grep 'java')" ] ; then
    # no java running, so start it
    ./start.sh
fi
Of these, pgrep and pidof are probably the most efficient. Don't quote me on that, though.
The check you are doing with ps and grep doesn't look very precise. What if other Java processes are running? You may detect those and come to the wrong conclusion, because you are just checking for "any" Java, not one specific Java program; a more targeted check is sketched below, after the pidof example.
With pidof it would be something like this:
script.sh
pidof java
if [ $? -ne 0 ]
then
    # pidof exited with a non-zero code, meaning it did not find the process,
    # so put your code here; for example:
    /home/user/start.sh
    # (please don't forget to use full paths if you want to use it in cron)
fi
Especially for haters:
man pidof:
EXIT STATUS
0 At least one program was found with the requested name.
1 No program was found with the requested name.
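If you do need to target one specific Java application rather than any process named java (the concern raised above), here is a small sketch using pgrep -f; the jar name and start script path are placeholders:
# pgrep -f matches against the full command line, so this only finds
# the one Java application you care about, not every java process.
if ! pgrep -f 'java .*myapp\.jar' > /dev/null
then
    /path/to/start.sh
fi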

Perl or Bash threadpool script?

I have a script - a linear list of commands - that takes a long time to run sequentially. I would like to create a utility script (Perl, Bash, or anything else available on Cygwin) that can read commands from any linear script and farm them out to a configurable number of parallel workers.
So if myscript is
command1
command2
command3
I can run:
threadpool -n 2 myscript
Two threads would be created, one commencing with command1 and the other command2. Whichever thread finishes its first job first would then run command3.
Before diving into Perl (it's been a long time) I thought I should ask the experts if something like this already exists. I'm sure there should be something like this because it would be incredibly useful both for exploiting multi-CPU machines and for parallel network transfers (wget or scp). I guess I don't know the right search terms. Thanks!
If you need the output not to be mixed up (which xargs -P risks doing), then you can use GNU Parallel:
parallel -j2 ::: command1 command2 command3
Or if the commands are in a file:
cat file | parallel -j2
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
In Perl you can do this with Parallel::ForkManager:
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new( 8 ); # number of jobs to run in parallel
open FILE, "<commands.txt" or die $!;
while ( my $cmd = <FILE> ) {
    $pm->start and next;
    system( $cmd );
    $pm->finish;
}
close FILE or die $!;
$pm->wait_all_children;
There is xjobs, which is better at separating individual job output than xargs -P.
http://www.maier-komor.de/xjobs.html
You could also use make. Here is a very interesting article on how to use it creatively
Source: http://coldattic.info/shvedsky/pro/blogs/a-foo-walks-into-a-bar/posts/7
# That's commands.txt file
echo Hello world
echo Goodbye world
echo Goodbye cruel world
cat commands.txt | xargs -I CMD --max-procs=3 bash -c CMD

Automatically kill processes that consume too much memory or stall on Linux

I would like a "system" that monitors a process and would kill said process if:
the process exceeds some memory requirements
the process does not respond to a message from the "system" in some period of time
I assume this "system" could be something as simple as a monitoring process? A code example of how this could be done would be useful. I am of course not averse to a completely different solution to this problem.
For the first requirement, you might want to look into either using ulimit, or tweaking the kernel OOM-killer settings on your system.
Monitoring daemons exist for this sort of thing as well. God is a recent example.
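As a small sketch of the ulimit route mentioned above (the memory value and process path are assumptions), set the limit in the shell that launches the process so it is inherited:
# Cap the virtual address space at ~2 GiB for this shell and its children
# (ulimit -v takes KiB); allocations past the limit fail, which normally
# makes a runaway process exit instead of exhausting the machine.
ulimit -v 2097152
/path/to/monitored_process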
I wrote a script that runs as a cron job and can be customized to kill problem processes:
#!/usr/local/bin/perl
use strict;
use warnings;
use Proc::ProcessTable;
my $table = Proc::ProcessTable->new;
for my $process (@{$table->table}) {
    # skip root processes
    next if $process->uid == 0 or $process->gid == 0;
    # skip anything other than Passenger application processes
    #next unless $process->fname eq 'ruby' and $process->cmndline =~ /\bRails\b/;
    # skip any using less than 1 GiB
    next if $process->rss < 1_073_741_824;
    # document the slaughter
    (my $cmd = $process->cmndline) =~ s/\s+\z//;
    print "Killing process: pid=", $process->pid, " uid=", $process->uid, " rss=", $process->rss, " fname=", $process->fname, " cmndline=", $cmd, "\n";
    # try first to terminate process politely
    kill 15, $process->pid;
    # wait a little, then kill ruthlessly if it's still around
    sleep 5;
    kill 9, $process->pid;
}
https://www.endpointdev.com/blog/2012/08/automatically-kill-process-using-too/
To limit memory usage of processes, check /etc/security/limits.conf
Try Process Resource Monitor for a classic, easy-to-use process monitor. Code available under the GPL.
There are a few other monitoring scripts there that you might find interesting too.
If you want to set up a fairly comprehensive monitoring system, check out monit. It can be very chatty at times, but it will do a lot of monitoring, restart services, alert you, etc.
That said, don't be surprised if you're getting dozens of e-mails a day until you get used to configuring it and telling it what not to bug you about.
I have a shell script here that could be your starting point. I wrote it because I also had issues with processes exceeding a memory limit. Actually it just checks against a given limit of CPU usage, but you can easily change it to watch memory, or the job list for an idle process.
file: pkill.sh
#!/bin/bash
if [ -z "$1" ]
then
maxlimit=99
else
maxlimit=$1
fi
ps axo user,%cpu,pid,vsz,rss,uid,gid --sort %cpu,rss\
| awk -v max=$maxlimit '$6 != 0 && $7 != 0 && $2 > max'\
| awk '{print $3}'\
| while read line;\
do\
ps u --no-headers -p $line;\
echo "$(date) - $(ps u --no-headers -p $line)" >> pkill.log;\
notify-send 'Killing proccess!' $(ps -p $line -o command --no-headers | awk '{print $1}') -u normal -i dialog-warning -t 3000;\
kill $line;\
done;
Simply run it once like: sh ./pkill.sh <limit-cpu>
Or, to keep it running: watch -n 10 sh ./pkill.sh 90
In the case above it will keep running every 10 seconds, killing processes that exceed 90% CPU usage.
Are the monitored processes ones you're writing, or just any process?
If they're arbitrary processes, then it might be hard to monitor for responsiveness. Unless the process is already set up to handle and respond to events that you can send it, I doubt you'll be able to monitor them. If they're processes that you're writing, you'd need to add some kind of message handling that you can use to check against.
