I am seeing two issues with my Perl script:
My if condition that has ==, ., != etc. is failing
The ps -p $pid -o etime= throws an error sh: line 1: -o: command not found
I am trying to check whether a process is already running, and if so
If it has been running for more than 40 minutes then kill the process
If it has been running for 30-40 minutes then issue a notification
If it has been running for less than 30 minutes then exit
A simple if test fails too. I have attached example code at the end.
Could someone please let me know the cause of these issues?
if ( `ps -ef | grep example | grep -v grep` ) {
print "Process is already running\n";
my $pid = `ps -ef | grep example | grep -v grep | awk '{print \$2}'`;
if ( `ps -p $pid -o etime= | sed -e "s/:/\\n/g" | wc -l | grep 3` ) {
print "1. Running for more than 40 mins\n";
`ps -ef | grep example | grep -v grep | awk '{print \$2}' | xargs kill -9`;
}
elsif ( `ps -p $pid -o etime= | sed -e "s/:/\\n/g" | wc -l|grep "2"` ) {
my $pmin = `ps -p $pid -o etime= | awk -F: '{print \$1}'`;
if ( $pmin < 30 ) {
print "Process running for 15 mins. Exiting";
exit;
}
elsif ( $pid >= 40 ) {
print "2.Running for more than 40 mins\n";
`ps -ef | grep example | grep -v grep | awk '{print \$2}' | xargs kill -9`;
}
else {
print "Process running for 30 mins. Notify";
}
}
}
my $psc = `ps -ef | grep example | grep -v grep >/dev/null 2>&1 && echo "Yes" || echo "No"`;
print "PSC - $psc";
if ( $psc eq "Yes" ) {
print "running";
}
else {
print "not running";
}
./test.pl
PSC - Yes
not running
I think this might be the root of your problem:
elsif ($pid >= 40) {
Because that's the process ID, not $pmin. So you're basically killing any process whose ID is >= 40, which will be almost every process, apart from the rare one that happens to get a low PID.
But pretty fundamentally - shelling out to ps and grep is painful. Replacing : with \n and then counting lines is a bit of a nasty thing to do - and then using grep to match a string is also pretty dirty.
Why not rewrite using something like Proc::ProcessTable instead?
Here's an example of how you'd read the process table, find a particular process id (or set of) and then query the time:
#!/usr/bin/env perl
use strict;
use warnings;
use Proc::ProcessTable;
use Data::Dumper;
my $ps = Proc::ProcessTable->new;
my @target_processes = grep { $_->pid eq $$ } @{ $ps->table };
print Dumper \@target_processes;
sleep 10;
while (1) {
foreach my $process ( grep { $_->cmndline =~ m/perl/ } @{ $ps->table } ) {
sleep 5;
print $process ->cmndline, " has been running for ",
$process->time / 10000, "s\n";
print Dumper \$process;
}
}
Note - the time is a high res time
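If you would rather keep calling ps, the etime string can be converted to seconds and compared numerically instead of counting colons. The following is a minimal sketch, not the asker's code: the names etime_to_seconds and classify_age are made up for illustration, the 30/40-minute thresholds are taken from the question, and it assumes etime has the form [[DD-]HH:]MM:SS.

```sh
# Convert a ps etime value ([[DD-]HH:]MM:SS) to seconds.
# etime_to_seconds is a hypothetical helper name, not part of ps.
etime_to_seconds() {
    printf '%s\n' "$1" | awk '{
        gsub(/[ \t]/, "")                 # ps pads etime with spaces
        days = 0
        if (index($0, "-")) {             # optional "DD-" prefix
            split($0, d, "-")
            days = d[1]
            $0 = d[2]
        }
        n = split($0, f, ":")
        if (n == 3)      secs = f[1] * 3600 + f[2] * 60 + f[3]
        else if (n == 2) secs = f[1] * 60 + f[2]
        else             exit 1           # unexpected format
        print days * 86400 + secs
    }'
}

# Map an age in seconds to the action described in the question.
classify_age() {
    if [ "$1" -ge 2400 ]; then
        echo kill        # 40 minutes or more
    elif [ "$1" -ge 1800 ]; then
        echo notify      # between 30 and 40 minutes
    else
        echo ok          # less than 30 minutes
    fi
}
```

For example, classify_age "$(etime_to_seconds "$(ps -p "$pid" -o etime=)")" prints kill, notify, or ok, and the caller decides what to do with that.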
Try the below change
ps -A | grep firefox >/dev/null 2>&1 && echo "Yes" || echo "No"
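Both symptoms in the question are consistent with a trailing newline: Perl's backticks return the command's output including its final newline, so "Yes\n" is not eq "Yes", and a $pid ending in a newline splits the ps command line in two, leaving -o to be run as a command of its own. In Perl the usual remedy is to chomp the captured value. The sketch below reproduces the effect in plain shell; echo stands in for ps, and the PID 1234 is made up.

```sh
# Build a value that ends in a newline, as Perl's backticks would return it.
# (Shell command substitution strips trailing newlines, hence the marker trick.)
pid_with_newline=$(printf '1234\n_')
pid_with_newline=${pid_with_newline%_}

# "echo" stands in for ps so the sketch is safe to run anywhere.
# The embedded newline ends the first command early; "-o" becomes its own command.
broken_output=$(sh -c "echo ps -p $pid_with_newline -o etime=" 2>&1) || true

# Removing the newline first (chomp in Perl) keeps everything on one line.
pid_clean=$(printf '%s' "$pid_with_newline" | tr -d '\n')
fixed_output=$(sh -c "echo ps -p $pid_clean -o etime=")

printf 'broken:\n%s\nfixed:\n%s\n' "$broken_output" "$fixed_output"
```

The "broken" part shows the truncated command followed by the shell's complaint about -o; the exact wording of that message depends on which shell sh is.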
I defined an interactive function called pk in my shell configuration to kill programs, e.g. pk emacs to kill Emacs. If multiple instances are running, it asks whether to kill a specific PID or all of them.
I need this occasionally when one of my Emacs instances freezes (the CentOS install at my company is old). In pk I use ps to list the matching commands and their PIDs, but AFAIK ps does not show window titles: it just prints one or more "/usr/bin/emacs" lines with no further detail, so I can't tell which PID is the frozen one I want to kill.
I know I can use a system tool like System Activity (KDE) to look up the window title and kill the program, but I want to do it from the terminal with pk. Is there a tool like ps that shows "window title + command + PID" that I could use in my script?
When you open a file with vim or emacs from a terminal, ps with the right options shows the file being edited, so I can tell the instances apart and know which one to kill. What I am after is the equivalent for GUI instances, i.e. the window title as shown in System Activity.
Of course, if getting the window title is the wrong approach and anyone knows another way to kill one of multiple instances of the same program, answers would be welcome.
I just found another solution I can use in my pk function to kill the frozen emacs with the following line:
kill -SIGUSR2 (xprop | grep -i pid | grep -Po "[0-9]+")
The (xprop...) part returns the PID of the GUI program you click on with the mouse.
If anyone is interested in my pk function, here it is (note that I'm using fish shell, so this is a fish function):
function pk --description 'kill processes containing a pattern'
set done 1
set result (psg $argv[1] | wc -l)
if test $result = 0
echo "No '$argv[1]' process is running!"
else if test $result = 1
psg $argv[1] | awk '{print $2}' | xargs kill -9
if test $status = 123 # Operation not permitted
read -p 'echo "Use sudo to kill it? [y/N]: "' -l arg
if test "$arg" = "y"
psg $argv[1] | awk '{print $2}' | xargs sudo kill -9
end
end
else
psg $argv[1]
while test $done = 1
read -p 'echo "Kill all of them or specific PID? [y/N/pid]: "' -l arg
if test "$arg" = "y"
psg $argv[1] | awk '{print $2}' | xargs kill -9
if test $status -eq 123 # Operation not permitted
read -p 'echo "Use sudo to kill them all? [y/N]: "' -l arg2
if test "$arg2" = "y"
psg $argv[1] | awk '{print $2}' | xargs sudo kill -9
end
end
set done 0
else if test $arg -a "$arg" != "y" -a "$arg" != "n"
# the first cond in test means you typed something, RET will not pass
if test (psg $argv[1] | awk '{print $2}' | grep -i $arg)
kill -9 $arg #2>/dev/null
if test $status -eq 1 # kill failed
read -p 'echo "Use sudo to kill it? [y/N]: "' -l arg2
if test "$arg2" = "y"
sudo kill -9 $arg
end
end
echo -e "Continue...\n"
usleep 100000
psg $argv[1]
else if test "$arg" = "p"
# This may be used for frozen emacs specifically, -usr2 or -SIGUSR2
# will turn on `toggle-debug-on-quit`, turn it off once emacs is alive again
# Test on next frozen Emacs
kill -SIGUSR2 (xprop | grep -i pid | grep -Po "[0-9]+")
# kill -usr2 (xprop | grep -i pid | grep -Po "[0-9]+")
return
else
echo "PID '$arg[1]' is not in the list!"
echo
end
set done 1
else
# RET goes here, means `quit` like C-c
set done 0
end
end
end
end
I had a cron script running on my Ubuntu PC that periodically checks disk space; the script also contains a "sleep 5". After the system had been up for a month I ran into a problem: there were thousands of instances of the cron script running, along with multiple instances of "sleep".
Crontab entry:
* * * * * /root/disk_check_script.sh
Content of disk_check_script.sh :
#!/bin/sh -x
/home/user/linux/process_health_daemons status
if [ $? -ne 0 ]; then
/home/user/linux/process_health_daemons start
fi
sleep 5
MINIMUM_FREE_SPACE_REQUIRED=50
AVAILABLE_FREE_SPACE=$(df -Ph $HOME | tail -1 | awk '{ print $4}' | awk -F "G" '{ print $1 }')
USED_SPACE=$(df -Ph $HOME | tail -1 | awk '{ print $3}' | awk -F "G" '{ print $1 }')
if [ "$AVAILABLE_FREE_SPACE" -lt "$MINIMUM_FREE_SPACE_REQUIRED" ]; then
touch /home/user/flag/no_space_left
fi
When I killed one of the sleep processes, the cron script associated with it also completed.
Does sleep hang often? Is a reboot required to avoid this, or is there another way to prevent it?
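The question does not show why the runs stopped finishing, but whatever the cause, a lock keeps a once-a-minute job from stacking up behind an instance that is still running. Below is a minimal sketch using an atomic mkdir as the lock; LOCK_DIR and run_exclusively are hypothetical names, and the path is only an example.

```sh
# LOCK_DIR is a hypothetical path; mkdir is atomic, so only one run can create it.
LOCK_DIR=${LOCK_DIR:-/tmp/disk_check_script.lock}

# run_exclusively CMD...: run CMD unless a previous run still holds the lock.
run_exclusively() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}
```

The body of the cron script would then be wrapped in a function and started through run_exclusively. A run that is killed before rmdir leaves the lock behind, so it has to be removed by hand; on systems with util-linux, flock -n serves the same purpose without that drawback.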
I have a bash init script that runs this command:
sudo -umyuser APPLICATION_ENV=production php script/push-server.php >> /var/log/push-server.log 2>&1 &
I then try to capture both pids and put them into a file:
echo $! > /var/log/push_server.pid
childpid=$(ps --no-heading --ppid $! | tail -1 | awk '{ print $1 }')
echo $childpid >> /var/log/push_server.pid
However, if I use the --no-heading flag it returns blank. If I run that very same ps command on the command line, it returns the proper pid number. The same happens if I modify the command a little bit like so:
childpid=$(ps --no-heading --ppid $! | awk '{NR>1}' | tail -1 | awk '{ print $1 }')
I've tried removing the NR, tail, adding the --no-header, and even going all the way down to just doing:
childpid=$(ps --no-heading --ppid $!)
and it still won't return the child pid.
Any ideas?
The second time you use $! you actually use the pid of the echo. Save it in a variable for later use.
The above statement is not true: as @mklement0 pointed out, $! is only updated when a new background process is started.
The most likely problem is therefore timing: the child process may not have been forked yet by the time the script checks for its PID.
Thanks to everyone who jumped in to help! The answer was indeed to add a sleep n:
sudo -umyuser APPLICATION_ENV=production php script/push-server.php >> /var/log/push-server.log 2>&1 &
mainpid=$!
echo $mainpid > /var/log/push_server.pid
sleep 3
childpid=$(ps --no-heading --ppid $mainpid | tail -1 | awk '{ print $1 }')
echo $childpid >> /var/log/push_server.pid
echo -n "push_server started on pid $mainpid $childpid"
return
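A fixed sleep 3 works, but it waits the full three seconds even when the child appears at once, and it can still be too short on a loaded machine. Polling with an upper bound is one alternative. This is a minimal sketch; poll_for_output is a hypothetical helper name, and the commented usage assumes pgrep from procps is available.

```sh
# poll_for_output TRIES DELAY CMD...
# Run CMD up to TRIES times, DELAY seconds apart, until it prints something.
poll_for_output() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        out=$("$@" 2>/dev/null) || true
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Possible use in the init script (pgrep -P lists children of a parent PID):
#   childpid=$(poll_for_output 10 1 pgrep -P "$mainpid" | tail -1)
```

The function returns as soon as the command prints anything, and returns non-zero if nothing showed up within TRIES attempts, so the init script can report a failed start instead of writing an empty PID.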
I have a problem with some zombie-like processes on a certain server that need to be killed every now and then. How can I best identify the ones that have run for longer than an hour or so?
Found an answer that works for me:
Warning: this will find and kill long-running processes:
ps -eo uid,pid,etime | egrep '^ *user-id' | egrep ' ([0-9]+-)?([0-9]{2}:?){3}' | awk '{print $2}' | xargs -I{} kill {}
(Where user-id is a specific user's ID with long-running processes.)
The second regular expression matches a time that has an optional days figure, followed by an hour, minute, and second component, and so is at least one hour in length.
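Where ps supports the etimes field (elapsed time in seconds, as in procps-ng), the age test can be a plain numeric comparison instead of a regular expression. A minimal sketch; older_than is a hypothetical helper name, and it only prints PIDs rather than killing anything.

```sh
# older_than LIMIT: read "PID ELAPSED_SECONDS" lines on stdin and
# print the PIDs whose elapsed time exceeds LIMIT seconds.
older_than() {
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Listing only; assumes a ps that supports the etimes field:
#   ps -eo pid= -o etimes= | older_than 3600
```

Once the list looks right, the output can be piped to xargs kill as in the command above.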
If they just need to be killed:
if [[ "$(uname)" = "Linux" ]];then killall --older-than 1h someprocessname;fi
If you want to see what it's matching
if [[ "$(uname)" = "Linux" ]];then killall -i --older-than 1h someprocessname;fi
The -i flag will prompt you with yes/no for each process match.
For anything older than one day,
ps aux
will give you the answer, but it drops down to day-precision which might not be as useful.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 7200 308 ? Ss Jun22 0:02 init [5]
root 2 0.0 0.0 0 0 ? S Jun22 0:02 [migration/0]
root 3 0.0 0.0 0 0 ? SN Jun22 0:18 [ksoftirqd/0]
root 4 0.0 0.0 0 0 ? S Jun22 0:00 [watchdog/0]
In this example, you can only see that process 1 has been running since June 22, with no indication of the time it was started.
If you're on Linux or another system with the /proc filesystem,
stat /proc/<pid>
will give you a more precise answer. For example, here's an exact timestamp for process 1, which ps shows only as Jun22:
ohm ~$ stat /proc/1
File: `/proc/1'
Size: 0 Blocks: 0 IO Block: 4096 directory
Device: 3h/3d Inode: 65538 Links: 5
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2008-06-22 15:37:44.347627750 -0700
Modify: 2008-06-22 15:37:44.347627750 -0700
Change: 2008-06-22 15:37:44.347627750 -0700
In this way you can obtain the list of the ten oldest processes:
ps -elf | sort -r -k12 | head -n 10
Jodie C and others have pointed out that killall -i can be used, which is fine if you want to use the process name to kill. But if you want to kill by the same parameters as pgrep -f, you need to use something like the following, using pure bash and the /proc filesystem.
#!/bin/bash
max_age=120 # (seconds)
naughty="$(pgrep -f offlineimap)"
if [[ -n "$naughty" ]]; then # naughty is running
age_in_seconds=$(echo "$(date +%s) - $(stat -c %X /proc/$naughty)" | bc)
if [[ "$age_in_seconds" -ge "$max_age" ]]; then # naughty is too old!
kill -s 9 "$naughty"
fi
fi
This lets you find and kill processes older than max_age seconds using the full process name; i.e., the process named /usr/bin/python2 offlineimap can be killed by reference to "offlineimap", whereas the killall solutions presented here will only work on the string "python2".
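pgrep -f can return several PIDs, in which case $naughty above holds more than one line and the stat call no longer refers to a single /proc directory. Looping over the PIDs handles each one separately. This is a sketch under the same assumptions as the script above (Linux /proc and GNU stat); pids_older_than is a hypothetical name, and it prints the matching PIDs instead of killing them.

```sh
# pids_older_than MAX_AGE PID...: print each PID whose /proc entry is
# at least MAX_AGE seconds old. Assumes Linux /proc and GNU stat.
pids_older_than() {
    max_age=$1
    shift
    now=$(date +%s)
    for pid in "$@"; do
        # Skip PIDs that have already exited (or systems without /proc).
        started=$(stat -c %X "/proc/$pid" 2>/dev/null) || continue
        if [ $((now - started)) -ge "$max_age" ]; then
            echo "$pid"
        fi
    done
}

# Possible use (prints only; pipe to xargs kill once the list looks right):
#   pids_older_than 120 $(pgrep -f offlineimap)
```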
Perl's Proc::ProcessTable will do the trick:
http://search.cpan.org/dist/Proc-ProcessTable/
You can install it in debian or ubuntu with sudo apt-get install libproc-processtable-perl
Here is a one-liner:
perl -MProc::ProcessTable -Mstrict -w -e 'my $anHourAgo = time-60*60; my $t = new Proc::ProcessTable;foreach my $p ( @{$t->table} ) { if ($p->start() < $anHourAgo) { print $p->pid, "\n" } }'
Or, more formatted, put this in a file called process.pl:
#!/usr/bin/perl -w
use strict;
use Proc::ProcessTable;
my $anHourAgo = time-60*60;
my $t = new Proc::ProcessTable;
foreach my $p ( @{$t->table} ) {
if ($p->start() < $anHourAgo) {
print $p->pid, "\n";
}
}
then run perl process.pl
This gives you more versatility and 1-second-resolution on start time.
You can use bc to combine the two commands in mob's answer and get the number of seconds elapsed since the process started:
echo `date +%s` - `stat -t /proc/<pid> | awk '{print $14}'` | bc
edit:
Out of boredom while waiting for long processes to run, this is what came out after a few minutes fiddling:
#!/bin/bash
#file: sincetime
init=`stat -t /proc/$1 | awk '{print $14}'`
curr=`date +%s`
seconds=`echo $curr - $init| bc`
name=`cat /proc/$1/cmdline`
echo $name $seconds
If you put this on your path and call it like this:
sincetime <pid>
it will print the process cmdline and seconds since started. You can also put this in your path:
#!/bin/bash
#file: greptime
pidlist=`ps ax | grep -i -E $1 | grep -v grep | awk '{print $1}' | grep -v PID | xargs echo`
for pid in $pidlist; do
sincetime $pid
done
And then if you run:
greptime <pattern>
where pattern is a string or extended regular expression, it will print all processes matching the pattern and the seconds since they started. :)
Do a ps -aef; this will show you the time at which the process started. Then use the date command to find the current time, and calculate the difference between the two to find the age of the process.
I did something similar to the accepted answer, but slightly differently, since I want to match on the process name and on the bad process having run for more than 100 seconds:
kill $(ps -o pid,bsdtime -p $(pgrep bad_process) | awk '{ if (NR > 1 && $2 > 100) { print $1; }}')
stat -t /proc/<pid> | awk '{print $14}'
to get the start time of the process in seconds since the epoch. Compare with current time (date +%s) to get the current age of the process.
Using ps is the right way. I've already done something similar before but don't have the source handy.
Generally - ps has an option to tell it which fields to show and by which to sort. You can sort the output by running time, grep the process you want and then kill it.
HTH
In case anyone needs this in C, you can use readproc.h and libproc:
#include <sys/types.h>
#include <time.h>
#include <proc/readproc.h>
#include <proc/sysinfo.h>
float
pid_age(pid_t pid)
{
proc_t proc_info;
int seconds_since_boot = uptime(0,0);
if (!get_proc_stats(pid, &proc_info)) {
return 0.0;
}
// readproc.h comment lies about what proc_t.start_time is. It's
// actually expressed in Hertz ticks since boot
int seconds_since_1970 = time(NULL);
int time_of_boot = seconds_since_1970 - seconds_since_boot;
long t = seconds_since_boot - (unsigned long)(proc_info.start_time / Hertz);
int delta = t;
float days = ((float) delta / (float)(60*60*24));
return days;
}
I came across this somewhere and thought it was simple and useful.
You can use the command in crontab directly:
* * * * * ps -lf | grep "user" | perl -ane '($h,$m,$s) = split /:/,$F[13]; kill 9, $F[3] if ($h > 1);'
or we can write it as a shell script:
#!/bin/sh
# longprockill.sh
ps -lf | grep "user" | perl -ane '($h,$m,$s) = split /:/,$F[13]; kill 9, $F[3] if ($h > 1);'
And call it from crontab like so:
* * * * * longprockill.sh
My version of sincetime above by @Rafael S. Calsaverini:
#!/bin/bash
ps --no-headers -o etimes,args "$1"
This reverses the output fields: elapsed time first, full command including arguments second. This is preferred because the full command may contain spaces.