I have a file where each line is the PID of some process. What I would like to achieve is displaying a summary of file descriptors.
So basically my steps are like this:
ps -aux | grep -E 'riak|erlang' | tr -s " " | cut -f2 -d " " | xargs lsof -a -p $param | (wc -l per process)
I am lost here: I don't know how to feed $param from stdin, and I have no idea how to make wc -l count each lsof -a -p result separately rather than the total. I am expecting the number of open files per process, not across all of them.
Bonus question: how do I convert input like this:
123 foo-exe
234 bar-exe
(first column pid, second name)
into a result like this:
123 foo-exe 1234
234 bar-exe 12344
where the first column is the PID, the second is the name, and the third is the number of open files.
I know there could be a different way of doing it (which I would like to know), but knowing how to do it with bash tools would be nice :)
Assuming that riak and erlang are user names (tail -n +2 drops the header line that lsof prints):
ps -o pid=,comm= -U riak,erlang | while read -r pid comm; do n=$(lsof -a -p "$pid" | tail -n +2 | wc -l); echo "$pid $comm $n"; done
A pure lsof + awk approach, which should be faster than the earlier one (the first awk rule skips the header line that each lsof prints, and PROCINFO["sorted_in"] needs GNU awk):
{ lsof -u riak +c 0; lsof -u erlang +c 0; } | awk '$2 !~ /^[0-9]+$/ {next} {cmd[$2]=$1;count[$2]++;} function cmp_num_idx(i1, v1, i2, v2) {return (i1 - i2);} END{PROCINFO["sorted_in"]="cmp_num_idx"; for (pid in cmd){ printf "%10d %20s %10d\n", pid, cmd[pid], count[pid];}}'
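For the bonus question, here is a minimal bash sketch that skips lsof and reads /proc instead (Linux only; the function name add_fd_count is hypothetical). It takes "pid name" lines on stdin and appends a descriptor count to each:

```shell
#!/usr/bin/env bash
# Sketch (Linux /proc only): read "pid name" lines on stdin and print
# "pid name count", where count is the number of entries in /proc/<pid>/fd.
# Prints "?" when that directory can't be read (another user's process
# when not running as root, or a process that has already exited).
add_fd_count() {
  local pid name count
  while read -r pid name; do
    [[ $pid =~ ^[0-9]+$ ]] || continue          # skip headers and blank lines
    if [[ -r /proc/$pid/fd && -x /proc/$pid/fd ]]; then
      # One directory entry per open descriptor; $(( )) trims wc's padding.
      count=$(( $(find "/proc/$pid/fd" -mindepth 1 -maxdepth 1 2>/dev/null | wc -l) ))
    else
      count='?'
    fi
    printf '%s %s %s\n' "$pid" "$name" "$count"
  done
}

# Usage with the processes from the question:
#   ps -o pid=,comm= -U riak,erlang | add_fd_count
```

For the question's sample input, printf '%s\n' '123 foo-exe' '234 bar-exe' | add_fd_count prints each PID and name followed by a count, or ? for PIDs it cannot inspect.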
I want to read the result of a ps command and the process count into two variables, but all of the output gets assigned to the first variable.
My script looks like this:
#!/bin/bash
function status() {
proc_num=`ps -ef | grep noah.*super | grep -v grep | tee /dev/stderr | wc -l`
return $proc_num
}
IFS=$'#' read -r -d '' ret proc_num <<< `status 2>&1; echo "#$?"`
echo -e "proc_num: $proc_num\n"
echo -e "ret: $ret"
The result looks like this:
proc_num:
ret: root 7140 21935 0 Jul27 ? 00:00:00 /bin/sh -- /noah/modules/cecb4af2fce3393df49e748f86d7a176/supervise.minos-agent --run
root 8213 7140 0 Jul27 ? 00:00:00 /bin/sh -- /noah/modules/cecb4af2fce3393df49e748f86d7a176/supervise.minos-agent --run
root 8919 21935 0 Jul27 ? 00:00:00 /bin/sh -- /noah/modules/cecb4af2fce3393df49e748f86d7a176/supervise.minos-agent --run
root 18530 1 0 17:04 ? 00:00:00 /bin/sh -- /noah/modules/c0b527e8b1ce71007f8164d07195a8a2/supervise.logagent --run
root 21935 1 0 Jul10 ? 00:00:00 /bin/sh -- /noah/modules/cecb4af2fce3393df49e748f86d7a176/supervise.minos-agent --run
root 32278 32276 0 2019 ? 00:00:00 /bin/sh /noah/modules/f314c3a2b201042b9545e255364e9a9d/bin/supervise.noah-ccs-agent --run
root 34836 1 0 Sep18 ? 00:00:00 /bin/sh /noah/modules/488dddfee9441251c82ea773a97dfcd3/bin/supervise.noah-client --run
root 56155 1 0 Jun07 ? 00:00:00 /bin/sh /noah/modules/11e7054f8e14a30bd0512113664584b4/bin/supervise.server_inspector --run
8
Thanks for your help.
The immediate problem is that you're running into a bug in how earlier versions of bash treat unquoted here-strings (see this question). You can avoid it by double-quoting the here-string:
IFS=$'#' read -r -d '' ret proc_num <<< "`status 2>&1; echo "#$?"`"
...but please don't do that; this whole approach is overcomplicated and prone to problems.
Before I get to the more significant problems, I'll recommend using $( ) rather than backticks for command substitutions; they're easier to read, and avoid some parsing weirdnesses that backticks have.
Quote everything that might be misinterpreted. In grep noah.*super, the shell will try to turn noah.*super into a list of matching filenames. It's unlikely to find any matches, but if it somehow does the script will break in really weird ways. So use grep 'noah.*super' instead.
Do you have the pgrep command available? If so, use it instead of all of the ps | grep | grep stuff.
Exit/return statuses are for reporting status (i.e. success/failure, and maybe what failed), not for returning data. Returning the number of processes found, as you're doing, will run into trouble if the number ever exceeds 255, because the status is a single byte, so 255 is the most it can hold. If there are ever 256 processes, the function will return 0; if there are 300, it'll return 44; and so on. Return data as output rather than abusing the return status like this.
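A minimal bash sketch of that wrap-around (the function names are hypothetical), contrasted with passing the count on stdout:

```shell
#!/usr/bin/env bash
# Why a count shouldn't travel through the return status: bash keeps only
# the low 8 bits of the number given to return (so 256 -> 0, 300 -> 44).
count_via_status() { return "$1"; }

# Safer: print the count on stdout and capture it with $( ).
count_via_stdout() { printf '%s\n' "$1"; }

count_via_status 300; echo "via status: $?"     # via status: 44
echo "via stdout: $(count_via_stdout 300)"      # via stdout: 300
```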
Also, it's best to have functions produce output via stdout, rather than stderr as this one's doing. If you need to sneak a copy of the output past something like $( ), redirect it back to stdout afterward. And I'd tend to use something other than stderr anyway, to avoid mixing in any actual errors with the output stream. Here's an example using FD #3 (and BTW use local variables in functions when possible):
{ local proc_num=$(ps -ef | grep 'noah.*super' | grep -v grep | tee /dev/fd/3 | wc -l); } 3>&1
...or just capture the output, then do multiple things with it:
local output="$(ps -ef | grep 'noah.*super' | grep -v grep)"
echo "$output"
local proc_num="$(echo "$output" | wc -l | tr -d ' ')" # tr is to remove spaces from the output
status 2>&1; echo "#$?" is also trouble-prone; here you're taking that return status (which should've been output rather than a return status), and converting it to part of the output (which is what it should've been in the first place). And you're doing it so you can then re-split them back into separate bits of data with read. If you ever actually do need to capture both the output and return status from something, capture them separately:
output="$(status 2>&1)"
return_status=$?
(BTW, the right side of a simple assignment like this is one of the very few places where it's safe to omit double-quotes around a command or variable substitution. But using double-quotes doesn't hurt, and it's easier to just reflexively double-quote than to remember the list of safe places, so I went ahead and double-quoted it here.)
Don't use the function keyword; it's nonstandard. Just use funcname() { definition... }.
I'd avoid using echo -e: different versions of echo (including the bash builtin compiled with different options) treat -e differently. Some treat it as a request to interpret escape sequences in the output, but some print it as part of the output(!). Either just avoid it:
echo "proc_num: $proc_num"
echo
echo "ret: $ret"
Or use printf and put the escape stuff in the format string:
printf 'proc_num: %s\n\nret: %s\n' "$proc_num" "$ret"
...or...
printf '%s\n' "proc_num: $proc_num" "" "ret: $ret"
So, how would I do this? My first preference would be to move the number-of-processes calculation outside of the status function entirely:
#!/bin/bash
status() {
ps -ef | grep 'noah.*super' | grep -v grep
}
ret="$(status)"
proc_num="$(echo "$ret" | wc -l | tr -d ' ')" # tr -d ' ' to remove spaces from the string
echo "proc_num: $proc_num"
echo
echo "ret: $ret"
If you do need to have the function compute that count, I'd have it also take care of adding that to its output (and probably use process substitution instead of a here-string):
...
status() {
local output="$(ps -ef | grep 'noah.*super' | grep -v grep)"
printf '%s#' "$output" # the process list, then the '#' separator
echo "$output" | wc -l | tr -d ' \n' # the count, with no trailing newline
}
IFS='#' read -r -d '' ret proc_num < <(status)
...
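If pgrep is available (as suggested earlier), the status function can shrink further. This sketch assumes the procps-ng pgrep, where -f matches against the full command line and -a prints it next to the PID; count_lines is a hypothetical helper that also avoids reporting an empty result as one line:

```shell
#!/usr/bin/env bash
# pgrep never reports itself, so the "grep -v grep" step is no longer needed.
status() {
  pgrep -af 'noah.*super'
}

# Count output lines; an empty string counts as 0, whereas
# echo "" | wc -l would report 1.
count_lines() {
  if [[ -n $1 ]]; then
    printf '%s\n' "$1" | wc -l | tr -d ' '
  else
    echo 0
  fi
}

ret="$(status)"
proc_num="$(count_lines "$ret")"
printf 'proc_num: %s\n\nret: %s\n' "$proc_num" "$ret"
```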
Final note: run your scripts through shellcheck.net; it'll spot many common problems (like incorrect quoting).
I am trying to understand this peculiar behavior. Basically, I'm trying to grep the output of a command while still keeping the first line (the header). Thanks in advance for the help.
Success Case
ps -ef | { head -1; grep bash; }
Output:
UID PID PPID C STIME TTY TIME CMD
username 1008 1 0 Jan21 tty1 00:00:00 -bash
username 1173 1008 0 Jan21 tty1 00:00:00 -bash
Failed Case
ls -tlrh / | { head -1; grep tmp; }
Output:
total 100K
(i.e. the /tmp entry is missing)
@Jotne's answer is better, but if you know something that appears in the first line, you can sometimes use grep -E to search for that OR the other thing you want, using the pipe symbol to express the alternation:
ps -ef | grep -E "UID|bash"
Output
UID PID PPID C STIME TTY TIME CMD
502 510 509 0 8:01am ttys000 0:00.08 -bash
502 48806 510 0 10:18am ttys000 0:00.00 grep -E UID|bash
Try using awk, e.g.:
ls -tlrh / | awk 'NR==1 || /tmp/'
This prints line 1 and any line that contains tmp:
NR==1 prints line number 1
/tmp/ prints every line that contains tmp
The reason this does not work is that the first of the two processes (head -n1) reads more than it outputs: it reads a whole chunk of ls's output from the pipe, prints only the first line, and leaves nothing for the grep process. The ps case works because ps creates its output line by line, so head's first read happens to return just the header.
The correct way to solve this would be to duplicate STDOUT for every process that needs it, as described in:
redirect COPY of stdout to log file from within bash script itself
However, here it may be enough to feed the reading processes line by line, which makes it likely (though, since it depends on timing, not guaranteed) that head's first read returns only one line:
ls -ltrh / | while IFS= read -r a; do printf '%s\n' "$a"; done | { head -n 1; grep tmp; }
Note, though, that grep cannot see the line(s) that head has consumed.
Well, this works too, but I'm only noting it as a very convoluted solution:
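A sketch of a variant that doesn't depend on timing (the function name is hypothetical): swap head for bash's read builtin, which reads from a pipe one byte at a time and therefore consumes exactly the first line, leaving the rest for grep:

```shell
#!/usr/bin/env bash
# Print the first line unchanged, then grep the remaining lines for $1.
keep_header_grep() {
  local header
  IFS= read -r header || [[ -n $header ]] || return 0   # empty input
  printf '%s\n' "$header"
  grep -- "$1"
}

ls -ltrh / | keep_header_grep tmp
```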
(ps aux | tee >(head -n1 >&3 ) | grep bio >&3 ) 3>&1
This is not as nice as I wanted it to be; the FD 3 usage makes it awkward.
Note: it is even theoretically possible for grep's output to precede the header.
A pure sed solution ;)
ps aux|sed '1p;/kwork/p;d'
I'm working on an application that monitors processes' resources and gives a periodic report in Linux, but I ran into a problem extracting the open-file count per process.
This takes quite a while if I take all of the files and group them according to their PID and count them.
How can I take the open files count for each process in Linux?
Have a look at the /proc/ file system:
ls /proc/$pid/fd/ | wc -l
To do this for all processes, use this:
cd /proc
for pid in [0-9]*
do
echo "PID = $pid with $(ls /proc/$pid/fd/ | wc -l) file descriptors"
done
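As a one-liner (filter out processes with no descriptors by appending | grep -v ' 0 FDs'; the leading space keeps it from also dropping counts such as 10 or 20):
for pid in /proc/[0-9]*; do printf "PID %6d has %4d FDs\n" "$(basename "$pid")" "$(ls "$pid/fd" 2>/dev/null | wc -l)"; done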
As a one-liner including the command name, sorted by file descriptor count in descending order (limit the results by appending | head -10):
for pid in /proc/[0-9]*; do p=$(basename $pid); printf "%4d FDs for PID %6d; command=%s\n" $(ls $pid/fd | wc -l) $p "$(ps -p $p -o comm=)"; done | sort -nr
Credit to @Boban for this addendum:
You can pipe the output of the script above into the following script to see the ten processes (and their names) which have the most file descriptors open:
...
done | sort -rn -k5 | head | while read -r _ _ pid _ fdcount _
do
command=$(ps -o cmd -p "$pid" -hc)
printf "pid = %5d with %4d fds: %s\n" "$pid" "$fdcount" "$command"
done
Here's another approach to list the ten processes with the most open FDs; it's probably less readable, so I didn't put it first:
find /proc -maxdepth 1 -type d -name '[0-9]*' \
-exec bash -c "ls {}/fd/ | wc -l | tr '\n' ' '" \; \
-printf "fds (PID = %P), command: " \
-exec bash -c "tr '\0' ' ' < {}/cmdline" \; \
-exec echo \; | sort -rn | head
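The approaches above start an ls (or a bash) for every process. A sketch that counts with a bash glob instead (bash and Linux /proc assumed; count_fds and top_fds are hypothetical names):

```shell
#!/usr/bin/env bash
# A glob over /proc/<pid>/fd expands to one path per open descriptor;
# nullglob makes an unreadable or vanished directory count as 0.
shopt -s nullglob

count_fds() {
  local fds=( "/proc/$1/fd"/* )
  echo "${#fds[@]}"
}

# Print "count pid" for every process, largest first (default: top 10).
top_fds() {
  local dir pid
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    printf '%d %d\n' "$(count_fds "$pid")" "$pid"
  done | sort -rn | head -n "${1:-10}"
}

top_fds 10
```

Run it as root to see every process; otherwise other users' processes simply show 0.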
Try this:
ps aux | sed 1d | awk '{print "fd_count=$(lsof -p " $2 " | wc -l) && echo " $2 " $fd_count"}' | xargs -I {} bash -c {}
I used this to find the processes using the most file handles for a given user (username) on a system where I don't have lsof or root access:
for pid in $(ps -o pid= -u username); do echo "$(ls /proc/$pid/fd/ 2>/dev/null | wc -l) for PID: $pid"; done | sort -n | tail
This works for me:
ps -opid= -ax | xargs -L 1 -I{} -- sudo bash -c 'echo -n "{} ";lsof -p {} 2>/dev/null | wc -l' | sort -n -k2
It prints numopenfiles per pid sorted by numopenfiles.
It will ask for sudo password once.
Note that the sum of the above numbers might be larger than the total number of open files across all processes.
As I read here, forked processes can share file handles.
How can I take the open files count for each process in Linux?
procpath query -f stat,fd
if you're running it as root (e.g. by prefixing the command with sudo -E env PATH=$PATH); otherwise it will only return file descriptor counts for processes whose /proc/{pid}/fd you may list. This will give you a big JSON document/tree whose nodes look something like:
{
"fd": {
"anon": 3,
"blk": 0,
"chr": 1,
"dir": 0,
"fifo": 0,
"lnk": 0,
"reg": 0,
"sock": 3
},
"stat": {
"pid": 25649,
"ppid": 25626,
...
},
...
}
The content of fd dictionary is counts per file descriptor type. The most interesting ones are probably these (see procfile.Fd description or man fstat for more details):
reg – count of open (regular) files
sock – count of open sockets
I'm the author of Procpath, a tool that provides a nicer interface to procfs for process analysis. You can record a process tree's procfs stats (in a SQLite database) and plot any of them later. For instance, this is how I record and plot my Firefox process tree (root PID 2468) with regard to open file descriptor count (sum of all types):
procpath --logging-level ERROR record -f stat,fd -i 1 -d ff_fd.sqlite \
'$..children[?(@.stat.pid == 2468)]'
# Ctrl+C
procpath plot -q fd -d ff_fd.sqlite -f ff_df.svg
If I'm interested in only a particular type of open file descriptors (say, sockets) I can plot it like this:
procpath plot --custom-value-expr fd_sock -d ff_fd.sqlite -f ff_df.svg
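Without procpath, a rough per-type breakdown for a single process can be sketched in bash by looking at where each /proc/<pid>/fd symlink points. The categories below rely on the link-target prefixes Linux uses (socket:, pipe:, anon_inode:) and are not necessarily how procpath classifies descriptors; fd_types is a hypothetical name:

```shell
#!/usr/bin/env bash
# Tally one process's descriptors by symlink target. Sockets read
# "socket:[inode]", anonymous pipes "pipe:[inode]", anonymous inodes
# "anon_inode:..."; everything else (regular files, directories, devices,
# named FIFOs...) is counted as "other".
fd_types() {
  local pid=${1:-$$} link target key
  local -A tally=()
  for link in "/proc/$pid/fd"/*; do
    target=$(readlink "$link") || continue   # closed meanwhile, or no access
    case $target in
      socket:*)     (( ++tally[socket] )) ;;
      pipe:*)       (( ++tally[pipe] )) ;;
      anon_inode:*) (( ++tally[anon_inode] )) ;;
      *)            (( ++tally[other] )) ;;
    esac
  done
  for key in "${!tally[@]}"; do
    printf '%s %d\n' "$key" "${tally[$key]}"
  done | sort
}

fd_types "$$"
```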
I have the command:
ps -ef | grep kde | tr -s ' ' '#'
I'm getting output like this:
user2131#1626#1584#0#15:50#?#00:00:00#/bin/sh#/usr/bin/startkdeere
How can I get the # symbol only as a column separator (and not between the words of the command), using Linux tools or something else like awk?
Use pgrep to get your PIDs instead of using ps. pgrep will eliminate the grep issue where one of the processes you discover is the grep doing your filtering.
You can also specify the output of the ps command itself using the -o or -O option. You can do this to get the fields you want, and eliminate the header.
You can also use the read command to parse your output. The only field you have with possible blank space is the last one -- the command and arguments.
ps -o uid= -o pid= -o tty= -o args= -p "$(pgrep -d, kde)" | while read -r uid pid tty cmd
do
echo "UID = $uid PID = $pid TTY = $tty"
echo "Command = $cmd"
done
read splits each line on whitespace, except that $cmd receives all the leftover fields (i.e. the entire command with its arguments).
The ps command differs from platform to platform, so read the manpage on ps.
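To get the # separators the question asks for with the same read approach, a sketch (join_fields is a hypothetical name; the ps and pgrep options assume procps): join the fixed leading fields with # and let the last variable keep the whole command:

```shell
#!/usr/bin/env bash
# read gives each name one whitespace-separated field, except the last,
# which receives the rest of the line, so the command keeps its spaces.
join_fields() {
  local user pid ppid cmd
  while read -r user pid ppid cmd; do
    printf '%s#%s#%s#%s\n' "$user" "$pid" "$ppid" "$cmd"
  done
}

# pgrep -d, prints the matching PIDs comma-separated; skip ps if none match.
if pids=$(pgrep -d, kde); then
  ps -o user= -o pid= -o ppid= -o args= -p "$pids" | join_fields
fi
```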
Nasty but it works. Tweak the number 8 to suit the number of columns your variant of ps outputs.
ps -ef | awk -v OFS="" '{ for(i=1; i < 8; i++) printf("%s#",$i); for(i=8; i <= NF; i++) printf("%s ", $i); printf("\n")}'
If you mean processing output that uses '#' as the column/field separator, in awk you can use -F:
echo "user2131#1626#1584#0#15:50#?#00:00:00#/bin/sh#/usr/bin/startkdeere" | awk -F'#' -v OFS='\t' '{$1=$1;print $0}'
Output:
user2131 1626 1584 0 15:50 ? 00:00:00 /bin/sh /usr/bin/startkdeere
I have a problem with some zombie-like processes on a certain server that need to be killed every now and then. How can I best identify the ones that have run for longer than an hour or so?
Found an answer that works for me:
Warning: this will find and kill long-running processes.
ps -eo uid,pid,etime | grep -E '^ *user-id' | grep -E ' ([0-9]+-)?([0-9]{2}:?){3}' | awk '{print $2}' | xargs -I{} kill {}
(Where user-id is a specific user's ID with long-running processes.)
The second regular expression matches a time that has an optional day count followed by hour, minute, and second components, and so is at least one hour long.
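A sketch of the same selection that avoids parsing etime's [[dd-]hh:]mm:ss text: procps-ng's ps also offers etimes, the elapsed time in plain seconds. The function name older_than is hypothetical, and it only prints PIDs, leaving the kill to you:

```shell
#!/usr/bin/env bash
# Print the PID (field 1) of every "pid etimes" input line whose elapsed
# seconds (field 2) exceed the threshold given as $1.
older_than() {
  awk -v max="$1" '$2 + 0 > max + 0 { print $1 }'
}

printf ' 101   59\n 202 3601\n' | older_than 3600    # prints 202

# With the same placeholder user as above (prints only, kills nothing):
#   ps -o pid=,etimes= -u user-id | older_than 3600
```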
If they just need to be killed:
if [[ "$(uname)" = "Linux" ]];then killall --older-than 1h someprocessname;fi
If you want to see what it's matching:
if [[ "$(uname)" = "Linux" ]];then killall -i --older-than 1h someprocessname;fi
The -i flag will prompt you with yes/no for each process match.
For anything older than one day,
ps aux
will give you the answer, but it only has day precision, which might not be as useful.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 7200 308 ? Ss Jun22 0:02 init [5]
root 2 0.0 0.0 0 0 ? S Jun22 0:02 [migration/0]
root 3 0.0 0.0 0 0 ? SN Jun22 0:18 [ksoftirqd/0]
root 4 0.0 0.0 0 0 ? S Jun22 0:00 [watchdog/0]
In this example, you can only see that process 1 has been running since June 22, with no indication of the time of day it was started. If you're on Linux or another system with the /proc filesystem,
stat /proc/<pid>
will give you a more precise answer. For example, here's an exact timestamp for process 1, which ps shows only as Jun22:
ohm ~$ stat /proc/1
File: `/proc/1'
Size: 0 Blocks: 0 IO Block: 4096 directory
Device: 3h/3d Inode: 65538 Links: 5
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2008-06-22 15:37:44.347627750 -0700
Modify: 2008-06-22 15:37:44.347627750 -0700
Change: 2008-06-22 15:37:44.347627750 -0700
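Another option is to compute the age from the start time recorded in /proc/<pid>/stat (field 22, in clock ticks since boot, per proc(5)) together with /proc/uptime. A bash sketch, Linux only, with proc_age as a hypothetical name:

```shell
#!/usr/bin/env bash
# Age of a process in whole seconds. The first number in /proc/uptime is
# the seconds since boot; field 22 of /proc/<pid>/stat is the process
# start time in clock ticks since boot.
proc_age() {
  local pid=${1:-$$} line rest hz up
  local -a f
  line=$(< "/proc/$pid/stat") || return 1
  # Field 2 (comm) is parenthesised and may contain spaces, so drop
  # everything through the last ") "; field N is then at index N-3.
  rest=${line##*") "}
  read -r -a f <<< "$rest"
  hz=$(getconf CLK_TCK 2>/dev/null || echo 100)   # assumes 100 if getconf is missing
  read -r up _ < /proc/uptime
  echo $(( ${up%.*} - f[19] / hz ))
}

proc_age 1    # seconds since PID 1 started
```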
This way you can list the ten oldest processes (head -n 11 keeps the header line plus ten processes):
ps -eo pid,etime,args --sort=start_time | head -n 11
Jodie C and others have pointed out that killall -i can be used, which is fine if you want to use the process name to kill. But if you want to kill by the same parameters as pgrep -f, you need to use something like the following, using pure bash and the /proc filesystem.
#!/bin/bash
max_age=120 # (seconds)
for naughty in $(pgrep -f offlineimap); do # each matching PID
age_in_seconds=$(( $(date +%s) - $(stat -c %X "/proc/$naughty") ))
if [[ "$age_in_seconds" -ge "$max_age" ]]; then # naughty is too old!
kill -s 9 "$naughty"
fi
done
This lets you find and kill processes older than max_age seconds using the full process name; i.e., the process named /usr/bin/python2 offlineimap can be killed by reference to "offlineimap", whereas the killall solutions presented here will only work on the string "python2".
Perl's Proc::ProcessTable will do the trick:
http://search.cpan.org/dist/Proc-ProcessTable/
You can install it in debian or ubuntu with sudo apt-get install libproc-processtable-perl
Here is a one-liner:
perl -MProc::ProcessTable -Mstrict -w -e 'my $anHourAgo = time-60*60; my $t = new Proc::ProcessTable;foreach my $p ( @{$t->table} ) { if ($p->start() < $anHourAgo) { print $p->pid, "\n" } }'
Or, formatted more readably, put this in a file called process.pl:
#!/usr/bin/perl -w
use strict;
use Proc::ProcessTable;
my $anHourAgo = time-60*60;
my $t = new Proc::ProcessTable;
foreach my $p ( @{$t->table} ) {
if ($p->start() < $anHourAgo) {
print $p->pid, "\n";
}
}
Then run perl process.pl.
This gives you more versatility and 1-second-resolution on start time.
You can use bc to join the two commands in mob's answer and get how many seconds have elapsed since the process started:
echo `date +%s` - `stat -t /proc/<pid> | awk '{print $14}'` | bc
edit:
Out of boredom while waiting for long processes to run, this is what came out after a few minutes fiddling:
#!/bin/bash
# file: sincetime
init=$(stat -t "/proc/$1" | awk '{print $14}')
curr=$(date +%s)
seconds=$(( curr - init ))
name=$(tr '\0' ' ' < "/proc/$1/cmdline") # cmdline is NUL-separated
echo "$name $seconds"
If you put this on your PATH and call it like this:
sincetime <pid>
it will print the process command line and the seconds since it started. You can also put this on your PATH:
#!/bin/bash
# file: greptime
pidlist=$(ps ax | grep -i -E "$1" | grep -v grep | awk '{print $1}' | grep -v PID | xargs echo)
for pid in $pidlist; do
sincetime "$pid"
done
And then, if you run:
greptime <pattern>
where pattern is a string or extended regular expression, it will print all processes matching this pattern and the seconds since they started. :)
Run ps -aef; this shows the time at which each process started. Then use the date command to find the current time, and calculate the difference between the two to get the age of the process.
I did something similar to the accepted answer, but slightly differently, since I wanted to match on the process name and on the bad process having run for more than 100 seconds:
kill $(ps -o pid=,etimes= -p "$(pgrep -d, bad_process)" | awk '$2 > 100 { print $1 }')
stat -t /proc/<pid> | awk '{print $14}'
to get the start time of the process in seconds since the epoch. Compare with current time (date +%s) to get the current age of the process.
Using ps is the right way. I've already done something similar before but don't have the source handy.
Generally, ps has options to tell it which fields to show and which field to sort by. You can sort the output by running time, grep for the process you want, and then kill it.
HTH
In case anyone needs this in C, you can use readproc.h and libproc:
#include <sys/types.h>
#include <proc/readproc.h> /* libprocps API (procps/procps-ng 3.x) */
#include <proc/sysinfo.h>
/* Returns the age of a process in days, or 0.0 if its stats can't be read. */
float
pid_age(pid_t pid)
{
proc_t proc_info;
int seconds_since_boot = uptime(0,0);
if (!get_proc_stats(pid, &proc_info)) {
return 0.0;
}
// readproc.h comment lies about what proc_t.start_time is. It's
// actually expressed in Hertz ticks since boot
long t = seconds_since_boot - (long)(proc_info.start_time / Hertz);
return (float) t / (float)(60*60*24);
}
I came across this somewhere and thought it was simple and useful.
You can use the command in crontab directly:
* * * * * ps -lf | grep "user" | perl -ane '($h,$m,$s) = split /:/,$F[13]; kill 9, $F[3] if ($h > 1);'
Or we can write it as a shell script:
#!/bin/sh
# longprockill.sh
ps -lf | grep "user" | perl -ane '($h,$m,$s) = split /:/,$F[13]; kill 9, $F[3] if ($h > 1);'
And call it from crontab like so:
* * * * * longprockill.sh
My version of sincetime above by @Rafael S. Calsaverini:
#!/bin/bash
ps --no-headers -o etimes,args "$1"
This reverses the output fields: elapsed time first, full command including arguments second. This is preferred because the full command may contain spaces.