I'm working on a Linux application that monitors processes' resource usage and produces a periodic report, but I ran into a problem extracting the open-file count per process.
Listing all open files and grouping them by PID to count them takes quite a while.
How can I get the open-file count for each process in Linux?
Have a look at the /proc/ file system:
ls /proc/$pid/fd/ | wc -l
To do this for all processes, use this:
cd /proc
for pid in [0-9]*
do
echo "PID = $pid with $(ls /proc/$pid/fd/ | wc -l) file descriptors"
done
As a one-liner (filter by appending | grep -v "0 FDs"):
for pid in /proc/[0-9]*; do printf "PID %6d has %4d FDs\n" $(basename $pid) $(ls $pid/fd | wc -l); done
As a one-liner including the command name, sorted by file descriptor count in descending order (limit the results by appending | head -10):
for pid in /proc/[0-9]*; do p=$(basename $pid); printf "%4d FDs for PID %6d; command=%s\n" $(ls $pid/fd | wc -l) $p "$(ps -p $p -o comm=)"; done | sort -nr
Credit to @Boban for this addendum:
You can pipe the output of the script above into the following script to see the ten processes (and their names) which have the most file descriptors open:
...
done | sort -rn -k5 | head | while read -r _ _ pid _ fdcount _
do
command=$(ps -o cmd -p "$pid" -hc)
printf "pid = %5d with %4d fds: %s\n" "$pid" "$fdcount" "$command"
done
Here's another approach to list the top ten processes with the most open fds; it's probably less readable, so I don't put it in front:
find /proc -maxdepth 1 -type d -name '[0-9]*' \
-exec bash -c "ls {}/fd/ | wc -l | tr '\n' ' '" \; \
-printf "fds (PID = %P), command: " \
-exec bash -c "tr '\0' ' ' < {}/cmdline" \; \
-exec echo \; | sort -rn | head
Try this:
ps aux | sed 1d | awk '{print "fd_count=$(lsof -p " $2 " | wc -l) && echo " $2 " $fd_count"}' | xargs -I {} bash -c {}
I used this to find the top file-handle-consuming processes for a given user (username) where I don't have lsof or root access:
for pid in `ps -o pid -u username` ; do echo "$(ls /proc/$pid/fd/ 2>/dev/null | wc -l ) for PID: $pid" ; done | sort -n | tail
This works for me:
ps -opid= -ax | xargs -L 1 -I{} -- sudo bash -c 'echo -n "{} ";lsof -p {} 2>/dev/null | wc -l' | sort -n -k2
It prints the number of open files per PID, sorted by that count.
It will ask for the sudo password once.
Note that the sum of the above numbers might be bigger than the total number of open files across all processes: as I read here, forked processes can share file handles.
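A quick way to see that sharing (a hedged illustration; /tmp/fd-demo is just a placeholder path, and $BASHPID is bash's variable holding the subshell's own PID):
exec 3> /tmp/fd-demo            # open fd 3 in the current shell
ls -l /proc/$$/fd/3             # the parent's fd 3 points at /tmp/fd-demo ...
( ls -l /proc/$BASHPID/fd/3 )   # ... and so does fd 3 inherited by the forked subshell
exec 3>&-                       # close it again
Both listings show the same open file, so a per-process count includes it twice.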
How can I get the open-file count for each process in Linux?
procpath query -f stat,fd
This covers all processes if you run it as root (e.g. by prefixing the command with sudo -E env PATH=$PATH); otherwise it only returns file descriptor counts for the processes whose /proc/{pid}/fd you are allowed to list. It gives you a big JSON document/tree whose nodes look something like this:
{
  "fd": {
    "anon": 3,
    "blk": 0,
    "chr": 1,
    "dir": 0,
    "fifo": 0,
    "lnk": 0,
    "reg": 0,
    "sock": 3
  },
  "stat": {
    "pid": 25649,
    "ppid": 25626,
    ...
  },
  ...
}
The content of the fd dictionary is a count per file descriptor type. The most interesting ones are probably these (see the procfile.Fd description or man fstat for more details):
reg – count of open (regular) files
sock – count of open sockets
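To post-process that output, something like jq can flatten the tree into per-PID totals. This is only a hedged sketch assuming the node layout shown above (nested objects each carrying stat and fd); tree.json is a placeholder file name:
procpath query -f stat,fd > tree.json
jq -r '.. | objects | select(has("stat") and has("fd"))
       | "\(.stat.pid)\t\(.fd | [.[]] | add)"' tree.json | sort -t $'\t' -k 2,2 -nr | head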
I'm the author of Procpath, a tool that provides a nicer interface to procfs for process analysis. You can record a process tree's procfs stats (in a SQLite database) and plot any of them later. For instance, this is how my Firefox process tree (root PID 2468) looks with regard to open file descriptor count (the sum of all types):
procpath --logging-level ERROR record -f stat,fd -i 1 -d ff_fd.sqlite \
'$..children[?(#.stat.pid == 2468)]'
# Ctrl+C
procpath plot -q fd -d ff_fd.sqlite -f ff_df.svg
If I'm interested in only a particular type of open file descriptors (say, sockets) I can plot it like this:
procpath plot --custom-value-expr fd_sock -d ff_fd.sqlite -f ff_df.svg
I am new to Linux and got stuck when I tried to use piped grep and find commands. I need to find files that:
match the name pattern request_q*_t*.xml
contain "Phrase 1"
do not contain "word 2"
and copy them to a specific location.
I tried piping grep commands to locate the files and then copying them.
for filename in $(grep --include=request_q*_t*.xml -li '"phrase 1"' $d/ | xargs grep -L '"word 2"')
do
echo "coping file: '$filename'"
cp $filename $outputpath
filefound=true
done
When I try this grep command on the command line, it works fine:
grep --include=request_q*_t*.xml -li '"phrase 1"' $d/ | xargs grep -L '"word 2"'
but I am getting an error in the for loop; for some reason the output of the grep command is
(Standard Input)
(Standard Input)
(Standard Input)
(Standard Input)
I am not sure what I am doing wrong.
What is the efficient way to do it? It's a huge filesystem I have to search in.
find . -name "request_q*_t*.xml" -exec sh -c "if grep -q phrase\ 1 {} && ! grep -q word\ 2 {} ;then cp {} /path/to/somewhere/;fi;" \;
You can use awk for this in combination with xargs. The catch is that you have to read each file completely to confirm it does not contain the excluded string, but you can stop reading a file early as soon as that string is found:
awk '(FNR==1){if(a) print fname; fname=FILENAME; a=0}
/Phrase 1/{a=1}
/Word 2/{a=0;nextfile}
END{if(a) print fname}' request_q*_t*.xml \
| xargs -I{} cp "{}" "$outputpath"
If you want to store "Phrase 1" and "Word 2" in variables, you can use:
awk -v include="Phrase 1" -v exclude="Word 2" \
'(FNR==1){if(a) print fname; fname=FILENAME; a=0}
($0~include){a=1}
($0~exclude){a=0;nextfile}
END{if(a) print fname}' request_q*_t*.xml \
| xargs -I{} cp "{}" "$outputpath"
You can nest the $() constructs:
for filename in $( grep -L '"word 2"' $(grep --include=request_q*_t*.xml -li '"phrase 1"' $d/ ))
do
echo "coping file: '$filename'"
cp $filename $outputpath
filefound=true
done
I have a file where each line is the PID of some process. What I would like to achieve is displaying a summary of file descriptors.
So basically my steps are like this:
ps -aux | grep -E 'riak|erlang' | tr -s " " | cut -f2 -d " " | xargs lsof -a -p $param | (wc -l per process)
I am lost here: I don't know how to feed $param from stdin, and I also have no idea how to make wc -l count per lsof -a -p result rather than give one total; I am expecting the number of open files per process, not for all of them together.
Bonus question: how can I convert input like this:
123 foo-exe
234 bar-exe
(first column pid, second name)
into a result like
123 foo-exe 1234
234 bar-exe 12344
where first column is pid, second is name, third is number of open files.
I know there may be a different way of doing it (which I would also like to know), but knowing how to do it using bash tools would be nice :)
Assuming that riak and erlang are user names:
ps -e -o pid=,comm= -U riak,erlang | while read pid comm; do lsof=`lsof -a -p $pid | wc -l`; echo $pid $comm $lsof; done
A pure lsof+awk based approach (it should be faster than the earlier approach):
{ lsof -u riak +c 0; lsof -u erlang +c 0; } | awk '{cmd[$2]=$1;count[$2]++;}function cmp_num_idx(i1, v1, i2, v2) {return (i1 - i2);} END{PROCINFO["sorted_in"]="cmp_num_idx"; for (pid in cmd){ printf "%10d %20s %10d\n", pid, cmd[pid], count[pid];}}'
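The same program spread over several lines for readability (behaviour unchanged; the custom comparator and PROCINFO["sorted_in"] are gawk-specific):
{ lsof -u riak +c 0; lsof -u erlang +c 0; } | awk '
    # column 1 of lsof output is the command name, column 2 is the PID
    { cmd[$2] = $1; count[$2]++ }
    # compare array indices (PIDs) numerically for the END loop below
    function cmp_num_idx(i1, v1, i2, v2) { return (i1 - i2) }
    END {
        PROCINFO["sorted_in"] = "cmp_num_idx"
        for (pid in cmd)
            printf "%10d %20s %10d\n", pid, cmd[pid], count[pid]
    }'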
For example, let's say I want to count the number of lines of 10 big files and print a total.
for f in files
do
#this creates a background process for each file
wc -l $f | awk '{print $1}' &
done
I was trying something like:
for f in files
do
#this does not work :/
n=$( expr $(wc -l $f | awk '{print $1}') + $n ) &
done
echo $n
I finally found a working solution using anonymous pipes and bash:
#!/bin/bash
# this executes a separate shell and opens a new pipe, where the
# reading endpoint is fd 3 in our shell and the writing endpoint
# stdout of the other process. Note that you don't need the
# background operator (&) as exec starts a completely independent process.
exec 3< <(./a.sh 2>&1)
# ... do other stuff
# write the contents of the pipe to a variable. If the other process
# hasn't already terminated, cat will block.
output=$(cat <&3)
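As a hedged sketch, the same idea extends to the many-files case by opening one pipe per file and summing afterwards (this assumes a reasonably recent bash with the {varname} redirection form; the *.txt glob only stands in for the real file list):
#!/bin/bash
total=0
fds=()
for f in *.txt; do                  # stand-in for the real list of files
    exec {fd}< <(wc -l < "$f")      # start wc asynchronously, reading end on a fresh fd
    fds+=("$fd")
done
for fd in "${fds[@]}"; do
    read -r n <&"$fd"               # blocks only until that particular wc has written its count
    total=$(( total + n ))
    exec {fd}<&-                    # close the descriptor again
done
echo "$total"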
You should probably use GNU parallel:
find . -maxdepth 1 -type f | parallel --gnu 'wc -l' | awk 'BEGIN {n=0} {n += $1} END {print n}'
or else xargs in parallel mode:
find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}'
Another option, if this doesn't fit your needs, is to write to temp files. If you don't want to write to disk, just write to /dev/shm. This is a ramdisk on most Linux systems.
#!/bin/bash
declare -a temp_files
count=0
for f in *
do
if [[ -f "$f" ]]; then
temp_files[$count]="$(mktemp /dev/shm/${f}-XXXXXX)"
((count++))
fi
done
count=0
for f in *
do
if [[ -f "$f" ]]; then
cat "$f" | wc -l > "${temp_files[$count]}" &
((count++))
fi
done
wait
cat "${temp_files[#]}" | awk 'BEGIN {n=0} {n += $1} END {print n}'
for tf in "${temp_files[#]}"
do
rm "$tf"
done
By the way, this can be thought of as a map-reduce with wc doing the mapping and awk doing the reduction.
You could write that to a file, or better, listen on a fifo so the data is picked up as soon as it arrives.
Here is a small example on how they work:
# create the fifo
mkfifo test
# listen to it
while true; do if read line <test; then echo $line; fi done
# in another shell
echo 'hi there' > test
# notice 'hi there' being printed in the first shell
So you could
for f in files
do
#this creates a background process for each file
wc -l $f | awk '{print $1}' > fifo &
done
and listen on the fifo for sizes.
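A hedged sketch of that listening side (it assumes a fifo literally named fifo, that every background job writes exactly one count line to it, and that the *.txt glob stands in for the real file list):
mkfifo fifo
exec 3<> fifo                 # open read-write: doesn't block on Linux and prevents premature EOF
njobs=0
for f in *.txt; do            # stand-in for the real list of files
    wc -l "$f" | awk '{print $1}' > fifo &
    njobs=$(( njobs + 1 ))
done
total=0
for (( i = 0; i < njobs; i++ )); do
    read -r n <&3             # one count per job; arrival order doesn't matter
    total=$(( total + n ))
done
exec 3>&-
rm fifo
echo "$total"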
This script lists the unit-*-slides.txt files from a directory into a filelist.txt file, then goes through that file list, reads each file, and writes the count of lines matching "st^" to a file. But it is not counting them in order (e.g. 1, 2, 3, 4, ...); it is counting them like 10, 1, 2, 3, 4, ...
How can I read them in order?
#!/bin/sh
#
outputdir=filelist
mk=$(mkdir $outputdir)
$mk
dest=$outputdir
cfile=filelist.txt
ofile="combine-slide.txt"
output=file-list.txt
path=/home/user/Desktop/script
ls $path/unit-*-slides.txt | sort -n -t '-' -k 2 > $dest/$cfile
echo "Generating files list..."
echo "Done"
#Combining
while IFS= read file
do
if [ -f "$file" ]; then
tabs=$(cat unit-*-slides.txt | grep "st^" | split -l 200)
fi
done < "$dest/$cfile"
echo "Combining Done........!"
Try it with sort -n:
tabs=$(cat $( ls unit-*-slides.txt | sort -n ) | grep "st^" | split -l 200)
sort -n means numeric sort, so the output of ls is ordered by number.
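For example, combining the numeric sort with the field options the question already uses (sort -n -t '-' -k 2) orders the files by their unit number:
printf 'unit-10-slides.txt\nunit-2-slides.txt\nunit-1-slides.txt\n' | sort -n -t '-' -k 2
# unit-1-slides.txt
# unit-2-slides.txt
# unit-10-slides.txt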
I need some help combining elements of scripts to form a readable output.
Basically I need to get the username from the folder structure listed below and then count the number of lines in that user's folder for files of type *.ano.
This is shown in the extract below; note that the position of the username in the path is not always the same when counting from the front.
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt
/home/user/Drive-backup/2011 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/3.ano
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.ano
awk -F/ '{print $(NF-2)}'
This will give me the username I need, but I also need to know how many non-blank lines there are in that user's folder for files of type *.ano. I have the grep below that works, but I don't know how to put it all together so it outputs a file that makes sense.
grep -cv '^[[:space:]]*$' *.ano | awk -F: '{ s+=$2 } END { print s }'
Example output needed
UserA 500
UserB 2
UserC 20
find /home -name '*.ano' | awk -F/ '{print $(NF-2)}' | sort | uniq -c
That ought to give you the number of *.ano files per user, provided your awk is correct. I often use sort | uniq -c to count the number of instances of a string, in this case the username, as opposed to wc -l, which only counts input lines.
Enjoy.
Have a look at wc (word count).
To count the number of *.ano files in a directory you can use
find "$dir" -iname '*.ano' | wc -l
If you want to do that for all directories in some directory, you can just use a for loop:
for dir in * ; do
echo "user $dir"
find "$dir" -iname '*.ano' | wc -l
done
Execute the bash script below from the folder
/home/user/Drive-backup/2010 Backup/2010 Account/Jan
and it will report the number of non-blank lines per user.
#!/bin/bash
#save where we start
base=$(pwd)
# get all top-level dirs, skip '.'
D=$(find . \( -type d ! -name . -prune \))
for d in $D; do
cd $base
cd $d
# search for all files named *.ano and count their non-blank lines
sum=$(find . -type f -name '*.ano' -exec grep -cv '^[[:space:]]*$' {} \; | awk '{sum+=$0}END{print sum}')
echo $d $sum
done
This might be what you want (untested); it requires bash version 4 for associative arrays:
declare -A count
cd /home/user/Drive-backup
for userdir in */*/*/*; do
username=${userdir##*/}
lines=$(grep -cv '^[[:space:]]*$' "$userdir"/user.dir/*.ano | awk -F: '{sum += $NF} END {print sum+0}')
(( count[$username] += lines ))
done
for user in "${!count[#]}"; do
echo $user ${count[$user]}
done
Here's yet another way of doing it (on Mac OS X 10.6):
find -x "$PWD" -type f -iname "*.ano" -exec bash -c '
ar=( "${#%/*}" ) # perform a "dirname" command on every array item
printf "%s\000" "${ar[#]%/*}" # do a second "dirname" and add a null byte to every array item
' arg0 '{}' + | sort -uz |
while IFS="" read -r -d '' userDir; do
# to-do: customize output to get example output needed
echo "$userDir"
basename "$userDir"
find -x "${userDir}" -type f -iname "*.ano" -print0 |
xargs -0 -n 500 grep -hcv '^[[:space:]]*$' | awk '{ s+=$0 } END { print s }'
#xargs -0 -n 500 grep -cv '^[[:space:]]*$' | awk -F: '{ s+=$NF } END { print s }'
printf '%s\n' '----------'
done