How to find out which processes are using swap space in Linux?
Under Linux, how do I find out which process is using the most swap space?
The best script I found is on this page: http://northernmost.org/blog/find-out-what-is-using-your-swap/
Here's one variant of the script, and no root is needed:
#!/bin/bash
# Get current swap usage for all running processes
# Erik Ljungstrom 27/05/2011
# Modified by Mikko Rantalainen 2012-08-09
# Pipe the output to "sort -nk3" to get sorted output
# Modified by Marc Methot 2014-09-18
# removed the need for sudo
SUM=0
OVERALL=0
for DIR in `find /proc/ -maxdepth 1 -type d -regex "^/proc/[0-9]+"`
do
    PID=`echo $DIR | cut -d / -f 3`
    PROGNAME=`ps -p $PID -o comm --no-headers`
    for SWAP in `grep VmSwap $DIR/status 2>/dev/null | awk '{ print $2 }'`
    do
        let SUM=$SUM+$SWAP
    done
    if (( $SUM > 0 )); then
        echo "PID=$PID swapped $SUM KB ($PROGNAME)"
    fi
    let OVERALL=$OVERALL+$SUM
    SUM=0
done
echo "Overall swap used: $OVERALL KB"
Run top, then press O (capital letter o), then p, then Enter. Now processes should be sorted by their swap usage.
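If your top comes from a reasonably recent procps-ng (where the -o flag selects the sort field), you can get a similar ordering non-interactively; a sketch, assuming that flag is available:
top -b -n 1 -o SWAP | head -n 20
Keep in mind the caveat below about how older versions of top computed the SWAP column.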
Here is an update, as my original answer does not provide an exact answer to the problem, as pointed out in the comments. From the htop FAQ:
It is not possible to get the exact size of used swap space of a
process. Top fakes this information by making SWAP = VIRT - RES, but
that is not a good metric, because other stuff such as video memory
counts on VIRT as well (for example: top says my X process is using
81M of swap, but it also reports my system as a whole is using only 2M
of swap). Therefore, I will not add a similar Swap column to htop
because I don't know a reliable way to get this information (actually,
I don't think it's possible to get an exact number, because of shared
pages).
Here's another variant of the script, but meant to give more readable output (you need to run this as root to get exact results):
#!/bin/bash
# find-out-what-is-using-your-swap.sh
# -- Get current swap usage for all running processes
# --
# -- rev.0.3, 2012-09-03, Jan Smid - alignment and indentation, sorting
# -- rev.0.2, 2012-08-09, Mikko Rantalainen - pipe the output to "sort -nk3" to get sorted output
# -- rev.0.1, 2011-05-27, Erik Ljungstrom - initial version
SCRIPT_NAME=`basename $0`;
SORT="kb"; # {pid|kB|name} as first parameter, [default: kb]
[ "$1" != "" ] && { SORT="$1"; }
[ ! -x `which mktemp` ] && { echo "ERROR: mktemp is not available!"; exit; }
MKTEMP=`which mktemp`;
TMP=`${MKTEMP} -d`;
[ ! -d "${TMP}" ] && { echo "ERROR: unable to create temp dir!"; exit; }
>${TMP}/${SCRIPT_NAME}.pid;
>${TMP}/${SCRIPT_NAME}.kb;
>${TMP}/${SCRIPT_NAME}.name;
SUM=0;
OVERALL=0;
echo "${OVERALL}" > ${TMP}/${SCRIPT_NAME}.overal;
for DIR in `find /proc/ -maxdepth 1 -type d -regex "^/proc/[0-9]+"`;
do
    PID=`echo $DIR | cut -d / -f 3`
    PROGNAME=`ps -p $PID -o comm --no-headers`
    for SWAP in `grep '^Swap:' $DIR/smaps 2>/dev/null | awk '{ print $2 }'`
    do
        let SUM=$SUM+$SWAP
    done
    if (( $SUM > 0 )); then
        echo -n ".";
        echo -e "${PID}\t${SUM}\t${PROGNAME}" >> ${TMP}/${SCRIPT_NAME}.pid;
        echo -e "${SUM}\t${PID}\t${PROGNAME}" >> ${TMP}/${SCRIPT_NAME}.kb;
        echo -e "${PROGNAME}\t${SUM}\t${PID}" >> ${TMP}/${SCRIPT_NAME}.name;
    fi
    let OVERALL=$OVERALL+$SUM
    SUM=0
done
echo "${OVERALL}" > ${TMP}/${SCRIPT_NAME}.overal;
echo;
echo "Overall swap used: ${OVERALL} kB";
echo "========================================";
case "${SORT}" in
name )
echo -e "name\tkB\tpid";
echo "========================================";
cat ${TMP}/${SCRIPT_NAME}.name|sort -r;
;;
kb )
echo -e "kB\tpid\tname";
echo "========================================";
cat ${TMP}/${SCRIPT_NAME}.kb|sort -rh;
;;
pid | * )
echo -e "pid\tkB\tname";
echo "========================================";
cat ${TMP}/${SCRIPT_NAME}.pid|sort -rh;
;;
esac
rm -fR "${TMP}/";
Use smem
smem -s swap -r
Here is a link which tells you both how to install it and how to use it: http://www.cyberciti.biz/faq/linux-which-process-is-using-swap/
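If smem is not already on your system, it is usually available from the standard repositories; a sketch, noting that package names can vary by distribution:
sudo apt-get install smem    # Debian/Ubuntu
sudo yum install smem        # RHEL/CentOS (via EPEL)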
It's not entirely clear whether you want to find the process that has the most pages swapped out, or the process that caused the most pages to be swapped out.
For the first, you can run top and order by swap (press 'O', then 'p'); for the latter, you can run vmstat and look for non-zero entries in the 'so' column.
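For example, a minimal vmstat check, sampling once per second, five times (si and so report swap-in and swap-out activity per second):
vmstat 1 5
A steady run of non-zero values under so means pages are actively being written out to swap; an occasional blip is usually harmless.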
Another script variant avoiding the loop in shell:
#!/bin/bash
grep VmSwap /proc/[0-9]*/status | awk -F':' -v sort="$1" '
{
    split($1,pid,"/")                      # Split first field on /
    split($3,swp," ")                      # Split third field on space
    cmdlinefile = "/proc/"pid[3]"/cmdline" # Build the cmdline filepath
    getline pname[pid[3]] < cmdlinefile    # Get the command line from pid
    swap[pid[3]] = sprintf("%6i %s",swp[1],swp[2]) # Store the swap used (with unit, to avoid rebuilding at print)
    sum+=swp[1]                            # Sum the swap
}
END {
    OFS="\t"                               # Change the output separator to tabulation
    print "Pid","Swap used","Command line" # Print header
    if(sort) {
        getline max_pid < "/proc/sys/kernel/pid_max"
        for(p=1;p<=max_pid;p++) {
            if(p in pname) print p,swap[p],pname[p] # Print the values in pid order
        }
    } else {
        for(p in pname) {                  # Loop over all pids found
            print p,swap[p],pname[p]       # Print the values
        }
    }
    print "Total swap used:",sum           # Print the sum
}'
Standard usage is script.sh to get the usage per program in arbitrary order (down to how awk stores its hashes), or script.sh 1 to sort the output by pid.
I hope I've commented the code enough to make clear what it does.
Yet two more variants:
Because top or htop may not be installed on small systems, browsing /proc always remains possible.
Even on small systems, you will find a shell...
A shell variant! (Not bash-only)
This is exactly the same as lolotux's script, but without a single fork to grep, awk or ps. It is a lot quicker!
And as bash is one of the slowest shells when it comes to performance, a little work was done to ensure this script runs well under dash, busybox and some others. Then (thanks to Stéphane Chazelas) it became a lot quicker again!
#!/bin/sh
# Get current swap usage for all running processes
# Felix Hauri 2016-08-05
# Rewritten without forks. Inspired by the original script by
# Erik Ljungstrom 27/05/2011
# Modified by Mikko Rantalainen 2012-08-09
# Pipe the output to "sort -nk3" to get sorted output
# Modified by Marc Methot 2014-09-18
# removed the need for sudo
OVERALL=0
for FILE in /proc/[0-9]*/status ;do
    SUM=0
    while read FIELD VALUE;do
        case $FIELD in
            Pid: )    PID=$VALUE ;;
            Name: )   PROGNAME="$VALUE" ;;
            VmSwap: ) SUM=${VALUE%% *} ; break ;;
        esac
    done <$FILE
    [ $SUM -gt 0 ] &&
        printf "PID: %9d swapped: %11d KB (%s)\n" $PID $SUM "$PROGNAME"
    OVERALL=$((OVERALL+SUM))
done
printf "Total swapped memory: %14u KB\n" $OVERALL
Don't forget to double-quote "$PROGNAME"! See Stéphane Chazelas's comment:
read FIELD PROGNAME < <(
perl -ne 'BEGIN{$0="/*/*/../../*/*"} print if /^Name/' /proc/self/status
)
echo $FIELD "$PROGNAME"
Don't try echo $PROGNAME without double quotes on a sensitive system, and be ready to kill the current shell first!
And a Perl version
As this has become a not-so-simple script, the time has come to write a dedicated tool in a more efficient language.
#!/usr/bin/perl -w
use strict;
use Getopt::Std;
my ($tot,$mtot)=(0,0);
my %procs;
my %opts;
getopt('', \%opts);
sub sortres {
    return $a <=> $b if $opts{'p'};
    return $procs{$a}->{'cmd'} cmp $procs{$b}->{'cmd'} if $opts{'c'};
    return $procs{$a}->{'mswap'} <=> $procs{$b}->{'mswap'} if $opts{'m'};
    return $procs{$a}->{'swap'} <=> $procs{$b}->{'swap'};
};
opendir my $dh,"/proc";
for my $pid (grep {/^\d+$/} readdir $dh) {
    if (open my $fh,"</proc/$pid/status") {
        my ($sum,$nam)=(0,"");
        while (<$fh>) {
            $sum+=$1 if /^VmSwap:\s+(\d+)\s/;
            $nam=$1 if /^Name:\s+(\S+)/;
        }
        if ($sum) {
            $tot+=$sum;
            $procs{$pid}->{'swap'}=$sum;
            $procs{$pid}->{'cmd'}=$nam;
            close $fh;
            if (open my $fh,"</proc/$pid/smaps") {
                $sum=0;
                while (<$fh>) {
                    $sum+=$1 if /^Swap:\s+(\d+)\s/;
                };
            };
            $mtot+=$sum;
            $procs{$pid}->{'mswap'}=$sum;
        } else { close $fh; };
    };
};
map {
    printf "PID: %9d swapped: %11d (%11d) KB (%s)\n",
        $_, $procs{$_}->{'swap'}, $procs{$_}->{'mswap'}, $procs{$_}->{'cmd'};
} sort sortres keys %procs;
printf "Total swapped memory: %14u (%11u) KB\n", $tot,$mtot;
It can be run with one of:
-c sort by command name
-p sort by pid
-m sort by swap values
By default, the output is sorted by the VmSwap value from status.
The top command also contains a field to display the number of page faults for a process. The process with the most page faults is usually the one doing the most swapping.
For long-running daemons it might be that they incur a large number of page faults at startup and the number does not increase later on. So we need to observe whether the page fault count is still increasing.
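top is not the only way to watch the fault counters; on Linux, ps can print them per process. A minimal sketch (PID 1234 is just a placeholder):
watch -n 5 'ps -o pid,min_flt,maj_flt,comm -p 1234'
maj_flt counts faults that required I/O (which includes swap-ins), so a steadily climbing maj_flt on a swapping system points at that process.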
I adapted a different script on the web to this long one-liner:
{ date;for f in /proc/[0-9]*/status; do
awk '{k[$1]=$2} END { if (k["VmSwap:"]) print k["Pid:"],k["Name:"],k["VmSwap:"];}' $f 2>/dev/null;
done | sort -n ; }
I then run this from a cronjob and redirect the output to a logfile. The information here is the same as accumulating the Swap: entries in the smaps file, but if you want to be sure, you can use:
{ date;for m in /proc/*/smaps;do
awk '/^Swap:/ {s+=$2} END { if (s) print FILENAME,s }' $m 2>/dev/null;
done | tr -dc ' [0-9]\n' |sort -k 1n; }
The output of this version is in two columns: pid, swap amount. In the above version, the tr strips the non-numeric components. In both cases, the output is sorted numerically by pid.
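A hedged sketch of the cron setup just described, where the script path, log path and schedule are all placeholders: save the one-liner as a script, then add a crontab entry such as
*/15 * * * * /usr/local/bin/swap-by-pid.sh >> /var/log/swap-by-pid.log 2>&1
Since each run starts with a date line, the log separates naturally into timestamped samples.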
This gives totals and percentages for processes using swap:
smem -t -p
Source: https://www.cyberciti.biz/faq/linux-which-process-is-using-swap/
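The two smem invocations in this thread can also be combined; for example, totals and percentages sorted by swap in descending order (all standard smem options):
smem -t -p -s swap -r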
On Mac OS X, you run the top command as well, but you need to type "o", then "vsize", then Enter.
Since the 2015 kernel patch that added SwapPss (https://lore.kernel.org/patchwork/patch/570506/), one can finally get a proportional swap count. That means that if a process has swapped a lot and then forks, both forked processes will be reported as swapping 50% each; and if either then forks again, each process is counted as using 33% of the swapped pages. If you add all those swap usages together, you get the real swap usage instead of a value multiplied by the process count.
In short:
(cd /proc; for pid in [0-9]*; do printf "%5s %6s %s\n" "$pid" "$(awk 'BEGIN{sum=0} /SwapPss:/{sum+=$2} END{print sum}' $pid/smaps)" "$(cat $pid/comm)"; done | sort -k2n,2 -k1n,1)
The first column is the pid, the second column is swap usage in KiB, and the rest of the line is the command being executed. Identical swap counts are sorted by pid.
The above may emit lines such as
awk: cmd. line:1: fatal: cannot open file `15407/smaps' for reading (No such file or directory)
which simply means that the process with pid 15407 ended between the moment it was listed in /proc/ and the moment its smaps file was read. If that matters to you, simply add 2>/dev/null to the end; note that you'll then potentially lose other diagnostics as well.
In a real-world case, this changed other tools' report of ~40 MB swap usage for each apache child running on one server to an actual usage of between 7 and 3630 KB per child.
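On kernels that provide /proc/[pid]/smaps_rollup (Linux 4.14 and later), the kernel pre-aggregates the smaps fields, including SwapPss, so each process needs only one cheap read instead of a walk over every mapping; a sketch of the same report under that assumption:
(cd /proc; for pid in [0-9]*; do printf "%5s %6s %s\n" "$pid" "$(awk '/^SwapPss:/{print $2}' $pid/smaps_rollup 2>/dev/null)" "$(cat $pid/comm 2>/dev/null)"; done | sort -k2n,2 -k1n,1)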
This is my one-liner:
cat /proc/*/status | grep -E 'VmSwap:|Name:' | grep VmSwap -B1 | cut -d':' -f2 | grep -v '\-\-' | grep -o -E '[a-zA-Z0-9]+.*$' | cut -d' ' -f1 | xargs -n2 echo | sort -k2 -n
The steps in this line are:
Get all the data in /proc/process/status for all processes
Select the fields VmSwap and Name for each
Remove the processes that don't have the VmSwap field
Remove the names of the fields (VmSwap: and Name:)
Remove lines with -- that were added by the previous step
Remove the spaces at the start of the lines
Remove the second part of each process name and " kB" after the swap usage number
Take name and number (process name and swap usage) and put them in one line, one after the other
Sort the lines by the swap usage
I suppose you could get a good guess by running top and looking for active processes using a lot of memory. Doing this programmatically is harder; just look at the endless debates about the Linux OOM killer heuristics.
Swapping is a function of having more memory in active use than is installed, so it is usually hard to blame it on a single process. If it is an ongoing problem, the best solution is to install more memory, or make other systemic changes.
Here's a version that outputs the same as the script by @lolotux, but is much faster (while less readable).
That loop takes about 10 seconds on my machine; my version takes 0.019 s, which mattered to me because I wanted to make it into a CGI page.
join -t / -1 3 -2 3 \
<(grep VmSwap /proc/*/status |egrep -v '/proc/self|thread-self' | sort -k3,3 --field-separator=/ ) \
<(grep -H '' --binary-files=text /proc/*/cmdline |tr '\0' ' '|cut -c 1-200|egrep -v '/proc/self|/thread-self'|sort -k3,3 --field-separator=/ ) \
| cut -d/ -f1,4,7- \
| sed 's/status//; s/cmdline//' \
| sort -h -k3,3 --field-separator=:\
| tee >(awk -F: '{s+=$3} END {printf "\nTotal Swap Usage = %.0f kB\n",s}') /dev/null
I don't know of a direct answer as to how to find exactly which process is using the swap space; however, this link may be helpful. Another good one is over here.
Also, use a good tool like htop to see which processes are using a lot of memory and how much swap overall is being used.
iotop is a very useful tool. It gives live stats of I/O and swap usage per process/thread. By default it shows per-thread figures, but you can run iotop -P to get per-process info. It is not installed by default; you may have to install it via rpm/apt.
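For example, to show only processes actually doing I/O, aggregated per process with accumulated counters (the -o, -P and -a options; iotop generally needs root):
sudo iotop -o -P -a
The SWAPIN column then shows the percentage of time each process spent waiting for swapped-in pages.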
You can use Procpath (author here) to simplify parsing of VmSwap from /proc/$PID/status.
$ procpath record -f stat,cmdline,status -r 1 -d db.sqlite
$ sqlite3 -column db.sqlite \
'SELECT status_name, status_vmswap FROM record ORDER BY status_vmswap DESC LIMIT 5'
Web Content 192136
okular 186872
thunderbird 183692
Web Content 143404
MainThread 86300
You can also plot VmSwap of processes of interest over time like this. Here I'm recording my Firefox process tree while opening a couple dozen tabs and starting a memory-hungry application to try to make it swap (which wasn't convincing for Firefox, but your mileage may vary).
$ procpath record -f stat,cmdline,status -i 1 -d db2.sqlite \
'$..children[?(#.stat.pid == 6029)]'
# interrupt by Ctrl+C
$ procpath plot -d db2.sqlite -q cpu --custom-value-expr status_vmswap \
--title "CPU usage, % vs Swap, kB"
The same answer as @lolotux, but with sorted output:
printf 'Computing swap usage...\n';
swap_usages="$(
SUM=0
OVERALL=0
for DIR in `find /proc/ -maxdepth 1 -type d -regex "^/proc/[0-9]+"`
do
PID="$(printf '%s' "$DIR" | cut -d / -f 3)"
PROGNAME=`ps -p $PID -o comm --no-headers`
for SWAP in `grep VmSwap $DIR/status 2>/dev/null | awk '{ print $2 }'`
do
let SUM=$SUM+$SWAP
done
if (( $SUM > 0 )); then
printf "$SUM KB ($PROGNAME) swapped PID=$PID\\n"
fi
let OVERALL=$OVERALL+$SUM
SUM=0
break
done
printf '9999999999 Overall swap used: %s KB\n' "$OVERALL"
)"
printf '%s' "$swap_usages" | sort -nk1
Example output:
Computing swap usage...
2064 KB (systemd) swapped PID=1
59620 KB (xfdesktop) swapped PID=21405
64484 KB (nemo) swapped PID=763627
66740 KB (teamviewerd) swapped PID=1618
68244 KB (flameshot) swapped PID=84209
763136 KB (plugin_host) swapped PID=1881345
1412480 KB (java) swapped PID=43402
3864548 KB (sublime_text) swapped PID=1881327
9999999999 Overall swap used: 6301316 KB
I use this; it's useful if you only have /proc and nothing else. Just set nr to the number of top swappers you want to see, and it will tell you the process name, swap footprint (MB) and its full process line from ps -ef:
nr=10;for pid in $(for file in /proc/[0-9]*/status ; do awk '/VmSwap|Name|^Pid/{printf $2 " " $3}END{ print ""}' $file; done | sort -k 3 -n -r|head -${nr}|awk '{ print $2 }');do awk '/VmSwap|Name|^Pid/{printf $2 " " $3}END{ print ""}' /proc/$pid/status|awk '{print $1" "$2" "$3/1024" MB"}'|sed -e 's/\.[0-9]*//g';ps -ef|awk -v p=$pid '$2==p {print}';echo;done
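For readability, here is a hedged multi-line sketch of the same approach (same /proc parsing and top-N selection; the ps line is matched with awk -v to avoid quoting pitfalls):
nr=10
for pid in $(
    for file in /proc/[0-9]*/status; do
        # One line per process: "name pid swap kB"
        awk '/VmSwap|Name|^Pid/{printf $2 " " $3} END{print ""}' "$file"
    done | sort -k3 -n -r | head -n "$nr" | awk '{print $2}'
); do
    awk '/VmSwap|Name|^Pid/{printf $2 " " $3} END{print ""}' /proc/"$pid"/status |
        awk '{printf "%s %s %d MB\n", $1, $2, $3/1024}'  # name, pid, swap in MB
    ps -ef | awk -v p="$pid" '$2 == p'                   # full process line
    echo
done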