Bash script with ssh ocassional bug - linux

I have a script which is doing ssh to a server and checking heap memory utilization of some java processes. However every now and then it seems the script does not have the expected behaviour.
The relevant part is the below :
else
result_line=$(ssh -q -T "$service_host" 2> /dev/null <<EOF
#Get the -Xmx value (eg 4G or 128M etc)
xmx=\$(ps aux | grep $service_pid | grep -v grep | awk -F'Xmx' '{print \$NF}' |cut -d' ' -f1)
#GB or MB ?
m_g=\${xmx: -1}
if [ \$m_g == "m" ] || [ \$m_g == "M" ]
then
#It's MB, no conversion needed
service_xmx=\${xmx::len-1}
else
#It's GB, so convert to MB
gb=\${xmx::len-1}
service_xmx=\$(awk "BEGIN {print \$gb * 1024}")
fi
name=$service_name
jstat_output=\$(jstat -gc $service_pid | grep -v S0C)
a=( \$jstat_output )
usage_KB=\$(echo "scale=2;\${a[2]} + \${a[3]} + \${a[5]} + \${a[7]} + \${a[9]} + \${a[11]}" | bc)
usage_MB=\$(echo "scale=2; \$usage_KB / 1024" | bc)
usage_perc=\$(awk "BEGIN {printf \"%.1f\n\", \$usage_MB / \$service_xmx * 100}")
paste <(printf %-32s "\$name") <(printf %16s "$service_host") <(printf %14s "\$usage_MB") <(printf %14s "\$service_xmx") <(printf %15s "\$usage_perc %%")
EOF
)
printf "$result_line\n"
fi
The script seems to occasionally failing to convert a service Xmx from GB to MB, and as such giving wrong heap utilization percentage. Specifically it happened on a service with -Xmx4g , and did not convert it to MB so $service_xmx ended up with value 4. This rarely happens, but happens.. The part doing the conversion is the below :
else
gb=\${xmx::len-1}
service_xmx=\$(awk "BEGIN {print \$gb * 1024}")
The issue is not systematic, I am unable to reproduce and really struggling to understand why this part fails occassionally... any idea anyone ?
Many thanks

this part: "ps aux | grep $service_pid" might not be very deterministic; add more prints in your code to see what is happening

Related

Identify processes running more than 3 hrs in linux

I want to find out processes running more than 3 hrs, I have written a command for this but it's not returning expected output
ps -u <user> -o pid,stime,pcpu,pmem,etime,cmd --sort=start_time | \
grep <searchString> | grep -v grep| awk '{print $5}' | \
sed 's/:|-/ /g;'| awk '{print $4" "$3" "$2" "$1"}' | \
awk '$1+$2*60+$3*3600+$4*86400 > 10800'
but it's printing the values of etime in output. But expected output is, command should print the values of "pid,stime,pcpu,pmem,etime,cmd"
I am not able to find exact issue with this.
You are executing "awk '{print $5}'" which is taking in the input and printing out only column 5 which in your case is "etime" , everything from this point on is lost.
If your system supports etimes (notice the s on the end), you can easily do this with
ps -eo pid,etimes,etime,comm,user,tty | awk '{if ( $2>10800) print $0}'
on a system not supporting etimes which has a standard output of etime which hh:mm:ss or just mm:ss if no hours have passed
ps -eo pid,etime,comm,user,tty | awk '{seconds_old=10800 ; split($2,a,":",sep) ; if(length(a) < 3) b = (a[1] *60) + (a[2]) ; else b=((a[1]*3600) + (a[2] *60) + (a[3])) ; if(b > seconds_old ) print $0}'
Adjust "seconds_old" to change the age you want to test for:
There are various other methods of doing this using Find for example:
explained here:
https://serverfault.com/questions/181477/how-do-i-kill-processes-older-than-t
However, the solution should match your expected output
Try this:
ps -u <user> -o pid,stime,pcpu,pmem,etime=,cmd --sort=start_time|grep <searchString>|while read z;do tago=$(echo $z|awk '{print $5}'|sed -E 's/(:|-)/ /g'| awk '{print $4+$3*60+$2*3600+$1*86400}');if [ $tago -ge 10800 ];then echo $z;fi;done
It prints only processes >= 10800 secs old.
You can readjust the output further to fit your needs.
Able to find running process for more than 3 hrs with below command.
ps -u <user> -o pid,stime,pcpu,pmem,etime,cmd --sort=start_time |grep -v grep|awk 'substr($0,23,2) > 3'

LINUX- Checking the internal CPU, RAM usage of JVM's

So I am trying to check the internal CPU and RAM usage from JVM's.
The set up is that we have a single server which hosts 39 JVM's each running its own service and has it's own unique CPU & RAM allocation. So for example it could be set up like the following:
CPU(Cores) RAM(MB)
JVM1 2 300
JVM2 1 50
JVM3 5 1024
These are fictional as I don't have the actual values to hand.
I know the PID's for each of the JVM's but I am wondering how would I see the CPU and RAM usage of each JVM on it's own dissregarding the Host systems usage. Also is it possible to pass multiple PID's into a jstat -gc as I will be looking to script this?.
I know that if I use:
ps -p <PID> -o %CPU %RAM
That will give me the CPU and RAM of that process on the host machine.
Any Help is greatly appreciated.
After playing around for a while I came up with the following script which pulls the memory usage and total memory in use to 2DP:
Script
#!/bin/sh
service=$(mktemp)
jstatop=$(mktemp)
ServiceStatsPre=$(mktemp)
datet=$(date +%Y-%m-%d)
hourt=$(date +%H)
ps -ef | grep java | grep Service | awk '{key=substr($9,31,match($9,"#")-31); serv[key]++}END{for(name in serv){if($9 ~ /Service/){print name}}}'>$service
printf "%-8sService\t\t\t\t%7s PID\t%7s Used\t%7s Total\n" > $ServiceStatsPre
filename=$service
while read -r line; do
if [ ! -z $line ]; then
pid=$(ps -ef | grep java | grep Service |awk -v svc="$line" '{if (match($9,svc)){print $2}}')
rncnt=0
rnag=1
while [ $rnag -eq 1 ]; do
jstat -gc $pid > $jstatop
if [ $? -ne 0 ]; then
sleep 5;
rncnt++;
else
rnag=0
fi
if [ $rncnt -eq 5 ]; then
rnag=0
fi
done
cat $jstatop | awk '{if (NR !=1) print}'|awk -v pid="$pid" -v svc="$line" -v d="$datet" -v h="$hourt" '{printf("%-40s %7d %6.2fMB %6.2fMB %11s %3s\n",svc,pid,($1+$5+$7)/1024,($3+$4+$6+$8)/1024,d,h)}' >> $ServiceStatsPre
fi
done < $filename
#printf "Date,Hour,Service,Used,Total\n" > Service_Stats.csv #Uncomment this line on initial Run to create file with headders
cat $ServiceStatsPre | awk '{if (NR !=1) print}' |
awk '{
printf("%-1s,%1s,%1s,%4.2f,%4.2f\n",$5,$6,$1,$3,$4);
}' >> Service_Stats.csv
rm $service;
rm $jstatop;
rm $ServiceStatsPre;
Breakdown
ps -ef | grep java | grep Service | awk '{key=substr($9,31,match($9,"#")-31); serv[key]++}END{for(name in serv){if($9 ~ /Service/){print name}}}'>$service
This is returning the name of the Service and putting it to a tempary file $service
printf "%-8sService\t\t\t\t%7s PID\t%7s Used\t%7s Total\n" > $ServiceStatsPre
This is just creating Headings in a Temp file $ServiceStatsPre (I know this isn't needed but I was originally using it to debug to a file not a temp file)
Outer While loop
filename=$service
if [ ! -z $line ]; then
while read -r line; do
pid=$(ps -ef | grep java | grep Service |awk -v svc="$line" '{if (match($9,svc)){print $2}}')
rncnt=0
rnag=1
...
cat $jstatop | awk '{if (NR !=1) print}'|awk -v pid="$pid" -v svc="$line" -v d="$datet" -v h="$hourt" '{printf("%-40s %7d %6.2fMB %6.2fMB %11s %3s\n",svc,pid,($1+$5+$7)/1024,($3+$4+$6+$8)/1024,d,h)}' >> $ServiceStatsPre
fi
done < $filename
This is first checking to see if the line that it is passing in is not empty (if [ ! -z $line ]; then) then it is extracting the PID for each of the services in the temp file $service and is setting a check (rnag) + retry counter (rncnt). The final part of this outer while loop is calculating the stats, memory usage in MB (($1+$5+$7)/1024), Total Memory used (($3+$4+$6+$8)/1024) as well as retuning the PID,Service Name,Date and Hour.
Inner While Loop
...
while [ $rnag -eq 1 ]; do
jstat -gc $pid > $jstatop
if [ $? -ne 0 ]; then
sleep 5;
rncnt++;
else
rnag=0
fi
if [ $rncnt -eq 5 ]; then
rnag=0
fi
done
...
This is to basically ensure that the jstat is running correctly and isn't returning an error.
The final part of this script:
#printf "Date,Hour,Service,Used,Total\n" > Service_Stats.csv #Uncomment this line on initial Run to create file with headders
cat $ServiceStatsPre | awk '{if (NR !=1) print}' |
awk '{
printf("%-1s,%1s,%1s,%4.2f,%4.2f\n",$5,$6,$1,$3,$4);
}' >> Service_Stats.csv
is just to resolve the formatting and put it into a format so that it can be used in a CSV.
Output
This script out puts like the following:
Date,Hour,Service,Used,Total
2016-07-03,07,undMFSAdapterServiceVM,331.00,188.87
2016-07-03,07,entSCSServiceVM,332.50,278.19
2016-07-03,07,kServiceVM,457.50,132.91
2016-07-03,07,ServiceTuTrackVm,432.00,282.66
2016-07-03,07,ter1WMSAdapterServiceVm,233.00,77.02
2016-07-03,07,kingMFSAdapterServiceVM,451.50,126.69
2016-07-03,07,erBuilderServiceVM,261.50,211.27
2016-07-03,07,ter3MFSAdapterServiceVM,449.50,210.23
2016-07-03,07,rServiceVM1,1187.00,529.26
2016-07-03,07,rServiceVM2,597.50,398.43
2016-07-03,07,rServiceVM3,2786.00,819.30
2016-07-03,07,rServiceVM4,451.50,163.13
2016-07-03,07,MessagingServiceVm,457.50,357.11
2016-07-03,07,viceVM,444.50,263.59
2016-07-03,07,ServiceVM,1910.50,909.19
2016-07-03,07,undPackingMFSAdapterServiceVM,208.00,113.51
2016-07-03,07,gisticLockServiceVM,245.00,173.05
2016-07-03,07,kingMFSAdapterServiceVM,781.50,327.13
2016-07-03,07,ferWMSAdapterServiceVm,196.00,84.02
2016-07-03,07,geMFSAdapterServiceVM,499.50,256.91
2016-07-03,07,ferMFSAdapterServiceVM,456.50,246.89
2016-07-03,07,kingWMSAdapterServiceVm,195.00,73.70
2016-07-03,07,AdapterServiceVm,149.50,72.62
2016-07-03,07,ter2MFSAdapterServiceVM,455.00,136.02
2016-07-03,07,ionServiceVM,484.00,240.46
2016-07-03,07,ter3WMSAdapterServiceVm,266.00,138.70
2016-07-03,07,iceVm,135.50,106.07
2016-07-03,07,viceVm,3317.00,1882.15
2016-07-03,07,entBCSServiceVM,356.50,143.93
2016-07-03,07,ServiceVm,951.00,227.12
2016-07-03,07,lingServiceVM,145.50,76.61
2016-07-03,07,entDBHServiceVM,182.50,4.63
2016-07-03,07,kingWMSAdapterServiceVm,208.00,103.13
2016-07-03,07,gingServiceVM,1529.50,235.84
2016-07-03,07,ServiceVM,249.50,131.78
2016-07-03,07,ter1MFSAdapterServiceVM,453.00,394.11
2016-07-03,07,AdapterServiceVM,461.00,208.41
2016-07-03,07,ter2WMSAdapterServiceVm,178.50,79.93
2016-07-03,07,AdapterServiceVm,395.00,131.08
2016-07-03,07,ingServiceVM,184.50,126.28
The $3 has been trimmed just to hide service names etc..
Helpful Links
A nice explanation of jstat -gc output

Linux - Extract core utilization for a single process to a file

I am trying to figure out how to extract the cpu core utilization for a SINGLE process in linux and parse it. I know that I can get the overall core utilization via top and then press "1". I am already able to parse that. However, now I want to do the same thing for a single process. I tried it with ps and calculating the core utilization myself but I am not so sure if my script is accurate enough, something seems off. (Note that this version calculates overall core utilization since it is WIP) I get errors like this after a while in my terminal:
test.sh: line 31: +: syntax error: operand expected (error token is
"+")
I cannot figure out why this error just randomly occurs.
#!/bin/env bash
read -p "Enter PID to observe:" pid
run=false
ps -p $pid -L -o cputime,etime,psr,pcpu
for (( u = 1 ; u <= 100 ; u++ ))
do
lines=$(ps -p $pid -L -o psr,pcpu | awk 'END{print NR}')
for (( i=1; i<=$lines; i++))
do
core=$(ps -p $pid -L -o psr,pcpu | awk 'NR=='$i'{print $1}')
if [ $run == false ]
then cpuTimeS=+$(ps -p $pid -L -o cputime,etime | awk 'NR=='$i'{print $1}' | awk -F : '{ printf("%.2f\n", $1*60+$2*60+$3); }')
elapsedTimeS=+$(ps -p $pid -L -o cputime,etime | awk 'NR=='$i'{print $2}' | awk -F : '{ printf("%.2f\n", $1*60+$2*60+$3); }')
cpuTimeSi=${cpuTimeS%.*}
elapsedTimeSi=${elapsedTimeS%.*}
cpuTimeSiResult=$(( cpuTimeSi + cpuTimeSiResult ))
elapsedTimeSiResult=$(( elapsedTimeSi + elapsedTimeSiResult ))
else
cpuTimeE=+$(ps -p $pid -L -o cputime,etime | awk 'NR=='$i'{print $1}' | awk -F : '{ printf("%.2f\n", $1*60+$2*60+$3); }')
elapsedTimeE=+$(ps -p $pid -L -o cputime,etime | awk 'NR=='$i'{print $2}' | awk -F : '{ printf("%.2f\n", $1*60+$2*60+$3); }')
cpuTimeEi=${cpuTimeE%.*}
elapsedTimeEi=${elapsedTimeE%.*}
cpuTimeEiResult=$(( cpuTimeEi + cpuTimeEiResult ))
elapsedTimeEiResult=$(( elapsedTimeEi + elapsedTimeEiResult ))
fi
done
if [ "$run" = true ]
then result=$( echo "scale=2; ($cpuTimeEiResult - $cpuTimeSiResult) / ($elapsedTimeEiResult - $elapsedTimeSiResult) * 100.0" | bc)
echo "RESULT:" $result
echo "cpuTimeSTART:" $cpuTimeSiResult
echo "elapsedTimeSTART:" $elapsedTimeSiResult
echo "cpuTimeEND:" $cpuTimeEiResult
echo "elapsedTimeEND:" $elapsedTimeEiResult
fi
sleep 1
if [ "$run" = false ]
then run=true
else
run=false
fi
done
Are there any ideas on how I could solve this better?
I am glad for any advice
Looking at your actual request to get the CPU utilization from a particular process; this could be done in less lines; e.g. below is the CPU usage after giving a seq command.
#using the process name
$ ps -C seq -o %cpu
%CPU
10.4
#using the process id
$ ps -p 5710 -o %cpu
%CPU
10.4

grep a single word from multiple results

I am trying to write a shell script to monitor file system. script logic is,
for each file system from df -H command, read the file system threshold file and get the critical threshold, warning threshold. Based on the condition, it will send notification.
Here is my script:
#!/bin/sh
df -H | grep -vE '^Filesystem|none|boot|tmp|tmpfs' | awk '{ print $5 " " $6 }' | while read $output
do
echo $output
fsuse=$(echo $output | awk '{ print $1}' | cut -d'%' -f1 )
fsname=$(echo $output | awk '{ print $2 }' )
server=`cat /workspace/OSE/scripts/fs_alert|grep -w $fsname|awk -F":" '{print $2}'`
fscrit=`cat /workspace/OSE/scripts/fs_alert|grep -w $fsname|awk -F":" '{print $3}'`
fswarn=`cat /workspace/OSE/scripts/fs_alert|grep -w $fsname|awk -F":" '{print $4}'`
serenv=`cat /workspace/OSE/scripts/fs_alert|grep -w $fsname|awk -F":" '{print $5}'`
if [ $fsuse -ge $fscrit ]; then
message="CRITICAL:${server}:${serenv}:$fsname Is $fsuse Filled"
_notify;
elif [ $fsuse -gt $fswarn ] && [ $fsuse -lt $fscrit ]; then
message="WARNING: $fsname is $fsuse Filled"
_notify;
else
echo "File system space looks good"
fi
done
Here is /workspace/OSE/scripts/fs_alert:
/:hlpdbq001:90:80:QA:dba_mail
/dev/shm:hlpdbq001:90:80:QA:dba_mail
/boot:hlpdbq001:90:80:QA:dba_mail
/home:hlpdbq001:90:80:QA:dba_mail
/opt:hlpdbq001:90:80:QA:dba_mail
/opt/security:hlpdbq001:90:80:QA:dba_mail
/tmp:hlpdbq001:90:80:QA:dba_mail
/var:hlpdbq001:90:80:QA:dba_mail
/u01/app:hlpdbq001:90:80:QA:dba_mail
/u01/app/oracle:hlpdbq001:90:80:QA:dba_mail
/oratrace:hlpdbq001:90:80:QA:dba_mail
/u01/app/emagent:hlpdbq001:90:80:QA:dba_mail
/gg:hlpdbq001:90:80:QA:dba_mail
/workspace:hlpdbq001:90:80:QA:dba_mail
/dbaudit:hlpdbq001:90:80:QA:dba_mail
/tools:hlpdbq001:90:80:QA:dba_mail
My problem is when the script is trying to get crit_va, warn_val from the file for /u01 file system, I am getting three results. How do I get/filter one file system at a time?
$ df -H|grep /u01
/dev/mapper/datavg-gridbaselv 53G 12G 39G 24% /u01/app
/dev/mapper/datavg-rdbmsbaselv 53G 9.6G 41G 20% /u01/app/oracle
/dev/mapper/datavg-oemagentlv 22G 980M 20G 5% /u01/app/emagent
what is the best way to handle this issue?
do i need logic based on Filesystem or Mounted on.
Don't reinvent the wheel. There are tools out there that can do this for your. Try monit for example:
http://sysadminman.net/blog/2011/monit-disk-space-monitoring-1716
well, monit is fine ), if you need alternative take a look at df-check - a wrapper for df utility to verify that thresholds are not exceeded , on per partition basis. At least it seems very close to what you started to implement in your bash script, but it's written on perl and has a neat and simple installation layout. ready to use tool.
-- Regards
PS discloser - I am the tool author

Idle time of a process in Linux

I need to calculate CPU usage (user mode, system mode, idle time) of a process in Linux.
I am able to calculate usage in user and system mode using utime and stime values from /proc/PID/stat, but I found nothing which is related to idle time.
I know I can get idle time from /proc/stat but this value is related to machine, not for particular process.
Is it possible to calculate idle time of a process knowing its PID (reading data from /proc directory)?
I don't know much about it but maybe the following works:
1) Get the process start up time. Im sure thats possible
2) Generate time difference (dTime = CurrentTime - TimeProcessStarted)
3) Substract the time the process is running ( dTime - (usageSystemMode + usageUserMode))
Hope this helps! :D
it's too late but, I guessed this command useful:
IFS=$'\n';for i in `ps -eo uname:20,pid,cmd | grep -v "USER\|grep\|root"`; \
do if [ $(id -g `echo $i | cut -d" " -f1`) -gt 1000 ] && \
[ $(echo $((($(date +%s) - $(date -d "$(ll -u \
--time-style=+"%y-%m-%d %H:%M:%S" /proc/$(echo $i | \
awk '{print $2}')/cwd | awk '{print $6" "$7}')" +%s))/3600))) >=1 ]; \
then echo $i; fi; done
to use it in bash file:
#!/bin/bash
IFS=$'\n'
for i in `ps -eo uname:20,pid,cmd | grep -v "USER\|grep\|root"`
do
Name="`echo $i | cut -d' ' -f1`"
Id="$(id -g $Name)"
Pid="`echo $i | awk '{print $2}'`"
Time1=$(date +%s)
Time2=$(date -d "$(/usr/bin/ls -lu --time-style=+"%y-%m-%d %H:%M:%S" \
/proc/$Pid/cwd | awk '{print $6" "$7}')" +%s)/3600
Time=$Time1-$Time2
if [ $Id -gt 1000 ] && [ $Time >=1 ]
then
echo $i
fi
done
you could change grep -v "grep\|root" as you wish.
this one line command list all processes which not root owner or system users.

Resources