Writing bash code for performance standards - linux

Is there a better way to rewrite this code for better performance?
When it has to process a large number of IPs, the system seems to hang.
TMP_PREFIX='/tmp/synd'
TMP_FILE="mktemp $TMP_PREFIX.XXXXXXXX"
BANNED_IP_MAIL=`$TMP_FILE`
BANNED_IP_LIST=`$TMP_FILE`
echo "Banned the following ip addresses on `date`" > $BANNED_IP_MAIL
echo >> $BANNED_IP_MAIL
BAD_IP_LIST=`$TMP_FILE`
netstat -ntu | grep SYN_RECV | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr > $BAD_IP_LIST
cat $BAD_IP_LIST
if [ $KILL -eq 1 ]; then
IP_BAN_NOW=0
while read line; do
CURR_LINE_CONN=$(echo $line | cut -d" " -f1)
CURR_LINE_IP=$(echo $line | cut -d" " -f2)
if [ $CURR_LINE_CONN -lt $NO_OF_CONNECTIONS ]; then
break
fi
IGNORE_BAN=`grep -c $CURR_LINE_IP $IGNORE_IP_LIST`
if [ $IGNORE_BAN -ge 1 ]; then
continue
fi
IP_BAN_NOW=1
echo "$CURR_LINE_IP with $CURR_LINE_CONN SYN_RECV connections" >> $BANNED_IP_MAIL
echo $CURR_LINE_IP >> $BANNED_IP_LIST
echo $CURR_LINE_IP >> $IGNORE_IP_LIST
if [ $CSF_BAN -eq 1 ]; then
$CSF -d $CURR_LINE_IP
else
$IPT -I INPUT -s $CURR_LINE_IP -j DROP
fi
done < $BAD_IP_LIST
if [ $IP_BAN_NOW -eq 1 ]; then
dt=`date`
hn=`hostname`
if [ $EMAIL_TO != "" ]; then
cat $BANNED_IP_MAIL | mail -s "IP addresses banned on $dt $hn" $EMAIL_TO
fi
fi
fi
rm -f $TMP_PREFIX.*

Sure, there are lots of ways that can be improved, but you should try to figure out where the real bottleneck is. (It may well be iptables, in which case you might want to try to do all the table updates in a single invocation instead of one at a time. But I'm just guessing.)
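If iptables does turn out to be the problem, one way to batch the updates (just a sketch, assuming iptables-restore is available and that $BANNED_IP_LIST holds one address per line, as in your script) would be:
{
    echo '*filter'
    while read -r ip; do
        echo "-I INPUT -s $ip -j DROP"
    done < "$BANNED_IP_LIST"
    echo 'COMMIT'
} | iptables-restore --noflush
That spawns a single iptables-restore process instead of one iptables process per banned address.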
Here are a few suggestions; I didn't read all the way through:
netstat -ntu | grep SYN_RECV | awk '{print $5}' | cut -d: -f1 |
sort | uniq -c | sort -nr > $BAD_IP_LIST
If you're only interested in connections in SYN_RECV state, why list udp? Anyway, you're using three utilities (grep, awk and cut) to do one simple line-oriented action. You might as well just do it all in one, for example awk:
awk '$6 == "SYN_RECV" {print substr($5, 1, index($5, ":") - 1)}'
In fact, you could do the uniquifying and counting in awk as well:
awk '$6 == "SYN_RECV" {++ip[substr($5, 1, index($5, ":") - 1)]} END{for (i in ip) print ip[i], i}'
Edit: you could also filter by required count here:
awk '$6 == "SYN_RECV" {++ip[substr($5, 1, index($5, ":") - 1)]}
END {for (i in ip) if (ip[i] >= '$NO_OF_CONNECTIONS') print ip[i], i}'
Now you only need to output the ip address, since you no longer need to filter in the bash script. I don't know if that's faster than piping through sort and uniq and sort again, but it might very well be.
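If you want to check, you can time both pipelines yourself (a rough comparison only; the numbers depend on your system and the number of connections):
time ( netstat -nt | grep SYN_RECV | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr > /dev/null )
time ( netstat -nt | awk '$6 == "SYN_RECV" {++ip[substr($5, 1, index($5, ":") - 1)]} END {for (i in ip) print ip[i], i}' > /dev/null )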
while read line; do
CURR_LINE_CONN=$(echo $line | cut -d" " -f1)
CURR_LINE_IP=$(echo $line | cut -d" " -f2)
if [ $CURR_LINE_CONN -lt $NO_OF_CONNECTIONS ]; then
break
fi
You want to read two fields from stdin. Why don't you just do that:
while read CURR_LINE_CONN CURR_LINE_IP IGNORED &&
((CURR_LINE_CONN >= NO_OF_CONNECTIONS)); do
That saves two subshells and two cut invocations. (The IGNORED in the read built-in is just paranoia, since there will only be two fields output by awk. It's not good paranoia, though, because it silently ignores errors.)
Edit: as above, you could get rid of the test here, too. So it would just be:
netstat -nt |
awk '$6 == "SYN_RECV" {++ip[substr($5, 1, index($5, ":") - 1)]}
END { for (i in ip)
if (ip[i] >= '$NO_OF_CONNECTIONS')
print ip[i], i}' | tee $BAD_IP_LIST
if ((KILL)); then
IP_BAN_NOW=0
while read IP IGNORED; do
Next:
IGNORE_BAN=`grep -c $CURR_LINE_IP $IGNORE_IP_LIST`
if [ $IGNORE_BAN -ge 1 ]; then
continue
fi
grep -c makes grep read the entire input file to get the count; you only want to know if the ip is present. You want grep -q:
if grep -q -F -x "$CURR_LINE_IP" "$IGNORE_IP_LIST"; then continue; fi
(-F tells grep to interpret the pattern as a string instead of a regex, which is what you want since otherwise . are wildcards. -x tells grep to match the entire line. It's possible for one ip to be a prefix or a suffix or even an infix of another one, which would lead to false matches. The combination of -F and -x might be a bit faster, too, since grep can then optimize the matching quite a bit.)
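A quick illustration with hypothetical addresses:
printf '%s\n' 10.0.0.10 | grep -c 10.0.0.1          # prints 1: the dots are wildcards and a prefix match counts
printf '%s\n' 10.0.0.10 | grep -c -F -x 10.0.0.1    # prints 0: only an exact whole-line match counts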
There's probably more. That's as far as I got.

Related

Lines in File 1 not present in File 2 based on a column

I have 2 files
File 1 - IN.txt
08:43:22 IN 0xabc
08:43:31 IN 0xdef
08:54:45 IN 0xghi
08:54:45 IN 0xjkl
File 2 - OUT.txt
08:43:32 OUT 0xdef
08:54:45 OUT 0xghi
08:54:45 OUT 0xjkl
Basically I am troubleshooting a network issue: IN.txt is packets coming in, OUT.txt is packets going out, and column 3 is the packet code, so it should match for packets in the same transaction.
I want to know all IN packets that do not have a matching OUT packet.
Desired output:
08:43:22 IN 0xabc
#!/bin/bash
IN=$(awk -F " " '{print $3}' in.txt)
OUT=$(awk -F " " '{print $3}' out.txt)
for i in $IN
do
flag=false
for o in $OUT
do
if [[ "$i" == "$o" ]]; then
flag=true
break
fi
done
if [[ $flag == false ]]; then
echo "Cannot find packet: $i in out"
fi
done
Result:
dingrui@gdcni:~/onie$ ./filter.sh
Cannot find packet: 0xabc in out
You can use a for loop.
for i in $(cat IN.txt| awk '{print $3}'); do grep -i $i OUT.txt | wc -l; done
Or more readable:
for i in $(cat IN.txt| awk '{print $3}'); do result=$(grep -i $i OUT.txt | wc -l);echo $i "|" $result; done
OUTPUT:
0xabc | 0
0xdef | 1
0xghi | 1
0xjkl | 1
NOTE: This only matches the packet codes; I didn't look at the times, which don't seem important since you want to check packets.
You can use fgrep for this:
$ cut -d' ' -f3 < OUT.txt > OUT.txt2
$ fgrep -v -f OUT.txt2 IN.txt
08:43:22 IN 0xabc
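For what it's worth, the same result can come from a single awk pass over both files (a sketch; it assumes column 3 is the key in both files, as in the samples above):
awk 'NR == FNR {seen[$3]; next} !($3 in seen)' OUT.txt IN.txt
which prints 08:43:22 IN 0xabc for the sample files.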

Getting number of newlines and storing each in a variable

I am making a script that will let you choose which interface you want to use.
I need a way to get the interfaces and store each of them in a variable.
Here is my code, but it only gets the interfaces:
Interfaces=$(ifconfig | awk '{print $1}' | grep ':' | tr -d ':')
You need to only check the lines that contain the interface name, not the lines with details. In ifconfig, detail lines start with a space; in ip, interface lines start with a number.
In bash, you can use select to create a simple menu:
#! /bin/bash
select interface in $(ip link show | grep '^[0-9]' | cut -f2 -d:) ; do
if [[ $interface ]] ; then
echo You selected $interface
break
fi
done
or
select interface in $(ifconfig -a | grep -v '^ ' | cut -f1 -d' ') ; do
if [[ $interface ]] ; then
echo You selected $interface
break
fi
done
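If you also want to keep the interface names around for later use, an array is the natural fit. A sketch, assuming an iproute2 system (ip -o prints one interface per line):
mapfile -t interfaces < <(ip -o link show | awk -F': ' '{print $2}')
echo "Found ${#interfaces[@]} interfaces: ${interfaces[*]}"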

Multiple variables into one variable with wildcard

I have this script:
#!/bin/bash
ping_1=$(ping -c 1 www.test.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//')
ping_2=$(ping -c 1 www.test1.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//')
ping_3=$(ping -c 1 www.test2.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//')
ping_4=$(ping -c 1 www.test3.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//' )
Then I would like to treat the outputs of ping_1 through ping_4 as one variable. Something like this:
#!/bin/bash
if [ "$ping_*" -gt 50 ]; then
echo "One ping is to high"
else
echo "The pings are fine"
fi
Is there a possibility in bash to read these variables with some sort of wildcard?
$ping_*
Did nothing for me.
The answer to your stated problem is that yes, you can do this with parameter expansion in bash (but not in sh):
#!/bin/bash
ping_1=foo
ping_2=bar
ping_etc=baz
for var in "${!ping_#}"
do
echo "$var is set to ${!var}"
done
will print
ping_1 is set to foo
ping_2 is set to bar
ping_etc is set to baz
Here's man bash:
${!prefix*}
${!prefix@}
Names matching prefix. Expands to the names of variables whose
names begin with prefix, separated by the first character of the
IFS special variable. When @ is used and the expansion appears
within double quotes, each variable name expands to a separate
word.
The answer to your actual problem is to use arrays instead.
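A minimal sketch of that array-based approach (with hypothetical timings; the answers below show how to fill the array from ping itself):
pings=( 42 17 63 29 )   # hypothetical timings in ms
too_high=0
for p in "${pings[@]}"; do
    (( p > 50 )) && too_high=1
done
if (( too_high )); then
    echo "One ping is too high"
else
    echo "The pings are fine"
fi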
I don't think there's such a wildcard.
But you could use a loop to iterate over the values, for example:
exists_too_high() {
for value; do
if [ "$value" -gt 50 ]; then
return 0
fi
done
return 1
}
if exists_too_high "$ping_1" "$ping_2" "$ping_3" "$ping_4"; then
echo "One ping is to high"
else
echo "The pings are fine"
fi
You can use "and" (-a) param:
if [ $ping_1 -gt 50 -a \
$ping_2 -gt 50 -a \
$ping_3 -gt 50 -a ]; then
...
...
Or instead of defining a lot of variables, you can make an array and check with a loop:
pings+=($(ping -c 1 www.test.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//'))
pings+=($(ping -c 1 www.test1.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//'))
pings+=($(ping -c 1 www.test2.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//'))
pings+=($(ping -c 1 www.test3.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//' ))
too_high=0
for ping in "${pings[@]}"; do
if [ $ping -gt 50 ]; then
too_high=1
break
fi
done
if [ $too_high -eq 1 ]; then
echo "One ping is to high"
else
echo "The pings are fine"
fi
To complement the existing, helpful answers with an array-based solution that demonstrates:
several advanced Bash techniques (robust array handling, compound conditionals, handling the case where pinging fails)
an optimized way to extract the average timing from ping's output by way of a single sed command (works with both GNU and BSD/macOS sed).
reporting the servers that either took too long or failed to respond by name.
#!/usr/bin/env bash
# Determine the servers to ping as an array.
servers=( 'www.test.com' 'www.test1.com' 'www.test2.com' 'www.test3.com' )
# Initialize the array in which timings will be stored, paralleling the
# "${servers[#]}" array.
avgPingTimes=()
# Initialize the array that stores the names of the servers that either took
# too long to respond (on average), or couldn't be pinged at all.
failingServers=()
# Determine the threshold above which a timing is considered too high, in ms.
# Note that the variable name should contain at least one lowercase character,
# to avoid clashes with all-uppercase environment variables.
kMAX_TIME=50
# Determine how many pings to send per server to calculate the average timing
# from.
kPINGS_PER_SERVER=1
for server in "${servers[#]}"; do
# Ping the server at hand, extracting the integer portion of the average
# timing.
# Note that if pinging fails, $avgPingTime will be empty.
avgPingTime="$(ping -c "$kPINGS_PER_SERVER" "$server" |
sed -En 's|^.* = [^/]+/([^.]+).+$|\1|p')"
# Check if the most recent ping failed or took too long and add
# the server to the failure array, if so.
[[ -z $avgPingTime || $avgPingTime -gt $kMAX_TIME ]] && failingServers+=( "$server" )
# Add the timing to the output array.
avgPingTimes+=( "$avgPingTime" )
done
if [[ -n $failingServers ]]; then # pinging at least 1 server took too long or failed
echo "${#failingServers[#]} of the ${#servers[#]} servers took too long or couldn't be pinged:"
printf '%s\n' "${failingServers[#]}"
else
echo "All ${#servers[#]} servers responded to pings in a timely fashion."
fi
Yes, bash can list variables that begin with ping_ by using its built-in compgen -v command (see man bash under SHELL BUILTIN COMMANDS), e.g.:
for f in `compgen -v ping_` foo ; do
eval p=\$$f
if [ "$p" -gt 50 ]; then
echo "One ping is too high"
break 1
fi
[ "$f" = foo ] && echo "The pings are fine"
done
Note the added loop item foo: if the loop gets through all the variables, it prints "The pings are fine".
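If you'd rather avoid eval, the same idea works with bash's ${!var} indirect expansion (a sketch under the same assumptions):
too_high=0
for f in $(compgen -v ping_); do
    if [ "${!f}" -gt 50 ]; then
        too_high=1
        break
    fi
done
if [ "$too_high" -eq 1 ]; then
    echo "One ping is too high"
else
    echo "The pings are fine"
fi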

Optimizing Bash script, subshell removal

I have a bash script that lists the number of IP addresses connected on a port. My issue is that with large amounts of connections it is slow as poo. I think it is because of the subshells in use, but I am having trouble removing them without borking the rest of the script. Here is the script in its entirety, as it is fairly short:
#!/bin/bash
portnumber=80
reversedns_enabled=0
[ ! -z "${1}" ] && portnumber=${1}
[ ! -z "${2}" ] && reversedns_enabled=${2}
#this will hold all of our ip addresses extracted from netstat
ipaddresses=""
#get all of our connected ip addresses
while read line; do
ipaddress=$( echo ${line} | cut -d' ' -f5 | sed s/:[^:]*$// )
ipaddresses="${ipaddresses}${ipaddress}\n"
done < <( netstat -ano | grep -v unix | grep ESTABLISHED | grep \:${portnumber} )
#remove trailing newline
ipaddresses=${ipaddresses%%??}
#output of program
finaloutput=""
#get our ip addresses sorted, uniq counted, and reverse sorted based on amount of uniq
while read line; do
if [[ ${reversedns_enabled} -eq 1 ]]; then
reversednsname=""
#we use justipaddress to do our nslookup(remove the count of uniq)
justipaddress=$( echo ${line} | cut -d' ' -f2 )
reversednsstring=$( host ${justipaddress} )
if echo "${reversednsstring}" | grep -q "domain name pointer"; then
reversednsname=$( echo ${reversednsstring} | grep -o "pointer .*" | cut -d' ' -f2 )
else
reversednsname="reverse-dns-not-found"
fi
finaloutput="${finaloutput}${line} ${reversednsname}\n"
else
finaloutput="${finaloutput}${line}\n"
fi
done < <( echo -e ${ipaddresses} | uniq -c | sort -r )
#tabulate that sheet son
echo -e ${finaloutput} | column -t
The majority of the time is spent doing this operation: echo ${line} | cut -d' ' -f5 | sed s/:[^:]*$//. What is the best way to inline this to produce a faster script? It takes well over a second with 1000 concurrent users (which is my base target, although it should be able to process more without using up all of my CPU).
You could reduce that with cut -d' ' -f5 <<< "$line" | sed .... You could write a more complex sed script and avoid the use of cut.
But the real benefit would be in avoiding the loop so there's only one sed (or awk or perl or …) script involved. I'd probably look to reduce it to ipaddresses=$(netstat -ano | awk '...') so that instead of 3 grep processes, plus one cut and sed per line, there was just a single awk process.
ipaddresses=$(netstat -ano |
awk " /unix/ { next } # grep -v unix
!/ESTABLISHED/ { next } # grep ESTABLISHED
!/:${portnumber}/ { next } # grep :${portnum} "'
{ sub(/:[^:]*$/, "", $5); print $5; }'
)
That's probably rather clumsy, but it is a fairly direct transliteration of the existing code. Watch for the quotes to get ${portnumber} into the regex.
You feed the list of IP addresses into uniq -c and sort -r; you should probably use sort -rn, and you could use awk to do the uniq -c, too.
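For instance, the counting could be folded into the same awk program, so the whole pipeline becomes one awk plus one sort (a sketch along the lines of the transliteration above):
ipaddresses=$(netstat -ano |
    awk -v port=":${portnumber}" '
        /unix/         { next }
        !/ESTABLISHED/ { next }
        $0 !~ port     { next }
        { sub(/:[^:]*$/, "", $5); count[$5]++ }
        END { for (ip in count) print count[ip], ip }' |
    sort -rn)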
The only bit that you can't readily improve is host; that seems to only take one host or IP address argument at a time, so you have to run it for each name or address.
I'll take a stab at a couple of issues:
The following line from the script, which performs incremental string concatenation, will not be efficient without the means to allocate a reasonable buffer:
ipaddresses="${ipaddresses}${ipaddress}\n"
For another, using a while loop with read line when a pipeline will do is significantly worse than the pipeline. Try something like this instead of the first loop:
netstat -ano |
grep -v 'unix' |
grep 'ESTABLISHED' |
grep "\:${portnumber}" |
cut -d' ' -f5 |
sed 's/:[^:]*$//' |
while read line; do ...
Also, try combining at least two of the three sequential grep commands into one invocation of grep.
If nothing else, this will mean you are no longer spawning a pipeline which creates new cut and sed processes for each line of input processed in the first loop.
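For example, on Linux the unix-domain socket lines report CONNECTED rather than ESTABLISHED, and the addresses come before the state column, so a single pattern can stand in for all three greps (a sketch, assuming Linux netstat output):
netstat -ano |
grep ":${portnumber}.*ESTABLISHED" |
cut -d' ' -f5 |
sed 's/:[^:]*$//' |
while read line; do ...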
Here is a whole script optimized & refactored:
#!/bin/bash
portnumber=80
reversedns_enabled=0
[[ $1 ]] && portnumber=$1
[[ $2 ]] && reversedns_enabled=$2
#this will hold all of our ip addresses extracted from netstat
ipaddresses=''
#get all of our connected ip addresses
while IFS=' :' read -r type _ _ _ port ipaddress _ state _; do
if [[ $type != 'unix' && $port == "$portnumber" && $state == 'ESTABLISHED' ]]; then
ipaddresses+="$ipaddress\n"
fi
done < <(netstat -ano)
#remove trailing newline
ipaddresses=${ipaddresses%%??}
#output of program
finalOutput=""
#get our ip addresses sorted, uniq counted, and reverse sorted based on amount of uniq
while read -r line; do
if (( reversedns_enabled == 1 )); then
reverseDnsName=""
#we use justipaddress to do our nslookup(remove the count of uniq)
read -r _ justipaddress _ <<< "$line"
reverseDnsString=$(host "$justipaddress")
if [[ $reverseDnsString == *'domain name pointer'* ]]; then
reverseDnsName=${reverseDnsString##*domain name pointer }
else
reverseDnsName="reverse-dns-not-found"
fi
finalOutput+="$line $reverseDnsName\n"
else
finalOutput+="$line\n"
fi
done < <(echo -e "$ipaddresses" | sort -ur)
#tabulate that sheet son
echo -e "$finalOutput" | column -t
As you can see, there are almost no external tools used (no sed, awk or grep). Awesome!

extract average time from ping -c

I want to extract the average time from the command
ping -c 4 www.stackoverflow.com | tail -1| awk '{print $4}'
whose output is:
107.921/108.929/110.394/0.905 ms
Output should be: 108.929
One way is to just add a cut to what you have there.
ping -c 4 www.stackoverflow.com | tail -1| awk '{print $4}' | cut -d '/' -f 2
ping -c 4 www.stackoverflow.com | tail -1| awk -F '/' '{print $5}' would work fine.
"-F" option is used to specify the field separator.
This might work for you:
ping -c 4 www.stackoverflow.com | sed '$!d;s|.*/\([0-9.]*\)/.*|\1|'
The following solution uses Bash only (requires Bash 3):
[[ $(ping -q -c 4 www.example.com) =~ \ =\ [^/]*/([0-9]+\.[0-9]+).*ms ]] \
&& echo ${BASH_REMATCH[1]}
For the regular expression it's easier to read (and handle) if it is stored in a variable:
regex='= [^/]*/([0-9]+\.[0-9]+).*ms'
[[ $(ping -q -c 4 www.example.com) =~ $regex ]] && echo ${BASH_REMATCH[1]}
Promoting luissquall's very elegant comment to an answer:
ping -c 4 www.stackoverflow.com | awk -F '/' 'END {print $5}'
To extract the mean time directly from the ping command:
ping -w 4 -q www.duckduckgo.com | cut -d "/" -s -f5
Options:
-w timeout of 4 seconds
-q quiet mode
-d delimiter
-s skip lines without a delimiter
-f field number - depends on your system - sometimes the 5th, sometimes the 4th
I personally use it this way:
if [ $(ping -w 2 -q www.duckduckgo.com | cut -d "/" -s -f4 | cut -d "." -f1) -lt 20 ]; then
echo "good response time"
else
echo "bad response time"
fi
Use these to get current ping as a single number:
123.456:
ping -w1 -c1 8.8.8.8 | tail -1| cut -d '=' -f 2 | cut -d '/' -f 2
123:
ping -w1 -c1 8.8.8.8 | tail -1| cut -d '=' -f 2 | cut -d '/' -f 2 | cut -d '.' -f 1
Note that this displays the average of only 1 ping (-c1); you can increase the sample size by increasing this number (e.g. -c1337).
This avoids using awk (as @Buggabill posted), which doesn't play nicely in bash aliases and takes a nanosecond longer.
None of these worked well for me due to various issues, such as when a timeout occurs. I only wanted to see bad ping times or timeouts and wanted ping to continue quickly, and none of these solutions did that. Here's my bash script that handles both. Note that in the ping command, the response time is limited to 1 second.
I realize this does not directly answer the OP's question; however, it does provide a good way to deal with some issues that occur with some of the incomplete "solutions" provided here, going beyond the scope of the OP's question. Others coming here (myself included) are looking for exactly that, so I decided to share it for those people.
while true
do
###Set your IP amd max milliseconds###
ip="192.168.1.53"
maxms=50
###do not edit below###
err="100% packet loss"
out="$(ping -c 1 -i 1 -w 1 $ip)"
t="$(echo $out | awk -F '/' 'END {print $5}')"
t=${t%.*}
re='^[0-9]+$'
if ! [[ $t =~ $re ]] ; then
if [[ $out == *"$err"* ]] ; then
echo "`date` | ${ip}: TIMEOUT"
else
echo "error: Not a number: ${t} was found in: ${out}"
fi
else
if [ "$t" -gt $maxms ]; then
echo "`date` | ${ip}: ${t} ms"
fi
fi
done
