Optimizing Bash script, subshell removal

I have a bash script that lists the number of IP addresses connected on a port. My issue is that with a large number of connections it is very slow. I think it is because of the subshells in use, but I am having trouble removing them without breaking the rest of the script. Here is the script in its entirety, as it is fairly short:
#!/bin/bash
portnumber=80
reversedns_enabled=0
[ ! -z "${1}" ] && portnumber=${1}
[ ! -z "${2}" ] && reversedns_enabled=${2}
#this will hold all of our ip addresses extracted from netstat
ipaddresses=""
#get all of our connected ip addresses
while read line; do
ipaddress=$( echo ${line} | cut -d' ' -f5 | sed s/:[^:]*$// )
ipaddresses="${ipaddresses}${ipaddress}\n"
done < <( netstat -ano | grep -v unix | grep ESTABLISHED | grep \:${portnumber} )
#remove trailing newline
ipaddresses=${ipaddresses%%??}
#output of program
finaloutput=""
#get our ip addresses sorted, uniq counted, and reverse sorted based on amount of uniq
while read line; do
if [[ ${reversedns_enabled} -eq 1 ]]; then
reversednsname=""
#we use justipaddress to do our nslookup(remove the count of uniq)
justipaddress=$( echo ${line} | cut -d' ' -f2 )
reversednsstring=$( host ${justipaddress} )
if echo "${reversednsstring}" | grep -q "domain name pointer"; then
reversednsname=$( echo ${reversednsstring} | grep -o "pointer .*" | cut -d' ' -f2 )
else
reversednsname="reverse-dns-not-found"
fi
finaloutput="${finaloutput}${line} ${reversednsname}\n"
else
finaloutput="${finaloutput}${line}\n"
fi
done < <( echo -e ${ipaddresses} | uniq -c | sort -r )
#tabulate that sheet son
echo -e ${finaloutput} | column -t
The majority of the time is spent on this operation: echo ${line} | cut -d' ' -f5 | sed s/:[^:]*$// What is the best way to inline this to produce a faster script? It takes well over a second with 1000 concurrent users (which is my base target, although it should be able to process more without using up all of my CPU).

You could reduce that with cut -d' ' <<< "$line" | sed .... You could write a more complex sed script and avoid the use of cut.
But the real benefit would be in avoiding the loop so there's only one sed (or awk or perl or …) script involved. I'd probably look to reduce it to ipaddresses=$(netstat -ano | awk '...') so that instead of 3 grep processes, plus one cut and sed per line, there was just a single awk process.
ipaddresses=$(netstat -ano |
awk " /unix/ { next } # grep -v unix
!/ESTABLISHED/ { next } # grep ESTABLISHED
!/:${portnumber}/ { next } # grep :${portnum} "'
{ sub(/:[^:]*$/, "", $5); print $5; }'
)
That's probably rather clumsy, but it is a fairly direct transliteration of the existing code. Watch for the quotes to get ${portnumber} into the regex.
Since you feed the list of IP addresses into uniq -c and sort -r, you probably should use sort -rn so the counts are compared numerically, and you could use awk to do the uniq -c, too.
The only bit that you can't readily improve is host; that seems to only take one host or IP address argument at a time, so you have to run it for each name or address.
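As a rough sketch of where that advice leads, the filtering, the port stripping and the uniq -c step can all live in one awk process. The function name count_remote_ips is made up for illustration, and the code assumes netstat's usual IPv4 column layout (local address in field 4, foreign address in field 5, state in field 6). Unlike the original grep, it compares only the local port, which is an assumption about what the script is meant to count.

```shell
# count_remote_ips PORT
# Reads `netstat -ano`-style lines on stdin and prints "count ip" for every
# foreign address that has an ESTABLISHED connection to local port PORT.
# Hypothetical helper; assumes IPv4-style "addr:port" fields.
count_remote_ips() {
    awk -v port="$1" '
        $1 ~ /^unix/ || $6 != "ESTABLISHED" { next }  # skip unix sockets and other states
        {
            n = split($4, local, ":")                  # field 4 is the local address
            if (local[n] != port) next                 # keep only the requested local port
            ip = $5
            sub(/:[^:]*$/, "", ip)                     # strip ":port" from the foreign address
            count[ip]++                                # does the job of uniq -c
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -rn                                       # highest count first
}
```

It would be used as netstat -ano | count_remote_ips 80 | column -t, which leaves one awk and one sort no matter how many connections there are.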

I'll take a stab at a couple of issues:
The following line from the script, which performs incremental string concatenation, will not be efficient without the means to allocate a reasonable buffer:
ipaddresses="${ipaddresses}${ipaddress}\n"
For another, using a while loop with read line when a pipeline will do is significantly worse than the pipeline. Try something like this instead of the first loop:
netstat -ano |
grep -v 'unix' |
grep 'ESTABLISHED' |
grep "\:${portnumber}" |
cut -d' ' -f5 |
sed 's/:[^:]*$//' |
while read line; do ...
Also, try combining at least two of the three sequential grep commands into one invocation of grep.
If nothing else, this will mean you are no longer spawning a pipeline which creates new cut and sed processes for each line of input processed in the first loop.

Here is a whole script optimized & refactored:
#!/bin/bash
portnumber=80
reversedns_enabled=0
[[ $1 ]] && portnumber=$1
[[ $2 ]] && reversedns_enabled=$2
#this will hold all of our ip addresses extracted from netstat
ipaddresses=''
#get all of our connected ip addresses
while IFS=' :' read -r type _ _ _ localport ipaddress _ state _; do
if [[ $type != 'unix' && $localport == "$portnumber" && $state == 'ESTABLISHED' ]]; then
ipaddresses+="$ipaddress\n"
fi
done < <(netstat -ano)
#remove trailing newline
ipaddresses=${ipaddresses%%??}
#output of program
finalOutput=""
#get our ip addresses sorted, uniq counted, and reverse sorted based on amount of uniq
while read -r line; do
if (( reversedns_enabled == 1 )); then
reverseDnsName=""
#we use justipaddress to do our nslookup(remove the count of uniq)
read -r _ justipaddress _ <<< "$line"
reverseDnsString=$(host "$justipaddress")
if [[ $reverseDnsString == *'domain name pointer'* ]]; then
reverseDnsName=${reverseDnsString##*domain name pointer }
else
reverseDnsName="reverse-dns-not-found"
fi
finalOutput+="$line $reverseDnsName\n"
else
finalOutput+="$line\n"
fi
done < <(echo -e "$ipaddresses" | sort | uniq -c | sort -rn)
#tabulate that sheet son
echo -e "$finalOutput" | column -t
As you can see, there are almost no external tools used (no sed, awk or grep). Awesome!
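To see how an IFS=' :' read assigns fields, here is a small sketch run on a single made-up netstat line. The function name parse_netstat_line and the sample addresses are illustrative only, and the field positions assume netstat's usual IPv4 layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). IPv6 addresses contain colons themselves and would need different handling.

```shell
# parse_netstat_line LINE
# Splits one netstat line on spaces and colons using only the read builtin,
# then prints "local_port remote_ip state". Hypothetical helper for illustration.
parse_netstat_line() {
    local proto localip localport remoteip remoteport state
    # Space and ':' both act as separators, so "addr:port" becomes two fields.
    IFS=' :' read -r proto _ _ localip localport remoteip remoteport state _ <<< "$1"
    printf '%s %s %s\n' "$localport" "$remoteip" "$state"
}

parse_netstat_line 'tcp 0 0 192.168.1.5:80 10.0.0.1:51000 ESTABLISHED off (0.00/0/0)'
```

No subshell or external process is involved in the split, which is where the per-line savings come from.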

Related

Unix Script loop through individual variables in a list and execute code

I have been busting my head all day long without coming up with a successful solution.
Setup:
We have Linux RHEL 8.3 and a file, script.sh
There is an environment variable set by an application with a dynamic string in it.
export PROGARM_VAR="abc10,def20,ghi30"
The delimiter is always "," and the values inside vary from 1 to 20.
Inside the script I have defined 20 variables which take the values
using "cut" command I take each value and assign it to a variable
var1=$(echo $PROGARM_VAR | cut -f1 -d,)
var2=$(echo $PROGARM_VAR | cut -f2 -d,)
var3=$(echo $PROGARM_VAR | cut -f3 -d,)
var4=$(echo $PROGARM_VAR | cut -f4 -d,)
etc
In our case we will have:
var1="abc10" var2="def20" var3="ghi30" and var4="" which is empty
The loop must take each variable, test if it's not empty, and execute 10 pages of code using the tested variable. When it reaches an empty variable it should break.
Could you give me a hand please?
Thank you

Just split it with a comma. There are endless possibilities. You could:
10_pages_of_code() { echo "$1"; }
IFS=, read -r -a vars <<<"abc10,def20,ghi30"
for i in "${vars[@]}"; do 10_pages_of_code "$i"; done
or:
printf "%s" "abc10,def20,ghi30" | xargs -n1 -d, bash -c 'echo 10_pages_of_code "$1"' _
Safer code could use readarray instead of read to properly handle newlines in values, but I doubt that matters for you:
IFS= readarray -d , -t vars < <(printf "%s" "abc10,def20,ghi30")
You could also read the values as a stream:
while IFS= read -r -d, var || [[ -n "$var" ]]; do
10_pages_of_code "$var"
done < <(printf "%s" "abc10,def20,ghi30")
You could still do it with cut; just write an actual loop with a counter (cut numbers fields from 1):
i=1
while var=$(printf "%s\n" "$PROGARM_VAR" | cut -f"$i" -d,) && [[ -n "$var" ]]; do
10_pages_of_code "$var"
((i++))
done

or
echo "$PROGARM_VAR" | tr , \\n | while read -r var; do
: something with $var
done
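Putting the pieces together, a minimal sketch of the whole loop might look like this. The names process_value and process_list are placeholders (process_value stands in for the "10 pages of code"), and the sketch assumes the values themselves never contain commas or newlines.

```shell
# Stand-in for the long body of work done per value (hypothetical).
process_value() {
    printf 'processing %s\n' "$1"
}

# process_list "a,b,c"
# Splits the comma-separated string into an array with the read builtin and
# calls process_value for each element, stopping at the first empty one.
process_list() {
    local -a vars
    local v
    IFS=, read -r -a vars <<< "$1"
    for v in "${vars[@]}"; do
        [[ -n $v ]] || break      # stop when an empty value is reached
        process_value "$v"
    done
}

process_list "abc10,def20,ghi30"
```

In the real script the call would be process_list "$PROGARM_VAR", so the 20 separate variables and the 20 cut invocations are no longer needed.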

Linux usernames /etc/passwd listing

I want to print the longest and shortest username found in /etc/passwd. If I run the code below it works fine for the shortest (head -1), but doesn't run for the longest (sort -n | tail -1 | awk '{print $2}'). Can anyone help me figure out what's wrong?
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
sort -n |tail -1 | awk '{print $2}'

Here the issue is:
Piping finishes with the first sort -n |head -1 | awk '{print $2}' command. So, input to first command is provided through piping and output is obtained.
For the second command, no input is given. So, it waits for the input from STDIN which is the keyboard and you can feed the input through keyboard and press ctrl+D to obtain output.
Please run the code like below to get desired output:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |tail -1 | awk '{print $2}'

All you need is:
$ awk -F: '
NR==1 { min=max=$1 }
length($1) > length(max) { max=$1 }
length($1) < length(min) { min=$1 }
END { print min ORS max }
' /etc/passwd
No explicit loops or pipelines or multiple commands required.

The problem is that you have two pipelines when you really need one. You have grep | while read do ... done | sort | head | awk and, separately, sort | tail | awk: the first sort has an input (the while loop), the second doesn't. So the script hangs because your second sort has no piped input; or rather it does have an input, but it's STDIN.
There are various ways to resolve this:
save the output of the while loop to a temporary file and use that as an input to both sort commands
repeat your while loop
use awk to do both the head and tail
The first two involve iterating over the password file twice, which may be okay, depending on what you're ultimately trying to do. But a small awk script can give you both the first and the last line in a single pass, by way of an NR==1 rule and the END block.
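A minimal sketch of the awk option, assuming a passwd-style file with the username in the first colon-separated field. The function name shortest_and_longest is made up for illustration. The data is sorted once, and a second awk prints the first and last lines of that single stream.

```shell
# shortest_and_longest FILE
# Prints the shortest username on the first line and the longest on the second.
# Hypothetical helper; ties are resolved by whatever order sort -n produces.
shortest_and_longest() {
    awk -F: '{ print length($1), $1 }' "$1" |   # emit "length name"
        sort -n |                                # shortest first
        awk 'NR == 1 { print $2 }                # first line: shortest name
             { last = $2 }                       # remember the most recent name
             END { if (NR > 1) print last }'     # last line: longest name
}
```

Calling shortest_and_longest /etc/passwd reads the file only once, and nothing is left waiting on the keyboard.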

While you already have good answers, you can also use POSIX shell to accomplish your goal without any pipe at all, using the parameter expansion and string length provided by the shell itself (see: POSIX shell specification). For example you could do the following:
#!/bin/sh
sl=32;ll=0;sn=;ln=; ## short len, long len, short name, long name
while read -r line; do ## read each line
u=${line%%:*} ## get user
len=${#u} ## get length
[ "$len" -lt "$sl" ] && { sl="$len"; sn="$u"; } ## if shorter, save len, name
[ "$len" -gt "$ll" ] && { ll="$len"; ln="$u"; } ## if longer, save len, name
done </etc/passwd
printf "shortest (%2d): %s\nlongest (%2d): %s\n" $sl "$sn" $ll "$ln"
Example Use/Output
$ sh cketcpw.sh
shortest ( 2): at
longest (17): systemd-bus-proxy
Using either pipe/head/tail/awk or the shell itself is fine. It's good to have alternatives.
(note: if you have multiple users of the same length, this just picks the first, you can use a temp file if you want to save all names and use -le and -ge for the comparison.)

If you want both the head and the tail from the same input, you may want something like sed -e 1b -e '$!d' after you sort the data to get the top and bottom lines using sed.
So your script would be:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n | sed -e 1b -e '$!d'
Alternatively, a shorter way:
cut -d":" -f1 /etc/passwd | awk '{ print length, $0 }' | sort -n | cut -d" " -f2- | sed -e 1b -e '$!d'

Getting number of newlines and storing each in a variable

I am making a script that will let you choose between which interface you want to use.
I need a way to get the interfaces and store each of them in a variable.
Here is my code, but it only gets the interfaces:
Interfaces=$(ifconfig | awk '{print $1}' | grep ':' | tr -d ':')

You need to only check the lines that contain the interface name, not the lines with details. In ifconfig, detail lines start with a space; in ip, interface lines start with a number.
In bash, you can use select to create a simple menu:
#! /bin/bash
select interface in $(ip link show | grep '^[0-9]' | cut -f2 -d:) ; do
if [[ $interface ]] ; then
echo You selected $interface
break
fi
done
or
select interface in $(ifconfig -a | grep -v '^ ' | cut -f1 -d' ') ; do
if [[ $interface ]] ; then
echo You selected $interface
break
fi
done
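If the goal is to keep every interface name around rather than pick one from a menu, an array is probably the natural container. The sketch below is an assumption-laden illustration: interface_names is a made-up helper, and it expects the one-line-per-interface format printed by ip -o link show ("1: lo: <FLAGS> ...").

```shell
# interface_names
# Reads `ip -o link show`-style lines on stdin and prints one interface name
# per line. Hypothetical helper; a "@peer" suffix (seen on veth devices in
# containers) is dropped.
interface_names() {
    awk -F': ' '{ sub(/@.*/, "", $2); print $2 }'
}

# Typical use (bash 4+):
#   mapfile -t interfaces < <(ip -o link show | interface_names)
#   echo "found ${#interfaces[@]} interfaces, first is ${interfaces[0]}"
```

The array length ${#interfaces[@]} gives the count, and each name is available by index without creating numbered variables.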

Multiple variables into one variable with wildcard

I have this script:
#!/bin/bash
ping_1=$(ping -c 1 www.test.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//')
ping_2=$(ping -c 1 www.test1.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//')
ping_3=$(ping -c 1 www.test2.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//')
ping_4=$(ping -c 1 www.test3.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//' )
Then I would like to treat the outputs of ping_1-4 in one variable. Something like this:
#!/bin/bash
if [ "$ping_*" -gt 50 ]; then
echo "One ping is to high"
else
echo "The pings are fine"
fi
Is there a possibility in bash to read these variables with some sort of wildcard?
$ping_*
Did nothing for me.

The answer to your stated problem is that yes, you can do this with parameter expansion in bash (but not in sh):
#!/bin/bash
ping_1=foo
ping_2=bar
ping_etc=baz
for var in "${!ping_@}"
do
echo "$var is set to ${!var}"
done
will print
ping_1 is set to foo
ping_2 is set to bar
ping_etc is set to baz
Here's man bash:
${!prefix*}
${!prefix@}
Names matching prefix. Expands to the names of variables whose
names begin with prefix, separated by the first character of the
IFS special variable. When @ is used and the expansion appears
within double quotes, each variable name expands to a separate
word.
The answer to your actual problem is to use arrays instead.
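One way the array approach could look is sketched below, using an associative array keyed by host name (bash 4+). The timings are made-up sample values rather than real ping results, and slow_hosts is a hypothetical helper name.

```shell
# Sample data only: host name -> round-trip time in ms (values are invented).
declare -A ping_ms=( [www.test.com]=12 [www.test1.com]=73 [www.test2.com]=51 )

# slow_hosts THRESHOLD
# Prints, in sorted order, every host in ping_ms whose time exceeds THRESHOLD.
slow_hosts() {
    local threshold=$1 host
    for host in "${!ping_ms[@]}"; do
        if (( ${ping_ms[$host]} > threshold )); then
            printf '%s\n' "$host"
        fi
    done | sort
}

if [[ -n $(slow_hosts 50) ]]; then
    echo "One ping is too high"
else
    echo "The pings are fine"
fi
```

Because the hosts are keys, adding a fifth server means adding one array element instead of another ping_N variable.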

I don't think there's such wildcard.
But you could use a loop to iterate over values, for example:
exists_too_high() {
for value; do
if [ "$value" -gt 50 ]; then
return 0
fi
done
return 1
}
if exists_too_high "$ping_1" "$ping_2" "$ping_3" "$ping_4"; then
echo "One ping is to high"
else
echo "The pings are fine"
fi

You can use the "or" (-o) operator to test whether any one of them is too high:
if [ "$ping_1" -gt 50 -o \
"$ping_2" -gt 50 -o \
"$ping_3" -gt 50 -o \
"$ping_4" -gt 50 ]; then
...
...
Or instead of defining a lot of variables, you can make an array and check with a loop:
pings+=($(ping -c 1 www.test.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//'))
pings+=($(ping -c 1 www.test1.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//'))
pings+=($(ping -c 1 www.test2.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//'))
pings+=($(ping -c 1 www.test3.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//' ))
too_high=0
for ping in "${pings[@]}"; do
if [ $ping -gt 50 ]; then
too_high=1
break
fi
done
if [ $too_high -eq 1 ]; then
echo "One ping is to high"
else
echo "The pings are fine"
fi

To complement the existing, helpful answers with an array-based solution that demonstrates:
several advanced Bash techniques (robust array handling, compound conditionals, handling the case where pinging fails)
an optimized way to extract the average timing from ping's output by way of a single sed command (works with both GNU and BSD/macOS sed).
reporting the servers that either took too long or failed to respond by name.
#!/usr/bin/env bash
# Determine the servers to ping as an array.
servers=( 'www.test.com' 'www.test1.com' 'www.test2.com' 'www.test3.com' )
# Initialize the array in which timings will be stored, paralleling the
# "${servers[@]}" array.
avgPingTimes=()
# Initialize the array that stores the names of the servers that either took
# too long to respond (on average), or couldn't be pinged at all.
failingServers=()
# Determine the threshold above which a timing is considered too high, in ms.
# Note that a shell variable should contain at least 1 lowercase character.
kMAX_TIME=50
# Determine how many pings to send per server to calculate the average timing
# from.
kPINGS_PER_SERVER=1
for server in "${servers[@]}"; do
# Ping the server at hand, extracting the integer portion of the average
# timing.
# Note that if pinging fails, $avgPingTime will be empty.
avgPingTime="$(ping -c "$kPINGS_PER_SERVER" "$server" |
sed -En 's|^.* = [^/]+/([^.]+).+$|\1|p')"
# Check if the most recent ping failed or took too long and add
# the server to the failure array, if so.
[[ -z $avgPingTime || $avgPingTime -gt $kMAX_TIME ]] && failingServers+=( "$server" )
# Add the timing to the output array.
avgPingTimes+=( "$avgPingTime" )
done
if [[ -n $failingServers ]]; then # pinging at least 1 server took too long or failed
echo "${#failingServers[@]} of the ${#servers[@]} servers took too long or couldn't be pinged:"
printf '%s\n' "${failingServers[@]}"
else
echo "All ${#servers[@]} servers responded to pings in a timely fashion."
fi

Yes bash can list variables that begin with $ping_, by using its internal compgen -v command, (see man bash under SHELL BUILTIN COMMANDS), i.e.:
for f in `compgen -v ping_` foo ; do
if [ "$f" = foo ]; then
echo "The pings are fine"
break
fi
eval p=\$$f
if [ "$p" -gt 50 ]; then
echo "One ping is too high"
break 1
fi
done
Note the added loop item foo -- if the loop gets through all the variables, then print "the pings are fine".

Writing bash code for performance standards

Is there a better way to rewrite this code to get enhanced performance?
If you were to get a bunch of IPs the system seems to hang.
TMP_PREFIX='/tmp/synd'
TMP_FILE="mktemp $TMP_PREFIX.XXXXXXXX"
BANNED_IP_MAIL=`$TMP_FILE`
BANNED_IP_LIST=`$TMP_FILE`
echo "Banned the following ip addresses on `date`" > $BANNED_IP_MAIL
echo >> $BANNED_IP_MAIL
BAD_IP_LIST=`$TMP_FILE`
netstat -ntu | grep SYN_RECV | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr > $BAD_IP_LIST
cat $BAD_IP_LIST
if [ $KILL -eq 1 ]; then
IP_BAN_NOW=0
while read line; do
CURR_LINE_CONN=$(echo $line | cut -d" " -f1)
CURR_LINE_IP=$(echo $line | cut -d" " -f2)
if [ $CURR_LINE_CONN -lt $NO_OF_CONNECTIONS ]; then
break
fi
IGNORE_BAN=`grep -c $CURR_LINE_IP $IGNORE_IP_LIST`
if [ $IGNORE_BAN -ge 1 ]; then
continue
fi
IP_BAN_NOW=1
echo "$CURR_LINE_IP with $CURR_LINE_CONN SYN_RECV connections" >> $BANNED_IP_MAIL
echo $CURR_LINE_IP >> $BANNED_IP_LIST
echo $CURR_LINE_IP >> $IGNORE_IP_LIST
if [ $CSF_BAN -eq 1 ]; then
$CSF -d $CURR_LINE_IP
else
$IPT -I INPUT -s $CURR_LINE_IP -j DROP
fi
done < $BAD_IP_LIST
if [ $IP_BAN_NOW -eq 1 ]; then
dt=`date`
hn=`hostname`
if [ $EMAIL_TO != "" ]; then
cat $BANNED_IP_MAIL | mail -s "IP addresses banned on $dt $hn" $EMAIL_TO
fi
fi
fi
rm -f $TMP_PREFIX.*

Sure, there are lots of ways that can be improved, but you should try to figure out where the real bottleneck is. (It may well be iptables, in which case you might want to try to do all the table updates in a single invocation instead of one at a time. But I'm just guessing.)
Here are a few suggestions; I didn't read all the way through:
netstat -ntu | grep SYN_RECV | awk '{print $5}' | cut -d: -f1 |
sort | uniq -c | sort -nr > $BAD_IP_LIST
If you're only interested in connections in SYN_RECV state, why list udp? Anyway, you're using three utilities (grep, awk and cut) to do one simple line-oriented action. You might as well just do it all in one, for example awk:
awk '$6 == "SYN_RECV" {print substr($5, 1, index($5, ":") - 1)}'
In fact, you could do the uniquifying and counting in awk as well:
awk '$6 == "SYN_RECV" {++ip[substr($5, 1, index($5, ":") - 1)]} END{for (i in ip) print ip[i], i}'
Edit: you could also filter by required count here:
awk '$6 == "SYN_RECV" {++ip[substr($5, 1, index($5, ":") - 1)]}
END {for (i in ip) if (ip[i] >= '$NO_OF_CONNECTIONS') print ip[i], i}'
Now you only need to output the ip address, since you no longer need to filter in the bash script. I don't know if that's faster than piping through sort and uniq and sort again, but it might very well be.
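As a sketch of that single-awk version, the threshold can be handed to awk with -v instead of splicing the shell variable into the program text. The function name syn_recv_offenders is made up, and the code assumes netstat -nt style input with IPv4 "addr:port" in field 5 and the state in field 6.

```shell
# syn_recv_offenders MIN
# Reads `netstat -nt`-style lines on stdin and prints "count ip" for each
# foreign address with at least MIN connections in SYN_RECV state.
# Hypothetical helper for illustration.
syn_recv_offenders() {
    awk -v min="$1" '
        $6 == "SYN_RECV" { ++ip[substr($5, 1, index($5, ":") - 1)] }  # count per address
        END { for (i in ip) if (ip[i] >= min) print ip[i], i }
    ' | sort -nr                                                      # worst offender first
}
```

It would be invoked as netstat -nt | syn_recv_offenders "$NO_OF_CONNECTIONS" > "$BAD_IP_LIST", after which every line in the list is already above the threshold.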
while read line; do
CURR_LINE_CONN=$(echo $line | cut -d" " -f1)
CURR_LINE_IP=$(echo $line | cut -d" " -f2)
if [ $CURR_LINE_CONN -lt $NO_OF_CONNECTIONS ]; then
break
fi
You want to read two fields from stdin. Why don't you just do that:
while read CURR_LINE_CONN CURR_LINE_IP IGNORED &&
((CURR_LINE_CONN >= NO_OF_CONNECTIONS)); do
That saves two subshells and two cut invocations. (The IGNORED in the read built-in is just paranoia, since there will only be two fields output by awk. It's not good paranoia, though, because it silently ignores errors.)
Edit: as above, you could get rid of the test here, too. So it would just be:
netstat -nt |
awk '$6 == "SYN_RECV" {++ip[substr($5, 1, index($5, ":") - 1)]}
END { for (i in ip)
if (ip[i] >= '$NO_OF_CONNECTIONS')
print ip[i], i}' | tee $BAD_IP_LIST
if ((KILL)); then
IP_BAN_NOW=0
while read CURR_LINE_CONN CURR_LINE_IP IGNORED; do
Next:
IGNORE_BAN=`grep -c $CURR_LINE_IP $IGNORE_IP_LIST`
if [ $IGNORE_BAN -ge 1 ]; then
continue
fi
grep -c makes grep read the entire input file to get the count; you only want to know if the ip is present. You want grep -q:
if grep -q -F -x "$CURR_LINE_IP" "$IGNORE_IP_LIST"; then continue; fi
(-F tells grep to interpret the pattern as a string instead of a regex, which is what you want since otherwise . are wildcards. -x tells grep to match the entire line. It's possible for one ip to be a prefix or a suffix or even an infix of another one, which would lead to false matches. The combination of -F and -x might be a bit faster, too, since grep can then optimize the matching quite a bit.)
There's probably more. That's as far as I got.
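To make the difference between a whole-line fixed-string match and a plain regex match concrete, here is a small sketch. The helper name is_ignored is invented, and the ignore list is assumed to hold exactly one IP address per line.

```shell
# is_ignored IP FILE
# Succeeds (exit status 0) only if IP appears as a complete line in FILE.
# -F treats the pattern as a literal string, -x requires a whole-line match,
# and -q stops at the first hit without printing anything.
is_ignored() {
    grep -q -F -x -- "$1" "$2"
}

# Typical use inside the ban loop:
#   if is_ignored "$CURR_LINE_IP" "$IGNORE_IP_LIST"; then continue; fi
```

With a list containing 10.0.0.11, the address 10.0.0.1 is not reported as ignored, whereas an unanchored grep -c would have counted it as a match.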
