speed up bash script with multithreading?

I've got a bash script that I put together to merge multiple packet captures based on a common filter. I'm running daemonlogger on the back end and it rolls pcap files based on size, so it's sometimes difficult to get the whole picture: the data I'm looking for may be in one pcap file and the rest in another. The biggest gripe I have is the inability to speed this process up. It can only process one pcap at a time. Does anyone have any recommendations on how to speed this up with multiple subprocesses or multiple threads?
#!/bin/bash
echo '[+] example tcp dump filters:'
echo '[+] host 1.1.1.1'
echo '[+] host 1.1.1.1 dst port 80'
echo '[+] host 1.1.1.1 and host 2.2.2.2 and dst port 80'
echo 'tcpdump filter:'
read FILTER
cd /var/mycaps/
DATESTAMP=$(date +"%m-%d-%Y-%H:%M")
# make a specific folder to drop the filtered pcaps in
mkdir /var/mycaps/temp/$DATESTAMP
# iterate over all pcaps and check for an instance of your filter
for file in $(ls *.pcap); do
    tcpdump -nn -A -w temp/$DATESTAMP/$file -r $file $FILTER
    # remove empty pcaps that don't match (24 bytes = pcap header only)
    if [ "`ls -l temp/$DATESTAMP/$file | awk '{print $5}'`" = "24" ]; then
        rm -f "temp/$DATESTAMP/$file"
    fi
done
echo '[+] Merging pcaps'
# cd to your pcap directory
cd /var/mycaps/temp/${DATESTAMP}
# merge all of the pcaps into one file and remove the separated files
mergecap *.pcap -w merged.pcap
rm -f original.*
echo "[+] Done. your files are in $(pwd)"

Run the body of the loop in the background, then wait for all the background jobs to complete before continuing.
max_jobs=10 # For example
job_count=0
for file in *.pcap; do # Don't iterate over the output of ls
    (
        tcpdump -nn -A -w temp/"$DATESTAMP"/"$file" -r "$file" $FILTER
        # remove empty pcaps that don't match. Use stat (GNU coreutils) to get the file size
        if [ "$(stat -c "%s" "temp/$DATESTAMP/$file")" = 24 ]; then
            rm -f "temp/$DATESTAMP/$file"
        fi
    ) &
    job_count=$((job_count+1))
    if [ "$job_count" -ge "$max_jobs" ]; then
        wait            # wait for the current batch of background jobs to finish
        job_count=0
    fi
done
wait
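Alternatively, if GNU xargs is available, xargs -P keeps a fixed number of workers busy instead of waiting for whole batches to drain. A minimal sketch under the same assumptions (DATESTAMP and FILTER set as in the original script, GNU stat for the size check):
export DATESTAMP FILTER
find . -maxdepth 1 -name '*.pcap' -print0 |
    xargs -0 -P 10 -n 1 sh -c '
        f=$(basename "$1")
        tcpdump -nn -A -w "temp/$DATESTAMP/$f" -r "$1" $FILTER
        # drop the output again if nothing matched (24 bytes = empty pcap)
        [ "$(stat -c %s "temp/$DATESTAMP/$f")" = 24 ] && rm -f "temp/$DATESTAMP/$f"
    ' _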

Related

how to filter out / ignore specific lines when comparing text files with diff

To further clarify what I am trying to do, I wrote the script below. I am attempting to audit some files between my QA and PRD environments and would like the final diff output to ignore hard-coded values such as SQL connection strings. I have about 6 different values to filter. I have tried several ways, but thus far I have not been able to get any of them to work as needed. I am open to doing this another way if anyone has any ideas. I am pretty new to script development, so I'm open to any ideas or information. Thanks :)
#!/bin/bash
#*********************************************************************
#
# Name: compareMD5.sh
# Date: 02/12/2018
# Script Location:
# Author: Maggie o
#
# Description: This script will pull absolute paths from a text file
# and compare the files via ssh between QA & PRD on md5sum
# output match or no match
# Then the file the non matching files will be imported to a
# tmp directory via scp
# Files will be compared locally and exclude whitespace,
# spaces, comments, and hard coded values
# NOTE: Script may take a several minutes to run
#
# Usage: Auditing QA to PRD Pass 3
# nohup ./compareMD52.sh > /output/compareMD52.out 2> /error/compareMD52.err
# checking run ps -ef | grep compareMD52*
#**********************************************************************
rm /output/no_matchMD5.txt
rm /output/filesDiffer.txt
echo "Filename | Path" > /output/matchingMD5.txt
#Remove everything below the tmp directory recursively, as it was created by a previous script run
rm -rf /tmp/*
for i in $(cat /input/comp_list.txt) #list of files with absolute paths output by compare script
do
export filename=$(basename "$i") #Grab just the filename
export path=$(dirname "$i") #Just the Directory
qa_md5sum=$(md5sum "$i") #Get the md5sum
qa_md5="${qa_md5sum%% *}" #remove the appended path
export tmpdir=/tmp"$path"
# run the if-branch only when stat succeeds, i.e. the file exists
if ssh oracle@Someconnection stat "$path/$filename" > /dev/null 2>&1
then
prd_md5sum=$(ssh oracle@Somelocation "cd $path; find -name '$filename' -exec md5sum {} \;")
prd_md5="${prd_md5sum%% *}" #remove the appended path
if [[ $qa_md5 == $prd_md5 ]] #Compare the hashes as strings
then
echo "$filename $path QA Matches PRD" >> /output/matchingMD5.txt
else
echo $i
echo $tmpdir
echo "Copying "$i" to "$tmpdir >> /output/no_matchMD5.txt
#Copy the file from PRD to a tmp Dir in QA, keep dir structure to avoid issues of same filename exisiting in diffrent directorys
mkdir -p $tmpdir # -p creates only if not exisiting, does not produce errors if exisiting
scp oracle#Somelocation:$i $tmpdir # get the file from Prd, Insert into tmp Directory
fi
fi
done
for x in $(cat /output/no_matchMD5.txt) #do a local compare using diff
do
comp_filename=$(basename "$x")
#Ignore Comments, no white space, no blank lines, and only report if different but not How different
qa=/tmp"$x"
#IN TEST
if ! diff -bBq -I '^#' $x $qa >/dev/null
# Note: -I '^#' only ignores comment lines where the # starts the line
then
echo "$comp_filename differs by more than just white space or comments"
echo $x >> /output/filesDiffer.txt
fi
done
You can pipe the output into grep -v
Like this:
diff -bBq TEST.sh TEST2.sh | grep -v "^#"
I was able to get this figured out using this method
if diff -bBqZ -I '^#' <(grep -vE '(thing1|thing2|thing3)' $x) <(grep -vE '(thing1|thing2|thing3)' $prdfile)
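For context, here is a hedged sketch of how that line fits into the local comparison loop; thing1/thing2/thing3 are placeholders for the hard-coded values to ignore, and $prdfile stands for the PRD copy staged under /tmp:
IGNORE='(thing1|thing2|thing3)'   # placeholder patterns
qafile="$x"                       # QA copy
prdfile=/tmp"$x"                  # PRD copy pulled down by the earlier scp
# Strip the ignored lines from both sides, then diff what remains.
# -b ignores whitespace changes, -B blank lines, -Z trailing whitespace,
# -q reports only whether files differ, -I '^#' skips comment lines.
if ! diff -bBqZ -I '^#' <(grep -vE "$IGNORE" "$qafile") \
                        <(grep -vE "$IGNORE" "$prdfile") >/dev/null
then
    echo "$qafile differs beyond whitespace, comments, and ignored values"
fi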

bash - wget -N if else value check

I'm working on a bash script that pulls a file from an FTP site only if the timestamp on the remote is different from the local one. After it pulls the file down, it copies it over to 3 other computers via Samba (smbclient).
Everything works, but the file is copied even if wget -N ftp://insertsitehere.com reports that the remote file was not newer. What would be the best way to check the output so that the copy only happens if a new version was pulled from FTP?
Ideally, I'd like the copy to the computers to preserve the timestamp, just like the wget -N command does.
Here is an example of what I have:
#!/bin/bash
OUTDIR=/cats/dogs
cd $OUTDIR
wget -N ftp://user:password@sitegoeshere.com/filename
if [ $? -eq 0 ]; then
    HOSTS="server1 server2 server3"
    for i in $HOSTS; do
        echo "Uploading to $i..."
        smbclient -A /root/.smbclient.authfile //$i/path -c "lcd /cats/dogs; put filename.txt"
        if [ $? -eq 0 ]; then
            echo "Upload to $i successful..."
        else
            echo "There was an issue uploading to host $i..."
        fi
    done
else
    echo "There was an issue with the FTP Download...."
    exit 1
fi
The return value of wget is non-zero only if there is an error. If -N is in use and the remote file is older than the local file, wget will still return 0, so you cannot use the exit status to check whether the file was updated.
You could check the mtime of the file to see if it changed, or check the content. For example, you could use something like:
md5_old=$( md5sum filename.txt 2>/dev/null )
wget -N ftp://user:password@sitegoeshere.com/filename.txt
md5_new=$( md5sum filename.txt )
if [ "$md5_old" != "$md5_new" ]; then
    : # Copy filename.txt to SMB servers here
fi
Regarding smbclient, unfortunately there is no way to preserve timestamps with either the get or put command. If you need that, you must use a different tool (scp -p, rsync -t, ...).
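For example, if the destination machines are reachable over ssh (unlike the SMB shares in the question), a minimal rsync sketch that preserves modification times might look like:
# -t preserves modification times; use -a instead for full archive mode
for i in server1 server2 server3; do
    rsync -t /cats/dogs/filename.txt "$i":/path/
done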
touch -r foo.txt foo.old
wget -N example.com/foo.txt
if [ foo.txt -nt foo.old ]
then
    echo 'Uploading to server1...'
fi
"Save" the current timestamp into a new empty file
Use wget --timestamping to only download the file if it is newer
If file is newer than the "save" file, do stuff
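Putting those three steps together with the question's upload loop, a rough sketch (using the question's placeholder hosts and paths) could be:
#!/bin/bash
cd /cats/dogs || exit 1
# 1. save the current timestamp of the local copy (if any)
touch -r filename.txt filename.old 2>/dev/null
# 2. download only if the remote copy is newer
wget -N ftp://user:password@sitegoeshere.com/filename.txt
# 3. push to the other machines only if the file actually changed
#    (-nt is also true when filename.old does not exist, i.e. on first run)
if [ filename.txt -nt filename.old ]; then
    for i in server1 server2 server3; do
        echo "Uploading to $i..."
        smbclient -A /root/.smbclient.authfile //$i/path \
            -c "lcd /cats/dogs; put filename.txt"
    done
fi
rm -f filename.old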

scp: how to find out that copying was finished

I'm using the scp command to copy a file from one Linux host to another.
I run scp on host1 to copy a file from host1 to host2. The file is quite big and it takes some time to copy.
On host2 the file appears immediately, as soon as copying starts. I can do everything with this file even while copying is still in progress.
Is there any reliable way to find out on host2 whether copying has finished?
Off the top of my head, you could do something like:
touch tinyfile
scp bigfile tinyfile user@host:
Then when tinyfile appears you know that the transfer of bigfile is complete.
As pointed out in the comments, this assumes that scp will copy the files one by one, in the order specified. If you don't trust it, you could do them one by one explicitly:
scp bigfile user@host:
scp tinyfile user@host:
The disadvantage of this approach is that you would potentially have to authenticate twice. If this were an issue you could use something like ssh-agent.
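For instance, a minimal ssh-agent session that avoids the repeated passphrase prompts might look like this (the key path is an assumption):
# start an agent for this shell and load the key once
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa      # prompts for the passphrase a single time
scp bigfile user@host:
scp tinyfile user@host:    # reuses the cached key, no second prompt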
On the sending side (host1), use a script like this:
#!/bin/bash
echo 'starting transfer'
scp FILE USER@DST_SERVER:DST_PATH
OUT=$?
if [ $OUT = 0 ]; then
    echo 'transfer successful'
    touch successful
    scp successful USER@DST_SERVER:DST_PATH
else
    echo 'transfer failed'
fi
On the receiving side (host2), use a script like this:
#!/bin/bash
SLEEP_TIME=30
MAX_CNT=10
CNT=0
while [[ ! -e successful && $CNT -lt $MAX_CNT ]]; do
    ((CNT++))
    sleep "$SLEEP_TIME"
done
if [[ -e successful ]]; then
    echo 'successful'
    rm successful
    # do something with FILE
fi
With CNT and MAX_CNT you avoid an endless loop (in case the successful file is never transferred).
The product of MAX_CNT and SLEEP_TIME should be equal to or greater than the expected transfer time. In this example, 10 × 30 = 300 seconds, so the expected transfer time must be less than 300 seconds.
A checksum (md5sum, sha256sum, sha512sum) of the local and remote files would tell you if they're identical.
For the situation where you don't have SSH access to the remote system - like an FTP server - you can download the file after it's uploaded and compare the checksums. I do this for files I send from production scripts at work. Below is a snippet from the script in which I do this.
MD5SRC=$(md5sum $LOCALFILE | cut -c 1-32)
MD5TESTFILE=$(mktemp -p /ramdisk)
curl \
    -o $MD5TESTFILE \
    -sS \
    -u $FTPUSER:$FTPPASS \
    ftp://$FTPHOST/$REMOTEFILE
MD5DST=$(md5sum $MD5TESTFILE | cut -c 1-32)
if [ "$MD5SRC" == "$MD5DST" ]
then
    echo "+Local and Remote files match!"
else
    echo "-Local and Remote files don't match"
fi
If you use inotify-tools, the solution will look like this:
while ! inotifywait -e close "$(dirname "${bigfile_fullname}")" 2>/dev/null | \
        grep -Eo "CLOSE $(basename "${bigfile_fullname}")$" >/dev/null
do
    true
done
echo "File ${bigfile_fullname} closed"
After some investigation, and discussion of the problem on other forums, I have found one more solution. Maybe it can help somebody.
There is a command, lsof. It lists open files. While the file is being copied it will be open, so the command
lsof | grep filename
will return a non-empty result.
So you might want to make a while loop that waits until lsof returns nothing, and then proceed with your task.
Example:
# provide your file name here
f=<nameOfYourFile>
lsofresult=$(lsof | grep "$f" | wc -l)
while [ $lsofresult != 0 ]; do
    echo "still copying file $f..."
    sleep 5
    lsofresult=$(lsof | grep "$f" | wc -l)
done
echo "copying file $f is finished: $(ls "$f")"
For the duplicate question, How to check if file has been scp 100% to the remote location, which was about an expect script: to know whether a file has been transferred completely, we can add expect 100%, i.e. something like this:
expect -c "
set timeout 1
spawn scp user@$REMOTE_IP:/tmp/my.file user@$HOST_IP:/home/.
expect yes/no { send yes\r ; exp_continue }
expect password: { send $SCP_PASSWORD\r }
expect 100%
sleep 1
exit
"
if [ -f "/home/my.file" ]; then
echo "Success"
fi
If avoiding a second SSH handshake is important, you can use something like the following:
ssh host 'cat > bigfile && touch complete' < bigfile
Then wait for the "complete" file to get created on the remote end.
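On the receiving end, the wait can then be a simple poll for that marker, e.g.:
# block until the sender's "complete" marker shows up (5-second poll)
while [ ! -e complete ]; do
    sleep 5
done
# bigfile is now fully transferred; safe to process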

Getting an empty file for grep output

I am running this command in a script
while [ 1 ]
do
    if [ -e $LOG ]
    then
        grep -A 5 -B 5 -f $PATTERNS $LOG >> $FOREMAIL
        break
    fi
done
The $LOG file is scp'ed from another machine, so as soon as it appears in the current directory the while loop detects it and runs the grep. The problem is that the $FOREMAIL file turns out to be empty. But if I run the same grep outside of the script, as a standalone command with the same files and parameters, I can see that it generates output.
I am baffled as to why this command generates no output inside the script.
The -e test is triggering as soon as scp creates the file, while it still has no data in it, so grep is operating on an empty file. You need to wait until the file has finished transferring.
You could accomplish this by transferring to a temporary filename, then running mv over ssh from the machine that is pushing the file up.
Edit: the code for the machine copying the log file up...
scp $log 192.168.0.1:/logfiles/${log}.tmp
ssh 192.168.0.1 mv /logfiles/${log}.tmp /logfiles/${log}
Before you can grep, you need to wait for two things: 1) the download has started (the file comes into existence) and 2) the download has finished (nobody is opening the file anymore). I have a script called waitfor.sh, which does this:
#!/bin/bash
# waitfor.sh - wait for a file fully downloaded (via Firefox, scp, ...)
# Syntax:
# waitfor.sh filename
FILENAME=$1     # Name of file to wait for
INTERVAL=10     # Wait interval of N seconds
# Wait for download started
while [ ! -f "$FILENAME" ]
do
    sleep $INTERVAL
done
# Wait for download finished
while lsof "$FILENAME"
do
    sleep $INTERVAL
done
To use it:
waitfor.sh $LOG
grep ...
Could it be that the while [ 1 ] loop is very fast, so when the file starts copying, it shows up as an empty file before copying is complete? Depending on the size of the file, try a sleep delay inside the then branch. Figuring out when a file finishes copying, when the copy is done by an external process, is probably a separate question; e.g. googling for something like "how to tell when scp has finished copying a file" turns up a bunch of links like: https://superuser.com/questions/45224/is-there-a-way-to-tell-if-a-file-is-done-copying
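One rough way to approximate "done copying" without help from the sending side, offered here as an illustrative sketch rather than this answer's own method, is to poll until the file size stops changing (GNU stat assumed):
size1=-1
size2=$(stat -c %s "$LOG")
# keep polling until two consecutive size readings match
while [ "$size1" != "$size2" ]; do
    sleep 5
    size1=$size2
    size2=$(stat -c %s "$LOG")
done
grep -A 5 -B 5 -f "$PATTERNS" "$LOG" >> "$FOREMAIL"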
Better to use:
if [ -f $LOG ]
instead of:
if [ -e $LOG ]
-f checks for a regular file
-e checks for a file of any type
Here's what I ended up doing: first
scp $LOGFILE
then
scp $SCPDONE # empty marker file
And modified the if clause like this:
while [ 1 ]
do
    if [ -e $SCPDONE ]
    then
        grep -A 5 -B 5 -f $PATTERNS $LOG >> $FOREMAIL
        break
    fi
done

Bash Script to allow Nagios to report ping between two other Linux machines

I'm looking for alternatives for working out the ping between two machines (mA and mB) and reporting this back to Nagios (on mC).
My current thought is to write a bash script that pings the machines in a cron job, outputs the data to a file, and then have another bash script that Nagios can use to read that file. This doesn't feel like the best/right way to do it, though.
Here's the script I plan to run in the cron job:
#!/bin/bash
if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ] || [ -z "$4" ]
then
    echo "$0: usage: $0 file ip pingcount deadline"
    exit 126
else
    FILE=$1
    IP=$2
    PCOUNT=$3
    DLINE=$4
    while read line
    do
        if [[ $line == rtt* ]]
        then
            # replace forward slashes with underscores
            line=${line////_}
            # replace spaces with underscores
            line=${line// /_}
            # get the 8th item when splitting the string on underscores
            #echo $line | cut -d'_' -f 8 >> $FILE  # Append
            #echo $line | cut -d'_' -f 8 > $FILE   # Overwrite
            echo $line | cut -d'_' -f 8
        fi
    done < <(ping $IP -c $PCOUNT -q -w $DLINE) # -q output summary / -w deadline / -c ping count
fi
I thought about using traceroute, but I think this would produce a slower ping. Is there another way to achieve what I want?
Note: I know Nagios can directly ping a machine, but that isn't what I want to do and won't tell me what I need. Also, this is my second script ever, so it's probably rubbish. Finally, what alternative would I have if ICMP were blocked?
Have you looked at NRPE and check_ping? This would allow the nagios machine (mC) to ask mA to ping mB and then mA would report the results to mC. You would need to install and configure NRPE and the nagios-plugins on mA for this to work.
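As a rough sketch of that setup (plugin paths and thresholds are assumptions; adjust for your distribution), you would define a command on mA in nrpe.cfg that pings mB:
# /etc/nagios/nrpe.cfg on mA
# warn at 100 ms RTA or 20% loss, critical at 500 ms or 60% loss, 5 packets
command[check_ping_mb]=/usr/lib/nagios/plugins/check_ping -H mB -w 100.0,20% -c 500.0,60% -p 5
Then test it from mC on the command line before wiring it into a Nagios service definition:
/usr/lib/nagios/plugins/check_nrpe -H mA -c check_ping_mb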
