Bash script to check if a directory is modified - linux

This has probably been done before, and done better than mine. I have a directory where files are created for a few milliseconds before being removed. I researched it and couldn't find what I was looking for, so I wrote something to do it and added a few more features.
How it works:
You run the script, enter the directory, and enter how long you want it to run, from 10 seconds to 6000 seconds (100 minutes). It validates your input to make sure the directory exists and the time is within that range. It then compares the directory listing taken when the script started with a fresh listing every 0.001 seconds, and when they differ it uses sdiff -s to tell you what changed.
I wanted to share it since others may find it useful, and, more importantly, to ask if you have improvements. I have been teaching myself bash scripting (mostly using Stack Exchange) for almost a year, and I really love it. I am always looking to improve my code. I am new to interactive scripts, so if you have recommendations for input validation I'd love to hear them. Despite trying a lot of things, I couldn't figure out how to combine the two "if" statements for the time in seconds into one check for anything less than 10 or greater than 6000, so I kept them separate. The "sed" portions are kind of wonky, and I didn't do a great job optimizing them; I just worked on them until the output was what I wanted.
EDIT: I don't have inotify and I don't think I could get it on this locked down system.
#!/bin/bash
# Directory Check Script
# Created 13 Aug 2022
CLISESSID=$$
export CLISESSID
### CREATE TEMPORARY FILES (REMOVED AUTOMATICALLY ON EXIT) ###
temp1=$(mktemp) || exit 1
temp2=$(mktemp) || exit 1
trap 'rm -f "$temp1" "$temp2"' EXIT
echo "This script will check a directory to see if any files were added for the length of time you specify"
read -ep 'What is the full directory you would like to verify? ' dir
if [ ! -d "$dir" ] ; then
echo "Directory does not exist. Exiting."
exit 1
fi
read -ep '(This must be between 10-6000. e.g. 5 minutes = 300, 10 minutes = 600, 1 hour = 3600)
How many seconds would you like to check for? ' seconds
# Reject non-numeric input (and leading zeros) before any arithmetic comparison
if [[ ! "$seconds" =~ ^[1-9][0-9]{0,3}$ ]] ; then
echo "Seconds must be a whole number between 10 and 6000"
exit 1
fi
if [[ "$seconds" -lt 10 ]] ; then
echo "Seconds must be between 10 and 6000"
exit 1
fi
if [[ "$seconds" -gt 6000 ]] ; then
echo "Seconds must be between 10 and 6000"
exit 1
fi
echo "checking $dir for $seconds seconds."
ls --full-time "$dir" | tail -n +2 > "$temp1"
SECONDS=0
echo "Checking for changes to $dir every 0.001 seconds for $seconds seconds."
# Quote the right-hand side so it is compared literally, not as a glob pattern
until [[ $(ls --full-time "$dir" | tail -n +2) != "$(< "$temp1")" ]] > /dev/null 2>&1
do
if (( SECONDS > seconds ))
then
echo "Exceeded defined time of $seconds seconds. Exiting."
exit 1
fi
sleep 0.001
done
ls --full-time "$dir" | tail -n +2 > "$temp2"
if sdiff -w 400 -s "$temp1" "$temp2" | grep -q " |" ; then
echo "
File has been modified in $dir:"
sdiff -w 400 -s "$temp1" "$temp2" | sed 's/|/\n/' | sed 's/^ *//g' | sed '1~ i Before:' | sed '3~ i After:' | sed 's/^ *//g' | sed -e 's/^[ \t]*//'
fi
if sdiff -w 400 -s "$temp1" "$temp2" | grep -q " >" ; then
echo "
File has been added to $dir:"
sdiff -w 400 -s "$temp1" "$temp2" | sed 's/>/\n/' | grep -v " |" | sed 's/^ *//g' | sed '1~ i Added file:' | sed 's/^ *//g' | sed -e 's/^[ \t]*//' | sed '/./!d'
fi
if sdiff -w 400 -s "$temp1" "$temp2" | grep -q " <" ; then
echo "
File has been removed from $dir:"
sdiff -w 400 -s "$temp1" "$temp2" | sed 's/</\n/' | grep -v " |" | sed 's/^ *//g' | sed '1~ i Removed file:' | sed 's/^ *//g' | sed -e 's/^[ \t]*//' | sed '/./!d' | sed 's/ *$//'
fi
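On the question of folding the two range checks into one: a minimal sketch, assuming the same 10 to 6000 second limits. The helper name is_valid_seconds is hypothetical, and the 4-digit cap in the regex is an added assumption so that very long inputs cannot overflow the arithmetic comparison.

```shell
#!/bin/bash
# is_valid_seconds VALUE (hypothetical helper)
# Succeeds only when VALUE is a whole number from 10 to 6000.
# The regex runs first, so text like "abc" never reaches the arithmetic test;
# it also rejects leading zeros (which (( )) would read as octal) and caps the
# length at 4 digits so huge inputs cannot overflow the comparison.
is_valid_seconds() {
    [[ $1 =~ ^[1-9][0-9]{0,3}$ ]] && (( $1 >= 10 && $1 <= 6000 ))
}

# Try a few sample inputs: only 10, 300 and 6000 are accepted
for value in 5 10 300 6000 6001 abc 0100; do
    if is_valid_seconds "$value"; then
        echo "$value: accepted"
    else
        echo "$value: rejected"
    fi
done
```

In the script, the two separate checks would then become a single if ! is_valid_seconds "$seconds"; then ... fi.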

Related

reuse virtual files in bash script

What I am trying to do is run a bash script that looks somewhat like this:
#!/usr/bin/bash
only1=$(comm -23 $1 $2 | wc -l)
only2=$(comm -13 $1 $2 | wc -l)
common=$(comm -12 $1 $2 | wc -l)
echo -e "${only1} only in $1"
echo -e "${only2} only in $2"
echo -e "${common} in both"
If I execute the script as script.sh file1 file2, it works fine. However, if I use it as script.sh <(grep 'foo' file1) <(grep 'foo' file2), it fails because the virtual files of the kind /dev/fd/62 are only available for the first command (only1 in the script). The output is:
262 only in /dev/fd/63
0 only in /dev/fd/62
0 in both
Is there a way to make these virtual files available to all of the commands in the script?
The issue here is that the first invocation of comm will read to the end of both input files.
As you'd like to be able to provide pipes as the input (instead of a "real" file), you'll need to read the inputs only once and then provide that data as input to the subsequent commands. With pipes, as soon as data is read, it's gone and isn't coming back.
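The effect is easy to reproduce in isolation. In the sketch below, read_twice is a hypothetical helper (not part of the question's script) that opens the same argument twice: a regular file gives the same line count both times, while a process substitution is already drained on the second read. It assumes Linux-style /dev/fd behaviour.

```shell
#!/bin/bash
# read_twice FILE (hypothetical helper): count the lines of FILE twice in a row.
# A regular file can be reopened and read again, so both counts match.
# A process substitution such as <(...) is a pipe: the first read drains it,
# so the second read hits end-of-file straight away.
read_twice() {
    local first second
    first=$(wc -l < "$1")
    second=$(wc -l < "$1")
    # $(( )) strips the padding some wc implementations add (empty counts as 0)
    echo "first: $((first)), second: $((second))"
}

tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
read_twice "$tmp"               # first: 2, second: 2
read_twice <(printf 'a\nb\n')   # first: 2, second: 0
rm -f "$tmp"
```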
For example:
#!/bin/bash -eu
# cleanup temporary files on exit
trap 'rm -f "${TMP_FILE1:-}" "${TMP_FILE2:-}"' EXIT
TMP_FILE1=$(mktemp)
cat < "$1" > "$TMP_FILE1"
TMP_FILE2=$(mktemp)
cat < "$2" > "$TMP_FILE2"
only1=$(comm -23 "$TMP_FILE1" "$TMP_FILE2" | wc -l)
only2=$(comm -13 "$TMP_FILE1" "$TMP_FILE2" | wc -l)
common=$(comm -12 "$TMP_FILE1" "$TMP_FILE2" | wc -l)
echo -e "${only1} only in $1"
echo -e "${only2} only in $2"
echo -e "${common} in both"
If your files are small enough, then you can get away with reading them into variables:
#!/bin/bash -eu
FILE1=$( < "$1" )
FILE2=$( < "$2" )
only1=$(comm -23 <( echo "$FILE1" ) <( echo "$FILE2" ) | wc -l)
only2=$(comm -13 <( echo "$FILE1" ) <( echo "$FILE2" ) | wc -l)
common=$(comm -12 <( echo "$FILE1" ) <( echo "$FILE2" ) | wc -l)
echo -e "${only1} only in $1"
echo -e "${only2} only in $2"
echo -e "${common} in both"
Please also note that comm only works on sorted data, which means you probably want to sort the inputs, unless you are fully aware of the consequences of using unsorted inputs. In the temporary-file version, read each input with:
sort < "$1" > "$TMP_FILE1"
and in the variable version with:
FILE1=$( sort < "$1" )
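Another option, not from the original answer, avoids temporary files entirely: run comm only once and count its three output columns, so each input is read exactly once and pipes such as <(grep 'foo' file1) work directly. A sketch, assuming the input lines do not themselves begin with a tab; compare_sorted is a hypothetical name.

```shell
#!/bin/bash
# compare_sorted FILE1 FILE2 (hypothetical helper)
# Sorts each input, runs comm a single time, and tallies comm's columns with awk.
# comm prefixes column 2 (only in FILE2) with one tab and column 3 (in both)
# with two tabs; column 1 (only in FILE1) has no prefix.
compare_sorted() {
    comm <(sort "$1") <(sort "$2") | awk -v f1="$1" -v f2="$2" '
        /^\t\t/ { common++; next }
        /^\t/   { only2++;  next }
                { only1++ }
        END {
            printf "%d only in %s\n", only1, f1
            printf "%d only in %s\n", only2, f2
            printf "%d in both\n", common
        }'
}

# Usage, as in the question:
#   compare_sorted <(grep 'foo' file1) <(grep 'foo' file2)
```

Because sort does the sorting inside the function, the inputs no longer need to be sorted beforehand.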

function in loop corrupts every other iteration

I made a short bash program to download podcasts and keep only the last 20 seconds of each.
The strange thing is that it fails to download every other file. The problem seems to be the function trim_nsec: when I remove it from the loop, everything else works correctly.
Edit: added double quotes, which doesn't solve the problem.
#!/bin/bash
# Get podcast list
wget -O feed http://www.rtl.fr/podcast/on-n-est-pas-forcement-d-accord.xml
function trim_nsec () {
# arguments : 1 : mp3file - 2 : duration - 3 : outputfile
duration=$(ffprobe -i "${1}" -show_entries format=duration -v quiet -of csv="p=0")
nth_second=$(echo "${duration} - ${2}"|bc)
ffmpeg -i "${1}" -ss "${nth_second}" "${3}"
}
cpt=1
# let's work only on the 4th first files
grep -Po 'http[^<]*.mp3' feed|grep admedia| head -n 4 > list
cat list | while read i
do
year=$(echo "$i" | cut -d"/" -f6)
day=$(echo "$i" | cut -d"/" -f7)
fullname=$(echo "$i" | awk -F"/" '{print $NF}')
fullnameend=$(echo "$fullname" |sed -e 's/\.mp3$/_end\.mp3/')
new_name=$(echo "$year"_"$day"_"$fullnameend")
# let's download
wget -O "$fullname" "$i"
# let's trim last 20 sec
trim_nsec "$fullname" 20 "$new_name"
echo "$cpt file processed"
#delete orig. file :
rm "$fullname"
((cpt++))
done
Any ideas?
The problem is most likely that ffmpeg reads from standard input (for its interactive keys, and for prompts such as whether to overwrite an existing file), which consumes the input provided by cat list. To prevent trim_nsec from consuming the input from cat list, you could do:
cat list | while read i
do
year=$(echo "$i" | cut -d"/" -f6)
day=$(echo "$i" | cut -d"/" -f7)
fullname=$(echo "$i" | awk -F"/" '{print $NF}')
fullnameend=$(echo "$fullname" |sed -e 's/\.mp3$/_end\.mp3/')
new_name=$(echo "$year"_"$day"_"$fullnameend")
# let's download
wget -c -O "$fullname" "$i"
# let's trim last 20 sec
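# read from fd 3 instead of the list, so ffmpeg cannot eat the remaining URLs
trim_nsec "$fullname" 20 "$new_name" <&3
echo "$cpt file processed"
#delete orig. file :
#rm "$fullname"
((cpt++))
done 3<&1 # fd 3 = copy of the script's stdout (the terminal when run interactively)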
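The mechanism can be reproduced without ffmpeg. In the sketch below, greedy is a hypothetical stand-in that drains all of its standard input (the real ffmpeg may read only part of it, which would explain why only every other iteration broke). Redirecting its stdin from /dev/null protects the loop; for ffmpeg itself, its -nostdin option serves the same purpose as the <&3 redirection above.

```shell
#!/bin/bash
# greedy: hypothetical stand-in for a command that reads stdin, like ffmpeg
greedy() { cat > /dev/null; }

# count_iterations LISTFILE [protect]
# Loops over LISTFILE line by line, calling greedy each time, and prints how
# many iterations ran. With "protect", greedy's stdin is /dev/null.
count_iterations() {
    local list=$1 mode=${2:-} count=0 line
    while IFS= read -r line; do
        if [[ $mode == protect ]]; then
            greedy < /dev/null   # gets EOF at once; the list is left alone
        else
            greedy               # inherits the loop's stdin and drains the list
        fi
        count=$((count + 1))
    done < "$list"
    echo "$count"
}

list=$(mktemp)
printf 'one\ntwo\nthree\nfour\n' > "$list"
count_iterations "$list"           # prints 1
count_iterations "$list" protect   # prints 4
rm -f "$list"
```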

I want to check if some given files contain more than 3 words from an input file in a shell script

My first parameter is the file that contains the given words; the remaining parameters are the directories in which I'm searching for files that contain at least 3 of the words from the first parameter.
I can successfully print out the number of matching words, but when I test whether it's greater than or equal to 3, I get the error: test: too many arguments
Here's my code:
#!/bin/bash
file=$1
shift 1
for i in $*
do
for j in `find $i`
do
if test -f "$j"
then
if test grep -o -w "`cat $file`" $j | wc -w -ge 3
then
echo $j
fi
fi
done
done
You first need to execute the grep | wc pipeline and then compare its output with 3, so your if statement has to change. Since you are already using backquotes, which cannot be nested without escaping the inner pair, use the other syntax $(command), which is equivalent to `command`:
if [ $(grep -o -w "`cat $file`" $j | wc -w) -ge 3 ]
then
echo $j
fi
I believe your problem is that you are trying to check whether the result of grep -o -w "`cat $file`" $j | wc -w is greater than or equal to three, but your syntax is incorrect. Try this instead:
if test $(grep -o -w "`cat $file`" $j | wc -w) -ge 3
By putting the grep & wc commands inside the $(), the shell executes those commands and uses the output rather than the text of the commands themselves. Consider this:
> cat words
western
found
better
remember
> echo "cat words | wc -w"
cat words | wc -w
> echo $(cat words | wc -w)
4
> echo "cat words | wc -w gives you $(cat words | wc -w)"
cat words | wc -w gives you 4
>
Note that the $() syntax is equivalent to the backtick notation you're already using for the cat $file command.
Hope this helps!
Your code can be refactored and corrected in a few places.
Have it this way:
#!/bin/bash
input="$1"
shift
for dir; do
while IFS= read -r -d '' file; do
if [[ $(grep -woFf "$input" "$file" | sort -u | wc -l) -ge 3 ]]; then
echo "$file"
fi
done < <(find "$dir" -type f -print0)
done
for dir; do loops through all the remaining arguments (the positional parameters left after shift).
sort -u removes duplicate words from the output of grep.
Use wc -l instead of wc -w, since grep -o prints each matching word on a separate line.
find ... -print0, paired with read -d '', handles file names that contain whitespace.
find ... -type f retrieves only regular files, so there is no need to check -f later.
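The core of that refactor is the distinct-word count. A minimal sketch that isolates it so it can be checked on small sample files; count_distinct_matches is a hypothetical name, and the word list reuses the sample words from the earlier answer.

```shell
#!/bin/bash
# count_distinct_matches WORDFILE FILE (hypothetical helper)
# Prints how many distinct words from WORDFILE occur in FILE as whole words:
# -w matches whole words only, -o prints each match on its own line,
# -F treats the words as fixed strings, and -f reads them from WORDFILE.
count_distinct_matches() {
    local n
    n=$(grep -woFf "$1" "$2" | sort -u | wc -l)
    echo $((n))   # $(( )) strips any padding some wc versions add
}

words=$(mktemp); text=$(mktemp)
printf 'western\nfound\nbetter\nremember\n' > "$words"
printf 'We found it better, then found more.\n' > "$text"
count_distinct_matches "$words" "$text"   # 2 (found, better)
rm -f "$words" "$text"
```

Because of -w, a word such as remembered does not count as a match for remember.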

Multiple variables into one variable with wildcard

I have this script:
#!/bin/bash
ping_1=$(ping -c 1 www.test.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//')
ping_2=$(ping -c 1 www.test1.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//')
ping_3=$(ping -c 1 www.test2.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//')
ping_4=$(ping -c 1 www.test3.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//' )
Then I would like to treat the outputs of ping_1-4 in one variable. Something like this:
#!/bin/bash
if [ "$ping_*" -gt 50 ]; then
echo "One ping is too high"
else
echo "The pings are fine"
fi
Is there a possibility in bash to read these variables with some sort of wildcard?
$ping_*
Did nothing for me.
The answer to your stated problem is that yes, you can do this with parameter expansion in bash (but not in sh):
#!/bin/bash
ping_1=foo
ping_2=bar
ping_etc=baz
for var in "${!ping_@}"
do
echo "$var is set to ${!var}"
done
will print
ping_1 is set to foo
ping_2 is set to bar
ping_etc is set to baz
Here's the relevant excerpt from man bash:
${!prefix*}
${!prefix@}
Names matching prefix. Expands to the names of variables whose
names begin with prefix, separated by the first character of the
IFS special variable. When @ is used and the expansion appears
within double quotes, each variable name expands to a separate
word.
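Applied to the question's threshold check, the same expansion can drive a loop. A sketch, assuming the ping_ values are integers as produced by the question's sed; any_ping_too_high is a hypothetical name, and treating an unset or empty value as 0 is an added assumption (a failed ping might deserve its own handling).

```shell
#!/bin/bash
# any_ping_too_high LIMIT (hypothetical helper)
# Succeeds if any variable whose name starts with ping_ holds a value greater
# than LIMIT. "${!ping_@}" lists the matching variable names, and ${!var}
# reads each one's value indirectly; an empty value is treated as 0.
any_ping_too_high() {
    local limit=$1 var
    for var in "${!ping_@}"; do
        if (( ${!var:-0} > limit )); then
            return 0
        fi
    done
    return 1
}

ping_1=20 ping_2=35 ping_3=70 ping_4=12
if any_ping_too_high 50; then
    echo "One ping is too high"
else
    echo "The pings are fine"
fi
```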
The answer to your actual problem is to use arrays instead.
I don't think there's such a wildcard.
But you could use a loop to iterate over values, for example:
exists_too_high() {
for value; do
if [ "$value" -gt 50 ]; then
return 0
fi
done
return 1
}
if exists_too_high "$ping_1" "$ping_2" "$ping_3" "$ping_4"; then
echo "One ping is too high"
else
echo "The pings are fine"
fi
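You can combine the tests with the "or" (-o) operator, so the condition is true as soon as any one ping is above 50:
if [ "$ping_1" -gt 50 -o \
"$ping_2" -gt 50 -o \
"$ping_3" -gt 50 -o \
"$ping_4" -gt 50 ]; then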
...
...
Or instead of defining a lot of variables, you can make an array and check with a loop:
pings+=($(ping -c 1 www.test.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//'))
pings+=($(ping -c 1 www.test1.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//'))
pings+=($(ping -c 1 www.test2.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//'))
pings+=($(ping -c 1 www.test3.com | tail -1| awk '{print $4}' | cut -d '/' -f 2 | sed 's/\.[^.]*$//' ))
too_high=0
for ping in "${pings[@]}"; do
if [ "$ping" -gt 50 ]; then
too_high=1
break
fi
done
if [ $too_high -eq 1 ]; then
echo "One ping is too high"
else
echo "The pings are fine"
fi
To complement the existing, helpful answers with an array-based solution that demonstrates:
several advanced Bash techniques (robust array handling, compound conditionals, handling the case where pinging fails)
an optimized way to extract the average timing from ping's output by way of a single sed command (works with both GNU and BSD/macOS sed).
reporting the servers that either took too long or failed to respond by name.
#!/usr/bin/env bash
# Determine the servers to ping as an array.
servers=( 'www.test.com' 'www.test1.com' 'www.test2.com' 'www.test3.com' )
# Initialize the array in which timings will be stored, paralleling the
# "${servers[@]}" array.
avgPingTimes=()
# Initialize the array that stores the names of the servers that either took
# too long to respond (on average), or couldn't be pinged at all.
failingServers=()
# Determine the threshold above which a timing is considered too high, in ms.
# Note that a shell variable's name should contain at least 1 lowercase character, to avoid clashes with environment variables.
kMAX_TIME=50
# Determine how many pings to send per server to calculate the average timing
# from.
kPINGS_PER_SERVER=1
for server in "${servers[@]}"; do
# Ping the server at hand, extracting the integer portion of the average
# timing.
# Note that if pinging fails, $avgPingTime will be empty.
avgPingTime="$(ping -c "$kPINGS_PER_SERVER" "$server" |
sed -En 's|^.* = [^/]+/([^.]+).+$|\1|p')"
# Check if the most recent ping failed or took too long and add
# the server to the failure array, if so.
[[ -z $avgPingTime || $avgPingTime -gt $kMAX_TIME ]] && failingServers+=( "$server" )
# Add the timing to the output array.
avgPingTimes+=( "$avgPingTime" )
done
if [[ -n $failingServers ]]; then # pinging at least 1 server took too long or failed
echo "${#failingServers[@]} of the ${#servers[@]} servers took too long or couldn't be pinged:"
printf '%s\n' "${failingServers[@]}"
else
echo "All ${#servers[@]} servers responded to pings in a timely fashion."
fi
Yes, bash can list variables whose names begin with ping_ by using its builtin compgen -v command (see man bash under SHELL BUILTIN COMMANDS), e.g.:
for f in $(compgen -v ping_) foo ; do
if [ "$f" = foo ]; then
echo "The pings are fine"
break
fi
eval "p=\$$f"
if [ "$p" -gt 50 ]; then
echo "One ping is too high"
break
fi
done
Note the added loop item foo: the loop only reaches it when none of the ping_ variables was above 50, so that is where it prints "The pings are fine".

extract average time from ping -c

I want to extract the average time from the output of the command ping -c 4 www.stackoverflow.com | tail -1| awk '{print $4}', which prints:
107.921/108.929/110.394/0.905 ms
The output should be: 108.929
One way is to just add a cut to what you have there.
ping -c 4 www.stackoverflow.com | tail -1| awk '{print $4}' | cut -d '/' -f 2
ping -c 4 www.stackoverflow.com | tail -1| awk -F '/' '{print $5}' would work fine.
"-F" option is used to specify the field separator.
This might work for you:
ping -c 4 www.stackoverflow.com | sed '$!d;s|.*/\([0-9.]*\)/.*|\1|'
The following solution uses Bash only (requires Bash 3):
[[ $(ping -q -c 4 www.example.com) =~ \ =\ [^/]*/([0-9]+\.[0-9]+).*ms ]] \
&& echo ${BASH_REMATCH[1]}
For the regular expression it's easier to read (and handle) if it is stored in a variable:
regex='= [^/]*/([0-9]+\.[0-9]+).*ms'
[[ $(ping -q -c 4 www.example.com) =~ $regex ]] && echo ${BASH_REMATCH[1]}
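The regex version is also convenient to wrap in a function that reads ping output on stdin, which makes it testable without network access. A sketch; ping_avg is a hypothetical name, and the sample line assumes the Linux (iputils) summary format shown in the question.

```shell
#!/bin/bash
# ping_avg (hypothetical helper): read ping output on stdin and print the
# average round-trip time. After "= ", skip the minimum up to the first "/",
# then capture the next number, which is the average.
ping_avg() {
    local regex='= [^/]*/([0-9]+\.[0-9]+).*ms' out
    out=$(cat)
    if [[ $out =~ $regex ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1   # no summary line, e.g. every packet was lost
    fi
}

# Offline check with a sample summary line
echo 'rtt min/avg/max/mdev = 107.921/108.929/110.394/0.905 ms' | ping_avg
# Live use: ping -q -c 4 www.example.com | ping_avg
```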
Promoting luissquall's very elegant comment to an answer:
ping -c 4 www.stackoverflow.com | awk -F '/' 'END {print $5}'
Extract the mean time directly from the ping command:
ping -w 4 -q www.duckduckgo.com | cut -d "/" -s -f5
Options:
-w: ping times out after 4 seconds
-q: quiet mode (only the summary is printed)
-d: delimiter for cut
-s: cut skips lines without the delimiter
-f: field number; this depends on your system (sometimes the 5th, sometimes the 4th)
I personally use it this way:
if [ $(ping -w 2 -q www.duckduckgo.com | cut -d "/" -s -f4 | cut -d "." -f1) -lt 20 ]; then
echo "good response time"
else
echo "bad response time"
fi
Use these to get current ping as a single number:
123.456:
ping -w1 -c1 8.8.8.8 | tail -1| cut -d '=' -f 2 | cut -d '/' -f 2
123:
ping -w1 -c1 8.8.8.8 | tail -1| cut -d '=' -f 2 | cut -d '/' -f 2 | cut -d '.' -f 1
Note that this displays the average of only 1 ping (-c1). You can increase the sample size by increasing this number (e.g. -c10), but -w1 also caps the whole run at 1 second, so raise it accordingly.
This avoids using awk (as @Buggabill's answer does), which doesn't play nicely inside bash aliases and takes a nanosecond longer.
None of these worked well for me because of various issues, such as what happens when a timeout occurs. I only wanted to see bad ping times or timeouts, and I wanted ping to continue quickly; none of these solutions did that. Here's my Bash script that does both. Note that in the ping command, the response time is limited to 1 second.
I realize this does not directly answer the OP's question. However, it deals with some issues that the incomplete "solutions" here do not, and that is what others coming here are looking for (I cite myself as an example), so I decided to share it for those readers.
while true
do
###Set your IP and max milliseconds###
ip="192.168.1.53"
maxms=50
###do not edit below###
err="100% packet loss"
out="$(ping -c 1 -i 1 -w 1 "$ip")"
# Quote $out so awk sees separate lines and END reads the summary line
t="$(echo "$out" | awk -F '/' 'END {print $5}')"
t=${t%.*}
re='^[0-9]+$'
if ! [[ $t =~ $re ]] ; then
if [[ $out == *"$err"* ]] ; then
echo "$(date) | ${ip}: TIMEOUT"
else
echo "error: Not a number: ${t} was found in: ${out}"
fi
else
if [ "$t" -gt "$maxms" ]; then
echo "$(date) | ${ip}: ${t} ms"
fi
fi
done
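The body of that loop can be pulled out into a function that classifies already-captured ping output, so the parsing can be checked without pinging anything. A sketch under the assumption of the iputils summary format; classify_ping and the sample output are hypothetical, and the wiring shown in the comments is one possible way to use it in the loop.

```shell
#!/bin/bash
# classify_ping OUTPUT MAXMS (hypothetical helper)
# Prints "TIMEOUT" for 100% packet loss, "<n> ms" when the integer part of the
# average exceeds MAXMS, and nothing when the ping was fast enough.
classify_ping() {
    local out=$1 maxms=$2 t
    if [[ $out == *"100% packet loss"* ]]; then
        echo "TIMEOUT"
        return
    fi
    # Field 5 of the last line is the average in the iputils summary format
    t=$(printf '%s\n' "$out" | awk -F '/' 'END {print $5}')
    t=${t%.*}                      # keep only the integer part
    if ! [[ $t =~ ^[0-9]+$ ]]; then
        echo "error: Not a number: ${t}"
    elif (( t > maxms )); then
        echo "${t} ms"
    fi
}

# Possible wiring inside the loop:
#   result=$(classify_ping "$(ping -c 1 -i 1 -w 1 "$ip")" "$maxms")
#   [[ -n $result ]] && echo "$(date) | ${ip}: ${result}"
sample=$'--- 10.0.0.1 ping statistics ---\n1 packets transmitted, 1 received, 0% packet loss, time 0ms\nrtt min/avg/max/mdev = 67.812/67.812/67.812/0.000 ms'
classify_ping "$sample" 50   # 67 ms
```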
