Shell script with Wget - If else nested inside for loop - linux

I'm trying to make a shell script that reads a list of download URLs to find if they're still active. I'm not sure what's wrong with my current script, (I'm new to this) and any pointers would be a huge help!
user#pc:~/test# cat sites.list
http://www.google.com/images/srpr/logo3w.png
http://www.google.com/doesnt.exist
notasite
Script:
#!/bin/bash
for i in `cat sites.list`
do
wget --spider $i -b
if grep --quiet "200 OK" wget-log; then
echo $i >> ok.txt
else
echo $i >> notok.txt
fi
rm wget-log
done
As is, the script outputs everything to notok.txt - (the first google site should go to ok.txt).
But if I run:
wget --spider http://www.google.com/images/srpr/logo3w.png -b
And then do:
grep "200 OK" wget-log
It greps the string without any problems. What noob mistake did I make with the syntax? Thanks m8s!

The -b option is sending wget to the background, so you're doing the grep before wget has finished.
Try without the -b option:
if wget --spider $i 2>&1 | grep --quiet "200 OK" ; then

There are a few issues with what you're doing.
Your for i in will have problems with lines that contain whitespace. Better to use while read to read individual lines of a file.
You aren't quoting your variables. What if a line in the file (or word in a line) starts with a hyphen? Then wget will interpret that as an option. You have a potential security risk here, as well as an error.
Creating and removing files isn't really necessary. If all you're doing is checking whether a URL is reachable, you can do that without temp files and the extra code to remove them.
wget isn't necessarily the best tool for this. I'd advise using curl instead.
So here's a better way to handle this...
#!/bin/bash
sitelist="sites.list"
curl="/usr/bin/curl"
# Some errors, for good measure...
if [[ ! -f "$sitelist" ]]; then
echo "ERROR: Sitelist is missing." >&2
exit 1
elif [[ ! -s "$sitelist" ]]; then
echo "ERROR: Sitelist is empty." >&2
exit 1
elif [[ ! -x "$curl" ]]; then
echo "ERROR: I can't work under these conditions." >&2
exit 1
fi
# Allow more advanced pattern matching (for case..esac below)
shopt -s globstar
while read url; do
# remove comments
url=${url%%#*}
# skip empty lines
if [[ -z "$url" ]]; then
continue
fi
# Handle just ftp, http and https.
# We could do full URL pattern matching, but meh.
case "$url" in
#(f|ht)tp?(s)://*)
# Get just the numeric HTTP response code
http_code=$($curl -sL -w '%{http_code}' "$url" -o /dev/null)
case "$http_code" in
200|226)
# You'll get a 226 in ${http_code} from a valid FTP URL.
# If all you really care about is that the response is in the 200's,
# you could match against "2??" instead.
echo "$url" >> ok.txt
;;
*)
# You might want different handling for redirects (301/302).
echo "$url" >> notok.txt
;;
esac
;;
*)
# If we're here, we didn't get a URL we could read.
echo "WARNING: invalid url: $url" >&2
;;
esac
done < "$sitelist"
This is untested. For educational purposes only. May contain nuts.

Related

Using inotifywait to process two files in parallel

I am using:
inotifywait -m -q -e close_write --format %f . | while IFS= read -r file; do
cp -p "$file" /path/to/other/directory
done
to monitor a folder for file completion, then moving it to another folder.
Files are made in pairs but at separate times, ie File1_001.txt is made at 3pm, File1_002.txt is made at 9pm. I want to monitor for the completion of BOTH files, then launch a script.
script.sh File1_001.txt File1_002.txt
So I need to have another inotifywait command or a different utility, that can also identify that both files are present and completed, then start the script.
Does anyone know how to solve this problem?
I found a Linux box with inotifywait installed on it, so now I understand what it does and how it works. :)
Is this what you need?
#!/bin/bash
if [ "$1" = "-v" ]; then
Verbose=true
shift
else
Verbose=false
fi
file1="$1"
file2="$2"
$Verbose && printf 'Waiting for %s and %s.\n' "$file1" "$file2"
got1=false
got2=false
while read thisfile; do
$Verbose && printf ">> $thisfile"
case "$thisfile" in
$file1) got1=true; $Verbose && printf "... it's a match!" ;;
$file2) got2=true; $Verbose && printf "... it's a match!" ;;
esac
$Verbose && printf '\n'
if $got1 && $got2; then
$Verbose && printf 'Saw both files.\n'
break
fi
done < <(inotifywait -m -q -e close_write --format %f .)
This runs a single inotifywait but parses its output in a loop that exits when both files on the command line ($1 and $2) are seen to have been updated.
Note that if one file is closed and then later is reopened while the second file is closed, this script obviously will not detect the open file. But that may not be a concern in your use case.
Note that there are many ways of building a solution -- I've shown you only one.

How to check status of URLs from text file using bash shell script

I have to check the status of 200 http URLs and find out which of these are broken links. The links are present in a simple text file (say URL.txt present in my ~ folder). I am using Ubuntu 14.04 and I am a Linux newbie. But I understand the bash shell is very powerful and could help me achieve what I want.
My exact requirement would be to read the text file which has the list of URLs and automatically check if the links are working and write the response to a new file with the URLs and their corresponding status (working/broken).
I created a file "checkurls.sh" and placed it in my home directory where the urls.txt file is also located. I gave execute privileges to the file using
$chmod +x checkurls.sh
The contents of checkurls.sh is given below:
#!/bin/bash
while read url
do
urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url" )
echo "$url $urlstatus" >> urlstatus.txt
done < $1
Finally, I executed it from command line using the following -
$./checkurls.sh urls.txt
Voila! It works.
#!/bin/bash
while read -ru 4 LINE; do
read -r REP < <(exec curl -IsS "$LINE" 2>&1)
echo "$LINE: $REP"
done 4< "$1"
Usage:
bash script.sh urls-list.txt
Sample:
http://not-exist.com/abc.html
https://kernel.org/nothing.html
http://kernel.org/index.html
https://kernel.org/index.html
Output:
http://not-exist.com/abc.html: curl: (6) Couldn't resolve host 'not-exist.com'
https://kernel.org/nothing.html: HTTP/1.1 404 Not Found
http://kernel.org/index.html: HTTP/1.1 301 Moved Permanently
https://kernel.org/index.html: HTTP/1.1 200 OK
For everything, read the Bash Manual. See man curl, help, man bash as well.
What about to add some parallelism to the accepted solution. Lets modify the script chkurl.sh to be little easier to read and to handle just one request at a time:
#!/bin/bash
URL=${1?Pass URL as parameter!}
curl -o /dev/null --silent --head --write-out "$URL %{http_code} %{redirect_url}\n" "$URL"
And now you check your list using:
cat URL.txt | xargs -P 4 -L1 ./chkurl.sh
This could finish the job up to 4 times faster.
Herewith my full script that checks URLs listed in a file passed as an argument e.g. 'checkurls.sh listofurls.txt'.
What it does:
check url using curl and return HTTP status code
send email notifications when url returns other code than 200
create a temporary lock file for failed urls (file naming could be improved)
send email notification when url becoms available again
remove lock file once url becomes available to avoid further notifications
log events to a file and handle increasing log file size (AKA log
rotation, uncomment echo if code 200 logging required)
Code:
#!/bin/sh
EMAIL=" your#email.com"
DATENOW=`date +%Y%m%d-%H%M%S`
LOG_FILE="checkurls.log"
c=0
while read url
do
((c++))
LOCK_FILE="checkurls$c.lock"
urlstatus=$(/usr/bin/curl -H 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code}' "$url" )
if [ "$urlstatus" = "200" ]
then
#echo "$DATENOW OK $urlstatus connection->$url" >> $LOG_FILE
[ -e $LOCK_FILE ] && /bin/rm -f -- $LOCK_FILE > /dev/null && /bin/mail -s "NOTIFICATION URL OK: $url" $EMAIL <<< 'The URL is back online'
else
echo "$DATENOW FAIL $urlstatus connection->$url" >> $LOG_FILE
if [ -e $LOCK_FILE ]
then
#no action - awaiting URL to be fixed
:
else
/bin/mail -s "NOTIFICATION URL DOWN: $url" $EMAIL <<< 'Failed to reach or URL problem'
/bin/touch $LOCK_FILE
fi
fi
done < $1
# REMOVE LOG FILE IF LARGER THAN 100MB
# alow up to 2000 lines average
maxsize=120000
size=$(/usr/bin/du -k "$LOG_FILE" | /bin/cut -f 1)
if [ $size -ge $maxsize ]; then
/bin/rm -f -- $LOG_FILE > /dev/null
echo "$DATENOW LOG file [$LOG_FILE] has been recreated" > $LOG_FILE
else
#do nothing
:
fi
Please note that changing order of listed urls in text file will affect any existing lock files (remove all .lock files to avoid confusion). It would be improved by using url as file name but certain characters such as : # / ? & would have to be handled for operating system.
I recently released deadlink, a command-line tool for finding broken links in files. Install with
pip install deadlink
and use as
deadlink check /path/to/file/or/directory
or
deadlink replace-redirects /path/to/file/or/directory
The latter will replace permanent redirects (301) in the specified files.
Example output:
if your input file contains one url per line you can use a script to read each line, then try to ping the url, if ping success then the url is valid
#!/bin/bash
INPUT="Urls.txt"
OUTPUT="result.txt"
while read line ;
do
if ping -c 1 $line &> /dev/null
then
echo "$line valid" >> $OUTPUT
else
echo "$line not valid " >> $OUTPUT
fi
done < $INPUT
exit
ping options :
-c count
Stop after sending count ECHO_REQUEST packets. With deadline option, ping waits for count ECHO_REPLY packets, until the timeout expires.
you can use this option as well to limit waiting time
-W timeout
Time to wait for a response, in seconds. The option affects only timeout in absense
of any responses, otherwise ping waits for two RTTs.
curl -s -I --http2 http://$1 >> fullscan_curl.txt | cut -d: -f1 fullscan_curl.txt | cat fullscan_curl.txt | grep HTTP >> fullscan_httpstatus.txt
its work me

Bash | curl | curls 2 URL's then stops

I am trying to write a simple bash script that will use a list from a text document and curl each URL that is on the list in order to see what the contents of each URL is. It allows me to cURL 2 sites and creates the text documents for the rest however it only downloads the first 2. I have already manage to write the script that pulls there IP's and places them in a seperate file using the grep command. At first i tried
#!/bin/bash
for var in `cat host.txt`; do
curl -s $var >> /tmp/ping/html/$var.html
done
I have tried with and without the silent switch. I then tried the following:
#!/bin/bash
for var in `head -2 host.txt`; do
curl $var >> /tmp/ping/html/$var.html
wait
done
for var in `head -4 host.txt | tail -2`; do
curl $var >> /tmp/ping/html/$var.html
done
This would try and do them all at the same time again stopping after 2
#!/bin/bash
for var in `head -2 host.txt`; do
curl $var >> /tmp/ping/html/$var.html
done
wait
for var in `head -4 host.txt | tail -2`; do
curl $var >> /tmp/ping/html/$var.html
done
This would do the same, I am new to bash scripting and only know some of the basics, any help would be appreciated
Start with the simple: verify that you are in fact iterating over the entire list:
# This is the recommended way to iterate over the file. See
# http://mywiki.wooledge.org/BashFAQ/001
while read -r var; do
echo "$var"
done < hosts.txt
Then add in the call to curl, checking its exit status
while read -r var; do
echo "$var"
curl "$var" >> /tmp/ping/html/$var.html || echo "curl failed: $?"
done < hosts.txt
You pipe into $var, which could result in a wrong filename, because of the two slashes in the URL. Additionally i would quote the URL. For Example it works with the basename of the URL.
#!/bin/bash
for var in `cat host.txt`; do
name=$(basename $var)
curl -v -s "$var" -o "/tmp/ping/html/$name.html"
done
You may also want to skip blank lines and Comments (#)
#!/bin/bash
file="host.txt"
curl="curl"
while read -r line
do
[[ $line = \#* ]] || [[ -z "${line}" ]] && continue
filename=$(basename $line)
$curl -s "$line" >> "/tmp/ping/html/$filename.html"
done < "$file"

Bash: Create a file if it does not exist, otherwise check to see if it is writeable

I have a bash program that will write to an output file. This file may or may not exist, but the script must check permissions and fail early. I can't find an elegant way to make this happen. Here's what I have tried.
set +e
touch $file
set -e
if [ $? -ne 0 ]; then exit;fi
I keep set -e on for this script so it fails if there is ever an error on any line. Is there an easier way to do the above script?
Why complicate things?
file=exists_and_writeable
if [ ! -e "$file" ] ; then
touch "$file"
fi
if [ ! -w "$file" ] ; then
echo cannot write to $file
exit 1
fi
Or, more concisely,
( [ -e "$file" ] || touch "$file" ) && [ ! -w "$file" ] && echo cannot write to $file && exit 1
Rather than check $? on a different line, check the return value immediately like this:
touch file || exit
As long as your umask doesn't restrict the write bit from being set, you can just rely on the return value of touch
You can use -w to check if a file is writable (search for it in the bash man page).
if [[ ! -w $file ]]; then exit; fi
Why must the script fail early? By separating the writable test and the file open() you introduce a race condition. Instead, why not try to open (truncate/append) the file for writing, and deal with the error if it occurs? Something like:
$ echo foo > output.txt
$ if [ $? -ne 0 ]; then die("Couldn't echo foo")
As others mention, the "noclobber" option might be useful if you want to avoid overwriting existing files.
Open the file for writing. In the shell, this is done with an output redirection. You can redirect the shell's standard output by putting the redirection on the exec built-in with no argument.
set -e
exec >shell.out # exit if shell.out can't be opened
echo "This will appear in shell.out"
Make sure you haven't set the noclobber option (which is useful interactively but often unusable in scripts). Use > if you want to truncate the file if it exists, and >> if you want to append instead.
If you only want to test permissions, you can run : >foo.out to create the file (or truncate it if it exists).
If you only want some commands to write to the file, open it on some other descriptor, then redirect as needed.
set -e
exec 3>foo.out
echo "This will appear on the standard output"
echo >&3 "This will appear in foo.out"
echo "This will appear both on standard output and in foo.out" | tee /dev/fd/3
(/dev/fd is not supported everywhere; it's available at least on Linux, *BSD, Solaris and Cygwin.)

wget with errorlevel bash output

I want to create a bash file (.sh) which does the following:
I call the script like ./download.sh www.blabla.com/bla.jpg
the script has to echo then if the file has downloaded or not...
How can I do this? I know I can use errorlevel but I'm new to linux so...
Thanks in advance!
Typically applications in Linux will set the value of the environment variable $? on failure. You can examine this return code and see if it gets you any error for wget.
#!/bin/bash
wget $1 2>/dev/null
export RC=$?
if [ "$RC" = "0" ]; then
echo $1 OK
else
echo $1 FAILED
fi
You could name this script download.sh. Change the permissions to 755 with chmod 755. Call it with the name of the file you wish to download. ./download.sh www.google.com
You could try something like:
#!/bin/sh
[ -n $1 ] || {
echo "Usage: $0 [url to file to get]" >&2
exit 1
}
wget $1
[ $? ] && {
echo "Could not download $1" | mail -s "Uh Oh" you#yourdomain.com
echo "Aww snap ..." >&2
exit 1
}
# If we're here, it downloaded successfully, and will exit with a normal status
When making a script that will (likely) be called by other scripts, it is important to do the following:
Ensure argument sanity
Send e-mail, write to a log, or do something else so someone knows what went wrong
The >&2 simply redirects the output of error messages to stderror, which allows a calling script to do something like this:
foo-downloader >/dev/null 2>/some/log/file.txt
Since it is a short wrapper, no reason to forsake a bit of sanity :)
This also allows you to selectively direct the output of wget to /dev/null, you might actually want to see it when testing, especially if you get an e-mail saying it failed :)
wget executes in non-interactive way. This means that wget work in the background and you can't catch de return code with $?.
One solution it's to handle the "--server-response" property, searching http 200 status code
Example:
wget --server-response -q -o wgetOut http://www.someurl.com
sleep 5
_wgetHttpCode=`cat wgetOut | gawk '/HTTP/{ print $2 }'`
if [ "$_wgetHttpCode" != "200" ]; then
echo "[Error] `cat wgetOut`"
fi
Note: wget need some time to finish his work, for that reason I put "sleep 5". This is not the best way to do but worked ok for test the solution.

Resources