How can this bash script be made shorter and better? Is there a way to combine multiple curl statements and if statements in bash? - linux

Status_code_1=$(curl -o /dev/null -s -w "%{http_code}\n" -H "Authorization: token abc123" https://url_1.com)
Status_code_2=$(curl -o /dev/null -s -w "%{http_code}\n" -H "Authorization: token abc123" https://url_2.com)
if [ $Status_code_1 == "200" ]; then
    echo "url_1 is running successfully"
else
    echo "Error at url_1. Status code:" $Status_code_1
fi
if [ $Status_code_2 == "200" ]; then
    echo "url_2 is running successfully"
else
    echo "Error at url_2. Status code:" $Status_code_2
fi
The main script is scheduled to run every day and prints the success message every time. If a status code is anything other than 200, the branch for whichever URL is down prints the error code.
The code works fine, but I want to know how it can be made shorter. Can the curl commands in the first two lines be combined? They use the same authorization and options; only the URL at the end is different. The later if statements are also pretty much the same; I'm just running them separately for different URLs.
Is it possible to write the first two lines as one, and the same for both if statements? I know AND and OR can be used in if statements, but say we have 5 URLs and 2 are down; how would it print the names of those 2 URLs in that case?

Is there a way to combine multiple curl statements and if statements in bash?
curl can retrieve multiple URLs in one run, but then you're left to parse its output into per-URL pieces. Since you want to report on the response for each URL, it is probably to your advantage to run curl separately for each.
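For illustration only (this is not from your script): a single curl invocation can take several URLs, with --write-out printed once per transfer, and each URL paired with its own -o so the bodies are discarded:
# one curl run, two transfers; %{url_effective} labels each status line
curl -s -w "%{url_effective} %{http_code}\n" -H "Authorization: token abc123" \
    -o /dev/null https://url_1.com \
    -o /dev/null https://url_2.com
You would still need to read those output lines back into the script and match them to URLs, which is why the function approach below is usually simpler.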
But you can make the script less repetitive by writing a shell function that performs one curl run and reports on the results:
test_url() {
    local status=$(curl -o /dev/null -s -w "%{http_code}\n" -H "Authorization: token abc123" "$1")
    if [ "$status" = 200 ]; then
        echo "$2 is running successfully"
    else
        echo "Error at $2. Status code: $status"
    fi
}
Then running it for a given URL is a one-liner:
test_url https://url_1.com url_1
test_url https://url_2.com url_2
Altogether, that's about the same length as the original, but this is the break-even point: each additional URL to test requires only one line, as opposed to six in your version. Also, if you want to change any of the details of the lookup or the status reporting, you can do it in one place for all of them.
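For example, with five or more URLs you could drive the same test_url function from a list of "URL label" pairs, so each URL that is down prints its own error line (the URLs and labels here are placeholders):
# each input line is: <url> <label>
while read -r url label; do
    test_url "$url" "$label"
done <<'EOF'
https://url_1.com url_1
https://url_2.com url_2
https://url_3.com url_3
EOF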

To avoid repetition, you can encapsulate code you need to reuse in a function.
httpresult () {
    curl -o /dev/null -s -w "%{http_code}\n" -H "Authorization: token abc123" "$@"
}
check200 () {
    local status=$(httpresult "$1")
    if [ "$status" = "200" ]; then
        echo "$0: ${2-$1} is running successfully" >&2
    else
        echo "$0: error at ${2-$1}. Status code: $status" >&2
    fi
}
check200 "https://url_1.com/" "url_1"
check200 "https://url_2.com/" "url_2"
Splitting httpresult into a separate function isn't really necessary, but it is perhaps useful both as a demonstration of a more modular design and as something you might reuse in other scripts, too.
I changed the formatting of the status message to include the name of the script in the message, and to print diagnostics to standard error instead of standard output, in accordance with common best practices.
The check200 function accepts a URL and optionally a human-readable label to use in the diagnostic messages; if you omit it, they will simply contain the URL, too. It wasn't clear from your question whether the labels are important and useful.
Notice that the standard comparison operator in [ ... ] is =, not == (though Bash will accept both).
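A quick way to see the difference (a made-up snippet, not part of the script above):
status=200
[ "$status" = 200 ] && echo "matched (portable)"     # works in any POSIX sh
[ "$status" == 200 ] && echo "matched (bash only)"   # bash extension; dash reports "unexpected operator"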

Related

Keep track of execution time in a bash script and terminate current command if it takes too long

I'm trying to create a bash script to download files en masse from a certain website.
Their download links are sequential - e.g. it's just id=1, id=2, id=3 all the way up to 660000. The only requirement is that you have to be logged in, which makes this a bit harder. Oh, and the login will randomly time out after a few hours so I have to log back in.
Here's my current script, which works well about 99% of the time.
#!/bin/sh
cd downloads
for i in `seq 1 660000`
do
    lastname=""
    echo "Downloading file $i"
    echo "Downloading file $i" >> _downloadinglog.txt
    response=$(curl --write-out %{http_code} -b _cookies.txt -c _cookies.txt --silent --output /dev/null "[sample URL to make sure cookie is still logged in]")
    if ! [ $response -eq 200 ]
    then
        echo "Cookie didn't work, trying to re-log in..."
        curl -d "userid=[USERNAME]" -d "pwd=[PASSWORD]" -b _cookies.txt -c _cookies.txt --silent --output /dev/null "[login URL]"
        response=$(curl --write-out %{http_code} -b _cookies.txt -c _cookies.txt --silent --output /dev/null "[sample URL again]")
        if ! [ $response -eq 200 ]
        then
            echo "Something weird happened?? Response code $response. Logging in didn't fix issue, fix then resume from $(($i - 1))"
            echo "Something weird happened?? Response code $response. Logging in didn't fix issue, fix then resume from $(($i - 1))" >> _downloadinglog.txt
            exit 0
        fi
        echo "Downloading file $(($i - 1)) again in case cookie expiring caused it to fail"
        echo "Downloading file $(($i - 1)) again in case cookie expiring caused it to fail" >> _downloadinglog.txt
        lastname=$(curl --write-out %{filename_effective} -O -J -b _cookies.txt -c _cookies.txt "[URL to download files]?id=$(($i - 1))")
        echo "id $(($i - 1)) = $lastname" >> _downloadinglog.txt
        lastname=""
        echo "Downloading file $i"
        echo "Downloading file $i" >> _downloadinglog.txt
    fi
    lastname=$(curl --write-out %{filename_effective} -O -J -b _cookies.txt -c _cookies.txt "[URL to download files]?id=$i")
    echo "id $i = $lastname" >> _downloadinglog.txt
done
So basically what I have it doing is attempting to download a random file before moving to the next file in the set. If the download fails, we assume the login cookie is no longer valid and tell curl to log me back in.
This works great, and I was able to get several thousand files from it. But what would happen is - either my router goes down for a second or two, or THEIR site goes down for a minute or two, and curl will just sit there thinking it's downloading for hours. I once came back to it literally spending 24 hours on the same file. It doesn't seem to have the ability to know if the transfer timed out in the middle - only if it can't START the transfer.
I know there are ways to terminate execution of a command if you combine it with "sleep", but since this has to be "smart" and restart from where it left off, I can't just kill the whole script.
Any suggestions? I'm open to using something other than curl if I can use it to login via a terminal command.
You can try using the curl options --connect-timeout or --max-time.
--max-time should be your pick.
From the manual:
--max-time
Maximum time in seconds that you allow the whole operation to take. This is useful for preventing your batch jobs from hanging for hours due to slow networks or links going down. Since 7.32.0, this option accepts decimal values, but the actual timeout will decrease in accuracy as the specified timeout increases in decimal precision. See also the --connect-timeout option.
If this option is used several times, the last one will be used.
Then capture the result of the command in a variable and process it further based on that result.
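Applied to the checks in your script, that might look roughly like this (the 600-second limit and the placeholder URL are just examples):
response=$(curl --max-time 600 --connect-timeout 30 --write-out %{http_code} \
    -b _cookies.txt -c _cookies.txt --silent --output /dev/null "[sample URL]")
rc=$?
# a timeout makes curl exit non-zero, so the script can re-login or retry
# instead of hanging on one file
if [ $rc -ne 0 ] || [ "$response" != "200" ]; then
    echo "curl failed or timed out (exit code $rc, HTTP status $response)"
fi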

Perl/ curl How to get Status Code and Response Body

I am trying to write a simple Perl script that calls an API; if the status code is 2xx, do something with the response, while if it is 4xx or 5xx, do something else.
The issue I am encountering is that I can either get just the response code (using a custom write-out format and sending the output somewhere else), or I can get the whole response together with the headers.
my $curlResponseCode = `curl -s -o /dev/null -w "%{http_code}" ....`;
Will give me the status code only.
my $curlResponse = `curl -si ...`;
Will give me the entire header plus the response.
My question is how can I obtain the response body from the server and the http status code in a neat format that allows me to separate them into two separate variables.
Unfortunately I cannot use LWP or any other separate libraries.
Thanks in advance.
-Spencer
I came up with this solution:
URL="http://google.com"
# store the whole response with the status at the end
HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST $URL)
# extract the body
HTTP_BODY=$(echo "$HTTP_RESPONSE" | sed -e 's/HTTPSTATUS\:.*//g')
# extract the status
HTTP_STATUS=$(echo "$HTTP_RESPONSE" | tr -d '\n' | sed -e 's/.*HTTPSTATUS://')
# print the body
echo "$HTTP_BODY"
# example using the status
if [ ! $HTTP_STATUS -eq 200 ]; then
    echo "Error [HTTP status: $HTTP_STATUS]"
    exit 1
fi
...Will give me the entire header plus the response.
...in a neat format that allows me to separate them into two separate variables.
Since the header and body are simply delimited by an empty line, you can split the content on that line:
my ($head,$body) = split( m{\r?\n\r?\n}, `curl -si http://example.com `,2 );
And to get the status code from the header
my ($code) = $head =~m{\A\S+ (\d+)};
You might also combine this into a single expression with a regexp, although this might be harder to understand:
my ($code,$body) = `curl -si http://example.com`
=~m{\A\S+ (\d+) .*?\r?\n\r?\n(.*)}s;
Pretty fundamentally - you're capturing output from a system command. It is far and away better to do this by using the library built for it - LWP.
Failing that though - curl -v will produce status code and content, and you'll have to parse it.
You might also find this thread on SuperUser useful:
https://superuser.com/questions/272265/getting-curl-to-output-http-status-code
Specifically
#creates a new file descriptor 3 that redirects to 1 (STDOUT)
exec 3>&1
# Run curl in a separate command, capturing output of -w "%{http_code}" into HTTP_STATUS
# and sending the content to this command's STDOUT with -o >(cat >&3)
HTTP_STATUS=$(curl -w "%{http_code}" -o >(cat >&3) 'http://example.com')
(That isn't Perl, but you can probably use something similar; at the very least, run curl with -w and capture your content to a temp file.)
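A rough sketch of that temp-file variant in shell (example.com is a stand-in URL):
tmp=$(mktemp)
status=$(curl -s -o "$tmp" -w '%{http_code}' 'http://example.com')   # status code only
body=$(cat "$tmp")                                                   # body from the temp file
rm -f "$tmp"
echo "status: $status"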
Haven't figured out a "pure" Perl solution, but I drafted this snippet to check the HTTP response code of a page via curl:
#!/usr/bin/perl
use v5.30;
use warnings;
use diagnostics;
our $url = "";
my $username = "";
my $password = "";
=begin url_check
Exit if HTTP response code not 200.
=cut
sub url_check {
    print "Checking URL status code...\n";
    my $status_code =
      (`curl --max-time 2.5 --user ${username}:${password} --output /dev/null --silent --head --write-out '%{http_code}\n' $url`);
    if ($status_code != '200') {
        print "URL not accessible. Exiting. \n";
        exit;
    } else {
        print "URL accessible. Continuing... \n";
    }
}
url_check();
The verbose use of curl more or less documents itself. My example allows you to pass credentials to a page, but that can be removed as needed.

How to check status of URLs from text file using bash shell script

I have to check the status of 200 http URLs and find out which of these are broken links. The links are present in a simple text file (say URL.txt present in my ~ folder). I am using Ubuntu 14.04 and I am a Linux newbie. But I understand the bash shell is very powerful and could help me achieve what I want.
My exact requirement would be to read the text file which has the list of URLs and automatically check if the links are working and write the response to a new file with the URLs and their corresponding status (working/broken).
I created a file "checkurls.sh" and placed it in my home directory where the urls.txt file is also located. I gave execute privileges to the file using
$chmod +x checkurls.sh
The contents of checkurls.sh is given below:
#!/bin/bash
while read url
do
    urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url")
    echo "$url $urlstatus" >> urlstatus.txt
done < $1
Finally, I executed it from command line using the following -
$./checkurls.sh urls.txt
Voila! It works.
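One optional tweak (an assumption, not part of the script above): adding -L makes curl follow redirects, so a URL that answers with 301/302 is reported with its final status instead of the redirect code:
urlstatus=$(curl -o /dev/null --silent --head -L --write-out '%{http_code}' "$url")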
#!/bin/bash
while read -ru 4 LINE; do
    read -r REP < <(exec curl -IsS "$LINE" 2>&1)
    echo "$LINE: $REP"
done 4< "$1"
Usage:
bash script.sh urls-list.txt
Sample:
http://not-exist.com/abc.html
https://kernel.org/nothing.html
http://kernel.org/index.html
https://kernel.org/index.html
Output:
http://not-exist.com/abc.html: curl: (6) Couldn't resolve host 'not-exist.com'
https://kernel.org/nothing.html: HTTP/1.1 404 Not Found
http://kernel.org/index.html: HTTP/1.1 301 Moved Permanently
https://kernel.org/index.html: HTTP/1.1 200 OK
For everything, read the Bash Manual. See man curl, help, man bash as well.
How about adding some parallelism to the accepted solution? Let's modify the script chkurl.sh to be a little easier to read and to handle just one request at a time:
#!/bin/bash
URL=${1?Pass URL as parameter!}
curl -o /dev/null --silent --head --write-out "$URL %{http_code} %{redirect_url}\n" "$URL"
And now you check your list using:
cat URL.txt | xargs -P 4 -L1 ./chkurl.sh
This could finish the job up to 4 times faster.
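The same thing works without the extra cat process, feeding the file straight to xargs:
xargs -P 4 -L1 ./chkurl.sh < URL.txt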
Herewith my full script that checks URLs listed in a file passed as an argument e.g. 'checkurls.sh listofurls.txt'.
What it does:
check the url using curl and return the HTTP status code
send an email notification when the url returns a code other than 200
create a temporary lock file for failed urls (file naming could be improved)
send an email notification when the url becomes available again
remove the lock file once the url becomes available, to avoid further notifications
log events to a file and cap the log file size (a crude log rotation; uncomment the echo if logging of code 200 is required)
Code:
#!/bin/bash
EMAIL="your@email.com"
DATENOW=`date +%Y%m%d-%H%M%S`
LOG_FILE="checkurls.log"
c=0
while read url
do
    ((c++))
    LOCK_FILE="checkurls$c.lock"
    urlstatus=$(/usr/bin/curl -H 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code}' "$url")
    if [ "$urlstatus" = "200" ]
    then
        #echo "$DATENOW OK $urlstatus connection->$url" >> $LOG_FILE
        [ -e $LOCK_FILE ] && /bin/rm -f -- $LOCK_FILE > /dev/null && /bin/mail -s "NOTIFICATION URL OK: $url" $EMAIL <<< 'The URL is back online'
    else
        echo "$DATENOW FAIL $urlstatus connection->$url" >> $LOG_FILE
        if [ -e $LOCK_FILE ]
        then
            #no action - awaiting URL to be fixed
            :
        else
            /bin/mail -s "NOTIFICATION URL DOWN: $url" $EMAIL <<< 'Failed to reach or URL problem'
            /bin/touch $LOCK_FILE
        fi
    fi
done < $1
# REMOVE LOG FILE IF LARGER THAN ~120MB
# allow up to 2000 lines average
maxsize=120000
size=$(/usr/bin/du -k "$LOG_FILE" | /bin/cut -f 1)
if [ $size -ge $maxsize ]; then
    /bin/rm -f -- $LOG_FILE > /dev/null
    echo "$DATENOW LOG file [$LOG_FILE] has been recreated" > $LOG_FILE
else
    #do nothing
    :
fi
Please note that changing the order of the listed urls in the text file will affect any existing lock files (remove all .lock files to avoid confusion). This could be improved by using the url as the lock file name, but certain characters such as : # / ? & would have to be handled for the operating system.
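For instance, the url could be reduced to a safe lock file name by replacing everything outside a small allowed character set (a sketch only, not part of the script above):
# turn e.g. https://example.com/a?b into https___example.com_a_b
safe=$(printf '%s' "$url" | tr -c 'A-Za-z0-9._-' '_')
LOCK_FILE="checkurls_${safe}.lock"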
I recently released deadlink, a command-line tool for finding broken links in files. Install with
pip install deadlink
and use as
deadlink check /path/to/file/or/directory
or
deadlink replace-redirects /path/to/file/or/directory
The latter will replace permanent redirects (301) in the specified files.
If your input file contains one URL per line, you can use a script to read each line and try to ping the URL's host; if the ping succeeds, the host is reachable (note that this checks reachability, not the HTTP status).
#!/bin/bash
INPUT="Urls.txt"
OUTPUT="result.txt"
while read -r line
do
    # ping expects a hostname, so strip the scheme and any path from the URL
    host=${line#*://}
    host=${host%%/*}
    if ping -c 1 "$host" &> /dev/null
    then
        echo "$line valid" >> $OUTPUT
    else
        echo "$line not valid" >> $OUTPUT
    fi
done < $INPUT
exit
ping options:
-c count
Stop after sending count ECHO_REQUEST packets. With the deadline option, ping waits for count ECHO_REPLY packets, until the timeout expires.
You can use this option as well to limit the waiting time:
-W timeout
Time to wait for a response, in seconds. The option affects only the timeout in the absence of any responses; otherwise ping waits for two RTTs.
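For example (example.com is a placeholder host):
# one probe, wait at most 2 seconds for a reply
if ping -c 1 -W 2 example.com &> /dev/null; then
    echo "example.com is reachable"
fi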
curl -s -I --http2 "http://$1" >> fullscan_curl.txt
grep HTTP fullscan_curl.txt >> fullscan_httpstatus.txt
It works for me.

Shell script with Wget - If else nested inside for loop

I'm trying to make a shell script that reads a list of download URLs to find if they're still active. I'm not sure what's wrong with my current script, (I'm new to this) and any pointers would be a huge help!
user@pc:~/test# cat sites.list
http://www.google.com/images/srpr/logo3w.png
http://www.google.com/doesnt.exist
notasite
Script:
#!/bin/bash
for i in `cat sites.list`
do
    wget --spider $i -b
    if grep --quiet "200 OK" wget-log; then
        echo $i >> ok.txt
    else
        echo $i >> notok.txt
    fi
    rm wget-log
done
As is, the script outputs everything to notok.txt - (the first google site should go to ok.txt).
But if I run:
wget --spider http://www.google.com/images/srpr/logo3w.png -b
And then do:
grep "200 OK" wget-log
It greps the string without any problems. What noob mistake did I make with the syntax? Thanks m8s!
The -b option is sending wget to the background, so you're doing the grep before wget has finished.
Try without the -b option:
if wget --spider $i 2>&1 | grep --quiet "200 OK" ; then
There are a few issues with what you're doing.
Your for i in will have problems with lines that contain whitespace. Better to use while read to read individual lines of a file.
You aren't quoting your variables. What if a line in the file (or word in a line) starts with a hyphen? Then wget will interpret that as an option. You have a potential security risk here, as well as an error.
Creating and removing files isn't really necessary. If all you're doing is checking whether a URL is reachable, you can do that without temp files and the extra code to remove them.
wget isn't necessarily the best tool for this. I'd advise using curl instead.
So here's a better way to handle this...
#!/bin/bash
sitelist="sites.list"
curl="/usr/bin/curl"
# Some errors, for good measure...
if [[ ! -f "$sitelist" ]]; then
    echo "ERROR: Sitelist is missing." >&2
    exit 1
elif [[ ! -s "$sitelist" ]]; then
    echo "ERROR: Sitelist is empty." >&2
    exit 1
elif [[ ! -x "$curl" ]]; then
    echo "ERROR: I can't work under these conditions." >&2
    exit 1
fi
# Allow extended pattern matching (for the case..esac below)
shopt -s extglob
while read url; do
    # remove comments
    url=${url%%#*}
    # skip empty lines
    if [[ -z "$url" ]]; then
        continue
    fi
    # Handle just ftp, http and https.
    # We could do full URL pattern matching, but meh.
    case "$url" in
        @(f|ht)tp?(s)://*)
            # Get just the numeric HTTP response code
            http_code=$($curl -sL -w '%{http_code}' "$url" -o /dev/null)
            case "$http_code" in
                200|226)
                    # You'll get a 226 in ${http_code} from a valid FTP URL.
                    # If all you really care about is that the response is in the 200's,
                    # you could match against "2??" instead.
                    echo "$url" >> ok.txt
                    ;;
                *)
                    # You might want different handling for redirects (301/302).
                    echo "$url" >> notok.txt
                    ;;
            esac
            ;;
        *)
            # If we're here, we didn't get a URL we could read.
            echo "WARNING: invalid url: $url" >&2
            ;;
    esac
done < "$sitelist"
This is untested. For educational purposes only. May contain nuts.

wget with errorlevel bash output

I want to create a bash file (.sh) which does the following:
I call the script like ./download.sh www.blabla.com/bla.jpg
the script then has to echo whether the file was downloaded or not...
How can I do this? I know I can use errorlevel but I'm new to linux so...
Thanks in advance!
Programs on Linux set an exit status when they finish, which the shell exposes in the special parameter $?; a non-zero value indicates failure. You can examine this return code and see whether wget reported an error.
#!/bin/bash
wget $1 2>/dev/null
export RC=$?
if [ "$RC" = "0" ]; then
    echo $1 OK
else
    echo $1 FAILED
fi
You could name this script download.sh. Change the permissions with chmod 755 download.sh, then call it with the URL of the file you wish to download: ./download.sh www.google.com
You could try something like:
#!/bin/sh
[ -n "$1" ] || {
    echo "Usage: $0 [url of file to get]" >&2
    exit 1
}
wget "$1"
[ $? -ne 0 ] && {
    echo "Could not download $1" | mail -s "Uh Oh" you@yourdomain.com
    echo "Aww snap ..." >&2
    exit 1
}
# If we're here, it downloaded successfully, and will exit with a normal status
When making a script that will (likely) be called by other scripts, it is important to do the following:
Ensure argument sanity
Send e-mail, write to a log, or do something else so someone knows what went wrong
The >&2 simply redirects the error messages to stderr, which allows a calling script to do something like this:
foo-downloader >/dev/null 2>/some/log/file.txt
Since it is a short wrapper, no reason to forsake a bit of sanity :)
This also allows you to selectively direct the output of wget to /dev/null; you might actually want to see it when testing, especially if you get an e-mail saying it failed :)
wget executes in a non-interactive way. This means that wget works in the background and you can't catch the return code with $?.
One solution is to handle the "--server-response" option, searching for the HTTP 200 status code.
Example:
wget --server-response -q -o wgetOut http://www.someurl.com
sleep 5
_wgetHttpCode=`cat wgetOut | gawk '/HTTP/{ print $2 }'`
if [ "$_wgetHttpCode" != "200" ]; then
echo "[Error] `cat wgetOut`"
fi
Note: wget needs some time to finish its work; for that reason I put "sleep 5". This is not the best way to do it, but it worked OK for testing the solution.
