I am trying to write a simple Perl script that calls an API: if the status code is 2xx, do something with the response, while if it is 4xx or 5xx, do something else.
The issue I am encountering is that I can either get just the response code (using a custom write-out formatter and sending the output somewhere else), or I can get the whole response including the headers.
my $curlResponseCode = `curl -s -o /dev/null -w "%{http_code}" ....`;
Will give me the status code only.
my $curlResponse = `curl -si ...`;
Will give me the entire header plus the response.
My question is: how can I obtain the response body from the server and the HTTP status code in a neat format that allows me to separate them into two separate variables?
Unfortunately I cannot use LWP or any other separate libraries.
Thanks in advance.
-Spencer
I came up with this solution:
URL="http://google.com"
# store the whole response with the status at the end
HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST "$URL")
# extract the body
HTTP_BODY=$(echo "$HTTP_RESPONSE" | sed -e 's/HTTPSTATUS:.*//g')
# extract the status
HTTP_STATUS=$(echo "$HTTP_RESPONSE" | tr -d '\n' | sed -e 's/.*HTTPSTATUS://')
# print the body
echo "$HTTP_BODY"
# example using the status
if [ "$HTTP_STATUS" -ne 200 ]; then
echo "Error [HTTP status: $HTTP_STATUS]"
exit 1
fi
...Will give me the entire header plus the response.
...in a neat format that allows me to separate them into two separate variables.
Since header and body are simply delimited by an empty line you can split the content on this line:
my ($head,$body) = split( m{\r?\n\r?\n}, `curl -si http://example.com`, 2 );
And to get the status code from the header
my ($code) = $head =~m{\A\S+ (\d+)};
You might also combine this into a single expression with a regexp, although this might be harder to understand:
my ($code,$body) = `curl -si http://example.com`
=~m{\A\S+ (\d+) .*?\r?\n\r?\n(.*)}s;
Pretty fundamentally, you're capturing output from a system command. It is far and away better to do this using the library built for it: LWP.
Failing that, though, curl -v will produce the status code and content, and you'll have to parse it.
You might also find this thread on SuperUser useful:
https://superuser.com/questions/272265/getting-curl-to-output-http-status-code
Specifically
#creates a new file descriptor 3 that redirects to 1 (STDOUT)
exec 3>&1
# Run curl in a separate command, capturing output of -w "%{http_code}" into HTTP_STATUS
# and sending the content to this command's STDOUT with -o >(cat >&3)
HTTP_STATUS=$(curl -w "%{http_code}" -o >(cat >&3) 'http://example.com')
(That isn't Perl, but you can probably do something similar. At the very least, run curl with -w and capture your content to a temp file.)
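For instance, here is a minimal shell sketch of that temp-file approach (the URL is a placeholder; from Perl you would run the same commands via backticks or system):
# -w prints only the status code on stdout; the body goes to a temp file via -o
tmp=$(mktemp)
status=$(curl -s -o "$tmp" -w '%{http_code}' 'http://example.com')
body=$(cat "$tmp")
rm -f "$tmp"
echo "status=$status"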
Haven't figured out a "pure" Perl solution, but I drafted this snippet to check the HTTP response code of a page via curl:
#!/usr/bin/perl
use v5.30;
use warnings;
use diagnostics;
our $url = "";
my $username = "";
my $password = "";
=begin url_check
Exit if HTTP response code not 200.
=cut
sub url_check {
print "Checking URL status code...\n";
my $status_code =
(`curl --max-time 2.5 --user ${username}:${password} --output /dev/null --silent --head --write-out '%{http_code}\n' $url`);
if ($status_code != 200) {
print "URL not accessible. Exiting.\n";
exit;
} else {
print "URL accessible. Continuing... \n";
}
}
url_check();
The verbose use of curl more or less documents itself. My example allows you to pass credentials to a page, but that can be removed as needed.
Related
I'm using curl to grab a list of subscribers. Once this has been downloaded the rest of my script will process the file.
How could I make the script wait until the file has been downloaded and error if it failed?
curl "http://mydomain/api/v1/subscribers" -u
'user:pass' | json_pp >>
new.json
Thanks
As noted in the comment, curl will not return until the request is completed (or has failed). I suspect you are looking for a way to identify errors in the curl call, which currently get lost. Consider the following:
If you just need the error status, you can use the bash pipefail option (set -o pipefail). This will let you check for a failure in curl:
set -o pipefail
if curl ... | json_pp >> new.json ; then
: # All good
else
: # Something wrong.
fi
Also, you might want to save the "raw" response before trying to pretty-print it, either using a temporary file or using tee:
set -o pipefail
if curl ... | tee raw.json | json_pp >> new.json ; then
: # All good
else
: # Something wrong - look into raw.json
fi
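One caveat worth adding as a hedged note: curl itself exits 0 even when the server replies with a 4xx/5xx status, so pipefail alone only catches transport-level failures. Adding -f/--fail makes HTTP errors fail the pipeline as well, for example:
set -o pipefail
# -f/--fail: curl exits non-zero (22) on an HTTP status >= 400
if curl -sSf "http://mydomain/api/v1/subscribers" -u 'user:pass' | tee raw.json | json_pp >> new.json ; then
: # All good
else
: # network error, HTTP error status, or json_pp failure - check raw.json
fi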
Status_code_1=$(curl -o /dev/null -s -w "%{http_code}\n" -H "Authorization: token abc123" https://url_1.com)
Status_code_2=$(curl -o /dev/null -s -w "%{http_code}\n" -H "Authorization: token abc123" https://url_2.com)
if [ $Status_code_1 == "200" ]; then
echo "url_1 is running successfully"
else
echo "Error at url_1. Status code:" $Status_code_1
fi
if [ $Status_code_2 == "200" ]; then
echo "url_2 is running successfully"
else
echo "Error Error at url_2. Status code:" $Status_code_2
fi
The main script is scheduled and runs every day, and it prints the success message every time. If a status code is anything other than 200, $Status_code_1 or $Status_code_2, whichever is down, prints the error code.
The code is working fine, but I want to know how it can be made shorter. Can the curl commands from the first two lines be combined, since they have the same authorization and everything and only the URL at the end differs? The later if statements are also pretty much the same; I am just running them separately for different URLs.
Is it possible to write the first two lines as one, and the same for both if statements? I know AND and OR can be used in if statements, but say we have 5 URLs and 2 of them are down; how would it print the names of those 2 URLs in that case?
Is there a way to combine multiple curl statements and if statements in bash?
curl can retrieve multiple URLs in one run, but then you're left to parse its output into per-URL pieces. Since you want to report on the response for each URL, it is probably to your advantage to run curl separately for each.
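For reference, a sketch of what that single invocation could look like with the question's header and URLs (note that -o has to be repeated once per URL, and curl prints the -w line after each transfer):
curl -s -w '%{url_effective} %{http_code}\n' -H "Authorization: token abc123" \
    -o /dev/null https://url_1.com \
    -o /dev/null https://url_2.com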
But you can make the script less repetitive by writing a shell function that performs one curl run and reports on the results:
test_url() {
local status=$(curl -o /dev/null -s -w "%{http_code}\n" -H "Authorization: token abc123" "$1")
if [ "$status" = 200 ]; then
echo "$2 is running successfully"
else
echo "Error at $2. Status code: $status"
fi
}
Then running it for a given URL is a one-liner:
test_url https://url_1.com url_1
test_url https://url_2.com url_2
Altogether, that's also about the same length as the original, but this is the break-even point on length. Each specific URL to test requires only one line, as opposed to six in your version. Also, if you want to change any of the details of the lookup or status reporting then you can do it in one place for all.
To avoid repetition, you can encapsulate code you need to reuse in a function.
httpresult () {
curl -o /dev/null -s -w "%{http_code}\n" -H "Authorization: token abc123" "$@"
}
check200 () {
local status=$(httpresult "$1")
if [ "$status" = "200" ]; then
echo "$0: ${2-$1} is running successfully" >&2
else
echo "$0: error at ${2-$1}. Status code: $status" >&2
fi
}
check200 "https://url_1.com/" "url_1"
check200 "https://url_2.com/" "url_2"
Splitting httpresult to a separate function isn't really necessary, but perhaps useful both as a demonstration of a more modular design, and as something you might reuse in other scripts too.
I changed the formatting of the status message to include the name of the script in the message, and to print diagnostics to standard error instead of standard output, in accordance with common best practices.
The check200 function accepts a URL and optionally a human-readable label to use in the diagnostic messages; if you omit it, they will simply contain the URL, too. It wasn't clear from your question whether the labels are important and useful.
Notice that the standard comparison operator in [ ... ] is =, not == (though Bash will accept both).
I have created a script that uses curl to retrieve a CSV file from a public site and copy it to my server.
#!/bin/sh
INCOMING="http://www.example.jp/csv/ranking.csv"
OUTPUT="/var/www/html/csv/ranking.csv"
curl -s $INCOMING > $OUTPUT
My boss ordered that if the CSV file retrieved from the site is 0 bytes, the existing file should not be overwritten.
Hearing that, I wrote a script that looks like this.
#!/bin/sh
INCOMING="http://www.example.jp/csv/ranking.csv"
OUTPUT="/var/www/html/csv/ranking.csv"
INCOMING_LENGTH=$(curl -L -s -o /dev/null -w '%{size_download}\n' $INCOMING)
if [ "$INCOMING_LENGTH" -ne "0" ]; then
curl -s $INCOMING > $OUTPUT
fi
What I want to ask is: how can I skip writing the CSV output when the curl response size is zero, using a single curl command?
In the script above, I am running the curl command twice,
so even if the first curl command confirms that the file size is not zero bytes,
the second curl command could still receive a zero-byte response and write an empty CSV file.
This should rarely happen, but my boss is a perfectionist, so I want to eliminate even this small window.
Thank you!
I think the solution is the following...
Get the Content-Length from a HEAD request using curl. This does not download the body; it only sends a request for the headers:
GET_HEADER_NODATA=$(curl -s -L -I https://my-url/file.csv | grep -i Content-Length | awk '{print $2}' | tr -d $'\r')
The above command extracts the Content-Length value and strips the trailing carriage return.
Now check whether the CSV file you are about to download is empty:
if [ "$GET_HEADER_NODATA" -eq 0 ]; then
echo "do here what you want because the file size is 0"
fi
All this will work only if your web server sends the Content-Length in its response headers.
Good luck!
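If the server does not send Content-Length at all, another option (a sketch of an alternative, not part of the approach above) is to make a single GET into a temporary file and only replace the target when something was actually downloaded:
#!/bin/sh
INCOMING="http://www.example.jp/csv/ranking.csv"
OUTPUT="/var/www/html/csv/ranking.csv"
TMP="${OUTPUT}.tmp.$$"
# one request only: the body goes to $TMP, the downloaded byte count goes to stdout
SIZE=$(curl -L -s -o "$TMP" -w '%{size_download}' "$INCOMING")
if [ "$SIZE" -gt 0 ]; then
mv "$TMP" "$OUTPUT"   # overwrite only when the download is non-empty
else
rm -f "$TMP"          # keep the existing file untouched
fi
Writing the temp file next to the target keeps the final mv on the same filesystem, so the replacement is effectively atomic.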
I need to write a bash script which will read the subdomains from "subdomains.txt", one per line, and show me their HTTP response codes. I want it to be invoked this way:
cat subdomains.txt | ./httpResponse
The problem is that I don't know how to make the bash script read the subdomain names. Obviously, I need to use a loop, something like this:
for subdomains in list
do
echo curl --write-out "%{http_code}\n" --silent --output /dev/null "subdomain"
done
But how can I populate the list in the loop using cat and a pipeline? Thanks a lot in advance!
It would help if you provided actual input and expected output, so I'll have to guess that the URL you pass to curl is in some way derived from the input in the text file. If the exact URL is in the input stream, perhaps you merely want to replace $URL with $subdomain. In any case, to read the input stream, you can simply do:
while read -r subdomain; do
URL="$subdomain" # derive the full URL from $subdomain here if needed
curl --write-out "%{http_code}\n" --silent --output /dev/null "$URL"
done
Playing around with your example made me settle on wget instead; here is another way...
#cat subdomains.txt
osmc
microknoppix
#cat httpResponse
for subdomain in "$(cat subdomains.txt)"
do
wget -nv -O/dev/null --spider ${subdomain}
done
#sh httpResponse 2>response.txt && cat response.txt
2021-04-05 13:49:25 URL: http://osmc/ 200 OK
2021-04-05 13:49:25 URL: http://microknoppix/ 200 OK
Since wget writes its report to stderr, 2>response.txt captures the right output.
The && acts like a then: the cat runs only if httpResponse succeeded.
You can do this without cat and a pipeline. Use netcat and parse the first line with sed:
while read -r subdomain; do
echo -n "$subdomain: "
printf "GET / HTTP/1.1\nHost: %s\n\n" "$subdomain" | \
nc "$subdomain" 80 | sed -n 's/[^ ]* //p;q'
done < 'subdomains.txt'
subdomains.txt:
www.stackoverflow.com
www.google.com
output:
www.stackoverflow.com: 301 Moved Permanently
www.google.com: 200 OK
I have to check the status of 200 http URLs and find out which of these are broken links. The links are present in a simple text file (say URL.txt present in my ~ folder). I am using Ubuntu 14.04 and I am a Linux newbie. But I understand the bash shell is very powerful and could help me achieve what I want.
My exact requirement would be to read the text file which has the list of URLs and automatically check if the links are working and write the response to a new file with the URLs and their corresponding status (working/broken).
I created a file "checkurls.sh" and placed it in my home directory where the urls.txt file is also located. I gave execute privileges to the file using
$ chmod +x checkurls.sh
The contents of checkurls.sh is given below:
#!/bin/bash
while read url
do
urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url" )
echo "$url $urlstatus" >> urlstatus.txt
done < $1
Finally, I executed it from command line using the following -
$ ./checkurls.sh urls.txt
Voila! It works.
#!/bin/bash
while read -ru 4 LINE; do
read -r REP < <(exec curl -IsS "$LINE" 2>&1)
echo "$LINE: $REP"
done 4< "$1"
Usage:
bash script.sh urls-list.txt
Sample:
http://not-exist.com/abc.html
https://kernel.org/nothing.html
http://kernel.org/index.html
https://kernel.org/index.html
Output:
http://not-exist.com/abc.html: curl: (6) Couldn't resolve host 'not-exist.com'
https://kernel.org/nothing.html: HTTP/1.1 404 Not Found
http://kernel.org/index.html: HTTP/1.1 301 Moved Permanently
https://kernel.org/index.html: HTTP/1.1 200 OK
For everything, read the Bash Manual. See man curl, help, man bash as well.
What about adding some parallelism to the accepted solution? Let's modify the script chkurl.sh to be a little easier to read and to handle just one request at a time:
#!/bin/bash
URL=${1?Pass URL as parameter!}
curl -o /dev/null --silent --head --write-out "$URL %{http_code} %{redirect_url}\n" "$URL"
And now you check your list using:
cat URL.txt | xargs -P 4 -L1 ./chkurl.sh
This could finish the job up to 4 times faster.
Herewith my full script that checks URLs listed in a file passed as an argument, e.g. './checkurls.sh listofurls.txt'.
What it does:
check url using curl and return HTTP status code
send email notifications when url returns a code other than 200
create a temporary lock file for failed urls (file naming could be improved)
send email notification when url becomes available again
remove lock file once url becomes available to avoid further notifications
log events to a file and handle increasing log file size (AKA log rotation; uncomment the echo if code 200 logging is required)
Code:
#!/bin/bash
EMAIL=" your#email.com"
DATENOW=`date +%Y%m%d-%H%M%S`
LOG_FILE="checkurls.log"
c=0
while read url
do
((c++))
LOCK_FILE="checkurls$c.lock"
urlstatus=$(/usr/bin/curl -H 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code}' "$url" )
if [ "$urlstatus" = "200" ]
then
#echo "$DATENOW OK $urlstatus connection->$url" >> $LOG_FILE
[ -e $LOCK_FILE ] && /bin/rm -f -- $LOCK_FILE > /dev/null && /bin/mail -s "NOTIFICATION URL OK: $url" $EMAIL <<< 'The URL is back online'
else
echo "$DATENOW FAIL $urlstatus connection->$url" >> $LOG_FILE
if [ -e $LOCK_FILE ]
then
#no action - awaiting URL to be fixed
:
else
/bin/mail -s "NOTIFICATION URL DOWN: $url" $EMAIL <<< 'Failed to reach or URL problem'
/bin/touch $LOCK_FILE
fi
fi
done < $1
# REMOVE LOG FILE IF LARGER THAN ~120MB
# allow up to 2000 lines average
maxsize=120000
size=$(/usr/bin/du -k "$LOG_FILE" | /bin/cut -f 1)
if [ $size -ge $maxsize ]; then
/bin/rm -f -- $LOG_FILE > /dev/null
echo "$DATENOW LOG file [$LOG_FILE] has been recreated" > $LOG_FILE
else
#do nothing
:
fi
Please note that changing the order of the listed urls in the text file will affect any existing lock files (remove all .lock files to avoid confusion). It could be improved by using the URL as the lock file name, but certain characters such as : # / ? & would have to be handled for the operating system.
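For example, one way around the character problem (a sketch, not part of the script above) is to hash the URL and use the hash in the lock file name:
# hash the URL so characters like : # / ? & never reach the filesystem
url_hash=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)
LOCK_FILE="checkurls_${url_hash}.lock"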
I recently released deadlink, a command-line tool for finding broken links in files. Install with
pip install deadlink
and use as
deadlink check /path/to/file/or/directory
or
deadlink replace-redirects /path/to/file/or/directory
The latter will replace permanent redirects (301) in the specified files.
If your input file contains one URL per line, you can use a script that reads each line and then tries to ping the host; if the ping succeeds, the URL is treated as reachable:
#!/bin/bash
INPUT="Urls.txt"
OUTPUT="result.txt"
while read line ;
do
if ping -c 1 $line &> /dev/null
then
echo "$line valid" >> $OUTPUT
else
echo "$line not valid " >> $OUTPUT
fi
done < $INPUT
exit
ping options:
-c count
Stop after sending count ECHO_REQUEST packets. With deadline option, ping waits for count ECHO_REPLY packets, until the timeout expires.
You can use the following option as well to limit the waiting time:
-W timeout
Time to wait for a response, in seconds. The option affects only timeout in absense
of any responses, otherwise ping waits for two RTTs.
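For example (the timeout value here is just an illustration), both options combined in the script's test:
# one probe, wait at most 2 seconds for a reply
if ping -c 1 -W 2 "$line" &> /dev/null
then
echo "$line valid" >> $OUTPUT
fi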
curl -s -I --http2 "http://$1" >> fullscan_curl.txt; grep HTTP fullscan_curl.txt >> fullscan_httpstatus.txt
It works for me.