Redirect cat output to bash script [duplicate] - linux

This question already has answers here:
How can I loop over the output of a shell command?
(4 answers)
Closed 1 year ago.
I need to write a bash script that reads the subdomains from "subdomains.txt", one per line, and shows me their HTTP response codes. I want it to work this way:
cat subdomains.txt | ./httpResponse
The problem is that I don't know how to make the bash script read the subdomain names. Obviously, I need to use a loop, something like this:
for subdomain in list
do
curl --write-out "%{http_code}\n" --silent --output /dev/null "$subdomain"
done
But how can I populate the list in the loop using cat and a pipeline? Thanks a lot in advance!

It would help if you provided actual input and expected output, so I'll have to guess that the URL you are passing to curl is in some way derived from the input in the text file. If the exact URL is in the input stream, perhaps you merely want to replace $URL with $subdomain. In any case, to read the input stream, you can simply do:
while read -r subdomain; do
URL="$subdomain"   # or derive the full URL from $subdomain here
curl --write-out "%{http_code}\n" --silent --output /dev/null "$URL"
done
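Putting that together, a minimal sketch of the ./httpResponse script (assuming each line of subdomains.txt is something curl can fetch directly, e.g. a bare host name):
#!/bin/bash
# read one subdomain per line from stdin and print its HTTP status code
while read -r subdomain; do
[ -z "$subdomain" ] && continue   # skip blank lines
code=$(curl --write-out '%{http_code}' --silent --output /dev/null "$subdomain")
echo "$subdomain $code"
done
Used exactly as in the question: cat subdomains.txt | ./httpResponse (or, without cat: ./httpResponse < subdomains.txt).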

Playing around with your example led me to go with wget, and here is another way...
#cat subdomains.txt
osmc
microknoppix
#cat httpResponse
for subdomain in "$(cat subdomains.txt)"
do
wget -nv -O/dev/null --spider ${subdomain}
done
#sh httpResponse 2>response.txt && cat response.txt
2021-04-05 13:49:25 URL: http://osmc/ 200 OK
2021-04-05 13:49:25 URL: http://microknoppix/ 200 OK
Since wget writes its report to stderr, 2>response.txt captures the output correctly.
The && acts like a 'then': the cat runs only if httpResponse succeeded.

You can do this without cat and a pipeline. Use netcat and parse the first line with sed:
while read -r subdomain; do
echo -n "$subdomain: "
printf "GET / HTTP/1.1\nHost: %s\n\n" "$subdomain" | \
nc "$subdomain" 80 | sed -n 's/[^ ]* //p;q'
done < 'subdomains.txt'
subdomains.txt:
www.stackoverflow.com
www.google.com
output:
www.stackoverflow.com: 301 Moved Permanently
www.google.com: 200 OK

Related

How to skip progress when curl response size equals zero byte in shell script?

I have created a script that uses curl to retrieve a CSV file from a public site and copy it to my server.
#!/bin/sh
INCOMING="http://www.example.jp/csv/ranking.csv"
OUTPUT="/var/www/html/csv/ranking.csv"
curl -s $INCOMING > $OUTPUT
My boss ordered that if the CSV file retrieved from the site is 0 bytes, the existing file should not be overwritten.
Hearing that, I wrote a script that looks like this.
#!/bin/sh
INCOMING="http://www.example.jp/csv/ranking.csv"
OUTPUT="/var/www/html/csv/ranking.csv"
INCOMING_LENGTH=$(curl -L -s -o /dev/null -w '%{size_download}\n' $INCOMING)
if [ "$INCOMING_LENGTH" -ne "0" ]; then
curl -s $INCOMING > $OUTPUT
fi
What I want to ask is:
how can I skip writing the CSV output when the curl response size is zero, using a single curl command?
In the script above, I am running the curl command twice,
so even if the first curl command checks the file size and confirms it is not zero bytes,
the second curl command could still receive zero bytes and generate a zero-byte CSV file.
This should rarely happen, but my boss is a perfectionist, so I want to eliminate that window entirely.
Thank you!
I think the solution is the following...
Get the Content-Length from a HEAD request using curl. This does not perform a download; it only sends a request for the headers:
GET_HEADER_NODATA=$(curl -s -L -I https://my-url/file.csv | grep -i Content-Length | awk '{print $2}' | tr -d $'\r')
The above command extracts the Content-Length value and strips the trailing carriage return.
Now you can check whether the CSV file you are about to download is empty:
if [ "$GET_HEADER_NODATA" -eq 0 ]; then
echo "do here what you want because the file size is 0"
fi
All this will work only if your web server sends Content-Length in its response headers.
Good luck!
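Another option, not from the answer above but a common pattern, is to run curl only once: download into a temporary file and move it over the real file only if it is non-empty. A minimal sketch, reusing the question's INCOMING/OUTPUT names; the temp-file name is an assumption:
#!/bin/sh
INCOMING="http://www.example.jp/csv/ranking.csv"
OUTPUT="/var/www/html/csv/ranking.csv"
TMP="${OUTPUT}.tmp"            # hypothetical temp file next to the real one

# single download into the temp file
curl -L -s -o "$TMP" "$INCOMING"

# -s is true if the file exists and has a size greater than zero
if [ -s "$TMP" ]; then
mv "$TMP" "$OUTPUT"            # rename within the same filesystem, so the old CSV is replaced in one step
else
rm -f "$TMP"                   # discard the empty download, keep the old CSV
fi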

how to use wget spider to identify broken urls from a list of urls and save broken ones

I am trying to write a shell script to identify broken urls from a list of urls.
here is input_url.csv sample:
https://www.google.com/
https://www.nbc.com
https://www.google.com.hksjkhkh/
https://www.google.co.jp/
https://www.google.ca/
Here is what I have which works:
wget --spider -nd -nv -H --max-redirect 0 -o run.log -i input_url.csv
and this gives me '2019-09-03 19:48:37 URL: https://www.nbc.com 200 OK' for valid urls, and for broken ones it gives me '0 redirections exceeded.'
What I expect is to save only the broken links into my output file.
Sample expected output:
https://www.google.com.hksjkhkh/
I think I would go with:
<input.csv xargs -n1 -P10 sh -c 'wget --spider --quiet "$1" || echo "$1"' --
You can use the -P <count> option of xargs to run count processes in parallel.
xargs runs the command sh -c '....' -- once for each line of the input file, appending that line as the argument to the inline script.
Then sh runs wget ... "$1" inside. The || checks whether the exit status is nonzero, which means failure; on wget failure, echo "$1" is executed.
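To save only the broken links into a file, as the question asks, you can redirect the echoed URLs; the input file name follows the question, and broken.txt is just an example name:
<input_url.csv xargs -n1 -P10 sh -c 'wget --spider --quiet "$1" || echo "$1"' -- > broken.txt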
You could also filter the output of wget -nd -nv and extract the failed URLs with grep and sed, something like
wget --spider -nd -nv -H --max-redirect 0 -i input 2>&1 | grep -v '200 OK' | grep 'unable' | sed 's/.* .//; s/.$//'
but this does not look very maintainable and is not parallel, so it is probably slower and probably not worth the hassle.

Perl/ curl How to get Status Code and Response Body

I am trying to write a simple perl script that calls an API; if the status code is 2xx, it should do something with the response, while if it is 4xx or 5xx it should do something else.
The issue I am encountering is that I am able either to get just the response code (using a custom write-out format and sending the body somewhere else) or to get the whole response together with the headers.
my $curlResponseCode = `curl -s -o /dev/null -w "%{http_code}" ....`;
Will give me the status code only.
my $curlResponse = `curl -si ...`;
Will give me the entire header plus the response.
My question is how can I obtain the response body from the server and the http status code in a neat format that allows me to separate them into two separate variables.
Unfortunately I cannot use LWP or any other separate libraries.
Thanks in advance.
-Spencer
I came up with this solution:
URL="http://google.com"
# store the whole response with the status at the and
HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST $URL)
# extract the body
HTTP_BODY=$(echo $HTTP_RESPONSE | sed -e 's/HTTPSTATUS\:.*//g')
# extract the status
HTTP_STATUS=$(echo $HTTP_RESPONSE | tr -d '\n' | sed -e 's/.*HTTPSTATUS://')
# print the body
echo "$HTTP_BODY"
# example using the status
if [ ! $HTTP_STATUS -eq 200 ]; then
echo "Error [HTTP status: $HTTP_STATUS]"
exit 1
fi
...Will give me the entire header plus the response.
...in a neat format that allows me to separate them into two separate variables.
Since header and body are simply delimited by an empty line you can split the content on this line:
my ($head,$body) = split( m{\r?\n\r?\n}, `curl -si http://example.com `,2 );
And to get the status code from the header
my ($code) = $head =~m{\A\S+ (\d+)};
You might also combine this into a single expression with a regexp, although this might be harder to understand:
my ($code,$body) = `curl -si http://example.com`
=~m{\A\S+ (\d+) .*?\r?\n\r?\n(.*)}s;
Pretty fundamentally - you're capturing output from a system command. It is far and away better to do this by using the library built for it - LWP.
Failing that though - curl -v will produce status code and content, and you'll have to parse it.
You might also find this thread on SuperUser useful:
https://superuser.com/questions/272265/getting-curl-to-output-http-status-code
Specifically
#creates a new file descriptor 3 that redirects to 1 (STDOUT)
exec 3>&1
# Run curl in a separate command, capturing output of -w "%{http_code}" into HTTP_STATUS
# and sending the content to this command's STDOUT with -o >(cat >&3)
HTTP_STATUS=$(curl -w "%{http_code}" -o >(cat >&3) 'http://example.com')
(That isn't Perl, but you can probably do something similar; at the very least, run curl with -w and capture the content to a temp file.)
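A minimal sketch of that temp-file idea in shell, which you could run from Perl via backticks; the temp-file path and example URL are just placeholders:
# body goes to a temp file, only the status code is printed on stdout
STATUS=$(curl -s -o "/tmp/body.$$" -w '%{http_code}' 'http://example.com')
BODY=$(cat "/tmp/body.$$")
rm -f "/tmp/body.$$"
echo "status: $STATUS"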
Haven't figured out a "pure" Perl solution, but I drafted this snippet to check the HTTP response code of a page via curl:
#!/usr/bin/perl
use v5.30;
use warnings;
use diagnostics;

our $url = "";
my $username = "";
my $password = "";

=begin url_check

Exit if HTTP response code not 200.

=cut

sub url_check {
    print "Checking URL status code...\n";
    my $status_code =
      `curl --max-time 2.5 --user ${username}:${password} --output /dev/null --silent --head --write-out '%{http_code}\n' $url`;
    if ($status_code != 200) {
        print "URL not accessible. Exiting.\n";
        exit;
    } else {
        print "URL accessible. Continuing...\n";
    }
}

url_check();
The verbose use of curl more or less documents itself. My example allows you to pass credentials to a page, but that can be removed as needed.

How to check status of URLs from text file using bash shell script

I have to check the status of 200 http URLs and find out which of these are broken links. The links are present in a simple text file (say URL.txt present in my ~ folder). I am using Ubuntu 14.04 and I am a Linux newbie. But I understand the bash shell is very powerful and could help me achieve what I want.
My exact requirement would be to read the text file which has the list of URLs and automatically check if the links are working and write the response to a new file with the URLs and their corresponding status (working/broken).
I created a file "checkurls.sh" and placed it in my home directory where the urls.txt file is also located. I gave execute privileges to the file using
$chmod +x checkurls.sh
The contents of checkurls.sh is given below:
#!/bin/bash
while read -r url
do
urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url")
echo "$url $urlstatus" >> urlstatus.txt
done < "$1"
Finally, I executed it from command line using the following -
$./checkurls.sh urls.txt
Voila! It works.
#!/bin/bash
while read -ru 4 LINE; do
read -r REP < <(exec curl -IsS "$LINE" 2>&1)
echo "$LINE: $REP"
done 4< "$1"
Usage:
bash script.sh urls-list.txt
Sample:
http://not-exist.com/abc.html
https://kernel.org/nothing.html
http://kernel.org/index.html
https://kernel.org/index.html
Output:
http://not-exist.com/abc.html: curl: (6) Couldn't resolve host 'not-exist.com'
https://kernel.org/nothing.html: HTTP/1.1 404 Not Found
http://kernel.org/index.html: HTTP/1.1 301 Moved Permanently
https://kernel.org/index.html: HTTP/1.1 200 OK
For everything, read the Bash Manual. See man curl, help, man bash as well.
What about adding some parallelism to the accepted solution? Let's modify the script, chkurl.sh, to be a little easier to read and to handle just one request at a time:
#!/bin/bash
URL=${1?Pass URL as parameter!}
curl -o /dev/null --silent --head --write-out "$URL %{http_code} %{redirect_url}\n" "$URL"
And now you check your list using:
cat URL.txt | xargs -P 4 -L1 ./chkurl.sh
This could finish the job up to 4 times faster.
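If, as in the original question, you want the results written to a file, a redirect on top of that works (urlstatus.txt is just an example name; note the output order may differ from the input because of the parallelism):
cat URL.txt | xargs -P 4 -L1 ./chkurl.sh > urlstatus.txt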
Herewith my full script that checks URLs listed in a file passed as an argument e.g. 'checkurls.sh listofurls.txt'.
What it does:
check url using curl and return HTTP status code
send email notifications when url returns other code than 200
create a temporary lock file for failed urls (file naming could be improved)
send email notification when url becomes available again
remove lock file once url becomes available to avoid further notifications
log events to a file and keep the log file size in check (a simple log rotation; uncomment the echo if code 200 logging is required)
Code:
#!/bin/bash
EMAIL="your@email.com"
DATENOW=`date +%Y%m%d-%H%M%S`
LOG_FILE="checkurls.log"
c=0
while read url
do
((c++))
LOCK_FILE="checkurls$c.lock"
urlstatus=$(/usr/bin/curl -H 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code}' "$url" )
if [ "$urlstatus" = "200" ]
then
#echo "$DATENOW OK $urlstatus connection->$url" >> $LOG_FILE
[ -e $LOCK_FILE ] && /bin/rm -f -- $LOCK_FILE > /dev/null && /bin/mail -s "NOTIFICATION URL OK: $url" $EMAIL <<< 'The URL is back online'
else
echo "$DATENOW FAIL $urlstatus connection->$url" >> $LOG_FILE
if [ -e $LOCK_FILE ]
then
#no action - awaiting URL to be fixed
:
else
/bin/mail -s "NOTIFICATION URL DOWN: $url" $EMAIL <<< 'Failed to reach or URL problem'
/bin/touch $LOCK_FILE
fi
fi
done < $1
# REMOVE LOG FILE IF LARGER THAN 100MB
# allow up to 2000 lines average
maxsize=120000
size=$(/usr/bin/du -k "$LOG_FILE" | /bin/cut -f 1)
if [ $size -ge $maxsize ]; then
/bin/rm -f -- $LOG_FILE > /dev/null
echo "$DATENOW LOG file [$LOG_FILE] has been recreated" > $LOG_FILE
else
#do nothing
:
fi
Please note that changing the order of the listed urls in the text file will affect any existing lock files (remove all .lock files to avoid confusion). This could be improved by using the url as the lock file name, but certain characters such as : # / ? & would have to be handled for the operating system.
I recently released deadlink, a command-line tool for finding broken links in files. Install with
pip install deadlink
and use as
deadlink check /path/to/file/or/directory
or
deadlink replace-redirects /path/to/file/or/directory
The latter will replace permanent redirects (301) in the specified files.
If your input file contains one url per line, you can use a script that reads each line and then tries to ping the url; if the ping succeeds, the url is valid:
#!/bin/bash
INPUT="Urls.txt"
OUTPUT="result.txt"
while read -r line
do
if ping -c 1 "$line" &> /dev/null
then
echo "$line valid" >> "$OUTPUT"
else
echo "$line not valid" >> "$OUTPUT"
fi
done < "$INPUT"
exit
ping options:
-c count
Stop after sending count ECHO_REQUEST packets. With the deadline option, ping waits for count ECHO_REPLY packets, until the timeout expires.
You can use this option as well to limit the waiting time:
-W timeout
Time to wait for a response, in seconds. The option affects only the timeout in the absence of any responses; otherwise ping waits for two RTTs.
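For example, to wait at most 2 seconds per host (the timeout value here is just an illustration):
ping -c 1 -W 2 "$line" &> /dev/null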
curl -s -I --http2 "http://$1" >> fullscan_curl.txt
grep HTTP fullscan_curl.txt >> fullscan_httpstatus.txt
It works for me.

How do I pipe or redirect the output of curl -v?

For some reason the output always gets printed to the terminal, regardless of whether I redirect it via 2> or > or |. Is there a way to get around this? Why is this happening?
Add the -s (silent) option to remove the progress meter, then redirect stderr to stdout to get the verbose output on the same fd as the response body:
curl -vs google.com 2>&1 | less
Your URL probably has ampersands in it. I had this problem, too, and I realized that my URL was full of ampersands (from CGI variables being passed) and so everything was getting sent to background in a weird way and thus not redirecting properly. If you put quotes around the URL it will fix it.
The answer above didn't work for me; what eventually did was this syntax:
curl https://${URL} &> /dev/stdout | tee -a ${LOG}
tee puts the output on the screen, but also appends it to my log.
If you need the output in a file you can use a redirect:
curl https://vi.stackexchange.com/ -vs >curl-output.txt 2>&1
Please be sure not to flip >curl-output.txt and 2>&1; the order matters, and reversing it will not work because of how bash processes redirections.
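To illustrate the ordering (example.com is just a placeholder):
curl -vs https://example.com >curl-output.txt 2>&1   # both streams end up in the file
curl -vs https://example.com 2>&1 >curl-output.txt   # stderr still goes to the terminal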
Just my 2 cents.
The below command should do the trick, as answered earlier
curl -vs google.com 2>&1
However if need to get the output to a file,
curl -vs google.com > out.txt 2>&1
should work.
I found the same thing: curl by itself would print to STDOUT, but could not be piped into another program.
At first, I thought I had solved it by using xargs to echo the output first:
curl -s ... <url> | xargs -0 echo | ...
But then, as pointed out in the comments, it also works without the xargs part, so -s (silent mode) is the key to preventing extraneous progress output to STDOUT:
curl -s ... <url> | perl -ne 'print $1 if /<sometag>([^<]+)/'
The above example grabs the simple <sometag> content (containing no embedded tags) from the XML output of the curl statement.
The following worked for me:
Put your curl statement in a script named abc.sh
Now run:
sh abc.sh 1>stdout_output 2>stderr_output
You will get your curl's results in stdout_output and the progress info in stderr_output.
This simple example shows how to capture curl output, and use it in a bash script
test.sh
function main
{
\curl -vs 'http://google.com' 2>&1
# note: add -o /tmp/ignore.png if you want to ignore binary output, by saving it to a file.
}
# capture output of curl to a variable
OUT=$(main)
# search output for something using grep.
echo
echo "$OUT" | grep 302
echo
echo "$OUT" | grep title
Solution = curl -vs google.com 2>&1 | less
BUT, if you want to redirect the output to a file and some output still lands on the screen, then the URL response contains a newline character \n, which can mess up your shell.
To avoid this, put everything in a variable:
result=$(curl -v . . . . )
