Bash | curl | curls 2 URL's then stops - linux

I am trying to write a simple bash script that will use a list from a text document and curl each URL that is on the list in order to see what the contents of each URL is. It allows me to cURL 2 sites and creates the text documents for the rest however it only downloads the first 2. I have already manage to write the script that pulls there IP's and places them in a seperate file using the grep command. At first i tried
#!/bin/bash
for var in `cat host.txt`; do
curl -s $var >> /tmp/ping/html/$var.html
done
I have tried with and without the silent switch. I then tried the following:
#!/bin/bash
for var in `head -2 host.txt`; do
curl $var >> /tmp/ping/html/$var.html
wait
done
for var in `head -4 host.txt | tail -2`; do
curl $var >> /tmp/ping/html/$var.html
done
This would try and do them all at the same time again stopping after 2
#!/bin/bash
for var in `head -2 host.txt`; do
curl $var >> /tmp/ping/html/$var.html
done
wait
for var in `head -4 host.txt | tail -2`; do
curl $var >> /tmp/ping/html/$var.html
done
This would do the same, I am new to bash scripting and only know some of the basics, any help would be appreciated

Start with the simple: verify that you are in fact iterating over the entire list:
# This is the recommended way to iterate over the file. See
# http://mywiki.wooledge.org/BashFAQ/001
while read -r var; do
echo "$var"
done < hosts.txt
Then add in the call to curl, checking its exit status
while read -r var; do
echo "$var"
curl "$var" >> /tmp/ping/html/$var.html || echo "curl failed: $?"
done < hosts.txt

You pipe into $var, which could result in a wrong filename, because of the two slashes in the URL. Additionally i would quote the URL. For Example it works with the basename of the URL.
#!/bin/bash
for var in `cat host.txt`; do
name=$(basename $var)
curl -v -s "$var" -o "/tmp/ping/html/$name.html"
done
You may also want to skip blank lines and Comments (#)
#!/bin/bash
file="host.txt"
curl="curl"
while read -r line
do
[[ $line = \#* ]] || [[ -z "${line}" ]] && continue
filename=$(basename $line)
$curl -s "$line" >> "/tmp/ping/html/$filename.html"
done < "$file"

Related

I have to read config file and after reading it will run scp command to fetch all details from the available servers in config

I have a config file that has details like
#pem_file username ip destination
./test.pem ec2-user 00.00.00.11 /Desktop/new/
./test1.pem ec2-user 00.00.00.22 /Desktop/new/
Now I need to know how can I fix the below script to get all the details using scp
while read "$(cat $conf | awk '{split($0,array,"\n")} END{print array[]}')"; do
scp -i array[1] array[2]#array[3]:/home/ubuntu/documents/xyz.xml array[4]
done
please help me.
Build your while read like this:
#!/bin/bash
while read -r file user ip destination
do
echo $file
echo $user
echo $ip
echo $destination
echo ""
done < <(grep -Ev "^#" "$conffile")
Use these variables to build your scp command.
The grep is to remove commented out lines.
If you prefer using an array, you can do this:
#!/bin/bash
while read -a line
do
echo ${line[0]}
echo ${line[1]}
echo ${line[2]}
echo ${line[3]}
echo ""
done < <(grep -Ev "^#" "$conffile")
See https://mywiki.wooledge.org/BashFAQ/001 for looping on files and commands output using while.

Multiple process curl command for urls to output to individual files

I am attempting to curl multiple urls in a bash command. Eventually I will be curling a large number of Urls so I am using xargs to use multiple processes to speed up the process.
My file consists of x number of URLs:
https://someurl.com
https://someotherurl.com
My issue comes when attempting to output the results to separate files named after the URLs I curl.
The bash command I have is:
xargs -P 5 -n 1 -I% curl -k -L % -0 % < urls.txt
When I run this I get 'Failed to create file https://someotherurl.com'
You cannot create a file with / in the filename. You could do it this way:
#!/bin/bash
while IFS= read -r line
do
echo "LINE: $line"
if [[ "$line" != "" ]]
then
filename="${line#https://}"
echo "FILENAME: $filename"
# YOUR CURL COMMAND HERE, USING $filename
fi
done < url.txt
it ignores empty lines
variable substitution is used to remove the https:// part of each URL
this will allow you to create the file
Note: if your URLs containt sub-directories, they must be removed as well.
Ex: you want to do https://www.exemple.com/some/sub/dir
The script I suggested here would try to create a file named "www.exemple.com/some/sub/dir". In this case, you could replace the / with _ using tr.
The script would become:
#!/bin/bash
while IFS= read -r line
do
echo "LINE: $line"
if [[ "$line" != "" ]]
then
filename=$(echo "$line" | tr '/' '_')
filename2=${filename#https:__}
echo "FILENAME: $filename2"
# YOUR CURL COMMAND HERE, USING $filename2
fi
done < url.txt
Because your question is ambiguous, I would assume:
You have a file urls.txt that contains URLs separated by LF.
You want to download all URLs by curl and use each URL as its filename.
Unfortunately, that's not possible because URL contains invalid characters like slash /. Alternatively, for this case, I would suggest you use Bsse64 safe mode to decode URL before saving to file based on RFC 3548.
After applying this requirement, your script would become like:
seq 100 | xargs -I# echo 'https://example.com?#' > urls.txt
xargs -P0 -L1 sh -c 'curl -SskL0 -o $(printf %s "$1" | uuencode -m /dev/stdout | sed "1d;\$d" | tr +/ -_) "$1"' sh < urls.txt

Unable to array values outside of function in shell script [duplicate]

Please explain to me why the very last echo statement is blank? I expect that XCODE is incremented in the while loop to a value of 1:
#!/bin/bash
OUTPUT="name1 ip ip status" # normally output of another command with multi line output
if [ -z "$OUTPUT" ]
then
echo "Status WARN: No messages from SMcli"
exit $STATE_WARNING
else
echo "$OUTPUT"|while read NAME IP1 IP2 STATUS
do
if [ "$STATUS" != "Optimal" ]
then
echo "CRIT: $NAME - $STATUS"
echo $((++XCODE))
else
echo "OK: $NAME - $STATUS"
fi
done
fi
echo $XCODE
I've tried using the following statement instead of the ++XCODE method
XCODE=`expr $XCODE + 1`
and it too won't print outside of the while statement. I think I'm missing something about variable scope here, but the ol' man page isn't showing it to me.
Because you're piping into the while loop, a sub-shell is created to run the while loop.
Now this child process has its own copy of the environment and can't pass any
variables back to its parent (as in any unix process).
Therefore you'll need to restructure so that you're not piping into the loop.
Alternatively you could run in a function, for example, and echo the value you
want returned from the sub-process.
http://tldp.org/LDP/abs/html/subshells.html#SUBSHELL
The problem is that processes put together with a pipe are executed in subshells (and therefore have their own environment). Whatever happens within the while does not affect anything outside of the pipe.
Your specific example can be solved by rewriting the pipe to
while ... do ... done <<< "$OUTPUT"
or perhaps
while ... do ... done < <(echo "$OUTPUT")
This should work as well (because echo and while are in same subshell):
#!/bin/bash
cat /tmp/randomFile | (while read line
do
LINE="$LINE $line"
done && echo $LINE )
One more option:
#!/bin/bash
cat /some/file | while read line
do
var="abc"
echo $var | xsel -i -p # redirect stdin to the X primary selection
done
var=$(xsel -o -p) # redirect back to stdout
echo $var
EDIT:
Here, xsel is a requirement (install it).
Alternatively, you can use xclip:
xclip -i -selection clipboard
instead of
xsel -i -p
I got around this when I was making my own little du:
ls -l | sed '/total/d ; s/ */\t/g' | cut -f 5 |
( SUM=0; while read SIZE; do SUM=$(($SUM+$SIZE)); done; echo "$(($SUM/1024/1024/1024))GB" )
The point is that I make a subshell with ( ) containing my SUM variable and the while, but I pipe into the whole ( ) instead of into the while itself, which avoids the gotcha.
#!/bin/bash
OUTPUT="name1 ip ip status"
+export XCODE=0;
if [ -z "$OUTPUT" ]
----
echo "CRIT: $NAME - $STATUS"
- echo $((++XCODE))
+ export XCODE=$(( $XCODE + 1 ))
else
echo $XCODE
see if those changes help
Another option is to output the results into a file from the subshell and then read it in the parent shell. something like
#!/bin/bash
EXPORTFILE=/tmp/exportfile${RANDOM}
cat /tmp/randomFile | while read line
do
LINE="$LINE $line"
echo $LINE > $EXPORTFILE
done
LINE=$(cat $EXPORTFILE)

How to check status of URLs from text file using bash shell script

I have to check the status of 200 http URLs and find out which of these are broken links. The links are present in a simple text file (say URL.txt present in my ~ folder). I am using Ubuntu 14.04 and I am a Linux newbie. But I understand the bash shell is very powerful and could help me achieve what I want.
My exact requirement would be to read the text file which has the list of URLs and automatically check if the links are working and write the response to a new file with the URLs and their corresponding status (working/broken).
I created a file "checkurls.sh" and placed it in my home directory where the urls.txt file is also located. I gave execute privileges to the file using
$chmod +x checkurls.sh
The contents of checkurls.sh is given below:
#!/bin/bash
while read url
do
urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url" )
echo "$url $urlstatus" >> urlstatus.txt
done < $1
Finally, I executed it from command line using the following -
$./checkurls.sh urls.txt
Voila! It works.
#!/bin/bash
while read -ru 4 LINE; do
read -r REP < <(exec curl -IsS "$LINE" 2>&1)
echo "$LINE: $REP"
done 4< "$1"
Usage:
bash script.sh urls-list.txt
Sample:
http://not-exist.com/abc.html
https://kernel.org/nothing.html
http://kernel.org/index.html
https://kernel.org/index.html
Output:
http://not-exist.com/abc.html: curl: (6) Couldn't resolve host 'not-exist.com'
https://kernel.org/nothing.html: HTTP/1.1 404 Not Found
http://kernel.org/index.html: HTTP/1.1 301 Moved Permanently
https://kernel.org/index.html: HTTP/1.1 200 OK
For everything, read the Bash Manual. See man curl, help, man bash as well.
What about to add some parallelism to the accepted solution. Lets modify the script chkurl.sh to be little easier to read and to handle just one request at a time:
#!/bin/bash
URL=${1?Pass URL as parameter!}
curl -o /dev/null --silent --head --write-out "$URL %{http_code} %{redirect_url}\n" "$URL"
And now you check your list using:
cat URL.txt | xargs -P 4 -L1 ./chkurl.sh
This could finish the job up to 4 times faster.
Herewith my full script that checks URLs listed in a file passed as an argument e.g. 'checkurls.sh listofurls.txt'.
What it does:
check url using curl and return HTTP status code
send email notifications when url returns other code than 200
create a temporary lock file for failed urls (file naming could be improved)
send email notification when url becoms available again
remove lock file once url becomes available to avoid further notifications
log events to a file and handle increasing log file size (AKA log
rotation, uncomment echo if code 200 logging required)
Code:
#!/bin/sh
EMAIL=" your#email.com"
DATENOW=`date +%Y%m%d-%H%M%S`
LOG_FILE="checkurls.log"
c=0
while read url
do
((c++))
LOCK_FILE="checkurls$c.lock"
urlstatus=$(/usr/bin/curl -H 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code}' "$url" )
if [ "$urlstatus" = "200" ]
then
#echo "$DATENOW OK $urlstatus connection->$url" >> $LOG_FILE
[ -e $LOCK_FILE ] && /bin/rm -f -- $LOCK_FILE > /dev/null && /bin/mail -s "NOTIFICATION URL OK: $url" $EMAIL <<< 'The URL is back online'
else
echo "$DATENOW FAIL $urlstatus connection->$url" >> $LOG_FILE
if [ -e $LOCK_FILE ]
then
#no action - awaiting URL to be fixed
:
else
/bin/mail -s "NOTIFICATION URL DOWN: $url" $EMAIL <<< 'Failed to reach or URL problem'
/bin/touch $LOCK_FILE
fi
fi
done < $1
# REMOVE LOG FILE IF LARGER THAN 100MB
# alow up to 2000 lines average
maxsize=120000
size=$(/usr/bin/du -k "$LOG_FILE" | /bin/cut -f 1)
if [ $size -ge $maxsize ]; then
/bin/rm -f -- $LOG_FILE > /dev/null
echo "$DATENOW LOG file [$LOG_FILE] has been recreated" > $LOG_FILE
else
#do nothing
:
fi
Please note that changing order of listed urls in text file will affect any existing lock files (remove all .lock files to avoid confusion). It would be improved by using url as file name but certain characters such as : # / ? & would have to be handled for operating system.
I recently released deadlink, a command-line tool for finding broken links in files. Install with
pip install deadlink
and use as
deadlink check /path/to/file/or/directory
or
deadlink replace-redirects /path/to/file/or/directory
The latter will replace permanent redirects (301) in the specified files.
Example output:
if your input file contains one url per line you can use a script to read each line, then try to ping the url, if ping success then the url is valid
#!/bin/bash
INPUT="Urls.txt"
OUTPUT="result.txt"
while read line ;
do
if ping -c 1 $line &> /dev/null
then
echo "$line valid" >> $OUTPUT
else
echo "$line not valid " >> $OUTPUT
fi
done < $INPUT
exit
ping options :
-c count
Stop after sending count ECHO_REQUEST packets. With deadline option, ping waits for count ECHO_REPLY packets, until the timeout expires.
you can use this option as well to limit waiting time
-W timeout
Time to wait for a response, in seconds. The option affects only timeout in absense
of any responses, otherwise ping waits for two RTTs.
curl -s -I --http2 http://$1 >> fullscan_curl.txt | cut -d: -f1 fullscan_curl.txt | cat fullscan_curl.txt | grep HTTP >> fullscan_httpstatus.txt
its work me

Shell script with Wget - If else nested inside for loop

I'm trying to make a shell script that reads a list of download URLs to find if they're still active. I'm not sure what's wrong with my current script, (I'm new to this) and any pointers would be a huge help!
user#pc:~/test# cat sites.list
http://www.google.com/images/srpr/logo3w.png
http://www.google.com/doesnt.exist
notasite
Script:
#!/bin/bash
for i in `cat sites.list`
do
wget --spider $i -b
if grep --quiet "200 OK" wget-log; then
echo $i >> ok.txt
else
echo $i >> notok.txt
fi
rm wget-log
done
As is, the script outputs everything to notok.txt - (the first google site should go to ok.txt).
But if I run:
wget --spider http://www.google.com/images/srpr/logo3w.png -b
And then do:
grep "200 OK" wget-log
It greps the string without any problems. What noob mistake did I make with the syntax? Thanks m8s!
The -b option is sending wget to the background, so you're doing the grep before wget has finished.
Try without the -b option:
if wget --spider $i 2>&1 | grep --quiet "200 OK" ; then
There are a few issues with what you're doing.
Your for i in will have problems with lines that contain whitespace. Better to use while read to read individual lines of a file.
You aren't quoting your variables. What if a line in the file (or word in a line) starts with a hyphen? Then wget will interpret that as an option. You have a potential security risk here, as well as an error.
Creating and removing files isn't really necessary. If all you're doing is checking whether a URL is reachable, you can do that without temp files and the extra code to remove them.
wget isn't necessarily the best tool for this. I'd advise using curl instead.
So here's a better way to handle this...
#!/bin/bash
sitelist="sites.list"
curl="/usr/bin/curl"
# Some errors, for good measure...
if [[ ! -f "$sitelist" ]]; then
echo "ERROR: Sitelist is missing." >&2
exit 1
elif [[ ! -s "$sitelist" ]]; then
echo "ERROR: Sitelist is empty." >&2
exit 1
elif [[ ! -x "$curl" ]]; then
echo "ERROR: I can't work under these conditions." >&2
exit 1
fi
# Allow more advanced pattern matching (for case..esac below)
shopt -s globstar
while read url; do
# remove comments
url=${url%%#*}
# skip empty lines
if [[ -z "$url" ]]; then
continue
fi
# Handle just ftp, http and https.
# We could do full URL pattern matching, but meh.
case "$url" in
#(f|ht)tp?(s)://*)
# Get just the numeric HTTP response code
http_code=$($curl -sL -w '%{http_code}' "$url" -o /dev/null)
case "$http_code" in
200|226)
# You'll get a 226 in ${http_code} from a valid FTP URL.
# If all you really care about is that the response is in the 200's,
# you could match against "2??" instead.
echo "$url" >> ok.txt
;;
*)
# You might want different handling for redirects (301/302).
echo "$url" >> notok.txt
;;
esac
;;
*)
# If we're here, we didn't get a URL we could read.
echo "WARNING: invalid url: $url" >&2
;;
esac
done < "$sitelist"
This is untested. For educational purposes only. May contain nuts.

Resources