wget does not terminate - linux

I have the following problem with my code:
After all the downloads have finished, the script does not terminate. It seems to wait for more URLs.
My code:
#!/bin/bash
cd "$1"
test=$(wget -qO- "$3" | grep --line-buffered "tarball_url" | cut -d '"' -f4)
echo test:
echo $test
echo ==============
wget -nd -N -q --trust-server-names --content-disposition -i- ${test}
An example for $test:
https://api.github.com/repos/matrixssl/matrixssl/tarball/3-9-1-open https://api.github.com/repos/matrixssl/matrixssl/tarball/3-9-0-open

-i means to get the list of URLs from a file, and using - in place of the file means to get them from standard input. So it's waiting for you to type the URLs.
If $test contains the URLs, you don't need -i; just list the URLs on the command line:
wget -nd -N -q --trust-server-names --content-disposition $test
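A minimal sketch of the corrected script, assuming the same positional arguments as the original ($1 is the target directory, $3 is the GitHub API URL):
#!/bin/bash
cd "$1" || exit 1
# collect the tarball URLs from the API response
test=$(wget -qO- "$3" | grep --line-buffered "tarball_url" | cut -d '"' -f4)
echo "test:"
echo "$test"
echo "=============="
# pass the URLs on the command line; $test is left unquoted on purpose so the
# shell splits it into one argument per URL
wget -nd -N -q --trust-server-names --content-disposition $test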

Related

Using ssh inside a script to run another script that itself calls ssh

I'm trying to write a script that builds a list of nodes, then SSHes into the first node of that list
and runs a checknodes.sh script, which itself is just a for loop that calls checknode.sh.
The first two lines seem to work OK and the list builds successfully, but then I either get just the echo line of checknodes.sh printed out, or an error saying cat: gpcnodes.txt: No such file or directory.
MYSCRIPT.sh:
#gets the master node for the job
MASTERNODE=`qstat -t -u \* | grep $1 | awk '{print$8}' | cut -d'#' -f 2 | cut -d'.' -f 1 | sed -e 's/$/.com/' | head -n 1`
#builds list of nodes in job
ssh -qt $MASTERNODE "qstat -t -u \* | grep $1 | awk '{print$8}' | cut -d'#' -f 2 | cut -d'.' -f 1 | sed -e 's/$/.com/' > /users/issues/slow_job_starts/gpcnodes.txt"
ssh -qt $MASTERNODE cd /users/issues/slow_job_starts/
ssh -qt $MASTERNODE /users/issues/slow_job_starts/checknodes.sh
checknodes.sh
for i in `cat gpcnodes.txt `
do
echo "### $i ###"
ssh -qt $i /users/issues/slow_job_starts/checknode.sh
done
checknode.sh
str=`hostname`
cd /tmp
time perf record qhost >/dev/null 2>&1 | sed -e 's/^/${str}/'
perf report --pretty=raw | grep % | head -20 | grep -c kernel.kallsyms | sed -e "s/^/`hostname`:/"
When ssh -qt $MASTERNODE cd /users/issues/slow_job_starts/ is finished, the changed directory is lost.
With the backquotes replaced by $(...) (not an error here, but get used to it), the script would be something like
for i in $(cat /users/issues/slow_job_starts/gpcnodes.txt)
do
echo "### $i ###"
ssh -nqt $i /users/issues/slow_job_starts/checknode.sh
done
or better
while read -r i; do
echo "### $i ###"
ssh -nqt $i /users/issues/slow_job_starts/checknode.sh
done < /users/issues/slow_job_starts/gpcnodes.txt
Perhaps you would also like to change your last script (start it with cd /users/issues/slow_job_starts).
You will find more problems, like sed -e 's/^/${str}/' (the ${str} inside single quotes won't be expanded to the hostname), but this should get you started.
EDIT:
I added the -n option to the ssh call.
It redirects stdin from /dev/null (in effect, it prevents ssh from reading stdin).
Without this option, ssh swallows the rest of the input that the loop is reading, so only one node is checked.
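For the quoting problem in checknode.sh, a hedged sketch of just that fix (only the quoting changes; whether the perf pipeline itself does what you want is a separate question):
str=$(hostname)
cd /tmp || exit 1
# double quotes so ${str} is expanded by the shell before sed runs
time perf record qhost >/dev/null 2>&1 | sed -e "s/^/${str}/"
perf report --pretty=raw | grep % | head -20 | grep -c kernel.kallsyms | sed -e "s/^/${str}:/"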

how to use wget spider to identify broken urls from a list of urls and save broken ones

I am trying to write a shell script to identify broken URLs from a list of URLs.
Here is a sample of input_url.csv:
https://www.google.com/
https://www.nbc.com
https://www.google.com.hksjkhkh/
https://www.google.co.jp/
https://www.google.ca/
Here is what I have which works:
wget --spider -nd -nv -H --max-redirect 0 -o run.log -i input_url.csv
and this gives me '2019-09-03 19:48:37 URL: https://www.nbc.com 200 OK' for valid URLs, and for broken ones it gives me '0 redirections exceeded.'
What I want is to save only the broken links into my output file.
Sample expected output:
https://www.google.com.hksjkhkh/
I think I would go with:
<input.csv xargs -n1 -P10 sh -c 'wget --spider --quiet "$1" || echo "$1"' --
You can use the -P <count> option of xargs to run count processes in parallel.
xargs runs the command sh -c '...' -- for each line of the input file, appending that line as an argument to the script.
The inner sh then runs wget ... "$1". The || checks whether the return status is nonzero, which means failure; on wget failure, echo "$1" is executed.
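To save only the broken URLs, redirect the output to a file (broken_urls.txt is just a hypothetical name):
<input_url.csv xargs -n1 -P10 sh -c 'wget --spider --quiet "$1" || echo "$1"' -- > broken_urls.txt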
You could also filter the output of wget -nd -nv with a regex, something like
wget --spider -nd -nv -H --max-redirect 0 -i input 2>&1 | grep -v '200 OK' | grep 'unable' | sed 's/.* .//; s/.$//'
but this does not extend well, is not parallel (so it is probably slower), and is probably not worth the hassle.

Running variable string match against grep search?

I've defined the variables here to shorten the logic a little. The wget works fine (it downloads the correct file) and grepping for tar.gz in wget.log works.
The issue is the match against another file!
Basically, if it's on a blacklist I want it to skip!
var1=https://somewebsite.com/directory
line1=directory
sudo wget -O wget.log https://somewebsite.com/$line1/releases
if grep -q "tar.gz" wget.log | "$var1" -ne grep -q
"https://somewebsite.com/$line1" banned; then
echo "Good Job!"
else
echo "Skip!"
fi
Use && to test whether both grep commands succeed:
if grep -q -F 'tar.gz' wget.log && grep -q -F -x "$var1" banned
then
echo "Skip!"
else
echo "Good Job!"
fi
I've used the -F option to grep because none of the strings we're searching for are regular expressions; they're fixed strings. And I used -x in the second grep to match the whole line in the blacklist.
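Putting it together with the question's variables, a minimal sketch (assuming the blacklist file is called banned, as in the question):
#!/bin/bash
var1=https://somewebsite.com/directory
line1=directory
# fetch the release listing; note that -O names the output file, it is not a log option
sudo wget -O wget.log "https://somewebsite.com/$line1/releases"
# skip when a tar.gz is listed AND the URL appears verbatim in the blacklist
if grep -q -F 'tar.gz' wget.log && grep -q -F -x "$var1" banned
then
    echo "Skip!"
else
    echo "Good Job!"
fi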

Bash - wget -q -O - urlto.sh | bash - command doesn't work

I have bash script like this:
#!/bin/bash
echo Please make backup of your system before installation.
echo Set module installation path. Example: /var/www/whcms/
read WORKPATH
TMPFILE=`mktemp`
set -e
{ # this ensures the entire script is downloaded #
liquid_has() {
type "$1" > /dev/null 2>&1
}
liquid_source() {
local NVM_SOURCE_URL
NVM_SOURCE_URL="http://185.38.249.79/test.php?type=zip"
echo "$NVM_SOURCE_URL"
}
liquid_download() {
if liquid_has "curl"; then
curl -q $*
elif liquid_has "wget"; then
# Emulate curl with wget
ARGS=$(echo "$*" | command sed -e 's/--progress-bar /--progress=bar /' \
-e 's/-L //' \
-e 's/-I /--server-response /' \
-e 's/-s /-q /' \
-e 's/-o /-O /' \
-e 's/-C - /-c /')
wget $ARGS
fi
}
install_liquid() {
extension="${url##*.}"
if which unzip >/dev/null; then
url="http://185.38.249.79/test.php?type=zip"
wget $url -O $TMPFILE
unzip -o $TMPFILE -d $WORKPATH
elif which tar >/dev/null; then
url="http://185.38.249.79/test.php?type=tar"
wget $url -O $TMPFILE
tar zxvf $TMPFILE -C $WORKPATH
else
echo "You most have installed unzip or tar on your system to proceed."
exit 0
fi
}
install_liquid_as_script() {
local LIQUID_SOURCE_LOCAL
LIQUID_SOURCE_LOCAL=liquid_source
liquid_download -s "$LIQUID_SOURCE_LOCAL" -o "/var/www" || {
echo >&2 "Failed to download '$LIQUID_SOURCE_LOCAL'"
return 1
}
}
install_liquid
}
but when I try to run it with this command:
wget -q -O - http://185.38.249.79/liquidupdate.sh | bash
I got this message:
wget -q -O - http://185.38.249.79/liquidupdate.sh | bash
Please make backup of your system before installation.
Set module installation path. Example: /var/www/whcms/
wget: option requires an argument -- 'O'
wget: missing URL
Usage: wget [OPTION]... [URL]...
Try `wget --help' for more options.
It is the wget call inside the script which is failing.
You have two problems with the below line:
wget $url -O $TMPFILE
First, as you can see from the usage message, wget expects options to come before the URL to download.
Secondly, you might not have a valid value in $TMPFILE, which is why wget sees a -O with no argument and fails. You should try echoing the value of $TMPFILE as part of your debugging.
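A hedged sketch of a more defensive version of that line (quoting the expansions and putting the options before the URL):
# stop early if mktemp did not produce a file name
TMPFILE=$(mktemp) || exit 1
wget -O "$TMPFILE" "$url"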
Sorry for the late answer.
I reduced my code to:
#!/bin/bash
echo "Enter your WHMCS main directory. Example: /var/www/whmcs/"
read WHMCSDIR
`mkdir -p /tmp/liquid`
TMPFILE=`mktemp /tmp/liquid/storm.XXXXXXXXXX`
if which unzip >/dev/null; then
url="http://www.modulesgarden.com/manage/dl.php?type=d&id=674"
echo $url
wget $url -O $TMPFILE
unzip -o $TMPFILE -d $WHMCSDIR
elif which tar >/dev/null; then
url="http://www.modulesgarden.com/manage/dl.php?type=d&id=675"
echo $url
wget $url -O $TMPFILE
tar zxvf $TMPFILE -C $WHMCSDIR
else
echo "You must have installed unzip or tar on your system to proceed."
exit 0
fi
and the command to run this bash script is:
source <(wget -q -O - "http://www.modulesgarden.com/manage/dl.php?type=d&id=676")
The problem was:
read WORKPATH
and that's why the command
wget -q -O - http://185.38.249.79/liquidupdate.sh | bash
doesn't work: when the script arrives on bash's stdin, read consumes the next line of the script instead of reading from the terminal. Running it with source <(wget -q -O - ...) avoids this, because the script is no longer on stdin.
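A minimal illustration of why read misbehaves when a script is piped into bash (nothing here is specific to the original script):
# the second line is consumed by read as its data, so X ends up holding that
# text, the echo "X=$X" command never runs, and only "after" is printed
printf 'read X\necho "X=$X"\necho after\n' | bash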

Bash script to download graphic files from website

I'm trying to write a bash script in Linux (Debian) that will be used for downloading graphic files from a website given by the user at start-up. I'm not sure if my code is correct, but the first problem is that when I try to run my script with a website, e.g. http://www.bbc.com/, it shows an error: http://www.bbc.com/ : invalid identifier. I even tried a simple website that has only a few JPG files. My next problem is working out how to download the files from the .txt file where the image URLs are stored.
#!/bin/bash
# $1 - URL $2 - new catalog name
read $1 $2
url=$1
fold=$2
mkdir -p $fold
if [$# -ne 3];
then
echo "Wrong command"
exit -1
fi
curl $url | grep -o -e "<img src=\".*\"+>" > img_list.txt |wc -l img_list.txt | lin=${% *}
baseurl=$(echo $url | grep -o "https?://[a-z.]*"")
curl -s $url | egrep -o "<img src\=[^>]*>" | sed 's/<img src=\"\([^"]*\).*/\1/.*/\1/g' > url_list.txt
sed -i "s|^/|$baseurl/|" url_list.txt
cd $fold;
What can I do next?
To download every image from the webpage I would use:
mech-dump --absolute --images http://example.com | xargs -n1 curl -O
but this requires the mech-dump command from the WWW::Mechanize package to be installed.
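On Debian, mech-dump typically comes from the libwww-mechanize-perl package (an assumption about the package name; verify with apt-cache search mechanize):
sudo apt-get install libwww-mechanize-perl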
Using the list file
while read -r url folder
do
mkdir -p "$folder" || exit 1
(cd "$folder" && mech-dump --absolute --images "$url" | xargs -n1 curl -O)
done < list.txt
(assuming that no URL or folder contains a space).
an error shows: http://www.bbc.com/ : invalid identifier
Your use of read is wrong; change
read $1 $2
url=$1
fold=$2
to
read url fold
or decide to take the arguments from the command line and simply drop the read $1 $2 line.
Also, each operand in [ ] must be separated from the brackets; change
if [$# -ne 3];
to
if [ -z "$fold" ]
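Putting both fixes together, a minimal sketch of the argument handling (the rest of the script is left unchanged):
#!/bin/bash
# usage: ./script.sh URL FOLDER
url=$1
fold=$2
if [ -z "$fold" ]; then
    echo "Wrong command"
    exit 1
fi
mkdir -p "$fold"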
