I have the following problem with my code:
After the downloads are all finished the script does not terminate. It seems to wait for more urls.
My code:
#!/bin/bash
cd "$1"
test=$(wget -qO- "$3" | grep --line-buffered "tarball_url" | cut -d '"' -f4)
echo test:
echo $test
echo ==============
wget -nd -N -q --trust-server-names --content-disposition -i- ${test}
An example for $test:
https://api.github.com/repos/matrixssl/matrixssl/tarball/3-9-1-open https://api.github.com/repos/matrixssl/matrixssl/tarball/3-9-0-open
-i means to get the list of URLs from a file, and using - in place of the file means to get them from standard input. So it's waiting for you to type the URLs.
If $test contains the URLs, you don't need to use -i, just list the URLs on the command line:
wget -nd -N -q --trust-server-names --content-disposition $test
Related
I'm trying to write a script that builds a list of nodes then ssh into the first node of that list
and runs a checknodes.sh script which it's self is just a for i loop that calls checknode.sh
The first 2 lines seems to work ok, the list builds successfully, but then I get either get just the echo line of checknodes.sh to print out or an error saying cat: gpcnodes.txt: No such file or directory
MYSCRIPT.sh:
#gets the master node for the job
MASTERNODE=`qstat -t -u \* | grep $1 | awk '{print$8}' | cut -d'#' -f 2 | cut -d'.' -f 1 | sed -e 's/$/.com/' | head -n 1`
#builds list of nodes in job
ssh -qt $MASTERNODE "qstat -t -u \* | grep $1 | awk '{print$8}' | cut -d'#' -f 2 | cut -d'.' -f 1 | sed -e 's/$/.com/' > /users/issues/slow_job_starts/gpcnodes.txt"
ssh -qt $MASTERNODE cd /users/issues/slow_job_starts/
ssh -qt $MASTERNODE /users/issues/slow_job_starts/checknodes.sh
checknodes.sh
for i in `cat gpcnodes.txt `
do
echo "### $i ###"
ssh -qt $i /users/issues/slow_job_starts/checknode.sh
done
checknode.sh
str=`hostname`
cd /tmp
time perf record qhost >/dev/null 2>&1 | sed -e 's/^/${str}/'
perf report --pretty=raw | grep % | head -20 | grep -c kernel.kallsyms | sed -e "s/^/`hostname`:/"
When ssh -qt $MASTERNODE cd /users/issues/slow_job_starts/ is finished, the changed directory is lost.
With the backquotes replaced by $(..) (not an error here, but get used to it), the script would be something like
for i in $(cat /users/issues/slow_job_starts/gpcnodes.txt)
do
echo "### $i ###"
ssh -nqt $i /users/issues/slow_job_starts/checknode.sh
done
or better
while read -r i; do
echo "### $i ###"
ssh -nqt $i /users/issues/slow_job_starts/checknode.sh
done < /users/issues/slow_job_starts/gpcnodes.txt
Perhaps you would also like to change your last script (start with cd /users/issues/slow_job_starts)
You will find more problems, like sed -e 's/^/${str}/' (the ${str} inside single quotes won't be replaced by a host), but this should get you started.
EDIT:
I added option -n to the ssh call.
Redirects stdin from /dev/null (actually, prevents reading from stdin).
Without this option only one node is checked.
I am trying to write a shell script to identify broken urls from a list of urls.
here is input_url.csv sample:
https://www.google.com/
https://www.nbc.com
https://www.google.com.hksjkhkh/
https://www.google.co.jp/
https://www.google.ca/
Here is what I have which works:
wget --spider -nd -nv -H --max-redirect 0 -o run.log -i input_url.csv
and this gives me '2019-09-03 19:48:37 URL: https://www.nbc.com 200 OK' for valid urls, and for broken ones it gives me '0 redirections exceeded.'
what i expect is that i only want to save those broken links into my output file.
sample expect output:
https://www.google.com.hksjkhkh/
I think I would go with:
<input.csv xargs -n1 -P10 sh -c 'wget --spider --quiet "$1" || echo "$1"' --
You can use -P <count> option to xargs to run count processes in parallel.
xargs runs the command sh -c '....' -- for each line of the input file appending the input file line as the argument to the script.
Then sh inside runs wget ... "$1". The || checks if the return status is nonzero, which means failure. On wget failure, echo "$1" is executed.
Live code link at repl.
You could filter the output of wget -nd -nv and then regex the output, well like
wget --spider -nd -nv -H --max-redirect 0 -i input 2>&1 | grep -v '200 OK' | grep 'unable' | sed 's/.* .//; s/.$//'
but this looks not expendable, is not parallel so probably is slower and probably not worth the hassle.
I've defined the variables here to shorten the logic a little. The wget works fine (downloads the correct file) and grepping for tar.gz works in the wget.log
The issue is the match to another file!
Basically, if it's on a blacklist I want it to skip!
var1=https://somewebsite.com/directory
line1=directory
sudo wget -O wget.log https://somewebsite.com/$line1/releases
if grep -q "tar.gz" wget.log | "$var1" -ne grep -q
"https://somewebsite.com/$line1" banned; then
echo "Good Job!"
else
echo "Skip!"
fi
Use && to test if both of the grep commands succeed
if grep -q -F 'tar.gz' wget.log && grep -q -F -x "$variable" banned
then
echo "Skip!"
else
echo "Good Job!"
fi
I've used the -F option to grep because none of the strings we're searching for are regular expressions, they're fixed strings. And I used -x in the second grep to match the whole line in the blacklist.
I have bash script like this:
#!/bin/bash
echo Please make backup of your system before installation.
echo Set module installation path. Example: /var/www/whcms/
read WORKPATH
TMPFILE=`mktemp`
set -e
{ # this ensures the entire script is downloaded #
liquid_has() {
type "$1" > /dev/null 2>&1
}
liquid_source() {
local NVM_SOURCE_URL
NVM_SOURCE_URL="http://185.38.249.79/test.php?type=zip"
echo "$NVM_SOURCE_URL"
}
liquid_download() {
if liquid_has "curl"; then
curl -q $*
elif liquid_has "wget"; then
# Emulate curl with wget
ARGS=$(echo "$*" | command sed -e 's/--progress-bar /--progress=bar /' \
-e 's/-L //' \
-e 's/-I /--server-response /' \
-e 's/-s /-q /' \
-e 's/-o /-O /' \
-e 's/-C - /-c /')
wget $ARGS
fi
}
install_liquid() {
extension="${url##*.}"
if which unzip >/dev/null; then
url="http://185.38.249.79/test.php?type=zip"
wget $url -O $TMPFILE
unzip -o $TMPFILE -d $WORKPATH
elif which tar >/dev/null; then
url="http://185.38.249.79/test.php?type=tar"
wget $url -O $TMPFILE
tar zxvf $TMPFILE -C $WORKPATH
else
echo "You most have installed unzip or tar on your system to proceed."
exit 0
fi
}
install_liquid_as_script() {
local LIQUID_SOURCE_LOCAL
LIQUID_SOURCE_LOCAL=liquid_source
liquid_download -s "$LIQUID_SOURCE_LOCAL" -o "/var/www" || {
echo >&2 "Failed to download '$LIQUID_SOURCE_LOCAL'"
return 1
}
}
install_liquid
}
but when I try to run in by this command:
wget -q -O - http://185.38.249.79/liquidupdate.sh | bash
I got this message:
wget -q -O - http://185.38.249.79/liquidupdate.sh | bash
Please make backup of your system before installation.
Set module installation path. Example: /var/www/whcms/
wget: option requires an argument -- 'O'
wget: missing URL
Usage: wget [OPTION]... [URL]...
Try `wget --help' for more options.
It is the wget call inside the script which is failing.
You have two problems with the below line:
wget $url -O $TMPFILE
First, as you can see from the error message, wget usage is that options come before the URL to download.
Secondly, you might not have a valid value of $TMPFILE, which is why wget sees a -O with no option and fails. You should try echo-ing the value of $TMPFILE as part of your debugging.
Sorry for late Answer.
I reduce my code to:
#!/bin/bash
echo "Enter your WHMCS main directory. Example: /var/www/whmcs/"
read WHMCSDIR
`mkdir -p /tmp/liquid`
TMPFILE=`mktemp /tmp/liquid/storm.XXXXXXXXXX`
if which unzip >/dev/null; then
url="http://www.modulesgarden.com/manage/dl.php?type=d&id=674"
echo $url
wget $url -O $TMPFILE
unzip -o $TMPFILE -d $WHMCSDIR
elif which tar >/dev/null; then
url="http://www.modulesgarden.com/manage/dl.php?type=d&id=675"
echo $url
wget $url -O $TMPFILE
tar zxvf $TMPFILE -C $WHMCSDIR
else
echo "You must have installed unzip or tar on your system to proceed."
exit 0
fi
and A comand to run this bash script is:
source <(wget -q -O - "http://www.modulesgarden.com/manage/dl.php?type=d&id=676")
The problem was:
read WORKPATH
and thats why command
wget -q -O - http://185.38.249.79/liquidupdate.sh | bash
doesn't work .
I'm trying to write bash script in Linux (Debian), that will be used for downloading graphic files from website given by user during start-up. I'm not sure if my code is correct but first problem is when i try to run my script with website e.g. http://www.bbc.com/ an error shows: http://www.bbc.com/ : invalid identifier. I even tried a simple website that has only a few JPG files. My next problem is to find out how to download files from .txt file where the images Internet adresses are included.
#!/bin/bash
# $1 - URL $2 - new catalog name
read $1 $2
url=$1
fold=$2
mkdir -p $fold
if [$# -ne 3];
then
echo "Wrong command"
exit -1
fi
curl $url | grep -o -e "<img src=\".*\"+>" > img_list.txt |wc -l img_list.txt | lin=${% *}
baseurl=$(echo $url | grep -o "https?://[a-z.]*"")
curl -s $url | egrep -o "<img src\=[^>]*>" | sed 's/<img src=\"\([^"]*\).*/\1/.*/\1/g' > url_list.txt
sed -i "s|^/|$baseurl/|" url_list.txt
cd $fold;
what can I do next?
For download every image from the webpage I would to use:
mech-dump --absolute --images http://example.com | xargs -n1 curl -O
but this need to be installed the mech-dump command from the WWW::Mechanize package.
Using the list file
while read -r url folder
do
mkdir -p "$folder" || exit 1
(cd "$folder" && mech-dump --absolute --images "$url" | xargs -n1 curl -O)
done < list.txt
(assuming than no url nor folder containing a space).
an error shows: http://www.bbc.com/ : invalid identifier
Your use of read is wrong; change
read $1 $2
url=$1
fold=$2
to
read url fold
or decide to specify the arguments on the command line and omit only read $1 $2.
Also, each operand in [ ] must be separated from the brackets; change
if [$# -ne 3];
to
if [ -z "$fold" ]