How do you use wget to download the most up-to-date file on a site?


Hello, I am trying to use wget to download the most up-to-date McAfee patch, and I am having trouble singling out the .tar file. This is what I have:
wget -q -O - ftp://ftp.mcafee.com/pub/antivirus/datfiles/4.x/ | grep -o -m 2 "avvdat-[^\']*"
However, when I run the above command, it gives me:
avvdat-8065.tar">avvdat-8065.tar</a> (95191040 bytes)
avvdat-8066.tar">avvdat-8066.tar</a> (95385600 bytes)
What I need is just the most recent .tar file name, the text between <a> and </a>, which in this case would be avvdat-8066.tar. Can someone please help me grep the correct .tar file? I am not too good with regex or sed.

Try this:
wget $(wget -q -O - ftp://ftp.mcafee.com/pub/antivirus/datfiles/4.x/ | grep -Eo "ftp://[^\"\]+" | sort | tail -n1)

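I'd suggest modifying your grep regex so that it retrieves only the file name, then using sort to order the results and tail to discard all but the last one. Stopping the match at the first quote is not enough, because grep -o would then also print the link text through to the end of the line, so match the digits and the .tar suffix explicitly. I also dropped -m 2 so that every line of the listing is considered (sort -V is the GNU version sort):
wget -q -O - ftp://ftp.mcafee.com/pub/antivirus/datfiles/4.x/ | grep -o 'avvdat-[0-9]*\.tar' | sort -V | tail -n 1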
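
The filter-sort-tail idea can be wrapped in a small function so that it is easy to try on a saved listing before pointing it at the server. This is only a sketch: latest_dat is a made-up name, and it assumes the files are always called avvdat-NNNN.tar, as in the output shown in the question.

```shell
# Hypothetical helper: reads a directory listing (HTML) on stdin and prints
# the avvdat-NNNN.tar name with the highest number.
latest_dat() {
    grep -o 'avvdat-[0-9]*\.tar' |   # keep only the bare file names
        sort -t- -k2,2n |            # numeric sort on the part after "avvdat-"
        tail -n 1                    # the last line is the newest file
}

# Assumed usage, with the listing URL from the question:
#   base=ftp://ftp.mcafee.com/pub/antivirus/datfiles/4.x/
#   file=$(wget -q -O - "$base" | latest_dat)
#   [ -n "$file" ] && wget "$base$file"
```

The numeric sort key is a design choice: a plain sort would put avvdat-10000.tar before avvdat-9999.tar once the number gains a digit.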

Related

Linux curl : no url found (or) curl: malformed url

I am installing Docker on my Linux VM and have to run the command below as one of the steps. Even though the command includes a URL, and I tried changing -o to -O, I still get the errors in the title. What should I do?
This is the command I'm running:
sudo curl -L $(curl -L https://api.github.com/repos/docker/compose/releases/latest | grep "browser_download_url" | grep "$(uname -s)-$(uname -m)\"" | sed -nr 's/\s+"browser_download_url":\s+"(https.*)"/\1/p') -o /usr/local/bin/docker-compose

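uname -s prints Linux with an upper-case L, so if the release asset names use a lower-case linux, the grep that filters by system matches nothing. The command substitution is then empty and curl is left without a URL, which may be the cause of your errors. Try making that grep case-insensitive with -i: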
sudo curl -L $(curl -L https://api.github.com/repos/docker/compose/releases/latest | grep "browser_download_url" | grep -i "$(uname -s)-$(uname -m)\"" | sed -nr 's/\s+"browser_download_url":\s+"(https.*)"/\1/p') -o /usr/local/bin/docker-compose
Hope this helps!
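
To see whether the inner pipeline is what fails, it may help to split the lookup from the download and check that a URL was actually found. The following is a sketch under assumptions: pick_asset_url is a made-up helper name, and the sed expression assumes the JSON keeps each "browser_download_url" on its own line, as the original command does.

```shell
# Hypothetical helper: read the GitHub release JSON on stdin and print the
# first download URL whose name ends in "<system>-<machine>", ignoring case.
pick_asset_url() {
    grep '"browser_download_url"' |
        grep -i "$1-$2\"" |
        sed -n 's/.*"browser_download_url": *"\(https[^"]*\)".*/\1/p' |
        head -n 1
}

# Assumed usage:
#   url=$(curl -sL https://api.github.com/repos/docker/compose/releases/latest |
#         pick_asset_url "$(uname -s)" "$(uname -m)")
#   if [ -z "$url" ]; then
#       echo "no matching release asset found" >&2
#   else
#       sudo curl -L "$url" -o /usr/local/bin/docker-compose
#   fi
```

With the empty-string check in place, a failed match produces a readable message instead of curl's "no URL" error.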

How to store output for every xargs instance separately

cat domains.txt | xargs -P10 -I % ffuf -u %/FUZZ -w wordlist.txt -o output.json
ffuf is used for directory and file brute-forcing, and domains.txt contains valid HTTP and HTTPS URLs such as http://example.com and http://example2.com. I used xargs to speed up the process by running 10 parallel instances. The problem is that I cannot store the output of each instance separately: output.json is overwritten by every running instance. Is there a way to make output.json unique for every instance so that all the data is saved separately? I tried ffuf/$(date '+%s').json instead, but it didn't work either.

Sure. Just name your output file using the domain. E.g.:
xargs -P10 -I % ffuf -u %/FUZZ -w wordlist.txt -o output-%.json < domains.txt
(I dropped cat because it was unnecessary.)
I missed the fact that your domains.txt file is actually a list of URLs rather than a list of domain names. I think the easiest fix is just to simplify domains.txt to be just domain names, but you could also try something like:
xargs -P10 -I % sh -c 'domain="%"; ffuf -u %/FUZZ -w wordlist.txt -o output-${domain##*/}.json' < domains.txt

cat domains.txt | xargs -P10 -I % sh -c "ping % > output.json.%"
Like this, your "%" can be part of the file name. (I changed your command to ping for my testing.)
So maybe something more like this:
cat domains.txt | xargs -P10 -I % sh -c "ffuf -u %/FUZZ -w wordlist.txt -o output.json.%"
Because the URLs contain characters that are awkward in file names, I would replace your ffuf command with the following script and call it from the xargs command. It replaces the :// in the URL with a dot and then runs the command:
#!/usr/bin/bash
URL=$1
# Replace "://" with "." so the URL can be used in a file name
FILE=$(printf '%s\n' "$URL" | sed 's/:\/\//./g')
ffuf -u "${URL}/FUZZ" -w wordlist.txt -o "output-${FILE}.json"
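
A URL can also contain further slashes, a port or a query string, so a slightly more defensive variant is to replace every character that is not clearly safe. This is a sketch only: url_to_name is a made-up helper, and the set of "safe" characters is an assumption you may want to adjust.

```shell
# Hypothetical helper: turn a URL into a string that is safe in a file name
# by replacing every character outside [A-Za-z0-9._-] with an underscore.
url_to_name() {
    printf '%s\n' "$1" | sed 's/[^A-Za-z0-9._-]/_/g'
}

# Assumed usage with xargs. The URL is passed to sh as "$1" rather than being
# spliced into the script text, and {} is used as the placeholder so that the
# % in the printf format is left alone:
#   xargs -P10 -I {} sh -c '
#       name=$(printf "%s\n" "$1" | sed "s/[^A-Za-z0-9._-]/_/g")
#       ffuf -u "$1/FUZZ" -w wordlist.txt -o "output-$name.json"
#   ' sh {} < domains.txt
```

Note that a name built outside xargs, such as $(date '+%s'), is expanded once by the calling shell before xargs starts, so every instance would receive the same file name.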

Most efficient way to get the latest version of an rpm via web

This is my attempt: use wget to pull down the web page, dig out the latest tar file and run wget again to fetch it. In this example, I'm fetching pip.
wget https://pypi.org/project/pip/#files
wget $(grep tar.gz index.html | head -1 | awk -F= '{print $2}' | sed 's/>//' | sed 's/\"//g')
gunzip -c $(ls | grep tar |tail -1) | tar xvf -
yum install -y python-setuptools
cd $(ls -d */ | grep pip)
python setup.py install
cd ..
I'm sure that there is a better way, perhaps using only one wget or similar.

Do you mean something like this?
wget $(curl -s "https://pypi.org/project/pip/#files"|grep -o 'https://[^"]*tar\.gz')
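
If the page lists more than one .tar.gz link, the command substitution hands all of them to wget. A hedged variant keeps only the first match and checks that something was found; first_targz_url is a made-up name, and whether the first link on the page is the latest release is an assumption carried over from the question.

```shell
# Hypothetical helper: print the first https URL ending in .tar.gz found in
# the HTML read from stdin.
first_targz_url() {
    grep -o 'https://[^"]*\.tar\.gz' | head -n 1
}

# Assumed usage:
#   url=$(curl -s "https://pypi.org/project/pip/" | first_targz_url)
#   [ -n "$url" ] && wget "$url" && tar xzf "${url##*/}"
```

Here ${url##*/} strips everything up to the last slash, leaving the downloaded file name, and tar xzf replaces the gunzip -c ... | tar xvf - pair.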

remove duplicate lines in wget output

I want to remove duplicate lines from wget output.
I use this code:
wget -q "http://www.sawfirst.com/selena-gomez" -O -|tr ">" "\n"|grep 'selena-gomez-'|cut -d\" -f2|cut -d\# -f1|while read url;do wget -q "$url" -O -|tr ">" "\n"|grep 'name=.*content=.*jpg'|cut -d\' -f4|sort |uniq;done
And the output looks like this:
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/02/Selena-Gomez-760.jpg
http://www.sawfirst.com/wp-content/uploads/2018/02/Selena-Gomez-760.jpg
I want to remove duplicate lines of output.

Better to try:
mech-dump --images "http://www.sawfirst.com/selena-gomez" |
grep -i '\.jpg$' |
sort -u
mech-dump is in the libwww-mechanize-perl package on Debian and derivatives.
Output:
http://www.sawfirst.com/wp-content/uploads/2018/03/Selena-Gomez-12.jpg
http://www.sawfirst.com/wp-content/uploads/2018/02/Selena-Gomez-760.jpg
http://www.sawfirst.com/wp-content/uploads/2018/02/Selena-Gomez-404.jpg
...

In some cases, tools like Beautiful Soup are more appropriate.
Doing this with only wget and grep is an interesting exercise. This is my naive attempt, but I am sure there are better ways of doing it:
wget -q "http://www.sawfirst.com/selena-gomez" -O - |
    grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" |
    grep -i "selena-gomez" |
    while read -r url; do
        if [[ $url == *jpg ]]
        then
            echo "$url"
        else
            wget -q "$url" -O - |
                grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" |
                grep -i "selena-gomez" |
                grep "\.jpg$" &
        fi
    done | sort -u > selena-gomez
In the first round:
wget -q "http://www.sawfirst.com/selena-gomez" -O -|
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" |
grep -i "selena-gomez"
This extracts the URLs matching the desired name. In the while loop, $url may already end in .jpg, in which case it is just printed instead of being fetched again.
This approach only goes one level deep, and to try to speed things up it puts & at the end of the inner pipeline, with the intention of making multiple requests in parallel:
grep "\.jpg$" &
Because the background jobs write into the same pipe, sort only sees end-of-input once all of them have finished, so no explicit wait should be needed here.
It ends with sort -u to return a unique list of the items found.
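
The duplicates in the question appear to come from running sort | uniq inside the loop, where it only ever sees one page at a time. A small offline sketch shows the effect of moving the deduplication after done; fake_fetch and its page names are made up and merely stand in for the wget pipeline.

```shell
# Stand-in for `wget -q "$url" -O - | ...`: prints the image names found on a
# page. The page names and image names are invented for this demo.
fake_fetch() {
    case "$1" in
        page1) printf '%s\n' img-12.jpg img-12.jpg img-760.jpg ;;
        page2) printf '%s\n' img-12.jpg img-404.jpg ;;
    esac
}

# Deduplicate once, over the combined output of every iteration.
dedupe_all_pages() {
    for page in "$@"; do
        fake_fetch "$page"
    done | sort -u
}

# Assumed usage:
#   dedupe_all_pages page1 page2
```

Applied to the original one-liner, this would mean replacing the trailing |sort |uniq;done with ;done | sort -u.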

Need response time and download time for the URLs and write shell scripts for same

I have used this command to get the response time:
curl -s -w "%{time_total}\n" -o /dev/null https://www.ziffi.com/suggestadoc/js/ds.ziffi.https.v308.js
I also need the download time of the JS file linked below, so I used wget to download it, but I get output with many fields. I just need the download time from it:
$ wget --output-document=/dev/null https://www.ziffi.com/suggestadoc/js/ds.ziffi.https.v307.js
Please suggest.

I think what you are looking for is this:
wget --output-document=/dev/null https://www.ziffi.com/suggestadoc/js/ds.ziffi.https.v307.js 2>&1 >/dev/null | grep = | awk '{print $5}' | sed 's/^.*\=//'
Explanation:
2>&1 >/dev/null | --> makes sure stderr gets piped instead of stdout
grep = --> selects the line that contains the '=' symbol
awk '{print $5}' --> prints the fifth whitespace-separated field of that line
sed 's/^.*\=//' --> deletes everything from the start of the line up to the = symbol
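
Since the question already uses curl's --write-out option, both numbers could also come from a single curl call, which avoids parsing wget's progress output. This is a sketch: time_url is a made-up name, and treating time_starttransfer (time until the first byte arrives) as the "response time" is an assumption about what the asker wants.

```shell
# Hypothetical helper: print "time-to-first-byte total-time" in seconds for
# a URL, discarding the body. Both names are curl --write-out variables.
time_url() {
    curl -s -o /dev/null -w '%{time_starttransfer} %{time_total}\n' "$1"
}

# Assumed usage, with the URL from the question:
#   time_url https://www.ziffi.com/suggestadoc/js/ds.ziffi.https.v307.js
```

In a script, the two fields can be read back with something like read -r first_byte total <<EOF ... EOF or split with awk.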
