wget --accept files containing pattern - string

I am trying to write a script that downloads all the files linked in a certain page. These files must contain specific strings and have certain extensions.
Let's say that I want to download all files that contain the string "1080" or "1080p" etc. and that have as extension ".mov" ".avi" ".wmv" etc. This was to show that both the strings and the extensions are multiple.
This is what I have done so far:
wget -Amov -r -np -nc -l1 --no-check-certificate -e robots=off http://www.example.com
Any help is really apreciated.
Thank you.

You can add a pattern for the -A switch, like this:
wget -A "*1080*mov" -r -np -nc -l1 --no-check-certificate -e robots=off http://www.example.com
This example will get all files with "1080", except gif & png files:
wget -A "*1080*" -R gif,png -r -np -nc -l1 --no-check-certificate -e robots=off http://www.example.com

Related

How to download multiple files using wget in linux and save them with custom name for each downloaded file

How to download multiple files using wget. Lets say i have a urls.txt containing several urls and i want to save them automatically with a custom filename for each file. How to do this?
I tried downloading 1 by 1 with this format "wget -c url1 -O filename1" successfully and now i want to try do batch download.
You might take look at xargs command, you would need to prepare file with arguments for each wget call, lets say it is named download.txt and
-O file1.html https://www.example.com
-O file2.html https://www.duckduckgo.com
and then use it as follows
cat download.txt | xargs wget -c
which is equivalent to doing
wget -c -O file1.html https://www.example.com
wget -c -O file2.html https://www.duckduckgo.com
Add the input file and a loop:
for i in `cat urls.txt`; do wget $i -O filename-$i; done

how to recursively fetch some data with pattern using wget

I am trying to download some specific files from this website (http://nomads.ncep.noaa.gov/pub/data/nccf/com/hourly/prod/), they keep 10 days data. I want to download all the files starting with "ST4" from all the directories starting with "nam_pcpn_anal". I could download all the files staring with "ST4" from one folder like :
wget -r -nd -N --no-parent -nH --cut-dirs=100 -P ~/test/ -A ST4* 'http://nomads.ncep.noaa.gov/pub/data/nccf/com/hourly/prod/nam_pcpn_anal.20160625/'
but I do not know how to search ST4 recursively. I thought the following should work but nope!
wget -r -nd -N --no-parent -nH --cut-dirs=100 -P ~/test/ -A ST4* --accept nam_pcpn_anal*/ST4* 'http://nomads.ncep.noaa.gov/pub/data/nccf/com/hourly/prod/'
Any idea!
The wget manual shows:
-I list
--include-directories=list
Specify a comma-separated list of directories you wish to follow
when downloading. Elements of list may contain wildcards.
So, you could try:
wget -r -nd -N --no-parent -nH --cut-dirs=100 -P ~/test/ \
-A 'ST4*' -I '*/nam_pcpn_anal.*' \
'http://nomads.ncep.noaa.gov/pub/data/nccf/com/hourly/prod/'

Using wget on Linux to scan sub-folders for specific files

Using wget -r -P Home -A jpg http://example.com will result me a list of files from that website directory, what i'm searching for is how do i query a search like: wget -r -P home -A jpg http://example.com/from 65121 to 75121/ file_ 100 to 200.jpg
Example(s):
wget -r -P home -A jpg http://example.com/65122/file_102.jpg
wget -r -P home -A jpg http://example.com/65123/file_103.jpg
wget -r -P home -A jpg http://example.com/65124/file_104.jpg
Is it possible to achieve that on a Linux distro?
I'm fairly new to Linux OS, any tips are welcome.
Use a nested for loop and some bash scripting:
for i in {65121..75121}; do for j in {100..200}; do wget -r -P home -A jpg "http://example.com/${i}/file_${j}.jpg"; done; done
Wget has loop
wget -nd -H -p -A file_{100..200}.jpg -e robots=off http://example.com/{65121..75121}/
If there are only file_{100..200}.jpg It's simpler
wget -nd -H -p -A jpg -e robots=off http://example.com/{65121..75121}/

wget -O for non-existing save path?

I can't wget while there is no path already to save. I mean, wget doens't work for the non-existing save paths. For e.g:
wget -O /path/to/image/new_image.jpg http://www.example.com/old_image.jpg
If /path/to/image/ is not previously existed, it always returns:
No such file or directory
How can i make it work to automatically create the path and save?
Try curl
curl http://www.site.org/image.jpg --create-dirs -o /path/to/save/images.jpg
mkdir -p /path/i/want && wget -O /path/i/want/image.jpg http://www.com/image.jpg
To download a file with wget, into a new directory, use --directory-prefix without -O:
wget --directory-prefix=/new/directory/ http://www.example.com/old_image.jpg
Using -O new_file in conjunction with --directory-prefix, will not create the new directory structure, and will save the new file in the current directory.
It may even fail with "No such file or directory" error, if you specify -O /new/directory/new_file
I was able to create folder if it doesn't exists with this command:
wget -N http://www.example.com/old_image.jpg -P /path/to/image
wget is only getting a file NOT creating the directory structure for you (mkdir -p /path/to/image/), you have to do this by urself:
mkdir -p /path/to/image/ && wget -O /path/to/image/new_image.jpg http://www.example.com/old_image.jpg
You can tell wget to create the directory (so you dont have to use mkdir) with the parameter --force-directories
alltogether this would be
wget --force-directories -O /path/to/image/new_image.jpg http://www.example.com/old_image.jpg
After searching a lot, I finally found a way to use wget to download for non-existing path.
wget -q --show-progress -c -nc -r -nH -i "$1"
=====
Clarification
-q
--quiet --show-progress
Kill annoying output but keep the progress-bar
-c
--continue
Resume download if the connection lost
-nc
--no-clobber
Overwriting file if exists
-r
--recursive
Download in recursive mode (What topic creator asked for!)
-nH
--no-host-directories
Tell wget do not use the domain as a directory (for e.g: https://example.com/what/you/need
- without this option, it will download to "example.com/what/you/need")
-i
--input-file
File with URLs need to be download (in case you want to download a lot of URLs,
otherwise just remove this option)
Happy wget-ing!

Wget Output in Recursive mode

I am using wget -r to download 3 .zip files from a specified webpage. Here is what I have so far:
wget -r -nd -l1 -A.zip http://www.website.com/example
Right now, the zip files all begin with abc_*.zip where * seems to be a random. I want to have the first downloaded file to be called xyz_1.zip, the second to be xyz_2.zip, and the third to be xyz_3.zip.
Is this possible with wget?
Many thanks!
I don't think it's possible with wget alone. After downloading you could use some simple shell scripting to rename the files, like:
i=1; for f in abc_*.zip; do mv "$f" "xyz_$i.zip"; i=$(($i+1)); done
Try to get a listing first and then download each file separately.
let n=1
wget -nv -l1 -r --spider http://www.website.com/example 2>&1 | \
egrep -io 'http://.*\.zip'| \
while read url; do
wget -nd -nv -O $(echo $url|sed 's%^.*/\(.*\)_.*$%\1%')_$n.zip "$url"
let n++
done
I don't think there is a way you can do it within a single wget command.
wget does have a -O option which you can use to tell it which file to output to, but it won't work in your case because multiple files will get concatenated together.
You will have to write a script which renames the files from abc_*.zip to xyz_*.zip after wget has completed.
Alternatively, invoke wget for one zip file at a time and use the -O option.

Resources