How to download multiple links into a folder via wget in Linux

When I want to download a file in a folder in Linux via wget I use the following:
wget -P /path/to/folder/ http://example.com/file.zip
Now let's say I want to download several files into the same folder and my urls are:
http://example.com/file1.zip
http://example.com/file2.zip
http://example.com/file3.zip
How can I achieve this with one command, into the same folder /path/to/folder/?
Thank you.

You can just append more URLs to your command:
wget -P /path/to/folder/ http://example.com/file1.zip http://example.com/file2.zip http://example.com/file3.zip
If they only differ by number, you can have bash brace expansion automatically generate all the arguments before wget runs:
wget -P /path/to/folder/ http://example.com/file{1..3}.zip
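If you want to see exactly what the shell will pass to wget, you can preview the expansion with echo first:
echo wget -P /path/to/folder/ http://example.com/file{1..3}.zip
That prints the fully expanded command line, i.e. the three separate URLs.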
You can tell that this is possible from the invocation synopsis in man wget, where the ... is a convention meaning that the command accepts multiple URLs at a time:
SYNOPSIS
wget [option]... [URL]...

Related

How to specify file names for the multiple files we are downloading using the wget -i command in Linux?

We can download multiple links using wget -i file_name where file_name is the file that contains all URLs we have to download.
I have 3 URLs in a file for example:
google.com
facebook.com
twitter.com
I request these URLs using wget -i file_name. But how can we specify file names to store the results?
For example, we have to store the results from google.com, facebook.com, and twitter.com as response1, response2, and response3 respectively. Thanks in advance.
I found a similar question here.
Use the -O file option.
E.g.
wget google.com
...
16:07:52 (538.47 MB/s) - `index.html' saved [10728]
vs.
wget -O foo.html google.com
...
16:08:00 (1.57 MB/s) - `foo.html' saved [10728]
Referring to the above, I came up with a solution: write a simple shell script.
It's simply like executing wget <URL> -O <filename> multiple times.
Create a file download_file.sh with contents like this
#!/bin/bash
wget https://www.google.com -O google_file
wget https://www.facebook.com -O fb_file
wget https://www.twitter.com -O twitter_file
Make the file executable
chmod +x download_file.sh
Run the file
./download_file.sh
All URLs will be downloaded with the file names defined in download_file.sh. You can also tweak the shell script to your requirements, for example by providing the URLs from another file as an argument to the script.
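As a rough sketch of that last idea (the file name urls.txt and its two-column "URL output-name" layout are my own assumptions, not part of the question), the script could read the pairs from a file instead of hard-coding them:
#!/bin/bash
# urls.txt is assumed to hold one "URL output_name" pair per line, e.g.:
#   https://www.google.com google_file
while read -r url name; do
    wget "$url" -O "$name"
done < urls.txt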

How can I download all the files from a remote directory to my local directory?

I want to download all the files in a specific directory of my site.
Let's say I have 3 files in my remote SFTP directory
www.site.com/files/phone/2017-09-19-20-39-15
a.txt
b.txt
c.txt
My goal is to create a local folder on my desktop with ONLY those downloaded files. No parent files or parent directories needed. I am trying to get a clean report.
I've tried
wget -m --no-parent -l1 -nH -P ~/Desktop/phone/ www.site.com/files/phone/2017-09-19-20-39-15 --reject=index.html* -e robots=off
I got a nested tree of parent directories instead of just the files.
I want to get only a.txt, b.txt, and c.txt in a flat local folder.
How do I tweak my wget command to get something like that?
Should I use anything else other than wget ?
Ihue,
Taking a shell-programmatic perspective, I would recommend you try the following command line; note that I also added the citation so you can see the original thread.
wget -r -P ~/Desktop/phone/ -A txt www.site.com/files/phone/2017-09-19-20-39-15 --reject=index.html* -e robots=off
-r enables recursive retrieval. See Recursive Download for more information.
-P sets the directory prefix where all files and directories are saved to.
-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma separated list. See Types of Files for more information.
Ref: #don-joey
https://askubuntu.com/questions/373047/i-used-wget-to-download-html-files-where-are-the-images-in-the-file-stored
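Since the goal is a folder with only the files and no parent directories, a variant worth trying (an untested sketch built from the same command) adds -nd (--no-directories) so wget does not recreate the remote tree, plus -np and -l1 to keep the recursion from wandering:
wget -r -l1 -np -nd -A txt -P ~/Desktop/phone/ www.site.com/files/phone/2017-09-19-20-39-15 --reject=index.html* -e robots=off
With -nd, a.txt, b.txt, and c.txt should land directly in ~/Desktop/phone/ with no intermediate directories.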

Using wget to download select directories from ftp server

I'm trying to understand how to use wget to download specific directories from a bunch of different ftp sites with economic data from the US government.
As a simple example, I know that I can download an entire directory using a command like:
wget --timestamping --recursive --no-parent ftp://ftp.bls.gov/pub/special.requests/cew/2013/county/
But I envision running more complex downloads, where I might want to limit a download to a handful of directories. So I've been looking at the --include option. But I don't really understand how it works. Specifically, why doesn't this work:
wget --timestamping --recursive -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
The following does work, in the sense that it downloads files, but it downloads way more than I need (everything in the 2013 directory, vs just the county subdirectory):
wget --timestamping --recursive -I /pub/special.requests/cew/2013/ ftp://ftp.bls.gov/pub/special.requests/cew/
I can't tell if I'm not understanding something about wget or if my issue is with something more fundamental to FTP server structures.
Thanks for the help!
Based on this doc it seems that the filtering functions of wget are very limited.
When using the --recursive option, wget will download all linked documents after applying the various filters, such as --no-parent and -I, -X, -A, -R options.
In your example:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
This won't download anything, because the -I option specifies to include only links matching /pub/special.requests/cew/2013/county/, but on the page /pub/special.requests/cew/ there are no such links, so the download stops there. This will work though:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/
... because in this case the /pub/special.requests/cew/2013/ page does have a link to county/
Btw, you can find more details in this doc than on the man page:
http://www.gnu.org/software/wget/manual/html_node/
Can't you simply do the following (and add --timestamping/--no-parent etc. as needed)?
wget -r ftp://ftp.bls.gov/pub/special.requests/cew/2013/county
-I seems to work one directory level at a time, so if we step one level up from county/ we could do:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/
But apparently we can't step any further up and do:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
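Because -I can't make wget traverse intermediate directories it isn't allowed to fetch from, one pragmatic workaround for "a handful of directories" is a plain shell loop over the subdirectories you actually want (county/ is from the question; industry/ is just a hypothetical second entry):
# loop over the specific subdirectories instead of filtering with -I
for d in county industry; do
    wget --timestamping --recursive --no-parent "ftp://ftp.bls.gov/pub/special.requests/cew/2013/$d/"
done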

Download files using Shell script

I want to download a number of files which are as follows:
http://example.com/directory/file1.txt
http://example.com/directory/file2.txt
http://example.com/directory/file3.txt
http://example.com/directory/file4.txt
.
.
http://example.com/directory/file199.txt
http://example.com/directory/file200.txt
Can anyone help me with this using shell scripting? Here is what I'm using, but it downloads only the first file.
for i in {1..200}
do
exec wget http://example.com/directory/file$i.txt;
done
wget http://example.com/directory/file{1..200}.txt
should do it. That expands to wget http://example.com/directory/file1.txt http://example.com/directory/file2.txt ....
Alternatively, your current code should work fine if you remove the call to exec, which is unnecessary and doesn't do what you seem to think it does.
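For reference, the fixed loop (just dropping exec) would look like this; it is equivalent to the brace-expansion one-liner, except that it starts one wget process per file:
#!/bin/bash
for i in {1..200}; do
    wget "http://example.com/directory/file$i.txt"
done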
To download a list of files you can use wget -i <file>, where <file> is a file containing the list of URLs to download.
For more details you can review the help page: man wget
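If you prefer that route, one way to generate the list for the numbered files above is with seq (the file name urls.txt is arbitrary):
# writes http://example.com/directory/file1.txt ... file200.txt, one URL per line
seq -f "http://example.com/directory/file%g.txt" 1 200 > urls.txt
wget -i urls.txt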

wget .listing file, is there a way to specify the name of it

OK, so I need to run wget, but I'm prohibited from creating 'dot' files in the location where I need to run it. So my question is: can I get wget to use a name other than .listing, one that I can specify?
Further clarification: this is to sync/mirror an FTP folder with a local one, so using the -O option is not really useful, as I require all files to keep their original names.
You can use the -O option to set the output filename, as in:
wget -O file http://stackoverflow.com
You can also use wget --help to get a complete list of options.
For folks who come along afterwards and are surprised by an answer to the wrong question, here is a copy of one of the comments below:
@FelixD, yes, I unfortunately misunderstood the question. Looking at the code for wget version 1.19 (Feb 2017), specifically ftp.c, it appears that the .listing file name is hardcoded in the macro LIST_FILENAME, with no override possible. There are probably better options for mirroring FTP sites - maybe take a look at lftp and its mirror command, which also includes parallel downloads: lftp.yar.ru
@Paul: You can use the -O option specified by spong
No. You can't do this.
wget/src/ftp.c
/* File where the "ls -al" listing will be saved. */
#ifdef MSDOS
#define LIST_FILENAME "_listing"
#else
#define LIST_FILENAME ".listing"
#endif
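Since the name is hardcoded, the lftp suggestion from the comment above is the practical alternative for mirroring; a minimal sketch (host, credentials, and paths are placeholders):
# mirror a remote FTP directory into a local one, 4 transfers in parallel
lftp -e 'mirror --parallel=4 /remote/dir /local/dir; quit' ftp://user:password@example.com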
I have the same problem:
wget seems to save the .listing file in the current directory where wget was called from, regardless of -O path/output_file.
As an ugly/desperate solution we can try to run wget from random directories:
cd /temp/random_1; wget ftp://example.com/ -O /full/save_path/to_file_1.txt
cd /temp/random_2; wget ftp://example.com/ -O /full/save_path/to_file_2.txt
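A slightly tidier take on the same workaround uses mktemp and a subshell, so the throwaway directory is created for you and your own working directory is left untouched (a sketch, with the same placeholder URL and path as above):
tmpdir=$(mktemp -d)
# the stray .listing file ends up in $tmpdir and is deleted with it
( cd "$tmpdir" && wget ftp://example.com/ -O /full/save_path/to_file_1.txt )
rm -rf "$tmpdir"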
Note: the manual says that using the --no-remove-listing option will cause it to create .listing.1, .listing.2, etc., so that might be an option to avoid conflicts.
Note: the .listing file is not created at all if the FTP login fails.
