Downloading all zip files from a website - web

I want to download all the zip files from the link:
https://s3.amazonaws.com/tripdata/index.html
I tried the following command, but it did not work; I could download only one file.
wget -r -np -l 1 -A zip https://s3.amazonaws.com/tripdata/index.html
I would really appreciate a way to download all zip files from a webpage using wget or R.
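One likely reason (an assumption, since the page isn't inspected here) is that index.html builds its file list with JavaScript, so the HTML wget downloads contains no direct .zip links to follow. A hedged workaround is to read the bucket's plain XML listing at https://s3.amazonaws.com/tripdata, extract the .zip keys, and fetch them one by one:

# Sketch: pull .zip keys out of the S3 bucket listing and download each one.
# Note: the plain listing returns at most 1,000 keys per request.
wget -qO- "https://s3.amazonaws.com/tripdata" \
  | grep -oE '[^<>]+\.zip' \
  | sort -u \
  | while read -r key; do
      wget "https://s3.amazonaws.com/tripdata/$key"
    done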

Related

unzip file into same directory in linux

Example:
Here's the list of files in "/tmp/test_dir":
file1
zip -r Test_Files.zip *
When I unzip Test_Files.zip I get the output below:
Current working directory "/tmp/test_dir"
/tmp/test_dir/file1
What I'm expecting when I unzip Test_Files.zip:
/tmp/test_dir/Test_Files/file1
Can anyone help me get the expected result mentioned above?
Use unzip. You can use -o to overwrite existing files and -q to keep it quiet. In doubt, just open a terminal and run unzip with no arguments (or try /usr/bin/unzip) to see its help text.
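For the directory layout you expect (everything landing under Test_Files/), unzip's -d option extracts into a named directory; a minimal sketch combining it with the flags above:

unzip -o -q Test_Files.zip -d Test_Files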

wget for selected samples from ftp?

I wanted to download selected files from this site:
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP042/SRP042286
If I wanted to download all of them I could do:
wget -r ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP042/SRP042286/*
But my question is: what is a simple way to download just selected samples, such as SRR1299458/ through SRR1299466/?
You can use wget -r ftp://example.com/path/{dir1,dir2}:
wget -r ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP042/SRP042286/{SRR1299458,SRR1299466}
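If the goal is the whole numeric range rather than just those two directories, bash's sequence brace expansion can generate it (a sketch; it assumes every directory in the range actually exists on the server):

wget -r ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP042/SRP042286/SRR{1299458..1299466}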

Issue with wget, not downloading PDF even with -A.pdf

Very new to this... apologies for mistakes/stupidity in advance.
Trying to use wget on a Mac to download PDFs from a list. I have a text file of DOIs (e.g. 10.1046/j.1365-294X.2001.01258.x); essentially I want to enter these DOIs into sci-hub.io and download the corresponding PDFs. I have appended the DOIs to form legitimate web addresses (e.g. http://sci-hub.io/10.1046/j.1365-294X.2001.01258.x). These work when entered manually into Chrome, but not with wget to automate the process.
Tried:
wget -i file.txt
wget -A.pdf -i file.txt
wget -A.pdf -erobots=off -i file.txt
All of these still return files containing HTML instead of the PDFs.
Any suggestions greatly appreciated.
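One possible explanation (an assumption, since the downloaded files aren't shown here): each URL returns an HTML page that links to the PDF rather than the PDF itself, and -A.pdf only filters which files wget keeps during recursive retrieval; without -r it does nothing for a plain download. A hedged sketch that recurses one level from each listed page and keeps only PDFs, assuming the page contains a plain link to a .pdf file:

wget -r -l 1 -A.pdf -e robots=off -i file.txt

If the PDF sits on a different host than the page, you may also need -H together with --domains to allow wget to follow that link.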

Using wget to download select directories from ftp server

I'm trying to understand how to use wget to download specific directories from a bunch of different ftp sites with economic data from the US government.
As a simple example, I know that I can download an entire directory using a command like:
wget --timestamping --recursive --no-parent ftp://ftp.bls.gov/pub/special.requests/cew/2013/county/
But I envision running more complex downloads, where I might want to limit a download to a handful of directories. So I've been looking at the --include option. But I don't really understand how it works. Specifically, why doesn't this work:
wget --timestamping --recursive -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
The following does work, in the sense that it downloads files, but it downloads way more than I need (everything in the 2013 directory, vs just the county subdirectory):
wget --timestamping --recursive -I /pub/special.requests/cew/2013/ ftp://ftp.bls.gov/pub/special.requests/cew/
I can't tell if I'm not understanding something about wget or if my issue is with something more fundamental to FTP server structures.
Thanks for the help!
Based on this doc it seems that the filtering functions of wget are very limited.
When using the --recursive option, wget will download all linked documents after applying the various filters, such as --no-parent and -I, -X, -A, -R options.
In your example:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
This won't download anything, because the -I option specifies to include only links matching /pub/special.requests/cew/2013/county/, but on the page /pub/special.requests/cew/ there are no such links, so the download stops there. This will work though:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/
... because in this case the /pub/special.requests/cew/2013/ page does have a link to county/
Btw, you can find more details in this doc than on the man page:
http://www.gnu.org/software/wget/manual/html_node/
Can't you simply do (adding --timestamping/--no-parent etc. as needed):
wget -r ftp://ftp.bls.gov/pub/special.requests/cew/2013/county
-I seems to work one directory level at a time, so if we step one level up from county/ we can do:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/
But apparently we can't step further up and do:
wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
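What does help at that level (a sketch; "area" below is a hypothetical sibling directory name, not taken from the actual server): -I accepts a comma-separated list, so several target subdirectories can be pulled in one run, as long as you start from the page that links to them:

wget -r --no-parent -I /pub/special.requests/cew/2013/county,/pub/special.requests/cew/2013/area ftp://ftp.bls.gov/pub/special.requests/cew/2013/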

Download files using Shell script

I want to download a number of files which are as follows:
http://example.com/directory/file1.txt
http://example.com/directory/file2.txt
http://example.com/directory/file3.txt
http://example.com/directory/file4.txt
.
.
http://example.com/directory/file199.txt
http://example.com/directory/file200.txt
Can anyone help me with this using shell scripting? Here is what I'm using, but it downloads only the first file.
for i in {1..200}
do
exec wget http://example.com/directory/file$i.txt;
done
wget http://example.com/directory/file{1..200}.txt
should do it. That expands to wget http://example.com/directory/file1.txt http://example.com/directory/file2.txt ....
Alternatively, your current code should work fine if you remove the call to exec, which is unnecessary and doesn't do what you seem to think it does.
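Since exec replaces the running shell with wget, the script ends after the first download and never reaches the second iteration. For reference, the loop with exec removed:

for i in {1..200}
do
  wget "http://example.com/directory/file$i.txt"
done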
To download a list of files you can use wget -i <file>, where <file> is a file containing the list of URLs to download.
For more details you can review the help page: man wget
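A minimal sketch of that approach for this case (the filename urls.txt is just an illustrative choice):

for i in {1..200}; do echo "http://example.com/directory/file$i.txt"; done > urls.txt
wget -i urls.txt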
