Wget: simplify asset filenames when downloading a web page

I am downloading a website with wget with the following command line:
wget -E -H -k -K -p --no-directories --content-disposition https://example.com/hello.index
When I've downloaded the assets, this is what the file directory looks like:
index.html.orig
index.html?url=http%3A%2F%2Fmedia.engadget.com%2Fimg%2Fproducts%2F503%2Fasu0%2Fasu0.jpg
index.html?url=http%3A%2F%2Fmedia.engadget.com%2Fimg%2Fproducts%2F536%2Fbi6i%2Fbi6i.jpg
index.html?url=http%3A%2F%2Fmedia.engadget.com%2Fimg%2Fproducts%2F547%2Fbqt8%2Fbqt8.jpg
...
Is there a way that I can instruct wget to download the website such that the names of the files will be:
index.html.orig
asu0.jpg
bi6i.jpg
bqt8.jpg
...
and the index.html file be named appropriately?
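As far as I know, wget itself has no option to derive the output filename from a URL embedded in the query string, so a small post-download rename pass is one workaround. A minimal bash sketch, assuming the downloaded filenames look exactly like the listing above:
for f in *'?url='*; do
  encoded=${f#*'?url='}                        # keep only the part after "?url="
  decoded=$(printf '%b' "${encoded//%/\\x}")   # percent-decode (%2F -> /, etc.)
  mv -- "$f" "$(basename "$decoded")"          # e.g. -> asu0.jpg
done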

Related

wget alternative in curl

I am using wget on a Linux system like this:
wget -r -nH --cut-dirs=3 --user="username" --password="password" --no-parent https://artifactory.company.com/nestetedlink1/nested_link2/directory_to_download -P home/user
Using this, I get only the files saved in the path, with no unwanted directory components from the URL: home/user/file_1, file_2
Is it possible to get the same functionality using the curl command?
If I have to stick with wget, how do I pass the API key? I tried the command below with --header='X-Auth-Token: ...'; am I doing something wrong?
wget -r -nH --cut-dirs=3 --header='X-Auth-Token: <api_key>' --no-parent https://artifactory.company.com/nestetedlink1/nested_link2/directory_to_download -P home/user
How do I insert an API key instead of a username and password in wget?
wget --header='X-JFrog-Art-Api:api-key' https://artifactoryurl/filename.gz
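For the curl part of the question: curl has no recursive-download mode equivalent to wget -r, but a single file can be fetched with the same header. A minimal sketch, reusing the placeholder URL and header name from above; -O keeps the remote filename:
curl -H 'X-JFrog-Art-Api: <api_key>' -O https://artifactoryurl/filename.gz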

Multiple downloads with wget at the same time

I have a links.txt with multiple links for download; all are protected by the same username and password.
My intention is to download multiple files at the same time: if the file contains 5 links, download all 5 files at the same time.
I've tried this, but without success.
cat links.txt | xargs -n 1 -P 5 wget --user user007 --password pass147
and
cat links.txt | xargs -n 1 -P 5 wget --user=user007 --password=pass147
Both give me this error:
Reusing existing connection to www.site.com HTTP request sent,
awaiting response... 404 Not Found
This message appears for all the links I try to download, except for the last link in the file, which starts to download.
This is what I currently use, but it downloads just one file at a time:
wget --user=admin --password=145788s -i links.txt
Use wget's -i and -b flags.
-b
--background
Go to background immediately after startup. If no output file is specified via the -o, output is redirected to wget-log.
-i file
--input-file=file
Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. (Use ./- to read from a file literally named -.)
Your command will look like:
wget --user user007 --password "pass147*" -b -i links.txt
Note: You should always quote strings containing special characters (e.g. *).
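Note also that -b backgrounds a single wget process which still fetches the URLs in links.txt one after another. For up to five truly simultaneous downloads, the xargs approach from the question can be combined with the quoting advice above; a sketch using the same placeholder credentials:
# Run up to 5 wget processes at once, one URL per process; quoting the
# password keeps the shell from expanding special characters such as *.
xargs -n 1 -P 5 wget --user=user007 --password='pass147*' < links.txt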

Using wget to download all zip files on an shtml page

I've been trying to download all the zip files on this website to an EC2 server. However, it is not recognizing the links and thus not downloading anything. I think it's because the shtml file requires that SSI be enabled and that's somehow causing a problem with wget. But I don't really understand that stuff.
This is the code I've been using unsuccessfully.
wget -r -l1 -H -t1 -nd -N -np -A.zip -erobots=off http://www.fec.gov/finance/disclosure/ftpdet.shtml#a2015_2016
Thanks for any help you can provide!
The zip links aren't present in the page source, which is why you cannot download them via wget; they're generated via JavaScript. The file list is located inside http://fec.gov//finance/disclosure/tables/foia_files_summary.xml under the node <fec_file status="Archive"></fec_file>.
You can write a script to parse the XML file and convert the nodes into the actual links, because they follow a pattern.
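For example, a rough sketch of that parsing step, assuming xmlstarlet is installed and that the file names are the text of the <fec_file status="Archive"> nodes (the exact XML layout is an assumption; building the final download URLs from these values is left to the pattern mentioned above):
curl -s http://fec.gov//finance/disclosure/tables/foia_files_summary.xml | xmlstarlet sel -t -v '//fec_file[@status="Archive"]' -n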
UPDATE:
As @cyrus mentioned, the files are also on ftp.fec.gov/FEC/; you can use wget -m to mirror the FTP site and -A zip to restrict the download to zip files, e.g.:
wget -A zip -m --user=anonymous --password=test@test.com ftp://ftp.fec.gov/FEC/
Or, with wget -r:
wget -A zip --ftp-user=anonymous --ftp-password=test@test.com -r ftp://ftp.fec.gov/FEC/*

Ubuntu: Using curl to download an image

I want to download an image accessible from this link: https://www.python.org/static/apple-touch-icon-144x144-precomposed.png into my local system. Now, I'm aware that the curl command can be used to download remote files through the terminal. So, I entered the following in my terminal in order to download the image into my local system:
curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
However, this doesn't seem to work, so obviously there is some other way to download images from the Internet using curl. What is the correct way to download images using this command?
curl without any options will perform a GET request. It will simply print the data from the specified URI to standard output; it will not save the file to your local machine.
When you do,
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
You will receive binary data:
|�>�$! <R�HP#T*�Pm�Z��jU֖��ZP+UAUQ#�
��{X\� K���>0c�yF[i�}4�!�V̧�H_�)nO#�;I��vg^_ ��-Hm$$N0.
���%Y[�L�U3�_^9��P�T�0'u8�l�4 ...
In order to save this, you can use:
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png > image.png
to store that raw image data inside of a file.
An easier way, though, is just to use wget:
$ wget https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
$ ls
.
..
apple-touch-icon-144x144-precomposed.png
For those who don't have wget and don't want to install it, curl -O (capital "O", not a zero) will do the same thing as wget. For example, my old netbook doesn't have wget, and it would be a 2.68 MB install that I don't need.
curl -O https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
If you want to keep the original name — use uppercase -O
curl -O https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
If you want to save remote file with a different name — use lowercase -o
curl -o myPic.png https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
Create a new file called files.txt and paste the URLs one per line. Then run the following command.
xargs -n 1 curl -O < files.txt
source: https://www.abeautifulsite.net/downloading-a-list-of-urls-automatically
For those who got a "permission denied" error on the save operation, here is the command that worked for me:
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png --output py.png
Try this:
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png > precomposed.png

How do I download a tarball from GitHub using cURL?

I am trying to download a tarball from GitHub using cURL, but it does not seem to be redirecting:
$ curl --insecure https://github.com/pinard/Pymacs/tarball/v0.24-beta2
<html><body>You are being redirected.</body></html>
Note: wget works for me:
$ wget --no-check-certificate https://github.com/pinard/Pymacs/tarball/v0.24-beta2
However I want to use cURL because ultimately I want to untar it inline with something like:
$ curl --insecure https://github.com/pinard/Pymacs/tarball/v0.24-beta2 | tar zx
I found that the URL after redirecting turned out to be https://download.github.com/pinard-Pymacs-v0.24-beta1-0-gcebc80b.tar.gz, but I would like cURL to be smart enough to figure this out.
Use the -L option to follow redirects:
curl -L https://github.com/pinard/Pymacs/tarball/v0.24-beta2 | tar zx
The modernized way of doing this is:
curl -sL https://github.com/user-or-org/repo/archive/sha1-or-ref.tar.gz | tar xz
Replace user-or-org, repo, and sha1-or-ref accordingly.
If you want a zip file instead of a tarball, specify a .zip suffix instead of .tar.gz.
You can also retrieve the archive of a private repo, by specifying -u token:x-oauth-basic option to curl. Replace token with a personal access token.
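For instance, a minimal sketch of that private-repo case, with token, user-or-org, repo, and sha1-or-ref as placeholders exactly as above:
curl -sL -u token:x-oauth-basic https://github.com/user-or-org/repo/archive/sha1-or-ref.tar.gz | tar xz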
You can also use wget to "untar it inline". Simply specify stdout as the output file (-O -):
wget --no-check-certificate https://github.com/pinard/Pymacs/tarball/v0.24-beta2 -O - | tar xz
All the other solutions require specifying a release/version number, which obviously breaks automation.
This solution, currently tested and known to work with the GitHub API v3, can however be used programmatically to grab the LATEST release without specifying any tag or release number, and it un-tars the archive into an arbitrary directory name you specify with the --one-top-level="pi-ap" switch. Just swap out the user f1linux and repo pi-ap in the example below with your own details and Bob's your uncle:
curl -L https://api.github.com/repos/f1linux/pi-ap/tarball | tar xzvf - --one-top-level="pi-ap" --strip-components 1
With a specific directory:
cd your_dir && curl -L https://download.calibre-ebook.com/3.19.0/calibre-3.19.0-x86_64.txz | tar zx
