wget alternative in curl - linux

I am using wget on a Linux system like this:
wget -r -nH --cut-dirs=3 --user="username" --password="password" --no-parent https://artifactory.company.com/nestetedlink1/nested_link2/directory_to_download -P home/user
Using this, I get only the files saved in the path, with no unwanted directory components from the URL: home/user/file_1, file_2.
Is it possible to get the same functionality using the curl command?
If I have to stick with wget, how do I pass the API key? I tried the command below with --header='X-Auth-Token: <api_key>'; am I doing something wrong?
wget -r -nH --cut-dirs=3 --header='X-Auth-Token: <api_key>' --no-parent https://artifactory.company.com/nestetedlink1/nested_link2/directory_to_download -P home/user
How do I pass an API key instead of a username and password in wget?

wget --header='X-JFrog-Art-Api:api-key' https://artifactoryurl/filename.gz
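For a single file, curl can send the same header; note that curl has no equivalent of wget's recursive download, and the URL and filename are the same placeholders as in the wget example above:
curl -H 'X-JFrog-Art-Api: <api_key>' -O https://artifactoryurl/filename.gz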

wget recursion and file extraction

I'm trying to use wget to elegantly & politely download all the pdfs from a website. The pdfs live in various sub-directories under the starting URL. It appears that the -A pdf option is conflicting with the -r option. But I'm not a wget expert! This command:
wget -nd -np -r site/path
faithfully traverses the entire site downloading everything downstream of path (not polite!). This command:
wget -nd -np -r -A pdf site/path
finishes immediately having downloaded nothing. Running that same command in debug mode:
wget -nd -np -r -A pdf -d site/path
reveals that the sub-directories are ignored with the debug message:
Deciding whether to enqueue "https://site/path/subdir1". https://site/path/subdir1 (subdir1) does not match acc/rej rules. Decided NOT to load it.
I think this means that the subdirectories did not satisfy the "pdf" filter and were excluded. Is there a way to get wget to recurse into subdirectories (of arbitrary depth) and only download PDFs (into a single local dir)? Or does wget need to download everything, leaving me to filter for PDFs afterward?
UPDATE: thanks to everyone for their ideas. The solution was to use a two step approach including a modified version of this: http://mindspill.net/computing/linux-notes/generate-list-of-urls-using-wget/
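For reference, a rough sketch of such a two-step approach, assuming the site permits spidering; this is my reconstruction of the idea, not the linked article's exact commands:
# step 1: crawl without saving anything, logging every URL wget visits
wget -r -np --spider https://site/path -o crawl.log
# step 2: pull the PDF URLs out of the log and fetch them into one directory
grep -o 'https\?://[^ ]*\.pdf' crawl.log | sort -u > pdflinks.txt
wget -nd -P pdfs/ -i pdflinks.txt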
Try this:
The -l switch tells wget to go only one level down from the primary URL specified. You can change that to however many levels of links you want to follow.
wget -r -l1 -A.pdf http://www.example.com/page-with-pdfs.htm
Refer to man wget for more details.
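If the PDFs sit deeper in the site, the same command with the depth raised is worth trying first; this is just a variation on the command above, not part of the original answer:
wget -r -l3 -A.pdf http://www.example.com/page-with-pdfs.htm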
If the above doesn't work, try this:
First verify that the terms of service of the web site permit crawling it. Then one solution is:
mech-dump --links 'http://example.com' |
grep pdf$ |
sed 's/\s\+/%20/g' |
xargs -I% wget http://example.com/%
The mech-dump command comes with Perl's WWW::Mechanize module (the libwww-mechanize-perl package on Debian and Debian-like distros).
To install mech-dump:
sudo apt-get update -y
sudo apt-get install -y libwww-mechanize-shell-perl
github repo https://github.com/libwww-perl/WWW-Mechanize
I haven't tested this, but you can still give it a try. What I think is that you still need a way to get all the URLs of a website and pipe them to one of the solutions I have given.
You will need to have wget and lynx installed:
sudo apt-get install wget lynx
Prepare a script and name it whatever you want; for this example, pdflinkextractor:
#!/bin/bash
WEBSITE="$1"
echo "Getting link list..."
# lynx -dump -listonly prints a numbered list of links; keep only URLs ending in .pdf (column 2) and save them
lynx -cache=0 -dump -listonly "$WEBSITE" | grep ".*\.pdf$" | awk '{print $2}' | tee pdflinks.txt
echo "Downloading..."
wget -P pdflinkextractor_files/ -i pdflinks.txt
To run the script:
chmod 700 pdflinkextractor
$ ./pdflinkextractor http://www.pdfscripting.com/public/Free-Sample-PDF-Files-with-scripts.cfm

Curl command to set permission and download file

How do I use curl to download a file and at the same time set the file permissions? This is for Linux, and the permissions of the file should be 0775.
curl -u username:password -o 'directory/fileName.war' http://134.20.18.28:35000/fileName.war
curl does not do this; chmod does. Just && your commands so that if the first succeeds, the second runs:
curl -u username:password -o 'directory/fileName.war' http://134.20.18.28:35000/fileName.war && chmod 0775 directory/fileName.war
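One caveat worth noting, as an addition to the original answer: curl exits with status 0 even when the server returns an HTTP error such as 404, so adding --fail makes the && guard actually depend on a successful download:
curl --fail -u username:password -o 'directory/fileName.war' http://134.20.18.28:35000/fileName.war && chmod 0775 directory/fileName.war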

curl download with username and password in varible

Hi, I am writing an automation script in test.sh, attempting to download a file. It works fine when I use hard-coded strings, but it does not work with variables. Below is my code example:
#!/bin/bash
USER="admin"
PWD="adminpass"
curl -v -k -u ${USER}:${PWD} ${NEXUS_URL}/${SP1}/60/${SP1}-60.zip --output ${SP1}-60.zip
The above code does not work; it is not able to download my file. But if I write it as:
curl -v -k -u "admin":"adminpass" ${NEXUS_URL}/${SP1}/60/${SP1}-60.zip
--output ${SP1}-60.zip
Then it works. So how do I get the variable credentials working with this curl command?
Thanks
Option 1
The parameter expansion will not include the double quotes, so you can embed them in the variable values themselves:
#!/bin/bash
USER='"'admin'"' #single quote, double quote, single quote
PASS='"'adminpass'"'
curl -v -k -u ${USER}:${PASS} ${NEXUS_URL}/${SP1}/60/${SP1}-60.zip
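A quick way to check what the shell actually passes to curl is to run the script with tracing enabled; this is a general debugging tip rather than part of the original answer:
bash -x test.sh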
Option 2
Alternatively, you can create a .netrc file and use curl -n as follows:
Documentation from https://ec.haxx.se/usingcurl-netrc.html
Create a .netrc file containing the following and place it in your home directory.
machine something.com
login admin
password adminpass
Run the command:
curl -n -k ${NEXUS_URL}/${SP1}/60/${SP1}-60.zip --output ${SP1}-60.zip
curl will automatically look for the .netrc file. You can also specify the file path with curl --netrc-file <netrc_file_path>
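For example, if the file lives somewhere other than your home directory (the path below is only a placeholder):
curl --netrc-file /path/to/my.netrc -k ${NEXUS_URL}/${SP1}/60/${SP1}-60.zip --output ${SP1}-60.zip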

Ubuntu: Using curl to download an image

I want to download an image accessible from this link: https://www.python.org/static/apple-touch-icon-144x144-precomposed.png into my local system. Now, I'm aware that the curl command can be used to download remote files through the terminal. So, I entered the following in my terminal in order to download the image into my local system:
curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
However, this doesn't seem to work, so obviously there is some other way to download images from the Internet using curl. What is the correct way to download images using this command?
curl without any options performs a GET request and simply writes the data from the specified URI to standard output; it does not save the file to your local machine.
When you do,
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
You will receive binary data:
|�>�$! <R�HP#T*�Pm�Z��jU֖��ZP+UAUQ#�
��{X\� K���>0c�yF[i�}4�!�V̧�H_�)nO#�;I��vg^_ ��-Hm$$N0.
���%Y[�L�U3�_^9��P�T�0'u8�l�4 ...
In order to save this, you can use:
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png > image.png
to store that raw image data inside of a file.
An easier way, though, is just to use wget.
$ wget https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
$ ls
.
..
apple-touch-icon-144x144-precomposed.png
For those who don't have wget or don't want to install it, curl -O (capital "O", not a zero) will do the same thing as wget. For example, my old netbook doesn't have wget, and it would be a 2.68 MB install that I don't need.
curl -O https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
If you want to keep the original name, use uppercase -O:
curl -O https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
If you want to save the remote file with a different name, use lowercase -o:
curl -o myPic.png https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
Create a new file called files.txt and paste the URLs one per line. Then run the following command.
xargs -n 1 curl -O < files.txt
source: https://www.abeautifulsite.net/downloading-a-list-of-urls-automatically
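If the list is long, xargs can also run several downloads in parallel; the -P value below is arbitrary and this variant is my addition rather than part of the linked post:
xargs -n 1 -P 4 curl -O < files.txt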
For those who got "permission denied" on the save operation, here is the command that worked for me:
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png --output py.png
Try this:
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png > precomposed.png

How do I download a tarball from GitHub using cURL?

I am trying to download a tarball from GitHub using cURL, but it does not seem to be redirecting:
$ curl --insecure https://github.com/pinard/Pymacs/tarball/v0.24-beta2
<html><body>You are being redirected.</body></html>
Note: wget works for me:
$ wget --no-check-certificate https://github.com/pinard/Pymacs/tarball/v0.24-beta2
However I want to use cURL because ultimately I want to untar it inline with something like:
$ curl --insecure https://github.com/pinard/Pymacs/tarball/v0.24-beta2 | tar zx
I found that the URL after redirecting turned out to be https://download.github.com/pinard-Pymacs-v0.24-beta1-0-gcebc80b.tar.gz, but I would like cURL to be smart enough to figure this out.
Use the -L option to follow redirects:
curl -L https://github.com/pinard/Pymacs/tarball/v0.24-beta2 | tar zx
The modernized way of doing this is:
curl -sL https://github.com/user-or-org/repo/archive/sha1-or-ref.tar.gz | tar xz
Replace user-or-org, repo, and sha1-or-ref accordingly.
If you want a zip file instead of a tarball, specify .zip instead of .tar.gz suffix.
You can also retrieve the archive of a private repo by passing the -u token:x-oauth-basic option to curl. Replace token with a personal access token.
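GitHub's API tarball endpoint (the same one used in the answer further below) also accepts the token as an Authorization header; treat this as a sketch, with <token> as a placeholder:
curl -sL -H "Authorization: token <token>" https://api.github.com/repos/user-or-org/repo/tarball/sha1-or-ref | tar xz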
You can also use wget to »untar it inline«. Simply specify stdout as the output file (-O -):
wget --no-check-certificate https://github.com/pinard/Pymacs/tarball/v0.24-beta2 -O - | tar xz
All the other solutions require specifying a release/version number, which obviously breaks automation.
This solution, currently tested and known to work with GitHub API v3, can be used programmatically to grab the LATEST release without specifying any tag or release number, and it un-tars the archive into an arbitrary directory name you specify with the --one-top-level="pi-ap" switch. Just swap out the user f1linux and the repo pi-ap in the example below with your own details and Bob's your uncle:
curl -L https://api.github.com/repos/f1linux/pi-ap/tarball | tar xzvf - --one-top-level="pi-ap" --strip-components 1
To download and extract into a specific directory:
cd your_dir && curl -L https://download.calibre-ebook.com/3.19.0/calibre-3.19.0-x86_64.txz | tar zx
