wget -O for non-existing save path? - linux

I can't wget while there is no path already to save. I mean, wget doens't work for the non-existing save paths. For e.g:
wget -O /path/to/image/new_image.jpg http://www.example.com/old_image.jpg
If /path/to/image/ is not previously existed, it always returns:
No such file or directory
How can i make it work to automatically create the path and save?

Try curl
curl http://www.site.org/image.jpg --create-dirs -o /path/to/save/images.jpg

mkdir -p /path/i/want && wget -O /path/i/want/image.jpg http://www.com/image.jpg

To download a file with wget, into a new directory, use --directory-prefix without -O:
wget --directory-prefix=/new/directory/ http://www.example.com/old_image.jpg
Using -O new_file in conjunction with --directory-prefix, will not create the new directory structure, and will save the new file in the current directory.
It may even fail with "No such file or directory" error, if you specify -O /new/directory/new_file

I was able to create folder if it doesn't exists with this command:
wget -N http://www.example.com/old_image.jpg -P /path/to/image

wget is only getting a file NOT creating the directory structure for you (mkdir -p /path/to/image/), you have to do this by urself:
mkdir -p /path/to/image/ && wget -O /path/to/image/new_image.jpg http://www.example.com/old_image.jpg
You can tell wget to create the directory (so you dont have to use mkdir) with the parameter --force-directories
alltogether this would be
wget --force-directories -O /path/to/image/new_image.jpg http://www.example.com/old_image.jpg

After searching a lot, I finally found a way to use wget to download for non-existing path.
wget -q --show-progress -c -nc -r -nH -i "$1"
=====
Clarification
-q
--quiet --show-progress
Kill annoying output but keep the progress-bar
-c
--continue
Resume download if the connection lost
-nc
--no-clobber
Overwriting file if exists
-r
--recursive
Download in recursive mode (What topic creator asked for!)
-nH
--no-host-directories
Tell wget do not use the domain as a directory (for e.g: https://example.com/what/you/need
- without this option, it will download to "example.com/what/you/need")
-i
--input-file
File with URLs need to be download (in case you want to download a lot of URLs,
otherwise just remove this option)
Happy wget-ing!

Related

Create Directory, download file and execute command from list of URL

I am working on a Red Hat Linux server. My end goal is to run CRB-BLAST on multiple fasta files and have the results from those in separate directories.
My approach is to download the fasta files using wget then run the CRB-BLAST. I have multiple files and would like to be able to download them each to their own directory (the name perhaps should come from the URL list files), then run the CRB-BLAST.
Example URLs:
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_CB_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_13_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_37_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_123_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_195_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_31_chr.v0.1.liftover.CDS.fasta.gz
Ideally, the file name determines the directory name, for example, TC_3370/.
I think there might be a solution with cat URL.txt | mkdir | cd | wget | crb-blast
Currently I just run the commands in line:
mkdir TC_3370
cd TC_3370/
wget url
http://assemblies/Genomes/final_assemblies/10x_meta_assemblies_v1.0/TC_3370_chr.v1.0.maker.CDS.fasta.gz
crb-blast -q TC_3370_chr.v1.0.maker.CDS.fasta.gz -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
Try this Shellcheck-clean program:
#! /bin/bash -p
while read -r url; do
file=${url##*/}
dir=${file%%_chr.*}
mkdir -v -- "$dir"
(
cd "./$dir" || exit 1
wget -- "$url"
crb-blast -q "$file" -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
)
done <URL.txt
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${url##*/} etc.
The subshell (( ... )) is used to ensure that the cd doesn't affect the main program.
Another implementation
#!/bin/sh
# Read lines as url as long as it can
while read -r url
do
# Get file name by stripping-out anything up to the last / from the url
file_name=${url##*/}
# Get the destination dir name by stripping anything from the first __chr
dest_dir=${file_name%%_chr*}
# Compose the wget output path
fasta_path="$dest_dir/$file_name"
if
# Successfully created the destination directory AND
mkdir -p -- "$dest_dir" &&
# Successfully downloaded the file
wget --output-file="$fasta_path" --quiet -- "$url"
then
# Process the fasta file into fna
fna_path="$dest_dir/TCV2_annot_cds.fna"
crb-blast -q "$fasta_path" -t "$fna_path" -e 1e-20 -h 4 -o rbbh_TC
else
# Cleanup remove destination directory if any of mkdir or wget failed
rm -fr -- "$dest_dir"
fi
# reading from the URL.txt file for the whole while loop
done < URL.txt
Download files from list is task for -i file option, if you have file named say urls.txt with one URL per line you might simply do
wget -i urls.txt
Note that this will put all files inside current working directory, so if you wish to have them in separate dirs, you would need to move them after wget finish.

How to download multiple files using wget in linux and save them with custom name for each downloaded file

How to download multiple files using wget. Lets say i have a urls.txt containing several urls and i want to save them automatically with a custom filename for each file. How to do this?
I tried downloading 1 by 1 with this format "wget -c url1 -O filename1" successfully and now i want to try do batch download.
You might take look at xargs command, you would need to prepare file with arguments for each wget call, lets say it is named download.txt and
-O file1.html https://www.example.com
-O file2.html https://www.duckduckgo.com
and then use it as follows
cat download.txt | xargs wget -c
which is equivalent to doing
wget -c -O file1.html https://www.example.com
wget -c -O file2.html https://www.duckduckgo.com
Add the input file and a loop:
for i in `cat urls.txt`; do wget $i -O filename-$i; done

how to recursively fetch some data with pattern using wget

I am trying to download some specific files from this website (http://nomads.ncep.noaa.gov/pub/data/nccf/com/hourly/prod/), they keep 10 days data. I want to download all the files starting with "ST4" from all the directories starting with "nam_pcpn_anal". I could download all the files staring with "ST4" from one folder like :
wget -r -nd -N --no-parent -nH --cut-dirs=100 -P ~/test/ -A ST4* 'http://nomads.ncep.noaa.gov/pub/data/nccf/com/hourly/prod/nam_pcpn_anal.20160625/'
but I do not know how to search ST4 recursively. I thought the following should work but nope!
wget -r -nd -N --no-parent -nH --cut-dirs=100 -P ~/test/ -A ST4* --accept nam_pcpn_anal*/ST4* 'http://nomads.ncep.noaa.gov/pub/data/nccf/com/hourly/prod/'
Any idea!
The wget manual shows:
-I list
--include-directories=list
Specify a comma-separated list of directories you wish to follow
when downloading. Elements of list may contain wildcards.
So, you could try:
wget -r -nd -N --no-parent -nH --cut-dirs=100 -P ~/test/ \
-A 'ST4*' -I '*/nam_pcpn_anal.*' \
'http://nomads.ncep.noaa.gov/pub/data/nccf/com/hourly/prod/'

UNIX loop through list of url and save using wget

I am trying to download many files and can do so in a long manner using unix, but how can I do it using a loop function? I have many tables like CA30 and CA1-3 to download. Can I put the table names in a list list("CA30", "CA1-3") and have a loop go through the list?
#!/bin/bash
# get the CA30 files and put into folder for CA30
sudo wget -PO "https://www.bea.gov/regional/zip/CA30.zip"
sudo mkdir -p in/CA30
sudo unzip O/CA30.zip -d in/CA30
# get the CA30 files and put into folder for CA1-3
sudo wget -PO "https://www.bea.gov/regional/zip/CA1-3.zip"
sudo mkdir -p in/CA1-3
sudo unzip O/CA1-3.zip -d in/CA1-3
#!/bin/bash
for base in CA30 CA1-3; do
# get the $base files and put into folder for $base
wget -PO "https://www.bea.gov/regional/zip/${base}.zip"
mkdir -p in/${base}
unzip O/${base}.zip -d in/${base}
done
I have removed sudo - there's no reason to perform unprivileged operations with superuser privileges. If you can write into a particular folder, change the working directory.

Wget Output in Recursive mode

I am using wget -r to download 3 .zip files from a specified webpage. Here is what I have so far:
wget -r -nd -l1 -A.zip http://www.website.com/example
Right now, the zip files all begin with abc_*.zip where * seems to be a random. I want to have the first downloaded file to be called xyz_1.zip, the second to be xyz_2.zip, and the third to be xyz_3.zip.
Is this possible with wget?
Many thanks!
I don't think it's possible with wget alone. After downloading you could use some simple shell scripting to rename the files, like:
i=1; for f in abc_*.zip; do mv "$f" "xyz_$i.zip"; i=$(($i+1)); done
Try to get a listing first and then download each file separately.
let n=1
wget -nv -l1 -r --spider http://www.website.com/example 2>&1 | \
egrep -io 'http://.*\.zip'| \
while read url; do
wget -nd -nv -O $(echo $url|sed 's%^.*/\(.*\)_.*$%\1%')_$n.zip "$url"
let n++
done
I don't think there is a way you can do it within a single wget command.
wget does have a -O option which you can use to tell it which file to output to, but it won't work in your case because multiple files will get concatenated together.
You will have to write a script which renames the files from abc_*.zip to xyz_*.zip after wget has completed.
Alternatively, invoke wget for one zip file at a time and use the -O option.

Resources