wget: How to rename already-downloaded files to the names given by wget --content-disposition? - linux

I have downloaded 1200 jpeg files using wget, but the names of the files are based on the links they were downloaded from.
For example,
http://www.*.*.*/index.php?id=0MwfTcqbP9dl1_icR3_gVezE8tlpUJt-wumA5hHjpjk will download a file named index.php?id=0MwfTcqbP9dl1_icR3_gVezE8tlpUJt-wumA5hHjpjk.jpg, but its name on the server is different. Now I want all the files to be named as they are on the server.
One way is to delete all the files and re-download them with the wget option --content-disposition, but the total size of the download is 8 GB and downloading it again is not a good option.
How can I rename all the downloaded files to their names on the server?
Edit: The name of a jpeg file downloaded from these links using wget --content-disposition or a browser would be something like 2014:08:09_18:07:51_IMG_5543.jpg (not created by wget; it's the original name on the server, the uploader's file name). I want all the files to carry their original names without downloading them again.

If the webserver supports HEAD requests, you can use commands like wget --server-response --spider $URL; otherwise you can request the range 0-1 to get one byte only. Once you have the response headers, you can write a script to do the renaming. – Wu Yongzheng
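Building on that comment, here is a rough sketch of such a script. It assumes the original URLs are still available in a file (called links.txt here, which is hypothetical) and that each local file is named after the last path component of its URL; adjust local_name if wget appended an extra extension like .jpg.

#!/bin/sh
# Rename downloaded files to their Content-Disposition names without re-downloading.
while IFS= read -r url; do
    local_name="${url##*/}"                    # e.g. index.php?id=0Mwf... (assumption)
    # Fetch headers only; --spider makes a HEAD-style check and downloads no body.
    cd_header=$(wget --server-response --spider "$url" 2>&1 \
                | grep -i 'content-disposition' | head -n 1)
    # Naive parse of the filename="..." (or unquoted filename=...) parameter.
    server_name=$(printf '%s\n' "$cd_header" \
                  | sed -n 's/.*[Ff]ilename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p')
    if [ -n "$server_name" ] && [ -e "$local_name" ]; then
        mv -n "$local_name" "$server_name"     # -n: never overwrite an existing file
    fi
done < links.txt

The header parsing is deliberately naive; if the server does not answer --spider requests, the range-request variant from the comment (e.g. wget --header='Range: bytes=0-1' ...) could be substituted for the header fetch.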

Related

How to preserve folder structure and filename while downloading using wget?

There's a file named links.txt which contains various links; I am trying to fetch those links and download them using wget.
The command is shown below:
wget -i links.txt -nc -c -x
The folder depth is quite deep, and the end result is that the downloaded files have their names shortened for some reason.
I don't know if it's a character limit in Linux or something wget does, but I would ideally want the exact folder structure and filename to be retained.
The folder structure is fine as of now; the only issue is that the file names are getting shortened.
Let's say, for example, the actual filename is this_is_the_whole_fileName.mp4;
the downloaded file would be named this_is.
The filename is only partially kept and the file extension is also absent.
On another note, Aria2 doesn't shorten the filenames and works well for the most part; the only issue there is that some files take time to start downloading and Aria2 gives an error in such cases.

lftp mirror with prefix

I did a partial download of a directory full of logs from a remote ftp server.
For example, mget *201610* will pull down all the logs with 201610 as part of the name.
What I would like to do is continue the download, since I only have about 100 of the 400 files for that month. Since I already downloaded a lot of the files, mirror -c would be the best option, as it only downloads the files I don't have.
Question:
How do I incorporate mirror -c for only the subset of files that have 201610 in them? I don't want to start downloading all the files for other months, just the missing ones from 201610.
Thanks
Use the -I option of mirror; it allows you to specify a wildcard for the files to be included.
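For instance, a minimal sketch (the host, credentials, and /logs path are placeholders, not from the question):

lftp -u user,password ftp.example.com -e 'mirror -c -I "*201610*" /logs ./logs; bye'

With -c, files you already have are continued or skipped rather than re-downloaded, and -I "*201610*" restricts the mirror to names matching the glob.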

wget to download new wildcard files and overwrite old ones

I'm currently using wget to download specific files from a remote server. The files are updated every week but always have the same file names, e.g. a newly uploaded file1.jpg will replace the local file1.jpg.
This is how I am grabbing them, nothing fancy:
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/file1.jpg
This downloads file1.jpg from the remote server if it is newer than the local version, then overwrites the local one with the new one.
Trouble is, I'm doing this for over 100 files every week and have set up cron jobs to fire the 100 different download scripts at specific times.
Is there a way I can use a wildcard for the file name and have just one script that fires every 5 minutes for example?
Something like....
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/*.jpg
Will that work? Will it check the local folder for all current file names, see what is new and then download and overwrite only the new ones? Also, is there any danger of it downloading partially uploaded files on the remote server?
I know that some kind of file sync script between servers would be a better option but they all look pretty complicated to set up.
Many thanks!
You can specify the files to be downloaded one by one in a text file, and then pass that file's name using the -i or --input-file option.
e.g. contents of list.txt:
http://xx.xxx.xxx.xxx/remote/files/file1.jpg
http://xx.xxx.xxx.xxx/remote/files/file2.jpg
http://xx.xxx.xxx.xxx/remote/files/file3.jpg
....
then
wget .... --input-file list.txt
Alternatively, if all your *.jpg files are linked from a particular HTML page, you can use recursive downloading, i.e. let wget follow links on your page to all linked resources. You might need to limit the "recursion level" and file types in order to prevent downloading too much. See wget --help for more info.
wget .... --recursive --level=1 --accept=jpg --no-parent http://.../your-index-page.html
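To tie this back to the cron part of the question: one hypothetical crontab entry, reusing the placeholder paths from above and assuming the list is kept in /path/to/list.txt, could then replace the 100 separate scripts:

*/5 * * * * wget -q -N -P /path/to/local/folder/ -i /path/to/list.txt

Here -q keeps cron from mailing wget's progress output, while -N still ensures only files that are newer on the server get fetched.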

wget Downloading and replacing file only if target is newer than source

This is what I'm trying to achieve :
User uploads file1.jpg to Server A
Using wget, Server B only downloads file1.jpg from Server A if the file is newer than the one that already exists on Server B, and then replaces the file on Server B with the new one.
I know I can use:
wget -N http://www.mywebsite.com/files/file1.jpg
to check that the remote file is newer than the local one, but I'm a little confused as to how to format the command to let it know what and where the actual local file is.
Is it something like? :
wget -N http://www.mywebsite.com/files/file1.jpg /serverb/files/file1.jpg
Cheers!
You can use the -P option to specify the directory where the file(s) will be downloaded:
$ wget -N -P /serverb/files/ http://www.mywebsite.com/files/file1.jpg
You are also talking about downloading and replacing the file. Be aware that wget overwrites the file in place, so it is "broken" while the download is in progress. I don't think you can do an atomic replacement of the file using wget alone; you need a small script that downloads to a temporary file and then uses mv to atomically replace the file on Server B.
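A minimal sketch of such a script, assuming the URL and target path from the question, and assuming the staging directory sits on the same filesystem as the target so mv is atomic:

#!/bin/sh
URL=http://www.mywebsite.com/files/file1.jpg
STAGE=/serverb/files/.staging            # hypothetical staging directory
mkdir -p "$STAGE"
wget -N -P "$STAGE" "$URL"               # -N: only fetch if the remote copy is newer
if [ "$STAGE/file1.jpg" -nt /serverb/files/file1.jpg ]; then
    TMP=$(mktemp "$STAGE/file1.XXXXXX")
    cp -p "$STAGE/file1.jpg" "$TMP"      # keep the staged copy so -N keeps working
    mv "$TMP" /serverb/files/file1.jpg   # mv within one filesystem is atomic
fi

Readers of file1.jpg never see a half-written file: they get either the old version or, after the single atomic rename, the complete new one.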

Using wget on a directory

I'm fairly new to shell and I'm trying to use wget to download a .zip file from one directory to another. The only file in the directory I am copying from is the .zip file. However, when I point wget at the IP address/directory, it downloads an index.html file instead of the .zip. Is there something I am missing to get it to download the .zip without having to state it explicitly?
wget is a utility for downloading files from the web.
You mentioned you want to copy from one directory to another. Do you mean both are on the same server/node?
In that case you can simply use the cp command.
And if you want it from any other server/node (a file transfer), you can use scp or ftp.
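As a quick illustration of both cases (all paths and the host name here are made up, not from the question):

# same server/node: plain copy
cp /srv/data/archive.zip /home/me/downloads/
# different server/node: secure copy over SSH
scp user@remote-host:/srv/data/archive.zip /home/me/downloads/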
