How to preserve folder structure and filename while downloading using wget? - web

There's a file named links.txt that contains a list of links. I am trying to fetch those links and download them using wget, with the command shown below:
wget -i links.txt -nc -c -x
The folder hierarchy is quite deep, and the downloaded files end up with shortened names for some reason.
I don't know whether it's a character limit in Linux or a wget thing, but ideally I want the exact folder structure and file names to be retained.
The folder structure is fine as of now; the only issue is that the file names are getting shortened.
For example, if the actual file name is this_is_the_whole_fileName.mp4,
the downloaded file is named this_is.
The file name is only partially kept, and the file extension is missing as well.
On another note, Aria2 doesn't shorten the file names and works well for the most part; the only issue there is that some files take a while to start downloading, and Aria2 reports an error in those cases.
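The asker suspects a Linux character limit. A minimal sketch for testing that hypothesis before changing the download command is below. It assumes that wget -x maps http://host/a/b/file to host/a/b/file and that the relevant limit is the per-name maximum reported by getconf NAME_MAX (commonly 255 on Linux filesystems such as ext4). The function names long_components and check_links are hypothetical, and percent-decoding of URLs is ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: flag URL path components that may exceed the
# filesystem's per-name length limit. Assumes wget -x stores
# http://host/a/b/file as host/a/b/file; percent-decoding is ignored and
# lengths are counted in characters (equal to bytes for ASCII names).

# long_components URL LIMIT: print each path component longer than LIMIT.
long_components() {
  local url="$1" limit="$2" rest part
  local -a parts
  rest="${url#*://}"       # drop the scheme, e.g. "http://"
  rest="${rest%%[?#]*}"    # drop any query string or fragment
  IFS='/' read -r -a parts <<< "$rest"
  for part in "${parts[@]}"; do
    if (( ${#part} > limit )); then
      printf '%s\n' "$part"
    fi
  done
}

# check_links FILE: check every URL in FILE against the limit that applies
# to the current directory (where wget -x creates its folders).
check_links() {
  local list="$1" limit url part
  limit="$(getconf NAME_MAX .)"
  while IFS= read -r url || [[ -n "$url" ]]; do
    [[ -n "$url" ]] || continue
    while IFS= read -r part; do
      printf 'too long (%d > %d): %s\n' "${#part}" "$limit" "$part"
    done < <(long_components "$url" "$limit")
  done < "$list"
}
```

Running check_links links.txt from the download directory lists any component that is over the limit. If nothing is printed, a name-length limit is probably not what is shortening the files.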

Related

(Linux) Recursively overwrite all files in folder with data from another file

I find myself in a situation similar to this question:
Linux: Overwrite all files in folder with specified data?
The answers there work nicely; however, they are for typed-out text. Allow me to provide context.
I have a Linux machine with the following file structure (files and folders irrelevant to the question removed):
root/
    empty.svg
    svg/
        257238.svg
        297522.svg
        a7yf872.svg
        236y27fh.svg
        38277.svg
        ... (~200 other .svg files with arbitrary names)
        2903852.svg
The framework I am working with requires those .svg files to exist with those specific filenames, but it obviously does not care what SVG image they contain. I do not plan on using these files and they take up a hefty amount of disk space, so I want to convert them all into empty SVGs, i.e. copies of the empty.svg file in my root directory, which is a 12x12 transparent SVG file (124 bytes). This way the framework shouldn't error out like it did when I simply overwrote the raw data of those SVGs with plain text using the answer to the question linked at the top. I've tried several approaches with my basic Linux command-line knowledge, but without success. How do I accomplish this?
TL;DR: How to recursively overwrite all files in a folder with the raw data of another file from Linux CLI?
As in the linked question, you can use the tee command, but use cat instead of echo so that the contents of empty.svg are copied (cat reads a file and writes its contents to standard output). The > /dev/null discards the extra copy that tee also prints to the terminal:
cat empty.svg | tee svg/257238.svg svg/297522.svg <etc> > /dev/null
If there are a lot of files in the svg directory, it is more practical to use a loop to automate the previous command:
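for f in svg/*; do
    # only touch .svg files; leave anything else in svg/ alone
    if [[ "$f" == *.svg ]]; then
        # truncate the file and fill it with the contents of empty.svg
        cat empty.svg > "$f"
    fi
done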
The first command uses a pipe to pass the output of cat to tee; the loop uses the > redirection to overwrite each file with the output of cat.
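The loop only looks at files directly inside svg/, while the TL;DR asks for a recursive overwrite. A minimal sketch using find, which descends into subdirectories, follows; the function name overwrite_all is hypothetical. It uses cp rather than cat ... > file; both replace the contents of an existing file while keeping its name and permissions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: overwrite every *.svg file below DIR (at any depth)
# with the contents of TEMPLATE, keeping each file's name.

# overwrite_all TEMPLATE DIR
overwrite_all() {
  local template="$1" dir="$2"
  # -type f: regular files only; -name '*.svg': only SVGs.
  # "{} \;" runs one cp per file, because cp takes a single destination file.
  find "$dir" -type f -name '*.svg' -exec cp -- "$template" {} \;
}
```

From the root/ directory, overwrite_all empty.svg svg would replace every SVG under svg/, including any in nested folders.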

How to overwrite ".listing" file when using "wget" command

I have a generic script that uses wget to download a file (passed as a parameter to the script) from an FTP server. The script always downloads the files into the same local folder. The problem I am running into is that wget deletes the .listing file it creates by default, so if the script is called in parallel for different files, whichever process deletes the .listing file first succeeds and the rest fail.
So I tried adding --no-remove-listing to the wget command, but then I get the error:
File ".listing" already there; not retrieving.
I looked at another post, but as the original poster mentioned in the comments, the question hasn't actually been answered even though it is marked as such.
One option I was considering is to change the script to create a subdirectory named after the file and download the file there. But since it is a large script, I was hoping there is an easier option that only changes the wget command.
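The subdirectory idea can stay small if every invocation gets its own scratch directory, so parallel runs never see each other's .listing file. A sketch, assuming the script already has the FTP URL and the shared target folder available; fetch_isolated is a hypothetical name, and the sketch relies on wget writing .listing into the -P directory alongside the downloaded file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download URL into a private temporary directory
# inside DEST, then move the result into DEST. Each call gets its own
# directory, so concurrent calls never share (or delete) a .listing file.

# fetch_isolated URL DEST
fetch_isolated() {
  local url="$1" dest="$2" work rc=0
  work="$(mktemp -d "$dest/.wget.XXXXXX")" || return 1
  if wget -q -P "$work" "$url"; then
    # The glob skips dotfiles, so .listing stays behind in $work.
    mv -f -- "$work"/* "$dest"/ || rc=$?
  else
    rc=$?
  fi
  rm -rf -- "$work"    # removes .listing together with the scratch dir
  return "$rc"
}
```

The existing wget line in the script would become a call such as fetch_isolated "$url" /local/folder, with any other wget options the script needs added inside the function.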

How to use the zip command in Linux so that the archive stores short paths?

I used the zip command on Linux (Red Hat); this is my command:
zip -r /home/username/folder/compress/zip.zip /home/username/folder/compressed/*
When I open zip.zip, I see the whole folder path stored inside the archive.
I want the zip to contain only the list of *.txt files.
Because I run this command from a script in crontab, I can't use cd to change to the folder before running zip.
Please help me.
I skimmed the zip man page, and this is what I found. There is no option to archive files relative to a different directory. The closest I found is zip -j, which removes the entire path and stores the files directly in the zip rather than in subdirectories. I do not know what happens with file name conflicts, for example if both /home/username/folder/compressed/a.txt and /home/username/folder/compressed/subdir/a.txt exist. If that is not a problem for you, you can use this option, but I am concerned because you did specify the -r option, which suggests you expect zip to traverse subfolders.
I also considered having your script call zip with a different working directory, but judging by this Unix Stack Exchange page, the options there all use cd.
I have to admit I do not understand why you cannot use cd and I am very curious about it. You said something about using crontab, but I have never heard of anything wrong with changing directories in a crontab script.
I used the -j option with zip:
zip -jr /home/username/folder/compress/zip.zip /home/username/folder/compressed/*
and that solved the problem, thanks.
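For the record, cd does work inside a cron script, and running it in a subshell keeps the change local to that one command. The sketch below (zip_relative is a hypothetical name) stores paths relative to the source folder, so unlike -j it keeps subdirectories and avoids the name-conflict concern raised in the answer. The archive path is made absolute first, because a relative path would otherwise be resolved from inside the source folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: zip the contents of SRC so that entries are stored
# relative to SRC (e.g. "a.txt", "sub/b.txt") instead of with full paths.

# zip_relative SRC ARCHIVE
zip_relative() {
  local src="$1" archive="$2" dir
  # Resolve ARCHIVE to an absolute path *before* changing directory.
  dir="$(cd -- "$(dirname -- "$archive")" && pwd)" || return 1
  archive="$dir/$(basename -- "$archive")"
  # The parentheses run cd in a subshell: the caller's cwd is unchanged.
  ( cd -- "$src" && zip -q -r "$archive" . )
}
```

In the crontab script this would be zip_relative /home/username/folder/compressed /home/username/folder/compress/zip.zip. To keep only the text files, zip's include option can be appended to the zip call, e.g. zip -q -r "$archive" . -i '*.txt'.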

wget to download new wildcard files and overwrite old ones

I'm currently using wget to download specific files from a remote server. The files are updated every week but always keep the same file names, e.g. a newly uploaded file1.jpg should replace the local file1.jpg.
This is how I am grabbing them, nothing fancy:
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/file1.jpg
This downloads file1.jpg from the remote server if it is newer than the local version then overwrites the local one with the new one.
Trouble is, I'm doing this for over 100 files every week and have set up cron jobs to fire the 100 different download scripts at specific times.
Is there a way I can use a wildcard for the file name and have just one script that fires every 5 minutes for example?
Something like....
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/*.jpg
Will that work? Will it check the local folder for all current file names, see what is new and then download and overwrite only the new ones? Also, is there any danger of it downloading partially uploaded files on the remote server?
I know that some kind of file sync script between servers would be a better option but they all look pretty complicated to set up.
Many thanks!
You can specify the files to be downloaded one by one in a text file, and then pass that file name using option -i or --input-file.
e.g. contents of list.txt:
http://xx.xxx.xxx.xxx/remote/files/file1.jpg
http://xx.xxx.xxx.xxx/remote/files/file2.jpg
http://xx.xxx.xxx.xxx/remote/files/file3.jpg
....
then
wget .... --input-file list.txt
Alternatively, if all your *.jpg files are linked from a particular HTML page, you can use recursive downloading, i.e. let wget follow the links on your page to all linked resources. You might need to limit the recursion level and file types to avoid downloading too much. See wget --help for more info.
wget .... --recursive --level=1 --accept=jpg --no-parent http://.../your-index-page.html
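Combining the --input-file answer with the question's -N option, one script can refresh every file in a single wget run. A minimal sketch, assuming each local file name matches the remote file name under one base URL; build_list and refresh_all are hypothetical names. The list is built from the JPEGs already in the local folder, which mirrors the asker's idea of checking the local folder for current file names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the URL list from the JPEGs already present in
# the local folder, then refresh them all with a single timestamped wget run.
# Assumes each local file name matches the remote file name under BASE_URL.

# build_list LOCAL_DIR BASE_URL: print one URL per local *.jpg
build_list() {
  local dir="$1" base="$2" f
  for f in "$dir"/*.jpg; do
    [[ -e "$f" ]] || continue                  # glob matched nothing
    printf '%s/%s\n' "${base%/}" "${f##*/}"    # BASE_URL + file name
  done
}

# refresh_all LOCAL_DIR BASE_URL: -N re-downloads only files that are newer
# on the server; "-i -" makes wget read the URL list from standard input.
refresh_all() {
  local dir="$1" base="$2"
  build_list "$dir" "$base" | wget -q -N -P "$dir" -i -
}
```

A single crontab entry such as */5 * * * * /path/to/script.sh, where the script calls refresh_all /path/to/local/folder http://xx.xxx.xxx.xxx/remote/files/, would replace the 100 separate jobs. Files that exist only on the server are not discovered this way; they still need to be added to the folder or to the list once.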

wget: How to rename already downloaded files to the names given by wget --content-disposition?

I have downloaded 1200 JPEG files using wget, but the file names are based on the links they were downloaded from.
For example,
http://www.*.*.*/index.php?id=0MwfTcqbP9dl1_icR3_gVezE8tlpUJt-wumA5hHjpjk downloads a file named index.php?id=0MwfTcqbP9dl1_icR3_gVezE8tlpUJt-wumA5hHjpjk.jpg, but its name on the server is different. I want all the files to have their names from the server.
One way is to delete all the files and download them again with the wget option --content-disposition, but the total download size is 8GB, so downloading everything again is not a good option.
How can I rename all the downloaded files as names on server?
Edit: The names of the JPEG files downloaded from these links using wget --content-disposition or a browser look like 2014:08:09_18:07:51_IMG_5543.jpg (not created by wget; it's the original name on the server, i.e. the uploader's file name). I want all the files to be given their original names without downloading them again.
If the web server supports HEAD requests, you can use a command like wget --server-response --spider $URL; otherwise you can request a byte range such as bytes=0-0 to fetch only one byte. Once you have the response headers, you can write a script to do the renaming. – Wu Yongzheng
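A sketch of the script the comment describes, under several assumptions: the server sends a quoted filename="..." value in Content-Disposition, each saved name is the URL tail plus an added .jpg, and BASE_URL is a placeholder because the real host is elided in the question. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the approach in the comment above: fetch only the
# response headers, read the Content-Disposition filename, rename locally.
# BASE_URL is a placeholder because the real host is elided in the question.
BASE_URL="http://example.invalid/"

# filename_from_headers: read headers on stdin, print the first quoted
# filename="..." value from a Content-Disposition header (if any).
filename_from_headers() {
  tr -d '\r' |
    sed -n 's/.*[Cc]ontent-[Dd]isposition:.*filename="\([^"]*\)".*/\1/p' |
    head -n 1
}

# rename_from_server FILE: FILE is a saved name such as
# "index.php?id=XYZ.jpg"; assumes the URL is BASE_URL + FILE minus ".jpg".
rename_from_server() {
  local f="$1" url name
  url="${BASE_URL}${f%.jpg}"
  # --spider skips the body; --server-response prints headers to stderr.
  name="$(wget --spider --server-response "$url" 2>&1 | filename_from_headers)"
  name="${name##*/}"                     # never allow a path in the name
  if [[ -n "$name" && ! -e "$name" ]]; then
    mv -- "$f" "$name"
  fi
}
```

Run from the download folder, for f in 'index.php?id='*.jpg; do rename_from_server "$f"; done would issue one header-only request per file instead of re-downloading 8GB. Files whose response has no such header, or whose target name already exists, are left untouched.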
