Usually I create archives of data on Linux at the command line with tar and gzip (or pigz, which compresses in parallel across multiple cores).
However, listing the contents of such an archive is painfully slow because of the sequential format of tar archives. This is especially true if an archive contains many files that are several GB each.
What is an alternative to this combination for creating compressed archives of files on Linux? In particular, I'm looking for something that lets me retrieve the list or tree of files inside an archive, similar to tar, but much more quickly.
zip? The ZIP file format keeps a catalog (the central directory) of its contents at the end of the archive, which can be read with zipinfo(1).
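For example (the archive and directory names are placeholders), you can create the archive with zip and list it almost instantly with zipinfo, since only the central directory has to be read:
zip -r archive.zip mydata/
zipinfo -1 archive.zip    # print just the file names, one per line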
7zip is probably the best solution nowadays.
The ZIP format is a bit outdated and was designed for the FAT filesystem, which is where many of its limitations come from.
dar might also be an option, but as far as I can tell there is only one developer and no community around it (unlike 7zip, which has several forks and ports made by independent developers).
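For example, with the p7zip command-line tool (archive and directory names are placeholders):
7z a archive.7z mydata/    # create the archive
7z l archive.7z            # list its contents from the archive index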
As we know, we can use rsync, tar or cp to transfer directories or files to an NFS file system or to a different server.
I'm just curious: which way is best for transferring a complex directory tree?
Or would writing a custom C program be a simple and fast way to copy the files or recreate the directory structure?
The best way is to use rsync. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
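For example (host and paths are placeholders):
rsync -avh --progress /source/dir/ user@remote:/dest/dir/
The trailing slash on the source means "copy the contents of the directory" rather than the directory itself.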
The tar utility is "designed to store and extract files from an archive file known as a tarfile", not to transfer data between servers. And cp is simpler than rsync: it just replaces all the files in the destination.
As for a custom C program, I think it is difficult to optimize something that has already been optimized.
What I am trying to do is this:
I get zip files from clients which are around 1.5 GB in general. They contain pictures only. I need to turn them into 100 MB files to actually upload them to my server. The problem is that if I break up my 1.5 GB zip file, I have to re-attach all of the pieces before I can use any one of them.
When I break the 1.5 GB zip file into 100 MB zip files, I need each 100 MB file to act as a separate, new file so that the server can unzip it and load the pictures into the database. I have searched for this problem, but most of the threads are about how to split a zip file. That is partially what I want, and I can already do it, but I also need each of the smaller pieces to unzip on its own. Is it possible to break a zip file into smaller pieces that act as new, standalone zip files?
Thanks.
I have the same question. I think unzip in the Linux shell cannot handle a zip file larger than 1 GB, and I need to unzip them unattended on a headless NAS. What I do for now is unzip everything on the desktop HD, select files until they almost reach 1 GB, archive and delete them, then select the next set of files until I reach 1 GB again.
Your question is not clear, but I will try to answer it based on my understanding of your dilemma.
Questions
Why does the file size need to be limited?
Is it the transfer to the server that is the constraining factor?
Is the application (on the server) unable to process files over a certain size?
Can the process be altered so that image file fragments can be recombined on the server before processing?
What operating systems are in use on the client and the server?
Do you have shell access to the server?
A few options
Use ImageMagick to shrink the image files so they fit within the file size constraints
On Linux/Mac, this is relatively straightforward to do:
split -b 1m my_large_image.jpg (you need the b parameter for it to work on binary files)
Compress each file into its own zip
Upload to the server
Unzip
Concatenate the fragments back into an image file:
cat xaa xab xac xad (etc) > my_large_image.jpg
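Putting that second option together as one rough sketch (the fragment names follow split's defaults used above; 100m matches the upload limit from the question):
split -b 100m my_large_image.jpg          # produces xaa, xab, xac, ...
for f in x??; do zip "$f.zip" "$f"; done  # wrap each fragment in its own standalone zip
Then upload the x??.zip files, and on the server:
for z in x??.zip; do unzip "$z"; done     # recover the raw fragments
cat x?? > my_large_image.jpg              # recombine them into the original image
Note that each piece only yields a raw fragment on its own; the fragments still have to be concatenated before the picture is usable.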
I tried the following command:
rsync -av --progress --inplace --rsh='ssh' /home/tom/workspace/myapp.war root@172.241.181.124:/home/rtom/uploads
But it seems to transfer the whole file again each time I execute the command after a small change in the app regenerates myapp.war.
I also want the transfer to automatically resume if the connection is lost. I think this part is working.
The transfer should occur over ssh.
The connection speed is very slow and can break, too, so it is important that only what has changed is transferred. Of course it must also ensure that the file was correctly transferred.
rsync does handle relatively small changes and partial uploads in a file efficiently; significant effort has gone into the rsync algorithm in this direction.
The problem is that WAR files are "extended" JAR files, which are essentially ZIP archives and therefore compressed.
A small change in an uncompressed file will change the whole compressed segment where that file belongs and - most importantly - it can also change its size significantly. That can overcome the ability of rsync to detect and handle changes in the final compressed file.
In ZIP archives each uncompressed file has its own compressed segment. Therefore the order in which files are placed in the archive is also important with regard to achieving a degree of similarity to a previous version. Depending on how the WAR file is created, just adding a new file or renaming one can cause segments to move, essentially making the WAR file unrecognisable to rsync. In other words:
A small change in your application normally means a rather large change in your WAR file.
rsync is not designed to handle changes in compressed files. However, it can handle changes in your application. One solution would be to use it to upload your application files and then create the WAR file on the remote host.
A slightly different approach - that does not need any development tools on the remote host - would be to unpack (i.e. unzip) the WAR file locally, upload its contents and then pack (i.e. zip) it again on the remote host. This solution only requires a zip or jar implementation on the remote host.
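A rough sketch of that second approach, reusing the host and paths from the question (adjust as needed) and assuming unzip and zip are available on both ends:
mkdir -p /tmp/myapp-exploded
cd /tmp/myapp-exploded && unzip -oq /home/tom/workspace/myapp.war
rsync -az --delete --rsh='ssh' /tmp/myapp-exploded/ root@172.241.181.124:/home/rtom/uploads/myapp-exploded/
ssh root@172.241.181.124 'cd /home/rtom/uploads/myapp-exploded && zip -qr ../myapp.war .'
Because the individual files now travel uncompressed (the -z flag only compresses them in transit), a small change in the application results in only a small transfer.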
I've been using (Ubuntu's) file-roller to compress a range of files, e.g. .gz, .zip, .rar, .tar.gz, etc. It's nice because it provides a simple, uniform interface for decompressing files into particular folders. However, it's pretty slow, apparently because it pops open a GUI window to tell you it's uncompressing the file.
So I am wondering if anyone can recommend a tool that will uncompress multiple compression formats, and has a uniform interface?
7-zip can uncompress a wide variety of formats, including 7z, ZIP, GZIP, BZIP2, TAR, ARJ, CAB, CHM, CPIO, DEB, DMG, HFS, ISO, LZH, LZMA, MSI, NSIS, RAR, RPM, UDF, WIM, XAR and Z.
If you're using 7zip as a developer, don't forget you can easily embed it in your own applications; scroll down to "How can I add support for 7z archives to my application?" in that link. Great stuff, gotta love 7zip. If you want an app to build on with a uniform interface, 7zip is it. Not to mention it's a SourceForge project, so you can take a look at the source if you like.
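From the command line the interface is the same regardless of format (archive names are placeholders):
7z x backup.zip
7z x sources.tar.gz    # extracts the inner .tar; run 7z x again on it to get the files
7z l photos.rar        # listing works the same way; RAR support needs p7zip-rar on Ubuntu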
File-roller is simply a front-end to these file formats. It sits on top and parses the output from the compression programs. I doubt you will get any noticeable performance advantages by replacing it.
You could just go into the terminal, bypass the GUI and write, for example:
unrar x -r mybig.archive.rar
tar xvfz mybig.archive.tar.gz
unzip mybig.archive.zip
Update: Ran a test (1.4G rar archive)
unrar (non-free): 1m25.207s
file-roller: 1m39.311s
7z-rar: 1m17.084s
unrar-free: failed
rar (shareware): 1m29.109s
14 extra seconds for a full front-end; I think that is acceptable. 7zip is the fastest, without a front-end.
Windows: Universal Extractor is an application designed to extract virtually any type of archive available today: RAR, ZIP, 7Z, EXE, TAR, NRG, ISO, DLL, you name it; this program is able to process all of them at incredible speed.
There’s no other purpose to this program than extracting the contents of archives. As such, you cannot rely on it to create archives. Also, the number of files it can process simultaneously is restricted to one, so batch decompressing is not possible.
7zip can handle the simple jobs, but it does not cover as many formats as Universal Extractor.
Ubuntu: p7zip-full or p7zip-rar or Archive Manager
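On Ubuntu these can be installed with, for example:
sudo apt-get install p7zip-full p7zip-rar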