I've been using (Ubuntu's) file-roller to handle a range of archive formats, e.g. .gz, .zip, .rar, .tar.gz, etc. It's nice because it provides a simple, uniform interface for decompressing files into particular folders. However, it's pretty slow, apparently because it pops open a GUI window to tell you it's uncompressing the file.
So I am wondering if anyone can recommend a tool that will uncompress multiple compression formats, and has a uniform interface?
7-zip can uncompress a wide variety of formats, including 7z, ZIP, GZIP, BZIP2, TAR, ARJ, CAB, CHM, CPIO, DEB, DMG, HFS, ISO, LZH, LZMA, MSI, NSIS, RAR, RPM, UDF, WIM, XAR and Z.
If you're using 7zip as a developer, don't forget that you can easily embed it in your own applications. Scroll down to "How can I add support for 7z archives to my application?" in that link. Great stuff; you gotta love 7zip. If you want an app to build on with a uniform interface, 7zip is it. Not to mention it's a SourceForge project, so you can take a look around the source if you like.
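As a quick illustration of the uniform command-line interface (a minimal sketch, assuming the 7z command-line tool is installed; archive names are placeholders):
7z x archive.zip                          # extract with full paths
7z x archive.tar.gz && 7z x archive.tar   # gzip and tar are handled as two layers
7z l archive.rar                          # list contents (RAR may need an extra codec package)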
File-roller is simply a front-end to the command-line tools for these formats. It sits on top of them and parses their output. I doubt you will get any noticeable performance advantage by replacing it.
You could just go into the terminal, bypass the GUI and write, for example:
unrar x -r mybig.archive.rar
tar xvfz mybig.archive.tar.gz
unzip mybig.archive.zip
Update: I ran a test (1.4 GB rar archive):
unrar (non-free): 1m25.207s
file-roller: 1m39.311s
7z-rar: 1m17.084s
unrar-free: failed
rar (shareware): 1m29.109s
14 extra seconds for a full front-end; I think that is acceptable. 7zip is the fastest, without a front-end.
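For reference, a hedged sketch of how such timings can be reproduced from a terminal (the archive name is a placeholder, and file-roller's --extract-here option is assumed to be available):
time unrar x -r mybig.archive.rar
time 7z x mybig.archive.rar
time file-roller --extract-here mybig.archive.rar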
Windows: Universal Extractor is an application designed to extract virtually any type of archive available today: RAR, ZIP, 7Z, EXE, TAR, NRG, ISO, DLL, you name it; this program is able to process all of them at incredible speed.
There is no other purpose to this program than extracting the contents of archives, so you cannot rely on it to create archives. Also, it can only process one archive at a time, so batch decompressing is not possible.
7zip can do the simple jobs, but it does not cover as much as Universal Extractor.
Ubuntu: p7zip-rar or p7zip or Archive Manager
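On Ubuntu these can be installed from the standard repositories (a minimal sketch; Archive Manager can then use 7z as a back-end for the extra formats):
sudo apt-get install p7zip-full p7zip-rar   # 7z command plus RAR extraction support
7z x archive.rar                            # placeholder archive name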
Related
I'm looking for a way to update a tgz file.
I know that I can update a tgz file by uncompressing it, inserting a file into the directory, and re-compressing it.
But I do not want a decompressed file to be created on my disk.
Tar has the 'r' option (--append), which appends files to the end of an archive, but when I use tar with the 'r' option, the console logs 'Cannot update compressed archives' and 'Error is not recoverable: exiting now'.
If it is impossible, so be it, but if there is a way, please let me know.
I modified my question according to Basile Starynkevitch's comment.
This is the command I tested with:
tar -rf myCompressedFile.tgz insert.txt
Result
tar: Cannot update compressed archives
tar: Error is not recoverable: exiting now
vim myCompressedFile.tgz
./file1.txt
./file2.txt
./file3.txt
./directory1/
./directory1/file4.txt
What I want it to contain after the update:
./file1.txt
./file2.txt
./file3.txt
./directory1/
./directory1/file4.txt
./insert.txt <<<< I want to add like this file
My tgz file is a bit more than 100 megabytes.
Is there a way to add or update a file in a tar.gz (tgz) archive without decompressing it?
No, there is no way.
That is because gzip compression happens on the entire tar archive (after making the tarball). Observe the runtime behavior of your tar command using strace(1) (so read syscalls(2)) or ltrace(1).
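For instance, a minimal sketch of such an observation, using the file names from the question:
strace -f -e trace=open,openat,read,write tar -rf myCompressedFile.tgz insert.txt 2>&1 | tail
The trace shows tar opening the archive, reading the gzip magic bytes at its start, and giving up, since it cannot append to the compressed stream in place.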
Read the documentation of tar(1). Study the source code of GNU tar (it is free software).
On my Debian/Sid/x86-64, the libc6-dev package provides the /usr/include/tar.h header. I invite you to look inside that header file; it describes the format of tarballs.
Consider other approaches: an sqlite database, PostgreSQL, MongoDB or GDBM, or an afio archive (where each file is perhaps compressed before being archived). See also the tardy utility.
I'm looking for a way to update a tgz file
Why ?
Did you consider instead using version control on the underlying files before their archival with tar? Something like git?
But I do not want a decompressed file to be created on my disk.
Why ?
The decompressed file might be a temporary one....
Of course, you would need different approaches depending on whether you deal with a few megabytes of data or a few petabytes. And if you are developing a critical embedded application (e.g. medical devices, interplanetary satellite software, DO-178C software systems), things could be different.
There are lots of trade-offs to consider (including money, development time, the economic impact of losing data or corrupting the archive, and legal regulations regarding the archive and its data integrity: in a flight recorder you should ensure the data is readable after an aircraft crash).
My tgz file is a bit more than 100 megabytes.
This is tiny, and in practice it would fit in the page cache (for most Linux systems in 2020, even a cheap Raspberry Pi). You might use some file system kept in RAM (e.g. tmpfs).
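A hedged sketch of the round trip done entirely in a RAM-backed tmpfs, so the decompressed file never touches the disk (paths are placeholders, and /dev/shm is assumed to be mounted as tmpfs, as it is on most Linux distributions):
cp myCompressedFile.tgz insert.txt /dev/shm/ && cd /dev/shm
gunzip myCompressedFile.tgz                         # yields myCompressedFile.tar, held in RAM only
tar -rf myCompressedFile.tar ./insert.txt           # appending works on the plain tar
gzip myCompressedFile.tar                           # yields myCompressedFile.tar.gz
mv myCompressedFile.tar.gz ~/myCompressedFile.tgz   # move the updated archive back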
I compressed a large regular Unix file (.dat) using the tar -cvzf command. This file is around 200 GB in size.
After compression it became 27 GB in size. But while reading data in that compressed file I can see anonymous data added at the start of the file.
Is this possible?
I tried to uncompress that file again and found that the uncompressed file has no such anonymous records.
The GNU tar command is free software. Please study its source code. Read of course its man page, tar(1).
Indeed, a tar archive starts with a header, documented in the header file tar.h. There is a POSIX standard related to tar.
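A minimal sketch of inspecting those leading bytes yourself (the file name is a placeholder):
od -c myfile.dat.tar.gz | head          # the compressed file starts with the gzip magic bytes 037 213
zcat myfile.dat.tar.gz | od -c | head   # the decompressed stream starts with a tar header: name, mode, uid, size, ...
What looks like "anonymous data" is that tar header (and the gzip header before decompression), not part of your .dat file.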
See also Peter Miller's tardy utility.
Don't confuse tar archives with zip ones, which are handled by Info-ZIP (the zip and unzip commands).
GNU zip (a compressor, the gzip program, which can be started by tar, notably by your tar czvf command) is also free software, and of course you should study its source code if interested.
Some Unix shells or toolboxes (notably sash or BusyBox) have a built-in tar.
I tried to uncompress that file again and found that the uncompressed file has no such anonymous records.
AFAIK, most Linux filesystems implement more or less the POSIX standard, based upon the read(2) and write(2) system calls, and they don't know about records. If you need "records", consider using databases (like sqlite or PostgreSQL) or indexed files (like GDBM), both built above Linux file systems or block devices.
Read also a good textbook on operating systems.
Notice that "a large regular unix file" is mostly a sequence of bytes. There is no notion of records inside them, except as a convention used by other user-space programs thru syscalls(2). See also path_resolution(7) and inode(7).
I manage a computer cluster. It is a multi-user system. I have a large directory filled with files (terabytes in size). I'd like to compress it so the user who owns it can save space and still be able to extract files from it.
Challenges with possible solutions :
tar: The directory's size makes it challenging to extract anything from the resulting tarball, due to tar's poor random-access reads. I'm referring to the canonical way of compressing, i.e. tar cvzf mytarball.tar.gz mybigdir
squashfs: This appears to be a great solution, except that mounting it requires root access. I don't really want to be involved in mounting their squashfs file every time they want to access a file.
Compress then tar: I could compress the files first and then use tar to create the archive. The disadvantage is that I wouldn't save as much space with compression, and I wouldn't get back any inodes.
Similar questions (here) have been asked before, but the solutions are not appropriate in this case.
QUESTION:
Is there a convenient way to compress a large directory such that it is quick and easy to navigate and doesn't require root permissions?
You added it in the tags but do not mention it in the question: for me, zip is the simplest way to manage big archives (with many files). Moreover, tar+gzip is really a two-step operation that needs special tricks to speed up. And zip is available on a lot of platforms, so you win in that direction too.
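A hedged sketch of that workflow, assuming the directory is called mybigdir and the inner path is a placeholder:
zip -r mybigdir.zip mybigdir                 # create the archive, no root access needed
unzip -l mybigdir.zip                        # listing reads the central directory, so it is fast
unzip mybigdir.zip mybigdir/sub/file.dat     # extract a single file without unpacking the rest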
I am trying different ways to update/write an image on a Linux device and am using rsync for this.
For file system synchronization, rsync checks and only transfers missing/changed files, reducing the bandwidth used.
In a similar way, I created a 10 MB binary file (original.bin) and modified it with a few changes (modified.bin), then tried to rsync the original.bin file. The first time it transferred the whole file, as there was no copy on the device. Next, modified.bin was renamed to original.bin and rsynced again; it only transferred the changes in modified.bin. I want to know if this is the same with .dd.xz files as well. I have two .dd.xz files (image1.dd.xz and image2.dd.xz, which has a few DLLs and Mono packages added), and when these files are extracted to .dd files, rsync transfers only the changes.
But when I rsync the files as .dd.xz, it transfers the whole file again. Can someone help me understand whether this is expected behaviour, and whether rsync behaves the same on .dd files as on any other files?
xz is the extension used by the xz compression tool. Compressed files don't work well with rsync's delta transfer, because a small change in the input can change the entire compressed output.
Consider whether you're better off rsyncing the dd images without compressing them. You can (de)compress them faster using the pixz command, which does its job in parallel using all available processors.
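A minimal sketch, assuming pixz is installed and the destination path is a placeholder:
pixz -d image1.dd.xz image1.dd         # parallel decompress to the raw image
rsync -av image1.dd device:/images/    # delta transfer can now match unchanged regions
pixz image1.dd image1.dd.xz            # re-compress in parallel if you still need the .xz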
Usually I create archives of data on Linux at the command line with tar & gzip (or pigz, as this uses parallel processing for compression).
However, listing the contents of such an archive is painfully slow because of the sequential format of tar archives. This is especially true if an archive contains many files that are several GB each.
What is an alternative to this combination for creating compressed tar-like archives of files on Linux? In particular, I'm looking for something that allows retrieval of the list or tree of files inside the archive, similar to tar, but much more performant.
zip? The zip file format contains a catalog of the contents (at the end, IIRC), which can be retrieved with zipinfo(1).
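For example (the archive name is a placeholder):
zipinfo archive.zip | head     # reads the central directory at the end of the file; no full scan of the data
unzip -l archive.zip           # an alternative listing built into unzip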
7zip is probably the best solution nowadays.
The ZIP format is a bit outdated and was designed for the FAT filesystem, which is where many of its limitations come from.
dar might also be an option, but as far as I can tell there is only one developer and no community around it (unlike 7zip, which has several forks and ports made by independent developers).
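A minimal sketch of the create-and-list commands, with placeholder names:
7z a archive.7z mydata/        # create a 7z archive
7z l archive.7z                # listing is fast, read from the archive's index
tar tzf archive.tar.gz         # for comparison: must decompress the whole stream just to list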