Why can't we gzip more than 1 file using 7zip

Recently I noticed that if I try to compress more than one file using 7-Zip, the gzip format is not offered in the Archive Format list. Can anyone explain why?
Can't we have more than one file in a gzip archive?
I'm using 7-Zip v9.20. I also tried v16, with the same result.
Thanks in advance.

No. You can't have more than one file in the gzip format.
Instead you would use tar followed by gzip. tar converts a set of directories and files into a stream of bytes, which is then compressed by gzip. You have probably already seen these files, with the suffix .tar.gz.
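
The two steps can be seen separately in a shell. This is a minimal sketch, assuming a Unix-like system with tar and gzip installed; the file names a.txt and b.txt and the temporary directory are made up for illustration.

```shell
# Work in a throwaway directory (hypothetical sample files).
work=$(mktemp -d)
printf 'first file\n'  > "$work/a.txt"
printf 'second file\n' > "$work/b.txt"

# Step 1: tar turns the two files into one byte stream on stdout.
# Step 2: gzip compresses that single stream.
tar -cf - -C "$work" a.txt b.txt | gzip > "$work/both.tar.gz"

# List the archive to confirm both files are inside.
listing=$(gzip -dc "$work/both.tar.gz" | tar -tf -)
echo "$listing"
```

In everyday use the two steps are usually combined as tar czf both.tar.gz a.txt b.txt; the pipeline form is only there to make the division of labour visible.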

Related

File compressed with tar gzip or zip has slightly the same size

I tried to compress the same file with different compression types:
tar cjf dump.sql.tar.bz dump.sql
tar czf dump.sql.tar.gz dump.sql
zip -r dump.sql.zip dump.sql
Results:
size (bytes)  file             approx.
79968725      dump.sql         ~77MB
9846256       dump.sql.tar.bz  ~9.4MB
13863797      dump.sql.tar.gz  ~14MB
13863826      dump.sql.zip     ~14MB
The best compression comes from bzip2. However, the original file was itself compressed with bzip2, and its size was only around 7.7MB. How can I get that level of compression?
Are there other compression types or options that give better results?
Also, why is the gzipped file almost the same size as the zipped file? I thought gzip had a better compression ratio than zip. Am I missing something?
Any tips or hints are welcome and will be appreciated.
Thank you.

Pretty much every compressor out there (gz, bz2, zip and xz included) lets you choose the compression level, usually from 1 to 9. The faster it compresses, the lower the compression ratio; the slower it is, the better the compression you get.
The best lossless compressor I know of is xz. It should give you better compression than bz2:
tar cvJf dump.sql.tar.xz dump.sql
What file size do you get with this?

Unzip the archive with more than one entry

I'm trying to decompress a ~8GB .zip file piped from a curl command. Everything I have tried gets interrupted at <1GB and returns the message:
... has more than one entry--rest ignored
I've tried: funzip, gunzip, gzip -d, zcat, ... also with different arguments - all end up in the above message.
The datafile is public, so it's easy to repro the issue:
curl -L https://archive.org/download/nycTaxiTripData2013/faredata2013.zip | funzip > datafile

Are you sure the mentioned file extracts to a single file? If it extracts to multiple files, you unfortunately cannot unzip it on the fly.
Zip is a container format as well as a compression format, and its index of entries (the central directory) is stored at the end of the archive, so a tool reading from a pipe cannot reliably tell where each new file begins. You'll have to download the whole file and then unzip it.
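
A download-first version of the command from the question might look like this. It is a sketch, assuming curl and unzip are installed; fetch_and_unzip is a hypothetical helper name, and writing every entry into one output file is an assumption about what is wanted here.

```shell
# fetch_and_unzip URL OUTFILE
# Download the whole archive first, then extract every entry to one output file.
fetch_and_unzip() {
    url=$1
    out=$2
    archive=$(mktemp) || return 1
    # -L follows redirects, -o saves to a file instead of stdout.
    curl -L -o "$archive" "$url" || return 1
    # Show the entries so you can see how many files the archive holds.
    unzip -l "$archive"
    # -p writes the contents of all entries to stdout, in archive order.
    unzip -p "$archive" > "$out"
    status=$?
    rm -f "$archive"
    return "$status"
}

# Example call (downloads about 8GB, so it is left commented out):
# fetch_and_unzip https://archive.org/download/nycTaxiTripData2013/faredata2013.zip datafile
```

If the entries are separate CSV files, each with its own header row, the concatenated output will contain those header rows more than once; extracting to a directory with unzip -d may be the better choice in that case.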

Regarding lz4mt compression and linux buffering issue

I am using lz4mt, the multi-threaded version of lz4. In my workflow I send thousands of large files (620 MB each) from a client to a server. When a file arrives on the server, a rule triggers that compresses the file using lz4mt and then removes the uncompressed file. The problem is that sometimes, after I remove the uncompressed file, the compressed file does not have the right size, because lz4mt returns immediately, before its output has been written to disk.
So is there any way to make lz4mt remove the uncompressed file itself after compressing, as bzip2 does?
Input: bzip2 uncompress_file
Output: Compressed file only
whereas
Input: lz4mt uncompress_file
Output: (Uncompressed + Compressed) file
I think the sync command in the script below is also not working properly.
The script that executes when my rule triggers is:
script.sh
/bin/lz4mt uncompressed_file output_file
/bin/sync
/bin/rm uncompressed_file
Please tell me how to solve this issue.
Thanks a lot.

Author here. You could try the following methods:
Chain the commands with && or ;.
Add the lz4mt command-line options -q (suppress prompt) and -f (force overwrite).
Try it with the original lz4.
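
The first suggestion, chaining with &&, could be sketched as below. compress_then_remove is a hypothetical helper, not part of lz4mt. It assumes the compressor is called as "cmd SRC DST" like in script.sh, and that a non-zero exit status or an empty output file means the compression failed.

```shell
# compress_then_remove SRC DST [COMPRESSOR [ARGS...]]
# Removes SRC only if the compressor exited successfully and DST is non-empty.
compress_then_remove() {
    src=$1
    dst=$2
    shift 2
    # Default to the command from the question, with the -q and -f options
    # suggested above; any tool invoked as "cmd SRC DST" can be passed instead.
    [ "$#" -gt 0 ] || set -- /bin/lz4mt -q -f
    "$@" "$src" "$dst" && [ -s "$dst" ] && rm -- "$src"
}
```

Called as compress_then_remove uncompressed_file output_file, it never reaches rm when the compression step fails, which is the difference between && and running the commands unconditionally one after another.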

Difference in .tar.gz and first gz and then tar

I made two compressed copies of my folder. The first I made with the command tar czf dir.tar.gz dir
This gives me an archive of size ~16kb. Then I tried another method: first I gzipped all the files inside the dir and then archived them with tar:
gzip ./dir/*
tar cf dir.tar dir/*.gz
but the second method gave me a dir.tar of size ~30kb (almost double). Why is there so much difference in size?

Because compression in general is more efficient on one big input than on many small files. Suppose, for example, that you have gzipped 100 files of 1 KB each. Each file gets a certain amount of compression, plus the overhead of the gzip format.
file1 -> file1.gz (assume 30 bytes of headers/footers)
file2 -> file2.gz (assume 30 bytes of headers/footers)
...
file100 -> file100.gz (assume 30 bytes of headers/footers)
------------------------------
30 * 100 = 3000 bytes, about 3 KB of overhead.
But if you compress a single tar file of 100 KB (which contains your 100 files), the overhead of the gzip format is added only once (instead of 100 times), and the compression can be better.

Overhead from the per-file metadata, plus suboptimal compression: when gzip processes files individually it never sees the data as a whole, so it compresses each file with a suboptimal dictionary (which is reset after each file).
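
The difference between the two methods can be reproduced with generated data. This is a sketch, assuming tar and gzip are installed; the 100 files here are made up and deliberately share most of their content, so the gap is larger than the one in the question.

```shell
work=$(mktemp -d)
mkdir "$work/dir"

# Shared body: the numbers 1..300, one per line (about 1 KB).
i=1
while [ "$i" -le 300 ]; do echo "$i"; i=$((i + 1)); done > "$work/body"

# 100 small files that differ only in their first line.
i=1
while [ "$i" -le 100 ]; do
    { echo "file $i"; cat "$work/body"; } > "$work/dir/f$i.txt"
    i=$((i + 1))
done

# Method 1: archive first, then compress the whole archive once.
tar -czf "$work/whole.tar.gz" -C "$work" dir

# Method 2: compress each file on its own, then archive the .gz files.
cp -R "$work/dir" "$work/dir2"
gzip "$work/dir2"/*
tar -cf "$work/parts.tar" -C "$work" dir2

whole_size=$(wc -c < "$work/whole.tar.gz" | tr -d ' ')
parts_size=$(wc -c < "$work/parts.tar" | tr -d ' ')
echo "tar then gzip: $whole_size bytes"
echo "gzip then tar: $parts_size bytes"
```

Besides the gzip headers and the per-file dictionary reset, the second method also pays for uncompressed tar headers and padding: tar stores each member in 512-byte blocks, and nothing compresses them afterwards.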

tar cf creates an uncompressed archive, which means the archive should be almost the same size as your directory, maybe even larger.
tar czf additionally runs the archive through gzip compression.
You can check this by running man tar at a shell prompt on Linux:
-z, --gzip, --gunzip, --ungzip
filter the archive through gzip

Fast Concatenation of Multiple GZip Files

I have a list of gzip files:
file1.gz
file2.gz
file3.gz
Is there a way to concatenate or gzip these files into one gzip file
without having to decompress them?
In practice we will use this in a web database (CGI), where the web application
receives a query from the user, lists all the files matching the query, and presents
them back to the user as one batch file.

With gzip files, you can simply concatenate the files together, like so:
cat file1.gz file2.gz file3.gz > allfiles.gz
Per the gzip RFC,
A gzip file consists of a series of "members" (compressed data sets). [...] The members simply appear one after another in the file, with no additional information before, between, or after them.
Note that this is not exactly the same as building a single gzip file of the concatenated data; among other things, all of the original filenames are preserved. However, gunzip seems to handle it as equivalent to a concatenation.
Since existing tools generally ignore the filename headers for the additional members, it's not easily possible to extract individual files from the result. If you want this to be possible, build a ZIP file instead. ZIP and GZIP both use the DEFLATE algorithm for the actual compression (ZIP supports some other compression algorithms as well as an option - method 8 is the one that corresponds to GZIP's compression); the difference is in the metadata format. Since the metadata is uncompressed, it's simple enough to strip off the gzip headers and tack on ZIP file headers and a central directory record instead. Refer to the gzip format specification and the ZIP format specification.
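
A quick round trip shows the behaviour described above. This is a sketch with two made-up input files, assuming gzip is installed.

```shell
work=$(mktemp -d)
printf 'alpha\n' > "$work/file1"
printf 'beta\n'  > "$work/file2"

# Compress each file separately, keeping the originals (-c writes to stdout).
gzip -c "$work/file1" > "$work/file1.gz"
gzip -c "$work/file2" > "$work/file2.gz"

# Concatenate the compressed files byte for byte; nothing is decompressed.
cat "$work/file1.gz" "$work/file2.gz" > "$work/allfiles.gz"

# Decompressing the result yields the concatenation of the original data.
joined=$(gzip -dc "$work/allfiles.gz")
echo "$joined"
```

The output contains the contents of both files back to back, with no marker showing where one member ends and the next begins.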

Here is what man 1 gzip says about your requirement.
Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once. For example:
gzip -c file1 > foo.gz
gzip -c file2 >> foo.gz
Then
gunzip -c foo
is equivalent to
cat file1 file2
Needless to say, file1 can be replaced by file1.gz.
Note this:
gunzip will extract all members at once
So to extract the members individually, you will have to use an additional tool, or write one yourself if you wish.
However, this is also addressed in the man page:
If you wish to create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip. GNU tar supports the -z option to invoke gzip transparently. gzip is designed as a complement to tar, not as a replacement.
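
As a sketch of what the man page recommends, the following builds a gzip-compressed tar archive and then pulls out a single member. The file names and the temporary directory are made up for illustration.

```shell
work=$(mktemp -d)
printf 'one\n' > "$work/file1"
printf 'two\n' > "$work/file2"

# -c create, -z filter through gzip, -f archive name.
tar -czf "$work/foo.tar.gz" -C "$work" file1 file2

# Extract only file2 into a separate directory; file1 is not extracted.
mkdir "$work/out"
tar -xzf "$work/foo.tar.gz" -C "$work/out" file2
ls "$work/out"
```

tar still has to decompress the stream from the start to reach file2, but unlike a plain concatenation of gzip members, the member names and boundaries are preserved.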

Just use cat. It is very fast (0.2 seconds for 500 MB for me)
cat *gz > final
mv final final.gz
You can then read the output with zcat to make sure it's pretty:
zcat final.gz
I tried the other answer's 'gzip -c' approach, but I ended up with garbage when using already gzipped files as input (I guess it double-compressed them).
PV:
Better yet, if you have it, 'pv' instead of cat:
pv *gz > final
mv final final.gz
This gives you a progress bar as it works, but does the same thing as cat.

You can create a tar file of these files and then gzip the tar file to create the new gzip file
tar -cvf newcombined.tar file1.gz file2.gz file3.gz
gzip newcombined.tar
