Large Zip file is corrupted on 32-bit JDK - zip

We are creating a single Zip file from multiple files using ZipOutputStream (on a 32-bit JDK).
If we create the Zip file from 5 PDF files (each PDF is 1 GB), it produces a corrupt Zip file. If we create it from 4 PDF files (each PDF is 1 GB), it produces a correct Zip file.
Is there any limit on Zip file size on a 32-bit JDK?

The original ZIP format capped several quantities (the uncompressed size of a file, the compressed size of a file, and the total size of the archive) at 4 GB.
More info: http://en.wikipedia.org/wiki/ZIP_(file_format)
The original zip format had a 4 GiB limit on various things (uncompressed size of a file, compressed size of a file and total size of the archive), as well as a limit of 65535 entries in a zip archive. In version 4.5 of the specification (which is not the same as v4.5 of any particular tool), PKWARE introduced the "ZIP64" format extensions to get around these limitations, raising the limits to 16 EiB (2^64 bytes).
The File Explorer in Windows XP does not support ZIP64, but the Explorer in Windows Vista does. Likewise, some libraries, such as DotNetZip and IO::Compress::Zip in Perl, support ZIP64. Java's built-in java.util.zip has supported ZIP64 since Java 7.
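Since five 1 GB PDFs push the archive past the 4 GiB boundary, the practical fix is to build it with Java 7 or later, where java.util.zip writes the ZIP64 extensions automatically once an entry or the whole archive exceeds the classic limits. A minimal sketch (the file names are placeholders):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class LargeZip {
    public static void main(String[] args) throws IOException {
        // Hypothetical input files, each around 1 GB.
        Path[] inputs = {
                Paths.get("doc1.pdf"), Paths.get("doc2.pdf"), Paths.get("doc3.pdf"),
                Paths.get("doc4.pdf"), Paths.get("doc5.pdf")
        };

        try (ZipOutputStream zos = new ZipOutputStream(
                Files.newOutputStream(Paths.get("bundle.zip")))) {
            for (Path input : inputs) {
                zos.putNextEntry(new ZipEntry(input.getFileName().toString()));
                Files.copy(input, zos);  // streams the file; nothing is held in memory
                zos.closeEntry();
            }
        }
        // On Java 7+, ZipOutputStream emits ZIP64 records automatically when the
        // archive or an entry grows past the original 4 GiB / 65535-entry limits.
    }
}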

Related

Why does PRDownloader have a download limitation?

Hello, I am using the PRDownloader library to download files, but there seems to be some limitation with the file size.
implementation 'com.mindorks.android:prdownloader:0.6.0'
When I download files of about 20 MB there is no problem, they download fine, but if I download a 25 MB file it doesn't download completely; I end up with a file of only 2.21 KB.
Why doesn't it let me download larger files?
How can I remove this limitation, to be able to download larger files?
Thank you.
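For reference, the basic PRDownloader flow looks roughly like this (a sketch based on the library's documented usage; the url, dirPath, fileName and context values are placeholders):

import com.downloader.Error;
import com.downloader.OnDownloadListener;
import com.downloader.PRDownloader;

public class DownloadSketch {
    public static void start(android.content.Context context,
                             String url, String dirPath, String fileName) {
        PRDownloader.initialize(context.getApplicationContext());

        PRDownloader.download(url, dirPath, fileName)
                .build()
                .setOnProgressListener(progress -> {
                    // progress.currentBytes / progress.totalBytes as bytes arrive
                })
                .start(new OnDownloadListener() {
                    @Override
                    public void onDownloadComplete() {
                        // the file has been fully written to dirPath/fileName
                    }

                    @Override
                    public void onError(Error error) {
                        // inspect the error (connection vs. server error)
                    }
                });
    }
}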

NiFi Excel file

I have an Excel file that has several sheets. The ConvertExcelToCSV processor cannot convert it. How can I solve this error?
org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor.error
IOException thrown from ConvertExcelToCSVProcessor[id=a0aa3e82-1d3c-18c0-b1a7-b9d4429d5958]:
java.io.IOException: Zip bomb detected! The file would exceed the max.
ratio of compressed file size to the size of the expanded data. This
may indicate that the file is used to inflate memory usage and thus
could pose a security risk. You can adjust this limit via
ZipSecureFile.setMinInflateRatio() if you need to work with files
which exceed this limit. Uncompressed size: 358510, Raw/compressed
size: 3584, ratio: 0.009997 Limits: MIN_INFLATE_RATIO: 0.010000,
Entry: xl/styles.xml
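The message comes from Apache POI's zip-bomb protection: xl/styles.xml compresses to a ratio of 0.009997, just under the default MIN_INFLATE_RATIO of 0.010000. When reading the workbook directly with POI, the limit named in the error can be relaxed before opening the file; a sketch, assuming the POI libraries are on the classpath and the file is trusted (the path is a placeholder; the NiFi processor itself would need an equivalent setting exposed to it):

import java.io.FileInputStream;

import org.apache.poi.openxml4j.util.ZipSecureFile;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ReadTightlyCompressedXlsx {
    public static void main(String[] args) throws Exception {
        // Lower the zip-bomb threshold (default 0.01) before opening the workbook.
        // 0.001 is only an illustrative value; use it for trusted input only.
        ZipSecureFile.setMinInflateRatio(0.001);

        try (FileInputStream in = new FileInputStream("workbook.xlsx");
             XSSFWorkbook workbook = new XSSFWorkbook(in)) {
            System.out.println("Sheets: " + workbook.getNumberOfSheets());
        }
    }
}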

Why does it take so long to delete about 5000 files of very small size (about 100 bytes) on Windows 10 64-bit?

I was looking for a specific file in Windows Explorer when I noticed that there were copies of the "desktop.ini" file everywhere. I went to the parent folder, ran a Windows search for "desktop.ini", and found about 5000 files with the same name at various locations (subfolders), all in the range of about 100-200 bytes. I selected all of them and deleted them. Why did it take Windows about 2.5 minutes to delete all of those files? Assuming an average file size of about 150 bytes, the amount of data to be deleted should be approximately (150*5000) / 1024 kB, i.e. about 732 kB. How is Windows then able to delete a single file of a much greater size instantaneously?
The delete operation is O(n), where n is the item count, not the file size....
In English:
your files aren't "erased"; only their directory entries are deleted. The time this takes depends mostly on the number of files, not the number of bytes.
By the way, this is how "deleted" files can be recovered: by restoring their directory entries.

Does the unzip algorithm run over the whole compressed data, or over a certain number of bytes?

I have to unzip a file that is being downloaded from a server. I have gone through the zip file structure. What I want to understand is how the compressed data is constructed: does the compression algorithm run over the whole file and produce one output, or does it run over, say, 256 bytes, output the result, and then take the next 256 bytes?
Similarly, do I need to download the whole file before running the decompression algorithm, or can I download 256 bytes (for example) and run the algorithm on that?
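For what it is worth, DEFLATE (the usual compression method inside a zip) is a streaming format, so entry data can be decompressed as the bytes arrive; only the central directory at the very end of the archive requires the whole file. A sketch using java.util.zip.ZipInputStream over an arbitrary InputStream (the URL and the 256-byte buffer are placeholders):

import java.io.InputStream;
import java.net.URL;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class StreamingUnzip {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; any InputStream works, e.g. a socket or a partial download.
        try (InputStream raw = new URL("https://example.com/archive.zip").openStream();
             ZipInputStream zin = new ZipInputStream(raw)) {

            byte[] buf = new byte[256];           // decompress in small chunks
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                long total = 0;
                int n;
                while ((n = zin.read(buf)) > 0) { // data is inflated as it is read
                    total += n;                   // hand buf[0..n) to whatever needs it
                }
                System.out.println(entry.getName() + ": " + total + " bytes");
                zin.closeEntry();
            }
        }
    }
}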

With which data format can I distribute a large number of small files?

I am about to publish a machine learning dataset. This dataset contains about 170,000 files (PNG images of 32px x 32px). I first wanted to share them as a zip archive (57.2 MB). However, extracting those files takes extremely long (more than 15 minutes - I'm not sure when I started).
Is there a better format to share those files?
Try .tar.xz - better compression ratio but a little slower to extract than .tar.gz
I just did some benchmarks:
Experiments / Benchmarks
I used dtrx to extract each archive and ran time dtrx filename to get the timings.
Format       File size   Time to extract
.7z          27.7 MB     > 1 h
.tar.bz2     29.1 MB     7.18 s
.tar.lzma    29.3 MB     6.43 s
.xz          29.3 MB     6.56 s
.tar.gz      33.3 MB     6.56 s
.zip         57.2 MB     > 30 min
.jar         70.8 MB     5.64 s
.tar         177.9 MB    5.40 s
Interesting. The extracted content is 47 MB in total. Why is .tar more than 3 times the size of its content?
Anyway. I think tar.bz2 might be a good choice.
Just use tar.gz at the lowest compression level, mainly to get rid of the tar padding between files: tar pads every entry to a 512-byte boundary with zeros, which is also why the plain .tar above is so much larger than its contents. PNG files are already compressed, so there is no point in trying to compress them further. (Though you can use various tools to try to minimize the size of each PNG file before putting it into the distribution.)
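If the archive has to be built programmatically rather than with the command-line tar, the same idea can be sketched with Apache Commons Compress (an assumed dependency; the original answer names no tool, and the directory and file names below are placeholders): write a tar stream through a gzip stream set to the fastest compression level.

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.Deflater;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipParameters;

public class PackPngs {
    public static void main(String[] args) throws IOException {
        Path imageDir = Paths.get("images");        // placeholder: folder with the PNG files
        Path target = Paths.get("dataset.tar.gz");  // placeholder output name

        // Fastest gzip level: the PNGs will not shrink further, but the 512-byte
        // tar padding between entries still compresses away almost for free.
        GzipParameters gzip = new GzipParameters();
        gzip.setCompressionLevel(Deflater.BEST_SPEED);

        List<Path> files;
        try (Stream<Path> walk = Files.walk(imageDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target));
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(out, gzip);
             TarArchiveOutputStream tarOut = new TarArchiveOutputStream(gzOut)) {
            tarOut.setLongFileMode(TarArchiveOutputStream.LONGFILE_POSIX);
            for (Path file : files) {
                TarArchiveEntry entry = new TarArchiveEntry(
                        file.toFile(), imageDir.relativize(file).toString());
                tarOut.putArchiveEntry(entry);
                Files.copy(file, tarOut);
                tarOut.closeArchiveEntry();
            }
        }
    }
}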
