Should all file structures in a ZIP file be consecutive? - zip

While reading a ZIP file, can we safely assume that all file structures (by that I mean Local File Header + file data (compressed or stored) + Data Descriptor) are exactly consecutive? Can there be any irrelevant data in between?

PKWARE's APPNOTE states:
"Immediately following the local header for a file is the compressed
or stored data for the file. The series of [local file header][file
data][data descriptor] repeats for each file in the .ZIP archive."
So there should be no gaps between them.
However, I would recommend parsing the central directory rather than walking the local file headers (unless you need streamed processing).
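As an illustration of reading the central directory instead of walking local headers, here is a minimal Python sketch. It assumes a non-ZIP64 archive with no data prepended before the first local header, and that the archive comment does not itself contain the End of Central Directory signature; the function names are hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
CDH_SIG = 0x02014B50

def find_eocd(data: bytes) -> int:
    """Return the offset of the End of Central Directory (EOCD) record.

    The EOCD is 22 bytes plus an optional comment of up to 65535 bytes,
    so it must start within the last 22 + 65535 bytes of the file.
    """
    pos = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 0xFFFF))
    if pos < 0:
        raise ValueError("no End of Central Directory record found")
    return pos

def list_central_directory(data: bytes):
    """Return (name, local_header_offset, compressed_size) for each entry."""
    eocd = find_eocd(data)
    # sig, this disk, CD start disk, entries on this disk, total entries,
    # CD size, CD offset, comment length
    _, _, _, _, total, _, cd_offset, _ = struct.unpack_from("<IHHHHIIH", data, eocd)
    entries = []
    pos = cd_offset
    for _ in range(total):
        # 46-byte fixed part of a central directory file header
        fields = struct.unpack_from("<I6H3I5H2I", data, pos)
        if fields[0] != CDH_SIG:
            raise ValueError(f"bad central directory header at offset {pos}")
        flags, csize = fields[3], fields[8]
        nlen, xlen, clen = fields[10], fields[11], fields[12]
        local_offset = fields[16]
        raw_name = data[pos + 46 : pos + 46 + nlen]
        # General purpose flag bit 11 means the name is UTF-8, otherwise CP437
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries.append((name, local_offset, csize))
        pos += 46 + nlen + xlen + clen
    return entries
```

Each local header offset obtained this way can be used to jump straight to an entry, so gaps or junk between entries do not matter.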

Related

Steganography with Microsoft Word - DOCX

I am writing an application that hides a string within a .docx file.
A .docx file consists of a collection of XML files contained inside a ZIP archive, so my program treats the file as a ZIP file and hides the secret string in it.
After some research, I found a way to insert data into a ZIP archive.
The secret string is injected after the last file's data, right before the first central directory header. After that, the offset of the start of the central directory, stored in the end of central directory record, is updated to compensate for the shift of the central directory.
My output .docx file works fine with typical file archivers (7-Zip, WinRAR, File Roller, etc.) and file managers (Windows Explorer).
But when I open my output .docx file with Microsoft Word, it says:
Here are links to the input and output files:
What step did I get wrong, or what am I missing?
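To make the injection procedure described in this question concrete, here is a minimal Python sketch of it. It only reproduces the steps above (insert bytes before the central directory, patch the EOCD offset field at byte 16 of the record); it does not explain or fix the Word error. It assumes a non-ZIP64 archive whose comment does not contain the EOCD signature, and the function name is hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"

def inject_before_central_directory(data: bytes, secret: bytes) -> bytes:
    """Insert `secret` between the last entry's data and the central directory.

    Only the "offset of start of central directory" field (EOCD offset 16) is
    patched. The local header offsets stored in the central directory stay
    valid because the local headers themselves do not move.
    """
    eocd = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 0xFFFF))
    if eocd < 0:
        raise ValueError("no End of Central Directory record found")
    (cd_offset,) = struct.unpack_from("<I", data, eocd + 16)
    out = bytearray(data[:cd_offset] + secret + data[cd_offset:])
    # The EOCD record itself has shifted by len(secret); patch its CD offset.
    struct.pack_into("<I", out, eocd + len(secret) + 16, cd_offset + len(secret))
    return bytes(out)
```

Lenient readers such as Python's `zipfile` accept the result, consistent with the archivers listed in the question; a stricter consumer may still reject bytes that belong to no entry.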

Finding the other pieces of a split ZIP archive

The following files exist:
compress.zip
compress.z01
compress.z02
compress.z03
compress.z04
compress.z05 (fake)
An error occurs in the above case.
Is there a way to get information about the other fragments of the split archive by reading compress.zip?
For example, information about z01, z02, z03, and z04.
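One way to approach this, sketched in Python: in a split archive the .zip file is the last segment, and its End of Central Directory record has a zero-based "number of this disk" field, so with compress.z01 to compress.z04 present it should read 4, and any compress.z05 is not part of the set. Each central directory entry also has a "disk number start" field telling which segment that file begins on. This sketch assumes a non-ZIP64 archive (ZIP64 stores the disk count in a separate locator record); the function name is hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"

def expected_segments(last_segment: bytes, base: str) -> list:
    """Return the names of the .zNN segments that should precede `base`.zip.

    `last_segment` is the content (or at least the tail) of the .zip file.
    The EOCD "number of this disk" field is zero-based, so for the last
    segment it equals the number of .zNN parts before it.
    """
    pos = last_segment.rfind(EOCD_SIG, max(0, len(last_segment) - 22 - 0xFFFF))
    if pos < 0:
        raise ValueError("no End of Central Directory record found")
    (this_disk,) = struct.unpack_from("<H", last_segment, pos + 4)
    return [f"{base}.z{n:02d}" for n in range(1, this_disk + 1)]
```

For a normal, non-split archive the field is 0 and the function returns an empty list.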

Is there a tool to extract a file from a ZIP archive when that file is not present in central directory but has its own LFH?

I'm looking for a tool that can extract files by aggressively searching through a ZIP archive. The compressed files are preceded by local file headers (LFHs), but no central directory headers (CDHs) are present, and unzip outputs an empty folder.
I found one called binwalk, but although it finds the hidden files inside ZIP archives, it doesn't seem to know how to extract them.
Thank you in advance.
You can try sunzip. It reads the zip file as a stream, and will extract files as it encounters the local headers and compressed data.
Use the -r option to retain the files decompressed in the event of an error. You will be left with a temporary directory starting with _z containing the extracted files, but with temporary, random names.
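If you would rather script it, the idea of reading local headers as a stream can be sketched in Python. This is not how sunzip is implemented, just a minimal illustration: it scans for the local file header signature, handles stored entries whose size is in the header and deflated entries (a deflate stream marks its own end, so no size is needed), and skips anything else. Encryption and ZIP64 are not handled, and the function names are hypothetical.

```python
import struct
import zlib

LFH_SIG = b"PK\x03\x04"

def _read_entry(data, start, method, flags, csize):
    """Return (content, end_offset), or None if the entry can't be decoded."""
    if method == 0 and not flags & 0x08:  # stored, size known from the header
        end = start + csize
        return (data[start:end], end) if end <= len(data) else None
    if method == 8:  # deflate
        inflater = zlib.decompressobj(-15)  # -15: raw deflate, no zlib header
        try:
            content = inflater.decompress(data[start:])
        except zlib.error:
            return None
        if not inflater.eof:
            return None  # stream is truncated
        return content, len(data) - len(inflater.unused_data)
    return None

def scan_local_headers(data: bytes):
    """Recover (name, content) pairs without using the central directory."""
    results = []
    pos = data.find(LFH_SIG)
    while pos != -1 and pos + 30 <= len(data):
        (_, _, flags, method, _, _, crc, csize, _,
         nlen, xlen) = struct.unpack_from("<IHHHHHIIIHH", data, pos)
        name = data[pos + 30 : pos + 30 + nlen].decode("utf-8", "replace")
        entry = _read_entry(data, pos + 30 + nlen + xlen, method, flags, csize)
        if entry is None:
            # False signature match or unsupported entry: keep searching
            pos = data.find(LFH_SIG, pos + 4)
            continue
        content, end = entry
        # With flag bit 3 the real CRC is in the data descriptor, so only
        # verify the header CRC when that bit is clear.
        if flags & 0x08 or zlib.crc32(content) == crc:
            results.append((name, content))
        pos = data.find(LFH_SIG, end)
    return results
```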

Parse bytes of a zip file?

I am requesting a zip file from an API, retrieving it by byte range (setting a Range header), and then trying to parse each of the parts individually. After reading a bit about gzip and zip compression, I'm having a hard time figuring out:
Can I parse a portion of a zip file?
I know that a gzip file usually compresses a single file, so you can decompress and parse it in parts, but what about zip files?
I am using Node.js and tried several libraries like adm-zip and zlib, but they don't seem to allow this.
Zip files have a catalog at the end of the file (in addition to the same basic information before each item), which lists the file names and the location in the zip file of each item. Generally each item is compressed using deflate, which is the same algorithm that gzip uses (but gzip has a custom header before the deflate stream).
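So yes, it's entirely feasible to extract the compressed byte stream for one item in a zip file and prepend a fabricated gzip header (the minimum gzip header is 10 bytes, per RFC 1952) to allow you to decompress just that file by passing it to gunzip. gunzip also expects an 8-byte trailer (CRC-32 and uncompressed size); both values are stored in the item's zip headers, so you can append that too.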
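A minimal sketch of the header-and-trailer trick, written in Python rather than Node.js purely for illustration. It assumes you already have the entry's local header offset, compressed size, CRC-32 and uncompressed size (for example from the central directory), and that the entry uses deflate (method 8); the function names are hypothetical.

```python
import struct

def raw_entry_bytes(zip_data: bytes, header_offset: int, compress_size: int) -> bytes:
    """Slice one entry's compressed bytes out of a zip file."""
    # File name length and extra field length sit at offsets 26 and 28
    # of the 30-byte local file header.
    nlen, xlen = struct.unpack_from("<HH", zip_data, header_offset + 26)
    start = header_offset + 30 + nlen + xlen
    return zip_data[start : start + compress_size]

def wrap_as_gzip(raw_deflate: bytes, crc32: int, uncompressed_size: int) -> bytes:
    """Turn a raw deflate stream into a single-member gzip file (RFC 1952)."""
    # ID1, ID2, CM=8 (deflate), FLG=0, MTIME=0, XFL=0, OS=255 (unknown): 10 bytes
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0, 0, 0, 255)
    # Trailer: CRC-32 and uncompressed size modulo 2**32, little-endian
    trailer = struct.pack("<II", crc32 & 0xFFFFFFFF, uncompressed_size & 0xFFFFFFFF)
    return header + raw_deflate + trailer
```

The output of `wrap_as_gzip` can be written to a file and passed to gunzip, or decompressed with any gzip library. With HTTP range requests, only the byte range covering the local header and its compressed data needs to be fetched once the offsets are known.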
If you want to write code to inflate the deflated stream yourself, I recommend you make a different plan. I've done it, and it's really not fun. Use zlib if you must do it, don't try to reimplement the decompression.

Find data file in Zip file using microcontroller

I need a microcontroller to read a data file that's been stored in a Zip file (actually a custom OPC-based file format, but that's irrelevant). The data file has a known name, path, and size, and is guaranteed to be stored uncompressed. What is the simplest & fastest way to find the start of the file data within the Zip file without writing a complete Zip parser?
NOTE: I'm aware of the Zip comment section, but the data file in question is too big to fit in it.
I ended up writing a simple parser that finds the file data in question using the central directory.
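Such a parser can stay very small. Here is a sketch in Python (on a microcontroller this would be C reading the same fields with seeks, but the offsets are identical). It assumes a non-ZIP64 archive whose comment does not contain the EOCD signature, and that the wanted entry is stored (method 0); the function name is hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"

def find_stored_file(data: bytes, wanted: str):
    """Return (data_offset, size) of an uncompressed entry named `wanted`."""
    eocd = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 0xFFFF))
    if eocd < 0:
        raise ValueError("no End of Central Directory record found")
    # Total entry count, central directory size, central directory offset
    total, _, cd_offset = struct.unpack_from("<HII", data, eocd + 10)
    target = wanted.encode("utf-8")
    pos = cd_offset
    for _ in range(total):
        # Only the fields we need from the 46-byte central directory header
        (method,) = struct.unpack_from("<H", data, pos + 10)
        (csize,) = struct.unpack_from("<I", data, pos + 20)
        nlen, xlen, clen = struct.unpack_from("<HHH", data, pos + 28)
        (local_offset,) = struct.unpack_from("<I", data, pos + 42)
        if data[pos + 46 : pos + 46 + nlen] == target:
            if method != 0:
                raise ValueError(f"{wanted} is compressed (method {method})")
            # The local header's extra field may differ in length from the
            # central one, so read both lengths from the local header itself.
            lnlen, lxlen = struct.unpack_from("<HH", data, local_offset + 26)
            return local_offset + 30 + lnlen + lxlen, csize
        pos += 46 + nlen + xlen + clen
    raise KeyError(wanted)
```

If the archive is known to have no comment, the EOCD is simply the last 22 bytes, which avoids the backward scan on a constrained device.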
