Reading zip file header efficiently in node.js

There are various zip modules for node. Generally they seem to follow a pattern like this:
// Creating a zipfile object
var zf = new zipfile.ZipFile('./test/data/world_merc.zip');
// the zipfile has a list of names:
// zf.names[0] === 'world_merc.prj'
The snippet above was lifted from the node-zipfile README (https://github.com/mapbox/node-zipfile), but a similar example exists for the AdmZip package: https://github.com/cthackers/adm-zip.
So this struck me as odd, because it appears both of these libraries assume synchronous code (at the very least, you need to open the file to read the header, which is blocking, right)?
So I dug into the implementation of AdmZip and it turns out you can pass a buffer to the AdmZip constructor, e.g. you can do this:
var fs = require('fs');
var AdmZip = require('adm-zip');

fs.readFile('./my_file.zip', function(err, buffer) {
    if (err) throw err;
    var zip = new AdmZip(buffer);
    var zipEntries = zip.getEntries();
});
But that's only marginally better, because it appears AdmZip expects me to read the whole file in just to access the header. I read the zip spec, and my understanding is that the "central directory file header", which lists the contents, is at the end of the file anyway.
So that was a very long lead-in to the question: does there exist a node library which will efficiently and asynchronously read the zip contents (i.e. not realize the entire zip file in memory if all I'm going to do is look at the central directory header)?

After much searching I found a suitable implementation that reads the header efficiently and asynchronously:
https://github.com/antelle/node-stream-zip
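A usage sketch based on node-stream-zip's documented API (the option names and events here come from the library's README; check your installed version before relying on them): the constructor takes the file path, reads only the central directory, and emits 'ready' once the entry table is available.

```javascript
var StreamZip = require('node-stream-zip');

var zip = new StreamZip({
    file: './test/data/world_merc.zip',
    storeEntries: true // keep the central directory entries in memory
});

zip.on('ready', function() {
    // Only the central directory has been read at this point,
    // not the compressed file data itself.
    console.log('Entries:', zip.entriesCount);
    Object.keys(zip.entries()).forEach(function(name) {
        console.log(name);
    });
    zip.close();
});

zip.on('error', function(err) {
    console.error(err);
});
```

Individual entries can then be streamed on demand, so the full archive is never realized in memory.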

Related

How to open a binary file in my case .nii file using node.js

I want to open a binary file; when I try to open it with the VS Code editor, it says that it can't be opened because it is a binary file.
Can someone explain what I can do in order to open this type of file and read its contents?
About the .nii file format: it is NIfTI-1, used in medical visualization such as MRI.
What I'm trying to do is read this file at the lowest level and then make some computations.
I would like to use Node.js for this, not Python or C++.
More details about the file format can be found here.
https://nifti.nimh.nih.gov/
I don't know how VS Code handles binary files, but with Atom (or another text editor like vi), you can open and view the content of a binary file. This is not very useful, however, as the content is not particularly human-readable, except maybe some metadata at the top of the file.
$ vim yourniifile.nii
Anyway, it all depends on what you want to do with that file, which "computation" you plan to apply to it, and how you will use it afterwards.
Luckily, there are some npm packages that can help you with the task of reading and processing that kind of file, like nifti-reader-js or nifti-js, for example:
const fs = require('fs');
const niftijs = require('nifti-js');
let rawData = fs.readFileSync('yourniifile.nii');
let data = niftijs.parse(rawData);
console.log(data);

adm-zip doesn't compress data

I'm trying to use adm-zip to add files from memory to a zip file also in memory. It seems that the zip file is created correctly (the result of saving zipData can be unzipped in Windows), but the compression ratio is always zero.
This is a model of the code that I expected to work but doesn't. As can be seen from the output, "compressedData" is null and "size" and "compressedSize" are the same whatever value is passed as the file content.
var admzip = require("adm-zip");
var zip = new admzip();
zip.addFile("tmp.txt", "aaaaaaaaaaaaaaaaaaaa");
var zipData = zip.toBuffer();
console.log(zip.getEntries()[0].toString());
https://runkit.com/embed/pn5kaiir12b0
How do I get it to compress the files as well as just zipping?
This is an old question, but for anyone who is also experiencing this issue: the reason is that adm-zip does not compress the data until the compressedData field is accessed for the first time.
Quote from the docs
[Buffer] Buffer compressedData When setting compressedData, the LOC
Data Header must also be present at the beginning of the Buffer. If
the compressedData was set for a ZipEntry and no other change was made
to its properties (comment, extra etc), reading this property will
return the same value. If changes had been made, reading this property
will recompress the data and recreate the entry headers.
If no compressedData was specified, reading this property will
compress the data and create the required headers.
The output of the compressedData Buffer contains the LOC Data Header

nodejs - change contents of zip file without re-generating the whole archive

I'm using node-zip (which uses JSZip under the hood). I need to modify the contents of a zip file, and I'd love to be able to modify it without generating the entire zip again, because it can take a long time for large archives. Here's an example:
var zip = new JSZip()
// Add some files to the zip
zip.file('file1', "file 1 contents\n")
zip.file('file2', "file 2 contents\n")
zip.file('file3', "file 3 contents\n")
// Generate the zip file
buffer = zip.generate()
// Make some changes
zip.file('file1', "changed file contents\n")
// Now I have to generate the entire zip again to get the buffer
buffer = zip.generate()
How can I do something like
updatedBuffer = zip.updateFile(buffer, 'file1', 'changed file contents\n')
where I get an updated archive buffer but only have to spend CPU cycles updating the one file?
Assuming JSZip v2 here (zip.generate()):
You can get the buffer with asNodeBuffer(), modify it and update the content for your file:
var buffer = zip.file("file1").asNodeBuffer();
// change buffer
zip.file("file1", buffer);
Edit: if you mean editing in place a zip file stored on disk: no, JSZip can't do that.

File much larger after copying with it.bytes

I wanted to copy a file from one location to another using a Groovy script. I found that the copied file was orders of magnitude larger than the original after copying.
After some trial and error I found the correct way to copy but am still puzzled as to why it should be bigger.
def existingFile = new File("/x/y/x.zip")
def newFile1 = new File("/x/y/y.zip")
def newFile2 = new File("/x/y/z.zip")
newFile1 << existingFile.bytes
newFile2.bytes = existingFile.bytes
If you run this code, newFile1 will be much larger than existingFile, while newFile2 will be the same size as existingFile.
Note that both zip files are valid afterwards.
Does anyone know why this happens? Am I using the first copy incorrectly? Or is it something odd in my setup?
If the file already exists before this code is called, then you'll get different behaviour from << and .bytes = ...
file << byteArray
will append the contents of byteArray to the end of the file, whereas
file.bytes = byteArray
will overwrite the file with the specified content. When the byteArray is ZIP data, both versions will give you a result that is a valid ZIP file, because the ZIP format can cope with arbitrary data prepended to the beginning of the file before the actual ZIP data without invalidating the file (typically this is used for things like self-extracting .exe stubs).

node.js readFile race condition - reading file when not fully written to disk

My node.js app runs a function every second to (recursively) read a directory tree for .json files. These files are uploaded to the server via FTP from clients, and are placed in the folder that the node script is running in.
What I've found (at least what I think is happening) is that node is not waiting for the .json file to be fully written before trying to read it, and as such is throwing an 'Unexpected end of input' error. It seems as though the filesystem needs a few seconds (or milliseconds, maybe) to write the file properly. This could also have something to do with the file being written over FTP (overheads possibly; I'm totally guessing here...)
Is there a way to wait for the file to be fully written to the filesystem before trying to read it with node?
fs.readFile(file, 'utf8', function(err, data) {
var json = JSON.parse(data); // throws error
});
You can check to see if the file is still growing with this:
https://github.com/felixge/node-growing-file
