Change the contents of a zip file without re-generating the whole archive (node.js)

I'm using node-zip (which uses JSZip under the hood). I need to modify the contents of a zip file, and I'd love to be able to modify it without generating the entire zip again, because it can take a long time for large archives. Here's an example:
var zip = new JSZip()
// Add some files to the zip
zip.file('file1', "file 1 contents\n")
zip.file('file2', "file 2 contents\n")
zip.file('file3', "file 3 contents\n")
// Generate the zip file
var buffer = zip.generate()
// Make some changes
zip.file('file1', "changed file contents\n")
// Now I have to generate the entire zip again to get the buffer
buffer = zip.generate()
How can I do something like
updatedBuffer = zip.updateFile(buffer, 'file1', 'changed file contents\n')
where I get an updated archive buffer but only spend CPU cycles updating that one file?

Assuming JSZip v2 here (zip.generate()):
You can get the buffer with asNodeBuffer(), modify it and update the content for your file:
var buffer = zip.file("file1").asNodeBuffer();
// change buffer
zip.file("file1", buffer);
Edit: if you mean editing a zip file stored on disk in place: no, JSZip can't do that.
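For completeness, a minimal end-to-end sketch, assuming the JSZip v2 API (load() parses an existing archive, generate({ type: 'nodebuffer' }) serializes it; existingBuffer is a hypothetical input):
var JSZip = require('jszip');
var zip = new JSZip();
// Parse the archive we already have in memory
zip.load(existingBuffer);
// Read one entry, modify it, and put it back
var content = zip.file('file1').asNodeBuffer();
zip.file('file1', Buffer.concat([content, Buffer.from('appended\n')]));
// Note: this still re-serializes the whole archive; JSZip v2 has no
// incremental update of an existing buffer
var updatedBuffer = zip.generate({ type: 'nodebuffer' });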

Related

Linux split of a tar.gz joins back correctly locally, but not after transfer to a remote machine via an S3 bucket

I have a few files which I combined into a tar.gz.
As this file can get too big, I used the Linux split command.
As the parts need to be transferred to a different machine, I used an S3 bucket to transfer them, uploading with the application/octet-stream content type.
The downloaded files show exactly the same size as the originals, so no bytes are lost.
Now when I do cat downloaded_files_* > tarball.tar.gz, the size is exactly the same as the original file, but only the part with _aa gets extracted.
I checked the type of the files:
file downloaded_files_aa
This one is reported as gzip compressed data (from Unix, last modified: Sun May 17 15:00:41 2020), but all the other parts are reported as just data.
I am wondering how I can get the files back.
Note: the files were uploaded to S3 via HTTP through API Gateway.
Just putting my debugging findings here in the hope that they help someone facing the same problem.
As we wanted to use API Gateway, our upload calls were plain HTTP calls, not calls through the regular AWS SDK.
https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html
Code samples: https://docs.aws.amazon.com/AmazonS3/latest/API/samples/AWSS3SigV4JavaSamples.zip
After some debugging, we found this leg was working fine.
As the machine we wanted to download the files to had direct access to S3, we used the AWS SDK for the download.
This is the URL:
https://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingJava.html
That code did not work well: although the downloaded file size matched the uploaded size exactly, the file lost some information, and the code also complained about still-pending bytes. Some changes were made to get rid of the error, but it never worked.
The code I found here works like magic:
// `object` is the S3Object returned by s3Client.getObject(...)
InputStream reader = new BufferedInputStream(object.getObjectContent());
File file = new File("localFilename");
OutputStream writer = new BufferedOutputStream(new FileOutputStream(file));
// Copy the stream one byte at a time until EOF
int read = -1;
while ((read = reader.read()) != -1) {
    writer.write(read);
}
writer.flush();
writer.close();
reader.close();
This code also made the download much faster than our previous approach.

Create CSV or PDF files without saving them to disk and stream them to an AWS S3 bucket

I have to create two files, a PDF and a CSV, both generated on a successful result.
I can generate the PDF from an HTML template using the html-pdf package, create a file stream without actually saving the file to disk, and then stream it using the streaming-s3 package.
My problem starts with the CSV file.
Currently I'm creating a CSV file on disk:
let testCsvData = ["Line 1", "Line 2", "Line 3", ..., "Line n"];
const testCSV = path.join('./test.csv');
fs.writeFileSync(testCSV, testCsvData.join(os.EOL));
Then I use createReadStream to get a file stream to push:
let testStream = fs.createReadStream('./test.csv');
I don't want to save the CSV file to disk just to get a file stream. That's simply a waste of space, right?
Is there any way that I can directly get the file stream without saving it?
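One approach (a minimal sketch, assuming Node 12+, where stream.Readable.from() is available) is to build the readable stream directly from the in-memory string, so nothing ever touches the disk:
const { Readable } = require('stream');
const os = require('os');

let testCsvData = ["Line 1", "Line 2", "Line 3"];
// Wrap the joined CSV text in a readable stream; no temp file needed
const testStream = Readable.from([testCsvData.join(os.EOL)]);
The resulting stream can be handed to any consumer that accepts a readable stream, the same way the fs.createReadStream result was used above.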

adm-zip doesn't compress data

I'm trying to use adm-zip to add files from memory to a zip file also in memory. It seems that the zip file is created correctly (the result of saving zipData can be unzipped in Windows), but the compression ratio is always zero.
This is a model of the code that I expected to work but doesn't. As can be seen from the output, "compressedData" is null and "size" and "compressedSize" are the same whatever value is passed as the file content.
var admzip = require("adm-zip")
var zip = new admzip();
zip.addFile("tmp.txt", "aaaaaaaaaaaaaaaaaaaa");
var zipData = zip.toBuffer();
console.log(zip.getEntries()[0].toString());
https://runkit.com/embed/pn5kaiir12b0
How do I get it to compress the files as well as just zipping?
This is an old question, but for anyone who is also experiencing this issue: the reason is that adm-zip does not compress the data until the compressedData field is accessed for the first time.
Quote from the docs:
[Buffer] Buffer compressedData
When setting compressedData, the LOC Data Header must also be present at the beginning of the Buffer. If the compressedData was set for a ZipEntry and no other change was made to its properties (comment, extra etc), reading this property will return the same value. If changes had been made, reading this property will recompress the data and recreate the entry headers.
If no compressedData was specified, reading this property will compress the data and create the required headers.
The output of the compressedData Buffer contains the LOC Data Header.
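A quick way to observe this (a sketch against adm-zip's ZipEntry API; getCompressedData() is the accessor that forces the deflate step, so treat the exact names as assumptions for your version):
var AdmZip = require("adm-zip");
var zip = new AdmZip();
zip.addFile("tmp.txt", Buffer.from("aaaaaaaaaaaaaaaaaaaa"));
var entry = zip.getEntries()[0];
// Reading the compressed data triggers compression and header creation
var compressed = entry.getCompressedData();
console.log(compressed.length, entry.header.size);
After this access, toString() on the entry should report a populated compressedSize.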

reading zip file header efficiently in node.js

There are various zip modules for node. Generally they seem to follow a pattern like this:
// Creating a zipfile object
var zf = new zipfile.ZipFile('./test/data/world_merc.zip');
// the zipfile has a list of names:
// zf.names[0] === 'world_merc.prj'
The snippet above was lifted from the node-zipfile README here https://github.com/mapbox/node-zipfile, but a similar example exists for the AdmZip package: https://github.com/cthackers/adm-zip.
So this struck me as odd, because it appears both of these libraries assume synchronous code (at the very least, you need to open the file to read the header, which is blocking, right)?
So I dug into the implementation of AdmZip and it turns out you can pass a buffer to the AdmZip constructor, e.g. you can do this:
var fs = require('fs');
var AdmZip = require('adm-zip');

fs.readFile('./my_file.zip', function (err, buffer) {
    var zip = new AdmZip(buffer);
    var zipEntries = zip.getEntries();
});
But that's only marginally better, because it appears AdmZip expects me to read the whole file in just to access the header. I read the zip spec, and my understanding is that the "central directory file header" which lists the contents is at the end of the file anyway.
So that was a super long lead-in to the question: does there exist a node library which will efficiently and asynchronously read the zip contents (i.e. not realize the entire zip file in memory if all I'm going to do is look at the central directory header)?
After much searching I found a suitable implementation that efficiently reads the header async:
https://github.com/antelle/node-stream-zip
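A minimal usage sketch (following node-stream-zip's README; treat option and property names as assumptions if your version differs). The 'ready' event fires once the central directory has been read, without loading the entry data itself:
var StreamZip = require('node-stream-zip');
var zip = new StreamZip({ file: './my_file.zip', storeEntries: true });
zip.on('ready', function () {
    // Only the central directory has been parsed at this point
    console.log('Entries read: ' + zip.entriesCount);
    for (var name in zip.entries()) {
        console.log(name, zip.entries()[name].size);
    }
    zip.close();
});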

File much larger after copying with it.bytes

I wanted to copy a file from one location to another using a Groovy script. I found that the copied file was orders of magnitude larger than the original after copying.
After some trial and error I found the correct way to copy but am still puzzled as to why it should be bigger.
def existingFile = new File("/x/y/x.zip")
def newFile1 = new File("/x/y/y.zip")
def newFile2 = new File("/x/y/z.zip")
newFile1 << existingFile.bytes
newFile2.bytes = existingFile.bytes
If you run this code, newFile1 will be much larger than existingFile, while newFile2 will be the same size as existingFile.
Note that both zip files are valid afterwards.
Does anyone know why this happens? Am I using the first copy method incorrectly? Or is it something odd in my setup?
If the file already exists before this code is called, then you'll get different behaviour from << and .bytes = ...
file << byteArray
will append the contents of byteArray to the end of the file, whereas
file.bytes = byteArray
will overwrite the file with the specified content. When the byteArray is ZIP data, both versions give you a valid ZIP file, because the ZIP format tolerates arbitrary data prepended before the actual ZIP data without invalidating the file (typically this is used for things like self-extracting .exe stubs).
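Since the rest of this page is Node-focused, here is the same distinction in Node terms (a sketch with hypothetical file names):
const fs = require('fs');
const bytes = fs.readFileSync('/x/y/x.zip');
// Like `file << byteArray`: appends, so the file grows on every run
fs.appendFileSync('/x/y/y.zip', bytes);
// Like `file.bytes = byteArray`: overwrites, same size as the source
fs.writeFileSync('/x/y/z.zip', bytes);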
