I use Node.JS to fetch files from my S3 bucket.
The files over there are gzipped (gz).
I know that the contents of each file is composed by lines, where each line is a JSON of some record that failed to be put on Kinesis.
Each file consists of ~12K such records. and I would like to be able to process the records while the file is being downloaded.
If the file was not gzipped, that could be easily done using streams and readline module.
So, the only thing that stopping me from doing this is the gunzip process which, to my knowledge, needs to be executed on the whole file.
Is there any way of gunzipping a partial of a file?
Thanks.
EDIT 1: (bad example)
Trying what #Mark Adler suggested:
const fileStream = s3.getObject(params).createReadStream();
const lineReader = readline.createInterface({input: fileStream});
lineReader.on('line', line => {
const gunzipped = zlib.gunzipSync(line);
console.log(gunzipped);
})
I get the following error:
Error: incorrect header check
at Zlib._handle.onerror (zlib.js:363:17)
Yes. node.js has a complete interface to zlib, which allows you to decompress as much of a gzip file at a time as you like.
A working example that solves the above problem
The following solves the problem in the above code:
const fileStream = s3.getObject(params).createReadStream().pipe(zlib.createGunzip());
const lineReader = readline.createInterface({input: fileStream});
lineReader.on('line', gunzippedLine => {
console.log(gunzippedLine);
})
Related
I was wondering if it is possible to use https.get() from the Node standard library to download a zip and directly extract it into a subfolder.
I have found many solutions that download the zip first and extract it afterwards. But is there a way to do it directly?
This was my attempt:
const zlib = require("node:zlib");
const fs = require("fs");
const { pipeline } = require("node:stream");
const https = require("https");
const DOWNLOAD_URL =
"https://downloadserver.com/data.zip";
const unzip = zlib.createUnzip();
const output = fs.createWriteStream("folderToExtract");
https.get(DOWNLOAD_URL, (res) => {
pipeline(res, unzip, output, (error) => {
if (error) console.log(error);
});
});
But I get this error:
Error: incorrect header check
at Zlib.zlibOnError [as onerror] (node:zlib:189:17) {
errno: -3,
code: 'Z_DATA_ERROR'
}
I am curious, is this even possible?
Most unzippers start at the end of the zip file, reading the central directory there and using that to find the entries to unzip. This requires having the entire zip file accessible at the start.
What you'd need is a streaming unzipper, which starts at the beginning of the zip file. You can try unzip-stream and see if it meets your needs.
I think this is similar to Simplest way to download and unzip files in Node.js cross-platform?
An answer in the above discussion using same package:
You're getting the error probably because zlib only support gzip files and not zip
Currently, I tried to make zip file(or any format of compressed file) containing few files that I want to put into zip file.
I thought it would work with adm-zip module.
but I found out that the way adm-zip module put files into zip is buffer.
It takes a lot of memory when I put files that size is very huge.
In the result, My server stopped working.
Below is What I'd done.
var zip = new AdmZip();
zip.addLocalFile('../largeFile', 'dir1'); //put largeFile into /dir1 of zip
zip.addLocalFile('../largeFile2', 'dir1');
zip.addLocalFile('../largeFile3', 'dir1/dir2');
zip.writeZip(/*target file name*/ `./${threadId}.zip`);
Is there any solution to solve this situation?
to solve memory issue the best practice is to use streams and not load all files into memory for example
import {
createReadStream,
createWriteStream
} from 'fs'
import { createGzip } from 'zlib'
const [, , src, dest] = process.argv
const srcStream = createReadStream(src)
const gzipStream = createGzip()
const destStream = createWriteStream(dest)
srcStream
.pipe(gzipStream)
.pipe(destStream)
I recently used event-stream library for nodejs to parse a huge csv file, saving results to database.
How do I solve the task of not just reading a file, but modifying each line, writing result to new file?
Is it some combination of through and map method, or duplex? Any help is highly appreciated.
If you use event-stream for read you can use split() method process csv line by line. Then change and write line to new writable stream.
var fs = require('fs');
var es = require('event-stream');
const newCsv = fs.createWriteStream('new.csv');
fs.createReadStream('old.csv')
.pipe(es.split())
.pipe(
es.mapSync(function(line) {
// modify line way you want
newCsv.write(line);
}))
newCsv.end();
So I am working on the nodeschool.io stream-adventure tutorial track and I'm having trouble with the last problem. The instructions say:
An encrypted, gzipped tar file will be piped in on process.stdin. To beat this
challenge, for each file in the tar input, print a hex-encoded md5 hash of the
file contents followed by a single space followed by the filename, then a
newline.
You will receive the cipher name as process.argv[2] and the cipher passphrase as
process.argv[3]. You can pass these arguments directly through to
`crypto.createDecipher()`.
The built-in zlib library you get when you `require('zlib')` has a
`zlib.createGunzip()` that returns a stream for gunzipping.
The `tar` module from npm has a `tar.Parse()` function that emits `'entry'`
events for each file in the tar input. Each `entry` object is a readable stream
of the file contents from the archive and:
`entry.type` is the kind of file ('File', 'Directory', etc)
`entry.path` is the file path
Using the tar module looks like:
var tar = require('tar');
var parser = tar.Parse();
parser.on('entry', function (e) {
console.dir(e);
});
var fs = require('fs');
fs.createReadStream('file.tar').pipe(parser);
Use `crypto.createHash('md5', { encoding: 'hex' })` to generate a stream that
outputs a hex md5 hash for the content written to it.
This is my attempt so far to work on it:
var tar = require('tar');
var crypto = require('crypto');
var zlib = require('zlib');
var map = require('through2-map');
var cipherAlg = process.argv[2];
var passphrase = process.argv[3];
var cryptoStream = crypto.createDecipher(cipherAlg, passphrase);
var parser = tar.Parse(); //emits 'entry' events per file in tar input
var gunzip = zlib.createGunzip();
parser.on('entry', function(e) {
e.pipe(cryptoStream).pipe(map(function(chunk) {
console.log(chunk.toString());
}));
});
process.stdin
.pipe(gunzip)
.pipe(parser);
I know it's not complete yet, but my issue is that when I try to run this, the input never gets piped to the tar file parsing part. It seems to hang up on the piping to gunzip. This is my exact error:
events.js:72
throw er; // Unhandled 'error' event
^
Error: incorrect header check
at Zlib._binding.onerror (zlib.js:295:17)
I'm totally stumped because the node documentation for Zlib has no mention of headers except for when it has examples with the http/request modules. There are a number of other questions regarding this error with node, but most use buffers rather than streams, so I couldn't find a relevant answer to my problem. All help is greatly appreciated
I actually figured it out, I was supposed to decrypt the stream before unzipping it.
So instead of:
process.stdin
.pipe(gunzip)
.pipe(parser);
it should be:
process.stdin
.pipe(cryptoStream)
.pipe(gunzip)
.pipe(parser);
I have a file named mytext.txt and I'd like to compress this file to archive.rar. How can I do this in nodejs?
I've found nothing similar to rar only zip.
Find an rar command line utility that you can execute like
$ rar myfile.dat compressed.rar
Node.js can do command line calls. (See child_process.exec)
Give the normal command to the exec function, and it should get the job done.
For zipping a single file, zlib module can be very useful.
(function () {
'use strict';
var zlib = require('zlib');
var gzip = zlib.createGzip();
var fs = require('fs');
var inp = fs.createReadStream('mytext.txt');
var out = fs.createWriteStream('mytext.txt.gz');
inp.pipe(gzip).pipe(out);
}());
Unfortunately Nodejs dosn't native support Rar compression/decompression, i frustated with this too so i created a module called "super-winrar" making super easy deal with rar files in nodejs :)
check it out: https://github.com/KiyotakaAyanokojiDev/super-winrar
Exemple creating a file "archive.rar" and appending "mytext.txt" file:
const Rar = require('super-winrar');
// create a rar constructor with file path! (created if not exists)
const rar = new Rar('archive.rar');
// handle erros, otherwise will throw an exception!
rar.on('error', err => console.log(err.message));
rar.once('ready', async () => {
await rar.append(['mytext.txt']);
});