I have a very large, binary file (>25 GB), and I need to very quickly read a small range of bytes from it at a specific offset. How can I accomplish this in Node.js in an efficient way?
Here's a fairly minimal example of what you want; see https://nodejs.org/api/all.html#fs_fs_createreadstream_path_options for more details:
const fs = require("fs");
// start and end are byte offsets, and both are inclusive
const stream = fs.createReadStream("test.txt", { start: 1, end: 5 });
stream.on("data", chunk => console.log(chunk.toString()));
Provided you have a file called test.txt of course...
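If you'd rather avoid the stream overhead for a single small read, a minimal sketch using the promise-based fs API could look like this (the readRange helper and the file name are just illustrations, not part of the original answer):
const fsp = require("fs/promises");

// Read `length` bytes starting at byte `offset`, without creating a stream.
async function readRange(file, offset, length) {
  const handle = await fsp.open(file, "r");
  try {
    const buffer = Buffer.alloc(length);
    // read(buffer, bufferOffset, length, filePosition)
    const { bytesRead } = await handle.read(buffer, 0, length, offset);
    return buffer.subarray(0, bytesRead);
  } finally {
    await handle.close();
  }
}

readRange("test.txt", 1, 5).then(chunk => console.log(chunk.toString()));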
I'm noticing a strange behavior while using this library. I'm trying to compress multiple EML files, to do so I first convert them to buffers and add them to adm-zip instance using the addFile() method. Here's my code:
const zip = new AdmZip();

assetBodies.forEach((body) => {
  // emlData to buffer
  let emlBuffer = Buffer.from(body);
  zip.addFile(`${new Date().getTime()}.eml`, emlBuffer);
});

zip.getEntries().forEach((entry) => {
  console.log("entry name", entry.entryName);
});

const willSendthis = zip.toBuffer();
The problem is that sometimes it compresses all the files and sometimes it doesn't.
For example, I received 5 items in the assetBodies array, but when I log the entries of the zip file I only see 1 or 2, sometimes 5.
Am I missing something, or is there an issue with the library?
EDIT:
It's worth mentioning that some of the files contain quite a lot of text, so I wonder if that could be the issue.
I use Node.JS to fetch files from my S3 bucket.
The files over there are gzipped (gz).
I know that the contents of each file is composed by lines, where each line is a JSON of some record that failed to be put on Kinesis.
Each file consists of ~12K such records, and I would like to be able to process the records while the file is being downloaded.
If the file was not gzipped, that could be easily done using streams and readline module.
So, the only thing stopping me from doing this is the gunzip step which, to my knowledge, needs to be executed on the whole file.
Is there any way of gunzipping a partial of a file?
Thanks.
EDIT 1: (bad example)
Trying what @Mark Adler suggested:
const fileStream = s3.getObject(params).createReadStream();
const lineReader = readline.createInterface({input: fileStream});
lineReader.on('line', line => {
  const gunzipped = zlib.gunzipSync(line);
  console.log(gunzipped);
});
I get the following error:
Error: incorrect header check
at Zlib._handle.onerror (zlib.js:363:17)
Yes. Node.js has a complete interface to zlib, which allows you to decompress as much of a gzip file at a time as you like.
A working example that solves the problem in the code above:
const fileStream = s3.getObject(params).createReadStream().pipe(zlib.createGunzip());
const lineReader = readline.createInterface({input: fileStream});
lineReader.on('line', gunzippedLine => {
  console.log(gunzippedLine);
});
I have a string which is 169 million chars long, which I need to write to a file and then read from another process.
I have read about WriteStream and ReadStream, but how do I write the string to a file when it has no method 'pipe'?
Creating a write stream is a good idea. You can use it like this:
var fs = require('fs');
var wstream = fs.createWriteStream('myOutput.txt');
wstream.write('Hello world!\n');
wstream.write('Another line\n');
wstream.end();
You can call write as many times as you need, with parts of that 169 million char string. Once you have finished writing the file, you can create a read stream to read chunks of it.
However, 169 million chars is not that much; I would say you could read and write it all at once and keep the whole file in memory.
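For the reading side, a minimal sketch (reusing the file name from the example above) could be:
var fs = require('fs');

var rstream = fs.createReadStream('myOutput.txt', { encoding: 'utf8' });
rstream.on('data', function (chunk) {
  // each chunk is one piece of the large string
  console.log('got', chunk.length, 'chars');
});
rstream.on('end', function () {
  console.log('done reading');
});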
Update: As requested in a comment, here is an example that pipes the stream through gzip on the fly:
var zlib = require('zlib');
var gzip = zlib.createGzip();
var fs = require('fs');
var out = fs.createWriteStream('input.txt.gz');
gzip.pipe(out);
gzip.write('Hello world!\n');
gzip.write('Another line\n');
gzip.end();
This will create a .gz file which, when decompressed, yields a single file with the same name (without the .gz extension).
This might solve your problem
var fs = require('fs');
var request = require('request');
var stream = request('http://i.imgur.com/dmetFjf.jpg');
var writeStream = fs.createWriteStream('./testimg.jpg');
stream.pipe(writeStream);
Follow the link for more details
http://neethack.com/2013/12/understand-node-stream-what-i-learned-when-fixing-aws-sdk-bug/
If you're looking to write what's called a blocking process, i.e. something that will prevent you from doing anything else, approaching that process asynchronously is the best solution (and why Node.js is good at solving these types of problems). With that said, avoid the fs.*Sync methods, as they are synchronous and will block. fs.writeFile is what I believe you're looking for. Read the docs.
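For illustration, a minimal sketch of the asynchronous fs.writeFile call mentioned above (the file name and the placeholder string are just examples):
var fs = require('fs');
var veryLongString = '...'; // stands in for the large string you want to persist

fs.writeFile('output.txt', veryLongString, 'utf8', function (err) {
  if (err) throw err;
  console.log('write finished without blocking the event loop');
});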
And conversely, how can I convert the binary data back to an image? The image data saved in the backend is stored as binary.
Try this:
var fs = require("fs");

fs.readFile('image.jpg', function(err, data) {
  if (err) throw err;

  // Encode to base64 (Buffer.from replaces the deprecated new Buffer())
  var encodedImage = Buffer.from(data).toString('base64');

  // Decode from base64 back to a binary string
  var decodedImage = Buffer.from(encodedImage, 'base64').toString('binary');
});
Hope it will be useful for you.
You can also do this with fs.createReadStream instead of reading the whole file into a Buffer at once; note that the new Buffer() constructor is deprecated in favor of Buffer.from().
Find more info about the differences in https://medium.com/tensult/stream-and-buffer-concepts-in-node-js-87d565e151a0
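For illustration, a minimal sketch of the streaming approach (the file name is just an example), using Buffer.concat/Buffer.from rather than the deprecated new Buffer():
var fs = require('fs');

var chunks = [];
fs.createReadStream('image.jpg')
  .on('data', function (chunk) { chunks.push(chunk); })
  .on('end', function () {
    var encodedImage = Buffer.concat(chunks).toString('base64');
    var decodedImage = Buffer.from(encodedImage, 'base64'); // back to raw bytes
    console.log(encodedImage.length, decodedImage.length);
  })
  .on('error', function (err) { throw err; });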
If you want a solution for reading files (images included, obviously) and converting them to binary, I wrote a small piece of code in Node.js; have a look, I hope it helps you out. It is all about reading a file into a binary string, but you can of course convert that string to an array or byte array. If you get stuck, please let me know in the comments below.
Here is a simple yet robust snippet you can try.
params format:
getBinary({
  path: '<file_relative_path>',
  padlength: '<prepending_padding_length>', // Default: 4
  debug: false,                             // Default: true
  limit: 10,                                // Default: Full_File_Length
  putSpacing: Boolean                       // Default: false
})
Params Description:
1. path: Specifies the relative path of the file to be read.
2. padlength: After reading the file, each value is treated as a number (e.g. hex f: 1111, hex 0: 0), so if you need a uniform-length binary string you have to pad it, e.g. hex 0 becomes 0000 when padlength is 4.
3. limit: limits how much of the read buffer is rendered.
4. putSpacing: if true, a space is inserted after every padlength characters.
or
getBinary('<file_relative_path>');
Get it here: https://computopedia.com/how-to-convert-image-to-binary-nodejs/
Gist: https://gist.github.com/shankha96/cffe620776066078289ea1f8b15956e0
I have a relatively small file (some hundreds of kilobytes) that I want to be in memory for direct access for the entire execution of the code.
I don't know the internals of Node.js in detail, so I'm asking whether an fs open is enough or whether I have to read the whole file and copy it into a Buffer.
Basically, you need to use the readFile or readFileSync function from the fs module. They return the complete content of the given file, but differ in their behavior (asynchronous versus synchronous).
If blocking Node.js (e.g. on startup of your application) is not an issue, you can go with the synchronous version, which is as easy as:
var fs = require('fs');
var data = fs.readFileSync('/etc/passwd');
If you need to go asynchronous, the code is like that:
var fs = require('fs');
fs.readFile('/etc/passwd', function (err, data ) {
// ...
});
Please note that in either case you can give an options object as the second parameter, e.g. to specify the encoding to use. If you omit the encoding, the raw buffer is returned:
var fs = require('fs');
fs.readFile('/etc/passwd', { encoding: 'utf8' }, function (err, data ) {
// ...
});
Valid encodings are utf8, ascii, utf16le, ucs2, base64 and hex. There is also a binary encoding, but it is deprecated and should not be used any longer. You can find more details on how to deal with encodings and buffers in the appropriate documentation.
As easy as
var buffer = fs.readFileSync(filename);
It's also possible to do this synchronously:
var fs = require('fs');
var path = require('path');
// Buffer mydata
var BUFFER = bufferFile('../public/mydata');
function bufferFile(relPath) {
  return fs.readFileSync(path.join(__dirname, relPath)); // zzzz....
}
fs is the file system. readFileSync() returns a Buffer, or string if you ask.
fs resolves relative paths against the process's current working directory, not the script's location; joining with __dirname via path works around that.
To load as a string, specify the encoding:
return fs.readFileSync(path.join(__dirname, relPath), { encoding: 'utf8' });