createWriteStream vs writeFile? - node.js

What is the basic difference between these two operations?
someReadStream.pipe(fs.createWriteStream('foo.png'));
vs
someReadStream.on('data', function(chunk) { blob += chunk } );
someReadStream.on('end', function() { fs.writeFile('foo.png', blob) });
When using the request library for scraping, I can save pics (png, bmp, etc.) only with the former method; with the latter there is the same gibberish (binary) data, but the image doesn't render.
How are they different ?

When you are working with streams in node.js you should prefer to pipe them.
According to Node.js’s stream-event docs, data events emit either buffers (by default) or strings (if encoding was set).
When you are working with text streams you can use data events to concatenate chunks of string data together. Then you'll be able to work with your data as one string.
But when working with binary data it's not so simple, because you'll receive buffers. To concatenate buffers you need a dedicated method such as Buffer.concat. A similar approach works for binary streams:
var buffers = [];
readstrm.on('data', function(chunk) {
    buffers.push(chunk);
});
readstrm.on('end', function() {
    fs.writeFile('foo.png', Buffer.concat(buffers), function(err) {
        if (err) throw err;
    });
});
You can notice when something goes wrong by checking the output file's size.

Related

Why is a while loop needed for reading a non-flowing-mode stream in Node.js?

In the node.js documentation, I came across the following code
const readable = getReadableStreamSomehow();
// 'readable' may be triggered multiple times as data is buffered in
readable.on('readable', () => {
    let chunk;
    console.log('Stream is readable (new data received in buffer)');
    // Use a loop to make sure we read all currently available data
    while (null !== (chunk = readable.read())) {
        console.log(`Read ${chunk.length} bytes of data...`);
    }
});
// 'end' will be triggered once when there is no more data available
readable.on('end', () => {
    console.log('Reached end of stream.');
});
Here is the comment from the Node.js documentation concerning the use of the while loop, saying it's needed to make sure all data is read:
// Use a loop to make sure we read all currently available data
while (null !== (chunk = readable.read())) {
I couldn't understand why it is needed, so I tried replacing the while with just an if statement, and the process terminated after the very first read. Why?
From the node.js documentation
The readable.read() method should only be called on Readable streams operating in paused mode. In flowing mode, readable.read() is called automatically until the internal buffer is fully drained.
Be careful: this method is only meant for streams that have been paused.
More generally, if you understand what a stream is, you'll understand that you need to process data in chunks.
Each call to readable.read() returns one chunk of data, or null. The chunks are not concatenated; a while loop is necessary to consume all the data currently in the buffer.
So I hope you understand that if you don't loop over your readable stream and execute only one read, you won't get your full data.
Ref: https://nodejs.org/api/stream.html

Incorrect header check error from zlib.gunzip() in a Node.js app using the HTTPS module

I have a Node.js app and I am using the https module to make GET requests to a web server. The response headers have content-encoding set to gzip. I have directly inspected the data and it does appear to be compressed data, definitely not plain text.
I accumulate the chunks as they come in, then try decompressing the accumulated data using zlib. So far everything I've tried results in an "Incorrect header check" error when I execute the decompression call. The code below shows the use of a Buffer object with the encoding set to binary. I previously tried passing the accumulated data directly to the decompression call, but that failed too.
Why doesn't this work?
// Make the request to the designated external server.
const httpsRequest = https.request(postOptions, function(extRequest) {
    console.log('(httpsRequest) In request handler.');
    // Process the response from the external server.
    let dataBody = "";
    // The data may come to us in pieces. The 'on' event handler will accumulate them for us.
    let iNumSlices = 0;
    extRequest.on('data', function(dataSlice) {
        iNumSlices++;
        console.log('(httpsRequest:on) Received slice # ' + iNumSlices + '.');
        dataBody += dataSlice;
    });
    // When we have received all the data from the external server, finish the request.
    extRequest.on('end', function() {
        // SUCCESS: Return the result to AWS.
        console.log('(httpsRequest:end) Success. Data body length: ' + dataBody.length + '.');
        console.log('(httpsRequest:end) Content: ');
        let buffer = Buffer.from(dataBody, "binary");
        // Check for GZip compressed data.
        if (extRequest.headers['content-encoding'] == 'gzip') {
            // Decompress the data.
            zlib.gunzip(buffer, (err, buffer) => {
                if (err) {
                    // Reject the promise with the error.
                    reject(err);
                    return;
                } else {
                    console.log(errPrefix + buffer.toString('utf8'));
                }
            });
        } else {
            console.log(errPrefix + dataBody);
            let parsedDataBodyObj = JSON.parse(dataBody);
            resolve(parsedDataBodyObj);
        }
    });
});
You may have it in your actual code, but the snippet shown doesn't include a call to end(), which is mandatory.
It may also be related to the way you accumulate the chunks with dataBody += dataSlice.
Since the data is compressed, the type of each chunk is (probably) already a Buffer, and using += to concatenate it into a string corrupts it, even though you later call Buffer.from.
Try making dataBody an empty array instead, pushing the chunks into it, and finally calling Buffer.concat(dataBody).
Another option is that https.request already decompresses the data under the hood, so that once you accumulate the chunks into a buffer (as detailed in the previous section), all you're left with is to call buffer.toString(). I experienced this myself in this other answer and it seems to be related to the Node.js version.
I'll end this answer with a live demo of similar working code, which may come in handy for you (it queries the StackExchange API, gets gzip-compressed chunks, and then decompresses them):
It includes code that works on 14.16.0 (the current StackBlitz version) - which, as I described, already decompresses the data under the hood - but not on Node.js 15.13.0.
It includes commented-out code that works for Node.js 15.13.0 but not for 14.16.0.

Buffering a Float32Array to a client

This should be obvious, but for some reason I am not getting any result. I have already spent way too much time just trying different ways to get this working without results.
TLDR: A shorter way to explain this question could be: I know how to stream a sound from a file. How to stream a buffer containing sound that was synthesized on the server instead?
This works:
client:
var stream = ss.createStream();
ss(socket).emit('get-file', stream, data.bufferSource);
var parts = [];
stream.on('data', function(chunk) {
    parts.push(chunk);
});
stream.on('end', function() {
    var blob = new Blob(parts, { type: "audio" });
    if (cb) {
        cb(blob);
    }
});
server (in the 'socket-connected' callback of socket.io)
var ss = require('socket.io-stream');
// ....
ss(socket).on('get-file', (stream: any, filename: any) => {
    console.log("get-file", filename);
    fs.createReadStream(filename).pipe(stream);
});
Now, the problem:
I want to alter this audio buffer and send the modified audio instead of just the file. I converted the ReadStream into a Float32Array and did some processing sample by sample. Now I want to send that modified Float32Array to the client.
In my view, I just need to replace fs.createReadStream(filename) with (new Readable()).push(modifiedSoundBuffer). However, I get a TypeError: Invalid non-string/buffer chunk. Interestingly, if I convert this modifiedSoundBuffer into a Uint8Array, it doesn't yell at me, and the client gets a large array, which looks good; only all the array values are 0. I guess it's flooring all the values?
ss(socket).on('get-buffer', (stream: any, filename: any) => {
    let readable = (new Readable()).push(modifiedFloat32Array);
    readable.pipe(stream);
});
I am trying to use streams for two reasons: sound buffers are large, and to allow concurrent processing in the future
What if you convert the Float32Array to a Buffer before pushing it? Note that it should be Buffer.from(modifiedSoundBuffer.buffer) (wrapping the underlying ArrayBuffer): Buffer.from(modifiedSoundBuffer) would coerce each float sample to a single byte, which is the same flooring problem you saw with Uint8Array.

Performing piped operations on individual chunks (node-wav)

I'm new to node and I'm working on an audio stream server. I'm trying to process / transform the chunks of a stream as they come out of each pipe.
So, file = fs.createReadStream(path) (filestream) is piped into file.pipe(wavy) (remove headers and output raw PCM), which gets piped into .pipe(waver) (add a proper wav header to the chunk), which is piped into .pipe(spark) (output the chunk to the client).
The idea is that each filestream chunk has headers removed if any (only applies to first chunk), then using the node-wav Writer that chunk is endowed with headers and then sent to the client. As I'm sure you guessed this doesn't work.
The pipe operations into node-wav are acting on the entire filestream, not the individual chunks. To confirm I've checked the output client side and it is effectively dropping the headers and re-adding them to the entire data stream.
From what I've read of the Node Stream docs it seems like what I'm trying to do should be possible, just not the way I'm doing it. I just can't pin down how to accomplish this.
Is it possible, and if so what am I missing?
Complete function:
processAudio = (path, spark) ->
    wavy = new wav.Reader()
    waver = new wav.Writer()
    file = fs.createReadStream(path)
    file.pipe(wavy).pipe(waver).pipe(spark)
I don't really know about wavs and headers, but if you're "trying to process / transform the chunks of a stream as they come out of each pipe", you can use a Transform stream.
It permits you to sit between 2 streams and modify the bytes between them:
var util = require('util');
var Transform = require('stream').Transform;

util.inherits(Test, Transform);

function Test(options) {
    Transform.call(this, options);
}

Test.prototype._transform = function(chunk, encoding, cb) {
    // do something with chunk, then pass a modified chunk (or not)
    // to the downstream
    cb(null, chunk);
};
To observe the stream and potentially modify it, pipe like:
file.pipe(wavy).pipe(new Test()).pipe(waver).pipe(spark)

node - send large JSON over net socket

The problem is that sending large serialized JSON (over 16,000 characters) over a net socket gets split into chunks. Each chunk fires the data event on the receiving end. So simply running JSON.parse() on the incoming data may fail with SyntaxError: Unexpected end of input.
The work around I've managed to come up with so far is to append a null character ('\u0000') to the end of the serialized JSON, and check for that on the receiving end. Here is an example:
var partialData = '';
client.on('data', function(data) {
    data = data.toString();
    if (data.charCodeAt(data.length - 1) !== 0) {
        partialData += data;
        // if data is incomplete then no need to proceed
        return;
    } else {
        // append all but the null character to the existing partial data
        partialData += data.substr(0, data.length - 1);
    }
    // pass parsed data to some function for processing
    workWithData(JSON.parse(partialData));
    // reset partialData for next data transfer
    partialData = '';
});
One of the failures of this model is if the receiver is connected to multiple sockets, and each socket is sending large JSON files.
The reason I'm doing this is because I need to pass data between two processes running on the same box, and I prefer not to use a port. Hence using a net socket. So there would be two questions: First, is there a better way to quickly pass large JSON data between two Node.js processes? Second, if this is the best way then how can I better handle the case where the serialized JSON is being split into chunks when sent?
You can use try...catch on each received chunk to see whether you have valid JSON yet. Not very good performance though.
You can calculate the size of your JSON on the sending side and send it before the JSON itself.
You can append a boundary string that's unlikely to be in the JSON. Your \u0000 - yes, it seems to be a legit way, but the most popular choice is a newline.
You can use external libraries like dnode, which should already do something like what I mentioned above. I'd recommend trying that. Really.
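A sketch of the newline-delimited variant, parsing logic only (socket wiring omitted; the helper name is made up). It buffers partial data and returns whatever complete messages the latest chunk finished:

```javascript
let pending = '';

// Feed each 'data' chunk in; returns the JSON objects completed so far.
function onData(data) {
    pending += data.toString();
    const parts = pending.split('\n');
    pending = parts.pop();          // the last piece may be incomplete
    return parts.map(JSON.parse);
}
```

The sender then writes each message as JSON.stringify(obj) + '\n'.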
One of the failures of this model is if the receiver is connected to multiple sockets, and each socket is sending large JSON files.
Use different buffers for every socket. No problem here.
It is possible to identify each socket individually and build a buffer for each one. I add an id to each socket when I receive a connection, and when data arrives I append it to that socket's buffer.
var buffers = {};

net.createServer(function(socket) {
    // There are many ways to assign an id, this is just an example.
    socket.id = Math.random() * 1000;
    buffers[socket.id] = '';
    socket.on('data', function(data) {
        // 'this' refers to the socket calling this callback.
        buffers[this.id] += data;
    });
});
Each time data arrives, you can check whether the buffer contains the "key" delimiter that tells you a complete message is ready to be used.
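Putting the two ideas together - per-socket buffers plus the delimiter check - as a small helper (the function name is made up; it returns the messages completed by the latest chunk, using the question's \u0000 delimiter):

```javascript
const buffers = {};

// Append a chunk to the given socket's buffer and extract all
// complete null-delimited JSON messages.
function accumulate(id, data) {
    buffers[id] = (buffers[id] || '') + data.toString();
    const messages = [];
    let idx;
    while ((idx = buffers[id].indexOf('\u0000')) !== -1) {
        messages.push(JSON.parse(buffers[id].slice(0, idx)));
        buffers[id] = buffers[id].slice(idx + 1);
    }
    return messages;
}
```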
