Access of global variables in setImmediate in node.js

Below is a piece of code:
var buffer = new Buffer(0, 'hex'); // Global buffer
socket.on('data', function(data) {
    // Concatenate the received data to the buffer
    buffer = Buffer.concat([buffer, new Buffer(data, 'hex')]);
    setImmediate(function() { // Executed asynchronously
        /* Process messages received in buffer */
        var messageLength = getMessageLength(buffer);
        while (buffer.length >= messageLength) {
            /* Process message and send response */
        }
        // Remove message from buffer after processing is done
        // (Buffers have no splice(); slice() returns the remainder)
        buffer = buffer.slice(messageLength);
    }); // End of setImmediate
}); // End of socket.on
I am using a global variable 'buffer' inside the setImmediate block (executed asynchronously). Is there a guarantee that the global buffer variable does not change (either through addition or deletion of data) during the execution of the code in the setImmediate block? If not, how do I handle it so that the buffer is accessed safely?

The oft-repeated saying "NodeJS is single-threaded" means there is no question of "safety" here. Simultaneous accesses to a variable are not possible because simultaneous operations do not occur. Even though the setImmediate code is executed asynchronously, that does not mean it is executed at the SAME TIME. It just means it is executed "soon". The parent function can return before this happens - but the parent function is not running when the anonymous setImmediate callback is triggered. At that time, the callback is the only thing running.
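To make the ordering concrete, here is a minimal sketch (not part of the original answer) showing that a setImmediate callback only runs after the code that scheduled it has finished, and therefore sees whatever value the shared variable holds at that point:
var value = 'before';

setImmediate(function() {
    // Runs only after the current call stack has fully unwound
    console.log('inside setImmediate:', value); // prints 'after'
});

value = 'after';
console.log('synchronous code runs first:', value);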
These operations are thus safe - but for what it's worth, they're not very efficient. NodeJS buffers are fixed-length, which is why you need to keep re-allocating a new one to append data. They're suitable for one-time loads but not really ideal for constant append operations. Consider using a readable stream. This allows you to pull out and process any length of data you want at a time, and can return a buffer, but internally it does not constantly re-allocate its storage block for the data read.
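To illustrate that suggestion, here is a hedged sketch of reading length-framed messages in paused mode; HEADER_SIZE, getMessageLength(), and processMessage() are assumptions carried over from the question, not real APIs:
socket.on('readable', function() {
    var header;
    // Pull a fixed-size header to learn the next message's length
    while (null !== (header = socket.read(HEADER_SIZE))) {
        var messageLength = getMessageLength(header); // hypothetical helper
        var body = socket.read(messageLength);
        if (body === null) {
            // Not enough data buffered yet; put the header back and
            // wait for the next 'readable' event
            socket.unshift(header);
            break;
        }
        processMessage(header, body); // hypothetical helper
    }
});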

Related

Why while loop is needed for reading a non-flowing mode stream in Node.js?

In the node.js documentation, I came across the following code
const readable = getReadableStreamSomehow();
// 'readable' may be triggered multiple times as data is buffered in
readable.on('readable', () => {
    let chunk;
    console.log('Stream is readable (new data received in buffer)');
    // Use a loop to make sure we read all currently available data
    while (null !== (chunk = readable.read())) {
        console.log(`Read ${chunk.length} bytes of data...`);
    }
});
// 'end' will be triggered once when there is no more data available
readable.on('end', () => {
    console.log('Reached end of stream.');
});
Here is the comment from the node.js documentation concerning the usage of the while loop, saying it's needed to make sure all data is read
// Use a loop to make sure we read all currently available data
while (null !== (chunk = readable.read())) {
I couldn't understand why it is needed, so I tried to replace the while with just an if statement, and the process terminated after the very first read. Why?
From the node.js documentation
The readable.read() method should only be called on Readable streams operating in paused mode. In flowing mode, readable.read() is called automatically until the internal buffer is fully drained.
Be careful: this method is only meant for streams that have been paused.
Furthermore, if you understand what a stream is, you'll understand that you need to process the data in chunks.
Each call to readable.read() returns a chunk of data, or null. The chunks are not concatenated. A while loop is necessary to consume all data currently in the buffer.
So I hope you understand that if you do not loop over your readable stream and only execute a single read, you won't get your full data.
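For contrast, here is a minimal sketch of the if variant the asker tried (the file name is hypothetical); it reads at most one chunk per 'readable' event and leaves the rest of the buffered data unconsumed, which is consistent with the process exiting after the very first read:
const fs = require('fs');
const readable = fs.createReadStream('big-file.txt'); // hypothetical file

readable.on('readable', () => {
    // BROKEN: reads at most one chunk per 'readable' event; whatever
    // else is already buffered stays in the internal buffer
    const chunk = readable.read();
    if (chunk !== null) {
        console.log(`Read ${chunk.length} bytes of data...`);
    }
});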
Ref: https://nodejs.org/api/stream.html

Can this code cause a race condition in socket io?

I am very new to node.js and socket.io. Can this code lead to a race condition on the counter variable? Should I use a locking library to safely update the counter variable?
"use strict";
module.exports = function (opts) {
var module = {};
var io = opts.io;
var counter = 0;
io.on('connection', function (socket) {
socket.on("inc", function (msg) {
counter += 1;
});
socket.on("dec" , function (msg) {
counter -= 1;
});
});
return module;
};
No, there is no race condition here. Javascript in node.js is single threaded and event driven so only one socket.io event handler is ever executing at a time. This is one of the nice programming simplifications that come from the single threaded model. It runs a given thread of execution to completion and then and only then does it grab the next event from the event queue and run it.
Hopefully you do realize that the same counter variable is accessed by all socket.io connections. While this isn't a race condition, it means that there's only one counter that all socket.io connections are capable of modifying.
If you wanted a per-connection counter (a separate counter for each connection), then you could define the counter variable inside the io.on('connection', ....) handler.
The race conditions you do have to watch out for in node.js are when you make an async call and then continue the rest of your coding logic in the async callback. While the async operation is underway, other node.js code can run and can change publicly accessible variables you may be using. That is not the case in your counter example, but it does occur with lots of other types of node.js programming.
For example, this could be an issue:
var fs = require('fs');

var flag = false;
function doSomething() {
    // set flag indicating we are in a fs.readFile() operation
    flag = true;
    fs.readFile("somefile.txt", function(err, data) {
        // do something with data
        // clear flag
        flag = false;
    });
}
In this case, immediately after we call fs.readFile(), we return control back to node.js. It is free at that time to run other operations. If another operation also runs this code, it will tromp on the value of flag and we'd have a concurrency issue.
So, you have to be aware that any time you start an async operation and continue the rest of your logic in the callback for that operation, other code can run in the meantime and access any shared variables. You either need to make a local copy of shared data or you need to provide appropriate protections for shared data.
In this particular case, the flag could be turned into a counter that is incremented and decremented, rather than simply set to true or false, and it would probably serve the desired purpose of keeping track of whether this file is currently being read or not.
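A minimal sketch of that counter idea (the file name is carried over from the example above; isFileBeingRead is a hypothetical helper added for illustration):
var fs = require('fs');

// Count of readFile operations currently in flight, instead of a
// boolean flag that concurrent calls would overwrite
var pendingReads = 0;

function doSomething() {
    pendingReads++; // one more read in progress
    fs.readFile("somefile.txt", function(err, data) {
        // ... do something with data ...
        pendingReads--; // this read is done
    });
}

function isFileBeingRead() {
    return pendingReads > 0; // true while any read is still pending
}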
Shorter answer:
"Race condition" is when you execute a series of ordered asynchronous functions and because of their async nature they won't finish processing in their original order.
In your code, you are executing a series of ordered synchronous process (increasing or decreasing the counter), So they finish instantly after they start, resulting in ordered output. So no racing here!

NodeJS streams and premature end

Assuming a Readable Stream in NodeJS and a Data (on('data', ...)) event handler tied to it that is relatively slow, is it possible for the End event to fire before the last Data handler(s) has finished, and if so, will it prematurely terminate that handler? Or, will all Data events get dispatched and run?
In my case, I am working with large files and want to commit to a DB every data chunk. I am worried that I may lose the last record or two (or more) if End is fired before the last DB calls in the handler actually complete.
The 'end' event fires after the last 'data' event, but it may happen before the last 'data' handler has finished. It is possible that before one 'data' handler has finished, the next one is started. Depending on what you have in your code, it is possible for a later 'data' handler to finish before an earlier one. This can cause errors and problems in your code.
An example of how to cause such problems (for your own tests):
var fs = require('fs');
var rr = fs.createReadStream('somebigfile.jpg');
var i = 0;
rr.on('data', function(chunk) {
    i++;
    var s = i;
    console.log('readable:' + s);
    setTimeout(function() {
        console.log('timeout:' + s);
    }, 50 - i * 10);
});
rr.on('end', function() {
    console.log('end');
});
It will print to your console when each 'data' event handler starts, and again some milliseconds later when it finishes. The finish messages may appear in a different order.
Solution:
Readable streams have two modes: 'flowing mode' and 'paused mode'. When you add a 'data' event handler, you automatically switch the readable stream to flowing mode.
From documentation :
When in flowing mode, data is read from the underlying system and provided to your program as fast as possible
In this mode, events will not wait for your slow actions to finish. What you need is 'paused mode'.
From documentation:
In paused mode, you must explicitly call stream.read() to get chunks of data out. Streams start out in paused mode.
In other words: you request a chunk of data, you get it, you work with it, and when you are ready you ask for a new chunk. In this mode you control when you get your data.
How to change to 'paused mode':
It is the default mode for this stream, but when you register a 'data' event handler the stream switches to 'flowing mode'. Therefore, do not use readstream.on('data', ...).
Instead, use readstream.on('readable', function(){...}); when it fires, the stream is ready to give you a chunk of data. To get a chunk of data, use var chunk = readstream.read();
Example from docs:
var fs = require('fs');
var rr = fs.createReadStream('foo.txt');
rr.on('readable', function() {
    console.log('readable:', rr.read());
});
rr.on('end', function() {
    console.log('end');
});
Please read the documentation for more details, because there are more cases in which a stream is automatically switched to 'flowing mode'.
Working with slow handlers in flowing mode:
If you want or need to work in 'flowing mode', there is also a solution. You can pause and resume the stream: when you get a chunk from the readstream's 'data' event, pause the stream, and when you finish your work, resume it.
Example from documentation:
var readable = getReadableStreamSomehow();
readable.on('data', function(chunk) {
    console.log('got %d bytes of data', chunk.length);
    readable.pause();
    console.log('there will be no more data for 1 second');
    setTimeout(function() {
        console.log('now data will start flowing again');
        readable.resume();
    }, 1000);
});
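Applied to the asker's scenario of committing each chunk to a DB, a hedged sketch might look like this (saveChunkToDb is a hypothetical async function standing in for whatever DB client is used):
var fs = require('fs');
var rr = fs.createReadStream('somebigfile.jpg');

rr.on('data', function(chunk) {
    rr.pause(); // stop the flow while the slow DB write runs
    saveChunkToDb(chunk, function(err) { // hypothetical async DB call
        if (err) throw err;
        rr.resume(); // ask for the next chunk only once this one is committed
    });
});

rr.on('end', function() {
    // Because the last 'data' handler paused the stream until its write
    // completed, 'end' only fires after that write's callback has run
    console.log('all chunks committed');
});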

NODE fs.readFile, JSON.parse and fs.writeFile

I'm writing an app in Node and have been running into a rare but detrimental occurrence.
So I have a schedule.txt and I write to it when the user makes a change but then also read it every second and then parse it for use throughout the program.
Rarely, what happens is that while a user is writing to the file (asynchronously), the app (based on the timer) reads the same file, attempts to parse it, and fails.
I know from a design standpoint maybe this is just bound to happen... but I'm wondering if there is a quick fix I can do now. Would using writeFileSync help my situation (make it more 'atomic')? I just want to make sure that the app doesn't read the file while another process is still writing to the file.
TIA!
Niko
Seems like you'd want to serialize your read/writes. If it were me, I might try having a "manager" object which encapsulates the serialization, which you'd use like:
var fileManager = require('./file-manager');

// somewhere in the program
fileManager.scheduleWrite(data, function(err){
    // now the write is done
});

// somewhere else in the program
fileManager.scheduleRead(function(err, data){
    // `data` contains the data
});
Then implement it using Q or a similar promises lib, like:
// in file-manager.js
var Q = require('q');

var wait = Q();
module.exports = {
    scheduleWrite: function(data, cb){
        wait = wait.then(function(){
            // write data and call cb()
        });
    },
    scheduleRead: function(cb){ // note: cb was missing from the original signature
        wait = wait.then(function(){
            // read data and call cb(data)
        });
    }
};
The wait var will "stack up" into a serialized chain of tasks where the next one won't start until the previous one completes.
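A hedged sketch of what the filled-in version might look like, assuming the Q promise library and the asker's schedule.txt path (the error handling is likewise an assumption on top of the answer's skeleton):
// file-manager.js (sketch)
var fs = require('fs');
var Q = require('q');

var wait = Q(); // resolved promise; the serialized task chain starts here

module.exports = {
    scheduleWrite: function(data, cb) {
        wait = wait.then(function() {
            // Q.nfcall wraps the callback-style fs API in a promise
            return Q.nfcall(fs.writeFile, 'schedule.txt', data)
                .then(function() { cb(null); }, cb);
        });
    },
    scheduleRead: function(cb) {
        wait = wait.then(function() {
            return Q.nfcall(fs.readFile, 'schedule.txt', 'utf8')
                .then(function(data) { cb(null, data); },
                      function(err) { cb(err); });
        });
    }
};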

How to execute an async task with socket.io and node.js?

When I receive an "on" event on the server side, I want to start a task in parallel so it does not block the current event loop thread. Is it possible to do so? How?
I don't want to block the server side loop and I want to be able to send back a message to the client once the task is done, something such as:
client.on('execute-parallel-task', function(msg) {
    setTimeout(function() {
        // do something that takes a while
        client.emit('finished-that-task');
    }, 0);
    // this block should return asap, not waiting for the previous call
});
I am not sure if setTimeout will do the job.
It depends on what the "takes a while" is. If it takes a while asynchronously (you can tell because you'll have to register a callback or completion handler) and is slow because it's blocked on something like IO rather than being CPU bound, it'll inherently run in parallel.
If, however, it's something synchronous or CPU bound, then whilst you can use setTimeout, setImmediate, etc. to send back a message immediately, once the handler for setTimeout or setImmediate executes, your single thread of execution will be stuck handling that; you're not really fixing the problem, merely deferring it.
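A small sketch (not from the answer) of why deferring doesn't help for CPU-bound work; the busy loop stands in for any heavy computation:
client.on('execute-parallel-task', function(msg) {
    setImmediate(function() {
        // The handler above returns immediately, but once this callback
        // starts, the event loop is blocked for the full duration of the
        // busy loop; no other events are serviced in the meantime
        var end = Date.now() + 5000;
        while (Date.now() < end) { /* CPU-bound work */ }
        client.emit('finished-that-task');
    });
});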
To exhibit true parallel behaviour, you'll need to launch a child process. You can use the message passing functionality to notify your worker what work to do, and to notify the parent process once the work is complete.
var cp = require('child_process');

var child = cp.fork(__dirname + '/my-child-worker.js');

child.on('message', function(m) { // use the `child` handle returned by fork
    if (m === "done") {
        // Whey!
    }
});

child.send(/* Job id, or something */);
Then in my-child-worker.js:
process.on('message', function(m) {
    switch (m) {
        case 'get-x':
            // blah
            break;
        // other jobs
    }
    process.send('done');
});
You do not need the setTimeout.
Your function(msg) will be called once the 'execute-parallel-task' message is received.
If you are designing a task to run in an async manner, you can look at something like the async library for node.js.
Async Node JS Link
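For instance, a minimal sketch using the async library's series helper (assuming the async package is installed; stepOne and stepTwo are hypothetical task functions):
var async = require('async');

client.on('execute-parallel-task', function(msg) {
    async.series([
        function(done) { stepOne(msg, done); }, // hypothetical async step
        function(done) { stepTwo(msg, done); }  // hypothetical async step
    ], function(err) {
        // Runs only after both steps have called done()
        client.emit('finished-that-task');
    });
});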
