nodejs concurrency synchronous execution - node.js

I've two WebSockets getting data asynchronously, every time I get some message from the sockets I execute some code in CompareData.
The problem is that CompareData should be executed synchronously, or (better) only if it is not already running
This is my code:
function CompareData(data) {
console.log('data ', data);
AsyncFunction();
};
ws1 = new WebSocket(WS1_URL);
ws2 = new WebSocket(WS2_URL);
ws1.on('message', (data) => {
CompareData(data);
});
ws2.on('message', (data) => {
CompareData(data);
});
Can you help me, please? I'm very new to NodeJs

Node.js is single threaded. So you don't really get true concurrency issues occurring in Node programs as you might in other languages. In your example, there can only be at most one WebSocket callback for CompareData occurring at any given time.

You should not make synchronous call in node.js but you can make those call sequential. See below example might be helpful.
var messages = [];
var inProgress = false;
function CompareData(data) {
return new Promise((resolve, reject) => {
// do some stuff and resolve
setTimeout(() => {
resolve(data);
}, 1000);
});
};
const start = async () => {
if (!inProgress) {
if (messages.length !== 0) {
inProgress = true;
try {
const data = await CompareData(messages.shift());
console.log(data);
} catch (error) {
console.log(error);
}
inProgress = false;
await start();
}else{
console.log('Process Done');
}
}
}
const handler = (data) => {
messages.push(data);
start();
}
handler(1);
handler(2);
handler(3);
handler(4);
// ws1 = new WebSocket(WS1_URL);
// ws2 = new WebSocket(WS2_URL);
// ws1.on('message', handler);
// ws2.on('message', handler);

You should use some mutex in order to avoid that two async operations of compareData are executed at the same time, like node-mutex or mutexify.

My suggestions are:
First of all, you need to know when CompareData is finished. Reorganize your code to use promises or callbacks. If you're using third-party async functions, I'm almost sure they provide some feedback on completion - This is a must have in async world
Add inProgress = false flag somewhere to serve for you as simple lock. As someone posted, JS is single-threaded and you're guaranteed that your code won't get interrupted in the middle of operation. Thanks to that you can use really simple locks instead of complicated os-based mutexes known from multithreaded
langs.
In ws.on(...) check if inProgress is set. If not, lock it and run CompareData
In CompareData completion callback or on promise resolution set inProgress back to false, so you're no longer ignoring incoming data.
If you can simply discard the data, there is no need to complicate this scenario with extra queues, mutexes, etc.
If you need to serve it all, then queue incoming data and serve next piece after completion callback is fired.
This is basically what Rahul's suggests, but he uses features that are not established in current version of standard, so don't use it if you're not transpiling your code.

Related

Socket.io async/await for .on()

I'm building a socket.io Node JS application and my socket.io server will be listening for data from many socket.io clients, I need to save data to an API via my socket.io server as quickly as possible and figure that async/await is the best way forward.
Right now, I've got a function inside my .on('connection'), but is there a way I can make this an async function rather than have a nested function inside?
io.use((socket, next) => {
if (!socket.handshake.query || !socket.handshake.query.token) {
console.log('Authentication Error 1')
return
}
jwt.verify(socket.handshake.query.token, process.env.AGENT_SIGNING_SECRET, (err, decoded) => {
if (err) {
console.log('Authentication Error 2')
return
}
socket.decoded = decoded
next()
})
}).on('connection', socket => {
socket.on('agent-performance', data => {
async function savePerformance () {
const saved = await db.saveToDb('http://127.0.0.1:8000/api/profiler/save', data)
console.log(saved)
}
savePerformance()
})
})
Sort of, but you'll probably want to keep your current code if there can be multiple agent-performance events. You can modify the following, but it'd be messy and less readable. Event emitters still exist for a reason, they're not made obsolete by the introduction of promises. If it's performance you're after, your current code is probably faster and more resistant to backpressure and easier to error-handle.
events.on is a utility function that takes an event emitter (like socket) and returns an iterator that yields promises. You can await those with for await of.
events.once is a utility function that takes an event emitter (like socket) and returns a promise that resolves when the specified event is executed.
const { on, once } = require('events');
(async function() {
// This is an iterator that can emit infinite number of times.
const iterator = on(io, 'connection');
// Yield a promise, await it, run what is between `{ }` and repeat.
for await (const socket of iterator) {
const data = await once(socket, 'agent-performance');
const saved = await db.saveToDb(/* etc */);
}
})();
As the names imply, on is similar to socket.on and once is similar to socket.once. In the above example:
connected user 1, first agent-performance event: OK
connected user 1, second agent-performance event: not handled, there's no more event handler, since once is "used up".
connected user 2, first agent-performance event: OK
The documentation for on has a note about concurrency when using for await (x of on(...)), but I don't know if that would be a problem in your usecase.
// The execution of this inner block is synchronous and it
// processes one event at a time (even with await). Do not use
// if concurrent execution is required.

Possible to make an event handler wait until async / Promise-based code is done?

I am using the excellent Papa Parse library in nodejs mode, to stream a large (500 MB) CSV file of over 1 million rows, into a slow persistence API, that can only take one request at a time. The persistence API is based on Promises, but from Papa Parse, I receive each parsed CSV row in a synchronous event like so: parseStream.on("data", row => { ... }
The challenge I am facing is that Papa Parse dumps its CSV rows from the stream so fast that my slow persistence API can't keep up. Because Papa is synchronous and my API is Promise-based, I can't just call await doDirtyWork(row) in the on event handler, because sync and async code doesn't mix.
Or can they mix and I just don't know how?
My question is, can I make Papa's event handler wait for my API call to finish? Kind of doing the persistence API request directly in the on("data") event, making the on() function linger around somehow until the dirty API work is done?
The solution I have so far is not much better than using Papa's non-streaming mode, in terms of memory footprint. I actually need to queue up the torrent of on("data") events, in form of generator function iterations. I could have also queued up promise factories in an array and work it off in a loop. Any which way, I end up saving almost the entire CSV file as huge collection of future Promises (promise factories) in memory, until my slow API calls have worked all the way through.
async importCSV(filePath) {
let parsedNum = 0, processedNum = 0;
async function* gen() {
let pf = yield;
do {
pf = yield await pf();
} while (typeof pf === "function");
};
var g = gen();
g.next();
await new Promise((resolve, reject) => {
try {
const dataStream = fs.createReadStream(filePath);
const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, {delimiter: ",", header: false});
dataStream.pipe(parseStream);
parseStream.on("data", row => {
// Received a CSV row from Papa.parse()
try {
console.log("PA#", parsedNum, ": parsed", row.filter((e, i) => i <= 2 ? e : undefined)
);
parsedNum++;
// Simulate some really slow async/await dirty work here, for example
// send requests to a one-at-a-time persistence API
g.next(() => { // don't execute now, call in sequence via the generator above
return new Promise((res, rej) => {
console.log(
"DW#", processedNum, ": dirty work START",
row.filter((e, i) => i <= 2 ? e : undefined)
);
setTimeout(() => {
console.log(
"DW#", processedNum, ": dirty work STOP ",
row.filter((e, i) => i <= 2 ? e : undefined)
);
processedNum++;
res();
}, 1000)
})
});
} catch (err) {
console.log(err.stack);
reject(err);
}
});
parseStream.on("finish", () => {
console.log(`Parsed ${parsedNum} rows`);
resolve();
});
} catch (err) {
console.log(err.stack);
reject(err);
}
});
while(!(await g.next()).done);
}
So why the rush Papa? Why not allow me to work down the file a bit slower -- the data in the original CSV file isn't gonna run away, we have hours to finish the streaming, why hammer me with on("data") events that I can't seem to slow down?
So what I really need is for Papa to become more of a grandpa, and minimize or eliminate any queuing or buffering of CSV rows. Ideally I would be able to completely sync Papa's parsing events with the speed (or lack thereof) of my API. So if it weren't for the dogma that async code can't make sync code "sleep", I would ideally send each CSV row to the API inside the Papa event, and only then return control to Papa.
Suggestions? Some kind of "loose coupling" of the event handler with the slowness of my async API is fine too. I don't mind if a few hundred rows get queued up. But when tens of thousands pile up, I will run out of heap fast.
Why hammer me with on("data") events that I can't seem to slow down?
You can, you just were not asking papa to stop. You can do this by calling stream.pause(), then later stream.resume() to make use of Node stream's builtin back-pressure.
However, there's a much nicer API to use than dealing with this on your own in callback-based code: use the stream as an async iterator! When you await in the body of a for await loop, the generator has to pause as well. So you can write
async importCSV(filePath) {
let parsedNum = 0;
const dataStream = fs.createReadStream(filePath);
const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, {delimiter: ",", header: false});
dataStream.pipe(parseStream);
for await (const row of parseStream) {
// Received a CSV row from Papa.parse()
const data = row.filter((e, i) => i <= 2 ? e : undefined);
console.log("PA#", parsedNum, ": parsed", data);
parsedNum++;
await dirtyWork(data);
}
console.log(`Parsed ${parsedNum} rows`);
}
importCSV('sample.csv').catch(console.error);
let processedNum = 0;
function dirtyWork(data) {
// Simulate some really slow async/await dirty work here,
// for example send requests to a one-at-a-time persistence API
return new Promise((res, rej) => {
console.log("DW#", processedNum, ": dirty work START", data)
setTimeout(() => {
console.log("DW#", processedNum, ": dirty work STOP ", data);
processedNum++;
res();
}, 1000);
});
}
Async code in JavaScript can sometimes be a little hard to grok. It's important to remember how Node operates handles concurrency.
The node process is single-threaded, but it uses a concept called an event loop. The consequence of this is that async code and callbacks are essentially equivalent representations of the same thing.
Of course, you need an async function to use await, but your callback from Papa Parse can be an async function:
parse.on("data", async row => {
await sync(row)
})
Once the await operation completes, the arrow function ends, and all references to row will be eliminated, so the garbage collector can successfully collect row, releasing that memory.
The effect this has is concurrently executing sync every time a row is parsed, so if you can only sync one record at a time, then I would recommend wrapping the sync function in a debouncer.

How do I achieve a synchronous requirement using asynchronous NodeJS

I am adding user validation an data modification page on a node.js application.
In a synchronous universe, in a single function I would:
Lookup the original record in the database
Lookup the user in LDAP to see if they are the owner or admin
Do the logic and write the record.
In an asynchronous universe that won't work. To solve it I've built a series of hand-off functions:
router.post('/writeRecord', jsonParser, function(req, res) {
post = req.post;
var smdb = new AWS.DynamoDB.DocumentClient();
var params = { ... }
smdb.query(params, function(err,data){
if( err == null ) writeRecordStep2(post,data);
}
});
function writeRecord2( ru, post, data ){
var conn = new LDAP();
conn.search(
'ou=groups,o=amazon.com',
{ ... },
function(err,resp){
if( err == null ){
writeRecordStep3( ru, post, data, ldap1 )
}
}
}
function writeRecord3( ru, post, data ){
var conn = new LDAP();
conn.search(
'ou=groups,o=amazon.com',
{ ... },
function(err,resp){
if( err == null ){
writeRecordStep4( ru, post, data, ldap1, ldap2 )
}
}
}
function writeRecordStep4( ru, post, data, ldap1, ldap2 ){
// Do stuff with collected data
}
Additionally, because the LDAP and Dynamo logic are in their own source documents, these functions are scattered tragically around the code.
This strikes me as inefficient, as well as inelegant. I'm eager to find a more natural asynchronous pattern to achieve the same result.
Any promise library should sort your issue out. My preferred choice is bluebird. In summary they help you in performing blocking operations.
If you haven't heard about bluebird then just use it. It converts all function of a module and return promise which is then-able. Simply put, it promisifies all functions.
Here is the mechanism:
Module1.someFunction() \\do your job and finally pass the return object to next call
.then() \\Use that object which is return from the first call, do your job and return the updated value
.then() \\same goes on
.catch() \\do your job when any error occurs.
Hope you understand. Here is an example:
var readFile = Promise.promisify(require("fs").readFile);
readFile("myfile.js",
"utf8").then(function(contents) {
return eval(contents);
}).then(function(result) {
console.log("The result of evaluating
myfile.js", result);
}).catch(SyntaxError, function(e) {
console.log("File had syntax error", e);
//Catch any other error
}).catch(function(e) {
console.log("Error reading file", e);
});
I could not tell from your pseudo-code exactly which async operations depend upon results from with other ones and knowing that is key to the most efficient way to code a series of asynchronous operations. If two operations do not depend upon one another, they can run in parallel which generally gets to an end result faster. I also can't tell exactly what data needs to be passed on to later parts of the async requests (too much pseudo-code and not enough real code to show us what you're really attempting to do).
So, without that level of detail, I'll show you two ways to approach this. The first runs each operation sequentially. Run the first async operation, when it's done, run the next one and accumulates all the results into an object that is passed along to the next link in the chain. This is general purpose since all async operations have access to all the prior results.
This makes use of promises built into the AWS.DynamboDB interface and makes our own promise for conn.search() (though if I knew more about that interface, it may already have a promise interface).
Here's the sequential version:
// promisify the search method
const util = require('util');
LDAP.prototype.searchAsync = util.promisify(LDAP.prototype.search);
// utility function that does a search and adds the result to the object passed in
// returns a promise that resolves to the object
function ldapSearch(data, key) {
var conn = new LDAP();
return conn.searchAsync('ou=groups,o=amazon.com', { ... }).then(results => {
// put our results onto the passed in object
data[key] = results;
// resolve with the original object (so we can collect data here in a promise chain)
return data;
});
}
router.post('/writeRecord', jsonParser, function(req, res) {
let post = req.post;
let smdb = new AWS.DynamoDB.DocumentClient();
let params = { ... }
// The latest AWS interface gets a promise with the .promise() method
smdb.query(params).promise().then(dbresult => {
return ldapSearch({post, dbresult}, "ldap1");
}).then(result => {
// result.dbresult
// result.ldap1
return ldapSearch(result, "ldap2")
}).then(result => {
// result.dbresult
// result.ldap1
// result.ldap2
// doSomething with all the collected data here
}).catch(err => {
console.log(err);
res.status(500).send("Internal Error");
});
});
And, here's a parallel version that runs all three async operations at once and then waits for all three of the to be done and then has all the results at once:
// if the three async operations you show can be done in parallel
// first promisify things
const util = require('util');
LDAP.prototype.searchAsync = util.promisify(LDAP.prototype.search);
function ldapSearch(params) {
var conn = new LDAP();
return conn.searchAsync('ou=groups,o=amazon.com', { ... });
}
router.post('/writeRecord', jsonParser, function(req, res) {
let post = req.post;
let smdb = new AWS.DynamoDB.DocumentClient();
let params = { ... }
Promise.all([
ldapSearch(...),
ldapSearch(...),
smdb.query(params).promise()
]).then(([ldap1Result, ldap2Result, queryResult]) => {
// process ldap1Result, ldap2Result and queryResult here
}).catch(err => {
console.log(err);
res.status(500).send("Internal Error");
});
});
Keep in mind that due to the pseudo-code nature of the code in your question, this is also pseudo-code where implementation details (exactly what parameters you're searching for, what response you're sending, etc...) have to be filled in. This should be illustrative of promise chaining to serialize operations and the use of Promise.all() for parallelizing operations and promisifying a method that didn't have promises built in.

How to work around amqplib's Channel#consume odd signature?

I am writing a worker that uses amqplib's Channel#consume method. I want this worker to wait for jobs and process them as soon as they appear in the queue.
I wrote my own module to abstract away ampqlib, here are the relevant functions for getting a connection, setting up the queue and consuming a message:
const getConnection = function(host) {
return amqp.connect(host);
};
const createChannel = function(conn) {
connection = conn;
return conn.createConfirmChannel();
};
const assertQueue = function(channel, queue) {
return channel.assertQueue(queue);
};
const consume = Promise.method(function(channel, queue, processor) {
processor = processor || function(msg) { if (msg) Promise.resolve(msg); };
return channel.consume(queue, processor)
});
const setupQueue = Promise.method(function setupQueue(queue) {
const amqp_host = 'amqp://' + ((host || process.env.AMQP_HOST) || 'localhost');
return getConnection(amqp_host)
.then(conn => createChannel(conn)) // -> returns a `Channel` object
.tap(channel => assertQueue(channel, queue));
});
consumeJob: Promise.method(function consumeJob(queue) {
return setupQueue(queue)
.then(channel => consume(channel, queue))
});
My problem is with Channel#consume's odd signature. From http://www.squaremobius.net/amqp.node/channel_api.html#channel_consume:
#consume(queue, function(msg) {...}, [options, [function(err, ok) {...}]])
The callback is not where the magic happens, the message's processing should actually go in the second argument and that breaks the flow of promises.
This is how I planned on using it:
return queueManager.consumeJob(queue)
.then(msg => {
// do some processing
});
But it doesn't work. If there are no messages in the queue, the promise is rejected and then if a message is dropped in the queue nothing happens. If there is a message, only one message is processed and then the worker stalls because it exited the "processor" function from the Channel#consume call.
How should I go about it? I want to keep the queueManager abstraction so my code is easier to reason about but I don't know how to do it... Any pointers?
As #idbehold said, Promises can only be resolved once. If you want to process messages as they come in, there is no other way than to use this function. Channel#get will only check the queue once and then return; it wouldn't work for a scenario where you need a worker.
just as an option. You can present your application as a stream of some messages(or events). There is a library for this http://highlandjs.org/#examples
Your code should look like this(it isn`t a finished sample, but I hope it illustrates the idea):
let messageStream = _((push, next) => {
consume(queue, (msg) => {
push(null, msg)
})
)
// now you can operate with your stream in functional style
message.map((msg) => msg + 'some value').each((msg) => // do something with msg)
This approach provides you a lot of primitives for synchronization and transformation
http://highlandjs.org/#examples

How to wait for all async calls to finish

I'm using Mongoose with Node.js and have the following code that will call the callback after all the save() calls has finished. However, I feel that this is a very dirty way of doing it and would like to see the proper way to get this done.
function setup(callback) {
// Clear the DB and load fixtures
Account.remove({}, addFixtureData);
function addFixtureData() {
// Load the fixtures
fs.readFile('./fixtures/account.json', 'utf8', function(err, data) {
if (err) { throw err; }
var jsonData = JSON.parse(data);
var count = 0;
jsonData.forEach(function(json) {
count++;
var account = new Account(json);
account.save(function(err) {
if (err) { throw err; }
if (--count == 0 && callback) callback();
});
});
});
}
}
You can clean up the code a bit by using a library like async or Step.
Also, I've written a small module that handles loading fixtures for you, so you just do:
var fixtures = require('./mongoose-fixtures');
fixtures.load('./fixtures/account.json', function(err) {
//Fixtures loaded, you're ready to go
};
Github:
https://github.com/powmedia/mongoose-fixtures
It will also load a directory of fixture files, or objects.
I did a talk about common asyncronous patterns (serial and parallel) and ways to solve them:
https://github.com/masylum/i-love-async
I hope its useful.
I've recently created simpler abstraction called wait.for to call async functions in sync mode (based on Fibers). It's at an early stage but works. It is at:
https://github.com/luciotato/waitfor
Using wait.for, you can call any standard nodejs async function, as if it were a sync function, without blocking node's event loop. You can code sequentially when you need it.
using wait.for your code will be:
//in a fiber
function setup(callback) {
// Clear the DB and load fixtures
wait.for(Account.remove,{});
// Load the fixtures
var data = wait.for(fs.readFile,'./fixtures/account.json', 'utf8');
var jsonData = JSON.parse(data);
jsonData.forEach(function(json) {
var account = new Account(json);
wait.forMethod(account,'save');
}
callback();
}
That's actually the proper way of doing it, more or less. What you're doing there is a parallel loop. You can abstract it into it's own "async parallel foreach" function if you want (and many do), but that's really the only way of doing a parallel loop.
Depending on what you intended, one thing that could be done differently is the error handling. Because you're throwing, if there's a single error, that callback will never get executed (count won't be decremented). So it might be better to do:
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
And handle the error in the callback. It's better node-convention-wise.
I would also change another thing to save you the trouble of incrementing count on every iteration:
var jsonData = JSON.parse(data)
, count = jsonData.length;
jsonData.forEach(function(json) {
var account = new Account(json);
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
});
If you are already using underscore.js anywhere in your project, you can leverage the after method. You need to know how many async calls will be out there in advance, but aside from that it's a pretty elegant solution.

Resources