Socket.io async/await for .on() - node.js

I'm building a socket.io Node.js application. My socket.io server will be listening for data from many socket.io clients, and I need to save that data to an API via the server as quickly as possible, so I figure async/await is the best way forward.
Right now I've got a function nested inside my .on('connection') handler, but is there a way to make this an async function rather than having a nested function inside?
io.use((socket, next) => {
  if (!socket.handshake.query || !socket.handshake.query.token) {
    console.log('Authentication Error 1')
    return
  }
  jwt.verify(socket.handshake.query.token, process.env.AGENT_SIGNING_SECRET, (err, decoded) => {
    if (err) {
      console.log('Authentication Error 2')
      return
    }
    socket.decoded = decoded
    next()
  })
}).on('connection', socket => {
  socket.on('agent-performance', data => {
    async function savePerformance () {
      const saved = await db.saveToDb('http://127.0.0.1:8000/api/profiler/save', data)
      console.log(saved)
    }
    savePerformance()
  })
})

Sort of, but you'll probably want to keep your current code if there can be multiple agent-performance events per socket. You could restructure it as shown below, but it'd be messier and less readable. Event emitters still exist for a reason; they're not made obsolete by the introduction of promises. If it's performance you're after, your current code is probably faster, more resistant to backpressure, and easier to error-handle.
events.on is a utility function that takes an event emitter (like socket) and returns an async iterator that yields promises. You can consume those with for await...of.
events.once is a utility function that takes an event emitter (like socket) and returns a promise that resolves the first time the specified event is emitted.
const { on, once } = require('events');
(async function () {
  // This is an async iterator that can yield an infinite number of times.
  const iterator = on(io, 'connection');
  // Yield a promise, await it, run what is between `{ }` and repeat.
  // Note: each iteration yields an array of the emitted arguments.
  for await (const [socket] of iterator) {
    const [data] = await once(socket, 'agent-performance');
    const saved = await db.saveToDb(/* etc */);
  }
})();
As the names imply, on is similar to socket.on and once is similar to socket.once. In the above example:
connected user 1, first agent-performance event: OK
connected user 1, second agent-performance event: not handled, there's no more event handler, since once is "used up".
connected user 2, first agent-performance event: OK
The documentation for on has a note about concurrency when using for await (x of on(...)), but I don't know if that would be a problem in your use case.
// The execution of this inner block is synchronous and it
// processes one event at a time (even with await). Do not use
// if concurrent execution is required.
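That said, if the goal is simply to avoid the nested helper function while keeping your current per-event structure, you can make the event handler itself an async arrow function. A minimal sketch based on the code in the question:

.on('connection', socket => {
  socket.on('agent-performance', async data => {
    try {
      const saved = await db.saveToDb('http://127.0.0.1:8000/api/profiler/save', data)
      console.log(saved)
    } catch (err) {
      // socket.io won't do anything with a rejected promise, so handle errors here
      console.error('Failed to save performance data', err)
    }
  })
})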

Problem in reading event stream in AWS Lambda. Nodejs code working locally as desired but not in AWS Lambda

Here's the workflow:
Get an https link --> write to the filesystem --> read from the filesystem --> get the sha256 hash.
It works fine on my local machine running Node 10.15.3, but when I invoke the Lambda function on AWS, the output is null. The problem may lie with the readable stream. Here's the code; you can run it directly on your local machine and it will output a sha256 hash as required. If you wish to run it on AWS Lambda, comment/uncomment the lines as marked.
// Reference: https://stackoverflow.com/questions/11944932/how-to-download-a-file-with-node-js-without-using-third-party-libraries
var https = require('https');
var fs = require('fs');
var crypto = require('crypto')
const url = "https://upload.wikimedia.org/wikipedia/commons/a/a8/TEIDE.JPG"
const dest = "/tmp/doc";
let hexData;
async function writeit() {
  var file = fs.createWriteStream(dest);
  return new Promise((resolve, reject) => {
    var responseSent = false;
    https.get(url, response => {
      response.pipe(file);
      file.on('finish', () => {
        file.close(() => {
          if (responseSent) return;
          responseSent = true;
          resolve();
        });
      });
    }).on('error', err => {
      if (responseSent) return;
      responseSent = true;
      reject(err);
    });
  });
}
const readit = async () => {
  await writeit();
  var readandhex = fs.createReadStream(dest).pipe(crypto.createHash('sha256').setEncoding('hex'))
  try {
    readandhex.on('finish', function () { // MAY BE PROBLEM IS HERE.
      console.log(this.read())
      fs.unlink(dest, () => {});
    })
  }
  catch (err) {
    console.log(err);
    return err;
  }
}
const handler = async () => { // Comment this line to run the code on AWS Lambda
//exports.handler = async (event) => { // UNComment this line to run the code on AWS Lambda
  try {
    hexData = readit();
  }
  catch (err) {
    console.log(err);
    return err;
  }
  return hexData;
};
handler() // Comment this line to run the code on AWS Lambda
There can be multiple things that you need to check.
Since the URL you are accessing is a public one, make sure either your Lambda is outside a VPC or your VPC has a NAT Gateway attached with internet access.
/tmp is a valid temp directory for Lambda, but you may need to create the doc folder inside /tmp before using it.
You can check the CloudWatch logs for more information on what's going on, if they're enabled.
I've seen this difference in behaviour between local and lambda before.
All async functions return promises. Async functions must be awaited. Calling an async function without awaiting it means execution continues to the next line(s), and potentially out of the calling function.
So your code:
exports.handler = async (event) => {
  try {
    hexData = readit();
  }
  catch (err) {
    console.log(err);
    return err;
  }
  return hexData;
};
readit() is defined as const readit = async () => { ... }, but your handler does not await it. Therefore hexData = readit(); assigns a pending promise to hexData, the handler returns that promise and exits, and the Lambda "completes" without the body of readit() having run to completion.
The simple fix, then, is to await the async function: hexData = await readit();. The reason it works locally in Node is that the Node process will wait for pending work like this to finish before exiting, even though the handler function has already returned. But since Lambda "returns" as soon as the handler returns, unresolved promises remain unresolved. (As an aside, there is no need for the writeit function to be marked async, because it doesn't await anything and already returns a promise.)
That being said, I don't know promises well and I barely know anything about events, so there are other things which raise warning flags for me. I'm not sure about them, maybe they're perfectly fine, but I'll raise them here just in case:
file.on('finish') and readandhex.on('finish'). These are both event handlers, and I believe they are non-blocking, so why would the handler, and therefore the Lambda, wait around for them?
In the first case, it's within a promise and resolve() is called from within the event callback, so that may be fine (as I said, I don't know much about these two subjects, so I'm not sure). The important thing is that the code must block at that point until the promise is resolved; if execution can continue (i.e. return from writeit()) before the finish event is raised, then it won't work.
The second case is almost certainly going to be a problem, because it just says "if event x is raised, then do y". There's no promise being awaited, so nothing blocks the code, and it will happily continue to the end of the readit() function, then the handler, then the Lambda. Again, this is based on the assumption that event handlers are non-blocking (in the sense that declaring you want to run some code on some event does not wait at that point for the event to be raised).
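Putting that together, a rough sketch of the Lambda variant with the hash stream's finish event wrapped in a promise so it can be awaited (keeping the original writeit; the promise wrapper and the resolved hash value are my additions, not part of the original code):

const readit = () => {
  return writeit().then(() => new Promise((resolve, reject) => {
    const readandhex = fs.createReadStream(dest)
      .pipe(crypto.createHash('sha256').setEncoding('hex'));
    readandhex.on('finish', function () {
      const hash = this.read();      // the hex digest
      fs.unlink(dest, () => {});
      resolve(hash);
    });
    readandhex.on('error', reject);
  }));
};

exports.handler = async (event) => {
  // awaiting keeps the Lambda alive until the hash is actually computed
  const hexData = await readit();
  return hexData;
};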

Possible to make an event handler wait until async / Promise-based code is done?

I am using the excellent Papa Parse library in Node.js mode to stream a large (500 MB) CSV file of over 1 million rows into a slow persistence API that can only take one request at a time. The persistence API is based on Promises, but from Papa Parse I receive each parsed CSV row in a synchronous event, like so: parseStream.on("data", row => { ... })
The challenge I am facing is that Papa Parse dumps its CSV rows from the stream so fast that my slow persistence API can't keep up. Because Papa is synchronous and my API is Promise-based, I can't just call await doDirtyWork(row) in the on event handler, because sync and async code doesn't mix.
Or can they mix and I just don't know how?
My question is, can I make Papa's event handler wait for my API call to finish? Kind of doing the persistence API request directly in the on("data") event, making the on() function linger around somehow until the dirty API work is done?
The solution I have so far is not much better than using Papa's non-streaming mode, in terms of memory footprint. I actually need to queue up the torrent of on("data") events, in the form of generator function iterations. I could also have queued up promise factories in an array and worked them off in a loop. Either way, I end up holding almost the entire CSV file as a huge collection of future Promises (promise factories) in memory, until my slow API calls have worked all the way through.
async importCSV(filePath) {
  let parsedNum = 0, processedNum = 0;
  async function* gen() {
    let pf = yield;
    do {
      pf = yield await pf();
    } while (typeof pf === "function");
  };
  var g = gen();
  g.next();
  await new Promise((resolve, reject) => {
    try {
      const dataStream = fs.createReadStream(filePath);
      const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, {delimiter: ",", header: false});
      dataStream.pipe(parseStream);
      parseStream.on("data", row => {
        // Received a CSV row from Papa.parse()
        try {
          console.log("PA#", parsedNum, ": parsed",
            row.filter((e, i) => i <= 2 ? e : undefined)
          );
          parsedNum++;
          // Simulate some really slow async/await dirty work here, for example
          // send requests to a one-at-a-time persistence API
          g.next(() => { // don't execute now, call in sequence via the generator above
            return new Promise((res, rej) => {
              console.log(
                "DW#", processedNum, ": dirty work START",
                row.filter((e, i) => i <= 2 ? e : undefined)
              );
              setTimeout(() => {
                console.log(
                  "DW#", processedNum, ": dirty work STOP ",
                  row.filter((e, i) => i <= 2 ? e : undefined)
                );
                processedNum++;
                res();
              }, 1000)
            })
          });
        } catch (err) {
          console.log(err.stack);
          reject(err);
        }
      });
      parseStream.on("finish", () => {
        console.log(`Parsed ${parsedNum} rows`);
        resolve();
      });
    } catch (err) {
      console.log(err.stack);
      reject(err);
    }
  });
  while (!(await g.next()).done);
}
So why the rush Papa? Why not allow me to work down the file a bit slower -- the data in the original CSV file isn't gonna run away, we have hours to finish the streaming, why hammer me with on("data") events that I can't seem to slow down?
So what I really need is for Papa to become more of a grandpa, and minimize or eliminate any queuing or buffering of CSV rows. Ideally I would be able to completely sync Papa's parsing events with the speed (or lack thereof) of my API. So if it weren't for the dogma that async code can't make sync code "sleep", I would ideally send each CSV row to the API inside the Papa event, and only then return control to Papa.
Suggestions? Some kind of "loose coupling" of the event handler with the slowness of my async API is fine too. I don't mind if a few hundred rows get queued up. But when tens of thousands pile up, I will run out of heap fast.
Why hammer me with on("data") events that I can't seem to slow down?
You can; you just weren't asking Papa to stop. You can do this by calling stream.pause() and later stream.resume() to make use of Node streams' built-in back-pressure.
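For instance, a minimal sketch of that manual back-pressure (using the doDirtyWork call mentioned in the question):

parseStream.on("data", row => {
  parseStream.pause();                       // stop further 'data' events while we work
  doDirtyWork(row)
    .catch(err => console.error(err))
    .finally(() => parseStream.resume());    // let Papa continue once the row is persisted
});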
However, there's a much nicer API to use than dealing with this on your own in callback-based code: use the stream as an async iterator! When you await in the body of a for await loop, the generator has to pause as well. So you can write
async importCSV(filePath) {
  let parsedNum = 0;
  const dataStream = fs.createReadStream(filePath);
  const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, {delimiter: ",", header: false});
  dataStream.pipe(parseStream);
  for await (const row of parseStream) {
    // Received a CSV row from Papa.parse()
    const data = row.filter((e, i) => i <= 2 ? e : undefined);
    console.log("PA#", parsedNum, ": parsed", data);
    parsedNum++;
    await dirtyWork(data);
  }
  console.log(`Parsed ${parsedNum} rows`);
}
importCSV('sample.csv').catch(console.error);

let processedNum = 0;
function dirtyWork(data) {
  // Simulate some really slow async/await dirty work here,
  // for example send requests to a one-at-a-time persistence API
  return new Promise((res, rej) => {
    console.log("DW#", processedNum, ": dirty work START", data)
    setTimeout(() => {
      console.log("DW#", processedNum, ": dirty work STOP ", data);
      processedNum++;
      res();
    }, 1000);
  });
}
Async code in JavaScript can sometimes be a little hard to grok. It's important to remember how Node handles concurrency.
The Node process is single-threaded, but it uses a concept called the event loop. The consequence of this is that async code and callbacks are essentially equivalent representations of the same thing.
Of course, you need an async function to use await, but your callback from Papa Parse can be an async function:
parse.on("data", async row => {
await sync(row)
})
Once the await operation completes, the arrow function ends, and all references to row will be eliminated, so the garbage collector can successfully collect row, releasing that memory.
The effect this has is that sync is executed concurrently every time a row is parsed, so if you can only sync one record at a time, then I would recommend wrapping the sync function in a debouncer.
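To illustrate that last point, one simple way to make sure only one sync call runs at a time is a promise-chain serializer (a sketch rather than a classic debouncer library; sync here is the hypothetical one-at-a-time API from above):

let chain = Promise.resolve();
parse.on("data", row => {
  // queue each row's sync call behind the previous one so they run one at a time
  chain = chain
    .then(() => sync(row))
    .catch(err => console.error(err)); // keep the chain alive if one call fails
});

Note that this still keeps every pending row in memory until it is processed, so for very large files the back-pressure / async-iterator approach above is preferable.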

nodejs concurrency synchronous execution

I have two WebSockets receiving data asynchronously; every time I get a message from one of the sockets, I execute some code in CompareData.
The problem is that CompareData should be executed synchronously, or (better) only if it is not already running.
This is my code:
function CompareData(data) {
  console.log('data ', data);
  AsyncFunction();
};
ws1 = new WebSocket(WS1_URL);
ws2 = new WebSocket(WS2_URL);
ws1.on('message', (data) => {
  CompareData(data);
});
ws2.on('message', (data) => {
  CompareData(data);
});
Can you help me, please? I'm very new to NodeJs
Node.js is single-threaded, so you don't really get the true concurrency issues in Node programs that you might in other languages. In your example, at most one WebSocket callback into CompareData can be executing at any given time.
You should not make synchronous calls in Node.js, but you can make those calls sequential. The example below might be helpful.
var messages = [];
var inProgress = false;
function CompareData(data) {
  return new Promise((resolve, reject) => {
    // do some stuff and resolve
    setTimeout(() => {
      resolve(data);
    }, 1000);
  });
};
const start = async () => {
  if (!inProgress) {
    if (messages.length !== 0) {
      inProgress = true;
      try {
        const data = await CompareData(messages.shift());
        console.log(data);
      } catch (error) {
        console.log(error);
      }
      inProgress = false;
      await start();
    } else {
      console.log('Process Done');
    }
  }
}
const handler = (data) => {
  messages.push(data);
  start();
}
handler(1);
handler(2);
handler(3);
handler(4);
// ws1 = new WebSocket(WS1_URL);
// ws2 = new WebSocket(WS2_URL);
// ws1.on('message', handler);
// ws2.on('message', handler);
You should use a mutex, such as node-mutex or mutexify, to ensure that two async CompareData operations are never executed at the same time.
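For example, a rough sketch with mutexify (this assumes AsyncFunction returns a promise; adjust accordingly if it takes a callback instead):

const mutexify = require('mutexify');
const lock = mutexify();

function CompareData(data) {
  // only one caller at a time gets past lock(); the rest wait in a queue
  lock(release => {
    console.log('data ', data);
    AsyncFunction()
      .then(() => release())
      .catch(err => {
        console.error(err);
        release();
      });
  });
}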
My suggestions are:
First of all, you need to know when CompareData is finished. Reorganize your code to use promises or callbacks. If you're using third-party async functions, I'm almost sure they provide some feedback on completion; this is a must-have in the async world.
Add an inProgress flag somewhere to serve as a simple lock. As someone posted, JS is single-threaded and you're guaranteed that your code won't get interrupted in the middle of an operation. Thanks to that, you can use really simple flags instead of the complicated OS-based mutexes known from multithreaded languages.
In ws.on(...), check whether inProgress is set. If not, lock it and run CompareData (see the sketch after this list).
In the CompareData completion callback, or on promise resolution, set inProgress back to false so you're no longer ignoring incoming data.
If you can simply discard the data, there is no need to complicate this scenario with extra queues, mutexes, etc.
If you need to process it all, then queue incoming data and process the next piece after the completion callback fires.
This is basically what Rahul suggests, but he uses features that are not yet established in the current version of the standard, so don't use it if you're not transpiling your code.
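A minimal sketch of that flag-based approach, in the discard-while-busy variant (assuming CompareData returns a promise, e.g. because it awaits its async work):

let inProgress = false;
const handler = (data) => {
  if (inProgress) return;          // simply ignore data that arrives while busy
  inProgress = true;
  CompareData(data)
    .catch(err => console.error(err))
    .finally(() => { inProgress = false; });
};
ws1.on('message', handler);
ws2.on('message', handler);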

Why is my RxJS Observable completing right away?

I'm a bit new to RxJS and it is kicking my ass, so I hope someone can help!
I'm using RxJS (5) on my Express server to handle behaviour where I have to save a bunch of Document objects and then email each of them to their recipients. The code in my documents/create endpoint looks like this:
// Each element in this stream is an array of `Document` model objects: [<Document>, <Document>, <Document>]
const saveDocs$ = Observable.fromPromise(Document.handleCreateBatch(docs, companyId, userId));
const saveThenEmailDocs$ = saveDocs$
  .switchMap((docs) => sendInitialEmails$$(docs, user))
  .do(x => {
    // Here x is the `Document` model object
    debugger;
  });
// First saves all the docs, and then begins to email them all.
// The reason we want to save them all first is because, if an email fails,
// we can still ensure that the document is saved
saveThenEmailDocs$
  .subscribe(
    (doc) => {
      // This never hits
    },
    (err) => {},
    () => {
      // This hits immediately.. Why though?
    }
  );
The sendInitialEmails$$ function returns an Observable and looks like this:
sendInitialEmails$$ (docs, fromUser) {
  return Rx.Observable.create((observer) => {
    // Emails each document to their recipients
    docs.forEach((doc) => {
      mailer.send({...}, (err) => {
        if (err) {
          observer.error(err);
        } else {
          observer.next(doc);
        }
      });
    });
    // When all the docs have finished sending, complete the
    // stream
    observer.complete();
  });
}
The problem is that when I subscribe to saveThenEmailDocs$, my next handler is never called and it goes straight to complete. I have no idea why... Conversely, if I remove the observer.complete() call from sendInitialEmails$$, the next handler is called every time and the complete handler in subscribe is never called.
Why isn't the expected behaviour of next, next, ..., complete happening; instead it's one or the other... Am I missing something?
I can only assume that mailer.send is an asynchronous call.
Your observer.complete() is called when all the asynchronous calls have been launched, but before any of them could complete.
In such cases I would either make a stream of observable values from the docs array rather than wrapping it like this,
or, if you would like to wrap it manually in an observable, I suggest you look into the async library and use
async.each(docs, function(doc, callback) {...}, function finalized(err){...})
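To illustrate the second option, a rough sketch (assuming mailer.send takes a Node-style callback, as in the question) so that complete is only signalled once every send has finished:

const async = require('async');

sendInitialEmails$$ (docs, fromUser) {
  return Rx.Observable.create((observer) => {
    async.each(docs, (doc, callback) => {
      mailer.send({...}, (err) => {
        if (!err) observer.next(doc);   // emit each successfully sent doc
        callback(err);                  // report this doc's result to async.each
      });
    }, (err) => {
      // runs only after every send has finished (or one has failed)
      if (err) observer.error(err);
      else observer.complete();
    });
  });
}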

How to work around amqplib's Channel#consume odd signature?

I am writing a worker that uses amqplib's Channel#consume method. I want this worker to wait for jobs and process them as soon as they appear in the queue.
I wrote my own module to abstract away amqplib; here are the relevant functions for getting a connection, setting up the queue, and consuming a message:
const getConnection = function(host) {
  return amqp.connect(host);
};
const createChannel = function(conn) {
  connection = conn;
  return conn.createConfirmChannel();
};
const assertQueue = function(channel, queue) {
  return channel.assertQueue(queue);
};
const consume = Promise.method(function(channel, queue, processor) {
  processor = processor || function(msg) { if (msg) Promise.resolve(msg); };
  return channel.consume(queue, processor)
});
const setupQueue = Promise.method(function setupQueue(queue) {
  const amqp_host = 'amqp://' + ((host || process.env.AMQP_HOST) || 'localhost');
  return getConnection(amqp_host)
    .then(conn => createChannel(conn)) // -> returns a `Channel` object
    .tap(channel => assertQueue(channel, queue));
});
consumeJob: Promise.method(function consumeJob(queue) {
  return setupQueue(queue)
    .then(channel => consume(channel, queue))
});
My problem is with Channel#consume's odd signature. From http://www.squaremobius.net/amqp.node/channel_api.html#channel_consume:
#consume(queue, function(msg) {...}, [options, [function(err, ok) {...}]])
The callback is not where the magic happens; the message's processing should actually go in the second argument, and that breaks the flow of promises.
This is how I planned on using it:
return queueManager.consumeJob(queue)
  .then(msg => {
    // do some processing
  });
But it doesn't work. If there are no messages in the queue, the promise is rejected, and then if a message is dropped into the queue nothing happens. If there is a message, only one message is processed and then the worker stalls, because it has exited the "processor" function from the Channel#consume call.
How should I go about it? I want to keep the queueManager abstraction so my code is easier to reason about but I don't know how to do it... Any pointers?
As @idbehold said, Promises can only be resolved once. If you want to process messages as they come in, there is no other way than to use this function. Channel#get will only check the queue once and then return; it wouldn't work for a scenario where you need a worker.
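One way to keep the queueManager abstraction while accepting that consume is callback-driven is to pass your processing function down into it instead of chaining .then on consumeJob. A rough sketch (the handler parameter and the ack/nack calls are my additions, not part of the original module):

const consume = function(channel, queue, handler) {
  // channel.consume invokes this callback once per delivered message
  return channel.consume(queue, msg => {
    if (!msg) return; // consumer was cancelled by the server
    Promise.resolve(handler(msg))
      .then(() => channel.ack(msg))   // acknowledge on success
      .catch(err => {
        console.error(err);
        channel.nack(msg);            // reject (and requeue) on failure
      });
  });
};

consumeJob: Promise.method(function consumeJob(queue, handler) {
  return setupQueue(queue)
    .then(channel => consume(channel, queue, handler));
});

// usage: the worker keeps running and processes jobs as they arrive
queueManager.consumeJob(queue, msg => {
  // do some processing, returning a promise if it is asynchronous
});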
Just as an option: you can present your application as a stream of messages (or events). There is a library for this: http://highlandjs.org/#examples
Your code could look like this (it isn't a finished sample, but I hope it illustrates the idea):
let messageStream = _((push, next) => {
  consume(queue, (msg) => {
    push(null, msg)
  })
})
// now you can operate on your stream in a functional style
messageStream
  .map((msg) => msg + 'some value')
  .each((msg) => { /* do something with msg */ })
This approach provides you with a lot of primitives for synchronization and transformation:
http://highlandjs.org/#examples
