Better way to write a simple Node redis loop (using ioredis)? - node.js

So, I'm stilling learning the JS/Node way from a long time in other languages.
I have a tiny micro-service that reads from a redis channel, temp stores it in a working channel, does the work, removes it, and moves on. If there is more in the channel it re-runs immediately. If not, it sets a timeout and checks again in 1 second.
It works fine...but timeout polling doesn't seem to be the "correct" way to approach this. And I haven't found much about using BRPOPLPUSH to try to block (vs. RPOPLPUSH) and wait in Node....or other options like that. (Pub/Sub isn't an option here...this is the only listener, and it may not always be listening.)
Here's the short essence of what I'm doing:
var Redis = require('ioredis');
var redis = new Redis();
var redisLoop = function () {
redis.rpoplpush('channel', 'channel-working').then(function (result) {
if (result) {
processJob(result); //do stuff
//delete the item from the working channel, and check for another item
redis.lrem('channel-working', 1, result).then(function (result) { });
redisLoop();
} else {
//no items, wait 1 second and try again
setTimeout(redisLoop, 1000);
}
});
};
redisLoop();
I feel like I'm missing something really obvious. Thanks!

BRPOPLPUSH doesn't block in Node, it blocks in the client. In this instance I think it's exactly what you need to get rid of the polling.
var Redis = require('ioredis');
var redis = new Redis();
var redisLoop = function () {
redis.brpoplpush('channel', 'channel-working', 0).then(function (result) {
// because we are using BRPOPLPUSH, the client promise will not resolve
// until a 'result' becomes available
processJob(result);
// delete the item from the working channel, and check for another item
redis.lrem('channel-working', 1, result).then(redisLoop);
});
};
redisLoop();
Note that redis.lrem is asynchronous, so you should use lrem(...).then(redisLoop) to ensure that your next tick executes only after the item is successfully removed from channel-working.

Related

Get all messages from AWS SQS in NodeJS

I have the following function that gets a message from aws SQS, the problem is I get one at a time and I wish to get all of them, because I need to check the ID for each message:
function getSQSMessages() {
const params = {
QueueUrl: 'some url',
};
sqs.receiveMessage(params, (err, data) => {
if(err) {
console.log(err, err.stack)
return(err);
}
return data.Messages;
});
};
function sendMessagesBack() {
return new Promise((resolve, reject) => {
if(Array.isArray(getSQSMessages())) {
resolve(getSQSMessages());
} else {
reject(getSQSMessages());
};
});
};
The function sendMessagesBack() is used in another async/await function.
I am not sure how to get all of the messages, as I was looking on how to get them, people mention loops but I could not figure how to implement it in my case.
I assume I have to put sqs.receiveMessage() in a loop, but then I get confused on what do I need to check and when to stop the loop so I can get the ID of each message?
If anyone has any tips, please share.
Thank you.
I suggest you to use the Promise api, and it will give you the possibility to use async/await syntax right away.
const { Messages } = await sqs.receiveMessage(params).promise();
// Messages will contain all your needed info
await sqs.sendMessage(params).promise();
In this way, you will not need to wrap the callback API with Promises.
SQS doesn't return more than 10 messages in the response. To get all the available messages, you need to call the getSQSMessages function recursively.
If you return a promise from getSQSMessages, you can do something like this.
getSQSMessages()
.then(data => {
if(!data.Messages || data.Messages.length === 0){
// no messages are available. return
}
// continue processing for each message or push the messages into array and call
//getSQSMessages function again.
});
You can never be guaranteed to get all the messages in a queue, unless after you get some of them, you delete them from the queue - thus ensuring that the next requests returns a different selection of records.
Each request will return 'upto' 10 messages, if you don't delete them, then there is a good chance that the next request for 'upto' 10 messages will return a mix of messages you have already seen, and some new ones - so you will never really know when you have seen them all.
It maybe that a queue is not the right tool to use in your use case - but since I don't know your use case, its hard to say.
I know this is a bit of a necro but I landed here last night while trying to pull some all messages from a dead letter queue in SQS. While the accepted answer, "you cannot guarantee to get all messages" from the queue is absolutely correct I did want to drop an answer for anyone that may land here as well and needs to get around the 10 message limit per request from AWS.
Dependencies
In my case I have a few dependencies already in my project that I used to make life simpler.
lodash - This is something we use in our code for help making things functional. I don't think I used it below but I'm including it since it's in the file.
cli-progress - This gives you a nice little progress bar on your CLI.
Disclaimer
The below was thrown together during troubleshooting some production errors integrating with another system. Our DLQ messages contain some identifiers that I need in order to formulate cloud watch queries for troubleshooting. Given that these are two different GUIs in AWS switching back and forth is cumbersome given that our AWS session are via a form of federation and the session only lasts for one hour max.
The script
#!/usr/bin/env node
const _ = require('lodash');
const aswSdk = require('aws-sdk');
const cliProgress = require('cli-progress');
const queueUrl = 'https://[put-your-url-here]';
const queueRegion = 'us-west-1';
const getMessages = async (sqs) => {
const resp = await sqs.receiveMessage({
QueueUrl: queueUrl,
MaxNumberOfMessages: 10,
}).promise();
return resp.Messages;
};
const main = async () => {
const sqs = new aswSdk.SQS({ region: queueRegion });
// First thing we need to do is get the current number of messages in the DLQ.
const attributes = await sqs.getQueueAttributes({
QueueUrl: queueUrl,
AttributeNames: ['All'], // Probably could thin this down but its late
}).promise();
const numberOfMessage = Number(attributes.Attributes.ApproximateNumberOfMessages);
// Next we create a in-memory cache for the messages
const allMessages = {};
let running = true;
// Honesty here: The examples we have in existing code use the multi-bar. It was about 10PM and I had 28 DLQ messages I was looking into. I didn't feel it was worth converting the multi-bar to a single-bar. Look into the docs on the github page if this is really a sticking point for you.
const progress = new cliProgress.MultiBar({
format: ' {bar} | {name} | {value}/{total}',
hideCursor: true,
clearOnComplete: true,
stopOnComplete: true
}, cliProgress.Presets.shades_grey);
const progressBar = progress.create(numberOfMessage, 0, { name: 'Messages' });
// TODO: put in a time limit to avoid an infinite loop.
// NOTE: For 28 messages I managed to get them all with this approach in about 15 seconds. When/if I cleanup this script I plan to add the time based short-circuit at that point.
while (running) {
// Fetch all the messages we can from the queue. The number of messages is not guaranteed per the AWS documentation.
let messages = await getMessages(sqs);
for (let i = 0; i < messages.length; i++) {
// Loop though the existing messages and only copy messages we have not already cached.
let message = messages[i];
let data = allMessages[message.MessageId];
if (data === undefined) {
allMessages[message.MessageId] = message;
}
}
// Update our progress bar with the current progress
const discoveredMessageCount = Object.keys(allMessages).length;
progressBar.update(discoveredMessageCount);
// Give a quick pause just to make sure we don't get rate limited or something
await new Promise((resolve) => setTimeout(resolve, 1000));
running = discoveredMessageCount !== numberOfMessage;
}
// Now that we have all the messages I printed them to console so I could copy/paste the output into LibreCalc (excel-like tool). I split on the semicolon for rows out of habit since sometimes similar scripts deal with data that has commas in it.
const keys = Object.keys(allMessages);
console.log('Message ID;ID');
for (let i = 0; i < keys.length; i++) {
const message = allMessages[keys[i]];
const decodedBody = JSON.parse(message.Body);
console.log(`${message.MessageId};${decodedBody.id}`);
}
};
main();

lrange returns false instead of the list

I'm trying to make my own simple Message Queue using Redis.
However, I'm having problem making a queue with Redis.
I used Redis for caching in other parts of my project so i'm sure redis connection is fine(+ I tried priting out the instance and it seems fine).
message_queue.js
const redis = require("./redis.js");
var sendMessage = async (queue) => {
var result = await redis.rpush(queue,5);
console.log(result);
var arr = await redis.lrange(queue,0,-1);
console.log(arr);
};
redis.js
const redis = require('redis');
const redisinfo = require('../secret/redisinfo.js');
const client = redis.createClient(redisinfo);
client.on("error",function(err){
console.log("Error " + err);
});
module.exports = client;
when i run sendMessage function from message_queue.js,
it prints false and false.
what am i doing wrong to pushing item into queue and printing it? Do I need to do something before this process such as declaring a list?
p.s.
I don't know if this is the proper method to use this package but it seems like i need to use a callback function to access the result...
this following code works as expected
redis.lrange(queue,0,-1,function(err,res){
if(res.length != 0){
console.log(`Something is in queue`);
}
});
You don't need to "declare" lists in Redis, RPUSH should create a list under the specified key the first time it is used.
Is it possible that you have already used the key name and stored something that is not a list? That would explain the behavior you are seeing. Run DEL <key> first in Redis to fix it, in that case.

setTimeout or child_process.spawn?

I have a REST service in Node.js with one specific request running a bunch of DB commands and other file processing that could take 10-15 seconds to run. Since I didn't want to hold up my browser request thread, I wrote a separate .js script to do the needful, called the script using child_process.spawn() in my Node.js code and immediately returned OK back to the client. This works fine, but then so does calling the same script (as a local function) by just using a simple setTimeout.
router.post("/longRequest", function(req, res) {
console.log("Started long request with id: " + req.body.id);
var longRunningFunction = function() {
// Usually runs a bunch of things that take time.
// Simulating a 10 sec delay for sample code.
setTimeout(function() {
console.log("Done processing for 10 seconds")
}, 10000);
}
// Below line used to be
// child_process.spawn('longRunningFunction.js'
setTimeout(longRunningFunction, 0);
res.json({status: "OK"})
})
So, this works for my purpose. But what's the downside ? I probably can't monitor the offline process easily as child_process.spawn which would give me a process id. But, does this cause problems in the long run ? Will it hold up Node.js processing if the 10 second processing increases to a lot more in the future ?
The actual longRunningFunction is something that reads an Excel file, parses it and does a bulk load using tedious to a MS SQL Server.
var XLSX = require('xlsx');
var FileAPI = require('file-api'), File = FileAPI.File, FileList = FileAPI.FileList, FileReader = FileAPI.FileReader;
var Connection = require('tedious').Connection;
var Request = require('tedious').Request;
var TYPES = require('tedious').TYPES;
var importFile = function() {
var file = new File(fileName);
if (file) {
var reader = new FileReader();
reader.onload = function (evt) {
var data = evt.target.result;
var workbook = XLSX.read(data, {type: 'binary'});
var ws = workbook.Sheets[workbook.SheetNames[0]];
var headerNames = XLSX.utils.sheet_to_json( ws, { header: 1 })[0];
var data = XLSX.utils.sheet_to_json(ws);
var bulkLoad = connection.newBulkLoad(tableName, function (error, rowCount) {
if (error) {
console.log("bulk upload error: " + error);
} else {
console.log('inserted %d rows', rowCount);
}
connection.close();
});
// setup your columns - always indicate whether the column is nullable
Object.keys(columnsAndDataTypes).forEach(function(columnName) {
bulkLoad.addColumn(columnName, columnsAndDataTypes[columnName].dataType, { length: columnsAndDataTypes[columnName].len, nullable: true });
})
data.forEach(function(row) {
var addRow = {}
Object.keys(columnsAndDataTypes).forEach(function(columnName) {
addRow[columnName] = row[columnName];
})
bulkLoad.addRow(addRow);
})
// execute
connection.execBulkLoad(bulkLoad);
};
reader.readAsBinaryString(file);
} else {
console.log("No file!!");
}
};
So, this works for my purpose. But what's the downside ?
If you actually have a long running task capable of blocking the event loop, then putting it on a setTimeout() is not stopping it from blocking the event loop at all. That's the downside. It's just moving the event loop blocking from right now until the next tick of the event loop. The event loop will be blocked the same amount of time either way.
If you just did res.json({status: "OK"}) before running your code, you'd get the exact same result.
If your long running code (which you describe as file and database operations) is actually blocking the event loop and it is properly written using async I/O operations, then the only way to stop blocking the event loop is to move that CPU-consuming work out of the node.js thread.
That is typically done by clustering, moving the work to worker processes or moving the work to some other server. You have to have this work done by another process or another server in order to get it out of the way of the event loop. A setTimeout() by itself won't accomplish that.
child_process.spawn() will accomplish that. So, if you have an actual event loop blocking problem to solve and the I/O is already as async optimized as possible, then moving it to a worker process is a typical node.js solution. You can communicate with that child process in a number of ways, but one possibility would be via stdin and stdout.

Store settimeout id from nodejs in mongodb

I am running a web application using express and nodejs. I have a request to a particular endpoint in which I use settimeout to call a particular function repeatedly after varying time intervals.
For example
router.get ("/playback", function(req, res) {
// Define callback here ...
....
var timeoutone = settimeout(callback, 1000);
var timeouttwo = settimeout(callback, 2000);
var timeoutthree = settimeout(callback, 3000);
});
The settimeout function returns an object with a circular reference. When trying to save this into mongodb i get a stack_overflow error. My aim is to be able to save these objects returned by settimeout into the database.
I have another endpoint called cancel playback which when called, will retrieve these timeout objects and call cleartimeout passing them in as an argument. How do I go about saving these timeout objects to the database ? Or is there a better way of clearing the timeouts than having to save them to the database. Thanks in advance for any help provided.
You cannot save live JavaScript objects in the database! Maybe you can store a string or JSON or similar reference to them, but not the actual object, and you cannot reload them later.
Edit: Also, I've just noticed you're using setTimeout for repeating stuff. If you need to repeat it on regular intervals, why not use setInterval instead?
Here is a simple solution, that would keep indexes in memory:
var timeouts = {};
var index = 0;
// route to set the timeout somewhere
router.get('/playback', function(req, res) {
timeouts['timeout-' + index] = setTimeout(ccb, 1000);
storeIndexValueSomewhere(index)
.then(function(){
res.json({timeoutIndex: index});
index++;
});
}
// another route that gets timeout indexes from that mongodb somehow
req.get('/playback/indexes', handler);
// finally a delete route
router.delete('/playback/:index', function(req, res) {
var index = 'timeout-' + req.params.index;
if (!timeouts[index]) {
return res.status(404).json({message: 'No job with that index'});
} else {
timeouts[index].cancelTimeout();
timeouts[index] = undefined;
return res.json({message: 'Removed job'});
}
});
But this probably would not scale to many millions of jobs.
A more complex solution, and perhaps more appropriate to your needs (depends on your playback job type) could involve job brokers or message queues, clusters and workers that subscribe to something they can listen to for their own job cancel signals etc.
I hope this helps you a little to clear up your requirements.

Execute when both(!) events fire .on('end')

I have a node app that reads two files as streams. I use event.on('end') to then work with the results. The problem is I don't really know how I can wait for BOTH events to trigger 'end'.
What I have now is:
reader1.on('end', function(){
reader2.on('end',function(){
doSomething();
});
});
With small files this works, but if one of the files is very large the app aborts.
Your execution logic is somewhat flawed. You ought to do something like this instead
var checklist = [];
// checklist will contain sort of a counter
function reader_end(){
if(checklist.length == 2 )
// doSomething only if both have been added to the checklist
doSomething();
}
reader1.on('end', function() {
checklist.push('reader1');
// increment the counter
reader_end();
});
reader2.on('end', function() {
checklist.push('reader2');
reader_end();
});
Although there are libraries to better handle this sort of stuff, like Async and Promises.
With Async you'll need to use compose
var r12_done = async.compose(reader1.on, reader2.on);
r12_done('end', function(){
doSomething();
});
Edit: I just noticed that since probably reader1.on is a Stream 'end' event which doesn't have the standard callback argument signature of (err, results), this probably won't work. In that case you should just go with Promise.
With Promise you'll need to first Promisify and then join
var reader1Promise = Promise.promisify(reader1.on)('end');
var reader2Promise = Promise.promisify(reader2.on)('end');
var reader12Promise = Promise.join(reader1Promise, reader1Promise);
reader12Promise.then(function(){
doSomething();
});

Resources