Node memory leak while using redis brpop - node.js

I'm having a memory leak issue in a Node application. The application subscribes to a topic in Redis and, on receiving a message, pops an item from a list using brpop. There are a number of instances of this application running in production, so at any time one instance might be blocking on the Redis list waiting for a message. Here is the code snippet that consumes a message from Redis:
private doWork(): void {
  this.storage.subscribe("newRoom", (message: [any, any]) => {
    const [msg] = message;
    if (msg === "room") {
      return new Promise(async (resolve, reject) => {
        process.nextTick(async () => {
          const roomIdData = await this.storage.brpop("newRoomList"); // a promisified version of brpop with a timeout of 5s
          if (roomIdData) {
            const roomId = roomIdData[1];
            this.createRoom(roomId);
          }
        });
        resolve();
      });
    }
  });
}
I've tried debugging the memory leak using the Chrome debugger, and I've observed too many closure objects being created. I suspect it's due to this code, as I can see the Redis client object name in the closure objects, but I'm not able to figure out how to fix it. I added process.nextTick, but it didn't help. I'm using the node-redis client for connecting to Redis. Attaching an object retainer map screenshot from the Chrome debugger tool.
P.S. blk is the Redis client object name used exclusively for blocking commands, i.e. brpop.
Edit: I replaced brpop with rpop and we're seeing a significant drop in the memory growth rate, but now the distribution of messages between the workers has become skewed.
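For illustration, here is a minimal sketch (an assumption on my part, not the original code) of the same flow with the extra Promise and process.nextTick wrapper removed, so only one closure per notification is created; it reuses the storage wrapper and createRoom method from the snippet above and is only a restructuring idea, not a confirmed fix for the leak:
private async handleNewRoom(message: [any, any]): Promise<void> {
  const [msg] = message;
  if (msg !== "room") {
    return;
  }
  // same promisified brpop with a 5 s timeout as in the original snippet
  const roomIdData = await this.storage.brpop("newRoomList");
  if (roomIdData) {
    const roomId = roomIdData[1];
    this.createRoom(roomId);
  }
}

private doWork(): void {
  // a single subscriber callback that delegates to the async handler above
  this.storage.subscribe("newRoom", (message: [any, any]) => {
    void this.handleNewRoom(message);
  });
}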

Related

Cloud Run PubSub high latency

I'm building a microservice application consisting of many microservices built with Node.js and running on Cloud Run. I use PubSub in several different ways:
For streaming data daily. The microservices responsible for gathering analytical data from different advertising services (Facebook Ads, LinkedIn Ads, etc.) use PubSub to stream data to a microservice responsible for uploading data to Google BigQuery. There are also services that stream larger volumes of data (> 1 GB) from CRMs and other services by splitting it into smaller chunks.
For messaging among microservices about different events that don't require an immediate response.
Earlier, I experienced some minor latency with PubSub. I know there is an open issue about latency of up to several seconds at low message throughput. But in my case, we are talking about several minutes of latency.
Also, I occasionally get an error message
Received error while publishing: Total timeout of API google.pubsub.v1.Publisher exceeded 60000 milliseconds before any response was received.
In this case the message is either not sent at all or is heavily delayed.
This is what my code looks like:
const subscriptions = new Map<string, Subscription>();
const topics = new Map<string, Topic>();

const listenForMessages = async (
  subscriptionName: string,
  func: ListenerCallback,
  secInit = 300,
  secInter = 300
) => {
  let logger = new TestLogger("LISTEN_FOR_MSG");
  let init = true;

  const _setTimeout = () => {
    let timer = setTimeout(() => {
      console.log(`Subscription to ${subscriptionName} cancelled`);
      subscription.removeListener("message", messageHandler);
    }, (init ? secInit : secInter) * 1000);
    init = false;
    return timer;
  };

  const messageHandler = async (msg: Message) => {
    msg.ack();
    await func(JSON.parse(msg.data.toString()));
    // wait for next message
    timeout = _setTimeout();
  };

  let subscription: Subscription;
  if (subscriptions.has(subscriptionName)) {
    subscription = subscriptions.get(subscriptionName);
  } else {
    subscription = pubSubClient.subscription(subscriptionName);
    subscriptions.set(subscriptionName, subscription);
  }

  let timeout = _setTimeout();
  subscription.on("message", messageHandler);
  console.log(`Listening for messages: ${subscriptionName}`);
};

const publishMessage = async (
  data: WithAnyProps,
  topicName: string,
  options?: PubOpt
) => {
  const serializedData = JSON.stringify(data);
  const dataBuffer = Buffer.from(serializedData);
  try {
    let topic: Topic;
    if (topics.has(topicName)) {
      topic = topics.get(topicName);
    } else {
      topic = pubSubClient.topic(topicName, {
        batching: {
          maxMessages: options?.batchingMaxMessages,
          maxMilliseconds: options?.batchingMaxMilliseconds,
        },
      });
      topics.set(topicName, topic);
    }
    let msg = {
      data: dataBuffer,
      attributes: options?.attributes, // optional chaining so a missing options object doesn't throw
    };
    await topic.publishMessage(msg);
    console.log(`Publishing to ${topicName}`);
  } catch (err) {
    console.error(`Received error while publishing: ${err.message}`);
  }
};
The listenForMessages function is triggered by an HTTP request.
What I have already checked
The PubSub client is created only once, outside the function.
Topics and subscriptions are reused.
I keep at least one instance of each container running to eliminate the possibility of delays triggered by cold starts.
I tried increasing the CPU and memory capacity of the containers.
batchingMaxMessages and batchingMaxMilliseconds are set to 1.
I checked that the latest version of @google-cloud/pubsub is installed.
Notes
The high latency problem occurs only in the cloud environment. With local tests, everything works well.
The timeout error sometimes occurs in both environments.
The problem was in my understanding of the Cloud Run container lifecycle. I used to send an HTTP 202 response while PubSub was still working in the background. After sending the response, the container switched to the idle state, which looked like high latency in my logs.
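For illustration, a minimal sketch of that fix (the Express framework, route name, and topic name are my assumptions, not from the original post): the handler awaits the publish before responding, so the container is not idled while the PubSub work is still in flight.
import express from "express";

const app = express();
app.use(express.json());

app.post("/collect", async (req, res) => {
  // finish the PubSub work first, using publishMessage from the snippet above
  await publishMessage(req.body, "my-topic");
  // only now tell the caller the request was accepted; the container may idle safely afterwards
  res.status(202).send();
});

app.listen(Number(process.env.PORT) || 8080);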

Querying DB2 every 15 seconds causing memory leak in NodeJS

I have an application that checks for new entries in DB2 on the iSeries every 15 seconds using IBM's idb-connector. I have async functions that return the result of the query to socket.io, which emits an event with the data to the front end. I've narrowed down the memory leak to the async functions. I've read multiple articles on common memory leak causes and how to diagnose them.
MDN: memory management
Rising Stack: garbage collection explained
Marmelab: Finding And Fixing Node.js Memory Leaks: A Practical Guide
But I'm still not seeing where the problem is. Also, I'm unable to get permission to install node-gyp on the system, which means most memory management tools are off limits, as memwatch, heapdump, and the like need node-gyp to install. Here's an example of the functions' basic structure.
const { dbconn, dbstmt } = require('idb-connector'); // require idb-connector

async function queryDB() {
  const sSql = `SELECT * FROM LIBNAME.TABLE LIMIT 500`;
  // create new promise
  let promise = new Promise(function(resolve, reject) {
    // create new connection
    const connection = new dbconn();
    connection.conn("*LOCAL");
    const statement = new dbstmt(connection);
    statement.exec(sSql, (rows, err) => {
      if (err) {
        throw err;
      }
      let ticks = rows;
      statement.close();
      connection.disconn();
      connection.close();
      resolve(ticks.length); // resolve promise with varying data
    })
  });
  let result = await promise; // await promise
  return result;
};

async function getNewData() {
  const data = await queryDB(); // get new data
  io.emit('newData', data); // push to front end
  setTimeout(getNewData, 2000); // check again in 2 seconds
};
Any ideas on where the leak is? Am I using async/await incorrectly? Or am I creating/destroying DB connections improperly? Any help figuring out why this code is leaky would be much appreciated!
Edit: Forgot to mention that I have limited control over the backend processes, as they are handled by another team. I'm only retrieving the data they populate the DB with and adding it to a web page.
Edit 2: I think I've narrowed it down to the DB connections not being cleaned up properly. But, as far as I can tell, I've followed the instructions suggested on their GitHub repo.
I don't know the answer to your specific question, but instead of issuing a query every 15 seconds, I might go about this in a different way. The reason is that I don't generally like fishing expeditions when the environment can tell me an event occurred.
So in that vein, you might want to try a database trigger that loads the key of the row into a data queue on add (or even on change or delete if necessary). Then you can just make an async call to wait for a record on the data queue. This is more real-time, and the event handler is only called when a record shows up. The handler can get the specific record from the database since you know its key. Data queues are much faster than database I/O and place little overhead on the trigger.
I see a couple of potential advantages with this method:
You aren't issuing dozens of queries that may or may not return data.
The event would fire the instant a record is added to the table, rather than 15 seconds later.
You don't have to code for the possibility of one or more new records; it will always be one, the one referenced in the data queue entry.
Yes, you have to close the connection.
Don't create const data. You don't need the promise; statement.exec is async by default and handles it via return result;.
Keep the setTimeout(getNewData, 2000); // check again in 2 seconds line outside getNewData, otherwise it becomes a recursive infinite loop.
Sample code
const {dbconn, dbstmt} = require('idb-connector');

const sql = 'SELECT * FROM QIWS.QCUSTCDT';
const connection = new dbconn(); // Create a connection object.
connection.conn('*LOCAL'); // Connect to a database.
const statement = new dbstmt(connection); // Create a statement object of the connection.

statement.exec(sql, (result, error) => {
  if (error) {
    throw error;
  }
  console.log(`Result Set: ${JSON.stringify(result)}`);
  statement.close(); // Clean up the statement object.
  connection.disconn(); // Disconnect from the database.
  connection.close(); // Clean up the connection object.
  return result;
});
async function getNewData() {
  const data = await queryDB(); // get new data
  io.emit('newData', data); // push to front end
  setTimeout(getNewData, 2000); // check again in 2 seconds
};
change to
async function getNewData() {
  const data = await queryDB(); // get new data
  io.emit('newData', data); // push to front end
};
setTimeout(getNewData, 2000); // check again in 2 seconds
The first thing to notice is a possibly open database connection in case of an error:
if (err) {
throw err;
}
Also, connection.disconn(); and connection.close(); return boolean values that tell whether the operation was successful (according to the documentation).
A scenario that is always possible is piling up connection objects in the 3rd-party library.
I would check those.
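To make that concrete, here is a minimal sketch (my assumption, not the answerer's code) of the error branch from the question's queryDB, with the statement and connection cleaned up before the surrounding promise is rejected:
statement.exec(sSql, (rows, err) => {
  if (err) {
    // clean up before surfacing the error, instead of throwing inside the callback
    statement.close();
    connection.disconn();
    connection.close();
    return reject(err);
  }
  let ticks = rows;
  statement.close();
  connection.disconn();
  connection.close();
  resolve(ticks.length);
});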
This was confirmed to be a memory leak in the idb-connector library that I was using. Link to the GitHub issue Here. Basically, there was a C++ array whose memory was never deallocated. A new version was released, and the commit can be viewed Here.

What's the default timeout of ioredis send command for any redis call

I am using ioredis in a Node application, and due to some issues in the cluster I started getting:
Too many Cluster redirections. Last error: Error: Connection is closed.
Because of this, all of my Redis calls failed, and only after a very long time, ranging from 1 s to 130 s.
Is there a default timeout in the ioredis library that it uses to settle a call after sending a command to the Redis server?
Is the higher failure time (in the range of 100 s) when sending commands to the Redis server due to a large queue building up in Redis because of the cluster failure?
Sample code :
this.getData = function(bucketName, userKey) {
  let cacheKey = cacheHelper.formCacheKey(userKey, bucketName);
  let serviceType = cacheHelper.getServiceType(bucketName, cacheConfig.service_config);
  let log_info = _.get(cacheConfig.service_config, 'logging_options.cache_info_level', true);
  let startTime = moment();
  let dataLength = null;
  return Promise.try(function() {
    validations([cacheKey], ['cache_key'], bucketName, serviceType, that.currentService);
    return cacheStore.get(serviceType, cacheKey);
  })
  .then(function(data) {
    dataLength = (data || '').length;
    return cacheHelper.uncompress(data);
  })
  .then(function(uncompressedData) {
    let endTime = moment();
    let responseTime = endTime.diff(startTime, 'milliseconds');
    if (!uncompressedData) {
      if (log_info) logger.consoleLog(bucketName, 'getData', 'miss', cacheKey, that.currentService,
        responseTime, dataLength);
    } else {
      if (log_info) logger.consoleLog(bucketName, 'getData', 'success', cacheKey, that.currentService,
        responseTime, dataLength);
    }
    return uncompressedData;
  })
  .catch(function(err) {
    let endTime = moment();
    let responseTime = endTime.diff(startTime, 'milliseconds');
    logger.error(bucketName, 'getData', err.message, userKey, that.currentService, responseTime);
    throw cacheResponse.error(err);
  });
};
Here,
logger.error(bucketName, 'getData', err.message, userKey, that.currentService, responseTime);
started reporting response times ranging from 1061 ms to 109939 ms.
Please provide some input.
As you can read from this ioredis issue, there isn't a per-command timeout configuration.
As suggested in the linked comment, you can use a Promise-based strategy as a workaround. Incidentally, this is the same strategy used by the ioredis-timeout plugin that wraps the original command in a Promise.race() method:
//code from the ioredis-timeout lib
return Promise.race([
  promiseDelay(ms, command, args),
  originCommand.apply(redis, args)
]);
So you can use the plugin or this nice race-timeout technique to add timeout functionality on top of the Redis client. Keep in mind that the underlying command will not be interrupted.
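For example, a minimal sketch of that race-timeout technique on top of an ioredis client (the key name and timeout value are arbitrary); note again that the underlying command keeps running even when the timeout wins:
import Redis from "ioredis";

const redis = new Redis();

function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error(`Redis command timed out after ${ms} ms`)), ms)
  );
  return Promise.race([promise, timeout]);
}

// fail fast after 200 ms instead of waiting on a stalled cluster
withTimeout(redis.get("some-key"), 200)
  .then((value) => console.log("value:", value))
  .catch((err) => console.error(err.message));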
I was facing a similar issue which I have described in detail here: How to configure Node Redis client to throw errors immediately, when connection has failed? [READ DETAILS]
The fix was actually quite simple: just set enable_offline_queue to false. This was with Node Redis, so you'll have to figure out the equivalent for ioredis. Setting this to false will make all commands throw an exception immediately, which you can handle in a catch block and continue, instead of waiting for some timeout.
Do keep in mind that, with enable_offline_queue set to false, the commands that you issue while there's some connection issue with the server will never be executed.
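If it helps, I believe the ioredis counterpart is the enableOfflineQueue option (please double-check against the ioredis documentation); a minimal sketch:
import Redis from "ioredis";

// with the offline queue disabled, commands issued while the connection is down
// fail immediately instead of being queued (and possibly timing out much later)
const redis = new Redis({ enableOfflineQueue: false });

redis.get("some-key").catch((err) => {
  // handle the immediate failure and continue
  console.error("Redis unavailable:", err.message);
});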

Multiple AWS Instances and Node events

I have an implementation in Node where an API, when called, does some processing and waits for an event from another function before returning the response. This works fine when run locally and when running in a single instance on AWS, but when multiple instances are involved there are some issues, which I'm assuming is because the API is being called on one instance and the event is being emitted on another instance. Is there any way to keep the listeners and emitters the same across all instances?
Update:
After some research I found that using an application load balancer with some routing logic can help with this issue. I am marking the answer below as correct because, while it did not help me with AWS autoscaling, it did help me find an alternate solution to my problem.
AFAIU, you expect an event emitted from one process to be handled in a different process, but as far as I know that will never be the case, because each process has its own memory and events are associated only with the process that emitted them.
I have added sample code that demonstrates what I mean. If you post the code you are referring to, we could check what went wrong.
const cluster = require("cluster");
const EventEmitter = require("events");

if (cluster.isMaster) {
  cluster.fork();
  const myEE = new EventEmitter();
  myEE.on("foo", arg =>
    console.log("emitted from ", arg, "received in master")
  );
  setTimeout(() => {
    myEE.emit("foo", "master");
  }, 1000);
} else {
  const myEE = new EventEmitter();
  myEE.on("foo", arg => console.log("emitted from", arg, "received in worker"));
  setTimeout(() => {
    myEE.emit("foo", "client");
  }, 2000);
}

How to find source of memory leak in Node.JS App

I have a memory leak in a Node.js/Express app. The app dies after 3-5 days with the following log message:
FATAL ERROR: JS Allocation failed - process out of memory
I set up a server without users connecting, and it still crashes, so I know the leak originates in the following code, which runs in the background to sync API changes to the DB.
poll(config.refreshInterval)

function poll(refreshRate) {
  return apiSync.syncDatabase()
    .then(function() {
      return wait(refreshRate)
    })
    .then(function() {
      return poll(refreshRate)
    })
}

var wait = function wait(time) {
  return new Promise(function(resolve) {
    applog.info('waiting for %s ms..', time)
    setTimeout(function() {
      resolve(true)
    }, time)
  })
}
What techniques are available for profiling the heap to find the source object(s) of what is taking all the memory?
It takes a while to crash, so I would need something that logs data I can come back to later and analyze.
Is there any option like Java's JVM flag -XX:HeapDumpOnOutOfMemoryError ?
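For reference, one option on recent Node versions (11.13+, an assumption on my part) is to write heap snapshots periodically with the built-in v8 module and diff them later in Chrome DevTools:
import { writeHeapSnapshot } from "v8";

// write a snapshot every hour; load two of them in Chrome DevTools and compare
// which object types grew between the snapshots
setInterval(() => {
  const file = writeHeapSnapshot(); // writes a .heapsnapshot file in the current working directory
  console.log(`heap snapshot written to ${file}`);
}, 60 * 60 * 1000);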
Check out node-memwatch.
It provides a heap diff class:
var hd = new memwatch.HeapDiff();
// your code here ...
var diff = hd.end();
It also has event emitters for leaks:
memwatch.on('leak', function(info) {
  // look at info to find out about what might be leaking
});
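Putting the two together, a minimal sketch (my own, under the assumption that node-memwatch is installed and required as memwatch) that takes a heap diff whenever a leak event fires:
const memwatch = require("memwatch");

let hd = new memwatch.HeapDiff();

memwatch.on("leak", (info: unknown) => {
  // info describes the suspected leak (heap growth over consecutive GCs)
  console.error("leak detected:", info);
  const diff = hd.end();
  // diff.change shows which object types grew and by how much
  console.log(JSON.stringify(diff.change, null, 2));
  hd = new memwatch.HeapDiff(); // start a fresh diff window
});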
