Node.js Cluster Shared Cache - node.js

I'm using node-cache to create a local cache. However, when the application runs under PM2, which creates an application cluster, the cache is created multiple times, once for each process. This isn't much of a problem in itself: the cached data is small, so memory isn't the issue.
The real problem is that I have an API call to my application to flush the cache, but calling this API only flushes the cache for the particular process that handles that call.
Is there a way to signal all workers to perform a function?
I did think about using Redis instead, as having just one cache would make things simpler. The problem I have with Redis is that I'm not sure of the best way to scale it: I currently have 50 applications and wouldn't want to set up a new Redis database for each one. The alternative was to use ioredis and its transparent key prefixing for each application, but this could cause security vulnerabilities if one application were to accidentally read data from another client's application. I also don't believe there is a way to delete all keys for just a particular prefix (i.e. one app/client), as FLUSHALL will remove all keys.
What are the best practices for sharing a cache between clustered Node instances when there are also many instances of the application itself (think SaaS application)?
Currently, my workaround is to use node-cron to clear the cache every 15 minutes. However, some items in the cache never really change, while other items should be updated as soon as an external tool signals the application to flush the cache via an API call.

For anyone looking at this: for my use case, the best method was to use IPC.
I implemented an IPC messenger that passes messages to all processes. It reads the process name from the pm2 config file (app.json) to ensure the message is sent to the correct application.
// Sender
// The sender can run inside or outside of pm2
var pm2 = require('pm2');
var cfg = require('../app.json');

exports.IPCSend = function (topic, message) {
  pm2.connect(function (err) {
    if (err) {
      return console.error(err);
    }
    // Find the IDs of the processes you want to send to
    pm2.list(function (err, processes) {
      if (err) {
        return console.error(err);
      }
      processes.forEach(function (proc) {
        // Only message processes that belong to this application
        if (proc.name === cfg.apps[0].name) {
          console.log('Sending Message To Id:', proc.pm_id, 'Name:', proc.name);
          pm2.sendDataToProcessId(proc.pm_id, {
            data: {
              message: message
            },
            topic: topic
          }, function (err, res) {
            console.log(err, res);
          });
        }
      });
    });
  });
};

// Receiver
// No need to require('pm2'), but the receiver must be running inside of pm2
process.on('message', function (packet) {
  console.log(packet);
});
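To tie this back to the original cache-flush problem, the receiver can act on the packet instead of only logging it. The following is a minimal sketch, not the exact code I run: the topic name 'cache:flush' and the helper createFlushHandler are made up for illustration, and the cache object is a stand-in that only mimics the flushAll() method of a node-cache instance.

```javascript
// Hypothetical topic name: sender and receiver only need to agree on it.
const FLUSH_TOPIC = 'cache:flush';

// Returns a 'message' listener that flushes the given cache when a packet
// with the flush topic arrives. Returns true if a flush happened.
function createFlushHandler(cache) {
  return function onPacket(packet) {
    if (!packet || packet.topic !== FLUSH_TOPIC) {
      return false; // not for us, ignore
    }
    cache.flushAll();
    return true;
  };
}

// Stand-in for a node-cache instance (assumption: only flushAll() is needed here).
const localCache = {
  store: new Map(),
  set(key, value) { this.store.set(key, value); },
  get(key) { return this.store.get(key); },
  flushAll() { this.store.clear(); }
};

// Every process in the cluster registers the same listener, so each one
// clears its own local copy when the sender broadcasts the flush topic.
process.on('message', createFlushHandler(localCache));
```

The API route that receives the external flush request would then call IPCSend with the same topic, so every process clears its own cache rather than only the one that handled the HTTP call.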

Related

Redis publish memory leak?

I know that there are already many questions like this, but I can't find one that fits my implementation.
I'm using redis in a Node.js environment, and it feels like redis.publish is leaking memory. I expect it to be some kind of "backpressure" issue, like the one described here:
Node redis publisher consuming too much memory
But to my understanding, Node needs to release that kind of pressure in a synchronous context; otherwise the Node event loop won't get a turn, and the GC won't run either.
My program looks like this:
const websocketApi = new WebsocketApi()
const currentState = {}

websocketApi.connect()

websocketApi.on('open', () => {
  channels.forEach((channel) => websocketApi.subscribeChannel(channel))
})

websocketApi.on('message', (message) => {
  const ob = JSON.parse(message)
  if (currentState[ob.id]) {
    currentState[ob.id] = update(currentState[ob.id], ob.data)
  } else {
    currentState[ob.id] = ob.data
  }
  const payload = {
    channel: ob.id,
    info: currentState[ob.id],
    timestamp: Date.now(),
    type: 'newData'
  }
  // when i remove this part, the memory is stable
  redisClient.publish(payload.channel, JSON.stringify(payload))
})

// to reconnect in case of error
websocketApi.on('close', () => websocketApi.connect())
It seems that the messages arrive too close to each other, so there is no time to release the strings held by redis.publish.
Do you have any idea what is wrong with this code?
EDIT: More specifically, here is what I observe when I take memory dumps of my application:
The memory is saturated with strings that are my stringified JSON payloads, and with "chunks" of messages that are sent via Redis itself. Their references are held inside the redis client, mainly in variables called chunk.
Some string payloads are still released, but I create them much faster than they are released.
When I don't publish the messages via Redis, the "currentState" variable grows up to a point and then stops growing. It obviously has a big RAM impact, but that's expected. The rest is fine and the application is stable at around 400 MB, whereas with the redis publisher it explodes (PM2 restarts it because it reaches the max RAM capacity).
My feeling is that I'm asking redis to publish far more than it can handle, and that it doesn't have time to finish publishing the messages. It still holds all the context, so it doesn't release anything. I may need some kind of "queue" to let redis release some context and finish publishing the messages. Is that really a possibility, or am I going crazy?
Basically, every loop in my program is "independent". Is it possible to have as many redis clients as I have loops? Is that a better idea? (IMHO, Node is single-threaded, so it won't help, but it may help V8 track memory references better and release memory.)
The redis client buffers commands while it is not connected, either because it has not connected yet or because its connection failed.
Make sure that the redis server is reachable and that your program is actually connected to it. I would suggest adding a listener for redisClient.on('connect'): if that event is never emitted, the client never connected.
If you are connected, the client shouldn't be buffering. To make the problem appear sooner, disable the offline queue by passing the option enable_offline_queue: false to createClient; this causes attempts to send commands while not connected to fail.
You should also attach an error listener to the redisClient: redisClient.on('error', console.error.bind(console)). This might yield a message explaining why the client is buffering.
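Put together, the checks above could look roughly like the sketch below. The helper name watchConnection and the state object it returns are made up for illustration; the helper only relies on the client being an event emitter that fires 'connect' and 'error', which is what the answer assumes of the redis client.

```javascript
// Attaches the suggested listeners and records what happened, so it is easy
// to tell "never connected" apart from "connected but failing".
function watchConnection(client, logger) {
  const log = logger || console;
  const state = { connected: false, errors: [] };

  client.on('connect', function () {
    state.connected = true;
    log.log('redis client connected');
  });

  client.on('error', function (err) {
    state.errors.push(err);
    log.error('redis client error:', err && err.message);
  });

  return state;
}

// Assumed usage with node_redis (not executed here, requires the redis package):
//   var redis = require('redis');
//   var redisClient = redis.createClient({ enable_offline_queue: false });
//   var state = watchConnection(redisClient);
//   // later: if state.connected is still false, publish calls were never sent
```

With the offline queue disabled, a publish issued before 'connect' should fail immediately instead of being buffered, which makes a connection problem visible long before memory fills up.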

How to properly use database when scaling a NodeJS app?

I am wondering how I would properly use MySQL when I am scaling my Node.JS app using the cluster module. Currently, I've only come up with two solutions:
Solution 1:
Create a database connection on every "worker".
Solution 2:
Have the database connection on a master process; whenever one of the workers requests some data, the master process returns it. However, with this solution I do not know how the worker would retrieve the data from the master process.
I think I made a "hacky" workaround: the worker sends a message tagged with a unique number, then waits for the master process to send a message back, using that unique number as the event name.
If you don't understand what I mean by this, here's some code:
// Worker process
return new Promise(function (resolve, reject) {
  process.send({
    // Other data here
    identifier: <unique number>
  })
  // having a custom event emitter on the worker
  worker.once(<unique number>, function (data) {
    // data being the data for the request with the unique number
    // resolving the promise with returned data
    resolve(data)
  })
})

//////////////////////////

// Master process
// Custom event emitter on the master process
master.on(<eventName>, function (data) {
  // logic
  // Sending data back to worker
  master.send(<other args>, data.identifier)
})
What would be the best approach to this problem?
Thank you for reading.
When you cluster in NodeJS, you should assume each process is completely independent. You really shouldn't be relaying messages like this to/from the master process. If you need multiple threads to access the same data, I don't think NodeJS is what you should be using. However, if you're just doing basic CRUD operations with your database, clustering (solution 1) is certainly the way to go.
For example, if you're trying to scale write ops to your database (assuming your database is properly scaled), each write op is independent from another. When you cluster, a single write request will be load balanced to one of your workers. Then in the worker, you delegate the write op to your database asynchronously. In this scenario, there is no need for a master process.
If you haven't planned on using a proper microservice architecture, where each process would actually have its own database (or perhaps just in-memory storage), your best bet IMO is to use a connection pool created by the main process and have each child request a connection out of that pool. That's probably the safest approach to avoid issues in the neighborhood of thread-safety errors.
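If some data really does have to be relayed through the master, the identifier idea from the question can be made less hacky by keeping a map of pending requests instead of one event name per request. This is only a sketch under assumptions: createRequester, handleReply, and the message shape { identifier, payload } / { identifier, data } are invented names, and the send function is injected so the same code works with process.send in a real worker.

```javascript
const crypto = require('crypto');

// Correlates replies with requests by identifier.
// `send` is whatever delivers a message to the master (e.g. process.send).
function createRequester(send) {
  const pending = new Map(); // identifier -> resolve function

  return {
    // Sends the payload and returns a promise for the matching reply.
    request(payload) {
      return new Promise(function (resolve) {
        const identifier = crypto.randomBytes(8).toString('hex');
        pending.set(identifier, resolve);
        send({ identifier: identifier, payload: payload });
      });
    },
    // Call this for every message coming back from the master.
    // Returns false when the message does not match a pending request.
    handleReply(message) {
      const resolve = pending.get(message.identifier);
      if (!resolve) {
        return false;
      }
      pending.delete(message.identifier);
      resolve(message.data);
      return true;
    }
  };
}

// Assumed wiring inside a cluster worker (not executed here):
//   const requester = createRequester((msg) => process.send(msg));
//   process.on('message', (msg) => requester.handleReply(msg));
//   requester.request({ sql: 'SELECT 1' }).then(console.log);
```

A production version would also want a timeout that rejects and removes stale entries, otherwise a lost reply leaves its promise pending forever.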

How to create Application Variable in Node Js Application

I have a requirement to use a variable that has the same value for all running processes; if I change it from one process, the change should be reflected in all processes. In Java, we have the idea of an application variable.
In Node.js, I have only used Heroku variables and have no experience with other kinds of Node variables, so if anyone has an idea, please suggest one. We cannot update Heroku variables; they work as constants.
Thanks
Finally, I got an answer after doing lots of R&D. We can create a property on the process.env object; process is a predefined global object in Node.
process.env.variable_name = 'Value';
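You can access and update this value from anywhere within the same process. Be aware, though, that each process has its own copy of process.env: a child process inherits a copy when it is spawned, and a change made in one running process is not visible in the others.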
console.log(process.env.variable_name);
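How process.env behaves across processes is easy to check with a small experiment. The sketch below uses only Node's built-in child_process module; the variable name APP_FLAG is made up for the test. The parent sets the variable, a child process changes it, and the parent then reads it again.

```javascript
const { execFileSync } = require('child_process');

// Parent sets the variable before spawning the child.
process.env.APP_FLAG = 'set-in-parent';

// The child inherits a copy of the parent's environment, changes its own
// copy, and prints the value before and after the change.
const childScript =
  "const before = process.env.APP_FLAG;" +
  "process.env.APP_FLAG = 'changed-in-child';" +
  "console.log(before + ' -> ' + process.env.APP_FLAG);";

const childOutput = execFileSync(process.execPath, ['-e', childScript], {
  encoding: 'utf8'
}).trim();

console.log('child :', childOutput);          // child : set-in-parent -> changed-in-child
console.log('parent:', process.env.APP_FLAG); // parent: set-in-parent
```

The child's change never reaches the parent, so process.env on its own cannot serve as a variable shared between running processes.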
I would suggest using something like Redis to share data between processes, there is a great node module for this: https://github.com/NodeRedis/node_redis. You can also share data between processes that are not even authored in Node.js.
Also the data can be stored to non-volatile storage, meaning you don't lose it if the processes recycle.
Data can also be shared across machines if necessary using one Redis db.
e.g.
var redis = require("redis");
var redisConfig = {
  "host": "127.0.0.1"
};
var redisClient = redis.createClient(redisConfig);

redisClient.on("error", function (err) {
  console.log("Error " + err);
});

redisClient.set("key", "value");
redisClient.get("key", function (err, reply) {
  console.log(reply);
});

How to check the global file or directory against all deployed nodes?

I'm working on an IoT application, where I have a node that keeps a list of what devices are connected to it that updates when a new message arrives.
For now I'm using the context to save the data, which is wiped on restart.
Using the node's id, I could save the list on a global JSON file, or have a file per node, but I run into a wall when it comes to maintenance.
Whenever I delete a node, its info is now trash. Is there a better way than to just check the global file or directory against all deployed nodes and delete what I don't need? And if there isn't, how do I get all current nodes?
Nodes have an on close callback, which you could use to clean up when a node is deleted. Details can be found here
There are 2 versions of the callback, one that handles async actions and one that doesn't.
this.on('close', function (done) {
  doSomethingWithACallback(function () {
    done();
  });
});
and
this.on('close', function () {
  // tidy up any state
});
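For the file-per-node case from the question, the cleanup could be wired up roughly as follows. This is a sketch under assumptions: as far as I understand the Node-RED docs, a close listener that accepts two arguments receives a boolean as its first argument telling whether the node was removed (as opposed to merely restarted); registerCleanup and the one-JSON-file-per-node layout are invented for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Deletes the node's state file only when the node itself has been removed,
// not on an ordinary restart or redeploy.
// `node` is the Node-RED node instance (anything with .id and .on works here).
function registerCleanup(node, stateDir) {
  const file = path.join(stateDir, node.id + '.json');

  node.on('close', function (removed, done) {
    if (!removed) {
      return done(); // restart: keep the saved device list
    }
    // Node was deleted: drop its file, ignoring "file not found" and other errors.
    fs.unlink(file, function () {
      done();
    });
  });

  return file;
}

// Assumed usage inside a node constructor (not executed here):
//   function DeviceListNode(config) {
//     RED.nodes.createNode(this, config);
//     registerCleanup(this, '/path/to/state'); // hypothetical directory
//   }
```

With this in place, a deleted node removes its own data, so there is no need to compare the directory against the list of deployed nodes.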

How to lock (Mutex) in NodeJS?

There are external resources (accessing available inventories through an API) that can only be accessed one thread at a time.
My problems are:
NodeJS server handles requests concurrently, we might have multiple requests at the same time trying to reserve inventories.
If I hit the inventory API concurrently, then it will return duplicate available inventories
Therefore, I need to make sure that I am hitting the inventory API one thread at a time
There is no way for me to change the inventory API (legacy), therefore I must find a way to synchronize my nodejs server.
Note:
There is only one nodejs server, running one process, so I only need to synchronize the requests within that server
Low traffic server running on express.js
I'd use something like the async module's queue and set its concurrency parameter to 1. That way, you can put as many tasks in the queue as you need to run, but they'll only run one at a time.
The queue would look something like:
var async = require('async');

var inventoryQueue = async.queue(function (task, callback) {
  // use the values in "task" to call your inventory API here
  // pass your results to "callback" when you're done
}, 1);
Then, to make an inventory API request, you'd do something like:
var inventoryRequestData = { /* data you need to make your request; product id, etc. */ };
inventoryQueue.push(inventoryRequestData, function (err, results) {
  // this will be called with your results
});
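If you'd rather not pull in a dependency, the same one-at-a-time behavior can be sketched with a promise chain. This is an alternative to the async queue above, not what that library does internally; createSerialQueue and enqueue are made-up names, and each task is assumed to be a function that returns a value or a promise.

```javascript
// Runs tasks strictly one after another, in the order they were enqueued.
function createSerialQueue() {
  let tail = Promise.resolve(); // settles when the last queued task finishes

  return function enqueue(task) {
    // Start this task only after everything queued before it has settled.
    const run = tail.then(function () {
      return task();
    });
    // Swallow failures on the chain itself so one failed task does not
    // block later ones; the caller still sees the rejection via `run`.
    tail = run.catch(function () {});
    return run;
  };
}

// Assumed usage in an express handler (reserveInventory is hypothetical):
//   const enqueue = createSerialQueue();
//   app.post('/reserve', function (req, res) {
//     enqueue(function () { return reserveInventory(req.body); })
//       .then(function (result) { res.json(result); })
//       .catch(function (err) { res.status(500).send(err.message); });
//   });
```

Like the queue with concurrency 1, this only serializes calls within a single process, which matches the single-server setup described in the question.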
