How to properly use a database when scaling a Node.js app?

I am wondering how I would properly use MySQL when I am scaling my Node.JS app using the cluster module. Currently, I've only come up with two solutions:
Solution 1:
Create a database connection on every "worker".
Solution 2:
Keep the database connection on the master process; whenever a worker requests data, the master process runs the query and returns the result. With this solution, however, I do not know how the worker would retrieve the data from the master process.
I think I came up with a "hacky" workaround: the worker sends a message tagged with a unique number, then waits for the master process to send a message back to the worker, with the unique number as the event name.
If that is unclear, here's some code:
// Worker process
return new Promise(function (resolve, reject) {
  process.send({
    // Other data here
    identifier: <unique number>
  })
  // having a custom event emitter on the worker
  worker.once(<unique number>, function (data) {
    // data being the data for the request with the unique number
    // resolving the promise with returned data
    resolve(data)
  })
})
//////////////////////////
// Master process
// Custom event emitter on the master process
master.on(<eventName>, function (data) {
  // logic
  // Sending data back to worker
  master.send(<other args>, data.identifier)
})
What would be the best approach to this problem?
Thank you for reading.

When you cluster in Node.js, you should assume each process is completely independent. You really shouldn't relay messages like this to and from the master process. If you need multiple threads to access the same data, I don't think Node.js is what you should be using. However, if you're just doing basic CRUD operations with your database, clustering (solution 1) is certainly the way to go.
For example, if you're trying to scale write ops to your database (assuming your database is properly scaled), each write op is independent of the others. When you cluster, a single write request is load balanced to one of your workers. The worker then delegates the write op to your database asynchronously. In this scenario, there is no need to involve the master process.
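A minimal sketch of solution 1, assuming each worker builds its own pool the first time it needs one. `createStubPool` is a hypothetical stand-in for the real driver call (for example `mysql.createPool(...)`), and `makePoolProvider`, `start`, and the port number are illustrative names rather than part of any library:

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// Hypothetical stand-in for the driver's pool factory (e.g. mysql.createPool).
// It answers every query asynchronously with a single fake row.
function createStubPool(owner) {
  return {
    owner,
    query(sql, callback) {
      setImmediate(() => callback(null, [{ sql, servedBy: owner }]));
    },
  };
}

// Returns a getter that creates the pool on first use and then reuses it.
// Every process loads this code separately, so every worker ends up with
// its own pool and its own connections.
function makePoolProvider(createPool) {
  let pool = null;
  return function getPool() {
    if (pool === null) pool = createPool();
    return pool;
  };
}

const getPool = makePoolProvider(() => createStubPool(`pid ${process.pid}`));

// Call start() from your entry point: the primary ("master") process only
// forks, while the workers serve requests and talk to the database directly.
function start(port = 3000) {
  if (cluster.isPrimary) {
    for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    return;
  }
  http.createServer((req, res) => {
    getPool().query('SELECT 1', (err, rows) => {
      res.statusCode = err ? 500 : 200;
      res.end(JSON.stringify(err ? { error: err.message } : rows));
    });
  }).listen(port);
}
```

Nothing here passes through the primary process on the query path; the operating system and the cluster module distribute incoming connections, and each worker's pool limits how many database connections that worker can open.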

If you haven't planned on a proper microservice architecture, where each process would actually have its own database (or perhaps just in-memory storage), your best bet IMO is to use a connection pool created by the main process and have each child request a connection from that pool. That's probably the safest approach to avoid issues in the neighborhood of thread-safety errors.

Related

Node.js Cluster Shared Cache

I'm using node-cache to create a local cache. However, when the application runs under PM2, which creates an application cluster, the cache is created multiple times, once for each process. That isn't much of a problem in itself: the cached data is small, so memory isn't the issue.
The real problem is that I have an API call to my application to flush the cache, but calling this API only flushes the cache for the particular process that handles that call.
Is there a way to signal all workers to perform a function?
I did think about using Redis for caching instead, as having a single cache would be simpler. The problem I have with Redis is that I'm not sure of the best way to scale it: I currently have 50 applications and wouldn't want to set up a new Redis database for each one. The alternative was to use ioredis and its transparent key prefixing for each application, but this could cause security vulnerabilities if one application accidentally read data from another client's application. And I don't believe there is a way to delete all keys for just one prefix (i.e. one app/client), as FLUSHALL removes all keys.
What are the best practices for sharing a cache between clustered Node instances when there are also many instances of the application (think of a SaaS application)?
Currently, my workaround for this issue is to use node-cron to clear the cache every 15 minutes. However, some items in the cache never really change, and other items should be updated as soon as an external tool signals the application to flush the cache via an API call.
For anyone looking at this: for my use case, the best method was to use IPC.
I implemented an IPC messenger to pass messages to all processes. I read the process name from the pm2 config file (app.json) to ensure the message is sent to the correct application.
// Sender
// The sender can run inside or outside of pm2
var pm2 = require('pm2');
var cfg = require('../app.json');
exports.IPCSend = function (topic, message) {
  pm2.connect(function () {
    // Find the IDs of who you want to send to
    pm2.list(function (err, processes) {
      for (var i in processes) {
        if (processes[i].name == cfg.apps[0].name) {
          console.log('Sending Message To Id:', processes[i].pm_id, 'Name:', processes[i].name)
          pm2.sendDataToProcessId(processes[i].pm_id, {
            data: {
              message: message
            },
            topic: topic
          }, function (err, res) {
            console.log(err, res);
          });
        }
      }
    });
  });
}
// Receiver
// No need to require('pm2') here; however, the receiver must be running inside of pm2
process.on('message', function (packet) {
  console.log(packet);
});
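The receiver above only logs the packet. A hedged sketch of how it could act on a flush request follows; the `Map` stands in for the per-process node-cache instance, and the topic name `cache:flush` and the `handlePacket` helper are assumptions made for illustration, not part of pm2:

```javascript
// Stand-in for the per-process cache. A Map keeps the sketch free of
// dependencies; with node-cache you would call its flushAll() instead.
const cache = new Map();

// Hypothetical topic name; it must match the `topic` passed to IPCSend.
const FLUSH_TOPIC = 'cache:flush';

// Returns true when the packet was a flush request and the cache was cleared.
function handlePacket(packet) {
  if (!packet || packet.topic !== FLUSH_TOPIC) return false;
  cache.clear();
  return true;
}

// Register the handler only when an IPC channel exists (as it does under
// pm2), so the same file can also be loaded in a plain `node` process.
if (typeof process.send === 'function') {
  process.on('message', handlePacket);
}
```

Because the sender loops over every process with the configured name, each worker receives its own copy of the packet and clears its own cache.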

Node JS Socket.IO Emitter (and redis)

I'll give a small premise of what I'm trying to do. I have a game concept in mind which requires multiple players sitting around a table somewhat like poker.
The normal interaction between different players is easy to handle via socket.io in conjunction with node js.
What I'm having a hard time figuring out is this: I have a cron job running in another process that gets new information every minute, and that information then needs to be sent to each of those players. Since this is a different process, I'm not sure how to send this information to certain clients.
socket.io does have information for this and I'm quoting it below:
In some cases, you might want to emit events to sockets in Socket.IO namespaces / rooms from outside the context of your Socket.IO processes.
There’s several ways to tackle this problem, like implementing your own channel to send messages into the process.
To facilitate this use case, we created two modules:
socket.io-redis
socket.io-emitter
From what I understand I need these two modules to do what I mentioned earlier. What I do not understand however is why is redis in the equation when I just need to send some messages.
Is it used to just store the messages temporarily?
Any help will be appreciated.
There are several ways to achieve this if you just need to emit after an external event. It depends on how you get the new data you want to send:
/* If the other process delivers the data through an incoming HTTP POST, you can
use Express, for example, and expose your io object in a custom middleware: */
// pass io along on the req object
app.use('/incoming', (req, res, next) => {
  req.io = io;
  next(); // hand the request on to the route handler below
})
// then you can do:
app.post('/incoming', (req, res, next) => {
  req.io.emit('incoming', req.body);
  res.send('data received from http post request then sent to the socket');
})
// if you fetch data every minute, why not just emit after your job:
var job = scheduledJob('* */1 * * * *', () => {
  axios.get('/myApi/someRessource').then(data => io.emit('newData', data.data));
})
Since socket.io provides those two modules together, my reading is that you actually need both. That isn't necessarily what you want here, though. But yes, Redis is probably just used to hold data temporarily, which it does really well, behaving much like a message queue.
Your cron job then wouldn't need a message queue or similar behaviour.
My suggestion, though, would be to run the cron job from within your process using some Node package, as a child_process, hook onto its readable stream, and then push directly to your sockets.
If the cron job process is also a Node.js process, you can exchange data through the Redis pub/sub mechanism.
Let me know what your cron job process is written in, and whether you need further help with the pub/sub mechanism.
Redis is one of the memory stores used by socket.io (if you configure it).
You need Redis only if you have a multi-server configuration (cluster), to establish connection and room/namespace sync between those Node.js instances. It has nothing to do with storing data in this case; it works as a pub/sub machine.

NodeJS/SailsJS app database block

I understand that Node.js is a single-threaded process, but if I have to run a long database operation, do I need to start a web worker to do that?
For example, in a Sails.js app, I can call the database to create a record, but if the database call takes time to finish, it will block other users from accessing the database.
Below is the sample code I tried:
var test = function(cb) {
  for(i=0;i<10000;i++) {
    Company.create({companyName:'Walter Jr'+i}).exec(cb);
  }
}
test(function(err,result){
});
console.log("return to client");
return res.view('cargo/view',{
  model:result
});
On the first request, I see the response almost instantly. But if I request it again, I have to wait for all the records to be inserted before it returns the view again.
What is the common practice for this kind of blocking issue?
Node.js has non-blocking, asynchronous I/O.
Read the article below; it will help you restructure your code:
http://hueniverse.com/2011/06/29/the-style-of-non-blocking/
Also, start using Promises to help you avoid writing blocking I/O.
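As one possible illustration of the promise-based approach, the sketch below issues the writes in small batches and resolves only after the last batch has finished, instead of queuing 10000 calls at once. `createCompany` is a hypothetical stand-in for a promise-returning model call such as `Company.create(record)`, and the `createInBatches` helper and the batch size of 10 are arbitrary choices for the example:

```javascript
// Hypothetical stand-in for a promise-returning model call such as
// Company.create(record); it just records the row asynchronously.
const savedRows = [];
function createCompany(record) {
  return new Promise((resolve) => {
    setImmediate(() => {
      savedRows.push(record);
      resolve(record);
    });
  });
}

// Runs `create` for every record, at most `batchSize` at a time, and
// resolves with all results once the last batch has finished.
async function createInBatches(records, batchSize, create) {
  const results = [];
  for (let i = 0; i < records.length; i += batchSize) {
    const batch = records.slice(i, i + batchSize);
    const created = await Promise.all(batch.map((record) => create(record)));
    results.push(...created);
  }
  return results;
}

// Usage: build the records first, then respond only after the writes finish.
const records = Array.from({ length: 25 }, (_, i) => ({ companyName: `Walter Jr${i}` }));
createInBatches(records, 10, createCompany).then((created) => {
  console.log(`created ${created.length} companies`);
});
```

Between batches the event loop is free to serve other requests, and the database connection is never asked to hold thousands of pending statements at the same time.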

How to lock (Mutex) in NodeJS?

There are external resources (accessing available inventories through an API) that can only be accessed one thread at a time.
My problems are:
NodeJS server handles requests concurrently, we might have multiple requests at the same time trying to reserve inventories.
If I hit the inventory API concurrently, then it will return duplicate available inventories
Therefore, I need to make sure that I am hitting the inventory API one thread at a time
There is no way for me to change the inventory API (legacy), therefore I must find a way to synchronize my nodejs server.
Note:
There is only one nodejs server, running one process, so I only need to synchronize the requests within that server
Low traffic server running on express.js
I'd use something like the async module's queue and set its concurrency parameter to 1. That way, you can put as many tasks in the queue as you need to run, but they'll only run one at a time.
The queue would look something like:
var inventoryQueue = async.queue(function(task, callback) {
  // use the values in "task" to call your inventory API here
  // pass your results to "callback" when you're done
}, 1);
Then, to make an inventory API request, you'd do something like:
var inventoryRequestData = { /* data you need to make your request; product id, etc. */ };
inventoryQueue.push(inventoryRequestData, function(err, results) {
  // this will be called with your results
});
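The same one-at-a-time behaviour can also be sketched without the async module by chaining promises. This is an alternative technique, not the queue from the answer above; `createSerialQueue` and `reserveInventory` are hypothetical names, and the latter merely simulates the legacy inventory API with a timer:

```javascript
// Returns an enqueue(task) function. Each task is a function returning a
// promise (or a plain value); tasks start strictly one after another.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(() => task());
    // Swallow errors on the internal chain only, so one failed task does
    // not block the tasks queued behind it. Callers still see the rejection.
    tail = result.catch(() => {});
    return result;
  };
}

const enqueueInventoryCall = createSerialQueue();

// Hypothetical stand-in for the legacy inventory API call.
function reserveInventory(productId) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ productId, reserved: true }), 10);
  });
}

// Usage: both calls are issued at once but reach the API one at a time.
enqueueInventoryCall(() => reserveInventory('sku-1')).then(console.log);
enqueueInventoryCall(() => reserveInventory('sku-2')).then(console.log);
```

This only serializes calls within a single process, which matches the note in the question that there is one server running one process.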

Difference between using getConnection() and using pool directly in node.js with node-mysql module?

The documentation states that you can either use the pool directly with:
pool.query();
or get a connection manually and then run a query:
pool.getConnection(function(err, connection) {
  // Use the connection
  connection.query( 'SELECT something FROM sometable', function(err, rows) {
    // And done with the connection.
    connection.release();
    // Don't use the connection here, it has been returned to the pool.
  });
});
The second option is a lot of code that has to be repeated every time you need to run a query. Is it safe to just use the pool directly? Does pool.query() release the connection back into the pool when it's done?
The question was kindly answered by the developer on GitHub:
https://github.com/felixge/node-mysql/issues/857#issuecomment-47382419
Is it safe to just use the pool directly?
Yes, as long as you are doing single statement queries. The only
reason you couldn't use that in your example above is because you are
making multiple queries that need to be done in sequential order on
the same connection. Calling pool.query() may be different connections
each time, so things like FOUND_ROWS() will not work as you intended
if the connection is not the same as the one that did the
SQL_CALC_FOUND_ROWS query. Using the long method allows you to hold
onto the same connection for all your queries.
Does pool.query() release the connection back into the pool when it's
done?
Yes
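For the multi-statement case, the repeated boilerplate of the long method can be wrapped once. The following is a sketch under assumptions: `withConnection`, `queryOn`, and `pageWithTotal` are hypothetical helpers, not part of node-mysql; they rely only on the `pool.getConnection(callback)`, `connection.query(sql, callback)`, and `connection.release()` calls shown above:

```javascript
// Borrows one connection, runs `work(connection)` on it, and always
// releases the connection afterwards, whether `work` succeeds or fails.
function withConnection(pool, work) {
  return new Promise((resolve, reject) => {
    pool.getConnection((err, connection) => {
      if (err) return reject(err);
      Promise.resolve()
        .then(() => work(connection))
        .then(
          (value) => { connection.release(); resolve(value); },
          (error) => { connection.release(); reject(error); }
        );
    });
  });
}

// Promise wrapper around the callback-style connection.query(sql, cb).
function queryOn(connection, sql) {
  return new Promise((resolve, reject) => {
    connection.query(sql, (err, rows) => (err ? reject(err) : resolve(rows)));
  });
}

// Usage sketch: both statements run on the same connection, so FOUND_ROWS()
// refers to the SQL_CALC_FOUND_ROWS query issued just before it.
function pageWithTotal(pool) {
  return withConnection(pool, async (connection) => {
    const rows = await queryOn(connection, 'SELECT SQL_CALC_FOUND_ROWS something FROM sometable LIMIT 10');
    const total = await queryOn(connection, 'SELECT FOUND_ROWS() AS total');
    return { rows, total };
  });
}
```

Single-statement queries can keep using `pool.query()` directly; a wrapper like this is only worth it when several statements must share one connection.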
