How to lock (Mutex) in NodeJS? - node.js

There are external resources (accessing available inventories through an API) that can only be accessed one thread at a time.
My problems are:
NodeJS server handles requests concurrently, we might have multiple requests at the same time trying to reserve inventories.
If I hit the inventory API concurrently, then it will return duplicate available inventories
Therefore, I need to make sure that I am hitting the inventory API one thread at a time
There is no way for me to change the inventory API (legacy), therefore I must find a way to synchronize my nodejs server.
Note:
There is only one nodejs server, running one process, so I only need to synchronize the requests within that server
Low traffic server running on express.js

I'd use something like the async module's queue and set its concurrency parameter to 1. That way, you can put as many tasks in the queue as you need to run, but they'll only run one at a time.
The queue would look something like:
var inventoryQueue = async.queue(function(task, callback) {
// use the values in "task" to call your inventory API here
// pass your results to "callback" when you're done
}, 1);
Then, to make an inventory API request, you'd do something like:
var inventoryRequestData = { /* data you need to make your request; product id, etc. */ };
inventoryQueue.push(inventoryRequestData, function(err, results) {
// this will be called with your results
});

Related

Firebase Functions: How to maintain 'app-global' API client?

How can I achieve an 'app-wide' global variable that is shared across Cloud Function instances and function invocations? I want to create a truly 'global' object that is initialized only once per the lifetime of all my functions.
Context:
My app's entire backend is Firestore + Firebase Cloud Functions. That is, I use a mix of background (Firestore) triggers and HTTP functions to implement backend logic. Additionally, I rely on a 3rd-party location service to continually listen to location updates from sensors. I want just a single instance of the client on which to subscribe to these updates.
The problem is that Firebase/Google Cloud Functions are stateless, meaning that function instances don't share memory/objects/state. If I call functionA, functionB, functionC, there's going to be at least 3 instances of locationService clients created, each listening separately to the 3rd party service so we end up with duplicate invocations of the location API callback.
Sample code:
// index.js
const functions = require("firebase-functions");
exports.locationService = require('./location_service');
this.locationService.initClient();
// define callable/HTTP functions & Firestore triggers
...
and
// location_service.js
var tracker = require("third-party-tracker-js");
const self = (module.exports = {
initClient: function () {
tracker.initialize('apiKey')
.then((client)=>{
client.setCallback(async function(payload) {
console.log("received location update: ", payload)
// process the payload ...
// with multiple function instances running at once, we receive as many callbacks for each location update
})
client.subscribeProject()
.then((subscription)=>{
subscription.subscribe()
.then((subscribeMsg)=>{
console.log("subscribed to project with message: ", subscribeMsg); // success
});
// subscription.unsubscribe(); // ??? at what point should we unsubscribe?
})
.catch((err)=>{
throw(err)
})
})
.catch((err)=>{
throw(err)
})
},
});
I realize what I'm trying to do is roughly equivalent to implementing a daemon in a single-process environment, and it appears that serverless environments like Firebase/Google Cloud Functions aren't designed to support this need because each instance runs as its own process. But I'd love to hear any contrary ideas and possible workarounds.
Another idea...
Inspired by this related SO post and the official GCF docs on stateless functions, I thought about using Firestore to persist a tracker value that allows us to conditionally initialize the API client. Roughly like this:
// read value from db; only initialize the client if there's no valid subscription
let locSubscriberActive = await getSubscribeStatusFromDb();
if (!locSubscriberActive) {
this.locationService.initClient();
}
// in `location_service.js`, do setSubscribeStatusToDb(); // set flag to true when we call subscribe(). reset when we get terminated
The problem faced: at what point do I unset/reset that value? Intuitively, I would do so the moment the function instance that initialized the client gets recycled/killed. However, it appears that it is not possible to know when a Firebase Cloud Function instance is terminated? I searched everywhere but couldn't find docs on how to detect such an event...
What you're trying to do is not at all supported in Cloud Functions. It's important to realize that there may be any number of server instances allocated for each deployed function. That's how Cloud Functions scales up and down to match the load on the function in a cost-effective way. These instances might be terminated at any time for any reason. You have no indication when an instance terminates.
Also, instances are not capable of performing any computation when they are idle. CPU resources are clamped down after a function terminates, and are spun up again when the next function is invoked on that instance. You can't have any "daemon" code running when a function is not actively being invoked. I don't know what your locationService does, but it is certainly doing nothing at all after a function terminates, regardless of how it terminated.
For any sort of long-running or daemon-like code, Cloud Functions is not a suitable product. You should instead consider also using another product that lets you run code 24/7 without disruptions. App Engine and Compute Engine are viable alternatives, and you will have to think carefully about if and how you want their server instances to scale with load.

How does NodeJS process multiple GET requests from different users/browsers?

I'd like to know how does NodeJS process multiple GET requests from different users/browsers which have event emitted to return the results? I'd like to think of it as each time a user executes the GET request, it's as if a new session is started for that user.
For example if I have this GET request
var tester = require('./tester-class');
app.get('/triggerEv', async function(req, res, next) {
// Start the data processing
tester.startProcessing('some-data');
// tester has event emitters that are triggered when processing is complete (success or fail)
tester.on('success', function(data) {
return res.send('success');
}
tester.on('fail', function(data) {
return res.send('fail');
}
}
What I'm thinking is that if I open a browser and run this GET request by passing some-data and start processing. Then open another browser to execute this GET request with different data (to simulate multiple users accessing it at the same time), it will overwrite the previous startProcessing function and rerun it again with the new data.
So if multiple users execute this GET request at the same time, would it handle it separately for each user as if it was different and independent sessions then return when there's a response for each user's sessions? Or will it do as I mentioned above (this case I will have to somehow manage different sessions for each user that triggers this GET request)?
I want to make it so that each user that executes this GET request doesn't interfere with other users that also execute this GET request at the same time and the correct response is returned for each user based on their own data sent to the startProcessing function.
Thanks, I hope I'm making sense. Will clarify if not.
If you're sharing the global tester object among different requests, then the 2nd request will interfere with the first request. Since all incoming requests use the same global environment in node.js, the usual model is that any request that may be "in-flight" for awhile needs to create its own resources and keep them for itself. Then, if some other request arrives while the first one is still waiting for something to complete, then it will also create its own resources and the two will not conflict.
The server environment does not have a concept of "sessions" in the way you're using the term. There is no separate server-session or server state that each request lives in other than the request and response objects that are created for each incoming request. This is not like PHP - there is not a whole new interpreter state for each request.
I want to make it so that each user that executes this GET request doesn't interfere with other users that also execute this GET request at the same time and the correct response is returned for each user based on their own data sent to the startProcessing function.
Then, don't share any resources between requests and don't use any objects that have global state. I don't know what your tester is, but one way to keep multiple requests separate from each other is to just make a new tester object for each request so they can each use it to their heart's content without any conflict.

How to properly use database when scaling a NodeJS app?

I am wondering how I would properly use MySQL when I am scaling my Node.JS app using the cluster module. Currently, I've only come up with two solutions:
Solution 1:
Create a database connection on every "worker".
Solution 2:
Have the database connection on a master process and whenever one of the workers request some data, the master process will return the data. However, using this solution, I do not know how I would be able to get the worker to retrieve the data from the master process.
I (think) I made a "hacky" workaround emitting with a unique number and then waiting for the master process to send the message back to the worker and the event name being the unique number.
If you don't understand what I mean by this, here's some code:
// Worker process
return new Promise (function (resolve, reject) {
process.send({
// Other data here
identifier: <unique number>
})
// having a custom event emitter on the worker
worker.once(<unique number>, function (data) {
// data being the data for the request with the unique number
// resolving the promise with returned data
resolve(data)
})
})
//////////////////////////
// Master process
// Custom event emitter on the master process
master.on(<eventName>, function (data) {
// logic
// Sending data back to worker
master.send(<other args>, data.identifier)
}
What would be the best approach to this problem?
Thank you for reading.
When you cluster in NodeJS, you should assume each process is completely independent. You really shouldn't be relaying messages like this to/from the master process. If you need multiple threads to access the same data, I don't think NodeJS is what you should be using. However, If you're just doing basic CRUD operations with your database, clustering (solution 1) is certainly the way to go.
For example, if you're trying to scale write ops to your database (assuming your database is properly scaled), each write op is independent from another. When you cluster, a single write request will be load balanced to one of your workers. Then in the worker, you delegate the write op to your database asynchronously. In this scenario, there is no need for a master process.
If you've not planned on using a proper microservice architecture where each process would actually have its own database (or perhaps just an in-memory storage), your best bet IMO is to use a connection pool created by the main process and have each child request a connection out of that pool. That's probably the safest approach to avoid issues in the neighborhood of threadsafety errors.

Does the node.js express framework create a new lightweight process per client connection?

Say this code is run inside of a node.js express application. Say two different clients request the index resource. Call these clients ClientA and ClientB. Say ClientA requests the index resource before ClientB. In this case the console will log the value 1 for ClientA and the console will log the value 2 for ClientB. My main question is: Does each client request get its own lightweight process with the router being the shared code portion between those processes, the variables visible to router but not part of the router being the shared heap and of course each client then gets their own stack? My sub questions is: If yes to my main question then in this example each of these clients would have to queue waiting for the lock the global_counter before incrementing,correct?
var global_counter = 0;
router.get('/', function (req, res) {
global_counter += 1;
console.log(global_counter);
res.render('index');
});
Nope. Single thread/process. Concurrency is accomplished via a work queue. Some ways to get stuff into the work queue include setTimeout() and nexttick(). Check out http://howtonode.org/understanding-process-next-tick
Only one thing is running at a time, so no need to do any locking.
It takes a while to get your brain to warm up to the idea.

How to use newrelic.createBackgroundTransaction without a handle function

I am looking into using newrelic APM to monitor certain parts of our codebase.
I want to watch transactions that are not simple HTTP calls, but background processes. These transactions are completed by worker processes and we want to monitor them in the main part of the app.
Pseudo code:
var fork = childProcess.spawn('node', ['--harmony', 'path-to-worker.js', args]);
fork.stdout.on('data', function(data) {
// a finished transaction
// this fires most likely more than once
});
We basically need something like newrelic.createBackgroundTransaction() that can log a transaction immediately, without having to pass it a function to execute and time (I can do that myself).
Can I do something like this on the free tier of newrelic?

Resources