Node.js main thread blocking

I'm training an artificial intelligence model, and the method takes too long (about 5 minutes). While it is running, my backend stays blocked waiting for it to finish. Is there any solution for this?
train: (answersIA, search, res) => {
  const path = `api/IA/${search.id}`;
  if (answersIA.length == 0) {
    return null;
  }
  else {
    answersIA = answersIA.filter(answer => answer.score);
    answersIA = answersService.setOutput(answersIA);
    answersIA = answersService.encodeAll(IAServiceTrain.adjustSize(answersIA));
    net.train(answersIA, {
      errorThresh: 0.005,
      iterations: 100,
      log: true,
      logPeriod: 10,
      learningRate: 0.3
    });
    mkdirp(path, (err) => {
      if (err) throw 'No permission to create trained-net file';
      fs.writeFileSync(`${path}/trained-net.js`, `${net.toFunction().toString()};`);
      res.json({
        message: 'Trained!'
      });
    });
  }
}

Use child_process to perform the training in a separate process. Here is how it can be implemented:
trainMaster.js
const fork = require('child_process').fork;

// Note: the parameter is named "args" because "arguments" is a poor
// identifier choice in JavaScript (it shadows the implicit arguments object).
exports.train = (args, callback) => {
  const worker = fork(`${__dirname}/trainWorker.js`);
  worker.send(args);
  worker.on('message', result => {
    callback(result);
    worker.kill();
  });
};
trainWorker.js
process.on('message', args => {
  const result = train(args);
  process.send(result);
});

function train(args) { /* your training logic */ }
So when you call train from trainMaster.js in your main application, it doesn't do the actual training and doesn't block the event loop. Instead it creates a new process, waits for it to perform all the heavy lifting, and then terminates it.
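For completeness, a sketch of how a route handler might use it (the route path and request shape are assumptions, not from the original code):
const trainMaster = require('./trainMaster');

app.post('/train', (req, res) => {
  // The forked worker does the heavy lifting; the event loop stays free meanwhile.
  trainMaster.train({ answersIA: req.body.answersIA, search: req.body.search }, (result) => {
    res.json(result);
  });
});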
This approach should work fine as long as the number of simultaneous trainings is less than the number of CPUs on your machine. From the Node.js docs:
It is important to keep in mind that spawned Node.js child processes are independent of the parent with exception of the IPC communication channel that is established between the two. Each process has its own memory, with their own V8 instances. Because of the additional resource allocations required, spawning a large number of child Node.js processes is not recommended.
Otherwise you will need a different approach, like spreading workers across multiple machines and using a message queue to distribute the work between them.

As https://stackoverflow.com/users/740553/mike-pomax-kamermans noted, JavaScript is single-threaded, meaning long-running synchronous operations will block your thread while executing. It's not clear which library you're using to perform your training, but check whether it offers any asynchronous methods for training. Also, you're using the synchronous version of fs.writeFile (fs.writeFileSync), which again will block your thread while executing. To remedy this, use the async version:
fs.writeFile(`${path}/trained-net.js`, `${net.toFunction().toString()};`, function (err) {
  if (err) return console.log(err);
  console.log('File write completed');
});
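Alternatively, on Node.js versions with the promise-based fs API (Node 10+), a sketch using async/await (assuming this runs inside an async function, with path and net in scope as above):
const fsp = require('fs').promises;

try {
  // Non-blocking write; other requests can be served while the file is written.
  await fsp.writeFile(`${path}/trained-net.js`, `${net.toFunction().toString()};`);
  console.log('File write completed');
} catch (err) {
  console.log(err);
}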

Related

How to run multiple infinite loops without blocking each other in Node.js?

I need to run multiple parallel tasks (infinite loops) without blocking each other in node.js. I'm trying now to do:
const test = async () => {
  let a = new Promise(async res => {
    while (true) {
      console.log('test1')
    }
  })
  let b = new Promise(async res => {
    while (true) {
      console.log('test2')
    }
  })
}
test();
But it does not work, only 'test1' appears in the console. What am I doing wrong?
You can't. You need to stop thinking in loops and start thinking in terms of the event loop instead.
There are two functions that can schedule code to run in the future: setTimeout() and setInterval(). Note that in Node.js there is also setImmediate(), but I suggest you avoid it until you really know what you are doing, because setImmediate() has the potential to block I/O.
Note that neither setTimeout() nor setImmediate() is Promise-aware.
The minimal code example that does what you want would be something like:
const test = () => {
  setInterval(() => {
    console.log('test1')
  },
    10 // execute the above code every 10ms
  )
  setInterval(() => {
    console.log('test2')
  },
    10 // execute the above code every 10ms
  )
}
test();
Or, if you really want to run the two pieces of code as fast as possible, you can pass 0 as the timeout. It won't run every 0ms, just as fast as the interpreter can. The minimum interval differs depending on the OS and how busy your CPU is. For example, Windows can run setInterval() down to every 1ms, while Linux typically won't run any faster than every 10ms. This comes down to how fast the OS tick is (or jiffy, in Linux terminology). Linux is a server-oriented OS, so it sets its tick larger to get higher throughput (each process gets the CPU for longer and can thus finish long tasks faster), while Windows is a UI-oriented (some would say game-oriented) OS, so it sets its tick smaller for a smoother UI experience.
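If you want to see what your own platform actually does, here is a minimal measurement sketch (not from the original answer):
let last = Date.now();
const timer = setInterval(() => {
  const now = Date.now();
  console.log(`tick after ${now - last}ms`); // rarely 0ms, even though 0 was requested
  last = now;
}, 0);
setTimeout(() => clearInterval(timer), 100); // stop after roughly 100ms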
To get something closer to the style of code you want you can use a promisified setTimeout() and await it:
function delay (x) {
  return new Promise((done, fail) => setTimeout(done, x));
}
const test = async () => {
  let a = async (res) => {
    while (true) {
      console.log('test1')
      await delay(0) // this allows the function to be async
    }
  }
  let b = async (res) => {
    while (true) {
      console.log('test2')
      await delay(0) // this allows the function to be async
    }
  }
  a();
  b();
}
test();
However, this is no longer the minimal possible working code, though not by much.
Note: after writing the promisified example above, it suddenly reminded me of the programming style on early cooperative multitasking OSes. I think Windows 3.1 did this, though I never wrote anything on it. It reminds me of MacOS Classic: you had to periodically call the WaitNextEvent() function to pass control of the CPU back to the OS so that the OS could run other programs. If you forgot to do that (or your program got stuck in a long loop with no WaitNextEvent() call), your entire computer would freeze. This is exactly what you are experiencing with Node, where the entire JavaScript engine "freezes" and executes only one loop while ignoring other code.

Do not process next job until previous job is completed (BullJS/Redis)?

Basically, each of the clients (each has a clientId associated with it) can push messages, and it is important that a second message from the same client isn't processed until the first one has finished processing. The client can send multiple messages in a row, and they are ordered; multiple clients sending messages should ideally not interfere with each other. And, importantly, a job shouldn't be processed twice.
I thought that using Redis I might be able to fix this issue. I started with some quick prototyping using the bull library, but I am clearly not doing it well; I was hoping someone would know how to proceed.
This is what I tried so far:
Create jobs and add them to the same queue name for one process, using the clientId as the job name.
Consume jobs while waiting large random amounts of time on 2 separate processes.
I tried adding the default locking provided by the library that I am using (bull), but it locks on the jobId, which is unique for each job, not on the clientId.
What I would want to happen:
One of the consumers can't take the job from the same clientId until the previous one is finished processing it.
They should, however, be able to get items from different clientIds in parallel without problems (asynchronously). (I haven't gotten this far; right now I am dealing with only one clientId.)
What I get:
Both consumers consume as many items as they can from the queue without waiting for the previous item for the clientId to be completed.
Is Redis even the right tool for this job?
Example code
// ./setup.ts
import Queue from 'bull';
import * as uuid from 'uuid';

// Check that when a message is taken from a place, no other message is taken
// To do that test, have two processes that process messages and one that sets messages, and make the job take a long time
// queue for each room https://stackoverflow.com/questions/54178462/how-does-redis-pubsub-subscribe-mechanism-works/54243792#54243792
// https://groups.google.com/forum/#!topic/redis-db/R09u__3Jzfk
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353

export async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

export interface JobData {
  id: string;
  v: number;
}

export const queue = new Queue<JobData>('messages', 'redis://127.0.0.1:6379');

queue.on('error', (err) => {
  console.error('Uncaught error on queue.', err);
  process.exit(1);
});

export function clientId(): string {
  return uuid.v4();
}

export function randomWait(minms: number, maxms: number): Promise<void> {
  const ms = Math.random() * (maxms - minms) + minms;
  return sleep(ms);
}

// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
queue.LOCK_RENEW_TIME = 5 * 60 * 1000;
// ./create.ts
import { queue, randomWait } from './setup';

const MIN_WAIT = 300;
const MAX_WAIT = 1500;

async function createJobs(n = 10): Promise<void> {
  await randomWait(MIN_WAIT, MAX_WAIT);
  // always same Id
  const clientId = Math.random() > 1 ? 'zero' : 'one';
  for (let index = 0; index < n; index++) {
    await randomWait(MIN_WAIT, MAX_WAIT);
    const job = { id: clientId, v: index };
    await queue.add(clientId, job).catch(console.error);
    console.log('Added job', job);
  }
}

export async function create(nIds = 10, nItems = 10): Promise<void> {
  const jobs = [];
  await randomWait(MIN_WAIT, MAX_WAIT);
  for (let index = 0; index < nIds; index++) {
    await randomWait(MIN_WAIT, MAX_WAIT);
    jobs.push(createJobs(nItems));
    await randomWait(MIN_WAIT, MAX_WAIT);
  }
  await randomWait(MIN_WAIT, MAX_WAIT);
  await Promise.all(jobs);
  process.exit();
}

(function mainCreate(): void {
  create().catch((err) => {
    console.error(err);
    process.exit(1);
  });
})();
// ./consume.ts
import { queue, randomWait, clientId } from './setup';

function startProcessor(minWait = 5000, maxWait = 10000): void {
  queue
    .process('*', 100, async (job) => {
      console.log('LOCKING: ', job.lockKey());
      await job.takeLock();
      const name = job.name;
      const processingId = clientId().split('-', 1)[0];
      try {
        console.log('START: ', processingId, '\tjobName:', name);
        await randomWait(minWait, maxWait);
        const data = job.data;
        console.log('PROCESSING: ', processingId, '\tjobName:', name, '\tdata:', data);
        await randomWait(minWait, maxWait);
        console.log('PROCESSED: ', processingId, '\tjobName:', name, '\tdata:', data);
        await randomWait(minWait, maxWait);
        console.log('FINISHED: ', processingId, '\tjobName:', name, '\tdata:', data);
      } catch (err) {
        console.error(err);
      } finally {
        await job.releaseLock();
      }
    })
    .catch(console.error); // Catches initialization

}

startProcessor();
This is run using 3 different processes, which you might start like this (although I use different tabs for a clearer view of what is happening):
npx ts-node consume.ts &
npx ts-node consume.ts &
npx ts-node create.ts &
I'm not familiar with Node.js, but for Redis I would try this.
Let's say you have client_1 and client_2; they are both publishers of events.
You have three machines: consumer_1, consumer_2, consumer_3.
Establish a list of tasks in Redis, e.g. JOB_LIST.
Clients put (LPUSH) jobs into this JOB_LIST, in a specific form, like "CLIENT_1:[jobcontent]", "CLIENT_2:[jobcontent]".
Each consumer takes out jobs (with Redis's RPOP command, or its blocking variant BRPOP) and processes them.
For example, consumer_1 takes out a job, content is CLIENT_1:[jobcontent]. It parses the content and recognize it's from CLIENT_1. Then it wants to check if some other consumer is processing CLIENT_1 already, if not, it will lock the key to indicate that it's processing CLIENT_1.
It goes on to set a key "CLIENT_1_PROCESSING", with content "consumer_1", using the Redis SETNX command (set if the key does not exist), with an appropriate timeout. For example, if the task normally takes one minute to finish, you set the key's timeout to five minutes, just in case consumer_1 crashes and holds the lock indefinitely.
If the SETNX returns 0, it means it failed to acquire the lock on CLIENT_1 (someone is already processing a job of client_1). Then it returns the job (a value of "CLIENT_1:[jobcontent]") to the left side of JOB_LIST, using the Redis LPUSH command. Then it might wait a bit (sleep a few seconds) and RPOP another task from the right side of the list. If this time SETNX returns 1, consumer_1 acquires the lock. It goes on to process the job; after it finishes, it deletes the key "CLIENT_1_PROCESSING", releasing the lock. Then it goes on to RPOP another job, and so on.
Some things to consider:
The JOB_LIST is not fair, e.g. earlier jobs might be processed later.
The locking part is a bit rudimentary, but it will suffice. (A code sketch of this loop follows below.)
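Here is a rough Node.js sketch of the consumer loop described above, using the ioredis client. The client library choice, key names, timeouts, and the processJob handler are my assumptions, not part of the original answer:
const Redis = require('ioredis');
const redis = new Redis('redis://127.0.0.1:6379');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function consumeLoop() {
  while (true) {
    const raw = await redis.rpop('JOB_LIST');
    if (!raw) { await sleep(1000); continue; }
    const i = raw.indexOf(':');
    const client = raw.slice(0, i);      // e.g. "CLIENT_1"
    const jobcontent = raw.slice(i + 1);
    // SET ... EX 300 NX: acquire the per-client lock with a 5-minute safety timeout.
    const locked = await redis.set(`${client}_PROCESSING`, 'consumer_1', 'EX', 300, 'NX');
    if (!locked) {
      // Someone is already processing this client: put the job back and try again later.
      await redis.lpush('JOB_LIST', raw);
      await sleep(1000);
      continue;
    }
    try {
      await processJob(client, jobcontent); // hypothetical handler doing the actual work
    } finally {
      await redis.del(`${client}_PROCESSING`); // release the lock
    }
  }
}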
----------update--------------
I've figured out another way to keep the tasks in order.
For each client (producer), build a list, like "client_1_list", and push jobs into the left side of the list.
Save all the client names in a list "client_names_list", with values "client_1", "client_2", etc.
For each consumer (processor), iterate over the "client_names_list". For example, consumer_1 gets "client_1" and checks whether the key of client_1 is locked (someone is processing a task of client_1 already); if not, it right-pops a value (job) from client_1_list and locks client_1. If client_1 is locked, it (probably after sleeping one second) iterates to the next client, "client_2" for example, and checks its keys, and so on.
This way, each client's (task producer's) tasks are processed in their order of arrival.
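Again as a non-authoritative sketch, reusing the ioredis client, sleep helper, and hypothetical processJob from the sketch above (key names assumed):
async function roundRobinConsume() {
  while (true) {
    const clients = await redis.lrange('client_names_list', 0, -1);
    for (const client of clients) {
      // Skip clients that another consumer has locked.
      const locked = await redis.set(`${client}_lock`, 'consumer_1', 'EX', 300, 'NX');
      if (!locked) continue;
      try {
        const job = await redis.rpop(`${client}_list`);
        if (job) await processJob(client, job); // hypothetical handler
      } finally {
        await redis.del(`${client}_lock`);
      }
    }
    await sleep(1000); // brief pause before the next sweep
  }
}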
EDIT: I found the cause of BullJS starting jobs in parallel on one processor: we are using named jobs and were defining many named process functions on one queue/processor. The default concurrency factor for a queue/processor is 1, so the queue should not process any jobs in parallel.
The problem with our setup is that if you define many (named) process-handlers on one queue, the concurrency is added up with each process-handler function: so if you define three named process-handlers you get a concurrency factor of 3 for the given queue, across all the defined named jobs.
So just define one named job per queue for queues where parallel processing should not happen and all jobs should run sequentially, one after the other.
That can be important, e.g. when pushing a high number of jobs onto the queue where the processing involves API calls that would return errors if made in parallel (see the sketch below).
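In other words, something like this (a sketch; the queue name, job name, and API call are made up, not from the original answer):
const Queue = require('bull');
const sequentialQueue = new Queue('sequential-work', 'redis://127.0.0.1:6379');

// A single named handler with a concurrency of 1: jobs run strictly one after another.
sequentialQueue.process('the-only-job-name', 1, async (job) => {
  await callRateLimitedApi(job.data); // hypothetical API call that must not run in parallel
});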
The following text was my first approach to answering the OP's question and describes just a workaround for the problem. So better just go with my edit above :) and configure your queues the right way.
I found an easy solution to the OP's question.
In fact, BullJS processes many jobs in parallel on one worker instance:
Let's say you have one worker instance up and running and push 10 jobs onto the queue; then, possibly, that worker starts all of them in parallel.
My research on BullJS queues suggested that this is not intended behavior: one worker (also called a processor by BullJS) should only start a new job from the queue when it's in an idle state, i.e. not processing a former job.
Nevertheless, BullJS keeps starting jobs in parallel on one worker.
In our implementation that led to big problems during API calls, most likely caused by too many API calls at a time. Tests showed that when starting only one worker the API calls finished just fine and returned status 200.
So how do you process just one job after the other, once the previous one is finished, if BullJS does not do that for us (which is just what the OP asked)?
We first experimented with delays and other BullJS options, but that's a kind of workaround and not the exact solution to the problem we were looking for. At least we did not get BullJS to stop processing more than one job at a time.
So we did it ourselves and started one job after the other.
The solution was rather simple for our use case after looking into the BullJS API reference (BullJS API Ref).
We just used a for-loop to start the jobs one after another. The trick was to use BullJS's
job.finished
method to get a Promise that resolves once the job is finished. By using await inside the for-loop, the next job gets started immediately after the job.finished Promise is awaited (resolved). That's the nice thing about for-loops: await works in them!
Here a small code example on how to achieve the intended behavior:
for (let i = 0; i < theValues.length; i++) {
  jobCounter++
  const job = await this.processingQueue.add(
    'update-values',
    {
      value: theValues[i],
    },
    {
      // delay: i * 90000,
      // lifo: true,
    }
  )
  this.jobs[job.id] = {
    jobType: 'socket',
    jobSocketId: BackgroundJobTasks.UPDATE_VALUES,
    data: {
      value: theValues[i],
    },
    jobCount: theValues.length,
    jobNumber: jobCounter,
    cumulatedJobId
  }
  await job.finished()
    .then((val) => {
      console.log('job finished:: ', val)
    })
}
The important part is really
await job.finished()
inside the for-loop. All theValues.length jobs get started just one after the other, as intended.
That way, horizontally scaling jobs across more than one worker is no longer possible. Nevertheless, this workaround is okay for us at the moment.
I will get in contact with optimalbits, the maker of BullJS, to clear things up.

Node.js library for executing long running background tasks

I have an architecture with an express.js webserver that accepts new tasks over a REST API.
Furthermore, I must have another process that creates and supervises many other tasks on other servers (a distributed system). This process should run in the background for a very long time (months, years).
Now the questions is:
1)
Should I create one single Node.js app with a task queue such as bull.js/Redis or Celery/Redis that basically launches this long-running task once at the beginning?
Or
2)
Should I have two processes, one for the REST API and another daemon process that schedules and manages the tasks in the distributed system?
I heavily lean towards solution 2).
I am facing the same problem now. As we know, Node.js runs in a single thread, but we can create workers to run things in parallel, or to handle functions that take some time without affecting our main server. Fortunately, Node.js supports multi-threading.
Take a look at this example:
const {
  Worker, isMainThread, parentPort, workerData
} = require('worker_threads');

if (isMainThread) {
  module.exports = function parseJSAsync(script) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, {
        workerData: script
      });
      worker.on('message', resolve);
      worker.on('error', reject);
      worker.on('exit', (code) => {
        if (code !== 0)
          reject(new Error(`Worker stopped with exit code ${code}`));
      });
    });
  };
} else {
  const { parse } = require('some-js-parsing-library');
  const script = workerData;
  parentPort.postMessage(parse(script));
}
https://nodejs.org/api/worker_threads.html
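A sketch of how the exported function might be called from the main application (the module path and script content are assumptions):
const parseJSAsync = require('./parseWorker'); // hypothetical filename for the module above

parseJSAsync('const x = 1;')
  .then((result) => console.log('parsed on a worker thread:', result))
  .catch((err) => console.error(err));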
Search for articles about multi-threading in Node.js, but remember one thing here: state cannot be shared between threads. You can use a message broker like Kafka, RabbitMQ (my recommendation), or Redis to handle such needs.
Kafka is quite difficult to configure in production.
RabbitMQ is good because you can store messages, queues, etc. in local storage too. But personally I could not find any proper solution for load-balancing these threads. Maybe this is not your answer, but I hope you get some clues here.

Is making sequential HTTP requests a blocking operation in node?

Note that irrelevant information to my question will be 'quoted'
like so (feel free to skip these).
Problem
I am using node to make in-order HTTP requests on behalf of multiple clients. This way, what originally took the client(s) several different page loads to get the desired result now takes only a single request via my server. I am currently using the 'async' module for flow control and the 'request' module for making the HTTP requests. There are approximately 5 callbacks which, measured using console.time, take about ~2 seconds from start to finish (sketch code included below).
Now I am rather inexperienced with node, but I am aware of the
single-threaded nature of node. While I have read many times that node
isn’t built for CPU-bound tasks, I didn’t really understand what that
meant until now. If I have a correct understanding of what’s going on,
this means that what I currently have (in development) is in no way
going to scale to even more than 10 clients.
Question
Since I am not an expert at node, I ask this question (in the title) to get a confirmation that making several sequential HTTP requests is indeed blocking.
Epilogue
If that is the case, I expect I will ask a different SO question (after doing the appropriate research) discussing various possible solutions, should I choose to continue approaching this problem in node (which itself may not be suitable for what I'm trying to do).
Other closing thoughts
I am truly sorry if this question was not detailed enough, too noobish, or had particularly flowery language (I try to be concise).
Thanks and all the upvotes to anyone who can help me with my problem!
The code I mentioned earlier:
var async = require('async');
var request = require('request');

...

async.waterfall([
  function(cb) {
    console.time('1');
    request(someUrl1, function(err, res, body) {
      // load and parse the given web page.
      // make a callback with data parsed from the web page
    });
  },
  function(someParameters, cb) {
    console.timeEnd('1');
    console.time('2');
    request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
      // more computation
      // make a callback with a session cookie given by the visited url
    });
  },
  function(jar, cb) {
    console.timeEnd('2');
    console.time('3');
    request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
      // do more parsing + computation
      // make another callback with the results
    });
  },
  function(moreParameters, cb) {
    console.timeEnd('3');
    console.time('4');
    request({url: someUrl4, method: 'POST', jar: jar, form: {/* data */}}, function(err, res, body) {
      // make final callback after some more computation.
      // This part takes about ~1s to complete
    });
  }
], function (err, result) {
  console.timeEnd('4');
  res.status(200).send();
});
Normally, I/O in node.js is non-blocking. You can test this out by making several requests simultaneously to your server. For example, if each request takes 1 second to process, a blocking server would take 2 seconds to process 2 simultaneous requests, but a non-blocking server would take just a bit more than 1 second to process both.
However, you can deliberately make requests blocking by using the sync-request module instead of request. Obviously, that's not recommended for servers.
Here's a bit of code to demonstrate the difference between blocking and non-blocking I/O:
var req = require('request');
var sync = require('sync-request');

// Load example.com N times (yes, it's a real website):
var N = 10;

console.log('BLOCKING test ==========');
var start = new Date().valueOf();
for (var i = 0; i < N; i++) {
  var res = sync('GET', 'http://www.example.com');
  console.log('Downloaded ' + res.getBody().length + ' bytes');
}
var end = new Date().valueOf();
console.log('Total time: ' + (end - start) + 'ms');

console.log('NON-BLOCKING test ======');
var loaded = 0;
var start = new Date().valueOf();
for (var i = 0; i < N; i++) {
  req('http://www.example.com', function(err, response, body) {
    loaded++;
    console.log('Downloaded ' + body.length + ' bytes');
    if (loaded == N) {
      var end = new Date().valueOf();
      console.log('Total time: ' + (end - start) + 'ms');
    }
  });
}
Running the code above, you'll see that the non-blocking test takes roughly the same amount of time to process all requests as it does for a single request (for example, if you set N = 10, the non-blocking code executes roughly 10 times faster than the blocking code). This clearly illustrates that the requests are non-blocking.
Additional answer:
You also mentioned that you're worried about your process being CPU-intensive. But in your code, you're not benchmarking CPU utilization; you're mixing both network request time (I/O, which we know is non-blocking) and CPU processing time. To measure how much time the request spends blocking, change your code to this:
async.waterfall([
  function(cb) {
    request(someUrl1, function(err, res, body) {
      console.time('1');
      // load and parse the given web page.
      console.timeEnd('1');
      // make a callback with data parsed from the web page
    });
  },
  function(someParameters, cb) {
    request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
      console.time('2');
      // more computation
      console.timeEnd('2');
      // make a callback with a session cookie given by the visited url
    });
  },
  function(jar, cb) {
    request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
      console.time('3');
      // do more parsing + computation
      console.timeEnd('3');
      // make another callback with the results
    });
  },
  function(moreParameters, cb) {
    request({url: someUrl4, method: 'POST', jar: jar, form: {/* data */}}, function(err, res, body) {
      console.time('4');
      // some more computation.
      console.timeEnd('4');
      // make final callback
    });
  }
], function (err, result) {
  res.status(200).send();
});
Your code only blocks in the "more computation" parts. So you can completely ignore any time spent waiting for the other parts to execute. In fact, that's exactly how node can serve multiple requests concurrently: while waiting for the other parts to call their respective callbacks (you mention that this may take up to 1 second), node can execute other javascript code and handle other requests.
Your code is non-blocking because it uses non-blocking I/O with the request() function. This means that node.js is free to service other requests while your series of http requests is being fetched.
What async.waterfall() does is order your requests to be sequential and pass the results of one on to the next. The requests themselves are non-blocking, and async.waterfall() does not change or influence that. The series you have just means that you have multiple non-blocking requests in a row.
What you have is analogous to a series of nested setTimeout() calls. For example, this sequence of code takes 5 seconds to get to the inner callback (like your async.waterfall() takes n seconds to get to the last callback):
setTimeout(function() {
  setTimeout(function() {
    setTimeout(function() {
      setTimeout(function() {
        setTimeout(function() {
          // it takes 5 seconds to get here
        }, 1000);
      }, 1000);
    }, 1000);
  }, 1000);
}, 1000);
But, this uses basically zero CPU because it's just 5 consecutive asynchronous operations. The actual node.js process is involved for probably no more than 1ms to schedule the next setTimeout() and then the node.js process literally could be doing lots of other things until the system posts an event to fire the next timer.
You can read more about how the node.js event queue works in these references:
Run Arbitrary Code While Waiting For Callback in Node?
blocking code in non-blocking http server
Hidden threads in Javascript/Node that never execute user code: is it possible, and if so could it lead to an arcane possibility for a race condition?
How does JavaScript handle AJAX responses in the background? (written about the browser, but concept is the same)
If I have a correct understanding of what’s going on, this means that
what I currently have (in development) is in no way going to scale to
even more than 10 clients.
This is not a correct understanding. A node.js process can easily have thousands of non-blocking requests in flight at the same time. Your sequentially measured time is only a start to finish time - it has nothing to do with CPU resources or other OS resources consumed (see comments below on non-blocking resource consumption).
I still have concerns about using node for this particular
application then. I'm worried about how it will scale considering that
the work it is doing is not simple I/O but computationally intensive.
I feel as though I should switch to a platform that enables
multi-threading. Does what I'm asking/the concern I'm expressing make
sense? I could just be spitting total BS and have no idea what I'm
talking about.
Non-blocking I/O consumes almost no CPU (only a little when the request is originally sent and then a little when the result arrives back), but while the computer is waiting for the remote result, no CPU is consumed at all and no OS thread is consumed. This is one of the reasons that node.js scales well for non-blocking I/O, as no resources are used while the computer is waiting for a response from a remote site.
If your processing of the request is computationally intensive (e.g. takes a measurable amount of pure blocking CPU time to process), then yes, you would want to explore getting multiple processes involved in running the computations. There are multiple ways to do this. You can use clustering (so you simply have multiple identical node.js processes, each working on requests from different clients) with the nodejs clustering module. Or, you can create a work queue of computationally intensive work to do and have a set of child processes that do the computationally intensive work. Or, there are several other options too. This is not the type of problem that one needs to switch away from node.js to solve; it can be solved using node.js just fine, as sketched below.
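For example, a minimal clustering sketch using only the standard cluster and os modules (the ./server module is an assumption standing in for your Express app):
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  // One identical worker per CPU; each accepts requests independently.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  require('./server'); // hypothetical: starts the Express app and listens
}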
You can use a queue to process concurrent HTTP calls in Node.js:
https://www.npmjs.com/package/concurrent-queue
var cq = require('concurrent-queue');
var test_queue = cq();

// request action method
testQueue: function(req, res) {
  // queuing each request to process sequentially
  test_queue(req.user, function (err, user) {
    console.log(user.id + ' done');
    res.json(200, user);
  });
},

// Queue will be processed one by one.
test_queue.limit({ concurrency: 1 }).process(function (user, cb) {
  console.log(user.id + ' started');
  // async calls will go there
  setTimeout(function () {
    // on callback of async, call cb and return response.
    cb(null, user);
  }, 1000);
});
Please remember that this should be implemented for sensitive business calls where a resource needs to be accessed or updated by only one user at a time.
This will serialize your I/O, make your users wait, and slow down response times.
Optimization:
You can make it faster by creating resource-dependent queues, so that there is a separate queue for each shared resource: calls for the same resource execute sequentially, while calls for different resources execute asynchronously.
Suppose you want to implement this on the basis of the current user, so that HTTP calls for the same user execute sequentially, while HTTP calls for different users are asynchronous:
// here test_queue is used as a plain object that maps each user id to its own queue
testQueue: function(req, res) {
  // if queue does not exist for current user.
  if (!(test_queue.hasOwnProperty(req.user.id))) {
    // initialize queue for current user
    test_queue[req.user.id] = cq();
    // initialize queue processing for current user
    // Queue will be processed one by one.
    test_queue[req.user.id].limit({ concurrency: 1 }).process(function (task, cb) {
      console.log(task.id + ' started');
      // async functionality will go there
      setTimeout(function () {
        cb(null, task);
      }, 1000);
    });
  }
  // queuing each request in user specific queue to process sequentially
  test_queue[req.user.id](req.user, function (err, user) {
    if (err) {
      return;
    }
    res.json(200, user);
    console.log(user.id + ' done');
  });
},
This will be fast and will block I/O only for the resources you choose.

child_process spawn Race condition possibility in nodejs

I'm starting to learn and use node and I like it, but I'm not really sure how certain features work. Maybe you can help me resolve one such issue:
I want to spawn local scripts and programs from my node server upon REST commands. Looking at the child_process library, I saw the example below of how to spawn a child process and add some pipes/event handlers to it.
var spawn = require('child_process').spawn,
    ps = spawn('ps', ['ax']),
    grep = spawn('grep', ['ssh']);

ps.stdout.on('data', function (data) {
  grep.stdin.write(data);
});

ps.stderr.on('data', function (data) {
  console.log('ps stderr: ' + data);
});

ps.on('close', function (code) {
  if (code !== 0) {
    console.log('ps process exited with code ' + code);
  }
  grep.stdin.end();
});

grep.stdout.on('data', function (data) {
  console.log('' + data);
});

grep.stderr.on('data', function (data) {
  console.log('grep stderr: ' + data);
});

grep.on('close', function (code) {
  if (code !== 0) {
    console.log('grep process exited with code ' + code);
  }
});
What's weird to me is that I don't understand how I can be guaranteed that the event handler code will be registered before the program starts to run. It's not like there's a 'resume' function that you run to start up the child. Isn't this a race condition? Granted, the window would be minuscule and would almost never be hit, because it's such a short snippet of code afterward, but still, if it is one, I'd rather not code it this way, out of good habits.
So:
1) if it's not a race condition why?
2) if it is a race condition how could I write it the right way?
Thanks for your time!
Given the slight conflict and ambiguity in the accepted answer's comments, the sample and output below tell me two things:
The child process (referring to the node object returned by spawn) emits no events even though the real underlying process is live / executing.
The pipes for the IPC are setup before the child process is executed.
Both are obvious. The conflict is w.r.t. the interpretation of the OP's question:
Actually 'yes', this is the epitome of a data race condition if one needs to consider the real child process's side effects. But 'no', there's no data race as far as IPC pipe plumbing is concerned: the data is written to a buffer and retrieved as a (bigger) blob as and when (as already well described) the current context completes, allowing the event loop to continue.
The first data event seen below pushes not 1 but 5 chunks written to stdout by the child process whilst we were blocking, thus nothing is lost.
sample:
const { spawn } = require('child_process'); // import needed for the sample below

let t = () => (new Date()).toTimeString().split(' ')[0];

let p = new Promise(function (resolve, reject) {
  console.log(`[${t()}|info] spawning`);
  let cp = spawn('bash', ['-c', 'for x in `seq 1 1 10`; do printf "$x\n"; sleep 1; done']);
  let resolved = false;

  if (cp === undefined)
    reject();

  cp.on('error', (err) => {
    console.log(`error: ${err}`);
    reject(err);
  });

  cp.stdout.on('data', (data) => {
    if (!resolved) {
      console.log(`[${t()}|info] spawn succeeded`);
      resolved = true;
      resolve();
    }
    process.stdout.write(`[${t()}|data] ${data}`);
  });

  let ts = parseInt(Date.now() / 1000);
  while (parseInt(Date.now() / 1000) - ts < 5) {
    // waste some cycles in the current context
    ts--; ts++;
  }
  console.log(`[${t()}|info] synchronous time wasted`);
});

Promise.resolve(p);
output:
[18:54:18|info] spawning
[18:54:23|info] synchronous time wasted
[18:54:23|info] spawn succeeded
[18:54:23|data] 1
2
3
4
5
[18:54:23|data] 6
[18:54:24|data] 7
[18:54:25|data] 8
[18:54:26|data] 9
[18:54:27|data] 10
It is not a race condition. Node.js is single-threaded and handles events on a first-come, first-served basis. New events are put at the end of the event loop's queue. Node will execute your code in a synchronous manner, part of which involves setting up event emitters. When these event emitters emit events, those events are put at the end of the queue, and will not be handled until Node finishes executing whatever piece of code it's currently working on, which happens to be the same code that registers the listener. Therefore, the listener will always be registered before the event is handled.
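A small sketch that illustrates the guarantee (assuming a POSIX echo binary is available; not from the original answer):
const { spawn } = require('child_process');

const child = spawn('echo', ['hello']);
// No 'data' event can be delivered until the current synchronous block
// finishes, so this listener is always registered in time.
child.stdout.on('data', (data) => {
  console.log(`received: ${data}`);
});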
