so currently I have this function
app.get('/requestItems', function(req, res){
//Check if Bots are online
if(botQueue.length == 0){
//If there are no bots in the queue to take the order, then we can't process it.
console.log("Sorry no bots available");
res.send("No Bots available ATM");
return;
} else {
var currentBot = botQueue.shift();
}
requestItems(currentBot.offerInstance, steamIDtoTrade, itemID, userAccessToken);
eventEmitter.on('requestOfferExpired', function(){
console.log("Request offer has timed out/been cancelled");
res.send("Request offer has timed out/been cancelled");
botQueue.push(currentBot);
});
eventEmitter.on('requestOfferAccepted', function(){
console.log("Request offer has completed");
res.send("Request offer has completed");
botQueue.push(currentBot);
});
});
When I call it, It takes about 5 minutes to run. While its running, I can't seem to make requests to the URL. I know node is a single threaded, but is there a way to run it parrallelly/concurrently? Or do I simply need to switch up my design strategy?
EDIT: requestItems function : http://pastebin.com/Eif5CeEv
If the 5 minutes per request is 100% CPU of node.js running with no IO, then you will not get any other requests processed during that time and you will need to run multiple node processes (probably in a cluster) in order to process multiple requests during that time.
If, on the other hand, much of that 5 minutes is doing IO (disk, database or socket), then a proper async IO design will allow your node server to process many requests during that time.
Since it seems unlikely that you're number crunching the CPU for 5 minutes in a row with no IO, my guess is that you need to switch to a proper async IO design and then you can use the advantages of node.js and can have many different requests processing at the same time.
You will need to disclose a lot more about what is going on in that 5 minutes for us to help more specifically. Some questions about those 5 minutes:
What types of operations are taking the 5 minutes of time?
Are you using node.js async disk IO and async database IO in a way that allows other node.js events to flow while a given request is waiting for IO?
Is your request handling strategy designed to have multiple requests being processed at the same time?
Can you show us pieces of the code that are taking 5 minutes?
Related
I have a server-client based NODE.JS application.
server.js
...
socket.on('message', function(message) {
if(message.code == 103)
{
process_some_data()
}
else
{
console.log("UNKNOWN MESSAGE");
}
});
...
client.js
.. sending responses back to server.js
the process_some_data() function takes about 4 seconds to complete, and when i have just one client it is not a problem, but if i have 10, they all choke and wait till the the last finishes.
I found out that the entire socket event waits till he finishes the current job, for example if i comment process_some_data(), it will not be frozen
I have tried 2 tweaks but the didn't worked :
...
socket.on('message', function(message) {
if(message.code == 103)
{
setTimeout(function() {
process_some_data();
console.log("FINISH");
}, 1)
}
else
{
console.log("UNKNOWN MESSAGE");
}
});
...
And even used http://caolan.github.io/async/ ,but no use :
...
socket.on('message', function(message) {
if(message.code == 103)
{
// Array to hold async tasks
var asyncTasks = [];
async.series([
setTimeout(function() {
process_some_data();
console.log("FINISH");
}, 1)
], function (err, results) {
console.log(results);
});
}
else
{
console.log("UNKNOWN MESSAGE");
}
});
...
How to make this ASYNC? Really need this.
Thank you.
You need multiple processes to solve this with Javascript, because Javascript engines are single-threaded.
What?
When it comes to handling I/O events, such as reading a socket, writing to a file or waiting for a signal, Javascript engines give the appearance of doing multiple things at the same time.
They are actually not: it's just that, under most conditions, processing these events takes so little computation, and the events themselves occur with so much time in between (a microsecond is an eternity for a CPU), that the engine can just process them one after another with plenty of time to spare.
In human time-scale, it looks like the engine is doing a lot of stuff in parallel, but it's just working serially at great speed.
No matter how you schedule your code to run, using setTimeout or Promise, it will still block other events from being processed during the time it's actively computing. Long-running computations (in the scale of seconds, instead of milliseconds) expose the single-threaded nature of the engine: it cannot actually do multiple things at the same time.
Multiple processes
Your computer, however, probably has multiple CPU cores. Unlike the Javascript engine, your hardware is capable of tackling multiple tasks at the same time, at least 1 per core. Even with a single core, your operating system can solve the problem if you run multiple processes.
Since a single Javascript process is single-threaded, you need multiple Javascript processes for this. An easy and time-proven architecture to solve your problem is this:
One Javascript program, running in one process, reads from the socket. Instead of calling process_some_data(), however, it puts all incoming messages in a queue.
This program then sends items from the queue to another Javascript program, running in a different process, that performs the computation using another CPU core. There may be multiple copies of this second process. In a modern computer, it makes sense to have twice as many active processes as you have CPU cores.
A simple approach for Node is to write an HTTP server, using express, that runs the computationally-intensive task. The main program can then use HTTP to delegate tasks to the workers, while still being able to read from the socket.
This is a good article on the topic of multi-processing with Node, using the cluster API.
I'm running a Node.js app with express and want to start increasing its performance. Several routes are defined. Let's have an basic example:
app.get('/users', function (req, res) {
User.find({}).exec(function(err, users) {
res.json(users);
}
});
Let's assume we have 3 clients A, B and C, who try to use this route. Their requests arrive on the server in the order A, B, C with 1 millisecond difference in between.
1. If I understand the node.js architecture correctly, every request will be immediately handled, because Users.find() is asynchronous and there is non-blocking code?
Let's expand this example with a synchronous call:
app.get('/users', function (req, res) {
var parameters = getUserParameters();
User.find({parameters}).exec(function(err, users) {
res.json(users);
}
});
Same requests, same order. getUserParameters() takes 50 milliseconds to complete.
2. A will enter the route callback-function and blocks the node.js thread for 50 milliseconds. B and C won't be able to enter the function and have to wait. When A finishes getUsersParameters() it will continue with the asynchronous User.find() function and B will now enter the route callback-function. C will still have to wait for 50 more milliseconds. When B enters the asynchronous function, C's requests can be finally handled. Taken together: C has to wait 50 milliseconds for A to finish, 50 milliseconds for B to finish and 50 milliseconds for itself to finish (for simplicity, we ignore the waiting time for the asynchronous function)?
Assuming now, that we have one more route, which is only accessible by an admin and will be called every minute through crontab.
app.get('/users', function (req, res) {
User.find({}).exec(function(err, users) {
res.json(users);
}
});
app.get('/admin-route', function (req, res) {
blockingFunction(); // this function takes 2 seconds to complete
});
3. When a request X hits admin-route and blockingFunction() is called, will A,B and C, who will call /users right after X's request have to wait 2 seconds until they even enter the route callback-function?
4. Should we make every self defined function, even if it takes only 4 milliseconds, as an asynchronous function with a callback?
The answer is "Yes", on #3: blocking means blocking the event loop, meaning that any I/O (like handling an HTTP request) will be blocked. In this case, the app will seem unresponsive for those 2 seconds.
However, you have to do pretty wild things for synchronous code to take 2 seconds (either very heavy calculations, or using a lot of the *Sync() methods provided by modules like fs). If you really can't make that code asynchronous, you should consider running it in a separate process.
Regarding #4: if you can easily make it asynchronous, you probably should. However, just having your synchronous function accept a callback doesn't miraculously make it asynchronous. It depends on what the function does if, and how, you can make it async.
The ground principle is anything locking up the CPU (long-running for loops for instance) or anything using I/O or the network must be asynchronous. You could also consider moving out CPU-intensive logic out of node JS, perhaps into a Java/Python module which exposes a WebService which node JS can call.
As an aside, take a look at this module (might not be production-ready). It introduces the concept of multithreading in NodeJS: https://www.npmjs.com/package/webworker-threads
#3 Yes
#4 Node.js is for async programming and hence its good to follow this approach to avoid surprises in performance
Meanwhile, you can use cluster module of Node.js to improve performance and throughput of your app.
You may need to scale your application vertically first. Check out Node.js cluster module. You may utilize all the cores of the machine by spawning up workers on each core. A cluster is a pool of similar workers running under a parent Node process. Workers are spawned using the fork() method of the child_processes module. This means workers can share server handles and use inter-process communication to communicate with the parent Node process.
var cluster = require('cluster')
var http = require('http')
var os = require('os')
var numCPUs = os.cpus().length
if (cluster.isMaster) {
for (var i = 0; i < numCPUs; i++) {
cluster.fork()
}
} else {
// Define express routes and listen here
}
When I start nodeJS application with below code and fire http://localhost:3222/ from my browser ,app takes more than 2000ms to return the response.
var http = require('http');
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.end(fib(40).toString());
}).listen(3222, '127.0.0.1');
function fib(n) {
if (n <= 1) {
return n;
}
return fib(n-1) + fib(n-2);
}
then I tried to access the url concurrently using Apache bench , I noticed request are executed in serious manner rather than concurrent manner.
>ab -n 3 -c 1 http://localhost:3222/
1724ms (longest request)
Fired 3 request in serious manner ,each taken 1713ms on average and max time is 1724ms
>ab -n 3 -c 3 http://localhost:3222/
5195ms (longest request)
Fired 3 request in concurrent manner ,each taken 3483ms on average and max time is 5195ms.
Above timings clearly shows that requests are not executed concurrently by node server.
My questions is,how these kind of scenarios should be handled in production nodeJS applications?
Note: code given is just for representational purpose, I'm interested in knowing how to handle time consuming , complex business logic in nodeJS app with out blocking the main thread.
The problem with your test case is: Node.js is single threaded. Asynchronous functions are used for non-blocking I/O: you can wait for a database query or file write etc. to finish and do some other things meanwhile. But if you got some code that takes a long time to execute, all other actions are postponed until this code execution finishs.
You could start multiple instances of your node.js application and do some loadbalancing with haproxy or similar programs. But the only real solution would be to decouple the calculation from your request processing.
I am trying out simple NodeJS app so that I could to understand the async nature.
But my problem is as soon as I hit "/home" from browser it waits for response and simultaneously when "/" is hit, it waits for the "/home" 's response first and then responds to "/" request.
My concern is that if one of the request needs heavy processing, in parallel we can't request another one? Is this correct?
app.get("/", function(request, response) {
console.log("/ invoked");
response.writeHead(200, {'Content-Type' : 'text/plain'});
response.write('Logged in! Welcome!');
response.end();
});
app.get("/home", function(request, response) {
console.log("/home invoked");
var obj = {
"fname" : "Dead",
"lname" : "Pool"
}
for (var i = 0; i < 999999999; i++) {
for (var i = 0; i < 2; i++) {
// BS
};
};
response.writeHead(200, {'Content-Type' : 'application/json'});
response.write(JSON.stringify(obj));
response.end();
});
Good question,
Now, although Node.js has it's asynchronous nature, this piece of code:
for (var i = 0; i < 999999999; i++) {
for (var i = 0; i < 2; i++) {
// BS
};
};
Is not asynchronous actually blocking the node main thread. And therefore, all other requests has to wait until this big for loop will end.
In order to do some heavy calculations in parallel I recommend using setTimeout or setInterval to achieve your goal:
var i=0;
var interval = setInterval(function() {
if(i++>=999999999){
clearInterval(interval);
}
//do stuff here
},5);
For more information I recommend searching for "Node.js event loop"
As Stasel, stated, code running like will block the event loop. Basically whenever javascript is running on the server, nothing else is running. Asynchronous I/O events such as disk I/O might be processing in the background, but their handler/callback won't be call unless your synchronous code has finished running. Basically as soon as it's finished, node will check for pending events to be handled and call their handlers respectively.
You actually have couple of choices to fix this problem.
Break the work in pieces and let the pending events be executed in between. This is almost same as Stasel's recommendation, except 5ms between a single iteration is huge. For something like 999999999 items, that takes forever. Firstly I suggest batch process the loop for about sometime, then schedule next batch process with setimmediate. setimmediate basically will schedule it after the pending I/O events are handled, so if there is not new I/O event to be handled(like no new http requests) then it will executed immediately. It's fast enough. Now the question comes that how much processing should we do for each batch/iteration. I suggest first measure how much does it on average manually, and for schedule about 50ms of work. For example if you have realized 1000 items take 100ms. Then let it process 500 items, so it will be 50ms. You can break it down further, but the more broken down, the more time it takes in total. So be careful. Also since you are processing huge amount of items, try not to make too much garbage, so the garbage collector won't block it much. In this not-so-similar question, I've explained how to insert 10000 documents into MongoDB without blocking the event loop.
Use threads. There are actually a couple nice thread implementations that you won't shoot yourself in foot with them. This is really a good idea for this case, if you are looking for performance for huge processings, since it would be tricky as I said above to implement CPU bound task playing nice with other stuff happening in the same process, asynchronous events are perfect for data-bound task not CPU bound tasks. There's nodejs-threads-a-gogo module you can use. You can also use node-webworker-threads which is built on threads-a-gogo, but with webworker API. There's also nPool, which is a bit more nice looking but less popular. They all support thread pools and should be straight forward to implement a work queue.
Make several processes instead of threads. This might be slower than threads, but for huge stuff still way better than iterating in the main process. There's are different ways. Using processes will bring you a design that you can extend it to using multiple machines instead of just using multiple CPUs. You can either use a job-queue(basically pull the next from the queue whenever finished a task to process), a multi process map-reduce or AWS elastic map reduce, or using nodejs cluster module. Using cluster module you can listen to unix domain socket on each worker and for each job just make a request to that socket. Whenever the worker finished processing the job, it will just write back to that particular request. You can search about this stuff, there are many implementations and modules existing already. You can use 0MQ, rabbitMQ, node built-in ipc, unix domain sockets or a redis queue for multi process communications.
With Node.js, or eventlet or any other non-blocking server, what happens when a given request takes long, does it then block all other requests?
Example, a request comes in, and takes 200ms to compute, this will block other requests since e.g. nodejs uses a single thread.
Meaning your 15K per second will go down substantially because of the actual time it takes to compute the response for a given request.
But this just seems wrong to me, so I'm asking what really happens as I can't imagine that is how things work.
Whether or not it "blocks" is dependent on your definition of "block". Typically block means that your CPU is essentially idle, but the current thread isn't able to do anything with it because it is waiting for I/O or the like. That sort of thing doesn't tend to happen in node.js unless you use the non-recommended synchronous I/O functions. Instead, functions return quickly, and when the I/O task they started complete, your callback gets called and you take it from there. In the interim, other requests can be processed.
If you are doing something computation-heavy in node, nothing else is going to be able to use the CPU until it is done, but for a very different reason: the CPU is actually busy. Typically this is not what people mean when they say "blocking", instead, it's just a long computation.
200ms is a long time for something to take if it doesn't involve I/O and is purely doing computation. That's probably not the sort of thing you should be doing in node, to be honest. A solution more in the spirit of node would be to have that sort of number crunching happen in another (non-javascript) program that is called by node, and that calls your callback when complete. Assuming you have a multi-core machine (or the other program is running on a different machine), node can continue to respond to requests while the other program crunches away.
There are cases where a cluster (as others have mentioned) might help, but I doubt yours is really one of those. Clusters really are made for when you have lots and lots of little requests that together are more than a single core of the CPU can handle, not for the case where you have single requests that take hundreds of milliseconds each.
Everything in node.js runs in parallel internally. However, your own code runs strictly serially. If you sleep for a second in node.js, the server sleeps for a second. It's not suitable for requests that require a lot of computation. I/O is parallel, and your code does I/O through callbacks (so your code is not running while waiting for the I/O).
On most modern platforms, node.js does us threads for I/O. It uses libev, which uses threads where that works best on the platform.
You are exactly correct. Nodejs developers must be aware of that or their applications will be completely non-performant, if long running code is not asynchronous.
Everything that is going to take a 'long time' needs to be done asynchronously.
This is basically true, at least if you don't use the new cluster feature that balances incoming connections between multiple, automatically spawned workers. However, if you do use it, most other requests will still complete quickly.
Edit: Workers are processes.
You can think of the event loop as 10 people waiting in line to pay their bills. If somebody is taking too much time to pay his bill (thus blocking the event loop), the other people will just have to hang around waiting for their turn to come.. and waiting...
In other words:
Since the event loop is running on a single thread, it is very
important that we do not block it’s execution by doing heavy
computations in callback functions or synchronous I/O. Going over a
large collection of values/objects or performing time-consuming
computations in a callback function prevents the event loop from
further processing other events in the queue.
Here is some code to actually see the blocking / non-blocking in action:
With this example (long CPU-computing task, non I/O):
var net = require('net');
handler = function(req, res) {
console.log('hello');
for (i = 0; i < 10000000000; i++) { a = i + 5; }
}
net.createServer(handler).listen(80);
if you do 2 requests in the browser, only a single hello will be displayed in the server console, meaning that the second request cannot be processed because the first one blocks the Node.js thread.
If we do an I/O task instead (write 2 GB of data on disk, it took a few seconds during my test, even on a SSD):
http = require('http');
fs = require('fs');
buffer = Buffer.alloc(2*1000*1000*1000);
first = true;
done = false;
write = function() {
fs.writeFile('big.bin', buffer, function() { done = true; });
}
handler = function(req, res) {
if (first) {
first = false;
res.end('Starting write..')
write();
return;
}
if (done) {
res.end("write done.");
} else {
res.end('writing ongoing.');
}
}
http.createServer(handler).listen(80);
here we can see that the a-few-second-long-IO-writing-task write is non-blocking: if you do other requests in the meantime, you will see writing ongoing.! This confirms the well-known non-blocking-for-IO features of Node.js.