I want to make a bunch of POST requests at once from my node.js script. I have the same payload for every one of the requests and would like to simply spin them all off at once in some kind of loop, then have them write to a DB upon completion.
Sample Code to demonstrate the logic:
for (var i = 0; i < 10; i++) {
  makePostRequest(updateDB);
}
function makePostRequest (callback) {
  http.post("test", function (result) {callback(result)});
}
function updateDB(result) {
  db.put({result: result});
}
The question is: is it better to do this using multithreading, or does the fact that node will run these POST requests asynchronously take care of it and result in the same performance? In other words, will these POST requests all happen asynchronously in the same way you would expect a multithreaded program to behave, with the same performance?
My goal is to minimize execution time.
I think the improvement in execution time you'll see from multi-threading through child processes here will be minimal. As you said, the requests are already executed asynchronously.
In fact, you may see a decrease in performance as n, the number of concurrent requests, increases.
To be sure, you could run a benchmark.
In your example, you send 10 requests in parallel. I think that this number is small and will not cause a problem.
There are some things that you have to pay attention to.
There is a maximum number of concurrent outgoing connections for your server.
How many concurrent requests is the remote server able to serve without slowing down?
If you launch the requests with a for loop, the for loop runs to completion before the requests are effectively sent out.
If you have to launch a really large number of requests, they will complete faster if you limit the maximum number of concurrent requests. Also think about throttling the launch by using setTimeout between calls to makePostRequest.
I don't know the thresholds at which you need to take these precautions.
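Capping the number of concurrent requests, as suggested above, could be sketched roughly like this. Both runWithLimit and fakePost are hypothetical names introduced for illustration; fakePost stands in for the real POST call with a timer so that the sketch needs no network.

```javascript
// Hypothetical helper: run async task functions with at most `limit` in flight.
function runWithLimit(tasks, limit) {
  return new Promise((resolve, reject) => {
    const results = new Array(tasks.length);
    let next = 0;       // index of the next task to start
    let active = 0;     // tasks currently in flight
    let finished = 0;   // tasks completed so far
    let failed = false;

    if (tasks.length === 0) return resolve(results);

    function launch() {
      // Start tasks until the limit is reached or nothing is left to start.
      while (!failed && active < limit && next < tasks.length) {
        const index = next++;
        active++;
        tasks[index]().then((value) => {
          results[index] = value;
          active--;
          finished++;
          if (finished === tasks.length) resolve(results);
          else launch();
        }, (err) => {
          failed = true;
          reject(err);
        });
      }
    }
    launch();
  });
}

// Stand-in for the real POST request (assumption: resolves after a short delay).
function fakePost(id) {
  return new Promise((resolve) => setTimeout(() => resolve(`result ${id}`), 10));
}

// Ten requests, at most three in flight at any moment.
const tasks = Array.from({ length: 10 }, (_, i) => () => fakePost(i));
runWithLimit(tasks, 3).then((results) => console.log(results.length, 'requests done'));
```

Each task is a function returning a Promise, so nothing starts until the limiter calls it. A sensible limit depends on the remote server and would have to be found by benchmarking.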
I decided to go with the logic that I described earlier in the question. The reasoning is that node runs its async calls separately on a different thread so that it's not blocking the main thread. This means that running the POSTs asynchronously will behave similarly to what I wanted from multithreading.
Related
Since Node uses a java-script engine internally, it should have only one thread. But when I checked online, some people say two threads, some say one thread per core, and others say something else. Can anyone clarify?
And if it is single-threaded, how many requests can it handle concurrently? If 1000 requests arrive at the same time and each request takes 100 ms, then the 1001st will have to wait 100 s (100 ms × 1000 = 100,000 ms = 100 s).
So does that mean Node is not good for a large number of users?
Based on your writing, I can tell you know something about programming, some logic, and some basic math, but you are getting into javascript (not java-script) now.
That background makes you an excellent fit for this new javascript on the server-side with nodejs, and I foresee you becoming great at it.
What is clear to me is that you are confusing parallelism with concurrency and this post here is going to be very useful for you.
But a TLDR could be something like this:
Parallelism: 2 or more Process/Threads running at the same time
Concurrency: The main single Process/Thread will not be waiting for an IO operation to end, it will keep doing other things and get back to it whenever the IO operation ends
IO (Input/Output) operations involve interactions with the OS (Operating System) by using the DISK or the NETWORK, for example.
In nodejs, those tasks are asynchronous, and that's why they ask for a callback, or they are Promise based.
const fs = require('fs')
function myCallbackFunc (err, data) {
  if (err) {
    return console.error(err)
  }
  console.log(data)
}
fs.readFile('./some-large-file', myCallbackFunc)
fs.readFile('./some-tiny-file', myCallbackFunc)
A simple way to put it: in theory, in the example above you have a main thread taking care of your code and another one (which you don't control at all) observing the asynchronous requests. The second one will call myCallbackFunc whenever an IO operation, which happens concurrently, ends.
So YES, nodejs is GREAT for a large number of requests.
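To see why the "1000 requests × 100 ms = 100 s" arithmetic does not apply to IO-bound work, here is a small sketch. simulatedIo and timeConcurrentWaits are hypothetical names, and a timer stands in for a real IO operation such as a database call.

```javascript
// Simulated I/O: resolves after `ms` milliseconds without occupying the main thread.
function simulatedIo(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Start `count` waits at once and report how long all of them took together.
async function timeConcurrentWaits(count, ms) {
  const start = Date.now();
  await Promise.all(Array.from({ length: count }, () => simulatedIo(ms)));
  return Date.now() - start;
}

timeConcurrentWaits(1000, 100).then((elapsed) => {
  // The waits overlap, so the total should be close to 100 ms, not 1000 x 100 ms.
  console.log(`1000 waits of 100 ms finished in about ${elapsed} ms`);
});
```

The waits overlap because the main thread only starts each operation and later runs its short callback. It never sits idle waiting for any single one of them.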
Of course, that single process will still share the same computational power among all the concurrent requests it is taking care of.
Meaning that if within the callback above you are doing some heavy computational tasks that take a while to execute, the second call would have to wait for the first one to end.
In that case, if you are running it on your own Server/Container, you can make use of real parallelism by forking the process using the Cluster native module that already comes with nodejs :D
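As a rough sketch of what forking with the Cluster module can look like: startClusteredServer is a hypothetical helper name, the port is arbitrary, and cluster.isPrimary assumes Node 16 or later (older versions call it cluster.isMaster).

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: fork `workerCount` processes that all serve the same port.
function startClusteredServer(port, workerCount = os.cpus().length) {
  if (cluster.isPrimary) {
    // The primary only supervises; each fork re-runs this script as a worker.
    for (let i = 0; i < workerCount; i++) cluster.fork();
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited`);
    });
  } else {
    // Workers share the listening port; incoming connections are spread among them.
    http.createServer((req, res) => {
      res.end(`handled by process ${process.pid}\n`);
    }).listen(port);
  }
}

// Usage (not invoked here): startClusteredServer(8124);
```

Each worker is a separate process with its own event loop and memory, so a heavy computation in one worker does not stall requests handled by the others.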
Node has one main thread (that's why it is called single-threaded) which receives all requests. It hands blocking work over to internal threads (these are not the main thread; they are smaller worker threads) that live in a thread pool. Node.js puts all its requests in an event queue, and the Node server continuously runs an internal event loop that checks whether any request has been placed in the event queue. If not, it waits for incoming requests indefinitely; otherwise it picks one request from the event queue.
If the request involves a blocking IO operation, the event loop checks thread availability in the internal thread pool; when a thread is available, it picks one and assigns the operation to that thread.
That thread is responsible for taking the operation, processing it, performing the blocking IO, preparing the result, and sending it back to the event loop.
The event loop, in turn, sends the response to the respective client.
In nodejs, the main criticisms are based on its single-threaded event loop model.
The biggest disadvantage of nodejs is that one cannot perform CPU-intensive tasks in the application. For demonstration purposes, let's take the example of a while loop (which is perhaps analogous to a db function returning hundreds of thousands of records and then processing those records in nodejs).
while (1) {
  x++
}
This sort of code will block the main stack, and consequently all other tasks waiting in the event queue will never get the chance to be executed (and in a web application, new users will not be able to connect to the app).
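The blocking effect described above can be observed with a bounded busy loop instead of an infinite one. This is only a sketch: busyWait and measureTimerDelay are hypothetical names, and the busy loop stands in for any CPU-heavy processing.

```javascript
// Busy-wait that holds the main thread for `ms` milliseconds (stand-in for CPU-heavy work).
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Schedule a timer for `timerMs`, then block for `blockMs`; resolve with the actual delay.
function measureTimerDelay(timerMs, blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), timerMs);
    busyWait(blockMs); // the timer callback cannot run until this returns
  });
}

measureTimerDelay(10, 200).then((delay) => {
  console.log(`timer asked for 10 ms, fired after about ${delay} ms`);
});
```

The timer was due after 10 ms, but its callback only gets a turn once the synchronous loop has finished, which is exactly what happens to other users' requests while one handler hogs the CPU.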
However, one could possibly use a module like cluster to leverage a multi-core system and partially solve the above issue. The Cluster module allows one to create a small network of separate processes which can share server ports, which gives the Node.js application access to the full power of the server. (However, one of the biggest disadvantages of using Cluster is that state cannot be maintained in the application code, because each process has its own memory.)
But again there is a high possibility that we would end up in the same situation (as described above) again if there is too much server load.
When I started learning the Go language and had a look at its architecture and goroutines, I thought it would possibly solve the problem that arises due to the single-threaded event loop model of nodejs, and that it would probably avoid the above scenario of CPU-intensive tasks. Then I came across this interesting code, which blocks the whole Go application so that nothing happens, much like a while loop in nodejs.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var x int
	threads := runtime.GOMAXPROCS(0)
	for i := 0; i < threads; i++ {
		go func() {
			for { x++ }
		}()
	}
	time.Sleep(time.Second)
	fmt.Println("x =", x)
}
//or perhaps even if we use some number that is just greater than the threads.
So, the question is: if I have an application which is load intensive and has a lot of CPU-intensive tasks as well, I could probably get stuck in the above sort of scenario (where the db returns a large number of rows and the application then needs to process and modify something in those rows). Wouldn't the incoming users be blocked, and all other tasks as well?
So, how could the above problem be solved?
P.S
Or perhaps the use cases I have mentioned do not make much sense? :)
Currently (Go 1.11 and earlier versions) your so-called
tight loop will indeed clog the code.
This would happen simply because currently the Go compiler
inserts code which does "preemption checks" («should I yield
to the scheduler so it runs another goroutine?») only in
prologues of the functions it compiles (almost, but let's not digress).
If your loop does not call any function, no preemption checks
will be made.
The Go developers are well aware of this
and are working on eventually alleviating this issue.
Still, note that your alleged problem is a non-issue in
most real-world scenarios: code which performs long
runs of CPU-intensive work without calling any function
is few and far between.
In the cases, where you really have such code and you have
detected it really makes other goroutines starve
(let me underline: you have detected that through profiling—as
opposed to just conjuring up "it must be slow"), you may
apply several techniques to deal with this:
1) Insert calls to runtime.Gosched() at certain key points
   of your long-running CPU-intensive code.
   This will forcibly relinquish control to another goroutine
   while not actually suspending the caller goroutine (so it will
   run again as soon as it is scheduled).
2) Dedicate OS threads to the goroutines running
   those CPU hogs:
   - Bound the set of such CPU hogs to, say, N "worker goroutines";
   - Put a dispatcher in front of them (this is called "fan-out");
   - Make sure that N is sensibly smaller than runtime.GOMAXPROCS
     or raise the latter so that you have those N extra threads;
   - Shovel units of work to those dedicated goroutines via the dispatcher.
At times my app does need to do some blocking operations (simple math calculations). As a result, other requests will be blocked until the calculation is completed. Is there a way to count the number of requests that are waiting?
EDIT: Adding more details - The blocking operations are very small (perhaps just simple addition of numbers, or loops of additions/multiplications, e.g. var foo = 3 + y;), but the point is that anything that is not IO is probably a blocking operation. When there aren't many users accessing the site, this may not be noticeable, because the calculation is very quick. But as more users access the site, the experience gets much worse.
EDIT: If this feature isn't there, I think it would be useful. Because then we would know when to start up a new server (e.g.: an AWS server or other cloud provider). Running at full capacity is ok if we can meet all the requests, but not ok if requests start to pile up. XD
You cannot count the number of waiting requests, because they are really just queued instructions, and by the time your "counting" code ran, the queue would be empty again. JavaScript is single-threaded, so unless you're running async, your code will never even get swapped in in time.
If you want to manage this, implement your own queueing mechanism, and manage the work yourself. The async package can provide this for you.
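A hand-rolled version of such a queue might look like the sketch below. WorkQueue, push, drain, and waiting are hypothetical names made up for illustration; this is not the API of the async package.

```javascript
// Hypothetical hand-rolled queue: runs one job at a time and exposes how many are waiting.
class WorkQueue {
  constructor() {
    this.pending = [];    // jobs submitted but not yet started
    this.running = false; // true while a job is scheduled or executing
  }

  // Number of jobs that have been submitted but not started.
  get waiting() {
    return this.pending.length;
  }

  // Enqueue a function; the returned Promise resolves with its return value.
  push(job) {
    return new Promise((resolve, reject) => {
      this.pending.push({ job, resolve, reject });
      this.drain();
    });
  }

  drain() {
    if (this.running || this.pending.length === 0) return;
    this.running = true;
    const { job, resolve, reject } = this.pending.shift();
    // Run on a later turn of the event loop so new requests can be accepted in between.
    setImmediate(() => {
      try {
        resolve(job());
      } catch (err) {
        reject(err);
      }
      this.running = false;
      this.drain();
    });
  }
}

const queue = new WorkQueue();
for (let i = 0; i < 5; i++) queue.push(() => 3 + i);
console.log('waiting:', queue.waiting); // 4: one job is already scheduled to run
```

Because all calculations go through the queue, the waiting counter is something the application itself maintains, and it could be compared against a threshold to decide when to start another server.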
First of all, I am a beginner trying to understand what Node.js is. I have two questions.
First Question
Felix's article says "there can only be one callback firing at the same time. Until that callback has finished executing, all other callbacks have to wait in line".
Then, consider the following code (copied from the official nodejs website):
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124, "127.0.0.1");
If two client requests are received simultaneously, it means the following workflow:
The first http request event is received, then the second request event is received.
As soon as the first event is received, the callback function for the first event executes.
Meanwhile, the callback function for the second event has to wait.
Am I right? If I am right, how does Node.js cope when there are thousands of client requests within a very short time?
Second Question
The term "Event Loop" is mostly used in Node.js topic. I have understood "Event Loop" as the following from http://www.wisegeek.com/what-is-an-event-loop.htm;
An event loop - or main loop, is a construct within programs that
controls and dispatches events following an initial event.
The initial event can be anything, including pushing a button on a
keyboard or clicking a button on a program (in Node.js, I think the
initial events will be http request, db queries or I/O file access).
This is called a loop, not because the event circles and happens
continuously, but because the loop prepares for an event, checks the
event, dispatches an event and repeats the process all over again.
I am confused by the second paragraph, especially the phrase "repeats the process all over again". I accept that the http.createServer code from the first question is definitely an "event loop" because it repeatedly listens for http request events.
But I don't know whether to classify the following code as event-driven or as an event loop. It does not repeat anything except the callback function fired after the db query is finished.
database.query("SELECT * FROM table", function(rows) {
  var result = rows;
});
Please, let me hear your opinions and answers.
Answer one: your logic is correct. The second event will wait, and it will execute as soon as the turn of its queued callback comes.
Also, remember that there is no such thing as "simultaneously" in the technical world. Everything has a very specific place and time.
The way node.js manages thousands of connections is that there is no need to keep a thread idling while a database call or another IO operation (like streams, for example) is in progress. It can "serve" the first request, maybe creating more callbacks, and proceed to the others.
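How one request can be "served" while another is still waiting on IO can be sketched as follows. handleRequest and fakeDbQuery are hypothetical names, and a timer stands in for the database call.

```javascript
// Simulated non-blocking I/O (assumption: a timer stands in for a database call).
function fakeDbQuery(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical request handler: records when it starts and when its "query" completes.
async function handleRequest(name, queryMs, log) {
  log.push(`${name}:start`);
  await fakeDbQuery(queryMs); // the thread is free to run other callbacks while waiting
  log.push(`${name}:done`);
}

const log = [];
Promise.all([
  handleRequest('A', 50, log),
  handleRequest('B', 10, log),
]).then(() => {
  console.log(log.join(' -> '));
  // A:start -> B:start -> B:done -> A:done
});
```

Request B starts and finishes while A is still waiting for its slower query, even though only one piece of JavaScript runs at any given moment.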
Because there is no way to block the execution (except nonsense while(true) and similar), it becomes extremely efficient in spreading actual resources all over application logic.
Threads are expensive, and a server's capacity for threads is directly related to available memory. So most classic web applications suffer simply because RAM is used on threads that are idling while a database query or something similar is blocking. In node that's not the case.
Still, it lets you create multiple processes (via child_process) through cluster, which expands the possibilities even more.
Answer two. There is no such thing as the "loop" you might be thinking about. There is no loop behind the scenes that checks whether there are connections or whether any data has been received, and so on. Nowadays that is handled by async methods as well.
So from application point of view, there is no 'main loop', and everything from developer point of view is event-driven (not event-loop).
In the case of http.createServer, you bind a callback as the response to requests. All socket operations and IO will happen behind the scenes, as will HTTP handshaking, parsing of headers, queries, parameters, and so on. Once that work is done behind the scenes, it will keep the data and push your callback onto the event loop with that data. Once the event loop is free and the callback's turn comes, it will execute your callback in the node.js application context with the data from behind the scenes.
With a database request, it's the same story. It will prepare and send the query (possibly asynchronously again), and then will call back once the database responds and the data has been prepared for the application context.
To be honest, all you need with node.js is to understand the concept, but not implementation of events.
And the best way to do it - experiment.
1) Yes, you are right.
It works because everything you do with node is primarily I/O bound.
When a new request (event) comes in, it's put into a queue. At initialization time, Node allocates a ThreadPool which is responsible for spawning threads for I/O bound processing, like network/socket calls, database, etc. (this is non-blocking).
Now, your "callbacks" (or event handlers) are extremely fast because most of what you are doing is most likely CRUD and I/O operations, not CPU intensive.
Therefore, these callbacks give the feeling that they are being processed in parallel, but they are actually not, because the actual parallel work is being done via the ThreadPool (with multi-threading), while the callbacks per-se are just receiving the result from these threads so that processing can continue and send a response back to the client.
You can easily verify this: if your callbacks are heavy CPU tasks, you can be sure that you will not be able to process thousands of requests per second, and it scales really badly compared to a multi-threaded system.
2) You are right, again.
Unfortunately, due to all these abstractions, you have to dive in order to understand what's going on in background. However, yes, there is a loop.
In particular, Nodejs is implemented with libuv.
Interesting to read.
But I don't know whether to classify the following code as event-driven or as an event loop. It does not repeat anything except the callback function fired after the db query is finished.
Event-driven is a term you normally use when there is an event-loop, and it means an app that is driven by events such as click-on-button, data-arrived, etc. Normally you associate a callback to such events.
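Associating a callback with an event can be shown with Node's built-in EventEmitter. This is only an illustration: the event name 'data-arrived' and the payload shape are made up.

```javascript
const { EventEmitter } = require('events');

// Hypothetical event source; 'data-arrived' is an illustrative event name.
const source = new EventEmitter();
const received = [];

// Associate a callback with the event.
source.on('data-arrived', (payload) => {
  received.push(payload);
});

// Something later emits the event, which invokes the callback.
source.emit('data-arrived', { rows: 3 });
source.emit('data-arrived', { rows: 7 });

console.log(received.length); // 2
```

The database.query callback from the question follows the same pattern: the code registers interest in a "query finished" event and is called back once, when that event occurs.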
Have been following this fabulous tutorial. Being new to Javascript and functional programming, I wanted to understand what non-blocking essentially means. I intentionally added a "sleep" of 10 seconds in my JS code, to achieve blocking behavior.
function route(pathname, handle)
{
    console.log("About to route a request for :"+pathname);
    if(typeof handle[pathname]==='function')
    {
        handle[pathname]();
    }
    else
    {
        console.log("No request handler for "+pathname);
    }
    sleep(10000);
    console.log("Exiting router");
}
function sleep(milliSeconds)
{
    var startTime = new Date().getTime(); // get the current time
    while (new Date().getTime() < startTime + milliSeconds); // hog cpu
}
exports.route=route;
This code is being used as a callback from another "server" script, which I am calling from a browser. I expected that once I fire simultaneous 100 requests to my server script, I would get parallel 100 responses after 10 seconds. But this code runs through the request one by one. This certainly fails the philosophy behind node.js right ?? This doesn't even happen when I do such bad code in a Java servlet and run on Tomcat !
Another observation in this scenario, was that the requests were not handled chronologically - they are executed randomly. This doesn't sound good to me !!
I believe there is some issue with my code - please help me understand the concepts here, with the answers to my 2 queries (the other one on chronology).
Thanks !
I expected that once I fire simultaneous 100 requests to my server script, I would get parallel 100 responses after 10 seconds. But this code runs through the request one by one.
Yes. Node is strictly single-threaded so each request will be run in serial. There is no parallelism in the JavaScript code (although the underlying I/O subsystem of the computer may be doing things in parallel).
This certainly fails the philosophy behind node.js right??
No. The philosophy of node.js is to execute your event handlers as quickly as possible once I/O events are ready for action.
Note that your "sleep" function doesn't really sleep, instead it pegs the CPU - since node is single-threaded all other actions will block on the CPU crunching code - the same would happen if your code was doing some actual CPU intensive actions. However, if your code was instead performing I/O operations (and it was designed properly), then node will schedule other actions around your I/O blocking code. Think of it this way - node.js prevents code from blocking on I/O, not from blocking on CPU usage. If your code is CPU intensive and you're worried about it blocking other handlers then you must design it in a way to yield to the event loop to let other handlers run.
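One way to yield to the event loop during CPU-heavy work is to split the work into slices and schedule each slice with setImmediate. The sketch below assumes the work can be chunked; sumInChunks is a hypothetical example function, and the summation stands in for any long computation.

```javascript
// Hypothetical example: sum 0..n-1 in slices, yielding to the event loop between slices.
function sumInChunks(n, chunkSize) {
  return new Promise((resolve) => {
    let total = 0;
    let i = 0;

    function runChunk() {
      const end = Math.min(i + chunkSize, n);
      for (; i < end; i++) total += i;
      if (i < n) {
        // Give pending timers and I/O callbacks a chance to run before the next slice.
        setImmediate(runChunk);
      } else {
        resolve(total);
      }
    }
    runChunk();
  });
}

sumInChunks(1e6, 1e5).then((total) => console.log(total)); // 499999500000
```

The total running time is slightly longer than a single tight loop, but other handlers get a turn between slices instead of waiting for the whole computation.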
Another observation in this scenario, was that the requests were not handled chronologically - they are executed randomly.
Yes, this is possible. You must remember that node.js is essentially just a way to attach "event handlers" to I/O events of interest. These I/O events are triggered by actions in the underlying operating system (e.g. a socket connection has been established, a file read has completed, etc.) and node.js calls your handlers in response.
Since the operating system does its own "internal bookkeeping" about when things actually happen and when it thinks they are available for "user space" processes, there may be a difference between when you expect them to happen and when the computer says they actually happen. Moreover, I don't think that node (or perhaps even the OS) guarantees "fairness" when scheduling events.