Node.js performance for the individual user - node.js

A lot has been said about the high performance of Node.js compared to the multi-threaded model. But from the perspective of a single user, an HTTP request is made, a DB operation is issued, and the user has to wait until the response from the database arrives. Sure, the event loop can listen for other requests in the meantime, but how does the individual user feel the speed benefit when it is the I/O operation that takes most of the time?

From the user's perspective, the performance will be the same. But imagine a site that receives millions of requests: a server with limited resources would get stuck. Node makes better use of the server because it optimizes how resources are accessed, using event queues rather than tying up a thread per request, which saves memory and keeps the CPU busy with useful work.
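A minimal sketch of that point, assuming a hypothetical `fakeDbQuery` helper that stands in for a database call by using a timer. A single request still waits for the full query time, but many concurrent requests finish in roughly the same wall-clock time instead of queuing behind each other.

```javascript
// Hypothetical stand-in for a database call: resolves after `ms` milliseconds
// without blocking the event loop. Not a real driver API.
function fakeDbQuery(ms) {
  return new Promise((resolve) => setTimeout(() => resolve('row'), ms));
}

// Start `count` simulated requests at once and return the total elapsed time in ms.
async function handleRequests(count, queryMs) {
  const start = Date.now();
  await Promise.all(Array.from({ length: count }, () => fakeDbQuery(queryMs)));
  return Date.now() - start;
}

async function demo() {
  const one = await handleRequests(1, 100);
  const many = await handleRequests(50, 100);
  // Both numbers should be close to the query time, not 50 times larger.
  console.log(`1 request: ~${one} ms, 50 concurrent requests: ~${many} ms`);
}

demo();
```

The individual user sees the same latency in both cases; the gain is that the server does not slow down for everyone as the number of waiting requests grows.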

Related

Are Rails Thread.current variables isolated to a single user request?

I am using Thread.current to store a current user id so that I can see who did various updates to our database. However, after some usage in production, it is returning other user ids than those who could be updating this data. Locally and on lesser-used QA instances, the user ids saved are appropriate.
We are using Rails 5.1, ruby 2.5.1 with Puma. RAILS_MAX_THREADS=1, but we do have a RAILS_POOL_SIZE=5. Any ideas what might cause this issue or how to fix it? Specifically, does a single Thread.current variable last longer than a single user request?
Why would Thread.current be limited to a request?
The same thread(s) are used for multiple requests.
Threads aren't killed at the end of the request; they just pick up the next request from the queue (or wait for a request to arrive in the queue).
It would be different if you used the Timeout middleware, since timeouts actually use a thread to count the passage of time (and stop processing)... but creating new threads per request introduces performance costs.
Sidenote
Depending on your database usage (blocking IO), RAILS_MAX_THREADS might need to be significantly higher. The more database calls / data you have the more time threads will spend blocking on database IO (essentially sleeping).
By limiting the thread pool to a single thread, you are limiting request concurrency in a significant way. The CPU could be handling other requests while waiting for the database to return the data.

Server constantly running a function to update a cache, will it block all other server functions?

About once a minute, I need to cache all orderbooks from various cryptocurrency exchanges. There are hundreds of orderbooks, so this update function will likely never stop running.
My question is: If my server is constantly running this orderbook update function, will it block all other server functionality? Will users ever be able to interact with my server?
Do I need to create a separate service to perform the updating, or can Node somehow prioritize API requests and pause the caching function?
"My question is: If my server is constantly running this orderbook update function, will it block all other server functionality? Will users ever be able to interact with my server?"
If you write this asynchronously, these actions go through the event loop, and your Node server picks up the next event from the event queue while they are in progress. If you have too many events like this, the event queue gets long and users will see really slow responses or may even get a timeout.
"Do I need to create a separate service to perform the updating, or can Node somehow prioritize API requests and pause the caching function?"
Node only consumes events from the event queue. There are no priorities.
From a design perspective, you should look for options that reduce this write load, like bulkCreate/edit, or, if you are using Redis for the cache, consider a Redis pipeline.
This is a very open-ended question, much of which depends on your system. In general your server should be able to handle concurrent requests, but there are some things to watch out for.
Performance costs. If the operation to retrieve and store data requires too much computational power, then it will cause strain on all requests processed by the server.
Database connections. The server spends a lot of time waiting for database queries to complete. If you have one database connection for the entire application, and this connection is busy, other requests will have to wait until the database connection is free. You may want to look into database connection 'pooling'.
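One way to keep a large refresh from flooding the event queue is to cap how many updates are in flight at once. The following is only a sketch under stated assumptions: `mapWithLimit`, `refreshCache`, and the `fetchOrderbook` callback are hypothetical names introduced here, and the in-memory `Map` stands in for whatever cache is actually used.

```javascript
// Run `task` over `items` with at most `limit` tasks in flight at any time.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}

// In-memory cache for illustration only.
const cache = new Map();

// Refresh every symbol, never running more than `limit` fetches concurrently.
// `fetchOrderbook` is a hypothetical async function supplied by the caller.
async function refreshCache(symbols, fetchOrderbook, limit = 5) {
  await mapWithLimit(symbols, limit, async (symbol) => {
    cache.set(symbol, await fetchOrderbook(symbol));
  });
}
```

Because every fetch is awaited, API requests from users are still served between orderbook responses; the limit only bounds how much refresh work competes with them.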

Nodejs using child_process runs a function until it returns

I want this kind of structure:
An Express backend gets a request and runs a function; this function gets data from different APIs and saves it to the DB. Because this could take minutes, I want it to run in parallel while my web server continues processing requests.
I want this because of this scenario:
A user has a dashboard. After the user logs in, the app starts collecting data from the APIs and preparing the dashboard. In the meantime the user can navigate through the site or even close the browser, but the function has to run until it finishes fetching data. Once it finishes, all data is saved to the DB and the dashboard is ready for the user.
How can I do this using child_process or any other kind of structure in Node.js?
Since what you're describing is all async I/O (networking or disk) and is not CPU intensive, you don't need multiple child processes in order to effectively serve multiple requests. This is the beauty of node.js. With async I/O, node.js can be working on many different requests over the same period of time.
Let's suppose part of your process is downloading an image. Your node.js code sends a request to fetch an image. That request is sent off via TCP. Immediately, there is nothing else to do on that request. It's winging its way to the destination server and the destination server is preparing the response. While all that is going on, your node.js server is completely free to pull other events from its event queue and start working on other requests. Those other requests do something similar (they start async operations and then wait for events to happen sometime later).
Your server might get 10 different async operations started and "in flight" before the first one actually starts getting a response. When a response starts coming in, the system puts an event into the node.js event queue. When node.js has a moment between other requests, it pulls the next event out of the event queue and processes it. If the processing has further async operations (like saving it to disk), the whole async and event-driven process starts over again as node.js requests a write to disk and node.js is again free to serve other events. In this manner, events are pulled from the event queue one at a time as they become available and lots of different operations can all get worked on in the idle time between async operations (of which there is a lot).
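Applied to the dashboard scenario from the question, a minimal sketch might look like the following. It assumes a hypothetical array of async `fetchers` (one per external API), a hypothetical `saveToDb` function, and an in-memory `jobs` map; none of these names come from the original post, and an in-memory map would not survive a process restart.

```javascript
// In-memory job registry for illustration; a real app would likely persist status.
const jobs = new Map();

// Start collecting data for a user and return immediately, without awaiting the work.
// `fetchers` and `saveToDb` are hypothetical functions supplied by the caller.
function startDashboardJob(userId, fetchers, saveToDb) {
  const existing = jobs.get(userId);
  if (existing?.status === 'running') return existing; // do not start the same job twice

  const job = { status: 'running', error: null };
  jobs.set(userId, job);

  // The promise chain keeps running after this function returns,
  // even if the HTTP response has already been sent or the browser is closed.
  Promise.all(fetchers.map(async (fetchData) => fetchData(userId)))
    .then((results) => saveToDb(userId, results))
    .then(() => { job.status = 'done'; })
    .catch((err) => { job.status = 'failed'; job.error = err; });

  return job;
}
```

A login handler could call `startDashboardJob(...)` and respond right away, while a separate dashboard route checks `jobs.get(userId).status` to decide whether the data is ready.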
The only thing that upsets the apple cart and ruins the ability of node.js to juggle lots of different things all at once is an operation that takes a lot of CPU cycles (like, say, some unusually heavy-duty crypto). If you had something like that, it would "hog" too much of the CPU and the CPU couldn't be effectively shared among lots of other operations. If that were the case, then you would want to move the CPU-intensive operations to a group of child processes. But just doing async I/O (disk, networking, other hardware ports, etc.) does not hog the CPU; in fact, it uses very little node.js CPU.
So, the next question is often "how do I know if I have too much stuff that uses the CPU?". The only way to really know is to just code your server properly using async I/O and then measure its performance under load and see how things go. If you're doing async things appropriately and the CPU still spikes to 100%, then you have too much CPU load and you'll want to either use generic clustering or move specific CPU-heavy operations to a group of child processes.
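One rough way to observe this from inside the process, in addition to watching CPU usage, is to measure how late a repeating timer fires. This is a hand-rolled sketch, not something from the original answer: `measureLoopLag` and `busyWait` are hypothetical helpers, and Node also ships `perf_hooks.monitorEventLoopDelay` for the same purpose.

```javascript
// Sample how late a repeating timer fires over `durationMs`.
// The lateness approximates how long the event loop was blocked.
function measureLoopLag(durationMs, intervalMs = 10) {
  return new Promise((resolve) => {
    let maxLagMs = 0;
    let last = process.hrtime.bigint();
    const timer = setInterval(() => {
      const now = process.hrtime.bigint();
      const elapsedMs = Number(now - last) / 1e6;
      // Anything beyond the requested interval is time the loop spent busy elsewhere.
      maxLagMs = Math.max(maxLagMs, elapsedMs - intervalMs);
      last = now;
    }, intervalMs);
    setTimeout(() => {
      clearInterval(timer);
      resolve(maxLagMs);
    }, durationMs);
  });
}

// Deliberately CPU-bound: spins for `ms` milliseconds and blocks the event loop.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}
```

With only async I/O in flight, the measured lag should stay near zero; calling something like `busyWait(100)` during the measurement should show up as a lag of roughly that size.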

What are the node.js patterns for returning response to client via worker process [closed]

If I want to handle client requests using a web server, a messaging queue and N worker processes, what are the usual patterns?
What I can think of are:
1. Worker takes the job from the queue, processes it, and saves the result for the client to poll. So there is no communication between the web server and the worker.
2. Worker takes the job from the queue, processes it, and sends it to a result queue, which will be consumed by the web server (how to do this with Express?); then the web server sends it back to the client. Meanwhile the client will be waiting (is this still called long polling?).
3. Web server returns a response immediately and pushes the job to the worker queue. Worker takes the job from the queue, processes it, and sends it to a result queue, which will be consumed by the web server; then the web server sends it to clients via websocket. In this case the client doesn't have to.
Which of these is most commonly used? Or are there better solutions?
"Worker takes the job from the queue, processes it, and saves the result for the client to poll. So there is no communication between the web server and the worker."
This is rare because, typically, the client communicates only with the web server. So, unless you are going to also make each worker a web server on some additional port and enable CORS communication with them, this is typically not done.
"Worker takes the job from the queue, processes it, and sends it to a result queue, which will be consumed by the web server (how to do this with Express?); then the web server sends it back to the client. Meanwhile the client will be waiting (is this still called long polling?)."
This is relatively common in node.js if you have a compute intensive (e.g. CPU bound) operation because this is one way to allow node.js to scale better when you have compute intensive operations. It is probably more common to use the clustering module and just fire up N identical node.js processes and share the load among them just because that's probably easier to implement than separate worker processes that communicate back the result to the web server.
"Web server returns a response immediately and pushes the job to the worker queue. Worker takes the job from the queue, processes it, and sends it to a result queue, which will be consumed by the web server; then the web server sends it to clients via websocket. In this case the client doesn't have to."
Whether you communicate back to the client via a long-running HTTP connection or via a webSocket depends upon a bunch of factors. If the compute time is long (e.g. multiple minutes), then you may run into browser timeout issues, so it may be simpler to either have the client poll back in several minutes or use the webSocket. I would tend to not use a webSocket if this was the only thing it was being used for. But if there were other reasons to have a webSocket for pushing notifications from server to client, I'd absolutely use it for this purpose.
"Which of these is most commonly used? Or are there better solutions?"
Clustering (which matches none of your options) is probably used the most. But, having a pool of worker processes that serve a queue of requests is a common design pattern in node.js if you have a lot of compute-intensive operations.
Keep in mind that if the operation is long running just because things like multiple database operations take a while to complete, but these are mostly asynchronous operations, then you don't necessarily need workers at all. node.js scales just fine to lots of simultaneous connections if each one is just doing a series of asynchronous operations. So, you typically only need to go to clustering or worker processes when you have CPU-intensive things to do (things that would block the node.js thread from being able to do anything else while they were running). A series of 20 asynchronous database operations that might take several minutes to complete does not need clustering of the main node.js process or workers in order to scale in node.js. In fact, it probably needs scaling of your database server more than anything else.
So, in order of how common, I'd say you'd see this:
1. Code with async I/O and you get good scalability for lots of simultaneous requests with a single node.js process.
2. Use node.js clustering to add a clustered process per CPU. This allows you to at least maximize the utilization of your CPU for the CPU-bound parts of your operations. You should obviously still code with asynchronous I/O to maximize scalability.
3. If you have specific operations that are either already implemented in external processes or which are CPU bound, then you can go with the queue and worker architecture. If things are CPU bound and you're using async I/O, then it won't really help to have many more workers than you have CPUs without going to multiple servers.
Whether to communicate back the result as a long running HTTP response or via a webSocket is a completely orthogonal question and depends more on how long the result takes (whether it's practical to use a long running HTTP response), whether you also want to communicate back progress along the way and whether you already have a webSocket connection for other reasons. This part of the question shouldn't drive the other choice about how to implement multiple processes.
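The clustering option described above can be sketched with Node's built-in `cluster` module. This is only an illustration under stated assumptions: it targets Node 16 or later (for `cluster.isPrimary`), `workerCount` and `startServer` are hypothetical names, and port 3000 in the usage comment is an arbitrary choice.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// One worker per CPU core reported by the OS, and at least one.
function workerCount() {
  return Math.max(1, os.cpus().length);
}

// In the primary process, fork the workers; in each worker, run an HTTP server.
// The workers share the listening port, and connections are distributed among them.
function startServer(port) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount(); i++) cluster.fork();
    cluster.on('exit', (worker, code) => {
      console.log(`worker ${worker.process.pid} exited with code ${code}`);
    });
  } else {
    http.createServer((req, res) => {
      res.end(`handled by worker ${process.pid}\n`);
    }).listen(port);
  }
}

// Usage (not invoked here): startServer(3000);
```

Each worker is still a single-threaded process that should use async I/O; clustering only adds more event loops so that CPU-bound portions can use every core.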

How can NodeJS scale an enterprise application?

Suppose I have an enterprise Java application that basically does the following:
gather user input, query the backend databases (maybe multiple), run some algorithm (say do some in-memory calculation of the queried data sets to produce some statistics etc.), then return the data in some html pages.
My question is: If the bottleneck of the application is the DB query, how can NodeJS help me in this scenario, since I still need to run all those post-DB algorithms before I render the page? What would the application architecture look like?
Of course node can't speed up your storage layer or make the single request that incurs so much backend processing return any faster to the end user. But what it can do is avoid tying up a thread in the application server's thread pool. The single thread can continue on its loop while that work is going on and accept another request.
That other request might be a cheaper request that will return when its work is done. That can also happen in an application server with a thread pool model, unless all the threads in the pool are tied up blocking on I/O requests (and each thread carries its own overhead). In that case the cheaper request gets queued waiting for a thread out of the pool, because they are all blocked. Node's single thread would keep looping and serve the cheap request.
This works because node's I/O APIs are asynchronous by default, so the only work that blocks the loop is your own code. Hence the saying "everything in node runs in parallel except your code". While it's possible to write async code in other application servers and achieve similar results, many offer non-async thread pool models where the coding is easier but sometimes less scalable.
For example, this hanselman post illustrates how asp.net is capable of doing async requests but it's not the common model that most have used.
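The "cheap request is not stuck behind the expensive one" behaviour can be sketched with timers standing in for backend I/O. `simulateIo`, `handle`, and `demo` are hypothetical names used only for this illustration.

```javascript
// Stand-in for backend I/O (a database query, for example): waits without blocking the loop.
function simulateIo(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Handle one simulated request and record when it finishes.
async function handle(name, ioMs, finished) {
  await simulateIo(ioMs);
  finished.push(name);
}

// Start the expensive request first, then the cheap one, on the same single thread.
async function demo() {
  const finished = [];
  await Promise.all([
    handle('slow report', 200, finished),
    handle('cheap lookup', 10, finished),
  ]);
  return finished; // order in which the requests completed
}

demo().then((order) => console.log(order.join(' finished before ')));
```

The slow request takes just as long as it would anywhere else, but it does not hold up the cheap one, which is the benefit described above.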
