Server constantly running a function to update a cache, will it block all other server functions? - node.js

About once a minute, I need to cache all orderbooks from various cryptocurrency exchanges. There are hundreds of orderbooks, so this update function will likely never stop running.
My question is: If my server is constantly running this orderbook update function, will it block all other server functionality? Will users ever be able to interact with my server?
Do I need to create a separate service to perform the updating, or can Node somehow prioritize API requests and pause the caching function?

My question is: If my server is constantly running this orderbook update function, will it block all other server functionality? Will users ever be able to interact with my server?
If you are writing asynchronously, these actions will go through your event loop, and your node server will pick the next event from the event loop while these actions are being performed. If you have too many events like this, your event queue will grow long and users will see really slow responses, or may even get a timeout.
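To make that concrete, here is a minimal sketch (not the asker's actual code) of a once-a-minute refresh written with async I/O. fetchOrderbook, the URL shape, and the batch size are all invented for illustration; the point is that every await hands control back to the event loop, so API requests get served between batches.

```js
const cache = new Map();

// Hypothetical fetcher; Node 18+ provides a global fetch.
async function fetchOrderbook(exchange, market) {
  const res = await fetch(`https://${exchange}/api/orderbook/${market}`);
  return res.json();
}

async function refreshAll(pairs) {
  const BATCH = 10; // small batches: each await yields to the event loop
  for (let i = 0; i < pairs.length; i += BATCH) {
    const batch = pairs.slice(i, i + BATCH);
    const results = await Promise.allSettled(
      batch.map(({ exchange, market }) => fetchOrderbook(exchange, market))
    );
    results.forEach((r, j) => {
      if (r.status === 'fulfilled') {
        cache.set(`${batch[j].exchange}:${batch[j].market}`, r.value);
      }
    });
  }
}

// Re-run roughly once a minute, without overlapping passes.
async function loop(pairs) {
  while (true) {
    await refreshAll(pairs);
    await new Promise((resolve) => setTimeout(resolve, 60_000));
  }
}
```

Because the loop awaits each pass before sleeping, a slow pass simply starts the next one late instead of piling up overlapping refreshes.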
Do I need to create a separate service to perform the updating, or can Node somehow prioritize API requests and pause the caching function?
Node only consumes events from the event queue. There are no priorities.
From the design perspective, you should look for options that can reduce this write load, like bulkCreate/edit, or, if you are using redis for the cache, consider a redis pipeline.
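For example, with the popular ioredis client, a pipeline batches all of the cache writes into a single round trip. The key names and TTL below are assumptions for illustration, not anything from the question:

```js
const Redis = require('ioredis');
const redis = new Redis();

async function writeOrderbooks(books) {
  const pipeline = redis.pipeline();
  for (const book of books) {
    // Queue each write locally; nothing is sent to the server yet.
    pipeline.set(
      `orderbook:${book.exchange}:${book.market}`,
      JSON.stringify(book),
      'EX', 120 // expire stale entries after two minutes
    );
  }
  // One round trip flushes every queued command.
  await pipeline.exec();
}
```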

This is a very open-ended question, much of which depends on your system. In general your server should be able to handle concurrent requests, but there are some things to watch out for.
Performance costs. If the operation to retrieve and store data requires too much computational power, then it will cause strain on all requests processed by the server.
Database connections. The server spends a lot of time waiting for database queries to complete. If you have one database connection for the entire application and it is busy, other requests will have to wait until the connection is free. You may want to look into database connection 'pooling'.
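As a rough illustration of pooling, the node-postgres (pg) driver ships with a Pool; the connection string, pool size, and query below are placeholders:

```js
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // hypothetical env var
  max: 10, // up to 10 concurrent connections instead of one shared one
});

async function getOrders(userId) {
  // pool.query checks a connection out, runs the query, and returns it
  // to the pool, so concurrent requests don't serialize on one socket.
  const { rows } = await pool.query(
    'SELECT * FROM orders WHERE user_id = $1',
    [userId]
  );
  return rows;
}
```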

Related

If Redis is single-threaded, how can it be so fast?

I'm currently trying to understand some basics of how Redis is implemented. I know that Redis is single-threaded, and I have already stumbled upon the following question: Redis is single-threaded, then how does it do concurrent I/O?
But I still think I didn't understand it right. AFAIK Redis uses the reactor pattern with one single thread. So if I understood this right, there is a watcher (which handles FDs/incoming/outgoing connections) that delegates the work to its registered event handlers. They do the actual work and pass, e.g., their responses back as events to the watcher, which transfers the responses back to the clients. But what happens if a request (R1) from a client takes, let's say, about 1 minute, and another client makes another (fast) request (R2)? Then - since Redis is single-threaded - R2 cannot be delegated to the right handler until R1 is finished, right? In a multithreaded environment you could just start each handler in its own thread, so the "main" thread is just accepting and responding to IO connections and all other work is carried out in separate threads.
If it really just queues the IO handling and handler logic, it could never be as fast as it is. What am I missing here?
You're not missing anything, besides perhaps the fact that most operations in Redis complete in a couple of microseconds (well under a millisecond). Long running operations indeed block the server during their execution.
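You can reproduce the R1/R2 scenario with Node, which uses the same single-threaded reactor model, so it stands in for Redis here. The toy server below is only a demonstration of the blocking behavior, not anything Redis-specific: a handler that hogs the thread delays every other client.

```js
const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/slow') {
    // R1: a synchronous busy-wait hogs the single thread for 10 seconds.
    const end = Date.now() + 10_000;
    while (Date.now() < end) {}
    res.end('slow done\n');
  } else {
    // R2: normally sub-millisecond...
    res.end('fast done\n');
  }
}).listen(3000);

// curl localhost:3000/slow & curl localhost:3000/fast
// ...but the fast response arrives only after the slow handler finishes.
```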
Say there were 10,000 users doing live data pulls with hmget calls taking 10 seconds each while, on the other side, the server were broadcasting using hmset; Redis could only issue the set from the back of the queue.
Redis is only good for queuing and limited processing, like lazily inserting last-login info, but not for broadcasting live info; in that case, memcached would be the right choice. Redis is single threaded and processes commands FIFO.

Nodejs using child_process runs a function until it returns

I want this kind of structure:
My express backend gets a request and runs a function; this function will get data from different APIs and save it to the db. Because this could take minutes, I want it to run in parallel while my web server continues processing requests.
I want this because of this scenario:
The user has a dashboard. After they log in, the app starts collecting data from the APIs and preparing the dashboard. During that time the user can navigate through the site, or even close the browser, but the function has to keep running until it finishes fetching the data. Once it finishes, all the data will be saved to the db and the dashboard will be ready for the user.
How can I do this using child_process, or any other kind of structure in Node.js?
Since what you're describing is all async I/O (networking or disk) and is not CPU intensive, you don't need multiple child processes in order to effectively serve multiple requests. This is the beauty of node.js. With async I/O, node.js can be working on many different requests over the same period of time.
Let's suppose part of your process is downloading an image. Your node.js code sends a request to fetch an image. That request is sent off via TCP. Immediately, there is nothing else to do on that request. It's winging its way to the destination server and the destination server is preparing the response. While all that is going on, your node.js server is completely free to pull other events from its event queue and start working on other requests. Those other requests do something similar (they start async operations and then wait for events to happen sometime later).
Your server might get 10 different async operations started and "in flight" before the first one actually starts getting a response. When a response starts coming in, the system puts an event into the node.js event queue. When node.js has a moment between other requests, it pulls the next event out of the event queue and processes it. If the processing has further async operations (like saving it to disk), the whole async and event-driven process starts over again as node.js requests a write to disk and node.js is again free to serve other events. In this manner, events are pulled from the event queue one at a time as they become available and lots of different operations can all get worked on in the idle time between async operations (of which there is a lot).
The only thing that upsets the apple cart and ruins the ability of node.js to juggle lots of different things all at once is an operation that takes a lot of CPU cycles (like say some unusually heavy duty crypto). If you had something like that, it would "hog" too much of the CPU and the CPU couldn't be effectively shared among lots of other operations. If that were the case, then you would want to move the CPU-intensive operations to a group of child processes. But, just doing async I/O (disk, networking, other hardware ports, etc...) does not hog the CPU - in fact it barely uses much node.js CPU.
So, the next question is often "how do I know if I have too much stuff that uses the CPU". The only way to really know is to just code your server properly using async I/O and then measure its performance under load and see how things go. If you're doing async things appropriately and the CPU still spikes to 100%, then you have too much CPU load and you'll want to either use generic clustering or move specific CPU-heavy operations to a group of child processes.
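As a sketch of that async-I/O approach applied to the dashboard scenario, with made-up stand-ins collectFromApis and saveToDb: the route starts the work without awaiting it, so the response goes out immediately and the fetches continue in the background on the same process.

```js
const express = require('express');
const app = express();
app.use(express.json());

// Hypothetical stand-ins for the real API calls and DB writes.
async function collectFromApis(userId) {
  const res = await fetch(`https://api.example.com/data/${userId}`); // Node 18+ fetch
  return res.json();
}
async function saveToDb(userId, data) {
  // e.g. an awaited insert with whatever DB driver you use
}

async function buildDashboard(userId) {
  const data = await collectFromApis(userId); // async I/O: doesn't block the loop
  await saveToDb(userId, data);
}

app.post('/login', (req, res) => {
  // Deliberately NOT awaited: the work keeps running after the response,
  // even if the user navigates away or closes the browser.
  buildDashboard(req.body.userId).catch((err) => console.error(err));
  res.json({ status: 'dashboard is being prepared' });
});

app.listen(3000);
```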

node js performance for the individual user

A lot has been said about the high performance of node js when compared to the multi-threaded model. But from the perspective of a single user, an http request is made, a db operation is issued, and the user has to wait until the response from the database arrives. Sure, the event loop can in the meantime listen to other requests, but how does the individual user feel the speed benefit when it is the IO operation that takes most of the time?
From the user's perspective the performance will be the same. But imagine a site that receives millions of requests: the server would get stuck, because its resources are limited. With node it is possible to get more out of the server, because node optimizes the way resources are accessed, using event queues, saving memory, and putting the CPU to work on the heavy parts.

What are the node.js patterns for returning response to client via worker process [closed]

If I want to handle client requests using a web server, a messaging queue and N worker processes, what are the usual patterns?
What I can think of are:
A worker takes the job from the queue, processes it, and saves the result for the client to poll. So there is no communication between the web server and the worker.
A worker takes the job from the queue, processes it, and sends the result to a result queue, which will be consumed by the web server (how do I do this with express?); then the web server sends it back to the client. Meanwhile the client will be waiting (is this still called long polling?).
The web server returns a response immediately and pushes the job to the worker queue. A worker takes the job from the queue, processes it, and sends the result to a result queue, which will be consumed by the web server; then the web server sends it to the client via websocket. In this case the client doesn't have to poll.
Which of these is most commonly used? Or are there better solutions?
A worker takes the job from the queue, processes it, and saves the result for the client to poll. So there is no communication between the web server and the worker.
This is rare because typically, the client communicates only with the web server. So, unless you are going to also make each worker be a web server on some additional port and enable CORS communication with them, this is typically not done.
A worker takes the job from the queue, processes it, and sends the result to a result queue, which will be consumed by the web server (how do I do this with express?); then the web server sends it back to the client. Meanwhile the client will be waiting (is this still called long polling?).
This is relatively common in node.js if you have a compute intensive (e.g. CPU bound) operation because this is one way to allow node.js to scale better when you have compute intensive operations. It is probably more common to use the clustering module and just fire up N identical node.js processes and share the load among them just because that's probably easier to implement than separate worker processes that communicate back the result to the web server.
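For reference, the clustering approach mentioned above looks roughly like this with Node's built-in cluster module: one process per CPU, all accepting connections on the same port.

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) { // isMaster on older Node versions
  // Fork one identical worker per CPU core.
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
  cluster.on('exit', () => cluster.fork()); // replace crashed workers
} else {
  // Each worker runs the same server; connections are shared among them.
  http.createServer((req, res) => {
    res.end(`handled by pid ${process.pid}\n`);
  }).listen(3000);
}
```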
The web server returns a response immediately and pushes the job to the worker queue. A worker takes the job from the queue, processes it, and sends the result to a result queue, which will be consumed by the web server; then the web server sends it to the client via websocket. In this case the client doesn't have to poll.
Whether or not you communicate back to the client via a long running HTTP connection or via a webSocket depends upon a bunch of factors. If the compute time is long (e.g., multiple minutes), then you may run into browser timeout issues, so it may be simpler to either have the client poll back in several minutes or use the webSocket. I would tend not to use a webSocket if this was the only thing it was being used for. But, if there were other reasons to have a webSocket for pushing notifications from server to client, I'd absolutely use it for this purpose.
Which of these is most commonly used? Or are there better solutions?
Clustering (which matches none of your options) is probably used the most. But, having a pool of worker processes that serve a queue of requests is a common design pattern in node.js if you have a lot of compute-intensive operations.
Keep in mind that if the operation is long running just because things like multiple database operations take a while to complete, but these are mostly asynchronous operations, then you don't necessarily need workers at all. node.js scales just fine to lots of simultaneous connections if each one is just doing a series of asynchronous operations. So, you only typically need to go to clustering or worker processes when you have CPU intensive things to do (things that would block the node.js thread from being able to do anything else while they were running). A series of 20 asynchronous database operations that might take several minutes to complete does not need clustering of the main node.js process or workers in order to scale in node.js. In fact, it probably needs scaling of your database server more than anything else.
So, in order of how common, I'd say you'd see this:
Code with async I/O and you get good scalability for lots of simultaneous requests with a single node.js process.
Use node.js clustering to add a cluster per CPU. This allows you to at least maximize the utilization of your CPUs for the CPU-bound parts of your operations. You should obviously still code with asynchronous I/O to maximize scalability.
If you have specific operations that are either already implemented in external processes or which are CPU bound, then you can go with the queue and worker architecture (a single-file sketch follows this list). If things are CPU bound and you're using async I/O, then it won't really help to have many more workers than you have CPUs without going to multiple servers.
Whether to communicate back the result as a long running HTTP response or via a webSocket is a completely orthogonal question and depends more on how long the result takes (whether it's practical to use a long running HTTP response), whether you also want to communicate back progress along the way and whether you already have a webSocket connection for other reasons. This part of the question shouldn't drive the other choice about how to implement multiple processes.
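Below is a rough, single-file sketch of the queue-and-worker shape (the second option in the question), kept deliberately small: the file forks itself as the lone worker, jobs travel over child-process IPC, and the client just waits on the HTTP response. A real system would use a pool of workers and a durable queue.

```js
const { fork } = require('child_process');

if (process.send) {
  // ---- worker: a forked copy of this same file ----
  process.on('message', (job) => {
    // Stand-in for CPU-bound work that would block the web server.
    let acc = 0;
    for (let i = 0; i < job.n; i++) acc += i;
    process.send({ id: job.id, result: acc });
  });
} else {
  // ---- web server ----
  const express = require('express');
  const worker = fork(__filename);
  const pending = new Map(); // job id -> HTTP response awaiting a result
  let nextId = 0;

  worker.on('message', ({ id, result }) => {
    const res = pending.get(id);
    pending.delete(id);
    if (res) res.json({ result }); // complete the long-running HTTP response
  });

  const app = express();
  app.get('/jobs/:n', (req, res) => {
    const id = nextId++;
    pending.set(id, res);
    worker.send({ id, n: Number(req.params.n) }); // enqueue via IPC
  });
  app.listen(3000);
}
```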

Is this MEAN stack design-pattern suitable at the 1,000-10,000 user scale?

Let's say that when a user logs into a webapp, he sees a list of information.
Let's say that list of information is served by one of two dynos (via heroku), but that the list of information originates from a single mongo database (i.e., the nodejs dynos are just passing the mongo information to a user when he logs into the webapp).
Question: Suppose I want to make it possible for a user to both modify and add to that list of information.
At a scale of 1,000-10,000 users, is the following strategy suitable:
User modifies/adds to data; HTTP POST sent to one of the two nodejs dynos with the updated data.
Dyno (whichever one it may be) takes modification/addition of data and makes a direct query into the mongo database to update the data.
Dyno sends confirmation back to the client that the update was successful.
Is this OK? Would I have to likely add more dynos (heroku)? I'm basically worried that if a bunch of users are trying to access a single database at once, it will be slow, or I'm somehow risking corrupting the entire database at the 1,000-10,000 person scale. Is this fear reasonable?
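For scale, the three steps above amount to a short Express handler. Everything below (collection name, field, env vars) is invented for illustration; the important detail is that the official MongoDB driver maintains its own connection pool, so many concurrent users on either dyno share the database without a single bottleneck connection.

```js
const express = require('express');
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient(process.env.MONGO_URL); // driver pools connections
const app = express();
app.use(express.json());

app.post('/items/:id', async (req, res) => {
  try {
    // Steps 1-2: whichever dyno received the POST writes straight to Mongo.
    await client
      .db('app')
      .collection('items')
      .updateOne(
        { _id: new ObjectId(req.params.id) },
        { $set: { text: req.body.text } }
      );
    res.json({ ok: true }); // step 3: confirm the update to the client
  } catch (err) {
    res.status(500).json({ ok: false });
  }
});

client.connect().then(() => app.listen(process.env.PORT || 3000));
```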
Short answer: yes, it's a reasonable fear. Longer answer: it depends.
MongoDB will queue the operations and handle them in the order it receives them. Depending on how much of the data is served from memory, it may or may not be fast enough.
NodeJS follows the same design pattern: it queues the work it can't process yet and executes it when resources become available.
The only way to tell if performance is being hindered is by monitoring it, and seeing if resources consistently hit a threshold you're uncomfortable with passing. On the upside, during your discovery phase your clients will probably only notice a few milliseconds of delay.
The proper way to handle that is to spin up a new instance as resources get consumed, so the extra traffic is absorbed.
Your database likely won't get corrupted, but if your data is important (and why would you collect it if it isn't?), you should be creating a replica set. I would probably go with a replica set of the data before I go with a second instance of node.
