My scenario is pretty common:
A user makes a request that kicks off a pretty intensive job. This might take an hour or more, so we don't want to block other requests.
What are my options if the server has only 1 CPU and 512MB of RAM? What if another 3 similar requests arrive at the same time? Can Node.js handle such a case?
Note: it's fine if the user has to wait a day or more for the results.
Note 2: I am hosting my app on Heroku (Hobby Plan).
I have a very simple express server with 1 endpoint listening on a certain port.
I'd like a separate scheduled script to make an API call every 10 minutes and save the data locally (in memory).
When the endpoint on my server is hit, I'd like to serve the locally cached data from memory.
How can I make sure that the scheduled cron job never blocks the processing of incoming requests, i.e. that both continue to happen at the same time?
I've read about child processes and worker threads, but I'm not sure if this would be a good use case for it.
JavaScript is single-threaded, so worker threads and forking notwithstanding, every line of JavaScript code that executes blocks other lines of code from executing. Only one can execute at a time. So forgetting about the scheduled job for a moment, in an express server, multiple requests happening concurrently will compete with each other for CPU resources and block each other.
However, Node.js has asynchronous I/O. A typical request often looks a bit like this:
1. Receive the request and run a little bit of JavaScript code (maybe 1ms).
2. Query an external REST API (let's say 50ms).
3. Run a little more JavaScript code and send a response (maybe 1ms).
In steps 1 and 3, all other requests are blocked from executing JavaScript code. They have to wait until the current request is done executing its code. In step 2 however, other requests are free to execute JavaScript code, one after the other, since the network request is non-blocking.
I explain all this because it sounds like your scheduled job might go through the same three steps mentioned above. If that's the case, then functionally it's not going to block anything any more than simultaneous requests to the server are already blocking each other, and there's little reason to worry about threading or multi-processing unless your server is under such heavy load that it cannot serve all incoming requests.
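To make that concrete, here is a minimal sketch of the cached-API pattern from the question (the URL, the cache shape, and the 10-minute interval are assumptions; the global fetch assumes Node 18+):

    const express = require('express');
    const app = express();

    let cache = null; // latest API response, held in memory

    async function refresh() {
      try {
        const response = await fetch('https://api.example.com/data'); // non-blocking I/O
        cache = await response.json();
      } catch (err) {
        console.error('refresh failed:', err);
      }
    }

    refresh();                            // prime the cache on startup
    setInterval(refresh, 10 * 60 * 1000); // then every 10 minutes

    app.get('/data', (req, res) => {
      // Serving from memory is a ~1ms synchronous step; the fetch above
      // never blocks this handler while it is waiting on the network.
      res.json(cache);
    });

    app.listen(3000);

While refresh() is awaiting the network, the event loop is free, so incoming requests are served in the meantime.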
So this is not a good use case for child processes or worker threads unless the scheduled job has different characteristics than I'm assuming (for example, if it spends multiple seconds crunching heavy computations in JavaScript, that would noticeably impact response times whenever it runs, and it might make sense to break it out into a separate process or worker thread).
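And if the job did turn out to be CPU-heavy, a minimal worker thread version might look roughly like this (the file names and the crunch function are placeholders):

    // main.js
    const { Worker } = require('worker_threads');

    function runJob(data) {
      return new Promise((resolve, reject) => {
        const worker = new Worker('./job.js', { workerData: data });
        worker.on('message', resolve); // result posted back by the worker
        worker.on('error', reject);
      });
    }

    // job.js
    const { parentPort, workerData } = require('worker_threads');
    const crunch = (data) => data;              // placeholder for the heavy computation
    parentPort.postMessage(crunch(workerData)); // runs off the main thread

The heavy loop then occupies the worker's thread, not the one serving requests.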
I want to ask some clarifying questions about NodeJS, points I think are poorly explained in the resources I've studied.
Many sources say that NodeJS is not suitable for complex calculations because it is single-threaded and requests are executed sequentially.
I created the simplest possible server in Node and wrote an endpoint that takes about 10 seconds to execute (a busy loop). Then I made 10 consecutive requests via Postman, and indeed, each request only began executing after the previous one had returned its response.
Do I understand correctly that, in this case, if one endpoint takes approximately 300ms to execute and 700 users hit the server at the same time, the last user will wait a critical 210,000ms (700 × 300ms)?
I have also heard that an advantage of NodeJS is its ability to support a large number of simultaneous connections. What does this mean, and why is it a plus if the last user from the previous question still has to wait so long?
Another claim I came across is that libuv allows many I/O operations to run at the same time. How does that work if NodeJS processes requests sequentially anyway?
Thank you very much!
TL;DR: I/O operations don't block the single execution thread. CPU-intensive tasks DO block the thread, and a NodeJS web server is not a good option in that case.
Yes, if your endpoint needs 300ms of synchronous (CPU) work to complete the operation, the last of 700 simultaneous users will wait 210,000ms.
NodeJS is good at handling a large number of connections when the work it needs to do is I/O-bound. It is not a good choice if the endpoint needs a lot of CPU time.
I/O operations happen at a different layer and take essentially zero CPU time in your process. That means that once an I/O operation is fired, NodeJS can accept new calls to the endpoint. NodeJS then polls the operating system for completed I/O whenever it's not using the CPU and executes the callbacks. This is what allows it to handle a large number of concurrent requests without one user having to wait for the others to finish.
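You can see both behaviours side by side with two toy endpoints (a sketch; the routes and timings are made up to mirror your experiment):

    const express = require('express');
    const app = express();

    // CPU-bound: a busy loop holds the single JavaScript thread for ~10s,
    // so concurrent requests queue up behind it (your Postman experiment).
    app.get('/cpu', (req, res) => {
      const end = Date.now() + 10_000;
      while (Date.now() < end) {} // synchronous work blocks everyone
      res.send('done');
    });

    // I/O-bound: the 10s wait happens outside the JavaScript thread, so
    // 700 concurrent requests all finish in roughly 10s, not 7,000s.
    app.get('/io', (req, res) => {
      setTimeout(() => res.send('done'), 10_000);
    });

    app.listen(3000);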
My requirement is to run 90 concurrent users through multiple scenarios (15 scenarios) simultaneously for 30 minutes on a virtual machine. For some of the threads I use the Concurrency Thread Group, and for others a normal Thread Group.
Now my issues are:
1) After I execute all 15 scenarios, the max response time displayed for each scenario is very high (>40 sec). Any suggestions for reducing this high max response time?
2) One of the scenarios submits a web form. There is no issue when submitting just one, but during the 90-concurrent-user execution some of the form submissions get a 500 error code. Is the error because I use looping to achieve the 30-minute duration?
In order to reduce the response time you need to find the reason for it. The causes could include:
- Lack of resources such as CPU, RAM, etc. Make sure to monitor resource consumption, e.g. using the JMeter PerfMon Plugin.
- Incorrect configuration of the middleware (application server, database, etc.). All of these components need to be properly tuned for high load; for example, if you set the maximum number of connections on the application server to 10 and you have 90 threads, the other 80 threads will queue up waiting for the next available executor. The same applies to the database connection pool.
- Inefficient application code. Use a profiler tool to inspect what's going on under the hood and why the slowest functions are so slow; it may be that your application's algorithms are not efficient enough.
If your test succeeds with a single thread and fails under load, that definitely indicates a bottleneck. Try increasing the load gradually to see how many users the application can support without performance degradation and/or errors. HTTP 5xx status codes indicate server-side errors, so it is also worth inspecting your application logs for more insight.
I've changed a long-running process on a Node/Express route to use Bull/Redis.
I've pretty much copied this tutorial on Heroku docs.
The gist of that tutorial is: the Express route schedules the job, immediately returns 200 to the client, and the browser long-polls for the job status (a ping route on Express). When the client gets a completed status, it displays the result in the UI. The Worker is a separate file and is run with an additional yarn run worker.js command.
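For reference, that pattern boils down to something like the following (a sketch based on the tutorial; the queue name, routes, and doLongRunningWork are illustrative, and error handling is omitted):

    const Queue = require('bull');
    const workQueue = new Queue('work', process.env.REDIS_URL);

    // Express route: enqueue the job and return immediately.
    app.post('/job', async (req, res) => {
      const job = await workQueue.add({ payload: req.body });
      res.status(200).json({ id: job.id });
    });

    // Ping route the browser long-polls for status.
    app.get('/job/:id', async (req, res) => {
      const job = await workQueue.getJob(req.params.id);
      res.json({ state: await job.getState(), result: job.returnvalue });
    });

    // worker.js, started as a separate process:
    workQueue.process(async (job) => doLongRunningWork(job.data));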
Notice the end where it recommends using Throng for clustering your workers.
I'm using this Bull Dashboard to monitor jobs/queues. The dashboard shows the available Workers and their status (Idle when not running, not Idle when running).
I've got the MVP working, but the operation is super slow. The average time to complete is 1 minute 30 seconds, whereas before adding Bull it completed in seconds.
Another strange thing: it seems to take at least 30 seconds for a Worker's status to change from Idle to not Idle, so a lot of the latency appears to be spent waiting for a worker.
Given that the new operation lives in a separate file (worker.js) and Throng enables clustering, I was expecting it to be super fast, but it's just the opposite.
Does anyone have experience with this? Or pointers to help figure out what is causing it to be so slow?
So I'm starting to use node.js for a project I'm doing.
When a client makes a request, my node.js server fetches a JSON from another server and then reformats it into a new JSON that gets served to the client. However, the JSON the node server gets from the other server can potentially be pretty big, so that "massaging" of the data is fairly CPU-intensive.
I've been reading for the past few hours about how node.js isn't great for CPU-bound tasks, and the main answer I've seen is to spawn a child process (basically a .js file run by a separate instance of node) that handles any CPU-intensive work that might block the main event loop.
So let's say I have 20,000 concurrent users; that would mean spawning 20,000 OS-level processes to run these child processes.
Does this sound like a good idea? (A different web server would just create 20,000 threads in the same process.)
I'm not sure I should be running a child process, but I do need to make the CPU-intensive task non-blocking. Any ideas on what I should do?
The people who say that don't know how to architect solutions.
NodeJS is exactly what it says it is: a node, and it should be treated as such.
In your example, your node instance connects to an external API, grabs JSON to process, and sends it back.
i.e.
1. GET http://server.com/getJSON
2. Process the JSON
3. POST http://server.com/postJSON
So what do you do?
Ask yourself: is latency an issue? If so, then Node isn't the solution.
However, if you are more interested in raw throughput, it can work: instead of 1 request finishing in 4 seconds, you get 200 requests finishing in 10 seconds, with each individual one taking about the full 10 seconds.
Measure how long the JSON massaging actually takes. If it is less than a second, just run 4 Node instances instead of 1, as in the sketch below.
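Running 4 instances doesn't require 4 separate deployments; Node's built-in cluster module can fork them from a single entry point (a sketch; the worker count is hard-coded for illustration):

    const cluster = require('cluster');

    if (cluster.isMaster) {
      // Fork 4 workers; they share the same listening port.
      for (let i = 0; i < 4; i++) cluster.fork();
    } else {
      require('./server'); // each worker runs the same express app
    }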
However, if it's heavier than that, break the JSON into segments and process each segment asynchronously:
    // Yield to the event loop between segments so other requests get a turn.
    // (setImmediate rather than process.nextTick: nextTick callbacks run
    // before pending I/O, so nextTick alone would still starve other requests.)
    setImmediate(function () {
      doProcess(segment1);
      setImmediate(function () {
        doProcess(segment2);
        // ...each callback schedules the next segment
      });
    });
Each callback schedules the next segment, so Node will trade time between your processing and other incoming requests.
Now take that solution, scale it to 4 Node instances per server and 2-5 servers, and suddenly you have an extremely scalable and cost-effective solution.