Node.js Express server: running res.render() / ejs.render() using Node.js threadpool - node.js

We have an app that uses server-side rendering for SEO purposes using EJS templating.
I am well-versed in Node.js and know that it's probably possible to tap into the Node.js threadpool for whatever purpose you want, whether it's a good idea or a bad one. What I am currently wondering is whether it is possible to run ejs.render() or res.render() on a thread in the threadpool instead of on the main thread in Node.js?
We are doing a lot of heavy computational lifting in the render functions, and we definitely want that off the main thread; otherwise we will be paying $$$ for more servers.

Is it just the rendering that is concerning you? There are other template engines that should produce better results; and since template rendering should be an idempotent operation, you could additionally distribute it across a cluster.
V8 will compile your code down to machine code and, if you're not hitting any deoptimizations or getting stalled by the garbage collector, you should be in the neighborhood of your network I/O limits. I would definitely recommend trying other template engines, putting a caching HTTP reverse proxy in front, and running some benchmarks first.
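Before reaching for threads, even an in-process cache in front of res.render() can show how much of the cost is re-rendering. A minimal sketch (the route and view names are hypothetical, and a real deployment would more likely put a reverse proxy such as Nginx or Varnish in front instead):
const express = require('express'); // assumes an Express app with a view engine configured
const app = express();
app.set('view engine', 'ejs');

const cache = new Map(); // rendered HTML keyed by URL; idempotent pages only

app.get('/page/:id', (req, res) => {
  const key = req.originalUrl;
  if (cache.has(key)) return res.send(cache.get(key)); // cache hit: skip rendering
  // Passing a callback to res.render() hands us the HTML instead of sending it
  res.render('page', { id: req.params.id }, (err, html) => {
    if (err) return res.status(500).end();
    cache.set(key, html);
    res.send(html);
  });
});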

EJS is known to be synchronous, and that's not going to change, so it's basically an inefficient rendering engine for Node.js: it blocks the JS thread whenever it renders a view, which degrades your overall throughput, especially if your rendering is CPU-heavy.
You should definitely consider other options, e.g. https://github.com/ericf/express-handlebars.
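For illustration, swapping the view engine is a small change. A sketch of wiring up express-handlebars (based on the older exphbs() factory API of that package):
const express = require('express');
const exphbs = require('express-handlebars'); // npm install express-handlebars

const app = express();
app.engine('handlebars', exphbs());    // register the Handlebars engine
app.set('view engine', 'handlebars'); // make it the default for res.render()

app.get('/', (req, res) => res.render('home')); // renders views/home.handlebars
app.listen(3000);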
If you really have CPU-heavy computation in your web server, then Node.js is definitely not the right tool for the job anyway. There are much better servers for multi-threading and parallel processing. You could just set up Node as a controller and forward your CPU-heavy requests to a backend service/server that can do the heavy lifting.
It would be helpful to see what kind of computation you are doing during render to provide a better answer.
Tapping into the thread pool (which is handled by libuv) would probably be a bad idea, but it is possible, of course; you just need some C++ skills and the uv_queue_work() function of the libuv library to schedule work on a worker thread.

I have experimented with building a scripting engine that runs in a forked process (read up on Node's child_process module). I find that to be an attractive proposition for implementing rendering engines. Yes, there are issues with passing parameters (POST/GET query strings, session status, etc.), but they are easy to deal with, especially if you use the fork option (as opposed to exec or spawn). There are standard messaging methods for communicating between the child and the parent.
The only overhead is spawning the additional instance of Node (the rendering engine itself). If you are doing extensive computation in the scripting engine, then this constant, one-time-per-request overhead of forking a new process will be minor compared to the time taken to render.
If EJS rendering blocks the main node thread, then that alone is sufficient reason NOT to use it if you are doing any significant computation during rendering.
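A minimal sketch of that fork-based approach (the file names and the single long-lived renderer are illustrative choices, not a prescribed layout):
// parent.js -- dispatch render jobs to a long-lived forked renderer
const { fork } = require('child_process');
const renderer = fork('./renderer.js');

let nextId = 0;
const pending = new Map(); // job id -> callback, so replies can be matched to requests

renderer.on('message', ({ id, err, html }) => {
  const cb = pending.get(id);
  pending.delete(id);
  cb(err, html);
});

function renderView(view, data, cb) {
  const id = nextId++;
  pending.set(id, cb);
  renderer.send({ id, view, data }); // standard IPC messaging between parent and child
}

// renderer.js -- receives jobs over IPC and renders off the main server process
const ejs = require('ejs');

process.on('message', ({ id, view, data }) => {
  ejs.renderFile(view, data, (err, html) => {
    process.send({ id, err: err && String(err), html });
  });
});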

Related

Why doesn't React's `renderToString` method use clusters?

Why this question?
I'm learning about JS performance and web rendering. This post was very useful.
If you follow some links you will land here and read this:
User code in Node.js runs in a single thread, so for compute operations (as opposed to I/O), you can execute them concurrently, but not in parallel.
So I've read up on concurrency and parallelism in Node.js. What I've learned is that Node.js is:
Parallel for I/O-bound tasks, since those are handled by libuv
Concurrent for CPU-bound tasks
This explains why renderToString is a slow operation: it is CPU-bound. But it seems there is a way to enable parallelism for CPU-bound tasks in Node.js: clustering.
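For context, clustering means forking one worker process per core with Node's built-in cluster module, roughly like this minimal sketch:
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per core; each worker runs its own event loop in parallel
  os.cpus().forEach(() => cluster.fork());
} else {
  http.createServer((req, res) => {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(8000); // the workers share the listening socket
}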
The question
This is why I’m here. Do you know why renderToString isn’t clustered (don’t know if this is valid English)?
Maybe it’s too complicated?
Maybe it just can’t be done in parallel?
Maybe it doesn't improve performance for some reason?
I would like to understand why. After those readings, I tend to think that Node.js is very performant when it comes to dealing with I/O, but it seems it can also perform well on CPU-bound tasks, since you can create clusters. Nevertheless, it doesn't seem to be trivial, and it's a choice to consider only in some specific cases.
So this leads us to one bonus question: what are the limitations/drawbacks of Node.js clusters? (Apart from the fact that they seem to be complicated to set up and maintain in large projects.)
It would not make sense to place the abstraction at this level.
It is not hard to run renderToString() across a cluster. For instance, you can easily use the worker-farm library (a sketch follows at the end of this answer).
The problem is this becomes hard to use in a beneficial way, because the "store" of data built for each incoming request must be in scope for the entire component tree that renderToString() works on.
Perhaps with the experimental worker_threads Node.js module we might get a multi-threaded renderToString. But the work on React's SSR (server-side rendering) is not nearly as active as on the client side.
Maybe with the work to allow React to suspend a tree it is rendering and resume it later, we'll eventually have a thread that can continue rendering while the primary thread acts on incoming requests/actions.
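For what it's worth, the worker-farm route mentioned above looks roughly like this (render-worker.js and its contents are illustrative; a real worker would build the component tree for the request and call ReactDOMServer.renderToString() on it):
// main.js -- farm rendering jobs out to a pool of child processes
const workerFarm = require('worker-farm');
const workers = workerFarm(require.resolve('./render-worker.js'));

workers({ url: '/some/page' }, (err, html) => {
  if (err) throw err;
  console.log('rendered %d bytes', html.length);
  workerFarm.end(workers); // shut the farm down when finished
});

// render-worker.js -- each call runs in a pooled child process
module.exports = (job, callback) => {
  // Placeholder for the real ReactDOMServer.renderToString() call
  callback(null, '<html>rendered for ' + job.url + '</html>');
};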

Can Node.js become the bottleneck in a MEAN stack?

If I were writing an application using the MEAN stack, and the database were sufficiently optimized to almost never be the bottleneck, can Node.js itself become a bottleneck due to site traffic and/or the number of concurrent users? This is purely from the perspective of Node.js being an asynchronous, single-threaded event loop. One of the first tenets of Node.js development is to avoid writing code that performs CPU-intensive tasks.
Say I had to post-process the data returned from MongoDB and that was even moderately CPU-intensive; it sounds like that should be handled by a service layer sitting between Node.js and MongoDB, one that is not pounding the same CPU dedicated to Node.js. Techniques such as process.nextTick() are harder to comprehend, and, more importantly, it's hard to realize when to use them.
Forgive me for this borderline rant, but I really do want to have a better idea of Node.js' strengths and weaknesses.
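To make the "keep CPU work off the event loop" idea concrete, one common in-process pattern is to partition the work and yield between chunks. A sketch (transform() and the row data are hypothetical), using setImmediate() rather than process.nextTick() so I/O can be serviced between chunks:
function processRows(rows, transform, done) {
  const results = [];
  (function chunk(i) {
    // Process at most 100 rows, then yield back to the event loop
    const end = Math.min(i + 100, rows.length);
    for (; i < end; i++) results.push(transform(rows[i]));
    if (i < rows.length) setImmediate(() => chunk(i));
    else done(results);
  })(0);
}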

How should I implement server push so that the browser is updated with DB updates?

I am reading about various ways of doing server push to the client side (browser). I would like to understand the best approach among these.
Long polling -- to be avoided, as it holds up resources on the server side for longer.
Node.js async delegation using callbacks -- con: it is single-threaded.
Write callbacks in Java, use threads to do the task in the background, and later use the callback to push the result, as Node.js does.
The advantage here is that we would have multiple threads running in parallel, utilizing the CPU efficiently.
Can anyone suggest the best way of implementation? Any other way is also appreciated.
You seem to misunderstand a few things. You cannot, for example, compare long polling to a server-side technology.
Long polling means that the client (i.e. the browser) makes an AJAX request to the server, and the server keeps that request alive until there is a notification. It then responds to that request, and the client, after receiving the update, immediately makes a new AJAX request.
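A bare-bones long-polling endpoint in Express might look like this (a sketch; the "events" emitter standing in for your notification source is hypothetical):
const express = require('express');
const EventEmitter = require('events');

const app = express();
const events = new EventEmitter(); // fired whenever there is something to push

app.get('/poll', (req, res) => {
  const onEvent = (data) => { clearTimeout(timer); res.json(data); };
  const timer = setTimeout(() => {
    events.removeListener('notification', onEvent);
    res.json({ timeout: true }); // nothing happened; the client should poll again
  }, 30000);
  events.once('notification', onEvent); // keep the request open until notified
});
app.listen(3000);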
You can choose whatever technology you want to handle that on the server side. People built Node.js with this in mind, and thus I would recommend using it for that. But use whatever suits you better.
Now, another misunderstanding is that threads increase performance and are thus better than single-threaded applications. Actually, the opposite is true: with threads, performance gets worse (here I assume we are working on a single-core CPU). Threads increase responsiveness, not performance. There is, however, a problem with single-threaded apps: if the thing you're trying to do is CPU-intensive, it will block your server. But if you are talking about simple notifications, that's not an issue at all (you don't need CPU power for that). SIDE NOTE: you can fire up as many instances of Node.js as you have cores to take advantage of multiple cores (you will need slightly more complex code, though).
You should also consider using WebSockets (thus implementing a simple TCP server on the server side). Long polling is inefficient, and most modern browsers do support WebSockets (in particular IE10+, as IE was always the issue).
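For illustration, a minimal WebSocket push with the ws package might look like this (a sketch; "db" standing in for your database's change feed is a hypothetical event emitter):
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

// Broadcast every DB change notification to all connected browsers
db.on('change', (doc) => {
  const msg = JSON.stringify(doc);
  wss.clients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) client.send(msg);
  });
});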
Concluding: on the server side use the technology you're most familiar with.

Simulate website load on Node.JS

I am thinking of creating my own simple load test, where I can hit my website with many requests (like 100-1000 concurrent users) to see how it performs. I want to try Node.js, but I don't know whether it is the wrong technology for the job, since Node.js doesn't use threads.
Can I, with the async model that Node.js uses, simulate the many user requests? Or would that be more appropriate to do in another language like Ruby/.NET/Python?
Node.js ought to be perfect for the task; I do this at work. The one crucial piece that you will have to change is the HTTP socket pool. The following code snippet will disable pooling entirely, letting you starve your Node.js process if you want to.
var http = require('http');
// agent: false disables socket pooling entirely (host/path here are placeholders)
var req = http.request({ host: 'example.com', path: '/', agent: false });
req.end();
You can read more about this in the http.Agent documentation.
Your concern about threads is astute, but even if you hit that limit (Node is very good at keeping your resources efficient) the solution is simple: start multiple instances (processes) of your load test. As it is, you may have to use multiple machines entirely to correctly simulate load.
In any case, you will not win automatically by using Ruby or Python for this. Asynchronous programming is ideal for I/O and network-bound tasks, and Node excels at this. Similarly, while Ruby and Python have third-party asynchronous frameworks, they're by definition more obscure than the standard asynchronous framework given in Node.
Node can fire off pretty much as many requests as you want it to (though you may have to change the defaults for http.Agent). You're more likely to be limited by what your OS can do than by anything inherent in Node (and of course such limitations will apply in any other language you use).
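A bare-bones version of firing off many concurrent requests (host, port, and count are placeholders):
const http = require('http');

const TOTAL = 500; // number of concurrent requests
let finished = 0;

for (let i = 0; i < TOTAL; i++) {
  const req = http.request(
    { host: 'localhost', port: 3000, path: '/', agent: false }, // agent: false => no pooling
    (res) => {
      res.resume(); // drain the response body
      res.on('end', () => { if (++finished === TOTAL) console.log('all responses in'); });
    }
  );
  req.on('error', () => { if (++finished === TOTAL) console.log('all responses in'); });
  req.end();
}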
It's simple to create load tests with nodeload.

Node.js event vs thread programming on server side

We are planning to start a fairly complex web portal which is expected to attract good local traffic, and I've been told by my boss to consider/analyse Node.js for the server side.
I think scalability and multi-core support can be handled with Nginx or Cherokee in front.
1) Is Node.js ready for some serious/big business?
2) Does this 'event/asynchronous' paradigm on the server side have the potential to support heavy traffic and data operations, considering the fact that 'everything' is processed in a single thread and all live connections would be lost if it crashed (though it's easy to restart)?
3) What are the advantages of event-based programming compared to the thread-based style, or vice versa?
(I know of the higher cost associated with thread switching, but more can be squeezed out of the hardware with the event model.)
The following are interesting but (to some extent) contradictory papers:
1) http://www.usenix.org/events/hotos03/tech/full_papers/vonbehren/vonbehren_html
2) http://pdos.csail.mit.edu/~rtm/papers/dabek:event.pdf
Node.js is developing extremely rapidly, and most of its functionality is sturdy and ready for business. However, there are a lot of places where it's lacking, like database drivers, jQuery and the DOM, multiple HTTP headers, etc. There are plenty of modules tackling every aspect, but for a production environment you'll have to be careful to pick ones that are stable.
It's actually much, MUCH more efficient to use a single thread than a thousand (or even fifty) from an operating-system perspective, and benchmarks I've read (sorry, I don't have them on hand; I'll try to find and link them later) show that it's able to support heavy traffic. I'm not sure about file-system access, though.
Event-based programming is:
Cleaner-looking than threaded code (in JavaScript, that is).
Fast: the JavaScript engine is extremely efficient at processing events and handling callbacks, and JavaScript is easily one of the languages seeing the most runtime optimization right now.
Harder to fit when you are thinking in terms of control flow. With events, you can never be sure of the flow; however, you can also come to think of it as more dynamic programming, treating each fired event as independent.
Something that forces you to be more security-conscious, for the above reason. In that sense, it's better than linear systems, where you sometimes take sanitized input for granted.
As for the two papers, both are relatively old. The first benchmarks against SEDA (linked below), which, as you can see, now carries a more recent note about these studies:
http://www.eecs.harvard.edu/~mdw/proj/seda/
It also cites the second paper you linked regarding what they have done, but declines to comment on its relevance to the comparison between event-based systems and thread-based ones :)
Try it yourself and discover the truth.
See What is Node.js? where we cover exactly that:
Node in production is definitely possible, but far from the "turn-key" deployment seemingly promised by the docs. With Node v0.6.x, "cluster" has been integrated into the platform, providing one of the essential building blocks, but my "production.js" script is still ~150 lines of logic to handle stuff like creating the log directory, recycling dead workers, etc. For a "serious" production service, you also need to be prepared to throttle incoming connections and do all the stuff that Apache does for PHP. To be fair, Rails had this exact problem, and it is solved via two complementary mechanisms:
1) Putting Rails/Node behind a dedicated web server (written in C and tested to hell and back) like Nginx (or Apache/Lighttpd). The web server can efficiently serve static content, do access logging, rewrite URLs, terminate SSL, enforce access rules, and manage multiple sub-services. For requests that hit the actual Node service, the web server proxies the request through.
2) Using a framework like "Unicorn" that will manage the worker processes, recycle them periodically, etc. I've yet to find a Node serving framework that seems fully baked; it may exist, but I haven't found it yet and still use ~150 lines in my hand-rolled "production.js".
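The worker-recycling part of such a script is small. A sketch of that one piece with the built-in cluster module ('./server', standing in for your actual HTTP server module, is hypothetical):
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  os.cpus().forEach(() => cluster.fork());
  // Recycle dead workers: whenever one exits, fork a replacement
  cluster.on('exit', (worker, code) => {
    console.log('worker %d died (code %s), restarting', worker.process.pid, code);
    cluster.fork();
  });
} else {
  require('./server'); // the real app: an HTTP server listening behind the proxy
}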
