JavaScript is single-threaded (apart from web workers and spawning multiple processes), and it's best not to wait on long-running operations because doing so blocks the thread. Still, when I take a peek into several modules on GitHub, they actually use these synchronous operations, mostly for file operations.
Am I staring at bad code/practice? Or is there some real need for synchronous operations in JavaScript that I am not aware of?
Can you post an example? You are most likely looking at:
a call that is actually asynchronous, such as fs.read. Note that the synchronous variants of the Node core I/O calls all end with the word "Sync", like fs.readSync.
synchronous code such as require('somemodule') that runs before the application begins accepting network requests
And yes, if you are seeing code doing something like fs.readSync while responding to an HTTP request, that is bad code/practice and that application will lock up while that synchronous operation happens.
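A minimal sketch of the distinction, assuming a hypothetical config file written to the temp directory so the example runs on its own: the synchronous read happens once at startup, before any request is served, while the per-request work uses `fs.promises.readFile` so the event loop stays free for other connections.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const http = require('node:http');

// Startup phase: synchronous calls are fine here because no requests
// are being served yet. The config file is hypothetical and is written
// here only so the sketch is self-contained.
const configPath = path.join(os.tmpdir(), 'example-config.json');
fs.writeFileSync(configPath, JSON.stringify({ greeting: 'hello' }));
const config = JSON.parse(fs.readFileSync(configPath, 'utf8'));

// Request phase: use the asynchronous API. Calling fs.readFileSync here
// instead would stall every other connection while the file is read.
async function handleRequest() {
  const raw = await fs.promises.readFile(configPath, 'utf8');
  return `${config.greeting}: ${raw.length} characters`;
}

const server = http.createServer(async (req, res) => {
  try {
    res.end(await handleRequest());
  } catch (err) {
    res.statusCode = 500;
    res.end('internal error');
  }
});
// server.listen(3000); // hypothetical port
```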
Node.js is not single-threaded: it uses a pool of threads, but they are exposed to the JavaScript layer as a single thread; otherwise it would not be possible to offer an asynchronous interface for operations that block at the OS level. Any blocking I/O call blocks the thread it runs on.
Threads are used internally to fake the asynchronous nature of all the system calls. libuv also uses threads to allow you, the application, to perform a task asynchronously that is actually blocking, by spawning a thread and collecting the result when it is done.
http://nikhilm.github.com/uvbook/threads.html
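A small experiment illustrating the quoted point: the callback form of `crypto.pbkdf2` is one of the Node APIs documented as running on the libuv thread pool, so a timer scheduled on the main thread fires while the hash is still being computed. The iteration count is arbitrary, chosen only to make the hash take a noticeable amount of time.

```javascript
const crypto = require('node:crypto');

// Resolves with the order in which the two events happened.
function demoThreadPool() {
  return new Promise((resolve, reject) => {
    const events = [];
    // Async pbkdf2: the hashing runs on a libuv thread-pool thread.
    crypto.pbkdf2('secret', 'salt', 500000, 64, 'sha512', (err) => {
      if (err) return reject(err);
      events.push('pbkdf2 done');
      resolve(events);
    });
    // The main JavaScript thread is still free, so this timer runs first.
    setTimeout(() => events.push('timer fired'), 0);
  });
}

demoThreadPool().then((events) => console.log(events));
```

If the call is replaced with `crypto.pbkdf2Sync`, the hash runs on the main thread and the timer cannot fire until it returns.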
Node.js includes synchronous functions to stay similar to other common languages, but they should never, never, never be used!
Node.js is asynchronous by nature; it's pure JavaScript, and JavaScript is synonymous with closures and callbacks. If you want to write synchronous code, perhaps you should try another scripting language like Python.
There's an excellent module called async that eases the pain of nested callbacks. So why should I use synchronous code? That would be silly: if I use synchronous code, I lose all the benefits that Node provides. The only exception is CLI apps, but even there I prefer to write all the code asynchronously. It's not really hard.
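The async module is one way to flatten nested callbacks. A sketch of the same idea using the promise-based fs API with `async/await` (a swapped-in technique built into modern Node, not the async module) is below; the two file names are hypothetical and created on the fly.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Setup (synchronous is fine: it runs once, before any real work).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cb-demo-'));
const a = path.join(dir, 'a.txt');
const b = path.join(dir, 'b.txt');
fs.writeFileSync(a, 'first');
fs.writeFileSync(b, 'second');

// Callback style: each step nests inside the previous one.
function readBothCallbacks(done) {
  fs.readFile(a, 'utf8', (err, first) => {
    if (err) return done(err);
    fs.readFile(b, 'utf8', (err2, second) => {
      if (err2) return done(err2);
      done(null, `${first} ${second}`);
    });
  });
}

// Same logic with async/await: still non-blocking, but reads top to bottom.
async function readBothAwait() {
  const first = await fs.promises.readFile(a, 'utf8');
  const second = await fs.promises.readFile(b, 'utf8');
  return `${first} ${second}`;
}

readBothCallbacks((err, text) => console.log(err || text));
readBothAwait().then(console.log);
```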
At some level, a synchronous operation has to occur. Node.js puts those in threads so that the whole server isn't blocked.
As I understand it, the libuv thread pool is used for only a few things, one of them being I/O that is blocking by nature. This includes file system operations (most of the fs module), including their async variants.
Given that both network I/O and async file I/O are OS-dependent AND non-blocking in nature, why is the former handled by the event loop but the latter handled by the thread pool?
As I understand it, this comes down to the libuv implementation, which has chosen not to use OS-native asynchronous file I/O across all the different operating systems it supports. Instead, it uses the thread pool to present an asynchronous interface over blocking file I/O performed inside the pool threads.
This article discusses some of the possible reasons to avoid OS-level asynchronous file I/O on Windows and could be part of the reason that libuv didn't choose to go that way.
And this note from Microsoft discusses some situations where, even when you use a Microsoft asynchronous API, the call ends up being synchronous. In other words, if you want to guarantee that it's always asynchronous, you have to hide it in an OS thread like libuv does.
As for networking interfaces, there is better uniformity of asynchronous support across the various OS platforms that libuv has chosen to rely on.
Node.js addons can be invoked in two ways: synchronously and asynchronously.
Since both ways are possible, I guess the synchronous way must be more advantageous in some situations. What are those situations?
Pretty much the ONLY time I use the synchronous version of an IO function is in the startup phase of the server. It simplifies the startup code, but does not impact the ultimate scalability or performance of the server. For example, the built-in require() is synchronous for this reason.
During runtime, when processing a request, you pretty much never want to use a synchronous function if there is an asynchronous alternative, because it significantly reduces the scalability and performance of your server. Always using the async version can add some coding complexity, but that extra complexity is what gives Node.js its performance and scalability.
We have an app that uses server-side rendering for SEO purposes using EJS templating.
I am well-versed in Node.js and know that it's probably possible to tap into the Node.js thread pool for asynchronous I/O for whatever purpose you want, whether that's a good idea or a bad one. What I am wondering now is whether it is possible to run ejs.render() or res.render() on a thread-pool thread instead of the main thread in Node.js.
We are doing a lot of heavy computational lifting in the render functions and we definitely want that off the main thread, otherwise we will be paying $$$ for more servers.
Is it just the rendering that concerns you? There are other template engines that should produce better results, and since template rendering should be an idempotent operation, you could additionally distribute it across a cluster.
V8 compiles your code to machine code, and if you're not hitting any deoptimizations or getting stalled by the garbage collector, I believe you should be in the neighborhood of your network I/O limits. I would definitely recommend trying other template engines, adding a caching HTTP reverse proxy in front, and running some benchmarks first.
EJS is known to be synchronous, and that's not going to change, so basically it's an inefficient rendering engine for Node.js since it blocks the JS thread whenever it renders a view, which degrades your overall throughput, especially if your rendering is CPU heavy.
You should definitely think about some other options. E.g. https://github.com/ericf/express-handlebars
If you really have CPU-heavy computation in your web server, then Node.js is definitely not the right tool for the job anyway. There are much better servers for multi-threading and parallel processing. You could set up Node as a controller that forwards your CPU-heavy requests to a backend service/server that does the heavy lifting.
It would be helpful to see what kind of computation you are doing during render to provide a better answer.
Tapping into the thread pool (which is handled by libuv) would probably be a bad idea, but it is possible, of course. You just need some C++ skills and the uv_queue_work() function from libuv to schedule work on a worker thread.
I have experimented with building a scripting engine that runs in a forked process (see Node's child_process module). I find that an attractive approach for implementing rendering engines. Yes, there are issues with passing parameters (POST/GET query strings, session state, etc.), but they are easy to deal with, especially if you use fork (as opposed to exec or spawn), which provides standard messaging methods for communication between the child and the parent.
The only overhead is starting the additional instance of Node (the rendering engine itself). If you are doing extensive computation in the scripting engine, this constant, one-time-per-request cost of forking a new process will be minor compared to the time taken to render.
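Going through `uv_queue_work()` means writing a C++ addon. A JavaScript-only alternative, swapped in here and not what this answer describes, is Node's `worker_threads` module, which runs JavaScript on separate threads. A minimal sketch, assuming a hypothetical `heavyRender` function standing in for the CPU-heavy part of rendering (EJS itself is not used):

```javascript
const { Worker } = require('node:worker_threads');

// Code run inside the worker. heavyRender is a hypothetical stand-in
// for CPU-heavy template rendering; it is not part of EJS.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  function heavyRender(data) {
    let checksum = 0;
    for (let i = 0; i < 5e6; i++) checksum = (checksum + i * data.n) % 1000003;
    return '<p>' + data.title + ' (' + checksum + ')</p>';
  }
  parentPort.postMessage(heavyRender(workerData));
`;

// Runs one render on a separate thread and resolves with the HTML string;
// the main thread keeps serving requests meanwhile.
function renderInWorker(data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: data });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

renderInWorker({ title: 'Hello', n: 7 }).then((html) => console.log(html));
```

Starting a worker per render has a startup cost; a real setup would more likely keep a small pool of long-lived workers.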
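A minimal sketch of the fork-based approach. The child script is written to a temporary file only to keep the example in one piece (normally it would be its own module, e.g. a hypothetical `render-child.js`), and the parameter shape is made up. `fork()` sets up an IPC channel, so the parent and child exchange objects with `send()` and the `'message'` event.

```javascript
const { fork } = require('node:child_process');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Child script: receives request parameters, "renders", replies, exits.
const childPath = path.join(os.tmpdir(), 'render-child-demo.js');
fs.writeFileSync(childPath, `
  process.once('message', (params) => {
    const html = '<h1>' + params.title + '</h1><p>' + params.query.q + '</p>';
    process.send({ html }, () => process.exit(0));
  });
`);

// Forks one child per render, as described above, and resolves with its HTML.
function renderInChild(params) {
  return new Promise((resolve, reject) => {
    // execArgv: [] so the child does not inherit the parent's Node flags.
    const child = fork(childPath, [], { execArgv: [] });
    child.once('message', (msg) => resolve(msg.html));
    child.once('error', reject);
    child.send(params);
  });
}

renderInChild({ title: 'Search', query: { q: 'node' } })
  .then((html) => console.log(html));
```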
If EJS rendering blocks the main node thread, then that alone is sufficient reason NOT to use it if you are doing any significant computation during rendering.
As I've done more research into web server software, I've begun to question whether Apache's thread/process-based method is the way to go, compared with the asynchronous request handling provided by servers like Nginx and Lighttpd, which tend to scale better under heavier loads.
I understand there are many other differences between these latter two and Apache. My question is under what circumstances would I pick a thread/process based method over the asynchronous handling.
Are there any features/technologies that I can't use with an asynchronous method (or would function poorly/not as well)?
What situations would cause the performance of an asynchronous method to perform worse than a thread/process based approach? Are these common or rare cases, and how big is the difference?
Are there any other factors I should take into consideration when comparing the two? Keep in mind I'm focusing mainly on the thread/process based method vs. asynchronous, not any particular server software which happens to utilize one of these methods. These concerns might be difficulty of managing/debugging, security issues, etc.
This is old, but worth answering. Let's first start by saying how each model works.
In the threaded model, a request comes in to a handler, the handler spawns a new OS thread to handle that request, and all work for that request happens in that thread until a response is sent and the thread ends. This model supports as many concurrent requests as your server can spawn threads (but threads can be somewhat heavyweight).
With async, a request comes in to a handler, but instead of creating a thread to deal with it, the handler adds the connection to what's known as an event loop. The event loop listens for data/state changes on the connection and fires callbacks each time "something" happens. Once the connection is added to the event loop, the handler immediately goes back to listening for new connections. This allows you to have many (sometimes 100K) concurrent connections.
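A minimal sketch of the evented model in Node, with an arbitrary client count: a single JavaScript thread runs a TCP echo server and serves all the simultaneous connections, and each `'data'` callback does almost no work.

```javascript
const net = require('node:net');

// Evented echo server: no thread per connection. The event loop invokes
// the 'data' handler whenever any socket has bytes to read.
const server = net.createServer((socket) => {
  socket.on('data', (chunk) => socket.write(chunk));
  socket.on('error', () => socket.destroy()); // ignore client resets
});

// Opens `count` connections at once; resolves with every echoed reply.
function runClients(port, count) {
  const clients = [];
  for (let i = 0; i < count; i++) {
    clients.push(new Promise((resolve, reject) => {
      const socket = net.connect(port, '127.0.0.1', () => socket.write(`msg${i}`));
      socket.once('data', (reply) => {
        socket.end();
        resolve(reply.toString());
      });
      socket.once('error', reject);
    }));
  }
  return Promise.all(clients);
}

// Port 0 lets the OS pick a free port.
server.listen(0, '127.0.0.1', async () => {
  const replies = await runClients(server.address().port, 100);
  console.log(`${replies.length} echoes served by one thread`);
  server.close();
});
```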
Are there any features/technologies that I can't use with an asynchronous method (or would function poorly/not as well)?
Yes, when you're doing number crunching. The architecture of an async (or "evented") system makes it great at passing data around but not at processing data. It can handle thousands of concurrent operations, but because it runs on only one OS thread, the callbacks it fires need to do as little as possible to get the most throughput. If one of your callbacks does some number crunching that takes 5 seconds, your entire server is frozen for 5 seconds until that operation completes. The idea is to get data, send it where it's going (database, API, etc.), and send a response, all with minimal processing.
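The freezing effect is easy to reproduce in Node. In this sketch a timer due after 10 ms cannot run while a synchronous busy loop (a stand-in for number crunching, shortened to 200 ms instead of 5 seconds) holds the only JavaScript thread; any pending network callbacks would be stuck the same way.

```javascript
// Busy-waits for `ms` milliseconds, a stand-in for CPU-heavy work.
function crunch(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Measures how late a 10 ms timer fires while the main thread is busy.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), 10);
    crunch(blockMs); // nothing else on the event loop can run meanwhile
  });
}

measureTimerDelay(200).then((elapsed) => {
  console.log(`10 ms timer fired after ${elapsed} ms`);
});
```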
Async is good for network I/O: passing data between multiple sources/destinations (and also user interfaces, but that's beyond this post).
What situations would cause the performance of an asynchronous method to perform worse than a thread/process based approach? Are these common or rare cases, and how big is the difference?
See above: any time you're doing more CPU work than network I/O, you should switch to a threaded model. However, there are architectural workarounds. For instance, you could have an async app that, whenever it needs to do real work, sends a job to a worker queue. However, if every request requires CPU processing, that architecture is overkill and you might as well just use a threaded server.
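A minimal in-process sketch of the "send it to a worker queue" workaround, assuming a hypothetical `JobQueue` class with a fixed concurrency limit and using Node's `worker_threads` as the workers. In production this role is usually played by an external queue and separate worker processes; the point here is only that the async side enqueues work and never crunches numbers itself.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body: a hypothetical CPU-heavy job (sum of squares from 1 to n).
const jobSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let total = 0;
  for (let i = 1; i <= workerData; i++) total += i * i;
  parentPort.postMessage(total);
`;

// Hypothetical job queue: callers only enqueue; at most `limit`
// worker threads run jobs at any one time.
class JobQueue {
  constructor(limit) {
    this.limit = limit;
    this.running = 0;
    this.pending = [];
  }

  submit(n) {
    return new Promise((resolve, reject) => {
      this.pending.push({ n, resolve, reject });
      this.drain();
    });
  }

  drain() {
    while (this.running < this.limit && this.pending.length > 0) {
      const job = this.pending.shift();
      this.running++;
      const worker = new Worker(jobSource, { eval: true, workerData: job.n });
      worker.once('message', job.resolve);
      worker.once('error', job.reject);
      worker.once('exit', (code) => {
        if (code !== 0) job.reject(new Error(`job exited with code ${code}`));
        this.running--;
        this.drain(); // start the next queued job, if any
      });
    }
  }
}

const queue = new JobQueue(2);
Promise.all([10, 100, 1000].map((n) => queue.submit(n)))
  .then((results) => console.log(results));
```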
Are there any other factors I should take into consideration when comparing the two? Keep in mind I'm focusing mainly on the thread/process based method vs. asynchronous, not any particular server software which happens to utilize one of these methods. These concerns might be difficulty of managing/debugging, security issues, etc.
Programming in async is generally more complicated than threaded programming. That said, if you're not doing the programming yourself (i.e., you're choosing between nginx and Apache), I usually recommend going async (nginx), because you'll generally be able to squeeze more juice out of your server that way. I'm always in favor of using as much async in the stack as possible.
That said, if you're programming an app and trying to decide between a threaded and an async model, you have to take developer time into account. Unless you're using a language that has green threads over an event loop (like Scheme), expect to tear your hair out quite a bit over rogue exceptions crashing your entire app, and in general over wrapping your head around CPS/using callbacks for everything. Futures/promises are your friend, but they are only a band-aid to make async nicer.
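To make the "rogue exceptions" point concrete in Node: a `try/catch` around code that schedules a callback does not catch an error thrown later inside that callback. The error surfaces as an `uncaughtException`, which by default crashes the process. With promises and `async/await`, the same error can be caught normally. The `uncaughtException` listener below exists only so the sketch can report the escape instead of crashing.

```javascript
// Callback style: the throw happens later, outside the try block.
function callbackStyle() {
  return new Promise((resolve) => {
    process.once('uncaughtException', (err) => resolve(`escaped: ${err.message}`));
    try {
      setTimeout(() => { throw new Error('boom'); }, 0);
    } catch (err) {
      resolve(`caught: ${err.message}`); // never reached
    }
  });
}

// Promise style: the rejection flows back to the awaiting try/catch.
async function promiseStyle() {
  try {
    await new Promise((_, reject) => setTimeout(() => reject(new Error('boom')), 0));
  } catch (err) {
    return `caught: ${err.message}`;
  }
  return 'not caught';
}

callbackStyle()
  .then((r) => console.log(r)) // escaped: boom
  .then(promiseStyle)
  .then((r) => console.log(r)); // caught: boom
```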
TL;DR
Async, when used in a server, can squeeze (a lot) more concurrent operations than threading if you're doing network IO and nothing else.
If you're doing any kind of number crunching, either use a threaded app server or use an async app with a background queuing system.
Async is a lot harder to program in unless your language supports "fake" threading on top of it (i.e., green threads). Once you get past the initial hump, you're generally fine. If you don't have green threads, use promises.
If you have the choice between threaded and async as a component in your stack (apache vs nginx), and they provide the exact same features, slightly favor async. Don't just pick it because you think it will make everything 20x faster though.
Processes have several advantages compared to threads and async models related to security and reliability. Most websites don't need these particular advantages, but sometimes they're indispensable.
Security: you can run your worker processes in a sandbox, as a low-privileged user, and handle only one request per worker process. This mitigates some kinds of security vulnerabilities: even if an attacker takes over an entire worker process, as long as you sandboxed it tightly based on request metadata (i.e., it doesn't have write access to all your data), it can't harm system stability or affect the responses to other requests.
Security #2: sometimes you need to sandbox untrusted code, or to enforce segregation between different code or different requests, and the only way to do this is with a separate one-shot process. (Think running user-provided code.)
Reliability: memory leaks and memory corruption are much less severe if you teardown and replace worker processes regularly (or for each request).
It's easy to enforce hard limits on CPU time, disk and network quota, etc. spent on handling a user request in a separate process. Even if the request-handling code goes into an infinite loop, the master process (or the OS) can enforce a timeout.
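A minimal sketch of the timeout point, using Node's `child_process` purely for illustration: the parent spawns a Node process running an endless loop and kills it after an arbitrary 200 ms deadline. Because the runaway code lives in another process, the parent stays responsive and can enforce the limit. CPU, disk, and network quotas would need OS-level facilities and are not shown.

```javascript
const { spawn } = require('node:child_process');

// Runs `code` in a separate Node process and kills it after `timeoutMs`.
// Resolves with how the child ended.
function runWithTimeout(code, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', code], { stdio: 'ignore' });
    const timer = setTimeout(() => child.kill('SIGKILL'), timeoutMs);
    child.once('error', reject);
    child.once('exit', (exitCode, signal) => {
      clearTimeout(timer);
      resolve({ exitCode, signal, timedOut: child.killed });
    });
  });
}

// An infinite loop, standing in for a runaway request handler.
runWithTimeout('for (;;) {}', 200).then((result) => console.log(result));
```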
When should I use asynchronous operations in boost::asio instead of synchronous operations in separate threads?
Does the Rationale section help?
Most programs interact with the outside world in some way, whether it be via a file, a network, a serial cable, or the console. Sometimes, as is the case with networking, individual I/O operations can take a long time to complete. This poses particular challenges to application development.
Boost.Asio provides the tools to manage these long running operations, without requiring programs to use concurrency models based on threads and explicit locking.
I would strongly urge you to use an asynchronous approach whenever possible. An asynchronous call doesn't necessarily create a thread, so by sticking with asynchronous operations you may reduce the overhead associated with threads. In addition, threads are usually harder to develop and maintain.
Hope it helps.