It's several times harder to program with continuations (callbacks) than in a model of straight sequential execution. Can Node.js make blocking calls?
Yes, it can. For example, you can read a file with fs.readFileSync() rather than fs.readFile(). Core modules usually provide an xxxSync variant for operations that have a synchronous/blocking form.
But you should NOT use the sync methods very often. Remember that Node.js uses a single thread of execution for JavaScript code. If you block this thread, you block it for everybody (unlike typical C#/Java servers, where each request is handled on its own thread).
If the asynchronous approach is too much for you, you might want to use another platform (Ruby, Python, PHP).
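To make the difference between the two calls concrete, here is a minimal sketch. The scratch file name and the events array are assumptions made up for the illustration; only fs.readFileSync() and fs.readFile() come from the answer above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, created only so the example is self-contained.
const file = path.join(os.tmpdir(), `sync-vs-async-${process.pid}.txt`);
fs.writeFileSync(file, 'hello');

const events = [];

// Blocking: nothing else runs on the JavaScript thread until this returns.
const text = fs.readFileSync(file, 'utf8');
events.push(`sync read: ${text}`);

// Non-blocking: the call returns at once; the callback runs later,
// after the currently executing code has finished.
const asyncDone = new Promise((resolve, reject) => {
  fs.readFile(file, 'utf8', (err, data) => {
    if (err) return reject(err);
    events.push(`async read: ${data}`);
    resolve(events);
  });
});
events.push('readFile() returned');

asyncDone.then((log) => {
  console.log(log);
  fs.unlinkSync(file); // clean up the scratch file
});
```

The log shows 'readFile() returned' before 'async read: hello': the asynchronous call hands the work off and lets the thread continue, while the synchronous call holds the thread until the data is there.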
Yes, Node.js 0.11.x can do blocking calls inside of a generator. By "blocking call" I mean stopping execution of the current function for a while. Look at the co library.
This is the only recommended way to do blocking calls.
Other than that, you can look at fibers, but they need to be used carefully, and not in general-purpose libraries.
There are also the *Sync calls mentioned before, but please avoid them entirely unless you're writing a one-liner.
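The generator idea can be sketched roughly as follows. The run() helper and the delay() function are hypothetical names written for this illustration; this is not the co library's code or API, just the underlying technique of resuming a generator whenever a yielded promise settles.

```javascript
// Hypothetical helper: drives a generator that yields promises.
// A sketch of the idea only, not the co library's implementation.
function run(generatorFn) {
  return new Promise((resolve, reject) => {
    const gen = generatorFn();

    function step(method, value) {
      let result;
      try {
        result = gen[method](value); // resume the generator
      } catch (err) {
        return reject(err);
      }
      if (result.done) return resolve(result.value);
      // Wait for the yielded promise without blocking the thread,
      // then resume the generator with its outcome.
      Promise.resolve(result.value).then(
        (v) => step('next', v),
        (e) => step('throw', e)
      );
    }

    step('next', undefined);
  });
}

// Hypothetical asynchronous operation, simulated with a timer.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(resolve, ms, value));

const finished = run(function* () {
  const a = yield delay(10, 1); // the function pauses here...
  const b = yield delay(10, 2); // ...and here, while the event loop keeps running
  return a + b;
});

finished.then((sum) => console.log(sum));
```

The generator body reads like sequential code, yet only that function is paused at each yield; the thread itself is never blocked.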
I was going through the Node.js documentation and couldn't understand this passage:
A Node.js app runs in a single process, without creating a new thread for every request. Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking and generally, libraries in Node.js are written using non-blocking paradigms, making blocking behavior the exception rather than the norm.
Source: Introduction to Node.js
I couldn't understand specifically:
[...] Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking [...]
Does it simply mean that it has built-in functionality for working asynchronously?
If not, then what is this set of asynchronous I/O primitives? If anyone could provide a link for better understanding or for getting started with Node.js, that would be really great.
P.S.: I have practical experience with Node.js, so I understand how its code will work but not why it will work. I want to understand the theoretical part so that I know what is actually going on in the background.
Does it simply mean that it has built-in functionality for working asynchronously?
Yes, that's basically what it means.
In a "traditional" one-thread-per-connection* model you accept a connection and then hand off the handling of that request to a thread (either a freshly started one or from a pool, doesn't really change much) and do all work related to that connection on that one thread, including sending the response.
This can easily be done with synchronous/blocking I/O: have a read method that simply returns the read bytes and a write method that blocks until the writing is done.
This does, however, mean that the thread handling that request cannot do anything else, and that you need many threads to handle many concurrent connections/requests. And since I/O operations take a very long time compared with memory access and computation, most of these threads will spend most of their time waiting for one I/O operation or another.
Having asynchronous I/O and an event-based core architecture means that when an I/O operation is initiated, the CPU can immediately go on to process whatever needs to be done next, which will likely be related to an entirely different request.
Therefore one can handle many more requests on a single thread.
The "primitives" just means that basic I/O operations such as "read bytes" and "write bytes" to/from network connections or files are provided as asynchronous operations and higher-level operations need to be built on top of those (again in an asynchronous way, to keep the benefits).
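As a rough illustration of building a higher-level operation on top of an asynchronous primitive, the sketch below layers a hypothetical readJson() helper over fs.promises.readFile(). The helper name, the scratch file, and its contents are assumptions made up for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical higher-level operation built on the asynchronous
// "read bytes from a file" primitive. It stays asynchronous itself.
async function readJson(file) {
  const text = await fs.promises.readFile(file, 'utf8');
  return JSON.parse(text);
}

// Hypothetical scratch file so the example is self-contained.
const file = path.join(os.tmpdir(), `primitives-demo-${process.pid}.json`);

const loaded = fs.promises
  .writeFile(file, JSON.stringify({ port: 8080 }))
  .then(() => readJson(file))
  .finally(() => fs.promises.unlink(file).catch(() => {})); // clean up

loaded.then((config) => console.log(config.port));
```

Because readJson() returns a promise rather than blocking, anything built on top of it keeps the benefit of not tying up the thread.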
As a side note: many other programming environments have either had asynchronous I/O APIs for a long time or have gotten them in recent years. The one thing that sets Node.js apart is that it's the default option: if you're reading from a socket or a file, then doing it asynchronously is what is "normal" and blocking calls are the big exception. This means that the entire ecosystem surrounding Node.js (i.e. almost all third-party libraries) works with that assumption in mind and is also written in that same manner.
So while Java, for example, has asynchronous I/O, you lose that advantage as soon as you use any I/O-related libraries that only support blocking I/O.
* I use connection/request interchangeably in this answer, under the assumption that each connection contains a single request. That assumption is usually wrong these days (most common protocols allow multiple request/response pairs in a single connection), but handling multiple requests on a single connection doesn't fundamentally change anything about this answer.
It means Node.js doesn't halt on input/output operations. Suppose a task has a blocking condition, e.g. "if the space key is pressed, do this" or "keep taking input until Esc is pressed". Because Node.js is single-threaded, waiting synchronously for that condition would stop all other operations until it finished. Asynchronous I/O lets the application carry on with other tasks in the meantime. That's also why we use await to get the value out of a promise in an async function: the function is suspended at the await line, and once the data is ready, execution resumes there with the value.
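A small sketch of that last point, assuming a hypothetical fetchValue() operation simulated with a timer: await suspends only the async function that contains it, and the rest of the program keeps going.

```javascript
const events = [];

// Hypothetical slow operation, simulated with a timer.
const fetchValue = () =>
  new Promise((resolve) => setTimeout(() => resolve(42), 20));

async function task() {
  events.push('task: waiting');
  const value = await fetchValue(); // only this function is suspended
  events.push(`task: got ${value}`);
  return events;
}

const done = task();
// This line runs while task() is still waiting for its value.
events.push('main: still running');

done.then((log) => console.log(log));
```

The recorded order is 'task: waiting', then 'main: still running', then 'task: got 42'.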
I'm writing an event-driven program with the event scheduling written in C. The program uses Python for extension modules. I would like to allow extension modules to use async/await syntax to implement coroutines. Coroutines will just interact with parts of my program, no IO is involved. My C scheduler is single-threaded and I need coroutines to execute in its thread.
In a pure Python program, I would just use asyncio as is, and let my program use its event loop to drive all events. This is however not an option; my event loop needs to serve millions of C-based events per second and I cannot afford Python's overheads.
I tried to write my own event loop implementation that delegates all scheduling to my C scheduler. I tried a few approaches:
Re-implement EventLoop, Future, Task etc to imitate how asyncio works (minus IO), so that call_soon delegates scheduling to my C event loop. This is safe, but requires a bit of work and my implementation will always be inferior to asyncio when it comes to documentation, debugging support, intricate semantic details, and correctness/test coverage.
I can use vanilla Task, Future etc from asyncio, and only create a custom implementation of AbstractEventLoop, delegating scheduling to my C event loop in the same way. This is pretty straightforward, but I can see that the vanilla EventLoop accesses non-obvious internals (task._source_traceback, _asyncgen_finalizer_hook, _set_running_loop), so my implementation is still second class. I also have to rely on the undocumented Handle._run to actually invoke callbacks.
Things appeared to get simpler and better if I subclassed from BaseEventLoop instead of AbstractEventLoop (but docs say I shouldn't do that). I still need Handle._run, though.
I could spawn a separate thread that calls run_forever on a vanilla asyncio.DefaultEventLoop and run all my coroutines there, but coroutines depend on my program's extension API, which does not support concurrent calls. So I must somehow make DefaultEventLoop pause my C event loop while calling Handle._run(). I don't see a reasonable way to achieve that.
Any ideas on how to best do this? How did others solve this problem?
I found that trio, a third-party alternative to asyncio, provides explicit support for integration with alien event loops through something called guest mode. Solves my problem!
The documentation of Node.js describes the so-called phases of its underlying event loop.
It explicitly states also that idle and prepare phases are only used internally.
Since the event loop of Node.js is the one from libuv, those phases are presumably mapped onto the idle and prepare handles of libuv.
They would allow finer granularity when organizing the tasks in a program. In particular, they are the only way to schedule something between the execution of the I/O callbacks and the poll phase.
Anyway, they are not exported from the underlying environment.
Why have those phases been forbidden, leaving users with an apparently poorer event loop than the one libuv offers?
Is there any other way to schedule tasks the way mentioned above?
Side note: it's just curiosity.
I have worked with both libuv and Node.js and noticed this, so I want to know whether there is a technical reason for it or whether that is simply how it was designed, for no particular reason.
I don't think there is a specific reason to "forbid" them. Moreover, they are not really forbidden, they are just not exposed. You could create a Node addon which allows you to create idle and prepare handles and there would be no problem at all. There are some things you must be aware of:
Idle handles have a terrible name: they don't run when the loop is actually idle. They run once per loop iteration, after the timers, and if any idle handle is active, the loop polls for I/O with a zero timeout instead of blocking. So they can be dangerous, because the CPU will spin if you don't stop them.
Callbacks registered with process.nextTick are called when the C++ <-> JS boundary is crossed (see calls to MakeCallback) so i/o callbacks could be deferred and run a bit later. If you exposed prepare handles to JS you would use MakeCallback in the C++ code, so some of the process.nextTick callbacks would also be called alongside your prepare callbacks.
As a general note: idle, check and prepare handles were more or less inherited from libev (which libuv used to use internally). Check and prepare handles can be useful when embedding libuv with other libraries, while idle handles are a bit weird, as I mentioned above. Also, libuv follows its own path these days, so not everything libuv has will end up exposed in Node land.
You could ask the reverse question: why do you need the idle phase, for example, to be exposed? You can just use setImmediate().
Also, why do you want to execute something in between I/O callbacks and polling phase, as you don't control explicitly those things anyways?
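For reference, here is a small sketch of what the scheduling tools that Node does expose look like from inside an I/O callback. The fs.stat() call on the temp directory is only an assumption used to get into an I/O callback; any asynchronous I/O call would do.

```javascript
const fs = require('fs');
const os = require('os');

const order = [];

const finished = new Promise((resolve, reject) => {
  // Any asynchronous I/O call works; stat on the temp dir is just convenient.
  fs.stat(os.tmpdir(), (err) => {
    if (err) return reject(err);
    order.push('I/O callback');

    setTimeout(() => {
      order.push('setTimeout');
      resolve(order);
    }, 0);
    setImmediate(() => order.push('setImmediate'));
    process.nextTick(() => order.push('nextTick'));
  });
});

finished.then((log) => console.log(log.join(' -> ')));
```

When scheduled from an I/O callback, the process.nextTick callback runs as soon as the current callback returns, the setImmediate callback runs next, and the zero-delay timer fires on a later pass of the loop.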
I've read that Node.js uses both threads and an event loop.
I'm curious how it knows how to treat a callback. Is it specified by the EventEmitter (and does the engineer know whether it is going to be blocking or not)?
Or is it the core itself that chooses at runtime?
If it's the latter, how does it detect whether something has to run async or threaded?
I've already read a lot of resources but didn't find anything about this. I'm reading the source code, but it's hard since it has been a long time since I last coded in C++.
Thanks
Your JavaScript code always runs in a single thread. That's because the V8 JavaScript engine is not thread-safe.
However, as an implementation detail of some of the C++ code, there may be threads. For example, suppose you write some JavaScript code that connects to a database. Your JavaScript code will of course be async, like any good Node code. But async coding is very uncommon in the C/C++ world, so the database vendor probably didn't write an async C/C++ API.
So when someone is writing a Node package for database access, they have to write a shim that adapts between the "blocking" C++ behavior and the "non-blocking, event-driven" Node behavior. When you call, say, a "connect" method, that goes to C++ code that spawns a new thread, and that thread issues a (blocking) "connect" call to the database, which blocks the thread until the connection is done. Then the C++ code will communicate the "connection done" back to the event queue, and the next time the main (JavaScript) thread polls the event queue, your callback will fire.
So yes, there are threads, but their use should be completely transparent to you. When you're writing Node.js code in JavaScript, you don't need to worry about threads -- you just care that things happen when they're supposed to. Package authors may use threads, but that's purely an implementation detail and you should never have to worry about it. Your JavaScript code never explicitly uses threads.
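One place where this can be observed from plain JavaScript is the built-in crypto module: the asynchronous crypto.pbkdf2() does its hashing on libuv's thread pool rather than on the JavaScript thread. The password, salt, and iteration count below are arbitrary values chosen for the sketch.

```javascript
const crypto = require('crypto');

const events = [];

// The key derivation runs on a thread-pool thread; the JavaScript thread
// only sees the callback once the result has been queued for it.
const hashed = new Promise((resolve, reject) => {
  crypto.pbkdf2('secret', 'salt', 100000, 32, 'sha256', (err, key) => {
    if (err) return reject(err);
    events.push('callback: key derived');
    resolve({ events, key });
  });
});

// Runs immediately, while the hashing is still in progress elsewhere.
events.push('main thread: free to do other work');

hashed.then(({ events: log, key }) => {
  console.log(log, key.length);
});
```

Nothing in the JavaScript mentions a thread; the code just supplies a callback and carries on.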
I was going through the details of Node.js and learned that it supports asynchronous programming even though it essentially provides a single-threaded model.
How is asynchronous programming handled in such cases? Is it that the runtime itself creates and manages threads, but the programmer cannot create threads explicitly? It would be great if someone could point me to some resources to learn about this.
Say it with me now: async programming does not necessarily mean multi-threaded.
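JavaScript is a single-threaded runtime: your code can't spawn threads that share its state, because the language has no thread primitives. (Newer Node.js versions do ship a worker_threads module, but each worker runs in its own isolated JavaScript environment.)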
Frank says it correctly (although obtusely). In English: there's a main event loop that handles things as they come into your app. So "handle this HTTP request" gets added to the event queue and is then handled by the event loop when appropriate.
When you call an async operation (a mysql db query, for example), node.js sends "hey, execute this query" to mysql. Since this query will take some time (milliseconds), node.js performs the query using the MySQL async library - getting back to the event loop and doing something else there while waiting for mysql to get back to us. Like handling that HTTP request.
Edit: By contrast, node.js could simply wait around (doing nothing) for mysql to get back to it. This is called a synchronous call. Imagine a restaurant, where your waiter submits your order to the cook, then sits down and twiddles his/her thumbs while the chef cooks. In a restaurant, like in a node.js program, such behavior is foolish - you have other customers who are hungry and need to be served. Thus you want to be as asynchronous as possible to make sure one waiter (or node.js process) is serving as many people as they can.
Edit done
Node.js communicates with mysql using C libraries, so technically those C libraries could spawn off threads, but inside Javascript you can't do anything with threads.
Ryan said it best: sync/async is orthogonal to single/multi-threaded. In both the single- and multi-threaded cases there is a main event loop that calls registered callbacks using the Reactor pattern. In the single-threaded case the callbacks are invoked sequentially on the main thread. In the multi-threaded case they are invoked on separate threads (typically using a thread pool). It is really a question of how much contention there will be: if all requests require synchronized access to a single data structure (say a list of subscribers), then the benefits of having multiple threads may be diminished. It's problem-dependent.
As for implementation, if a framework is single-threaded then it is likely using a poll/select system call, i.e. the OS is triggering the asynchronous events.
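The single-threaded case can be sketched as a toy reactor. The Reactor class and its event names are hypothetical; in a real framework the events would come from the OS via poll/select rather than from emit() calls in the program itself.

```javascript
// Hypothetical single-threaded reactor: handlers are registered per event
// type, and queued events are dispatched to them one at a time.
class Reactor {
  constructor() {
    this.handlers = new Map();
    this.queue = [];
  }

  on(type, handler) {
    if (!this.handlers.has(type)) this.handlers.set(type, []);
    this.handlers.get(type).push(handler);
  }

  emit(type, payload) {
    this.queue.push({ type, payload }); // enqueue only; nothing runs yet
  }

  run() {
    // Handlers may emit further events; those are processed in turn.
    while (this.queue.length > 0) {
      const { type, payload } = this.queue.shift();
      for (const handler of this.handlers.get(type) || []) {
        handler(payload);
      }
    }
  }
}

const log = [];
const reactor = new Reactor();
reactor.on('request', (id) => {
  log.push(`handling request ${id}`);
  reactor.emit('response', id);
});
reactor.on('response', (id) => log.push(`sent response ${id}`));

reactor.emit('request', 1);
reactor.emit('request', 2);
reactor.run();
console.log(log);
```

Both requests are handled before either response is sent, because each handler only enqueues follow-up work and returns; callbacks never run concurrently, so no locking is needed.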
To restate the waiter/chef analogy:
Your program is a waiter ("you") and the JavaScript runtime is a kitchen full of chefs doing the things you ask.
The interface between the waiter and the kitchen is mediated by queues so requests are not lost in instances of overcapacity.
So your program is assigned one thread of execution. You can only wait one table at a time. Each time you want to offload some work (like making the food/making a network request), you run to the kitchen and pin the order to a board (queue) for the chefs (runtime) to pick up when they have spare capacity. The chefs will let you know when the order is ready (they will call you back). In the meantime, you go wait another table (you are not blocked by the kitchen).
So the accepted answer is misleading. The JavaScript runtime is definitionally multithreaded because I/O does not block your JavaScript program. As a waiter you can continue serving customers, while the kitchen cooks. That involves at least two threads of execution. The reality is that the runtime will maintain several threads of execution behind the scenes, in order to efficiently serve the single thread directly corresponding to your script.
By design, only one thread of execution is assigned to the synchronous running of your JavaScript program. This is a good thing because it makes your program easier to reason about than having to handle multiple threads of execution yourself. Don't worry: your JavaScript program can still get plenty complicated though!