I'm trying to connect to a 3rd party library, that has a function that can block. I would like to use it, but without blocking. Is it possible to wrap a blocking call that I don't have the control of, to make it async?
// calling this function will block the nodejs thread
blockingCall();
What I would like would be something like this.
// wrapper for the blocking call
var wrapper = wrapBlockingCall(blockingCall);
wrapper.on('complete', function() {});
Is this possible? Does this make sense?
There is no way to make blocking JavaScript code non-blocking in Node.js. The mechanism Node uses for its non-blocking behaviour is implemented in the C/C++ layer, which in turn is used only for I/O operations (reading from disk, networking, etc.).
In reality, every line of JavaScript your program uses will be executed one-by-one, because it is always executed on the same thread, no matter what you do.
The only option I see is to execute the offending code in a separate Node process using the built-in Child Process module. However, this carries a significant performance cost, and an even bigger one if the code needs to be executed frequently.
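A minimal sketch of the child-process idea, assuming the blocking work can be expressed as a script that a second Node process runs. The busy loop below is only a stand-in for the real third-party call, and runBlockingInChild is a hypothetical helper name, not part of any library:

```javascript
const { execFile } = require('child_process');

// Stand-in for the blocking third-party call: a 200 ms busy loop.
// It blocks the child process, not the parent.
const blockingScript = `
  const end = Date.now() + 200;
  while (Date.now() < end) {}
  process.stdout.write('done');
`;

// Hypothetical wrapper: runs the script in a separate Node process
// and resolves with whatever the child wrote to stdout.
function runBlockingInChild(script) {
  return new Promise((resolve, reject) => {
    execFile(process.execPath, ['-e', script], (err, stdout) => {
      if (err) reject(err);
      else resolve(stdout);
    });
  });
}

// The parent's event loop keeps turning while the child is busy.
let ticks = 0;
const timer = setInterval(() => { ticks += 1; }, 20);

const result = runBlockingInChild(blockingScript).then((out) => {
  clearInterval(timer);
  console.log('child said:', out, '| parent timer ticks meanwhile:', ticks);
  return { out, ticks };
});
```

Starting a process per call is expensive, which is the performance cost mentioned above; a long-lived child that receives requests over a message channel would amortize it.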
Note:
After reading the comments under your question, it seems that you are actually the author of the blocking function, which in turn calls a C API which performs blocking I/O. There are ways of calling C functions which would normally block in a manner which does not block the upper JavaScript layer.
While I am not a C expert, I think this is accomplished using the libuv library included in Node - have a look at the addons documentation for more info.
I'm pretty new to node and am trying to set up an express server. The server will handle various requests and, if any of them fail, call a common failure function that will send an e-mail. If I was doing this in something like Java I'd likely use something like a synchronized block and a boolean to allow the first entrance into the code to send the mail.
Is there anything like a synchronized block in Node? I believe node is single threaded and has a few helper threads to handle asynchronous/callback code. Is it at all possible that the same line of code could run at exactly the same time in Node?
Thanks!
Can the same line of Node.js code run at the same time? Is it at all possible that the same line of code could run at exactly the same time in Node?
No, it is not. Your Javascript in node.js is entirely single threaded. An event is pulled from the event queue. That calls a callback associated with that event. That callback runs until it returns. No other events can be processed until that first one returns. When it returns, the interpreter pulls the next event from the event queue and then calls the callback associated with it.
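A small sketch of that run-to-completion behaviour. The 50 ms busy loop is only an illustration of "a callback that has not returned yet"; the timer is due almost immediately, but its callback cannot run until the current code finishes:

```javascript
const order = [];

// Due after 0 ms, but it can only run once the current code has returned.
setTimeout(() => order.push('timer'), 0);

// Keep the thread busy for 50 ms; nothing else can run during this loop.
const end = Date.now() + 50;
while (Date.now() < end) {}

order.push('sync done');
// At this point `order` holds only 'sync done'; 'timer' is appended later,
// when the event loop gets control back.
```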
This does not mean that there are no concurrency issues in node.js. There can be, but they are not caused by code running at the same physical time and creating conflicting access to shared variables (as can happen in threaded languages like Java). Concurrency issues can be caused by the asynchronous nature of I/O in node.js. In the asynchronous case, you call an asynchronous function and pass it a callback (or expect a promise in return). Your code then continues on and returns to the interpreter. Some time later an event will occur inside node.js native code that will add something to the event queue. When the interpreter is free from running other Javascript, it will process that event and then call your callback, which will cause more of your code to run.
While all this is "in process", other events are free to run and other parts of your Javascript can run. So, the exposure to concurrency issues comes, not from simultaneous running of two pieces of code, but from one piece of your code running while another piece of your code is waiting for a callback to occur. Both pieces of code are "in process". They are not "running" at the same time, but both operations are waiting for something else to occur in order to complete. If these two operations access variables in ways that can conflict with each other, then you can have a concurrency issue in Javascript. Because none of this is pre-emptive (like threads in Java), it's all very predictable and is much easier to design for.
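To make the "two operations in process" case concrete, here is a sketch with a made-up shared variable (balance) and a made-up delay helper standing in for real I/O. Neither call runs at the same time as the other, yet they still conflict, because each reads the variable before waiting and writes it afterwards:

```javascript
// Hypothetical shared state and a stand-in for an asynchronous I/O call.
let balance = 100;
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withdraw(amount) {
  const current = balance;      // read shared state
  await delay(10);              // other callbacks may run while we wait
  balance = current - amount;   // write based on a possibly stale read
}

// Both calls read 100 before either one writes, so one update is lost:
// the final balance is 50, not the 20 a serialized run would produce.
const finished = Promise.all([withdraw(30), withdraw(50)]).then(() => {
  console.log('final balance:', balance);
  return balance;
});
```

Doing the read and the write in the same synchronous stretch of code (for example, after the wait) avoids the problem, since nothing can interleave inside it.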
Is there anything like a synchronized block in Node?
No, there is not. It is simply not needed in your Javascript code. If you wanted to protect something from some other asynchronous operation modifying it while your own asynchronous operation was waiting to complete, you can use simple flags or variables in your code. Because there's no preemption, a simple flag will work just fine.
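A sketch of the flag approach for the "send the failure e-mail only once" case from the question. sendMail here is a hypothetical stand-in for a real mail call, and reportFailure is an illustrative name, not anything from the asker's code:

```javascript
// Hypothetical stand-in for an asynchronous mail API.
function sendMail(message, callback) {
  setTimeout(() => callback(null), 10);
}

let failureMailSent = false;

// Returns true if this call started the e-mail, false if one was already sent.
function reportFailure(message) {
  if (failureMailSent) return false;
  // Set the flag synchronously, before any asynchronous work starts.
  // No other callback can run between the check above and this line.
  failureMailSent = true;
  sendMail(message, (err) => {
    if (err) failureMailSent = false; // allow a retry if sending failed
  });
  return true;
}

const first = reportFailure('request A failed');  // starts the e-mail
const second = reportFailure('request B failed'); // skipped, flag already set
```

The check and the assignment happen in one uninterrupted stretch of JavaScript, which is what makes a plain boolean sufficient here.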
I believe node is single threaded and has a few helper threads to handle asynchronous/callback code.
Node.js runs your Javascript as single threaded. Internally in its own native code, it does use threads in order to do its work. For example, the asynchronous file system access code internal to node.js uses threads for disk I/O. But these threads are only internal and the result of these threads is not to call Javascript directly, but to insert events in the event queue and all your Javascript is serialized through the event queue. Pull event from event queue, run callback associated with the event. Wait for that callback to return. Pull next event from the event queue, repeat...
The server will handle various requests and if any of them fail call a common failure function that will send e-mail. If I was doing this in something like Java I'd likely use something like a synchronized block and a boolean to allow the first entrance into the code to send the mail.
We'd really have to see what your code looks like to understand what exact problem you're trying to solve. I'd guess that you can just use a simple boolean (regular variable) in node.js, but we'd have to see your code to really understand what you're doing.
I've been reading a lot about the Event Loop, and I understand the abstraction provided whereby I can make an I/O request (let's use fs.readFile('foo.txt')) and just pass in a callback that will be executed once an event indicating completion of the file read is fired. However, what I do not understand is where the function that is doing the work of actually reading the file is being executed. Javascript is single-threaded, but there are two things happening at once: the execution of my node.js file and of some program/function actually reading data from the hard drive. Where does this second function take place in relation to node?
The Node event loop is truly single threaded. When we start up a program with Node, a single instance of the event loop is created and placed into one thread.
However, for some standard library function calls, the Node C++ side and libuv decide to do expensive work outside of the event loop entirely, so that it does not block the event loop. Instead they make use of something called a thread pool: a set of (by default) four threads that can be used for running expensive tasks. There are ONLY FOUR things that use this thread pool: DNS lookup, fs, crypto and zlib. Everything else executes in the main thread.
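As an illustration, the asynchronous form of crypto.pbkdf2 is one of the calls whose work runs on the thread pool. The password, salt and iteration count below are arbitrary example values; the interval timer only shows that the event loop stays free while the key is derived:

```javascript
const crypto = require('crypto');

// Counts event-loop turns while the key derivation runs elsewhere.
let ticks = 0;
const timer = setInterval(() => { ticks += 1; }, 5);

const hashed = new Promise((resolve, reject) => {
  // The derivation happens on a thread-pool thread; only this callback
  // runs on the main thread, once the result is ready.
  crypto.pbkdf2('secret', 'salt', 200000, 64, 'sha512', (err, key) => {
    clearInterval(timer);
    if (err) reject(err);
    else resolve(key);
  });
});

hashed.then((key) => {
  console.log('derived', key.length, 'bytes; timer ticks meanwhile:', ticks);
});
```

The synchronous variant, crypto.pbkdf2Sync, would do the same work on the main thread and hold up every other callback until it finished.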
"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js." via http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
It's like using setTimeout(function(){/*file reading code here*/},1000);. JavaScript can have several things pending side by side, such as three setInterval(function(){/*code to execute*/},1000); timers. Their callbacks still run one at a time on the same thread, so this only resembles multi-threading. And for actually reading from or writing to the hard drive in NodeJS, you can use:
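var fs=require("fs");
// Write text to a file; callback(err) runs once the write has finished.
function put_text(file,text,callback){
  fs.writeFile(file,text,callback);
}
// Read a file as UTF-8 text; callback(err,contents) receives the result.
function get_text(file,callback){
  fs.readFile(file,"utf8",callback);
}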
These can also be used for reading and writing to/from the hard drive using NodeJS.
I've been doing a fair amount of work with Node lately, trying to build a system which has certain characteristics, one of which is non-blocking / parallelism - a Node strong suit, as I understand it.
What I don't fully understand is when a separate thread is spun off to handle some processing. I'm pretty sure this happens on a function call/callback, but certainly not all of them.
In my specific case, it's an Express based app. At app start-up it does several things including instantiating a RabbitMQ based "bus", an object with a method which will write to the bus (objA) and object which will subscribe to the bus and process messages coming across it (objB).
objA will write to the bus inside an express callback
app.put((req,res) => {
objA.methodWhichWritesToBus();
});
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
Is that the only point at which this sort of thing happens? methodWhichWritesToBus is I/O intensive (it calls an Elasticsearch service on another box and brings back tens to hundreds of thousands of records) with lots of chained promises etc., but none of that gets split off, does it?
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Finally, are there ways to effect/force a method etc. to "run in the background"?
I've been noodling this, testing it, for awhile now but all on one machine so it's difficult to tell what's going on.
Who can clarify this for me?
Pre-answer: this is a topic best learned by going and reading, doing coding exercises to solidify your understanding, and working with the technology in a significant way. You're not going to "get it" based on a Q&A format. That said...
What I don't fully understand is when a separate thread is spun off to handle some processing.
Never, sort of. "Processing" as in the computation that happens in your javascript program, happens in the main event loop thread. End of story. However, waiting on I/O to come back from the OS is not considered "processing" so there are various queues managed by node and the OS to track pending I/O requests and invoke callbacks when data is ready. There are a handful of threads node uses internally to manage this stuff with the OS, but from your program's perspective, those threads are irrelevant. Your program can ask node to do some IO, then your program keeps running in parallel, and when the I/O is done, node will eventually invoke the callback in the main event loop and you can process the results.
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
You call it "asynchronously" and it happens whenever you do IO, including filesystem calls, networking, or child processes. Which is to say, quite a lot.
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Nope.
Finally, are there ways to effect/force a method etc. to "run in the background"?
Generally I/O is done asynchronously by default, so no you don't normally need to force anything to run in the background. It's baked into the node design by way of the node core APIs themselves. However, there are ways to delay synchronous processing to a future event loop using setImmediate, setTimeout, or process.nextTick. I explain these in some detail in my blog post setTimeout and friends.
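A short sketch of those three deferral calls. The only ordering relied on here is that synchronous code runs first and process.nextTick callbacks run before timers and immediates; the relative order of setTimeout(fn, 0) and setImmediate can vary when both are scheduled from the main module:

```javascript
const order = [];

setTimeout(() => order.push('setTimeout'), 0);   // a later event-loop turn
setImmediate(() => order.push('setImmediate'));  // after the current poll phase
process.nextTick(() => order.push('nextTick'));  // right after the current code returns

order.push('sync');

// Inspect the result once everything above has had a chance to run.
setTimeout(() => console.log(order.join(' -> ')), 20);
```

None of these moves work to another thread. They only postpone it, so a long computation deferred this way still blocks the event loop when it eventually runs.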
More precisely, all networking is asynchronous. End of story. Specifically, the networking APIs available in node core are all asynchronous, and there's simply no synchronous networking API available in node. For filesystem IO and child processes, there are both synchronous and asynchronous APIs, but the synchronous APIs must only be used under special limited circumstances, and if you don't know confidently that it's OK in this specific case to make a synchronous IO API call, you should use the asynchronous API so you don't break the lynchpin that makes node perform as it does.
Doesn't code take an efficiency hit by being synchronous? Why is coding synchronously a win?
I found these two links in doing some research: http://bjouhier.wordpress.com/2012/03/11/fibers-and-threads-in-node-js-what-for/, https://github.com/Sage/streamlinejs/
If the goal is to prevent spaghetti code, then clearly you can have asynchronous code, with streamline.js for example, that isn't a callback pyramid, right?
You have to distinguish two things here:
Synchronous functions like node's fs.readFileSync, fs.statSync, etc. All these functions have a Sync in their names (*). These functions are truly synchronous and blocking. If you call them, you block the event loop and you kill node's performance. You should only use these functions in your server's initialization script (or in command-line scripts).
Libraries and tools like fibers or streamline.js. These solutions allow you to write your code in sync-style but the code that you write with them will still execute asynchronously. They do not block the event loop.
(*) require is also blocking.
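To illustrate the first category, here is a sketch contrasting a Sync call with its asynchronous counterpart. The temporary file name is made up for the example; the Sync calls are the kind of thing that belongs in start-up code only:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Illustrative temp file so the example has something to read.
const file = path.join(os.tmpdir(), `sync-vs-async-${process.pid}.txt`);

// Blocking calls: nothing else runs until they return.
// Acceptable during initialization or in a command-line script.
fs.writeFileSync(file, 'hello');
const syncContents = fs.readFileSync(file, 'utf8');

// Non-blocking call: the read is handed off and the callback runs later,
// leaving the event loop free in the meantime.
const asyncContents = new Promise((resolve, reject) => {
  fs.readFile(file, 'utf8', (err, data) => {
    fs.unlink(file, () => {}); // best-effort cleanup of the temp file
    if (err) reject(err);
    else resolve(data);
  });
});
```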
Meteor uses fibers. Its code is written in sync-style but it is non-blocking.
The win is not on the performance side (these solutions have their own overhead so they may be marginally slower but they can also do better than raw callbacks on specific code patterns like caching). The win, and the reason why these solutions have been developed, is on the usability side: they let you write your code in sync-style, even if you are calling asynchronous functions.
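The same "sync-style but non-blocking" idea can be sketched with the language's own async/await syntax, which postdates this discussion and is not what fibers or streamline.js use internally. The delay helper and loadUser function are made up for the illustration:

```javascript
// Hypothetical asynchronous operation: resolves with `value` after `ms`.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Counts event-loop turns to show nothing is blocked while we "wait".
let ticks = 0;
const timer = setInterval(() => { ticks += 1; }, 5);

// Reads top to bottom like synchronous code, but each await
// hands control back to the event loop until the result arrives.
async function loadUser() {
  const id = await delay(30, 42);
  const name = await delay(30, `user-${id}`);
  return name;
}

const loaded = loadUser().then((name) => {
  clearInterval(timer);
  console.log(name, '| timer ticks while waiting:', ticks);
  return { name, ticks };
});
```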
Jan 25 2017 edit: I created 3 gists to illustrate non-blocking fibers:
fibers-does-not-block.js, fibers-sleep-sequential.js, fibers-sleep-parallel.js
The code is not "synchronous" when using something like streamlinejs. The actual code will still run asynchronously. It's not very pretty to write lots of anonymous callback functions; that's where these tools help.
Redis is very fast. For the most part, on my machine it is as fast as, say, native Javascript statements or function calls in node.js. It is easy/painless to write regular Javascript code in node.js because no callbacks are needed. I don't see why it should not be that easy to get/set key/value data in Redis using node.js.
Assuming node.js and Redis are on the same machine, are there any npm libraries out there that allow interacting with Redis on node.js using blocking calls? I know this has to be a C/C++ library interfacing with V8.
I suppose you want to ensure all your redis insert operations have been performed. To achieve that, you can use the MULTI commands to insert keys or perform other operations.
The https://github.com/mranney/node_redis module queues up the commands pushed in multi object, and executes them accordingly.
That way you only require one callback, at the end of exec call.
This seems like a common bear-trap for developers who are trying to get used to Node's evented programming model.
What happens is this: you run into a situation where the async/callback pattern isn't a good fit, you figure what you need is some way of doing blocking code, you ask Google/StackExchange about blocking in Node, and all you get is admonishment on how bad blocking is.
They're right - blocking, ("wait for the result of this before doing anything else"), isn't something you should try to do in Node. But what I think is more helpful is to realize that 99.9% of the time, you're not really looking for a way to do blocking, you're just looking for a way to make your app, "wait for the result of this before going on to do that," which is not exactly the same thing.
Try looking into the idea of "flow control" in Node rather than "blocking" for some design patterns that could be a clearer fit for what you're trying to do. Here's a list of libraries to check out:
https://github.com/joyent/node/wiki/modules#wiki-async-flow
I'm new to Node too, but I'm really digging Async: https://github.com/caolan/async
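To show what "flow control" means in practice, here is a hand-rolled sketch of running callback-style tasks one after another. It is not the Async library's implementation, and the series name and task shapes are assumptions made for the example:

```javascript
// Runs callback-style tasks in order; each starts only after the
// previous one has called back. Stops at the first error.
function series(tasks, done) {
  const results = [];
  function next(i) {
    if (i === tasks.length) return done(null, results);
    tasks[i]((err, value) => {
      if (err) return done(err);
      results.push(value);
      next(i + 1);
    });
  }
  next(0);
}

const steps = [];
const finished = new Promise((resolve, reject) => {
  series([
    // The first task is slower, yet the second still waits for it.
    (cb) => setTimeout(() => { steps.push('first'); cb(null, 1); }, 20),
    (cb) => setTimeout(() => { steps.push('second'); cb(null, 2); }, 5),
  ], (err, results) => (err ? reject(err) : resolve(results)));
});

finished.then((results) => console.log(steps, results));
```

Nothing here blocks: between the two steps the event loop is free to serve other requests, which is the difference between "wait for this before doing that" and actual blocking.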
Blocking code creates a MASSIVE bottleneck.
If you use blocking code your server will become INCREDIBLY slow.
Remember, node is single threaded. So any blocking code, will block node for every connected client.
Your own benchmarking shows it's fast enough for one client. Have you benchmarked it with 1000 clients? If you try this, you will see why blocking code is bad.
Whilst Redis is quick it is not instantaneous ... this is why you must use a callback if you want to continue execution ensuring your values are there.
The only way I think you could achieve this (and I am not suggesting you do) is to use a callback that sets a variable, with that variable serving as the predicate for leaving a timer.