If Node.js uses non-blocking IO, how is fs.readFileSync implemented? - node.js

I see a lot of synchronous functions in the file system library, such as fs.readFileSync(filename, [options]).
How (and why) are these functions implemented if node has async/non-blocking IO and no sleep method - and can I use the same mechanism to implement other synchronous functions?

fs.readFileSync() is really just a wrapper for the fs.readSync() function. So the question is how fs.readSync() is implemented compared to fs.read(). If you look at the implementations of these two functions, they both take advantage of the bindings module, which in this case is initialized as
var binding = process.binding('fs');
and the calls are
binding.read(fd, buffer, offset, length, position, wrapper); // async
var r = binding.read(fd, buffer, offset, length, position);  // sync
respectively. Once we're in the "binding" module, we are out of JavaScript and in V8/node_#####.cc land. The implementation of binding('fs') can be found in the node repository code, in node_file.cc. The node engine offers overloads for the C++ calls: one taking a callback, one that does not. The node_file.cc code takes advantage of the req_wrap class, a wrapper for the v8 engine. In node_file.cc we see this:
#define ASYNC_CALL(func, callback, ...)                           \
  FSReqWrap* req_wrap = new FSReqWrap(#func);                     \
  int r = uv_fs_##func(uv_default_loop(), &req_wrap->req_,        \
                       __VA_ARGS__, After);                       \
  req_wrap->object_->Set(oncomplete_sym, callback);               \
  req_wrap->Dispatched();                                         \
  if (r < 0) {                                                    \
    uv_fs_t* req = &req_wrap->req_;                               \
    req->result = r;                                              \
    req->path = NULL;                                             \
    req->errorno = uv_last_error(uv_default_loop()).code;         \
    After(req);                                                   \
  }                                                               \
  return scope.Close(req_wrap->object_);
#define SYNC_CALL(func, path, ...)                                \
  fs_req_wrap req_wrap;                                           \
  int result = uv_fs_##func(uv_default_loop(), &req_wrap.req,     \
                            __VA_ARGS__, NULL);                   \
  if (result < 0) {                                               \
    int code = uv_last_error(uv_default_loop()).code;             \
    return ThrowException(UVException(code, #func, "", path));    \
  }
Notice that SYNC_CALL uses a different req wrap (a plain fs_req_wrap on the stack). Here is the relevant ReqWrap constructor used by the ASYNC method, found in req_wrap.h:
ReqWrap() {
  v8::HandleScope scope;
  object_ = v8::Persistent<v8::Object>::New(v8::Object::New());

  v8::Local<v8::Value> domain = v8::Context::GetCurrent()
                                ->Global()
                                ->Get(process_symbol)
                                ->ToObject()
                                ->Get(domain_symbol);

  if (!domain->IsUndefined()) {
    // fprintf(stderr, "setting domain on ReqWrap\n");
    object_->Set(domain_symbol, domain);
  }

  ngx_queue_insert_tail(&req_wrap_queue, &req_wrap_queue_);
}
Notice that this constructor creates a persistent V8 handle so that the request object (and the oncomplete callback set on it) stays alive while the request is in flight. The asynchronous part does not happen inside V8 at all: uv_fs_##func hands the operation to libuv, which performs it on its thread pool and invokes After back on the main event loop when it completes. The synchronous variant calls the same uv_fs_##func but passes NULL instead of a callback, which tells libuv to perform the operation inline, blocking until it returns. In short, without building/modifying your own version of node, you cannot implement your own asynchronous/synchronous versions of calls in the same way that node does. That being said, asynchronous really only applies to I/O operations. Perhaps a description of why you think you need things to be more synchronous would be in order. In general, if you believe node doesn't support something you want to do, you just aren't embracing the callback mechanism to its full potential.
That being said, you could consider using the events module to implement your own event handlers if you need async behavior, and you can consider native extensions if there are things you desperately need to do synchronously. However, I highly recommend against this. Consider instead how you can work within the asynchronous event loop to get what you need done. Embrace this style of thinking, or switch to another language.
Forcing a language to handle things in a way it doesn't want to is an excellent way to write bad code.

Related

duktape js - have multiple contexts with own global and reference to one common 'singleton'

We are in the process of embedding JS in our application, and we will use a few dozen scripts, each assigned to an event. Inside these scripts we provide a minimal callback API,
function onevent(value)
{
  // user javascript code here
}
which is called whenever that event happens. The scripts have to have their own globals, since this function always has the same name and we access it from C++ code with
duk_get_global_string(js_context_duk, "onevent");
duk_push_number(js_context_duk, val);
if (duk_pcall(js_context_duk, 1) != 0)
{
  printf("Duk error: %s\n", duk_safe_to_string(js_context_duk, -1));
}
duk_pop(js_context_duk); /* ignore result */
Then again we want to allow minimal communication between scripts, e.g.
Script 1
var a = 1;
function onevent(val)
{
  log(a);
}
Script 2
function onevent(val)
{
  a++;
}
Is there a way we can achieve this? Maybe by introducing our own 'ueber-global' object that is defined once and referenceable everywhere? It should be possible to add properties to this ueber-global object from any script, like
Script 1
function onevent(val)
{
  log(ueber.a);
}
Script 2
function onevent(val)
{
  ueber.a = 1;
}
Instead of simple JS files you could use modules. Duktape comes with a code example that implements a module system (including its code isolation) like the one in Node.js. With that in place you can export the variables that should be sharable.
We have an approach that seems to work now. After creating the new context with
duk_push_thread_new_globalenv(master_ctx);
new_ctx = duk_require_context(master_ctx, -1);
duk_copy_element_reference(master_ctx, new_ctx, "ueber");
we issue this call sequence for all properties/objects/functions created in the main context:
void duk_copy_element_reference(duk_context* src, duk_context* dst, const char* element)
{
  duk_get_global_string(src, element);
  duk_require_stack(dst, 1);
  duk_xcopy_top(dst, src, 1);
  duk_put_global_string(dst, element);
}
It seems to work (because everything is in the same heap and all is single-threaded). Maybe someone with deeper insight into Duktape can comment on this? Is this a feasible solution with no side effects?
Edit: marking this as the answer. Works as expected, no memory leaks or other issues.

Dispatching up to max parallel REST calls in node.js / how does await work in node

I'm using node.js, have a graph of dependent REST calls, and am trying to dispatch them in parallel. It's part of a testing/load-testing script.
My graph has "connected components", and each component is directed and acyclic. I toposort each component, so I end up with a graph that looks like this:
Component1 = [Call1, Call2, ..., Callm] (Call2 possibly dependent on Call1, etc.)
Component2 = [Call1, Call2, ..., Calln]
...
Componentp
The number of components p, and the number of calls in each component (m, n, ...), are dynamic.
I want to round-robin over the components, and each of their calls, dispatching up to "x" calls concurrently.
Whilst I understand a little about Promises, async/await and Node's event loop I'm NOT an expert.
PSEUDO CODE ONLY
maxParallel = x
runningCallCount = 0
while (components.some(calls => calls.some(call => noResponseYet(call)))) {
  if (runningCallCount < maxParallel) {
    runningCallCount++
    var result = await axios(call)
    runningCallCount--
  }
}
This doesn't work - I never dispatch the calls.
Remove the await and I fall through to the runningCallCount-- straight away.
Other approaches I've tried, and comments:
Wrapping every call in an async function and using Promise.all on a chunk of x at a time - a chunking style of approach. This may work, but it doesn't achieve the goal of always trying to have x parallel calls going.
Used RxJS - tried merge on all components with a max degree of parallelism - but this parallelises the components, not the calls within the components, and I couldn't work out how to make it work the way I wanted based on the poor documentation. I'd used the .NET version before, so this was a bit disappointing.
I haven't yet tried recursion
Can anyone chime in with an idea as to how to do this ?
How does await work in node? I've seen it explained in terms of generator functions and yield statements (https://medium.com/siliconwat/how-javascript-async-await-works-3cab4b7d21da).
Can anyone add detail - how does control return to the event loop when code hits an await? I'm guessing either the entire stack unwinds, or a call to run the event loop is somehow inserted by the await.
I'm not interested in using a load-testing package or other load-testing tools - I just want to understand the best way to do this, but also understand what's going on in node and await.
I'll update this if I come to understand it or find a solution, but in the meantime:
Help appreciated.
I would think something like this would work to always have n parallel calls going.
const delay = time => new Promise(r => setTimeout(r, time));

let maxJobs = 4;
let jobQueue = [
  { time: 1000 }, { time: 3000 }, { time: 1000 }, { time: 2000 },
  { time: 1000 }, { time: 1000 }, { time: 2000 }, { time: 1000 },
  { time: 1000 }, { time: 5000 }, { time: 1000 }, { time: 1000 },
  { time: 1000 }, { time: 7000 }, { time: 1000 }, { time: 1000 }
];
jobQueue.forEach((e, i) => e.id = i);

// Each worker pulls the next job off the shared queue until it is empty,
// so up to maxJobs jobs are always in flight.
const jobProcessor = async function () {
  while (jobQueue.length > 0) {
    let job = jobQueue.pop();
    console.log('Starting id', job.id);
    await delay(job.time);
    console.log('Finished id', job.id);
  }
};

(async () => {
  console.log("Starting", new Date());
  await Promise.all([...Array(maxJobs).keys()].map(e => jobProcessor()));
  console.log("Finished", new Date());
})();

Convert asynchronous/callback method to blocking/synchronous method

Is it possible to convert an asynchronous/callback-based method in node to a blocking/synchronous method?
I'm curious, more from a theoretical POV than a "I have a problem to solve" POV.
I see how callback methods can be converted to values, via Q and the like, but calling Q.done() doesn't block execution.
The node-sync module can help you do that. But please be careful: this is not the Node.js way.
To turn an asynchronous function into a synchronous one in a multi-threaded environment, we need to set up a loop checking for the result, which causes the blocking.
Here’s the example code in JS:
function somethingSync(args) {
  var ret; // the result-holding variable
  // doing something async here...
  somethingAsync(args, function (result) {
    ret = result;
  });
  while (ret === undefined) {} // spin until the result is available; this is the blocking part
  return ret;
}
Alternatively, synchronize.js also helps.
While I would not recommend it, this can easily be done using some sort of busy wait. For instance:
var flag = false;
asyncFunction(function () { // this is a callback
  flag = true;
});
while (!flag) {}
The while loop loops continuously until the callback has executed, thus blocking execution. Beware, though, that in Node's single-threaded event loop the callback can never fire while the loop is spinning, so in practice this spins forever; it only illustrates the idea.
As you can imagine, this would make your code very messy, so if you are going to do this (which I wouldn't recommend) you should write some sort of helper function to wrap your async function, similar to Underscore.js's function functions, such as throttle. You can see exactly how these work by looking at the annotated source.

Sequence of code execution in Node.js app

I have always wondered about this and have never found a convincing answer.
Please consider the following case:
var toAddress = '';
if (j == 1)
{
    toAddress = "abc#mydomain.com";
}
else
{
    toAddress = "xyz#mydomain.com";
}
sendAlertEmail(toAddress);
Can I be certain that by the time my sendAlertEmail() function is called, I will have 'toAddress' populated?
For code like the sample you provided:
var toAddress = '';
if (j == 1)
{
    toAddress = "abc#mydomain.com";
}
else
{
    toAddress = "xyz#mydomain.com";
}
sendAlertEmail(toAddress);
You can definitely be certain that it is strictly sequential. That is to say that the value of toAddress is either "abc#mydomain.com" or "xyz#mydomain.com".
But, for code like the following:
var toAddress = '';
doSomething(function () {
    if (j == 1)
    {
        toAddress = "abc#mydomain.com";
    }
    else
    {
        toAddress = "xyz#mydomain.com";
    }
});
sendAlertEmail(toAddress);
Then it depends on whether the function doSomething is asynchronous or not. The best place to find out is the documentation. The second best is looking at the implementation.
If doSomething is not asynchronous then the code execution is basically sequential and you can definitely be certain that toAddress is properly populated.
However, if doSomething is asynchronous then you can generally be certain that the code execution is NOT sequential, since returning immediately and executing the passed-in function at a later time is the basic behavior of asynchronous functions.
Not all functions that take functions as arguments are asynchronous. An example of a synchronous one is the forEach method of arrays. But all asynchronous functions do accept a function as an argument, because that's the only way to have some piece of code executed at the end of the asynchronous operation. So whenever you see a function taking a function as an argument, you should check whether it's asynchronous or not.
Node.js is single-threaded (or at least the JS execution is), so since all the above code is synchronous and lined up to occur during the same tick, it will run in order, and thus toAddress must be populated.
Things get complicated once you introduce an asynchronous function. In the asynchronous case it is possible for a variable to change between two lines, since event-loop ticks occur between them.
To clarify: during each tick the code is simply evaluated from the top of the execution to the bottom. During the first tick the scope of execution is the whole file; after that, it's callbacks and handlers.
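The tick behavior described above can be observed directly; a minimal sketch (the address is a placeholder):

```javascript
var toAddress = '';

// Asynchronous: the callback is queued to run on a later tick.
setTimeout(function () {
  toAddress = 'abc@example.com'; // placeholder address
}, 0);

// Still the first tick: the assignment above has not happened yet.
console.log(toAddress === ''); // → true
```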
The code you wrote is simple enough that no asynchrony is involved, so let me point out where it would matter. Take a look at this code:
var toAddress = 'abc#mydomain.com';
if (j == 1)
{ func1(toAddress); }
else
{ func2(toAddress); }
sendAlertEmail(toAddress);
The calls themselves still happen in order, but if func1 or func2 is asynchronous there is no guarantee that its work has finished before sendAlertEmail executes: an asynchronous function returns immediately and completes its work later. If you want to make sure things complete sequentially, use callbacks, or use a library like async.

Is there any advantage/disadvantage to using function delegates over lambdas in a collection?

I am at that really boring part in the development of my library, where I want to construct a class based on a windows message identifier and the WPARAM and LPARAM arguments. The prototype for such functions is trivial:
boost::shared_ptr<EventArgs>(const UINT& _id, const WPARAM& _w, const LPARAM& _l);
For each windows message, I will have a function of this nature.
Now, what I am currently doing is using the FastDelegate library to do my delegates. These are stored in a map thusly:
typedef fastdelegate::FastDelegate3<const UINT&, const WPARAM&, const LPARAM&, boost::shared_ptr<EventArgs> > delegate_type;
typedef std::map<int, delegate_type> CreatorMap;
And when a windows message needs to have an EventArgs-derived class created, it's a simple case of looking up the appropriate delegate, invoking it, and returning the newly created instance nicely contained in a shared_ptr.
boost::shared_ptr<EventArgs> make(const UINT& _id, const WPARAM& _w, const LPARAM& _l) const
{
    CreatorMap::const_iterator cit(m_Map.find(_id));
    assert(cit != m_Map.end());
    boost::shared_ptr<EventArgs> ret(cit->second(_id, _w, _l));
    return ret;
}; // eo make
All is working fine. But then I was thinking, rather than having all these delegates around, why not take advantage of lambdas in C++0x? So, I quickly prototyped the following:
typedef std::map<int, std::function<boost::shared_ptr<EventArgs>(const WPARAM&, const LPARAM&)> > MapType;
typedef MapType::iterator mit;
MapType map;
map[WM_WHATEVER] = [](const WPARAM& _w, const LPARAM& _l) { /* create appropriate eventargs class given parameters */ };
map[WM_ANOTHER] = ....;
// and so on
Once again, it's simple to look up and invoke:
mit m = map.find(WM_PAINT);
boost::shared_ptr<EventArgs> e(m->second(_wParam, _lParam));
// dispatch e
Is there an advantage to using lambdas in this way? I know the overhead of looking up the right delegate/lambda will be the same (as both maps are keyed on an int), but I am aiming to dispatch my messages from the wndProc in my nice C++-friendly way as efficiently as possible. My gut feeling is that lambdas will be faster, but unfortunately I lack the experience with compiler optimizations to make a judgement call on this, hence my question here :) Oh, and in keeping with the topic of the question, are there any gotchas/something I haven't thought about?
I see two real differences here:
The first is how you store the callback: either the fast delegate or std::function. The latter is based on boost::function, and fast delegates were specifically designed to outperform those. However, std::function is standard-conforming, while fast delegates are not.
The other difference is the way you set up your code, and I see a clear advantage for lambdas here: you can write the actual code exactly where it matters, and you don't need to define a separate function that only serves a niche purpose.
If you want raw speed, fast delegates probably win (but you should benchmark if that is a deciding factor); if you want readability and standard conformity, go with std::function and lambdas.
