Testing background processes in nodejs (using tape)

Testing background processes in nodejs (using tape) - node.js

This is a general question about testing, but I will frame it in the context of Node.js. I'm not as concerned with a particular technology, but it may matter.
In my application, I have several modules that are called upon to do work when my web server receives a request. In the case of some of these modules, I close the request before I call upon them.
What is a good way to test that these modules are doing what they are supposed to do?
The advice here for RSpec is to mock out the work these modules are doing and just ensure that the appropriate methods are being called. This makes sense to me, but in Node.js, since my modules are not global, I don't think I cannot mock out functions without changing my program architecture so that every instance receives instances of objects that it needs1.
[1] This is a well known programming paradigm, but I cannot remember its name right now.
The other option I see is to use setTimeout and take my best guess at when these modules are done with their work.
Neither of these seems ideal.
Am I missing something? Are background processes not tested?

Since you are speaking of integration tests of these background components, a few strategies come to mind.
Take all the asynchronicity out of their operation for test mode. I'm imagining you have some sort of queueing process (that could be a faulty assumption), you toss work into the queue, and then your modules pick up that work and do their task. You could rework your test harness such that the test harness stands in as the queuing mechanism and you effectively get direct control over when the modules execute.
Refactor your modules to take some sort of next callback function. They would end up functioning a bit like Express's middleware layer or how async's each function works, but into each module you'd pass some callback that you call when that module's task is complete. Once all of the modules have reported in, then you can check the state of the program.
Exactly what you already suggested-- wait some amount of time, and if it still isn't done, consider that a failure. Mocha sort of does that, in that if a given test is over a definable threshold, then it's a failure. I don't like this way though, because if you add more tests, they all have to wait the same amount of time.

Related

Do multiple node.js "requires" impact production run time?

We are integrating Amazon's node.js SDK into our project and while I do not think it matters due to require's cache and the fact that everything is compiled, I could not find a site that definitively states that multiple requires will not affect performance in run time.
Obviously it depends on what files you are requiring, the contents of those files, and whether or not they could block the event loop or have other code inside of them to slow performance.
I prefer to structure code based on functionality rather than just having a 10000+ line file that does not really relate to the task at hand. I just want to make sure I'm not shooting myself in the foot by break out functionality into separate modules and then requiring on an as needed basis.

Well, require() is a synchronous operation so it should ONLY be used during server initialization, never during an actual request. Therefore, the performance of require() should only affect your server startup time, not your request handling time.
Second, require() does have a cache behind it. It matches the fully resolved path of the module you are attempting to load. So, if you call require(somePath) and a module at that same path has previously been loaded, then the module handle is just immediately returned from the cache. No module is loaded from disk a second time. The module code is not executed a second time.
Obviously it depends on what files you are requiring, the contents of those files, and whether or not they could block the event loop or have other code inside of them to slow performance.
If you are requiring a module for the first time, it WILL block the event loop while loading that module because require() uses blocking, synchronous I/O when the module is not yet cached. That's why you should be doing this at server initialization time, not during a request handler.
I prefer to structure code based on functionality rather than just having a 10000+ line file that does not really relate to the task at hand. I just want to make sure I'm not shooting myself in the foot by break out functionality into separate modules and then requiring on an as needed basis.
Breaking code into logical modules is good for ease of maintenance, ease of testing and ease of reuse, so it's definitely a good thing.
I have seen people go too far where there are so many modules each with only a few lines of code in them that it backfires and makes the project unwieldly to work on, find things in, design test suites for, etc... So, there is a balance.

V8: How to correctly handle microtasks?

When extending V8, how involved do I/we have to be in making sure microtasks are correctly managed? V8, in general, has almost no documentation outside of the code itself, but I'm finding absolutely nothing on microtasks. Specifically, I'd like to learn about MicrotasksScope and how I need to implement it.

You generally shouldn't need to use MicrotasksScope.
Usually you will either be using MicrotasksPolicy::kExplicit or MicrotasksPolicy::kAuto.
With a kAuto policy, any time the script evaluation stack is emptied, microtasks will be run. With kExplicit, you have to do it yourself, using Isolate::RunMicrotasks.
In most situations, the default (kAuto) will work. If you are chromium or node, using kExplicit will make more sense since you need to time your microtask queue with all the other platform stuff like timers and networking.
As for MicrotasksScope, I personally am not aware of any project that uses it, but it will behave the same as kAuto, except the microtask runs when the stack of MicrotasksScope objects empties, instead of Scripts.

NodeJS: Most efficient way to use "require"

What is the best way to use NodeJS's require function? By this, I'm referring to the placement of the require statement. Isf it better to load all dependencies from the beginning of the script, as you need them, or does it not make a notable difference whatsoever?
This article has a lot useful information regarding how require works, though I still can't come to a definitive conclusion as to which method would be most efficient.

Assuming you're using node.js for some sort of server environment, several things are generally true about that server environment:
You want fast response time to any given request.
The code that runs for processing requests should not use synchronous I/O operations because that seriously lessens the scalability of the server.
Server startup time is generally not something you need to optimize for (within reason) so if you're going to pay an initialization cost somewhere, it is usually better paid once at server startup time.
So, given that require() uses synchronous I/O when the module has not yet been cached, that means you really don't generally want to be doing require() operations inside a request handler. And, you want fast response times for your request handlers so you don't want require() calls inside your handler anyway.
All of these leads to a general rule of thumb that you load necessary modules at startup time into a module level variable that you can reuse from one request to the next and you don't load modules inside your request handlers.
In addition to all of this, if you put all your require() statements in a block near the top of your module, it makes your module a lot more self-documenting about what other modules it depends on and how it initializes those modules. If require() statements are sprinkled all over the code, then it makes it a lot harder for a developer to see what this module is using without a lot more study of the code.

It depends what performance characteristics you're looking for.
require() is not cheap; it has to read the JS file from disk, parse it, and execute any top-level code (and do all of that recursively for all files require()d by that file).
If you put all of your require()s on top, your code may take more time to start, but it won't suddenly slow down later. (note that moving the require() further down in the synchronous top-level code will make no difference beyond order of execution).
If you only require() other modules when first used asynchronously, your first request may be noticeably slower, as Node parses all of your dependencies. This also means that any errors from dependencies won't be caught until later. (note that all require() calls are cached, so the second request won't do any more work)
The other disadvantage to scattering require() calls throughout your code is that it makes it less readable; it's very nice to easily see exactly what each file depends on up top.

Node Background Threads - When Do These Get Created?

I've been doing a fair amount of work with Node lately, trying to build a system which has certain characteristics, one of which is non-blocking / parallelism - a Node strong suit, as I understand it.
What I don't fully understand is when a separate thread is spun off to handle some processing. I'm pretty sue this happens on a function call/call back, but certainly not all of them.
In my specific case, it's an Express based app. At app start-up it does several things including instantiating a RabbitMQ based "bus", an object with a method which will write to the bus (objA) and object which will subscribe to the bus and process messages coming across it (objB).
objA will write to the bus inside an express callback
app.put((req,res) => {
objA.methodWhichWritesToBus();
});
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
Is that the only point at which this sort of thing happens? methodWhichWritesToBus is IO instensive (it calls an elastic search service on another box and brings back 10's to 100's of thousands of records) with lots of chained promises etc., but none of that gets split off, does it?
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Finally, are the ways to effect/force a method etc to "run in the background"?
I've been noodling this, testing it, for awhile now but all on one machine so it's difficult to tell what's going on.
Who can clarify this for me?

Pre-answer: this is a topic best learned by going and reading, doing coding exercises to solidify your understanding, and working with the technology in a significant way. You're not going to "get it" based on a Q&A format. That said...
What I don't fully understand is when a separate thread is spun off to handle some processing.
Never, sort of. "Processing" as in the computation that happens in your javascript program, happens in the main event loop thread. End of story. However, waiting on I/O to come back from the OS is not considered "processing" so there are various queues managed by node and the OS to track pending I/O requests and invoke callbacks when data is ready. There are a handful of threads node uses internally to manage this stuff with the OS, but from your program's perspective, those threads are irrelevant. Your program can ask node to do some IO, then your program keeps running in parallel, and when the I/O is done, node will eventually invoke the callback in the main event loop and you can process the results.
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
You call it "asynchronously" and it happens whenever you do IO, including filesystem calls, networking, or child processes. Which is to say, quite a lot.
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Nope.
Finally, are the ways to effect/force a method etc to "run in the background"?
Generally I/O is done asynchronously by default, so no you don't normally need to force anything to run in the background. It's baked into the node design by way of the node core APIs themselves. However, there are ways to delay synchronous processing to a future event loop using setImmediate, setTimeout, or process.nextTick. I explain these in some detail in my blog post setTimeout and friends.
More precisely, all networking is asynchronous. End of story. Specifically, the APIs in node core that are available are all asynchronous, and there's simply no synchronous API available in node. For filesystem IO and child processes, there are both synchronous and asynchronous APIs, but the synchronous APIs must only be used under special limited circumstances, and if you don't know confidently that it's OK in this specific case to make a synchronous IO API call, you should use the asynchronous API so you don't break the lynchpin that makes node perform as it does.

Is there a blocking redis library for node.js?

Redis is very fast. For most part on my machine it is as fast as say native Javascript statements or function calls in node.js. It is easy/painless to write regular Javascript code in node.js because no callbacks are needed. I don't see why it should not be that easy to get/set key/value data in Redis using node.js.
Assuming node.js and Redis are on the same machine, are there any npm libraries out there that allow interacting with Redis on node.js using blocking calls? I know this has to be a C/C++ library interfacing with V8.

I suppose you want to ensure all your redis insert operations have been performed. To achieve that, you can use the MULTI commands to insert keys or perform other operations.
The https://github.com/mranney/node_redis module queues up the commands pushed in multi object, and executes them accordingly.
That way you only require one callback, at the end of exec call.

This seems like a common bear-trap for developers who are trying to get used to Node's evented programming model.
What happens is this: you run into a situation where the async/callback pattern isn't a good fit, you figure what you need is some way of doing blocking code, you ask Google/StackExchange about blocking in Node, and all you get is admonishment on how bad blocking is.
They're right - blocking, ("wait for the result of this before doing anything else"), isn't something you should try to do in Node. But what I think is more helpful is to realize that 99.9% of the time, you're not really looking for a way to do blocking, you're just looking for a way to make your app, "wait for the result of this before going on to do that," which is not exactly the same thing.
Try looking into the idea of "flow control" in Node rather than "blocking" for some design patterns that could be a clearer fit for what you're trying to do. Here's a list of libraries to check out:
https://github.com/joyent/node/wiki/modules#wiki-async-flow
I'm new to Node too, but I'm really digging Async: https://github.com/caolan/async

Blocking code creates a MASSIVE bottleneck.
If you use blocking code your server will become INCREDIBLY slow.
Remember, node is single threaded. So any blocking code, will block node for every connected client.
Your own benchmarking shows it's fast enough for one client. Have you benchmarked it with a 1000 clients? If you try this you will see why blocking code is bad

Whilst Redis is quick it is not instantaneous ... this is why you must use a callback if you want to continue execution ensuring your values are there.
The only way I think you could (and am not suggesting you do) achieve this use a callback with a variable that is the predicate for leaving a timer.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string