How to call custom asnyc code to initialize a Node.io Job once (before successive calls to input())? - node.js

Just discovered Node.io, gone though the docs, api, etc. and it looks great. However, building my first job exports.job = new nodeio.Job(..), with methods like input, run,output, reduce, complete I'm in need of some kind of initialize() method which is called once, before successive calls to input() are done. (Similar how complete is called once just before the job is finished)
Any such method around?
For completeness:
This code imho has to be part of the node.io flow (through some dedicated method) since initializing my async code outside of the node.io scope doesn't guarentee the data is already there before the node.io job is executed.

I don't know if there is such a method, have you browsed through the source? It looks like there is an 'init' method that is called by the processor if it is on the job. If you try that and it isn't what you're looking for, you could suggest this as a feature on the node.io github site.
Otherwise, this would be a very simple thing to add for yourself. Just add an 'initialize' method to your object, and then put the following lines at the top of your 'input' or 'run' method (which I think would probably work better if you need the data to be ready already):
if (!this.initialized) {
this.initialized = true;
this.initialize();
}
Note that there is a tiny performance hit here, of course. But in most cases, it's only the amount of time it takes to check the value of one variable, which is probably quite minimal compared to the amount of processing you actually need.

Related

Overriding a built-in class method that another class I want to inherit from relies on

So I'm writing a script/application that uses pythons multiprocessing BaseManager class. Now for the most part it works great, the only issue I have is that I am using the serve_forever as a blocking statement and then continue onwards however when I want to terminate or exit out of the serve_forever() function(ality) it automatically exits out and terminates the application, but like I mentioned I have some more things I want to take care of before I completely exit out.
I can exit out of serve_forever() by setting a stop event with stop_event.set(). Now this is all well and dandy however according to the source (https://github.com/python/cpython/blob/3.6/Lib/multiprocessing/managers.py#L147) serve_forever explicitly states sys.exit(0) and is part of the Server class that BaseManager uses within it's definition. Essentially I would like to remove that line (sys.exit(0)). How would I accompolish this?
When I search I'm coming up with results such as monkey patching? Can I just Subclass the Server class, explicitly define serve_forever to be the exact same code but without the sys.exit(0) line and call it a day? Something tells me that is not going to work. Do I subclass Server AND BaseManager?
Thanks!
Attempting to monkey-patch or inherit internal classes will result in code that will not be compatible across Python releases, not even patches.
Atop of that, these solutions will be unnecessarily complex and complicated, and are overall frowned upon.
I highly suggest re-implementing serve_forever() by using the start() method together with an event. Waiting for the event to be called or, if impossible, a loop checking if the manager is still alive, will be much easier and a better solution in almost all aspects that I can think of.
After discussing in chat, we realised the easiest approach is to just suppress the SystemExit being thrown from sys.exit(). I'm opening a bug report on CPython bug tracker accordingly to prevent sys.exit(). Do keep in mind the server will not actually shut down as it is run on a different thread. The whole recommendation of using .server().serve_forever() in the stdlib looks dubious at best.
If you wish to immediately shut down the server, call Server.listener.close() after catcing the exception.

NodeJS event equivalent of thread local variables?

I'm experiment with the design of a framework but I'm new to NodeJS and its implementation of event based programming. In other environments I would use thread local variables for what I'm trying to do, is there something equivalent in NodeJS?
Obviously in NodeJS you rarely use threads, but events. What I'm searching for is a global variable that once something set it, will continue to have that value as callbacks are called from the original one that set, but not from others. Does that make sense?
What I'm trying to do is design a library/framework, so, I don't have any code at the moment, but for example, imagine there's a function like this:
async function doSomething() {
await connect()
await doSomethingElse()
await doSomethingElseAgain()
}
I want connect() to store the connection on a thread local variable, so that doSomethingElse() and doSomethingElseAgain() use that same connection, even when there are two threads of doSomething() running (but I know they are not threads, but events running one after another from both executions of doSomething()).
I don't want to pass any parameters from connect() to other functions. I know how to do that already, but breaks the design I'm trying to come up with.
I'm not 100% sure as I'm still exploring, but I believe I have found the solution and it's the package named cls-hooked. I'll post more if I find more one way or another.

Testing background processes in nodejs (using tape)

This is a general question about testing, but I will frame it in the context of Node.js. I'm not as concerned with a particular technology, but it may matter.
In my application, I have several modules that are called upon to do work when my web server receives a request. In the case of some of these modules, I close the request before I call upon them.
What is a good way to test that these modules are doing what they are supposed to do?
The advice here for RSpec is to mock out the work these modules are doing and just ensure that the appropriate methods are being called. This makes sense to me, but in Node.js, since my modules are not global, I don't think I cannot mock out functions without changing my program architecture so that every instance receives instances of objects that it needs1.
[1] This is a well known programming paradigm, but I cannot remember its name right now.
The other option I see is to use setTimeout and take my best guess at when these modules are done with their work.
Neither of these seems ideal.
Am I missing something? Are background processes not tested?
Since you are speaking of integration tests of these background components, a few strategies come to mind.
Take all the asynchronicity out of their operation for test mode. I'm imagining you have some sort of queueing process (that could be a faulty assumption), you toss work into the queue, and then your modules pick up that work and do their task. You could rework your test harness such that the test harness stands in as the queuing mechanism and you effectively get direct control over when the modules execute.
Refactor your modules to take some sort of next callback function. They would end up functioning a bit like Express's middleware layer or how async's each function works, but into each module you'd pass some callback that you call when that module's task is complete. Once all of the modules have reported in, then you can check the state of the program.
Exactly what you already suggested-- wait some amount of time, and if it still isn't done, consider that a failure. Mocha sort of does that, in that if a given test is over a definable threshold, then it's a failure. I don't like this way though, because if you add more tests, they all have to wait the same amount of time.

Nodejs require: once or on demand?

What is currently the best practice for loading models (and goes for all required files I guess)?
I'm thinking these two ways to achieve the solution (nonsense code to illustrate follows):
var Post = require('../models/post');
function findById(id) {
return new Post(id);
}
function party() {
return Post.getParty();
}
vs
function findById(id) {
return new require('../models/post')(id);
}
function party() {
return require('../models/post').getParty();
}
Is one of these snippets preferred? Are there considerable memory and time tradeoffs? Or is it just a premature optimization?
It's a premature optimization (calls to require() are cached and idempotent), but I'd personally call your first style better (loading dependencies during initialization rather than subsequent processing) since it's easier to get your head around what you're doing. Loading everything at the start will slightly slow down your startup (which is hardly ever an issue) in return for making most requests run slightly faster (which you shouldn't worry about unless you've identified a bottleneck and done some hardcore profiling).
You should definititely use the version with the single call to requireat the beginning. Although it does not make any difference regarding to how often the modules are loaded (they are only loaded once however you do it), there are performance issues in the second way of doing it.
The problem is that require is one of only a few functions in Node.js that is blocking. That means, as long as it runs, Node.js is not able to fulfill any incoming requests. On startup this is no problem: It only takes a while until your application is up and running.
But for sure you don't want to have blocking moments while your application is already running for a while.
So, if you do not have VERY special reasons for the second option, go with the first one.
I believe the second case is useful to avoid circular module dependencies, since the require() happens at run-time rather than load time.
Otherwise, I believe the first is (slightly?) faster and to me, quite a bit more readable.

ASIO strand::wrap does it not have to serialize in order?

I'm lost on the distinction between posting using strand::wrap and a strand::post? Seems like both guarantee serialization yet how can you serialize with wrap and not get consistent order? Seems like they both would have to do the same thing. When would I use one over the other?
Here is a little more detail pseudo code:
mystrand(ioservice);
mystrand.post(myhandler1);
mystrand.post(myhandler2);
this guarantees my two handlers are serialized and executed in order even in a thread pool.
Now, how is that different from below?
ioservice->post(mystrand.wrap(myhandler1));
ioservice->post(mystrand.wrap(myhandler2));
Seems like they do the same thing?
Why use one over the other? I see both used and am trying to figure out when
one makes more sense than the other.
wrap creates a callable object which, when called, will call dispatch on a strand. If you don't call the object returned by wrap, nothing much will happen at all. So, calling the result of wrap is like calling dispatch. Now how does that compare to post? According to the documentation, post differs from dispatch in that it does not allow the passed function to be invoked right away, within the same context (stack frame) where post is called.
So wrap and post differ in two ways: the immediacy of their action, and their ability to use the caller's own context to execute the given function.
I got all this by reading the documentation.
This way
mystrand(ioservice);
mystrand.post(myhandler1);
mystrand.post(myhandler2);
myhandler1 is guaranteed by mystrand to be executed before myhandler2
but
ioservice->post(mystrand.wrap(myhandler1));
ioservice->post(mystrand.wrap(myhandler2));
the execution order is the order of executing wrapped handlers, which io_service::post does not guarantee.

Resources