multiprocessing.pool.apply_async: do I need to `get` results? - python-3.x

I am trying to achieve a fire-and-forget call of an output function which can take an arbitrary amount of time.
The caller is not interested at all whether this function is successful or not. It should just dispatch a "message", and move on.
I am thinking of using a multiprocessing.Pool for this task and "dispatching the messages" with apply_async. Usually one would use get() to get the result, but in my case that would complicate the code slightly. So I am thinking of never actually calling get(). Is that legal, or is this going to cause a cache (with None returns) somewhere to blow up after a while?

Related

Recalculation of exchanges done twice

When using the function ParameterManager.recalculate() to get the updated values of all the parameterized exchanges of my database, the functions ActivityParameter.recalculate() and ActivityParameter.recalculate_exchanges() are applied to all the parameter groups. But it seems that the function ActivityParameter.recalculate_exchanges() is run twice, because it is also called inside ActivityParameter.recalculate(). When I delete one of the calls, I get the same results twice as fast (which is what I was looking for, because otherwise my calculation is a bit long). Is there a reason for running the function twice? Is it right to delete one call to get faster results? Would there be a way to reduce the duration of this calculation?
You are totally right - ParameterManager.recalculate calls both ActivityParameter.recalculate and ActivityParameter.recalculate_exchanges, but ActivityParameter.recalculate already calls ActivityParameter.recalculate_exchanges. This makes things slower, but doesn't break anything.
This duplication has been removed; you can safely do the same.

Writing async javascript functions

I'm fairly familiar with nodejs now, but I have never tried to build a module before. I was a bit curious about async functions.
If you are writing a function that just returns a value, is it worth making it async? For example, should this be written async?
exports.getFilename = function () {
    return filename;
};
Next, when writing an async function, is writing a function with a callback enough for performance, or is it recommended to thread it using a threading library as well?
Sorry for the somewhat obvious question; I normally am the one calling these functions.
Callbacks and asynchronousness are two separate things, though they are related, since in javascript callbacks are the only mechanism that allows you to manage control flow in asynchronous code.
Whether or not non-asynchronous functions should accept callbacks depends on what the function does. One example of a type of function that is not asynchronous but usefully takes a callback is an iteration function. Array.prototype.forEach() is a good example. Javascript doesn't allow you to pass code blocks, so you pass functions to your iteration function.
Another example is filter functions that modify incoming data and return the modified version. Array.sort() is a good example. Passing a function to it allows you to apply your own conditions for how the array should be sorted.
Actually, filtering functions have a stronger reason for accepting functions/callbacks since it alters the behavior of the algorithm. Iteration functions are just nice syntactic sugar around for loops and are therefore a bit redundant. Though, they do make code nicer to read.
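As a small illustration of the two cases above (a sketch, not part of the original answer): both forEach and sort take a function argument, yet both finish before the next statement runs. The names `order` and `descending` are made up for the example.

```javascript
// Records the sequence of events so the synchronous behavior is visible.
const order = [];

order.push('before forEach');
[1, 2, 3].forEach(function (n) {
  // Called once per element, before forEach returns.
  order.push('visit ' + n);
});
order.push('after forEach');

// The comparator changes how sort() orders the elements;
// without it, sort() compares the elements as strings.
const descending = [5, 1, 10].sort(function (a, b) {
  return b - a;
});

console.log(order.join(', '));
console.log(descending);
```

Every 'visit' entry lands between 'before forEach' and 'after forEach', which is what "takes a callback but is not asynchronous" means in practice.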
Whether or not a function should be asynchronous is a different matter. If it does something that takes a long time to complete (like I/O operations or large matrix calculations) then it should be made asynchronous. How long is "long" depends on your own tolerance. Generally, for a moderately busy website a request shouldn't take more than 100ms to complete (in other words, you should be able to handle 10 hits per second at minimum). If an operation takes longer than that, then you should split it up and make it async, otherwise you risk making the site unresponsive to other users. For really busy websites you shouldn't tolerate operations that take longer than 10ms.
From the above explanation it should be obvious that just accepting a function or callback as an argument does not make a function asynchronous. The simplest pure-js way to make something async is to use setTimeout to break up long calculations. Of course, the operation still happens in the same thread as the main Node process, but at least it doesn't block other requests for the whole duration. To utilize multi-core CPUs on your servers you can use one of the threading libraries on NPM or the cluster module to make your function async.
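To make the distinction concrete, here is a minimal sketch. `getFilenameNow` and `getFilenameLater` are hypothetical names, and `filename` is an assumed stand-in for the module variable from the question. The first function accepts a callback but stays synchronous; the second defers its callback with setTimeout.

```javascript
const filename = 'example.txt'; // assumed value, for illustration only
const events = [];

// Takes a callback but is still synchronous:
// the callback runs before the function returns.
function getFilenameNow(callback) {
  callback(filename);
}

// Defers the callback with setTimeout, so it runs on a later
// turn of the event loop (still on the same thread).
function getFilenameLater(callback) {
  setTimeout(function () {
    callback(filename);
  }, 0);
}

getFilenameNow(function () {
  events.push('now');
});
getFilenameLater(function () {
  events.push('later');
  console.log(events.join(' -> '));
});
events.push('end of script');
```

For a function that only returns a value already in memory, the deferred version adds nothing except a delay, which is why a plain return is usually enough in that case.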

How to call custom async code to initialize a Node.io Job once (before successive calls to input())?

Just discovered Node.io, gone through the docs, API, etc., and it looks great. However, while building my first job, exports.job = new nodeio.Job(..), with methods like input, run, output, reduce and complete, I need some kind of initialize() method that is called once, before the successive calls to input(). (Similar to how complete is called once just before the job is finished.)
Any such method around?
For completeness:
This code, IMHO, has to be part of the node.io flow (through some dedicated method), since initializing my async code outside of the node.io scope doesn't guarantee that the data is already there before the node.io job is executed.
I don't know if there is such a method, have you browsed through the source? It looks like there is an 'init' method that is called by the processor if it is on the job. If you try that and it isn't what you're looking for, you could suggest this as a feature on the node.io github site.
Otherwise, this would be a very simple thing to add for yourself. Just add an 'initialize' method to your object, and then put the following lines at the top of your 'input' or 'run' method (which I think would probably work better if you need the data to be ready already):
if (!this.initialized) {
    this.initialized = true;
    this.initialize();
}
Note that there is a tiny performance hit here, of course. But in most cases, it's only the amount of time it takes to check the value of one variable, which is probably quite minimal compared to the amount of processing you actually need.
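A minimal sketch of that guard in context. The `job` object below is a hypothetical stand-in, not the node.io API, and its `initialize` is synchronous; an asynchronous initializer would additionally have to hold back the rest of the method until its callback fires.

```javascript
// Hypothetical stand-in for a job object; this is not the node.io API.
const job = {
  initialized: false,
  initCount: 0,

  // One-time setup; here it just fills an in-memory list.
  initialize: function () {
    this.initCount += 1;
    this.items = ['a', 'b', 'c', 'd'];
  },

  // Returns a slice of the items, running the setup on the first call only.
  input: function (start, num) {
    if (!this.initialized) {
      this.initialized = true;
      this.initialize();
    }
    return this.items.slice(start, start + num);
  }
};

const firstBatch = job.input(0, 2);
const secondBatch = job.input(2, 2);
console.log(firstBatch, secondBatch, 'initialize calls:', job.initCount);
```

However many times input runs, initialize is called once, at the cost of one boolean check per call.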

How do I make a non-IO operation synchronous vs. asynchronous in node.js?

I know the title sounds like a dupe of a dozen other questions, and it may well be. However, I've read those dozen questions, and Googled around for a while, and found nothing that answers these questions to my satisfaction.
This might be because nobody has answered it properly, in which case you should vote me up.
This might be because I'm dumb and didn't understand the other answers (much more likely), in which case you should vote me down.
Context:
I know that IO operations in Node.js are detected and made to run asynchronously by default. My question is about non-IO operations that still might block/run for a long time.
Say I have a function blockingfunction with a for loop that does addition or whatnot (pure CPU cycles, no IO), and a lot of it. It takes a minute or more to run.
Say I want this function to run whenever someone makes a certain request to my server.
Question:
Obviously, if I explicitly invoke this loop at the outer level in my code, everything will block until it completes.
Most suggestions I've read recommend pushing it off into the future by starting all of my other handlers/servers etc. first, and deferring invocation of the function via process.nextTick or setTimeout(blockingfunction, 0).
But won't blockingfunction then just block on the next spin around the execution loop? I may be wrong, but it seems like doing that would start all of my other stuff without blocking the app, but then the first time someone made the request that results in blockingfunction being called, everything would block for as long as it took to complete.
Does putting blockingfunction inside a setTimeout or process.nextTick call somehow make it coexist with future operations without blocking them?
If not, is there a way to make blockingfunction do that without rewriting it?
How do others handle this problem? A lot of the answers I've seen are to the tune of "just trust your CPU-intensive things to be fast, they will be", but this doesn't satisfy.
Absent threading (where I can be guaranteed that the execution of blockingfunction will be interleaved with the execution of whatever else is going on), should I re-write CPU-intensive/time consuming loops to use process.nextTick to perform a fixed, guaranteed-fast number of iterations per tick?
Yes, you are correct. If you defer your function until the next tick, it will just block in that tick rather than the current one.
Unfortunately, there is no magic here that solves this for you. While it is possible to fire up that function in another process, it might not be worth the hassle, depending on what you're doing.
I recommend re-writing your function in such a way that work happens for a bit, and then continues on the next tick. Node ticks are very efficient... you could call them every iteration of a decent sized loop if needed, without a whole ton of overhead. Of course, you would have to profile it in your code to see what the impact is.
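To see why deferring alone does not help, consider this small sketch. `blockingFunction` here is a hypothetical busy loop of about 200 ms, and the timings are assumptions chosen for the demonstration. A second timer, due after 20 ms, cannot fire until the deferred loop has finished.

```javascript
const timeline = [];
const start = Date.now();

// Hypothetical CPU-bound function: busy-loops for about `ms` milliseconds.
function blockingFunction(ms) {
  const stop = Date.now() + ms;
  while (Date.now() < stop) {
    // burn CPU, no I/O
  }
  timeline.push('blocking done');
}

// Deferred, so the rest of the script runs first...
setTimeout(function () {
  blockingFunction(200);
}, 0);

// ...but this timer, due after 20 ms, has to wait for the loop above.
setTimeout(function () {
  timeline.push('other handler');
  console.log(timeline.join(' -> '), 'after', Date.now() - start, 'ms');
}, 20);

timeline.push('script end');
```

The deferral only moves the blocking to a later turn of the event loop; the "other handler" still waits for the whole loop.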
Yes, a blocking function will keep blocking even if you run it via process.nextTick.
Some options:
1. If it truly takes a while, then perhaps it should be spun out to a queue where you can have a dedicated worker process handle it.
1a. Node.js has a child-process flavor specifically for forking other node.js files with a built-in communication channel. So, e.g., you can create one (or several) child processes that handle these requests in order, then respond and hit the callback. See: http://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options
2. You can break up the blockingFunction into chunks that run in a loop. Have it yield every X iterations with process.nextTick to make way for other events to be handled. (In Node.js 0.10 and later, setImmediate is the call that lets pending I/O run between chunks; process.nextTick callbacks run before the event loop continues.)
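A minimal sketch of the chunking option, assuming a current Node.js version and therefore using setImmediate for the yield instead of process.nextTick. `sumInChunks`, its parameters and the chunk size are hypothetical and only illustrate the shape of the rewrite.

```javascript
// Hypothetical example: sums 0..n-1 in chunks of `chunkSize` iterations,
// yielding to the event loop between chunks.
function sumInChunks(n, chunkSize, done) {
  let i = 0;
  let total = 0;

  function runChunk() {
    const end = Math.min(i + chunkSize, n);
    for (; i < end; i++) {
      total += i;
    }
    if (i < n) {
      // Give timers and pending I/O a turn before the next chunk.
      setImmediate(runChunk);
    } else {
      done(total);
    }
  }

  setImmediate(runChunk);
}

let chunkedResult = null;
sumInChunks(1000000, 10000, function (total) {
  chunkedResult = total;
  console.log('sum:', total);
});
```

The chunk size is the tuning knob: larger chunks mean less scheduling overhead but longer stretches during which nothing else can run, so it is worth profiling with your own workload.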

ASIO strand::wrap does it not have to serialize in order?

I'm lost on the distinction between posting using strand::wrap and a strand::post? Seems like both guarantee serialization yet how can you serialize with wrap and not get consistent order? Seems like they both would have to do the same thing. When would I use one over the other?
Here is pseudocode with a little more detail:
mystrand(ioservice);
mystrand.post(myhandler1);
mystrand.post(myhandler2);
This guarantees that my two handlers are serialized and executed in order, even in a thread pool.
Now, how is that different from below?
ioservice->post(mystrand.wrap(myhandler1));
ioservice->post(mystrand.wrap(myhandler2));
Seems like they do the same thing?
Why use one over the other? I see both used and am trying to figure out when one makes more sense than the other.
wrap creates a callable object which, when called, will call dispatch on a strand. If you don't call the object returned by wrap, nothing much will happen at all. So, calling the result of wrap is like calling dispatch. Now how does that compare to post? According to the documentation, post differs from dispatch in that it does not allow the passed function to be invoked right away, within the same context (stack frame) where post is called.
So wrap and post differ in two ways: the immediacy of their action, and their ability to use the caller's own context to execute the given function.
I got all this by reading the documentation.
With this version:
mystrand(ioservice);
mystrand.post(myhandler1);
mystrand.post(myhandler2);
myhandler1 is guaranteed by mystrand to be executed before myhandler2.
But with this version:
ioservice->post(mystrand.wrap(myhandler1));
ioservice->post(mystrand.wrap(myhandler2));
the strand sees the handlers in the order in which the wrapped objects are invoked, and io_service::post does not guarantee that order. The handlers are still serialized, but not necessarily in the order they were posted.

Resources