I have some DLLs written in .NET which I am using to access thousands of binary files. The data is compartmentalized by directories, so as a performance enhancement I thought of using multiple processes or threads to churn through the files.
I have a function, currently part of the main class (requiring self as an argument), which could easily be refactored into a private method.
My first inclination was to use the multiprocessing module, but that doesn't seem to be available for IronPython.
My next thought was to use Task:
def __createThreads(self):
    tasks = []
    for idx in range(0, 5):
        # this args= keyword is my (non-working) attempt at passing arguments
        tasks.append(Task.Factory.StartNew(self.__doWork, args=(idx,)))
    Task.WaitAll(tasks)

def __doWork(self, idx):
    for index in range(0, idx):
        print "Thread: %d | Index: %d" % (idx, index)
Or to use Thread:
def __createThreads(self):
    threads = list()
    for idx in range(0, 5):
        t = Thread(ThreadStart(self.__doWork))
        t.Start()
        threads.append(t)
    while len(threads) > 0:
        time.sleep(.05)
        for t in threads:
            if not t.IsAlive:
                threads.remove(t)
What I cannot find is an IronPython example of how to pass arguments.
Please be aware that your two examples are not exactly equivalent. The Task version will only create/use actual concurrent threads when the runtime thinks it is a good idea (unless you specify TaskCreationOptions.LongRunning). You have to decide what works for your use case.
The easiest way to pass the idx argument to the __doWork function would be to use a lambda to capture the value and the invocation. (Please be aware of the scoping issues discussed in this question, which also hints at alternative solutions for introducing an intermediate scope.)
tasks.append(Task.Factory.StartNew(lambda idx = idx: self.__doWork(idx)))
As a side-note: You will have to convert your task list to an array in order for Task.WaitAll to be happy.
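Putting both points together, a minimal sketch of what __createThreads could look like (the imports are mine, and the LongRunning variant is the option mentioned above, so treat the exact overloads as assumptions):
from System import Array
from System.Threading.Tasks import Task, TaskCreationOptions

def __createThreads(self):
    tasks = []
    for idx in range(0, 5):
        # idx=idx pins the current loop value to this lambda
        tasks.append(Task.Factory.StartNew(lambda idx=idx: self.__doWork(idx)))
        # or, to hint the scheduler that each task deserves its own thread:
        # tasks.append(Task.Factory.StartNew(lambda idx=idx: self.__doWork(idx),
        #                                    TaskCreationOptions.LongRunning))
    # Task.WaitAll expects a .NET array, not a Python list
    Task.WaitAll(Array[Task](tasks))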
I am a newbie to JAX. While reading the documentation, I'm confused about the caching behavior of jit.
In the caching section, it says: "Avoid calling jax.jit inside loops. Doing that effectively creates a new f at each call, which will get compiled each time instead of reusing the same cached function". However, running the following code produces only one printed side effect:
import jax

def unjitted_loop_body(prev_i):
    print("tracing...")
    return prev_i + 1

def g_inner_jitted_poorly(x, n):
    i = 0
    while i < n:
        # Don't do this!
        i = jax.jit(unjitted_loop_body)(i)
    return x + i

g_inner_jitted_poorly(10, 20)
# output:
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
tracing...
Out[1]: DeviceArray(30, dtype=int32)
The string "tracing..." is only printed once, and it seems that jit does not trace the function again.
Is this intended? Thanks for any help!
Your example is behaving as expected; I suspect the passage you quoted may be referencing an older version of the docs.
In the current version of the docs you link to, the example is slightly different: rather than jit-compiling the function directly, it jit-compiles a temporary function wrapper:
from functools import partial

def g_inner_jitted_partial(x, n):
    i = 0
    while i < n:
        # Don't do this! Each call to partial returns
        # a function with a different hash.
        i = jax.jit(partial(unjitted_loop_body))(i)
    return x + i
If you do this, it will cause a re-compile every time, because partial(unjitted_loop_body) is a new function object on every pass through the loop, and JIT caching is keyed on the identity and hash of the function object.
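You can see this without JAX at all: each call to partial produces a distinct object (a small illustrative check of mine, not from the original answer):
from functools import partial

p1 = partial(unjitted_loop_body)
p2 = partial(unjitted_loop_body)
print(p1 is p2)              # False: a fresh wrapper object each time
print(hash(p1) == hash(p2))  # False in practice: the default hash is identity-based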
Using the approach in your question is fine, because the function passed to jit is a persistent global function, whose id is the same every iteration:
def g_inner_jitted_normal(x, n):
    i = 0
    while i < n:
        # this is OK, since JAX can find the
        # cached, compiled function
        i = jax.jit(unjitted_loop_body)(i)
    return x + i
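An even simpler pattern, which the docs excerpt doesn't show but which sidesteps the cache lookup entirely, is to wrap the function once, outside the loop:
jitted_loop_body = jax.jit(unjitted_loop_body)

def g_inner_jitted_once(x, n):
    i = 0
    while i < n:
        i = jitted_loop_body(i)  # reuses the single compiled function
    return x + i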
Suppose I have the following object with multiple expensive properties, like so:
class Object:
    def __init__(self, num):
        self.num = num

    @property
    def expensive_property(self):
        return expensive_calculation

    @property
    def expensive_property1(self):
        return expensive_calculation

    @property
    def expensive_property2(self):
        return expensive_calculation
Note: The number of expensive properties may increase over time.
Given a list of Objects, how could I compute each expensive property per thread, for all objects in the list? I am having a hard time figuring out how I should arrange my pool.
This is kinda what I am trying to achieve:
from multiprocessing.dummy import Pool
from multiprocessing.dummy import Queue

object_list = [Object(i) for i in range(20)]
properties = ['expensive_property', 'expensive_property1', 'expensive_property2']

def get(objects, property_name):
    return [getattr(o, property_name) for o in objects]

tasks = Queue()
for p in properties:
    tasks.put((get, object_list, p))
tasks.put(None)  # sentinel so the loop below terminates

results = []
with Pool(len(properties)) as pool:
    while True:
        task = tasks.get()
        if task is None:
            break
        func, *args = task
        result = pool.apply_async(func, args)
        results.append(result)
This is a little crazy, because apply_async has an internal queue to distribute tasks over the pool. I can imagine reasons to have another queue around observability or backpressure. Is your example your full program, or are you enqueuing work from a different process/thread?
If your computation is CPU bound, one option could be to remove the queue to make things a little simpler:
def wait_all(async_results, timeout_seconds_per_task=1):
    for r in async_results:
        r.get(timeout_seconds_per_task)

with Pool() as pool:  # defaults to the number of CPUs on your machine
    wait_all(
        [pool.apply_async(get, (object_list, p)) for p in properties],
        timeout_seconds_per_task=1,
    )
Like your example above, this allows you to distribute computation across your available CPUs. If your work is IO bound (suggested by your sleep), processes may have diminishing returns.
You'd have to benchmark, but for IO-bound work you could create a thread pool using the same pattern: https://stackoverflow.com/a/3034000/594589
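For instance, a thread-pool sketch using the stdlib's concurrent.futures instead of the pool from the linked answer (reusing get, object_list and properties from above):
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=len(properties)) as executor:
    futures = [executor.submit(get, object_list, p) for p in properties]
    results = [f.result(timeout=1) for f in futures]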
Other options could be to use non-blocking IO with an event loop, such as gevent or asyncio. Both would allow you to model the same pool-based pattern!
I'm trying to craft an LL(1) parser for a deterministic context-free grammar. One thing I'd like to be able to use is k tokens of lookahead instead of just 1, because it would enable much simpler, less greedy and more maintainable parsing of literal records like numbers, strings, comments and quotations.
Currently, my solution (which works but which I feel is suboptimal) is like (but not) the following:
for idx, tok in enumerate(toklist):
    if tok == "blah":
        do(stuff)
    elif tok == "notblah":
        try:
            toklist[idx + 1]  # peek: does a next token exist?
        except IndexError:
            whatever()
        else:
            something_else()
(You can see my actual, much larger implementation at the link above.)
Sometimes, like if the parser finds the beginning of a string or block comment, it would be nice to "jump" the iterator's current counter, such that many indices in the iterator would be skipped.
This can in theory be done with (for example) idx += idx - toklist[idx+1:].index(COMMENT); however, in practice, each time the loop repeats, idx and tok are reinitialised by the enumerate iterator's next(), overwriting any changes to the variables.
The obvious solution is a while True: or while i < len(toklist): ... i += 1, but there are a few glaring problems with those:
Using while on an iterable like a list is really C-like and really not Pythonic, besides the fact that it's horrendously unreadable and unclear compared to an enumerate over the iterable. (Also, with while True:, which may sometimes be desirable, you have to deal with list index out of range.)
For each cycle of the while, there are two ways to get the current token:
using toklist[i] everywhere (ugly, when you could just iterate)
assigning toklist[i] to a shorter, more readable, less typo-vulnerable name on each cycle, which has the disadvantage of hogging memory and being slow and inefficient.
Perhaps it can be argued that a while loop is what I should use, but I think while loops are for doing things until a condition is no longer true, whereas for loops are for iterating and looping finitely over an iterable, and a(n iterative LL) parser should clearly use the latter.
Is there a clean, Pythonic, efficient way to control and change arbitrarily the iterator's current index?
This is not a dupe of this question, because all those answers use complicated, unreadable while loops, which is what I don't want.
Is there a clean, Pythonic, efficient way to control and change arbitrarily the iterator's current index?
No, there isn't. You could implement your own iterator type though; it wouldn't operate at the same speed (being implemented in Python), but it's doable. For example:
from collections.abc import Iterator

class SequenceIterator(Iterator):
    def __init__(self, seq):
        self.seq = seq
        self.idx = 0

    def __next__(self):
        try:
            ret = self.seq[self.idx]
        except IndexError:
            raise StopIteration
        else:
            self.idx += 1
            return ret

    def seek(self, offset):
        self.idx += offset
To use it, you'd do something like:
# Created outside the for loop so you have a name to call seek on
myseqiter = SequenceIterator(myseq)

for x in myseqiter:
    if test(x):
        ...  # do stuff with x
    else:
        # Seek somehow, e.g.
        myseqiter.seek(1)  # skips the next value
Adding behaviors like providing the index as well as value is left as an exercise.
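For instance, a quick sketch of that exercise (my addition, assuming the SequenceIterator above): a subclass whose __next__ yields (index, value) pairs:
class IndexedSequenceIterator(SequenceIterator):
    def __next__(self):
        idx = self.idx  # capture before the parent advances it
        return idx, super().__next__()

for i, x in IndexedSequenceIterator(myseq):
    print(i, x)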
I am trying to use asyncio in real applications and it doesn't go that easily; the help of asyncio gurus is badly needed.
Tasks that spawn other tasks without flooding the event loop (Success!)
Consider a task like crawling the web starting from some "seeding" web-pages. Each
web-page leads to generation of new downloading tasks in exponential(!)
progression. However, we want neither to flood the event loop nor to
overload our network. We'd like to control the task flow. This is what I
achieve well with a modification of Maxime's nice solution proposed here:
https://mail.python.org/pipermail/python-list/2014-July/687823.html
map & reduce (Fail)
But I'd also need a very natural thing, a kind of map() & reduce(),
or functools.reduce() if we are on Python 3 already. That is, I'd need to
call a "summarizing" function for all the downloading tasks completed on
links from a page. This is where I fail :(
I'd propose an oversimplified but still a nice test to model the use case:
Let's use the Fibonacci function implementation in its inefficient form.
That is, let coro_sum() be applied in reduce() and coro_fib() be what we apply with map(). Something like this:
@asyncio.coroutine
def coro_sum(x):
    return sum(x)

@asyncio.coroutine
def coro_fib(x):
    if x < 2:
        return 1
    res_coro = executor_pool.spawn_task_when_arg_list_of_coros_ready(
        coro=coro_sum,
        arg_coro_list=[coro_fib(x - 1), coro_fib(x - 2)])
    return res_coro
So that we could run the following tests.
Test #1 on one worker:
executor_pool = ExecutorPool(workers=1)
executor_pool.as_completed( coro_fib(x) for x in range(20) )
Test #2 on two workers:
executor_pool = ExecutorPool(workers=2)
executor_pool.as_completed( coro_fib(x) for x in range(20) )
It would be very important that each coro_fib() and coro_sum() invocation is done via a Task on some worker, not just spawned implicitly and unmanaged!
It would be cool to find asyncio gurus interested in this very natural goal.
Your help and ideas would be very much appreciated.
best regards
Valery
There are multiple ways to compute the Fibonacci series asynchronously. First, check whether the explosive variant actually fails in your case:
import asyncio

@asyncio.coroutine
def coro_sum(summands):
    return sum(summands)

@asyncio.coroutine
def coro_fib(n):
    if n == 0:
        s = 0
    elif n == 1:
        s = 1
    else:
        summands, _ = yield from asyncio.wait([coro_fib(n - 2), coro_fib(n - 1)])
        s = yield from coro_sum(f.result() for f in summands)
    return s
You could replace the summands computation with:
a = yield from coro_fib(n - 2)  # don't return until it's ready
b = yield from coro_fib(n - 1)
s = yield from coro_sum([a, b])
In general, to prevent the exponential growth, you could use the asyncio.Queue (synchronization via communication) or asyncio.Semaphore (synchronization using a mutex) primitives.
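For example, a minimal sketch of bounding concurrency with a semaphore, in the same pre-await coroutine style (the limit of 10 and the download() helper are placeholders of mine):
sem = asyncio.Semaphore(10)  # at most 10 downloads in flight at once

@asyncio.coroutine
def bounded_download(url):
    # acquire the semaphore on entry; it is released when the block exits
    with (yield from sem):
        return (yield from download(url))  # download() is a hypothetical fetcher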
When I use a generator in a for loop, it seems to "know" when there are no more elements yielded. Now I have to use a generator WITHOUT a for loop, calling next() by hand to get the next element. My problem is: how do I know when there are no more elements?
I only know that next() raises a StopIteration exception if there is nothing left, BUT isn't an exception a little bit too "heavy" for such a simple problem? Isn't there a method like has_next() or so?
The following lines should make clear what I mean:
#!/usr/bin/python3

# define a list of some objects
bar = ['abc', 123, None, True, 456.789]

# our primitive generator
def foo(bar):
    for b in bar:
        yield b

# iterate, using the generator above
print('--- TEST A (for loop) ---')
for baz in foo(bar):
    print(baz)
print()

# assign a new iterator to a variable
foobar = foo(bar)

print('--- TEST B (try-except) ---')
while True:
    try:
        print(foobar.__next__())
    except StopIteration:
        break
print()

# assign a new iterator to a variable
foobar = foo(bar)

# display generator members
print('--- GENERATOR MEMBERS ---')
print(', '.join(dir(foobar)))
The output is as follows:
--- TEST A (for loop) ---
abc
123
None
True
456.789
--- TEST B (try-except) ---
abc
123
None
True
456.789
--- GENERATOR MEMBERS ---
__class__, __delattr__, __doc__, __eq__, __format__, __ge__, __getattribute__, __gt__, __hash__, __init__, __iter__, __le__, __lt__, __name__, __ne__, __new__, __next__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__, close, gi_code, gi_frame, gi_running, send, throw
Thanks to everybody, and have a nice day! :)
This is a great question. I'll try to show you how we can use Python's introspective abilities and open source to get an answer. We can use the dis module to peek behind the curtain and see how the CPython interpreter implements a for loop over an iterator.
>>> def for_loop(iterable):
... for item in iterable:
... pass # do nothing
...
>>> import dis
>>> dis.dis(for_loop)
2 0 SETUP_LOOP 14 (to 17)
3 LOAD_FAST 0 (iterable)
6 GET_ITER
>> 7 FOR_ITER 6 (to 16)
10 STORE_FAST 1 (item)
3 13 JUMP_ABSOLUTE 7
>> 16 POP_BLOCK
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
The juicy bit appears to be the FOR_ITER opcode. We can't dive any deeper using dis, so let's look up FOR_ITER in the CPython interpreter's source code. If you poke around, you'll find it in Python/ceval.c; you can view it here. Here's the whole thing:
TARGET(FOR_ITER)
    /* before: [iter]; after: [iter, iter()] *or* [] */
    v = TOP();
    x = (*v->ob_type->tp_iternext)(v);
    if (x != NULL) {
        PUSH(x);
        PREDICT(STORE_FAST);
        PREDICT(UNPACK_SEQUENCE);
        DISPATCH();
    }
    if (PyErr_Occurred()) {
        if (!PyErr_ExceptionMatches(PyExc_StopIteration))
            break;
        PyErr_Clear();
    }
    /* iterator ended normally */
    x = v = POP();
    Py_DECREF(v);
    JUMPBY(oparg);
    DISPATCH();
Do you see how this works? We try to grab an item from the iterator; if we fail, we check what exception was raised. If it's StopIteration, we clear it and consider the iterator exhausted.
So how does a for loop "just know" when an iterator has been exhausted? Answer: it doesn't -- it has to try and grab an element. But why?
Part of the answer is simplicity. Part of the beauty of implementing iterators is that you only have to define one operation: grab the next element. But more importantly, it makes iterators lazy: they'll only produce the values that they absolutely have to.
Finally, if you are really missing this feature, it's trivial to implement it yourself. Here's an example:
class LookaheadIterator:

    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.buffer = []

    def __iter__(self):
        return self

    def __next__(self):
        if self.buffer:
            return self.buffer.pop()
        else:
            return next(self.iterator)

    def has_next(self):
        if self.buffer:
            return True
        try:
            self.buffer = [next(self.iterator)]
        except StopIteration:
            return False
        else:
            return True
x = LookaheadIterator(range(2))

print(x.has_next())  # True
print(next(x))       # 0
print(x.has_next())  # True
print(next(x))       # 1
print(x.has_next())  # False
print(next(x))       # raises StopIteration
The two approaches you wrote deal with finding the end of the generator in exactly the same way: the for loop simply calls next() until the StopIteration exception is raised, and then it terminates.
http://docs.python.org/tutorial/classes.html#iterators
As such, I don't think waiting for the StopIteration exception is a 'heavy' way to deal with the problem; it's the way generators are designed to be used.
It is not possible to know beforehand about end-of-iterator in the general case, because arbitrary code may have to run to decide about the end. Buffering elements could help reveal things, at a cost, but this is rarely useful.
In practice the question arises when one wants to take only one or a few elements from an iterator for now, but does not want to write that ugly exception-handling code (as indicated in the question). Indeed it is non-Pythonic to put the concept "StopIteration" into normal application code. And exception handling at the Python level is rather time-consuming, particularly when it's just about taking one element.
The most Pythonic way to handle those situations is either using for .. break [.. else], like:
for x in iterator:
    do_something(x)
    break
else:
    it_was_exhausted()
or using the builtin next() function with a default, like
x = next(iterator, default_value)
or using iterator helpers, e.g. from the itertools module, for rewiring things, like:
max_3_elements = list(itertools.islice(iterator, 3))
Some iterators, however, expose a "length hint" (PEP 424):
>>> gen = iter(range(3))
>>> gen.__length_hint__()
3
>>> next(gen)
0
>>> gen.__length_hint__()
2
Note: iterator.__next__() should not be used by normal app code; that's why it was renamed from iterator.next() in Python 2. And using next() without a default is not much better ...
This may not precisely answer your question, but I found my way here looking to elegantly grab a result from a generator without having to write a try: block. A little googling later, I figured this out:
def g():
    yield 5

result = next(g(), None)
Now result is either 5 or None, depending on whether the generator had anything left to yield: for example, if you had already called next on that generator object, or if the generator function returned early instead of yielding.
I strongly prefer handling None as an output over raising for "normal" conditions, so dodging the try/catch here is a big win. If the situation calls for it, there's also an easy place to add a default other than None.
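For instance, when None is itself a value the generator might legitimately yield, a unique sentinel keeps exhaustion unambiguous (a small sketch of that "default other than None"):
_SENTINEL = object()  # private marker no generator can ever yield

result = next(g(), _SENTINEL)
if result is _SENTINEL:
    print("generator exhausted")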