What negative effects should I be concerned about with an indefinitely long-running script that is infinitely recursive?
My script looks something like this:
class Foo:
    def __init__(self):
        self.some_instance_vars = 42
        self.a()

    def a(self):
        manipulate(self.some_instance_vars)
        do_io()
        self.b()

    def b(self):
        manipulate_again(self.some_instance_vars)
        do_io()
        self.c()

    def c(self):
        manipulate_more(self.some_instance_vars)
        do_io()
        self.a()

foo = Foo()
Some things I can imagine:
This would break the garbage collection because the previous calls are always in scope.
Python puts something in memory for each loop and it infinitely builds up and eventually creates a memory problem.
Are these things problems? Are there other problems I haven't thought of?
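A minimal sketch of what happens in practice, assuming CPython with its default recursion limit. The Recursive and Looping classes are hypothetical reductions of the class above (no I/O, just a call counter). Each nested call keeps its frame alive, so the recursive version stops with RecursionError long before memory runs out, while a plain loop can run as long as you like:

```python
import sys


class Recursive:
    """Hypothetical reduction of the class above: a() -> b() -> a() -> ..."""

    def __init__(self):
        self.calls = 0

    def a(self):
        self.calls += 1
        self.b()

    def b(self):
        self.calls += 1
        self.a()


def run_recursive():
    """Return how many calls were made before Python gave up."""
    worker = Recursive()
    try:
        worker.a()
    except RecursionError:
        return worker.calls
    return None  # not reached: the recursion never ends on its own


class Looping:
    """Same counter, but driven by a loop so no frames pile up."""

    def __init__(self):
        self.calls = 0

    def step(self):
        self.calls += 1

    def run(self, iterations):
        for _ in range(iterations):
            self.step()


calls_before_error = run_recursive()
looper = Looping()
looper.run(10 * sys.getrecursionlimit())
print(calls_before_error, looper.calls)
```

The exact number of calls before the error depends on the interpreter and on how deep the stack already was, so treat it as illustrative only.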
I've been looking into decorators in Python 3. Below is a code snippet. Why am I required to return fn(*args, **kwargs) when fn is already called inside the wrapper function?
from functools import wraps

def log_function_data(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        print(fn.__name__)
        print(fn.__doc__)
        return fn(*args, **kwargs)  # why am i returning this?
    return wrapper

@log_function_data
def add(x, y):
    '''Adds 2 numbers and returns'''
    return x + y
The add function already returns the result of the operation. If I call the add function without the decorator, the return works like this:
def add(x, y):
    '''Adds 2 numbers and returns'''
    return x + y

result = add(2, 3)  # result is 5
return fn(*args, **kwargs) # why am i returning this?
Please consult the documentation.
Python interprets your code at distinct times:
once, at import time (when we generate bytecode for functions)
repeatedly, at run time (when you call the function)
You could choose not to return fn, but that would cause an important change to the generated bytecode, resulting in a decorated function which returns the default value of None. Since the generated bytecode wouldn't even bother calling add(), it's hard to call the final result a "decorated" function at all, it's more like a completely different function that ignored the fn passed into it.
To better understand decorators, you might find it instructive to play around with dis. I recommend making a one-line change in your source code and noticing the diffs between the variant bytecodes.
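Here is a small sketch of that one-line change, under the assumption that we track calls in a list to see what actually runs. The names forwarding, dropping, add_dropped and call_log are made up for illustration. The first decorator keeps the return line; the second deletes it entirely:

```python
from functools import wraps

call_log = []  # hypothetical helper: records which functions really ran


def forwarding(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)  # call fn and hand its result back
    return wrapper


def dropping(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        pass  # the "return fn(...)" line was removed: fn is never called
    return wrapper


@forwarding
def add(x, y):
    """Adds 2 numbers and returns"""
    call_log.append("add")
    return x + y


@dropping
def add_dropped(x, y):
    """Adds 2 numbers and returns"""
    call_log.append("add_dropped")
    return x + y


print(add(2, 3))          # the wrapped result is passed through
print(add_dropped(2, 3))  # the wrapper falls off the end and returns None
print(call_log)
```

Calling dis.dis on each wrapper should show the same difference at the bytecode level, although the exact opcodes vary between Python versions.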
Imagine you know someone that is extremely good at maths. Imagine this girl, Alice, is so good that she is the only person in the world that could solve problem X.
There is a catch here: Alice only speaks French, but problem X is explained in a randomly picked language and is also expected to be answered in that same language. Sadly, there is no way for her to interpret problem X in any language other than French.
Imagine you are interested in solving problem X. You think about it for 2 minutes and come to a solution:
I know Alice, she has the math part and I know Bernard, he can
translate from any language to French and back at perfection. Let's make
them work together to solve problem X!
The algorithm they will use to work is the following:
Bernard reads problem X and translates it into French
Alice solves the French version of X
Bernard translates Alice's answer back into the original language
Return a solution
In Python, this looks like this:
def main():
    insane_translator = Bernard()

    def attach_translator(french_solver):
        def translation_decorator(X):
            french_X = insane_translator.translate(X, input_lang=X.lang, output="fr")
            french_solution = french_solver(french_X)
            solution = insane_translator.translate(french_solution, input_lang="fr", output=X.lang)
            return solution
        return translation_decorator

    x = X()
    solve = attach_translator(Alice.solve)
    solution = solve(x)
    print(solution)

class Alice:
    @staticmethod
    def solve(french_X):
        # Do some incredible things using french_X, Alice is a genius.
        return french_X + " 42"

class Bernard:
    def translate(self, text, input_lang=None, output=None):
        # Translate from any input language to any output language, Bernard is insane!
        return f"from {input_lang} to {output} of: \n\t{text}"

class X:
    def __init__(self):
        self.lang = "Alien"

    def __str__(self):
        return "\tProblem definition"

main()
Bernard is used as a translator to "decorate" Alice's working power. If you don't call fn, you don't ask Alice to work, so you are only translating back and forth with no benefit. The goal of the decorator is to add some treatment before and after the decorated function is called; the whole point is to call the function in a different way from the one it was originally designed for.
Suppose I have the following object with multiple expensive properties, as so:
class Object:
    def __init__(self, num):
        self.num = num

    @property
    def expensive_property(self):
        return expensive_calculation

    @property
    def expensive_property1(self):
        return expensive_calculation

    @property
    def expensive_property2(self):
        return expensive_calculation
Note: The number of expensive properties may increase over time.
Given a list of Objects, how could I compute each expensive property in its own thread, for all objects in the list? I am having a hard time figuring out how I should arrange my pool.
This is kinda what I am trying to achieve:
from multiprocessing.dummy import Pool
from multiprocessing.dummy import Queue

object_list = [Object(i) for i in range(20)]
properties = ["expensive_property2", "expensive_property5", "expensive_property9", "expensive_property3"]

def get(objs, expensive_property):
    # look the property up by name on every object
    return [getattr(o, expensive_property) for o in objs]

tasks = Queue()
for p in properties:
    tasks.put((get, object_list, p))
tasks.put(None)  # sentinel so the loop below terminates

results = []
with Pool(len(properties)) as pool:
    while True:
        task = tasks.get()
        if task is None:
            break
        func, *args = task
        result = pool.apply_async(func, args)
        results.append(result)
This is a little crazy because apply_async has an internal queue to distribute tasks over the pool. I can imagine reasons to have another queue around, such as observability or backpressure. Is your example your full program? Or are you enqueuing work from a different process/thread?
If your computation is CPU bound one option could be to remove the queue to make things a little simpler:
def wait_all(async_results, timeout_seconds_per_task=1):
    for r in async_results:
        r.get(timeout_seconds_per_task)

wait_all(
    [pool.apply_async(get, (object_list, p)) for p in properties],
    timeout_seconds_per_task=1,
)
Like your example above, this allows you to distribute computation across your available CPUs (Pool even defaults to the number of CPUs on your machine). If your work is IO bound (suggested by your sleep), processes may have diminishing returns.
You'd have to benchmark but for IO bound you could create a thread pool using the same pattern https://stackoverflow.com/a/3034000/594589
Other options could be to use nonblocking IO with event loop such as gevent, or asyncio. Both would allow you to model the same pool based pattern!
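For the thread pool variant, here is a rough sketch using concurrent.futures.ThreadPoolExecutor from the standard library instead of multiprocessing.dummy. The Item class and its double and square properties are hypothetical stand-ins for your Object and its expensive properties, and collect and compute_all are names I made up. Each property gets one task, and that task reads the property from every object:

```python
from concurrent.futures import ThreadPoolExecutor


class Item:
    """Hypothetical stand-in for the Object class in the question."""

    def __init__(self, num):
        self.num = num

    @property
    def double(self):
        return self.num * 2  # imagine something slow here

    @property
    def square(self):
        return self.num ** 2  # imagine something slow here


def collect(objects, name):
    """Read the property called `name` from every object."""
    return [getattr(obj, name) for obj in objects]


def compute_all(objects, names):
    """Run one task per property name and gather the results by name."""
    with ThreadPoolExecutor(max_workers=len(names)) as executor:
        futures = {name: executor.submit(collect, objects, name) for name in names}
        # result() blocks until the corresponding task has finished
        return {name: future.result() for name, future in futures.items()}


items = [Item(i) for i in range(5)]
results = compute_all(items, ["double", "square"])
print(results)
```

Adding a new property later only means adding its name to the list. Whether threads actually speed things up depends on whether the properties spend their time waiting on IO, so benchmark before committing to this.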
What I want to achieve is the following:
def foo(self):
    i = 4
    self.bar(__something_which_holds_foo_variables__)
    print(i)  # prints 8
    print(j)  # prints 2

def bar(self, vars):
    vars.j = 2
    vars.i = vars.i * vars.j
Access all local variables from a method from another method.
Using nested methods and nonlocal variables is not an option, as foo and bar may be in different modules.
Passing variables by name is not an option either, as bar() may produce other variables which I'd like to use later as locals in foo(). Moreover, bar() may potentially use a lot of variables from foo().
Is it possible to achieve this and why such idea may be good or bad?
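Just use the module attribute (for different modules). Note that the global statement only accepts plain names, so "global modulename.i" is a syntax error, and this only works if i is a module-level variable rather than a local of foo():
import modulename

def bar():
    modulename.i = modulename.i * 2
Or if it is in a class then:
def bar():
    modulename.classname.i = modulename.classname.i * 2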
A simpler idea: make bar(...) return its results, so that it hands the new i and j back to foo() in some form of a tuple.
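A minimal sketch of both variants, written as plain functions rather than methods to keep it short. The first returns a tuple. The second is my own assumption about what "something which holds foo variables" could look like: a types.SimpleNamespace that foo() creates and bar() fills in. The names bar_ns and foo_ns are made up:

```python
from types import SimpleNamespace


def bar(i):
    """Compute the new values and hand them back to the caller."""
    j = 2
    return i * j, j


def foo():
    i = 4
    i, j = bar(i)  # unpack the tuple into foo's own locals
    return i, j


def bar_ns(ns):
    """Variant: mutate a namespace object passed in by the caller."""
    ns.j = 2
    ns.i = ns.i * ns.j


def foo_ns():
    ns = SimpleNamespace(i=4)  # explicit container instead of raw locals
    bar_ns(ns)
    return ns.i, ns.j


print(foo())
print(foo_ns())
```

Both work across modules because nothing depends on the caller's frame. The price is that foo() has to spell out ns.i instead of i, which also makes the data flow visible to the reader.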
In the generic example below I use Foobar_Collection to manage a dictionary of Foo instances. Additionally, Foobar_Collection carries a method which will sequentially call myMethod(), shared by all instances of Foo. It works fine so far. However, I wonder whether I could take advantage of multiprocessing, so that run_myMethodForAllfoobars() could divide the work into several chunks of instances? The instance methods are "independent" of each other (I think this case is called embarrassingly parallel). Any help would be great!
class Foobar_Collection(dict):
    def __init__(self, *arg, **kw):
        super(Foobar_Collection, self).__init__(*arg, **kw)

    def foobar(self, *arg, **kw):
        foo = Foo(*arg, **kw)
        self[foo.name] = foo
        return foo

    def run_myMethodForAllfoobars(self):
        for name in self:
            self[name].myMethod(10)
        return None

class Foo(object):
    def __init__(self, name):
        self.name = name
        self.result = 0

    # just some toy example method
    def myMethod(self, x):
        self.result += x
        return None

Foobar = Foobar_Collection()
Foobar.foobar('A')
Foobar.foobar('B')
Foobar.foobar('C')
Foobar.run_myMethodForAllfoobars()
You can use multiprocessing for this situation, but it's not great because the method that you're trying to parallelize is useful for its side effects rather than its return value. This means you'll need to serialize the Foo object in both directions (sending it to the child process, then sending the modified version back). If your real objects are more complex than the Foo objects in your example, the overhead of copying all of each object's data may make this slower than just doing everything in one process.
import multiprocessing

def worker(foo):
    foo.myMethod(10)
    return foo  # send the modified copy back to the parent

class Foobar_Collection(dict):
    #...
    def run_myMethodForAllfoobars(self):
        with multiprocessing.Pool() as pool:
            results = pool.map(worker, self.values())
        # replace the originals with the modified copies
        self.update((foo.name, foo) for foo in results)
A better design might let you only serialize the information you need to do the calculation. In your example, the only thing you need from the Foo object is its result (which you'll add 10 to), which you could extract and process without passing around the rest of the object:
def worker(num):
    return num + 10

class Foobar_Collection(dict):
    #...
    def run_myMethodForAllfoobars(self):
        with multiprocessing.Pool() as pool:
            results = pool.map(worker, (foo.result for foo in self.values()))
        for foo, new_result in zip(self.values(), results):
            foo.result = new_result
Now obviously this doesn't actually run myMethod on the foo objects any more (though it's equivalent to doing so). If you can't decouple the method from the object like this, it may be hard to get good performance.
When I use a generator in a for loop, it seems to "know" when there are no more elements to yield. Now I have to use a generator WITHOUT a for loop and call next() by hand to get the next element. My problem is: how do I know when there are no more elements?
I only know that next() raises an exception (StopIteration) if there is nothing left, BUT isn't an exception a little bit too "heavy" for such a simple problem? Isn't there a method like has_next() or so?
The following lines should make clear, what I mean:
#!/usr/bin/python3

# define a list of some objects
bar = ['abc', 123, None, True, 456.789]

# our primitive generator
def foo(bar):
    for b in bar:
        yield b

# iterate, using the generator above
print('--- TEST A (for loop) ---')
for baz in foo(bar):
    print(baz)
print()

# assign a new iterator to a variable
foobar = foo(bar)

print('--- TEST B (try-except) ---')
while True:
    try:
        print(foobar.__next__())
    except StopIteration:
        break
print()

# assign a new iterator to a variable
foobar = foo(bar)

# display generator members
print('--- GENERATOR MEMBERS ---')
print(', '.join(dir(foobar)))
The output is as follows:
--- TEST A (for loop) ---
abc
123
None
True
456.789
--- TEST B (try-except) ---
abc
123
None
True
456.789
--- GENERATOR MEMBERS ---
__class__, __delattr__, __doc__, __eq__, __format__, __ge__, __getattribute__, __gt__, __hash__, __init__, __iter__, __le__, __lt__, __name__, __ne__, __new__, __next__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__, close, gi_code, gi_frame, gi_running, send, throw
Thanks to everybody, and have a nice day! :)
This is a great question. I'll try to show you how we can use Python's introspective abilities and open source to get an answer. We can use the dis module to peek behind the curtain and see how the CPython interpreter implements a for loop over an iterator.
>>> def for_loop(iterable):
...     for item in iterable:
...         pass  # do nothing
...
>>> import dis
>>> dis.dis(for_loop)
  2           0 SETUP_LOOP              14 (to 17)
              3 LOAD_FAST                0 (iterable)
              6 GET_ITER
        >>    7 FOR_ITER                 6 (to 16)
             10 STORE_FAST               1 (item)

  3          13 JUMP_ABSOLUTE            7
        >>   16 POP_BLOCK
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE
The juicy bit appears to be the FOR_ITER opcode. We can't dive any deeper using dis, so let's look up FOR_ITER in the CPython interpreter's source code. If you poke around, you'll find it in Python/ceval.c. Here's the whole thing:
TARGET(FOR_ITER)
    /* before: [iter]; after: [iter, iter()] *or* [] */
    v = TOP();
    x = (*v->ob_type->tp_iternext)(v);
    if (x != NULL) {
        PUSH(x);
        PREDICT(STORE_FAST);
        PREDICT(UNPACK_SEQUENCE);
        DISPATCH();
    }
    if (PyErr_Occurred()) {
        if (!PyErr_ExceptionMatches(
                        PyExc_StopIteration))
            break;
        PyErr_Clear();
    }
    /* iterator ended normally */
    x = v = POP();
    Py_DECREF(v);
    JUMPBY(oparg);
    DISPATCH();
Do you see how this works? We try to grab an item from the iterator; if we fail, we check what exception was raised. If it's StopIteration, we clear it and consider the iterator exhausted.
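If the C is hard to follow, the same logic can be sketched roughly in Python. This is only an approximation of what the interpreter does, not its actual implementation, and manual_for is a name I made up:

```python
def manual_for(iterable, body):
    """Approximate 'for item in iterable: body(item)' by hand."""
    iterator = iter(iterable)      # roughly what GET_ITER does
    while True:
        try:
            item = next(iterator)  # roughly what FOR_ITER does
        except StopIteration:
            break                  # iterator ended normally: leave the loop
        body(item)                 # the loop body runs with the new item


seen = []
manual_for(['abc', 123, None], seen.append)
print(seen)
```

Any exception other than StopIteration propagates out of manual_for, which mirrors the break in the C code when the raised exception does not match.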
So how does a for loop "just know" when an iterator has been exhausted? Answer: it doesn't -- it has to try and grab an element. But why?
Part of the answer is simplicity. Part of the beauty of implementing iterators is that you only have to define one operation: grab the next element. But more importantly, it makes iterators lazy: they'll only produce the values that they absolutely have to.
Finally, if you are really missing this feature, it's trivial to implement it yourself. Here's an example:
class LookaheadIterator:
    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.buffer = []

    def __iter__(self):
        return self

    def __next__(self):
        if self.buffer:
            return self.buffer.pop()
        else:
            return next(self.iterator)

    def has_next(self):
        if self.buffer:
            return True
        try:
            self.buffer = [next(self.iterator)]
        except StopIteration:
            return False
        else:
            return True

x = LookaheadIterator(range(2))
print(x.has_next())
print(next(x))
print(x.has_next())
print(next(x))
print(x.has_next())
print(next(x))
The two statements you wrote deal with finding the end of the generator in exactly the same way. The for-loop simply calls next() on the iterator (.next() in Python 2, __next__() in Python 3) until the StopIteration exception is raised and then it terminates.
http://docs.python.org/tutorial/classes.html#iterators
As such I don't think waiting for the StopIteration exception is a 'heavy' way to deal with the problem, it's the way that generators are designed to be used.
It is not possible to know beforehand about end-of-iterator in the general case, because arbitrary code may have to run to decide about the end. Buffering elements can reveal the end ahead of time, at a cost, but this is rarely useful.
In practice the question arises when one wants to take only one or a few elements from an iterator for now, but does not want to write that ugly exception handling code (as indicated in the question). Indeed it is non-pythonic to put the concept "StopIteration" into normal application code. And exception handling at the Python level is rather time-consuming, particularly when it's just about taking one element.
The pythonic way to handle those situations best is either using for .. break [.. else] like:
for x in iterator:
    do_something(x)
    break
else:
    it_was_exhausted()
or using the builtin next() function with default like
x = next(iterator, default_value)
or using iterator helpers e.g. from itertools module for rewiring things like:
max_3_elements = list(itertools.islice(iterator, 3))
Some iterators, however, expose a "length hint" (PEP 424):
>>> gen = iter(range(3))
>>> gen.__length_hint__()
3
>>> next(gen)
0
>>> gen.__length_hint__()
2
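The same hint is available through operator.length_hint() from the standard library, which also takes a default for objects that give no hint. A small sketch (countdown is a made-up generator for illustration); note that plain generators do not provide a hint, so you get the default back:

```python
from operator import length_hint


def countdown(n):
    """Hypothetical generator: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


range_iter = iter(range(3))
hint_before = length_hint(range_iter)  # the range iterator knows what is left
next(range_iter)
hint_after = length_hint(range_iter)   # one element fewer now

# A generator has neither __len__ nor __length_hint__, so the default is used.
generator_hint = length_hint(countdown(3), -1)

print(hint_before, hint_after, generator_hint)
```

As the name says, it is only a hint, so it should not be relied on to detect the end of an iterator.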
Note: iterator.__next__() should not be used by normal app code. That's why they renamed it from iterator.next() in Python 2. And using next() without a default is not much better ...
This may not precisely answer your question, but I found my way here looking to elegantly grab a result from a generator without having to write a try: block. A little googling later I figured this out:
def g():
    yield 5

result = next(g(), None)
Now result is either 5 or None, depending on how many times you've called next on the iterator, or depending on whether the generator function returned early instead of yielding.
I strongly prefer handling None as an output over raising for "normal" conditions, so dodging the try/catch here is a big win. If the situation calls for it, there's also an easy place to add a default other than None.
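One case where a different default matters is when None is a legitimate value, as in the list from the question. A sketch, assuming a private sentinel object is acceptable (the names _MISSING and drain are made up):

```python
_MISSING = object()  # unique sentinel: no iterator can ever yield this object


def drain(iterator):
    """Collect everything from an iterator without a try/except block."""
    items = []
    while True:
        item = next(iterator, _MISSING)
        if item is _MISSING:  # identity check, so None and False pass through
            break
        items.append(item)
    return items


values = drain(iter(['abc', 123, None, True, 456.789]))
print(values)
```

With None as the default, this loop would have stopped at the third element.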