How to elegantly define a generator producing an empty sequence [duplicate] - python-3.x

A generator function can be defined by putting the yield keyword in the function’s body:
def gen():
for i in range(10):
yield i
How to define an empty generator function?
The following code doesn’t work, since Python cannot know that it is supposed to be a generator function instead of a normal function:
def empty():
pass
I could do something like this:
def empty():
if False:
yield
But that would be very ugly. Is there a nicer way?

You can use return once in a generator; it stops iteration without yielding anything, and thus provides an explicit alternative to letting the function run out of scope. So use yield to turn the function into a generator, but precede it with return to terminate the generator before yielding anything.
>>> def f():
... return
... yield
...
>>> list(f())
[]
I'm not sure it's that much better than what you have -- it just replaces a no-op if statement with a no-op yield statement. But it is more idiomatic. Note that just using yield doesn't work.
>>> def f():
... yield
...
>>> list(f())
[None]
Why not just use iter(())?
This question asks specifically about an empty generator function. For that reason, I take it to be a question about the internal consistency of Python's syntax, rather than a question about the best way to create an empty iterator in general.
If question is actually about the best way to create an empty iterator, then you might agree with Zectbumo about using iter(()) instead. However, it's important to observe that iter(()) doesn't return a function! It directly returns an empty iterable. Suppose you're working with an API that expects a callable that returns an iterable each time it's called, just like an ordinary generator function. You'll have to do something like this:
def empty():
return iter(())
(Credit should go to Unutbu for giving the first correct version of this answer.)
Now, you may find the above clearer, but I can imagine situations in which it would be less clear. Consider this example of a long list of (contrived) generator function definitions:
def zeros():
while True:
yield 0
def ones():
while True:
yield 1
...
At the end of that long list, I'd rather see something with a yield in it, like this:
def empty():
return
yield
or, in Python 3.3 and above (as suggested by DSM), this:
def empty():
yield from ()
The presence of the yield keyword makes it clear at the briefest glance that this is just another generator function, exactly like all the others. It takes a bit more time to see that the iter(()) version is doing the same thing.
It's a subtle difference, but I honestly think the yield-based functions are more readable and maintainable.
See also this great answer from user3840170 that uses dis to show another reason why this approach is preferable: it emits the fewest instructions when compiled.

iter(())
You don't require a generator. C'mon guys!

Python 3.3 (because I'm on a yield from kick, and because #senderle stole my first thought):
>>> def f():
... yield from ()
...
>>> list(f())
[]
But I have to admit, I'm having a hard time coming up with a use case for this for which iter([]) or (x)range(0) wouldn't work equally well.

Another option is:
(_ for _ in ())

Like #senderle said, use this:
def empty():
return
yield
I’m writing this answer mostly to share another justification for it.
One reason for choosing this solution above the others is that it is optimal as far as the interpreter is concerned.
>>> import dis
>>> def empty_yield_from():
... yield from ()
...
>>> def empty_iter():
... return iter(())
...
>>> def empty_return():
... return
... yield
...
>>> def noop():
... pass
...
>>> dis.dis(empty_yield_from)
2 0 LOAD_CONST 1 (())
2 GET_YIELD_FROM_ITER
4 LOAD_CONST 0 (None)
6 YIELD_FROM
8 POP_TOP
10 LOAD_CONST 0 (None)
12 RETURN_VALUE
>>> dis.dis(empty_iter)
2 0 LOAD_GLOBAL 0 (iter)
2 LOAD_CONST 1 (())
4 CALL_FUNCTION 1
6 RETURN_VALUE
>>> dis.dis(empty_return)
2 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
>>> dis.dis(noop)
2 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
As we can see, the empty_return has exactly the same bytecode as a regular empty function; the rest perform a number of other operations that don’t change the behaviour anyway. The only difference between empty_return and noop is that the former has the generator flag set:
>>> dis.show_code(noop)
Name: noop
Filename: <stdin>
Argument count: 0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 1
Flags: OPTIMIZED, NEWLOCALS, NOFREE
Constants:
0: None
>>> dis.show_code(empty_return)
Name: empty_return
Filename: <stdin>
Argument count: 0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 1
Flags: OPTIMIZED, NEWLOCALS, GENERATOR, NOFREE
Constants:
0: None
The above disassembly is outdated as of CPython 3.11, but empty_return still comes out on top, with only two more opcodes (four bytes) than a no-op function:
>>> dis.dis(empty_yield_from)
1 0 RETURN_GENERATOR
2 POP_TOP
4 RESUME 0
2 6 LOAD_CONST 1 (())
8 GET_YIELD_FROM_ITER
10 LOAD_CONST 0 (None)
>> 12 SEND 3 (to 20)
14 YIELD_VALUE
16 RESUME 2
18 JUMP_BACKWARD_NO_INTERRUPT 4 (to 12)
>> 20 POP_TOP
22 LOAD_CONST 0 (None)
24 RETURN_VALUE
>>> dis.dis(empty_iter)
1 0 RESUME 0
2 2 LOAD_GLOBAL 1 (NULL + iter)
14 LOAD_CONST 1 (())
16 PRECALL 1
20 CALL 1
30 RETURN_VALUE
>>> dis.dis(empty_return)
1 0 RETURN_GENERATOR
2 POP_TOP
4 RESUME 0
2 6 LOAD_CONST 0 (None)
8 RETURN_VALUE
>>> dis.dis(noop)
1 0 RESUME 0
2 2 LOAD_CONST 0 (None)
4 RETURN_VALUE
Of course, the strength of this argument is very dependent on the particular implementation of Python in use; a sufficiently smart alternative interpreter may notice that the other operations amount to nothing useful and optimise them out. However, even if such optimisations are present, they require the interpreter to spend time performing them and to safeguard against optimisation assumptions being broken, like the iter identifier at global scope being rebound to something else (even though that would most likely indicate a bug if it actually happened). In the case of empty_return there is simply nothing to optimise, as bytecode generation stops after a return statement, so even the relatively naïve CPython will not waste time on any spurious operations.

Must it be a generator function? If not, how about
def f():
return iter(())

The "standard" way to make an empty iterator appears to be iter([]).
I suggested to make [] the default argument to iter(); this was rejected with good arguments, see http://bugs.python.org/issue25215
- Jurjen

I want to give a class based example since we haven't had any suggested yet. This is a callable iterator that generates no items. I believe this is a straightforward and descriptive way to solve the issue.
class EmptyGenerator:
def __iter__(self):
return self
def __next__(self):
raise StopIteration
>>> list(EmptyGenerator())
[]

generator = (item for item in [])

Nobody has mentioned it yet, but calling the built-in function zip with no arguments returns an empty iterator:
>>> it = zip()
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration

Related

In Python, what is the purpose of __length_hint__ of the set_iterator class? [duplicate]

class Foo:
def __getitem__(self, item):
print('getitem', item)
if item == 6:
raise IndexError
return item**2
def __len__(self):
print('len')
return 3
class Bar:
def __iter__(self):
print('iter')
return iter([3, 5, 42, 69])
def __len__(self):
print('len')
return 3
Demo:
>>> list(Foo())
len
getitem 0
getitem 1
getitem 2
getitem 3
getitem 4
getitem 5
getitem 6
[0, 1, 4, 9, 16, 25]
>>> list(Bar())
iter
len
[3, 5, 42, 69]
Why does list call __len__? It doesn't seem to use the result for anything obvious. A for loop doesn't do it. This isn't mentioned anywhere in the iterator protocol, which just talks about __iter__ and __next__.
Is this Python reserving space for the list in advance, or something clever like that?
(CPython 3.6.0 on Linux)
See the Rationale section from PEP 424 that introduced __length_hint__ and offers insight on the motivation:
Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__ , can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.
In addition to that, the documentation for object.__length_hint__ verifies the fact that this is purely an optimization feature:
Called to implement operator.length_hint(). Should return an estimated length for the object (which may be greater or less than the actual length). The length must be an integer >= 0. This method is purely an optimization and is never required for correctness.
So __length_hint__ is here because it can result in some nice optimizations.
PyObject_LengthHint, first tries to get a value from object.__len__ (if it is defined) and then tries to see if object.__length_hint__ is available. If neither is there, it returns a default value of 8 for lists.
listextend, which is called from list_init as Eli stated in his answer, was modified according to this PEP to offer this optimization for anything that defines either a __len__ or a __length_hint__.
list isn't the only one that benefits from this, of course, bytes objects do:
>>> bytes(Foo())
len
getitem 0
...
b'\x00\x01\x04\t\x10\x19'
so do bytearray objects but, only when you extend them:
>>> bytearray().extend(Foo())
len
getitem 0
...
and tuple objects which create an intermediary sequence to populate themselves:
>>> tuple(Foo())
len
getitem 0
...
(0, 1, 4, 9, 16, 25)
If anybody is wandering why exactly 'iter' is printed before 'len' in class Bar and not after as happens with class Foo:
This is because if the object in hand defines an __iter__ Python will first call it to get the iterator, thereby running the print('iter') too. The same doesn't happen if it falls back to using __getitem__.
list is a list object constructor that will allocate an initial slice of memory for its contents. The list constructor attempts to figure out a good size for that initial slice of memory by checking the length hint or the length of any object passed into the constructor . See the call to PyObject_LengthHint in the Python source here. This place is called from the list constructor -- list_init
If your object has no __len__ or __length_hint__, that's OK -- a default value of 8 is used; it just may be less efficient due to reallocations.
Note: I prepared the answer for [SO]: Why __len__ is called and the result is not used when iterating with __getitem__?, which was marked as a dupe (as it's exactly this question) while I was writing it, so it was no longer possible to post it there, and since I already had it, I decided to post it here (with small adjustments).
Here's a modified version of your code that makes things a bit clearer.
code00.py:
#!/usr/bin/env python3
import sys
class Foo:
def __getitem__(self, item):
print("{0:s}.{1:s}: {2:d}".format(self.__class__.__name__, "getitem", item))
if item == 6:
raise IndexError
return item ** 2
class Bar:
def __iter__(self):
print("{0:s}.{1:s}".format(self.__class__.__name__, "iter"))
return iter([3, 5, 42, 69])
def __len__(self):
result = 3
print("{0:s}.{1:s}: {2:d}".format(self.__class__.__name__, "len", result))
return result
def main():
print("Start ...\n")
for class_obj in [Foo, Bar]:
inst_obj = class_obj()
print("Created {0:s} instance".format(class_obj.__name__))
list_obj = list(inst_obj)
print("Converted instance to list")
print("{0:s}: {1:}\n".format(class_obj.__name__, list_obj))
if __name__ == "__main__":
print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
main()
print("\nDone.")
Output:
[cfati#CFATI-5510-0:e:\Work\Dev\StackOverflow\q041474829]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code00.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32
Start ...
Created Foo instance
Foo.getitem: 0
Foo.getitem: 1
Foo.getitem: 2
Foo.getitem: 3
Foo.getitem: 4
Foo.getitem: 5
Foo.getitem: 6
Converted instance to list
Foo: [0, 1, 4, 9, 16, 25]
Created Bar instance
Bar.iter
Bar.len: 3
Converted instance to list
Bar: [3, 5, 42, 69]
Done.
As seen, __len__ is called when the list is constructed. Browsing [GitHub]: python/cpython - (master) cpython/Objects/listobject.c:
list___init__ (which is the initializer: __init__ (tp_init member in PyList_Type)) calls list___init___impl
list___init___impl calls list_extend
list_extend calls PyObject_LengthHint (n = PyObject_LengthHint(iterable, 8);)
PyObject_LengthHint (in abstract.c), does the check:
Py_ssize_t
PyObject_LengthHint(PyObject *o, Py_ssize_t defaultvalue)
// ...
if (_PyObject_HasLen(o)) {
res = PyObject_Length(o);
// ...
So, it's an optimization feature that works for iterables that define __len__.
This is particularly handy when the iterable has a large number of elements, so that they are allocated at once, and therefore skip the list growth mechanism (didn't check if still applies, but at one point, it was): "Space increases by ~12.5% when full" (according to David M. Beazley). It is very useful when lists were constructed out of (other) lists or tuples. For example, constructing a list from an iterable (that doesn't define __len__) with 1000 elements, instead of allocating everything at once, there will be ~41 (log1.125(1000 / 8)) operations (allocation, data shifting, deallocation) required only for increasing the new list as it gets filled (with elements from the source iterable).
Needless to say that for "modern" iterables, the improvement no longer applies.

In Python: How to deal with zero at first position in condition

I found some strange behavior in python. Possibly my logic is not correct.
1 and 2 and 3 in range(5)
Expected: True
Outcome: True
2 and 1 and 99 in range(5)
Expected: False
Outcome False
2 and 1 and 0 in range(5)
Expected: True
Outcome: True
Now the tricky one:
0 and 1 and 2 in range(5)
Expected: True
Outcome: 0
I am sure there is someone who makes me find my logical error.
In each expression, only the last number is checked against the range. The previous ones are evaluated "as is". In python, the expression if i: evaluates to True if i is not 0.
The value returned from the expressions (boolean or int) depends on what the conditions are. If you leave just 1 and 2 for example, the result will be the last int. However, since you have the v in range(n) expression, which returns True or False, the result is cast into a boolean value.
Now, due to short-circuit evaluation, in the last case, only the zero gets evaluated. So the result is not cast into a boolean and 0 is returned.
Edit: After reading the comments, it becomes clear that you want to check if k number exist in range(n). For that, you cannot use the simple expressions you've shown. You need to check if every individual value exists in the range. One - inefficient - approach would be this
if all([v in range(n) for v in values]):
print("All values exist in the range")
Edit 2 (by #Pranav Hosangadi)
Side note:
Since the all() function takes generator expressions, you can avoid the list-comprehension altogether. When you do this, the generator expression will only calculate as many items as needed for the all() to short-circuit. On the other hand, the list-comprehension approach will calculate all elements in the list, and then run all() on that list. Here's a simple example:
l = [100] * 10000
l[-1] = 0
def f1(): # Using generator
return all(li == 0 for li in l)
def f2(): # Using list comp
return all([li == 0 for li in l])
Now for the generator-expression approach, all() needs to calculate only the first element to know that it will short-circuit to False. However, the list-comprehension approach calculates all elements of the list first. All but the last element of this list are False. Then, all() takes this list and short-circuits at the first element. Running some timing on these functions:
import timeit
timeit.timeit('f1()', setup='from __main__ import f1, l', number=10000)
# Out: 0.006381300001521595
timeit.timeit('f2()', setup='from __main__ import f2, l', number=10000)
# Out: 5.257489699986763
f1() is significantly faster than f2().

Eulclid's Algorithm : Python code returns None at the end of recursion [duplicate]

This question already has answers here:
Why does my recursive function return None?
(4 answers)
Closed 3 years ago.
I am trying to implement Euclid's algorithm for computing the greatest common divisor using recursion. Below is the code for the same.
def euclid(a,b):
if a>b:
r=a%b
if (r == 0):
return b
else:
a=b
b=r
euclid(a,b)
print(euclid(20,4)) # Returns 4
print(euclid(20,8)) # Returns None
For the first data set, I get the correct result. But for euclid(20,8) I get a return of None.
While checking in the debugger, I did see the return value of b become 4 but then for some reason, my code jumps to euclid(a,b) and returns None.
The major takeaway here would be to understand why the code does not return 4 but jumps to the euclid(a,b) and return None.
Please refrain from giving alternative code solutions but you are very much encouraged to point out the reason for the current behaviour of the code.
The reason for that code to return None at some point is that in your control flow you eventually end up in a situation where there is no return statement, e.g. for a <= b in your first if or when r != 0 in your second if. The default behavior of Python in that case is to return None, as you seems to have discovered the hard way.
def euclid(a,b):
if a>b:
r=a%b
if (r == 0):
return b
else: # <--- no `return` in this branch!
a=b
b=r
euclid(a,b)
# else: # <--- no `return` in this branch!
# ...
Here is an updated version of your code, that addresses that and also a number of other issues:
the name of the function should be meaningful: Euclid is kind of popular and there is a bunch of things named after him, better specify that you actually want to compute the greatest common divisor (GCD)
the first if is not really needed, as it is taken care of by the subsequent code: computing r and the recursive call will take care of swapping a and b if a < b, and if a == b then r == 0, so you b is being returned directly
a = b and b = r are useless, do not use them, just have your recursive call use b and r directly
the second if can be written more clearly in one line
def gcd_euclid_recursive(a, b):
r = a % b
return gcd_euclid_recursive(b, r) if r else b
gcd_euclid_recursive(120, 72)
# 24
gcd_euclid_recursive(64, 120)
# 8
You don't actually return anything in the else path so it just assumes you know best and returns None for you.
The only reason you get 4 for print(euclid(20,4)) is because 4 is a factor of 20 so never uses the else path. Instead it returns b immediately. For anything else, the thing returned from the first recursive call to euclid() will always be None (even if a lower call to that function returns something, you throw it away when returning from the first call).
You need return euclid(a,b) rather than just euclid(a,b).
As an aside (this isn't necessary to answer your specific question but it goes a long way toward improving the implementation of the code), I'm not a big fan of the if something then return else … construct since the else is totally superfluous (if it didn't return, the else bit is automatic).
Additionally, you don't need to assign variables when you can just change what gets passed to the next recursive level.
Taking both those into account (and simplifying quite a bit), you can do something like:
def euclid(a, b):
if b == 0: return a
return euclid(b, a % b)
Your code has an indentation error and one path is missing in your code(r!=0). It should be
def euclid(a,b):
if a>=b:
r=a%b
if (r == 0):
return b
return euclid(b, r)
else:
return euclid(b, a)

Subclassing int class in Python

I want to do something every time I add two integers in my TestClass.
import builtins
class myInt(int):
def __add__(self, other):
print("Do something")
class TestClass:
def __init__(self):
builtins.int = myInt
def testMethod(self):
a = 1
b = 2
c = a + b
When I call my testMethod nothing happens, however if I define it like this I get the desired effect:
def testMethod(self):
a = int(1)
b = 2
c = a + b
Is it possible to make this work for all int literals without having to typecast them before the operations?
Sorry, it's not possible without building your own custom interpreter. Literal objects aren't constructed by calling the constructor in __builtins__, they are constructed using opcodes that directly call the builtin types.
Also immutable literals are constructed when the code is compiled, so you were too late anyway. If you disassemble testMethod you'll see it simply uses the constants that were compiled, it doesn't attempt to construct them:
>>> dis.dis(TestClass.testMethod)
5 0 LOAD_CONST 1 (1)
2 STORE_FAST 1 (a)
6 4 LOAD_CONST 2 (2)
6 STORE_FAST 2 (b)
7 8 LOAD_FAST 1 (a)
10 LOAD_FAST 2 (b)
12 BINARY_ADD
14 STORE_FAST 3 (c)
16 LOAD_CONST 0 (None)
18 RETURN_VALUE
Mutable literals are constructed at runtime but they use opcodes to construct the appropriate value rather than calling the type:
>>> dis.dis(lambda: {'a': 1, 'b': 2})
1 0 LOAD_CONST 1 (1)
2 LOAD_CONST 2 (2)
4 LOAD_CONST 3 (('a', 'b'))
6 BUILD_CONST_KEY_MAP 2
8 RETURN_VALUE
You could do something along the lines of what you want by parsing the source code (use builtin compile() with ast.PyCF_ONLY_AST flag) then walking the parse tree and replacing int literals with a call to your own type (use ast.NodeTransformer). Then all you have to do is finish the compilation (use compile() again). You could even do that with an import hook so it happens automatically when your module is imported, but it will be messy.

Context manager inside list comprehension inside class [duplicate]

How do you access other class variables from a list comprehension within the class definition? The following works in Python 2 but fails in Python 3:
class Foo:
x = 5
y = [x for i in range(1)]
Python 3.2 gives the error:
NameError: global name 'x' is not defined
Trying Foo.x doesn't work either. Any ideas on how to do this in Python 3?
A slightly more complicated motivating example:
from collections import namedtuple
class StateDatabase:
State = namedtuple('State', ['name', 'capital'])
db = [State(*args) for args in [
['Alabama', 'Montgomery'],
['Alaska', 'Juneau'],
# ...
]]
In this example, apply() would have been a decent workaround, but it is sadly removed from Python 3.
Class scope and list, set or dictionary comprehensions, as well as generator expressions do not mix.
The why; or, the official word on this
In Python 3, list comprehensions were given a proper scope (local namespace) of their own, to prevent their local variables bleeding over into the surrounding scope (see List comprehension rebinds names even after scope of comprehension. Is this right?). That's great when using such a list comprehension in a module or in a function, but in classes, scoping is a little, uhm, strange.
This is documented in pep 227:
Names in class scope are not accessible. Names are resolved in
the innermost enclosing function scope. If a class definition
occurs in a chain of nested scopes, the resolution process skips
class definitions.
and in the class compound statement documentation:
The class’s suite is then executed in a new execution frame (see section Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains only function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [4] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary.
Emphasis mine; the execution frame is the temporary scope.
Because the scope is repurposed as the attributes on a class object, allowing it to be used as a nonlocal scope as well leads to undefined behaviour; what would happen if a class method referred to x as a nested scope variable, then manipulates Foo.x as well, for example? More importantly, what would that mean for subclasses of Foo? Python has to treat a class scope differently as it is very different from a function scope.
Last, but definitely not least, the linked Naming and binding section in the Execution model documentation mentions class scopes explicitly:
The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:
class A:
a = 42
b = list(a + i for i in range(10))
So, to summarize: you cannot access the class scope from functions, list comprehensions or generator expressions enclosed in that scope; they act as if that scope does not exist. In Python 2, list comprehensions were implemented using a shortcut, but in Python 3 they got their own function scope (as they should have had all along) and thus your example breaks. Other comprehension types have their own scope regardless of Python version, so a similar example with a set or dict comprehension would break in Python 2.
# Same error, in Python 2 or 3
y = {x: x for i in range(1)}
The (small) exception; or, why one part may still work
There's one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable. In your example, it's the range(1):
y = [x for i in range(1)]
# ^^^^^^^^
Thus, using x in that expression would not throw an error:
# Runs fine
y = [i for i in range(x)]
This only applies to the outermost iterable; if a comprehension has multiple for clauses, the iterables for inner for clauses are evaluated in the comprehension's scope:
# NameError
y = [i for i in range(1) for j in range(x)]
# ^^^^^^^^^^^^^^^^^ -----------------
# outer loop inner, nested loop
This design decision was made in order to throw an error at genexp creation time instead of iteration time when creating the outermost iterable of a generator expression throws an error, or when the outermost iterable turns out not to be iterable. Comprehensions share this behavior for consistency.
Looking under the hood; or, way more detail than you ever wanted
You can see this all in action using the dis module. I'm using Python 3.3 in the following examples, because it adds qualified names that neatly identify the code objects we want to inspect. The bytecode produced is otherwise functionally identical to Python 3.2.
To create a class, Python essentially takes the whole suite that makes up the class body (so everything indented one level deeper than the class <name>: line), and executes that as if it were a function:
>>> import dis
>>> def foo():
... class Foo:
... x = 5
... y = [x for i in range(1)]
... return Foo
...
>>> dis.dis(foo)
2 0 LOAD_BUILD_CLASS
1 LOAD_CONST 1 (<code object Foo at 0x10a436030, file "<stdin>", line 2>)
4 LOAD_CONST 2 ('Foo')
7 MAKE_FUNCTION 0
10 LOAD_CONST 2 ('Foo')
13 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
16 STORE_FAST 0 (Foo)
5 19 LOAD_FAST 0 (Foo)
22 RETURN_VALUE
The first LOAD_CONST there loads a code object for the Foo class body, then makes that into a function, and calls it. The result of that call is then used to create the namespace of the class, its __dict__. So far so good.
The thing to note here is that the bytecode contains a nested code object; in Python, class definitions, functions, comprehensions and generators all are represented as code objects that contain not only bytecode, but also structures that represent local variables, constants, variables taken from globals, and variables taken from the nested scope. The compiled bytecode refers to those structures and the python interpreter knows how to access those given the bytecodes presented.
The important thing to remember here is that Python creates these structures at compile time; the class suite is a code object (<code object Foo at 0x10a436030, file "<stdin>", line 2>) that is already compiled.
Let's inspect that code object that creates the class body itself; code objects have a co_consts structure:
>>> foo.__code__.co_consts
(None, <code object Foo at 0x10a436030, file "<stdin>", line 2>, 'Foo')
>>> dis.dis(foo.__code__.co_consts[1])
2 0 LOAD_FAST 0 (__locals__)
3 STORE_LOCALS
4 LOAD_NAME 0 (__name__)
7 STORE_NAME 1 (__module__)
10 LOAD_CONST 0 ('foo.<locals>.Foo')
13 STORE_NAME 2 (__qualname__)
3 16 LOAD_CONST 1 (5)
19 STORE_NAME 3 (x)
4 22 LOAD_CONST 2 (<code object <listcomp> at 0x10a385420, file "<stdin>", line 4>)
25 LOAD_CONST 3 ('foo.<locals>.Foo.<listcomp>')
28 MAKE_FUNCTION 0
31 LOAD_NAME 4 (range)
34 LOAD_CONST 4 (1)
37 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
40 GET_ITER
41 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
44 STORE_NAME 5 (y)
47 LOAD_CONST 5 (None)
50 RETURN_VALUE
The above bytecode creates the class body. The function is executed and the resulting locals() namespace, containing x and y is used to create the class (except that it doesn't work because x isn't defined as a global). Note that after storing 5 in x, it loads another code object; that's the list comprehension; it is wrapped in a function object just like the class body was; the created function takes a positional argument, the range(1) iterable to use for its looping code, cast to an iterator. As shown in the bytecode, range(1) is evaluated in the class scope.
From this you can see that the only difference between a code object for a function or a generator, and a code object for a comprehension is that the latter is executed immediately when the parent code object is executed; the bytecode simply creates a function on the fly and executes it in a few small steps.
Python 2.x uses inline bytecode there instead, here is output from Python 2.7:
2 0 LOAD_NAME 0 (__name__)
3 STORE_NAME 1 (__module__)
3 6 LOAD_CONST 0 (5)
9 STORE_NAME 2 (x)
4 12 BUILD_LIST 0
15 LOAD_NAME 3 (range)
18 LOAD_CONST 1 (1)
21 CALL_FUNCTION 1
24 GET_ITER
>> 25 FOR_ITER 12 (to 40)
28 STORE_NAME 4 (i)
31 LOAD_NAME 2 (x)
34 LIST_APPEND 2
37 JUMP_ABSOLUTE 25
>> 40 STORE_NAME 5 (y)
43 LOAD_LOCALS
44 RETURN_VALUE
No code object is loaded, instead a FOR_ITER loop is run inline. So in Python 3.x, the list generator was given a proper code object of its own, which means it has its own scope.
However, the comprehension was compiled together with the rest of the python source code when the module or script was first loaded by the interpreter, and the compiler does not consider a class suite a valid scope. Any referenced variables in a list comprehension must look in the scope surrounding the class definition, recursively. If the variable wasn't found by the compiler, it marks it as a global. Disassembly of the list comprehension code object shows that x is indeed loaded as a global:
>>> foo.__code__.co_consts[1].co_consts
('foo.<locals>.Foo', 5, <code object <listcomp> at 0x10a385420, file "<stdin>", line 4>, 'foo.<locals>.Foo.<listcomp>', 1, None)
>>> dis.dis(foo.__code__.co_consts[1].co_consts[2])
4 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (i)
12 LOAD_GLOBAL 0 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
This chunk of bytecode loads the first argument passed in (the range(1) iterator), and just like the Python 2.x version uses FOR_ITER to loop over it and create its output.
Had we defined x in the foo function instead, x would be a cell variable (cells refer to nested scopes):
>>> def foo():
... x = 2
... class Foo:
... x = 5
... y = [x for i in range(1)]
... return Foo
...
>>> dis.dis(foo.__code__.co_consts[2].co_consts[2])
5 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (i)
12 LOAD_DEREF 0 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
The LOAD_DEREF will indirectly load x from the code object cell objects:
>>> foo.__code__.co_cellvars # foo function `x`
('x',)
>>> foo.__code__.co_consts[2].co_cellvars # Foo class, no cell variables
()
>>> foo.__code__.co_consts[2].co_consts[2].co_freevars # Refers to `x` in foo
('x',)
>>> foo().y
[2]
The actual referencing looks the value up from the current frame data structures, which were initialized from a function object's .__closure__ attribute. Since the function created for the comprehension code object is discarded again, we do not get to inspect that function's closure. To see a closure in action, we'd have to inspect a nested function instead:
>>> def spam(x):
... def eggs():
... return x
... return eggs
...
>>> spam(1).__code__.co_freevars
('x',)
>>> spam(1)()
1
>>> spam(1).__closure__
>>> spam(1).__closure__[0].cell_contents
1
>>> spam(5).__closure__[0].cell_contents
5
So, to summarize:
List comprehensions get their own code objects in Python 3, and there is no difference between code objects for functions, generators or comprehensions; comprehension code objects are wrapped in a temporary function object and called immediately.
Code objects are created at compile time, and any non-local variables are marked as either global or as free variables, based on the nested scopes of the code. The class body is not considered a scope for looking up those variables.
When executing the code, Python has only to look into the globals, or the closure of the currently executing object. Since the compiler didn't include the class body as a scope, the temporary function namespace is not considered.
A workaround; or, what to do about it
If you were to create an explicit scope for the x variable, like in a function, you can use class-scope variables for a list comprehension:
>>> class Foo:
... x = 5
... def y(x):
... return [x for i in range(1)]
... y = y(x)
...
>>> Foo.y
[5]
The 'temporary' y function can be called directly; we replace it when we do with its return value. Its scope is considered when resolving x:
>>> foo.__code__.co_consts[1].co_consts[2]
<code object y at 0x10a5df5d0, file "<stdin>", line 4>
>>> foo.__code__.co_consts[1].co_consts[2].co_cellvars
('x',)
Of course, people reading your code will scratch their heads over this a little; you may want to put a big fat comment in there explaining why you are doing this.
The best work-around is to just use __init__ to create an instance variable instead:
def __init__(self):
self.y = [self.x for i in range(1)]
and avoid all the head-scratching, and questions to explain yourself. For your own concrete example, I would not even store the namedtuple on the class; either use the output directly (don't store the generated class at all), or use a global:
from collections import namedtuple
State = namedtuple('State', ['name', 'capital'])
class StateDatabase:
db = [State(*args) for args in [
('Alabama', 'Montgomery'),
('Alaska', 'Juneau'),
# ...
]]
In my opinion it is a flaw in Python 3. I hope they change it.
Old Way (works in 2.7, throws NameError: name 'x' is not defined in 3+):
class A:
x = 4
y = [x+i for i in range(1)]
NOTE: simply scoping it with A.x would not solve it
New Way (works in 3+):
class A:
x = 4
y = (lambda x=x: [x+i for i in range(1)])()
Because the syntax is so ugly I just initialize all my class variables in the constructor typically
The accepted answer provides excellent information, but there appear to be a few other wrinkles here -- differences between list comprehension and generator expressions. A demo that I played around with:
class Foo:
# A class-level variable.
X = 10
# I can use that variable to define another class-level variable.
Y = sum((X, X))
# Works in Python 2, but not 3.
# In Python 3, list comprehensions were given their own scope.
try:
Z1 = sum([X for _ in range(3)])
except NameError:
Z1 = None
# Fails in both.
# Apparently, generator expressions (that's what the entire argument
# to sum() is) did have their own scope even in Python 2.
try:
Z2 = sum(X for _ in range(3))
except NameError:
Z2 = None
# Workaround: put the computation in lambda or def.
compute_z3 = lambda val: sum(val for _ in range(3))
# Then use that function.
Z3 = compute_z3(X)
# Also worth noting: here I can refer to XS in the for-part of the
# generator expression (Z4 works), but I cannot refer to XS in the
# inner-part of the generator expression (Z5 fails).
XS = [15, 15, 15, 15]
Z4 = sum(val for val in XS)
try:
Z5 = sum(XS[i] for i in range(len(XS)))
except NameError:
Z5 = None
print(Foo.Z1, Foo.Z2, Foo.Z3, Foo.Z4, Foo.Z5)
Since the outermost iterator is evaluated in the surrounding scope we can use zip together with itertools.repeat to carry the dependencies over to the comprehension's scope:
import itertools as it
class Foo:
x = 5
y = [j for i, j in zip(range(3), it.repeat(x))]
One can also use nested for loops in the comprehension and include the dependencies in the outermost iterable:
class Foo:
x = 5
y = [j for j in (x,) for i in range(3)]
For the specific example of the OP:
from collections import namedtuple
import itertools as it
class StateDatabase:
State = namedtuple('State', ['name', 'capital'])
db = [State(*args) for State, args in zip(it.repeat(State), [
['Alabama', 'Montgomery'],
['Alaska', 'Juneau'],
# ...
])]
This is a bug in Python. Comprehensions are advertised as being equivalent to for loops, but this is not true in classes. At least up to Python 3.6.6, in a comprehension used in a class, only one variable from outside the comprehension is accessible inside the comprehension, and it must be used as the outermost iterator. In a function, this scope limitation does not apply.
To illustrate why this is a bug, let's return to the original example. This fails:
class Foo:
x = 5
y = [x for i in range(1)]
But this works:
def Foo():
x = 5
y = [x for i in range(1)]
The limitation is stated at the end of this section in the reference guide.
This may be by design, but IMHO, it's a bad design. I know I'm not an expert here, and I've tried reading the rationale behind this, but it just goes over my head, as I think it would for any average Python programmer.
To me, a comprehension doesn't seem that much different than a regular mathematical expression. For example, if 'foo' is a local function variable, I can easily do something like:
(foo + 5) + 7
But I can't do:
[foo + x for x in [1,2,3]]
To me, the fact that one expression exists in the current scope and the other creates a scope of its own is very surprising and, no pun intended, 'incomprehensible'.
I spent quite some time to understand why this is a feature, not a bug.
Consider the simple code:
a = 5
def myfunc():
print(a)
Since there is no "a" defined in myfunc(), the scope would expand and the code will execute.
Now consider the same code in the class. It cannot work because this would completely mess around accessing the data in the class instances. You would never know, are you accessing a variable in the base class or the instance.
The list comprehension is just a sub-case of the same effect.
One can use a for loop:
class A:
x=5
##Won't work:
## y=[i for i in range(101) if i%x==0]
y=[]
for i in range(101):
if i%x==0:
y.append(i)
Please correct me i'm not wrong...

Resources