Related
After diving into Python's source code, I found out that it maintains an array of PyInt_Object instances ranging from int(-5) to int(256) (see Objects/intobject.c).
A little experiment proves it:
>>> a = 1
>>> b = 1
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False
But if I run that code together in a .py file (or join the statements with semicolons), the result is different:
>>> a = 257; b = 257; a is b
True
I'm curious why they are still the same object, so I dug deeper into the syntax tree and the compiler and came up with the calling hierarchy listed below:
PyRun_FileExFlags()
mod = PyParser_ASTFromFile()
node *n = PyParser_ParseFileFlagsEx() //source to cst
parsetoke()
ps = PyParser_New()
for (;;)
PyTokenizer_Get()
PyParser_AddToken(ps, ...)
mod = PyAST_FromNode(n, ...) //cst to ast
run_mod(mod, ...)
co = PyAST_Compile(mod, ...) //ast to CFG
PyFuture_FromAST()
PySymtable_Build()
co = compiler_mod()
PyEval_EvalCode(co, ...)
PyEval_EvalCodeEx()
Then I added some debug code in PyInt_FromLong and before/after PyAST_FromNode, and executed a test.py:
a = 257
b = 257
print "id(a) = %d, id(b) = %d" % (id(a), id(b))
the output looks like:
DEBUG: before PyAST_FromNode
name = a
ival = 257, id = 176046536
name = b
ival = 257, id = 176046752
name = a
name = b
DEBUG: after PyAST_FromNode
run_mod
PyAST_Compile ok
id(a) = 176046536, id(b) = 176046536
Eval ok
It means that during the cst to ast transform, two different PyInt_Objects are created (actually it's performed in the ast_for_atom() function), but they are later merged.
I find it hard to comprehend the source in PyAST_Compile and PyEval_EvalCode, so I'm asking for help here; I'd appreciate it if someone could give me a hint.
Python caches integers in the range [-5, 256], so integers in that range are usually but not always identical.
What you see for 257 is the Python compiler optimizing identical literals when compiled in the same code object.
When typing in the Python shell, each line is a completely different statement, parsed and compiled separately, thus:
>>> a = 257
>>> b = 257
>>> a is b
False
But if you put the same code into a file:
$ echo 'a = 257
> b = 257
> print a is b' > testing.py
$ python testing.py
True
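You can also reproduce this without a file by handing the compiler both statements at once; the constants of the resulting code object show a single 257 (a quick check on CPython; the exact contents of co_consts can vary between versions):
>>> code = compile("a = 257\nb = 257", "<test>", "exec")
>>> code.co_consts
(257, None)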
This happens whenever the compiler has a chance to analyze the literals together, for example when defining a function in the interactive interpreter:
>>> def test():
... a = 257
... b = 257
... print a is b
...
>>> dis.dis(test)
2 0 LOAD_CONST 1 (257)
3 STORE_FAST 0 (a)
3 6 LOAD_CONST 1 (257)
9 STORE_FAST 1 (b)
4 12 LOAD_FAST 0 (a)
15 LOAD_FAST 1 (b)
18 COMPARE_OP 8 (is)
21 PRINT_ITEM
22 PRINT_NEWLINE
23 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>> test()
True
>>> test.func_code.co_consts
(None, 257)
Note how the compiled code contains a single constant for the 257.
In conclusion, the Python bytecode compiler is not able to perform massive optimizations (like those in statically typed languages), but it does more than you might think. One of those things is to analyze the use of literals and avoid duplicating them.
Note that this has nothing to do with the integer cache, because it also works for floats, which do not have a cache:
>>> a = 5.0
>>> b = 5.0
>>> a is b
False
>>> a = 5.0; b = 5.0
>>> a is b
True
For more complex literals, like tuples, it "doesn't work":
>>> a = (1,2)
>>> b = (1,2)
>>> a is b
False
>>> a = (1,2); b = (1,2)
>>> a is b
False
But the literals inside the tuple are shared:
>>> a = (257, 258)
>>> b = (257, 258)
>>> a[0] is b[0]
False
>>> a[1] is b[1]
False
>>> a = (257, 258); b = (257, 258)
>>> a[0] is b[0]
True
>>> a[1] is b[1]
True
(Note that constant folding and the peephole optimizer can change behaviour even between bugfix versions, so which examples return True or False is basically arbitrary and will change in the future).
Regarding why you see that two PyInt_Objects are created, I'd guess that this is done to avoid comparing literals. For example, the number 257 can be expressed by multiple literals:
>>> 257
257
>>> 0x101
257
>>> 0b100000001
257
>>> 0o401
257
The parser has two choices:
Convert the literals to some common base before creating the integer, check whether the literals are equivalent, and then create a single integer object.
Create the integer objects and check whether they are equal. If they are, keep only a single value and assign it to all the literals; otherwise, you already have the integers to assign.
The Python parser probably uses the second approach, which avoids rewriting the conversion code and is also easier to extend (for example, it works with floats as well).
Reading the Python/ast.c file, the function that parses all numbers is parsenumber, which calls PyOS_strtoul to obtain the integer value (for integers) and eventually calls PyLong_FromString:
x = (long) PyOS_strtoul((char *)s, (char **)&end, 0);
if (x < 0 && errno == 0) {
return PyLong_FromString((char *)s,
(char **)0,
0);
}
As you can see here, the parser does not check whether it has already found an integer with the given value, which explains why you see two int objects being created.
It also means that my guess was correct: the parser first creates the constants, and only afterward does it optimize the bytecode to use the same object for equal constants.
The code that does this check must be somewhere in Python/compile.c or Python/peephole.c, since these are the files that transform the AST into bytecode.
In particular, the compiler_add_o function seems to be the one that does it. There is this comment in compiler_lambda:
/* Make None the first constant, so the lambda can't have a
docstring. */
if (compiler_add_o(c, c->u->u_consts, Py_None) < 0)
return 0;
So it seems like compiler_add_o is used to insert constants for functions/lambdas etc.
The compiler_add_o function stores the constants into a dict object, and from this it immediately follows that equal constants will fall into the same slot, resulting in a single constant in the final bytecode.
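As a rough illustration of that dict-based de-duplication (this is only a Python sketch of the idea, not the actual C code; the names are made up):
def add_const(consts, value):
    # Key on (type, value) so that, e.g., 257 and 257.0 stay distinct constants.
    key = (type(value), value)
    if key not in consts:
        consts[key] = len(consts)   # next free slot in the constants table
    return consts[key]

consts = {}
print(add_const(consts, 257))   # 0 -> first occurrence gets a new slot
print(add_const(consts, 257))   # 0 -> an equal constant falls into the same slot
print(add_const(consts, 258))   # 1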
I've been struggling with some operations on NumPy arrays and functions in a recent script. I've finally discovered what seems to be the source of the error: NumPy's __isub__.
Here's an example:
def test(apocalypse):
apocalypse = apocalypse - 3
return apocalypse
def test2(apocalypse):
apocalypse -= 3
return apocalypse
foo = np.array([1,2,3])
print(test(foo))
print(foo)
bar = np.array([1,2,3])
print(test2(bar))
print(bar)
Which results in:
[-2 -1 0]
[1 2 3]
[-2 -1 0]
[-2 -1 0]
Was it expected? Should -= change a global variable (same for +=)? I've tried with int and float instead of NumPy arrays and it worked as I expected.
Posting an answer in case anybody needs more details than the quick and good answers from hpaulj and Willem Van Onsem.
I'm assuming we already understand how mutable and immutable objects work. If not, here is an answer about it, in the context of passing by reference vs. by assignment.
The problem I faced was that __isub__ is an in-place operator and it was used on a mutable object, so the object passed (by assignment) to the test function was changed in place; it has nothing to do with global variables, as I had supposed.
Note that having a mutable object does not mean that every operation will change it in place, though. An example with id:
arr = np.array([1,2,3])
print(id(arr)) # prints 139838225358560
arr += 3
print(arr) # prints [4 5 6]
print(id(arr)) # prints 139838225358560
arr = arr + 3
print(arr) # prints [7 8 9]
print(id(arr)) # prints 139838016018832
The in-place operator += performs the change of the object arr while the assignment = creates a new one. The same goes in my question with apocalypse -= 3 and apocalypse = apocalypse - 3, respectively.
Python documentation has a short and good explanation of in-place operators with a list of them.
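If you want the convenience of -= inside a function without touching the caller's array, one straightforward option (a sketch, assuming import numpy as np as above) is to work on a copy:
import numpy as np

def test2_safe(apocalypse):
    apocalypse = apocalypse.copy()  # rebind the local name to a private copy
    apocalypse -= 3                 # in-place, but only on the copy
    return apocalypse

bar = np.array([1, 2, 3])
print(test2_safe(bar))  # [-2 -1  0]
print(bar)              # [1 2 3], the caller's array is untouched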
I am trying to do something as simple as changing the variable I am iterating over (i), but I am getting different behaviour in Python and C.
In Python,
for i in range(10):
print i,
if i == 2:
i = 4;
I get 0 1 2 3 4 5 6 7 8 9, but the equivalent in C:
int i;
for (i = 0; i < 10; i++) {
printf("%d", i);
if (i == 2)
i = 4;
}
I get 01256789 (note that numbers 3 and 4 don't appear, as expected).
What's happening here?
Python has a few nice things happening in the background of for loops.
For example:
for i in range(10):
will constantly set i to the next element produced by range(10), no matter what you assign to it inside the loop body.
If you want to control the loop variable yourself in Python, use a while loop:
i = 0
while i < 10:
    print(i)
    if i == 2:
        i = 4
    else:
        i += 1
This prints 0 1 2 4 5 6 7 8 9. If you instead increment unconditionally at the end of the loop body (mirroring the C i++), the 4 is skipped as well and you get 0 1 2 5 6 7 8 9, the same output as the C version.
If you are trying to convert it to C, remember that the i++ in the C for loop always increments i at the end of each iteration.
The function range() creates a list.
For example, range(10) will create the following list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].
When you say for i in range(10), first of all the list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] will be generated, and then i will be assigned all the values from that list, in order.
It doesn't matter if you change the value of i because on the next iteration it will be assigned the next element from the list.
In C/C++, on the other hand, at the end of each iteration the value of i is incremented and then checked against the loop condition (i < 10); if the condition no longer holds, the loop ends.
When you call range(10) you create an iterable list [0,1,2,3,4,5,6,7,8,9].
And the for loop just picks up one number after another from that list on each turn, whether or not you have changed the value of i.
Python gives you the elements in range(10), one after another. C repeatedly increments a variable.
Both of them don't really care what else you do with the variable inside the loop, but since the two language constructs do slightly different things, the outcome is different in some cases.
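Roughly speaking, the Python loop behaves like this hand-written version of the iterator protocol (a sketch, not the literal bytecode):
it = iter(range(10))      # the range is asked for an iterator once, up front
while True:
    try:
        i = next(it)      # i is rebound to the next element on every pass
    except StopIteration:
        break
    print(i)
    if i == 2:
        i = 4             # this rebinding is simply overwritten by the next next(it)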
You cannot do this with the range function.
You have to use a while loop instead, because with a for loop the variable is reassigned from the range on every iteration: no matter what you assign to it inside the loop body, it is overwritten by the next value from the range.
>>> for i in range(10):
... print i
... if i == 2:
... i = 4
... else:
... i += 1
...
0
1
2
3
4
5
6
7
8
9
Here is an interesting example:
>>> for i in range(10):
... print i
... i = i + 10
... print i
...
this will print...
0
10
1
11
2
12
3
13
4
14
5
15
6
16
7
17
8
18
9
19
It's because when you use the range() function in Python, your variable i goes through the values in the range. For example:
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
However, the C code you have written just uses the normal loop condition and increment to change the value of i. There is no function involved.
A generator function can be defined by putting the yield keyword in the function’s body:
def gen():
for i in range(10):
yield i
How to define an empty generator function?
The following code doesn’t work, since Python cannot know that it is supposed to be a generator function instead of a normal function:
def empty():
pass
I could do something like this:
def empty():
if False:
yield
But that would be very ugly. Is there a nicer way?
You can use return once in a generator; it stops iteration without yielding anything, and thus provides an explicit alternative to letting the function run out of scope. So use yield to turn the function into a generator, but precede it with return to terminate the generator before yielding anything.
>>> def f():
... return
... yield
...
>>> list(f())
[]
I'm not sure it's that much better than what you have -- it just replaces a no-op if statement with a no-op yield statement. But it is more idiomatic. Note that just using yield doesn't work.
>>> def f():
... yield
...
>>> list(f())
[None]
Why not just use iter(())?
This question asks specifically about an empty generator function. For that reason, I take it to be a question about the internal consistency of Python's syntax, rather than a question about the best way to create an empty iterator in general.
If the question is actually about the best way to create an empty iterator, then you might agree with Zectbumo about using iter(()) instead. However, it's important to observe that iter(()) doesn't return a function! It directly returns an empty iterable. Suppose you're working with an API that expects a callable that returns an iterable each time it's called, just like an ordinary generator function. You'll have to do something like this:
def empty():
return iter(())
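Each call then hands back a fresh, already-empty iterator, just as calling a generator function would:
>>> list(empty())
[]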
(Credit should go to Unutbu for giving the first correct version of this answer.)
Now, you may find the above clearer, but I can imagine situations in which it would be less clear. Consider this example of a long list of (contrived) generator function definitions:
def zeros():
while True:
yield 0
def ones():
while True:
yield 1
...
At the end of that long list, I'd rather see something with a yield in it, like this:
def empty():
return
yield
or, in Python 3.3 and above (as suggested by DSM), this:
def empty():
yield from ()
The presence of the yield keyword makes it clear at the briefest glance that this is just another generator function, exactly like all the others. It takes a bit more time to see that the iter(()) version is doing the same thing.
It's a subtle difference, but I honestly think the yield-based functions are more readable and maintainable.
See also this great answer from user3840170 that uses dis to show another reason why this approach is preferable: it emits the fewest instructions when compiled.
iter(())
You don't require a generator. C'mon guys!
Python 3.3 (because I'm on a yield from kick, and because #senderle stole my first thought):
>>> def f():
... yield from ()
...
>>> list(f())
[]
But I have to admit, I'm having a hard time coming up with a use case for this for which iter([]) or (x)range(0) wouldn't work equally well.
Another option is:
(_ for _ in ())
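A quick check in the shell:
>>> g = (_ for _ in ())
>>> list(g)
[]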
Like #senderle said, use this:
def empty():
return
yield
I’m writing this answer mostly to share another justification for it.
One reason for choosing this solution above the others is that it is optimal as far as the interpreter is concerned.
>>> import dis
>>> def empty_yield_from():
... yield from ()
...
>>> def empty_iter():
... return iter(())
...
>>> def empty_return():
... return
... yield
...
>>> def noop():
... pass
...
>>> dis.dis(empty_yield_from)
2 0 LOAD_CONST 1 (())
2 GET_YIELD_FROM_ITER
4 LOAD_CONST 0 (None)
6 YIELD_FROM
8 POP_TOP
10 LOAD_CONST 0 (None)
12 RETURN_VALUE
>>> dis.dis(empty_iter)
2 0 LOAD_GLOBAL 0 (iter)
2 LOAD_CONST 1 (())
4 CALL_FUNCTION 1
6 RETURN_VALUE
>>> dis.dis(empty_return)
2 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
>>> dis.dis(noop)
2 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
As we can see, the empty_return has exactly the same bytecode as a regular empty function; the rest perform a number of other operations that don’t change the behaviour anyway. The only difference between empty_return and noop is that the former has the generator flag set:
>>> dis.show_code(noop)
Name: noop
Filename: <stdin>
Argument count: 0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 1
Flags: OPTIMIZED, NEWLOCALS, NOFREE
Constants:
0: None
>>> dis.show_code(empty_return)
Name: empty_return
Filename: <stdin>
Argument count: 0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 1
Flags: OPTIMIZED, NEWLOCALS, GENERATOR, NOFREE
Constants:
0: None
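That GENERATOR flag is also what inspect.isgeneratorfunction looks at, so the same difference can be confirmed without dis (assuming the definitions above):
>>> import inspect
>>> inspect.isgeneratorfunction(empty_return)
True
>>> inspect.isgeneratorfunction(noop)
False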
The above disassembly is outdated as of CPython 3.11, but empty_return still comes out on top, with only two more opcodes (four bytes) than a no-op function:
>>> dis.dis(empty_yield_from)
1 0 RETURN_GENERATOR
2 POP_TOP
4 RESUME 0
2 6 LOAD_CONST 1 (())
8 GET_YIELD_FROM_ITER
10 LOAD_CONST 0 (None)
>> 12 SEND 3 (to 20)
14 YIELD_VALUE
16 RESUME 2
18 JUMP_BACKWARD_NO_INTERRUPT 4 (to 12)
>> 20 POP_TOP
22 LOAD_CONST 0 (None)
24 RETURN_VALUE
>>> dis.dis(empty_iter)
1 0 RESUME 0
2 2 LOAD_GLOBAL 1 (NULL + iter)
14 LOAD_CONST 1 (())
16 PRECALL 1
20 CALL 1
30 RETURN_VALUE
>>> dis.dis(empty_return)
1 0 RETURN_GENERATOR
2 POP_TOP
4 RESUME 0
2 6 LOAD_CONST 0 (None)
8 RETURN_VALUE
>>> dis.dis(noop)
1 0 RESUME 0
2 2 LOAD_CONST 0 (None)
4 RETURN_VALUE
Of course, the strength of this argument is very dependent on the particular implementation of Python in use; a sufficiently smart alternative interpreter may notice that the other operations amount to nothing useful and optimise them out. However, even if such optimisations are present, they require the interpreter to spend time performing them and to safeguard against optimisation assumptions being broken, like the iter identifier at global scope being rebound to something else (even though that would most likely indicate a bug if it actually happened). In the case of empty_return there is simply nothing to optimise, as bytecode generation stops after a return statement, so even the relatively naïve CPython will not waste time on any spurious operations.
Must it be a generator function? If not, how about
def f():
return iter(())
The "standard" way to make an empty iterator appears to be iter([]).
I suggested making [] the default argument to iter(); this was rejected with good arguments, see http://bugs.python.org/issue25215
- Jurjen
I want to give a class-based example since we haven't had any suggested yet. This is a callable iterator that generates no items. I believe this is a straightforward and descriptive way to solve the issue.
class EmptyGenerator:
def __iter__(self):
return self
def __next__(self):
raise StopIteration
>>> list(EmptyGenerator())
[]
generator = (item for item in [])
Nobody has mentioned it yet, but calling the built-in function zip with no arguments returns an empty iterator:
>>> it = zip()
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
How do you access other class variables from a list comprehension within the class definition? The following works in Python 2 but fails in Python 3:
class Foo:
x = 5
y = [x for i in range(1)]
Python 3.2 gives the error:
NameError: global name 'x' is not defined
Trying Foo.x doesn't work either. Any ideas on how to do this in Python 3?
A slightly more complicated motivating example:
from collections import namedtuple
class StateDatabase:
State = namedtuple('State', ['name', 'capital'])
db = [State(*args) for args in [
['Alabama', 'Montgomery'],
['Alaska', 'Juneau'],
# ...
]]
In this example, apply() would have been a decent workaround, but it is sadly removed from Python 3.
Class scope and list, set or dictionary comprehensions, as well as generator expressions do not mix.
The why; or, the official word on this
In Python 3, list comprehensions were given a proper scope (local namespace) of their own, to prevent their local variables bleeding over into the surrounding scope (see List comprehension rebinds names even after scope of comprehension. Is this right?). That's great when using such a list comprehension in a module or in a function, but in classes, scoping is a little, uhm, strange.
This is documented in pep 227:
Names in class scope are not accessible. Names are resolved in
the innermost enclosing function scope. If a class definition
occurs in a chain of nested scopes, the resolution process skips
class definitions.
and in the class compound statement documentation:
The class’s suite is then executed in a new execution frame (see section Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains only function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [4] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary.
Emphasis mine; the execution frame is the temporary scope.
Because the scope is repurposed as the attributes on a class object, allowing it to be used as a nonlocal scope as well leads to undefined behaviour; what would happen if a class method referred to x as a nested scope variable and then manipulated Foo.x as well, for example? More importantly, what would that mean for subclasses of Foo? Python has to treat a class scope differently, as it is very different from a function scope.
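For instance, a method body already does not see class attributes as an enclosing scope; only explicit lookups work (a small illustration):
class Foo:
    x = 5
    def broken(self):
        return x        # NameError when called: the class body is not an enclosing scope
    def working(self):
        return Foo.x    # explicit lookup (self.x works too)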
Last, but definitely not least, the linked Naming and binding section in the Execution model documentation mentions class scopes explicitly:
The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:
class A:
a = 42
b = list(a + i for i in range(10))
So, to summarize: you cannot access the class scope from functions, list comprehensions or generator expressions enclosed in that scope; they act as if that scope does not exist. In Python 2, list comprehensions were implemented using a shortcut, but in Python 3 they got their own function scope (as they should have had all along) and thus your example breaks. Other comprehension types have their own scope regardless of Python version, so a similar example with a set or dict comprehension would break in Python 2.
# Same error, in Python 2 or 3
y = {x: x for i in range(1)}
The (small) exception; or, why one part may still work
There's one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable. In your example, it's the range(1):
y = [x for i in range(1)]
# ^^^^^^^^
Thus, using x in that expression would not throw an error:
# Runs fine
y = [i for i in range(x)]
This only applies to the outermost iterable; if a comprehension has multiple for clauses, the iterables for inner for clauses are evaluated in the comprehension's scope:
# NameError
y = [i for i in range(1) for j in range(x)]
# ^^^^^^^^^^^^^^^^^ -----------------
# outer loop inner, nested loop
This design decision was made so that an error is raised at genexp creation time instead of iteration time: if evaluating the outermost iterable of a generator expression throws an error, or if that iterable turns out not to be iterable, you find out immediately rather than on first use. Comprehensions share this behavior for consistency.
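You can see that eager evaluation of the outermost iterable directly in the shell; in CPython the error surfaces when the generator expression is created, not when it is first iterated:
>>> gen = (c for c in 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable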
Looking under the hood; or, way more detail than you ever wanted
You can see this all in action using the dis module. I'm using Python 3.3 in the following examples, because it adds qualified names that neatly identify the code objects we want to inspect. The bytecode produced is otherwise functionally identical to Python 3.2.
To create a class, Python essentially takes the whole suite that makes up the class body (so everything indented one level deeper than the class <name>: line), and executes that as if it were a function:
>>> import dis
>>> def foo():
... class Foo:
... x = 5
... y = [x for i in range(1)]
... return Foo
...
>>> dis.dis(foo)
2 0 LOAD_BUILD_CLASS
1 LOAD_CONST 1 (<code object Foo at 0x10a436030, file "<stdin>", line 2>)
4 LOAD_CONST 2 ('Foo')
7 MAKE_FUNCTION 0
10 LOAD_CONST 2 ('Foo')
13 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
16 STORE_FAST 0 (Foo)
5 19 LOAD_FAST 0 (Foo)
22 RETURN_VALUE
The first LOAD_CONST there loads a code object for the Foo class body, then makes that into a function, and calls it. The result of that call is then used to create the namespace of the class, its __dict__. So far so good.
The thing to note here is that the bytecode contains a nested code object; in Python, class definitions, functions, comprehensions and generators all are represented as code objects that contain not only bytecode, but also structures that represent local variables, constants, variables taken from globals, and variables taken from the nested scope. The compiled bytecode refers to those structures and the python interpreter knows how to access those given the bytecodes presented.
The important thing to remember here is that Python creates these structures at compile time; the class suite is a code object (<code object Foo at 0x10a436030, file "<stdin>", line 2>) that is already compiled.
Let's inspect that code object that creates the class body itself; code objects have a co_consts structure:
>>> foo.__code__.co_consts
(None, <code object Foo at 0x10a436030, file "<stdin>", line 2>, 'Foo')
>>> dis.dis(foo.__code__.co_consts[1])
2 0 LOAD_FAST 0 (__locals__)
3 STORE_LOCALS
4 LOAD_NAME 0 (__name__)
7 STORE_NAME 1 (__module__)
10 LOAD_CONST 0 ('foo.<locals>.Foo')
13 STORE_NAME 2 (__qualname__)
3 16 LOAD_CONST 1 (5)
19 STORE_NAME 3 (x)
4 22 LOAD_CONST 2 (<code object <listcomp> at 0x10a385420, file "<stdin>", line 4>)
25 LOAD_CONST 3 ('foo.<locals>.Foo.<listcomp>')
28 MAKE_FUNCTION 0
31 LOAD_NAME 4 (range)
34 LOAD_CONST 4 (1)
37 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
40 GET_ITER
41 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
44 STORE_NAME 5 (y)
47 LOAD_CONST 5 (None)
50 RETURN_VALUE
The above bytecode creates the class body. The function is executed and the resulting locals() namespace, containing x and y is used to create the class (except that it doesn't work because x isn't defined as a global). Note that after storing 5 in x, it loads another code object; that's the list comprehension; it is wrapped in a function object just like the class body was; the created function takes a positional argument, the range(1) iterable to use for its looping code, cast to an iterator. As shown in the bytecode, range(1) is evaluated in the class scope.
From this you can see that the only difference between a code object for a function or a generator, and a code object for a comprehension is that the latter is executed immediately when the parent code object is executed; the bytecode simply creates a function on the fly and executes it in a few small steps.
Python 2.x uses inline bytecode there instead; here is the output from Python 2.7:
2 0 LOAD_NAME 0 (__name__)
3 STORE_NAME 1 (__module__)
3 6 LOAD_CONST 0 (5)
9 STORE_NAME 2 (x)
4 12 BUILD_LIST 0
15 LOAD_NAME 3 (range)
18 LOAD_CONST 1 (1)
21 CALL_FUNCTION 1
24 GET_ITER
>> 25 FOR_ITER 12 (to 40)
28 STORE_NAME 4 (i)
31 LOAD_NAME 2 (x)
34 LIST_APPEND 2
37 JUMP_ABSOLUTE 25
>> 40 STORE_NAME 5 (y)
43 LOAD_LOCALS
44 RETURN_VALUE
No code object is loaded; instead, a FOR_ITER loop is run inline. So in Python 3.x the list comprehension was given a proper code object of its own, which means it has its own scope.
However, the comprehension was compiled together with the rest of the Python source code when the module or script was first loaded by the interpreter, and the compiler does not consider a class suite a valid scope. Any variables referenced in a list comprehension must be looked up in the scopes surrounding the class definition, recursively. If the compiler doesn't find the variable there, it marks it as a global. Disassembly of the list comprehension's code object shows that x is indeed loaded as a global:
>>> foo.__code__.co_consts[1].co_consts
('foo.<locals>.Foo', 5, <code object <listcomp> at 0x10a385420, file "<stdin>", line 4>, 'foo.<locals>.Foo.<listcomp>', 1, None)
>>> dis.dis(foo.__code__.co_consts[1].co_consts[2])
4 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (i)
12 LOAD_GLOBAL 0 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
This chunk of bytecode loads the first argument passed in (the range(1) iterator), and just like the Python 2.x version uses FOR_ITER to loop over it and create its output.
Had we defined x in the foo function instead, x would be a cell variable (cells refer to nested scopes):
>>> def foo():
... x = 2
... class Foo:
... x = 5
... y = [x for i in range(1)]
... return Foo
...
>>> dis.dis(foo.__code__.co_consts[2].co_consts[2])
5 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (i)
12 LOAD_DEREF 0 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
The LOAD_DEREF will indirectly load x from the code object cell objects:
>>> foo.__code__.co_cellvars # foo function `x`
('x',)
>>> foo.__code__.co_consts[2].co_cellvars # Foo class, no cell variables
()
>>> foo.__code__.co_consts[2].co_consts[2].co_freevars # Refers to `x` in foo
('x',)
>>> foo().y
[2]
The actual referencing looks the value up from the current frame data structures, which were initialized from a function object's .__closure__ attribute. Since the function created for the comprehension code object is discarded again, we do not get to inspect that function's closure. To see a closure in action, we'd have to inspect a nested function instead:
>>> def spam(x):
... def eggs():
... return x
... return eggs
...
>>> spam(1).__code__.co_freevars
('x',)
>>> spam(1)()
1
>>> spam(1).__closure__
>>> spam(1).__closure__[0].cell_contents
1
>>> spam(5).__closure__[0].cell_contents
5
So, to summarize:
List comprehensions get their own code objects in Python 3, and there is no difference between code objects for functions, generators or comprehensions; comprehension code objects are wrapped in a temporary function object and called immediately.
Code objects are created at compile time, and any non-local variables are marked as either global or as free variables, based on the nested scopes of the code. The class body is not considered a scope for looking up those variables.
When executing the code, Python has only to look into the globals, or the closure of the currently executing object. Since the compiler didn't include the class body as a scope, the temporary function namespace is not considered.
A workaround; or, what to do about it
If you were to create an explicit scope for the x variable, like in a function, you can use class-scope variables for a list comprehension:
>>> class Foo:
... x = 5
... def y(x):
... return [x for i in range(1)]
... y = y(x)
...
>>> Foo.y
[5]
The 'temporary' y function can be called directly; we then replace it with its return value. Its scope is considered when resolving x:
>>> foo.__code__.co_consts[1].co_consts[2]
<code object y at 0x10a5df5d0, file "<stdin>", line 4>
>>> foo.__code__.co_consts[1].co_consts[2].co_cellvars
('x',)
Of course, people reading your code will scratch their heads over this a little; you may want to put a big fat comment in there explaining why you are doing this.
The best work-around is to just use __init__ to create an instance variable instead:
def __init__(self):
self.y = [self.x for i in range(1)]
and avoid all the head-scratching, and questions to explain yourself. For your own concrete example, I would not even store the namedtuple on the class; either use the output directly (don't store the generated class at all), or use a global:
from collections import namedtuple
State = namedtuple('State', ['name', 'capital'])
class StateDatabase:
db = [State(*args) for args in [
('Alabama', 'Montgomery'),
('Alaska', 'Juneau'),
# ...
]]
In my opinion it is a flaw in Python 3. I hope they change it.
Old Way (works in 2.7, throws NameError: name 'x' is not defined in 3+):
class A:
x = 4
y = [x+i for i in range(1)]
NOTE: simply scoping it with A.x would not solve it
New Way (works in 3+):
class A:
x = 4
y = (lambda x=x: [x+i for i in range(1)])()
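A quick check that the new way gives the expected result:
>>> A.y
[4]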
Because the syntax is so ugly, I typically just initialize all my class variables in the constructor instead.
The accepted answer provides excellent information, but there appear to be a few other wrinkles here -- differences between list comprehension and generator expressions. A demo that I played around with:
class Foo:
# A class-level variable.
X = 10
# I can use that variable to define another class-level variable.
Y = sum((X, X))
# Works in Python 2, but not 3.
# In Python 3, list comprehensions were given their own scope.
try:
Z1 = sum([X for _ in range(3)])
except NameError:
Z1 = None
# Fails in both.
# Apparently, generator expressions (that's what the entire argument
# to sum() is) did have their own scope even in Python 2.
try:
Z2 = sum(X for _ in range(3))
except NameError:
Z2 = None
# Workaround: put the computation in lambda or def.
compute_z3 = lambda val: sum(val for _ in range(3))
# Then use that function.
Z3 = compute_z3(X)
# Also worth noting: here I can refer to XS in the for-part of the
# generator expression (Z4 works), but I cannot refer to XS in the
# inner-part of the generator expression (Z5 fails).
XS = [15, 15, 15, 15]
Z4 = sum(val for val in XS)
try:
Z5 = sum(XS[i] for i in range(len(XS)))
except NameError:
Z5 = None
print(Foo.Z1, Foo.Z2, Foo.Z3, Foo.Z4, Foo.Z5)
Since the outermost iterator is evaluated in the surrounding scope we can use zip together with itertools.repeat to carry the dependencies over to the comprehension's scope:
import itertools as it
class Foo:
x = 5
y = [j for i, j in zip(range(3), it.repeat(x))]
One can also use nested for loops in the comprehension and include the dependencies in the outermost iterable:
class Foo:
x = 5
y = [j for j in (x,) for i in range(3)]
For the specific example of the OP:
from collections import namedtuple
import itertools as it
class StateDatabase:
State = namedtuple('State', ['name', 'capital'])
db = [State(*args) for State, args in zip(it.repeat(State), [
['Alabama', 'Montgomery'],
['Alaska', 'Juneau'],
# ...
])]
This is a bug in Python. Comprehensions are advertised as being equivalent to for loops, but this is not true in classes. At least up to Python 3.6.6, in a comprehension used in a class, only one variable from outside the comprehension is accessible inside the comprehension, and it must be used as the outermost iterator. In a function, this scope limitation does not apply.
To illustrate why this is a bug, let's return to the original example. This fails:
class Foo:
x = 5
y = [x for i in range(1)]
But this works:
def Foo():
x = 5
y = [x for i in range(1)]
The limitation is stated at the end of this section in the reference guide.
This may be by design, but IMHO, it's a bad design. I know I'm not an expert here, and I've tried reading the rationale behind this, but it just goes over my head, as I think it would for any average Python programmer.
To me, a comprehension doesn't seem that much different than a regular mathematical expression. For example, if 'foo' is a local function variable, I can easily do something like:
(foo + 5) + 7
But I can't do:
[foo + x for x in [1,2,3]]
To me, the fact that one expression exists in the current scope and the other creates a scope of its own is very surprising and, no pun intended, 'incomprehensible'.
I spent quite some time to understand why this is a feature, not a bug.
Consider the simple code:
a = 5
def myfunc():
print(a)
Since there is no "a" defined in myfunc(), the lookup expands to the enclosing scope and the code executes.
Now consider the same code in a class. It cannot work that way, because it would completely mess up access to data in class instances. You would never know whether you were accessing a variable of the base class or of the instance.
The list comprehension is just a sub-case of the same effect.
One can use a for loop:
class A:
x=5
##Won't work:
## y=[i for i in range(101) if i%x==0]
y=[]
for i in range(101):
if i%x==0:
y.append(i)
Please correct me if I'm wrong...