Python is operator behaviour with integers [duplicate] - python-3.x

After diving into Python's source code, I found out that it maintains an array of PyInt_Objects ranging from int(-5) to int(256) (see src/Objects/intobject.c).
A little experiment proves it:
>>> a = 1
>>> b = 1
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False
But if I run that code together in a .py file (or join the statements with semicolons), the result is different:
>>> a = 257; b = 257; a is b
True
I was curious why they are still the same object, so I dug deeper into the syntax tree and compiler and came up with the calling hierarchy listed below:
PyRun_FileExFlags()
    mod = PyParser_ASTFromFile()
        node *n = PyParser_ParseFileFlagsEx() //source to cst
            parsetok()
                ps = PyParser_New()
                for (;;)
                    PyTokenizer_Get()
                    PyParser_AddToken(ps, ...)
        mod = PyAST_FromNode(n, ...) //cst to ast
    run_mod(mod, ...)
        co = PyAST_Compile(mod, ...) //ast to CFG
            PyFuture_FromAST()
            PySymtable_Build()
            co = compiler_mod()
        PyEval_EvalCode(co, ...)
            PyEval_EvalCodeEx()
Then I added some debug code in PyInt_FromLong and before/after PyAST_FromNode, and executed a test.py:
a = 257
b = 257
print "id(a) = %d, id(b) = %d" % (id(a), id(b))
the output looks like:
DEBUG: before PyAST_FromNode
name = a
ival = 257, id = 176046536
name = b
ival = 257, id = 176046752
name = a
name = b
DEBUG: after PyAST_FromNode
run_mod
PyAST_Compile ok
id(a) = 176046536, id(b) = 176046536
Eval ok
It means that during the cst to ast transform, two different PyInt_Objects are created (actually it's performed in the ast_for_atom() function), but they are later merged.
I find it hard to comprehend the source in PyAST_Compile and PyEval_EvalCode, so I'm here to ask for help. I'd appreciate it if someone could give me a hint.

Python caches integers in the range [-5, 256], so integers in that range are usually but not always identical.
What you see for 257 is the Python compiler optimizing identical literals when compiled in the same code object.
When typing in the Python shell each line is a completely different statement, parsed and compiled separately, thus:
>>> a = 257
>>> b = 257
>>> a is b
False
But if you put the same code into a file:
$ echo 'a = 257
> b = 257
> print a is b' > testing.py
$ python testing.py
True
This happens whenever the compiler has a chance to analyze the literals together, for example when defining a function in the interactive interpreter:
>>> import dis
>>> def test():
...     a = 257
...     b = 257
...     print a is b
...
>>> dis.dis(test)
  2           0 LOAD_CONST               1 (257)
              3 STORE_FAST               0 (a)
  3           6 LOAD_CONST               1 (257)
              9 STORE_FAST               1 (b)
  4          12 LOAD_FAST                0 (a)
             15 LOAD_FAST                1 (b)
             18 COMPARE_OP               8 (is)
             21 PRINT_ITEM
             22 PRINT_NEWLINE
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
>>> test()
True
>>> test.func_code.co_consts
(None, 257)
Note how the compiled code contains a single constant for the 257.
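The same thing can be checked without dis. This is a rough Python 3 sketch (the session above is Python 2) that compiles the two assignments as one unit, looks at the constants table, and then compiles them as two separate units the way the interactive prompt does. The file names passed to compile() are placeholders, and the last result is a CPython detail, so no output is claimed for it here.

```python
# Compile both assignments as ONE code object, the way a .py file is compiled.
code = compile("a = 257\nb = 257\n", "<example>", "exec")

# The constants table holds 257 only once, so both names load the same object.
print(code.co_consts)

namespace = {}
exec(code, namespace)
same_unit = namespace["a"] is namespace["b"]

# Compile each assignment separately, like two lines typed at the prompt.
separate = {}
exec(compile("a = 257", "<line 1>", "exec"), separate)
exec(compile("b = 257", "<line 2>", "exec"), separate)
separate_units = separate["a"] is separate["b"]

print(same_unit, separate_units)
```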
In conclusion, the Python bytecode compiler is not able to perform massive optimizations (unlike compilers for statically typed languages), but it does more than you might think. One of these things is to analyze the usage of literals and avoid duplicating them.
Note that this does not have to do with the cache, because it works also for floats, which do not have a cache:
>>> a = 5.0
>>> b = 5.0
>>> a is b
False
>>> a = 5.0; b = 5.0
>>> a is b
True
For more complex literals, like tuples, it "doesn't work":
>>> a = (1,2)
>>> b = (1,2)
>>> a is b
False
>>> a = (1,2); b = (1,2)
>>> a is b
False
But the literals inside the tuple are shared:
>>> a = (257, 258)
>>> b = (257, 258)
>>> a[0] is b[0]
False
>>> a[1] is b[1]
False
>>> a = (257, 258); b = (257, 258)
>>> a[0] is b[0]
True
>>> a[1] is b[1]
True
(Note that constant folding and the peephole optimizer can change behaviour even between bugfix versions, so which examples return True or False is basically arbitrary and will change in the future).
Regarding why you see that two PyInt_Objects are created, I'd guess that this is done to avoid comparing literals. For example, the number 257 can be expressed by multiple literals:
>>> 257
257
>>> 0x101
257
>>> 0b100000001
257
>>> 0o401
257
The parser has two choices:
Convert the literals to some common base before creating the integer, and see if the literals are equivalent. Then create a single integer object.
Create the integer objects and see if they are equal. If yes, keep only a single value and assign it to all the literals, otherwise, you already have the integers to assign.
Probably the Python parser uses the second approach, which avoids rewriting the conversion code and also it's easier to extend (for example it works with floats as well).
Reading the Python/ast.c file, the function that parses all numbers is parsenumber, which calls PyOS_strtoul to obtain the integer value (for integers) and eventually calls PyLong_FromString:
x = (long) PyOS_strtoul((char *)s, (char **)&end, 0);
if (x < 0 && errno == 0) {
    return PyLong_FromString((char *)s,
                             (char **)0,
                             0);
}
As you can see here, the parser does not check whether it has already found an integer with the given value, which explains why you see two int objects being created.
This also means that my guess was correct: the parser first creates the constants and only afterwards optimizes the bytecode to use the same object for equal constants.
The code that does this check must be somewhere in Python/compile.c or Python/peephole.c, since these are the files that transform the AST into bytecode.
In particular, the compiler_add_o function seems to be the one that does it. There is this comment in compiler_lambda:
/* Make None the first constant, so the lambda can't have a
   docstring. */
if (compiler_add_o(c, c->u->u_consts, Py_None) < 0)
    return 0;
So it seems like compiler_add_o is used to insert constants for functions/lambdas etc.
The compiler_add_o function stores the constants in a dict object, and it immediately follows from this that equal constants will fall into the same slot, resulting in a single constant in the final bytecode.
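To make the dict idea concrete, here is a toy model in Python 3. It is not CPython's code: ConstTable, add and the (type, value) key are names and choices made up for this sketch. Keying on the type as well as the value is an assumption meant to keep 257 and 257.0 apart, since they compare equal as plain dict keys.

```python
class ConstTable:
    """Toy constants table: equal constants of the same type share one slot."""

    def __init__(self):
        self._slots = {}   # (type, value) -> index into self.consts
        self.consts = []   # constants in first-seen order

    def add(self, value):
        # Equal keys hash to the same dict entry, so a second object with
        # the same type and value is mapped to the slot of the first one.
        key = (type(value), value)
        if key not in self._slots:
            self._slots[key] = len(self.consts)
            self.consts.append(value)
        return self._slots[key]


table = ConstTable()
first = int("257")          # two separately created ints with equal value,
second = int("0x101", 0)    # like the two objects the parser produced
slot_a = table.add(first)
slot_b = table.add(second)
slot_c = table.add(257.0)   # a float gets its own slot in this sketch
print(slot_a, slot_b, slot_c, table.consts)
```

Both integer objects end up pointing at slot 0, and only the first object is kept in the table, which matches the merged id seen in the debug output of the question.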

Related

Unexplained behavior in terminal with is operator [duplicate]

The is operator does not match the values of the variables, but the
instances themselves.
What does it really mean?
I declared two variables named x and y assigning the same values in both variables, but it returns false when I use the is operator.
I need a clarification. Here is my code.
x = [1, 2, 3]
y = [1, 2, 3]
print(x is y) # It prints false!
You misunderstood what the is operator tests. It tests whether two variables point to the same object, not whether two variables have the same value.
From the documentation for the is operator:
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object.
Use the == operator instead:
print(x == y)
This prints True. x and y are two separate lists:
x[0] = 4
print(y) # prints [1, 2, 3]
print(x == y) # prints False
If you use the id() function you'll see that x and y have different identifiers:
>>> id(x)
4401064560
>>> id(y)
4401098192
but if you were to assign y to x then both point to the same object:
>>> x = y
>>> id(x)
4401064560
>>> id(y)
4401064560
>>> x is y
True
and is shows both are the same object, it returns True.
Remember that in Python, names are just labels referencing values; you can have multiple names point to the same object. is tells you if two names point to one and the same object. == tells you if two names refer to objects that have the same value.
Another duplicate was asking why two equal strings are generally not identical, which isn't really answered here:
>>> x = 'a'
>>> x += 'bc'
>>> y = 'abc'
>>> x == y
True
>>> x is y
False
So, why aren't they the same string? Especially given this:
>>> z = 'abc'
>>> w = 'abc'
>>> z is w
True
Let's put off the second part for a bit. How could the first one be true?
The interpreter would have to have an "interning table", a table mapping string values to string objects, so every time you try to create a new string with the contents 'abc', you get back the same object. Wikipedia has a more detailed discussion on how interning works.
And Python has a string interning table; you can manually intern strings with the sys.intern function.
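A small Python 3 sketch of manual interning (in Python 2 the function was the builtin intern). The string x is built at run time on purpose so that it is not a compile-time constant. Whether x is y holds before interning is an implementation detail, so only the interned comparison is relied on here.

```python
import sys

x = "".join(["a", "bc"])   # built at run time, not a compile-time constant
y = "abc"

# Equal values, whatever the identities are.
print(x == y)

# sys.intern returns the one canonical object for a given string value.
x_interned = sys.intern(x)
y_interned = sys.intern(y)
print(x_interned is y_interned)
```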
In fact, Python is allowed to automatically intern any immutable types, but not required to do so. Different implementations will intern different values.
CPython (the implementation you're using if you don't know which implementation you're using) auto-interns small integers and some special singletons like False, but not strings (or large integers, or small tuples, or anything else). You can see this pretty easily:
>>> a = 0
>>> a += 1
>>> b = 1
>>> a is b
True
>>> a = False
>>> a = not a
>>> b = True
>>> a is b
True
>>> a = 1000
>>> a += 1
>>> b = 1001
>>> a is b
False
OK, but why were z and w identical?
That's not the interpreter automatically interning, that's the compiler folding values.
If the same compile-time string appears twice in the same module (what exactly this means is hard to define—it's not the same thing as a string literal, because r'abc', 'abc', and 'a' 'b' 'c' are all different literals but the same string—but easy to understand intuitively), the compiler will only create one instance of the string, with two references.
In fact, the compiler can go even further: 'ab' + 'c' can be converted to 'abc' by the optimizer, in which case it can be folded together with an 'abc' constant in the same module.
Again, this is something Python is allowed but not required to do. But in this case, CPython always folds small strings (and also, e.g., small tuples). (Although the interactive interpreter's statement-by-statement compiler doesn't run the same optimization as the module-at-a-time compiler, so you won't see exactly the same results interactively.)
So, what should you do about this as a programmer?
Well… nothing. You almost never have any reason to care if two immutable values are identical. If you want to know when you can use a is b instead of a == b, you're asking the wrong question. Just always use a == b except in two cases:
For more readable comparisons to the singleton values like x is None.
For mutable values, when you need to know whether mutating x will affect the y.
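Both cases can be shown in a few lines of Python 3. The names lookup, settings, alias and clone are made up for this sketch.

```python
def lookup(table, key):
    """Return the stored value, or None when the key is missing."""
    return table.get(key)


settings = {"retries": 0}

# Case 1: compare against the None singleton with `is`, so that a stored 0
# is not mistaken for a missing value.
retries = lookup(settings, "retries")
missing = lookup(settings, "timeout")
print(retries is None, missing is None)

# Case 2: `is` tells you whether mutating one name will show up in another.
x = [1, 2, 3]
alias = x          # same object as x
clone = list(x)    # equal value, different object
x.append(4)
print(alias is x, alias)
print(clone is x, clone)
```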
is only returns true if they're actually the same object. If they were the same, a change to one would also show up in the other. Here's an example of the difference.
>>> x = [1, 2, 3]
>>> y = [1, 2, 3]
>>> print x is y
False
>>> z = y
>>> print y is z
True
>>> print x is z
False
>>> y[0] = 5
>>> print z
[5, 2, 3]
Prompted by a duplicate question, this analogy might work:
# - Darling, I want some pudding!
# - There is some in the fridge.
pudding_to_eat = fridge_pudding
pudding_to_eat is fridge_pudding
# => True
# - Honey, what's with all the dirty dishes?
# - I wanted to eat pudding so I made some. Sorry about the mess, Darling.
# - But there was already some in the fridge.
pudding_to_eat = make_pudding(ingredients)
pudding_to_eat is fridge_pudding
# => False
is and is not are the two identity operators in Python. The is operator does not compare the values of the variables, but their identities. Consider this:
>>> a = [1,2,3]
>>> b = [1,2,3]
>>> hex(id(a))
'0x1079b1440'
>>> hex(id(b))
'0x107960878'
>>> a is b
False
>>> a == b
True
>>>
The above example shows that the identity (which in CPython is also the memory address) is different for a and b, even though their values are the same. That is why a is b returns False: the identities of the two operands do not match. However, a == b returns True because the == operation only checks whether both operands have the same value.
Interesting example (for the extra grade):
>>> del a
>>> del b
>>> a = 132
>>> b = 132
>>> hex(id(a))
'0x7faa2b609738'
>>> hex(id(b))
'0x7faa2b609738'
>>> a is b
True
>>> a == b
True
>>>
In the above example, even though a and b are two different variables, a is b returned True. This is not because int is immutable as such, but because CPython caches small integers (-5 through 256), so both names end up bound to the same cached object. In this case the identities of the variables matched and a is b turned out to be True.
Something similar can happen with other immutable objects, such as short string literals, but it is an implementation detail rather than a guarantee:
>>> del a
>>> del b
>>> a = "asd"
>>> b = "asd"
>>> hex(id(a))
'0x1079b05a8'
>>> hex(id(b))
'0x1079b05a8'
>>> a is b
True
>>> a == b
True
>>>
Hope that helps.
x is y is same as id(x) == id(y), comparing identity of objects.
As @tomasz-kurgan pointed out in a comment, the is operator behaves unusually with certain objects.
E.g.
>>> class A(object):
...     def foo(self):
...         pass
...
>>> a = A()
>>> a.foo is a.foo
False
>>> id(a.foo) == id(a.foo)
True
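One way to see what is going on, as far as CPython is concerned: every a.foo access builds a new bound-method object, and in id(a.foo) == id(a.foo) the first temporary can be freed before the second one is created, so its address may be reused. The Python 3 sketch below keeps both objects alive at the same time, in which case their ids cannot collide. The names m1 and m2 are made up here.

```python
class A:
    def foo(self):
        pass


a = A()

# Keep both bound-method objects alive by binding them to names.
m1 = a.foo
m2 = a.foo

print(m1 is m2)           # identity of the two bound-method objects
print(m1 == m2)           # same function bound to the same instance
print(id(m1) == id(m2))   # both are alive, so the ids are distinct
```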
Ref:
https://docs.python.org/2/reference/expressions.html#is-not
https://docs.python.org/2/reference/expressions.html#id24
This caching applies only to small integers. Numbers from 257 upward are not small ints, so each one is created as a different object.
It is better to use == instead in this case.
Further information is here: http://docs.python.org/2/c-api/int.html
X points to an array, Y points to a different array. Those arrays are identical, but the is operator will look at those pointers, which are not identical.
It compares object identity, that is, whether the variables refer to the same object in memory. It's like the == in Java or C (when comparing pointers).
A simple example with fruits
fruitlist = ["apple", "banana", "cherry", "durian"]
newfruitlist = fruitlist
verynewfruitlist = fruitlist[:]
print(fruitlist is newfruitlist)
print(fruitlist is verynewfruitlist)
print(newfruitlist is verynewfruitlist)
Output:
True
False
False
If you try
fruitlist = ["apple", "banana", "cherry", "durian"]
newfruitlist = fruitlist
verynewfruitlist = fruitlist[:]
print(fruitlist == newfruitlist)
print(fruitlist == verynewfruitlist)
print(newfruitlist == verynewfruitlist)
The output is different:
True
True
True
That's because the == operator compares just the contents of the variables. To compare the identities of two variables, use the is operator.
To print the identification number:
print(id(variable))
The is operator is not just an English version of ==: it compares the IDs of the objects, not their values.
Because the IDs of the two lists are different, the answer is False.
You can try:
a = [1, 2, 3]
b = a
print(b is a)  # True, because both names refer to the list with the same ID

Python assigning variables with an OR on assignment, multiple statements in one line?

I am not super familiar with Python, and I am having trouble reading this code. I have never seen this syntax, where multiple statements are paired together (I think) on one line, separated by commas.
if L1.data < L2.data:
    tail.next, L1 = L1, L1.next
Also, I don't understand assignment in python with "or": where is the conditional getting evaluated? See this example. When would tail.next be assigned L1, and when would tail.next be assigned L2?
tail.next = L1 or L2
Any clarification would be greatly appreciated. I haven't been able to find much on either syntax
See below
>>> a = 0
>>> b = 1
>>> a, b
(0, 1)
>>> a, b = b, a
>>> a, b
(1, 0)
>>>
It allows one to swap values without requiring a temporary variable.
In your case, the line
tail.next, L1 = L1, L1.next
is equivalent to
tail.next = L1
L1 = L1.next
In Python, writing comma-separated values creates a tuple (a kind of data structure).
a = 4,5
type(a) --> tuple
This is called tuple packing.
When we do:
a, b = 4,5
This is called tuple unpacking. It is equivalent to:
a = 4
b = 5
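or is the boolean operator here. L1 or L2 evaluates to L1 if L1 is truthy (for example, a node rather than None) and to L2 otherwise, so tail.next is assigned L1 whenever L1 is not empty, and L2 when it is.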
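Putting both idioms together, here is a minimal sketch of the kind of merge the question's snippet appears to come from. The Node class and the helpers from_list and to_list are hypothetical and only there to make the example runnable; the original code may differ.

```python
class Node:
    """Hypothetical singly linked list node."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def merge(L1, L2):
    dummy = tail = Node(None)
    while L1 and L2:
        if L1.data < L2.data:
            # The right-hand side is evaluated first, then assigned left to right.
            tail.next, L1 = L1, L1.next
        else:
            tail.next, L2 = L2, L2.next
        tail = tail.next
    # At most one list is non-empty here; `or` picks it (or None if both are empty).
    tail.next = L1 or L2
    return dummy.next


def from_list(values):
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_list(node):
    out = []
    while node:
        out.append(node.data)
        node = node.next
    return out


merged = to_list(merge(from_list([1, 3, 5]), from_list([2, 4])))
print(merged)
```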

Subclassing int class in Python

I want to do something every time I add two integers in my TestClass.
import builtins
class myInt(int):
    def __add__(self, other):
        print("Do something")
class TestClass:
    def __init__(self):
        builtins.int = myInt
    def testMethod(self):
        a = 1
        b = 2
        c = a + b
When I call my testMethod nothing happens, however if I define it like this I get the desired effect:
def testMethod(self):
    a = int(1)
    b = 2
    c = a + b
Is it possible to make this work for all int literals without having to typecast them before the operations?
Sorry, it's not possible without building your own custom interpreter. Literal objects aren't constructed by calling the constructor in __builtins__, they are constructed using opcodes that directly call the builtin types.
Also immutable literals are constructed when the code is compiled, so you were too late anyway. If you disassemble testMethod you'll see it simply uses the constants that were compiled, it doesn't attempt to construct them:
>>> import dis
>>> dis.dis(TestClass.testMethod)
  5           0 LOAD_CONST               1 (1)
              2 STORE_FAST               1 (a)
  6           4 LOAD_CONST               2 (2)
              6 STORE_FAST               2 (b)
  7           8 LOAD_FAST                1 (a)
             10 LOAD_FAST                2 (b)
             12 BINARY_ADD
             14 STORE_FAST               3 (c)
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE
Mutable literals are constructed at runtime but they use opcodes to construct the appropriate value rather than calling the type:
>>> dis.dis(lambda: {'a': 1, 'b': 2})
  1           0 LOAD_CONST               1 (1)
              2 LOAD_CONST               2 (2)
              4 LOAD_CONST               3 (('a', 'b'))
              6 BUILD_CONST_KEY_MAP      2
              8 RETURN_VALUE
You could do something along the lines of what you want by parsing the source code (use builtin compile() with ast.PyCF_ONLY_AST flag) then walking the parse tree and replacing int literals with a call to your own type (use ast.NodeTransformer). Then all you have to do is finish the compilation (use compile() again). You could even do that with an import hook so it happens automatically when your module is imported, but it will be messy.
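A minimal sketch of that approach, assuming Python 3.8 or later (where literals are parsed into ast.Constant nodes). MyInt and WrapIntLiterals are names invented for this example, and the source is given as a string instead of going through an import hook. Unlike the myInt in the question, this __add__ also returns the sum so that the result can be inspected.

```python
import ast


class MyInt(int):
    def __add__(self, other):
        print("Do something")
        result = int.__add__(self, other)
        # Pass NotImplemented through untouched; wrap real results.
        return result if result is NotImplemented else MyInt(result)


class WrapIntLiterals(ast.NodeTransformer):
    """Rewrite every int literal N into the call MyInt(N)."""

    def visit_Constant(self, node):
        # bool is a subclass of int, so check the exact type.
        if type(node.value) is int:
            call = ast.Call(
                func=ast.Name(id="MyInt", ctx=ast.Load()),
                args=[node],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


source = "a = 1\nb = 2\nc = a + b\n"

# Step 1: parse only. Step 2: rewrite the tree. Step 3: finish compiling.
tree = compile(source, "<wrapped>", "exec", ast.PyCF_ONLY_AST)
tree = ast.fix_missing_locations(WrapIntLiterals().visit(tree))
code = compile(tree, "<wrapped>", "exec")

namespace = {"MyInt": MyInt}
exec(code, namespace)
print(namespace["c"], type(namespace["c"]).__name__)
```

The rewritten code has to be able to find the name MyInt at run time, which is why it is placed in the namespace passed to exec.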

How to append float to list?

I want to append float to list, but I got an error like this:
<ipython-input-47-08d9c3f8f180> in maxEs()
12 Es = lists[0]*0.3862 + lists[1]*0.3091 + lists[2]*0.4884
13 aaa = []
---> 14 Es.append(aaa)
15
AttributeError: 'float' object has no attribute 'append'
I guess I can't append float to list. Can I add floats to list another way?
This is my code:
import math
def maxEs():
    for a in range(1, 101):
        for b in range(1,101):
            for c in range(1,101):
                if a+b+c == 100 :
                    lists = []
                    lists.append(a*0.01)
                    lists.append(b*0.01)
                    lists.append(c*0.01)
                    Es = lists[0]*0.3862 + lists[1]*0.3091 + lists[2]*0.4884
                    aaa = []
                    Es.append(aaa)
I don't know what you want, but you are trying to append a list to a float not the other way round.
Should be
aaa.append(Es)
The other answer already explained the main problem with your code, but there is more:
as already said, it has to be aaa.append(Es) (you did it right for the other list)
speaking of the other list: you don't need it at all; just use the values directly in the formula
aaa is re-initialized and overwritten in each iteration of the loop; you should probably move it to the top
you do not need the inner loop to find c; once you know a and b, you can calculate c so that it satisfies the condition
you can also restrict the loop for b, so the result does not exceed 100
finally, you should probably return some result (the max of aaa maybe?)
We do not know what exactly the code is trying to achieve, but maybe try this:
def maxEs():
    aaa = []
    for a in range(1, 98 + 1):
        for b in range(1, 99-a + 1):
            c = 100 - a - b
            Es = 0.01 * (a * 0.3862 + b * 0.3091 + c * 0.4884)
            aaa.append(Es)
    return max(aaa)

Python 3 index is len(l) conditional evaluation error

I have the following merge sort code. When the line if ib is len(b) or ... is changed to use double equal ==: if ib == len(b) or ..., the code does not raise an IndexError exception.
This is very unexpected because:
len(b) is evaluated to a number and is is equivalent to == for integers. You can test it out: a python expression
(1 is len([0]) )
is evaluated to be True.
the input to the function is range(1500, -1, -1), and range objects are handled differently in Python 3. I suspected that, since the input was handled as a range instance, the length evaluation might have been an instance instead of an integer primitive. This is again strange because
1 is len(range(1))
also gives you True as the result.
Is this a bug with the conditional evaluation in Python3?
Tom Caswell supplied the following useful expression in our discussion; I'm copy-pasting it here for your notice:
tt = [j is int(str(j)) for j in range(15000)]
Only the first 257 items (0 through 256) are True. The rest are False, hahahaha.
The original script:
def merge_sort(arr):
    if len(arr) >= 2:
        s = int(len(arr)/2)
        a = merge_sort(arr[:s])
        b = merge_sort(arr[s:])
        ia = 0
        ib = 0
        new_arr = []
        while len(new_arr) < len(arr):
            try:
                if ib is len(b) or a[ia] <= b[ib]:
                    new_arr.append(a[ia])
                    ia += 1
                else:
                    new_arr.append(b[ib])
                    ib += 1
            except IndexError:
                print(len(a), len(b), ia, ib)
                raise IndexError
        return new_arr
    else:
        return arr
print(merge_sort(range(1500, -1, -1)))
Python does not guarantee that two integer instances with equal value are the same instance. In the example below, the first 257 comparisons (0 through 256) return True because CPython caches the integers -5 to 256.
This behavior is described here: https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong
example:
import matplotlib.pyplot as plt  # assuming plt refers to matplotlib.pyplot
tt = [j is int(str(j)) for j in range(500)]
plt.plot(tt)
IIRC that any of them pass the is test is an implementation-specific optimization detail.
is checks whether 2 arguments refer to the same object, == checks whether 2 arguments have the same value. You cannot assume they mean the same thing; they have different uses, and using them interchangeably leads to bugs such as the IndexError above.
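For completeness, a sketch of the script with the comparison switched to ==. Two things here go beyond the original: the ia < len(a) guard, which covers inputs where a runs out before b (the descending test input never hits that case), and the list(arr) in the base case so that the function always returns a list.

```python
def merge_sort(arr):
    if len(arr) < 2:
        return list(arr)
    s = len(arr) // 2
    a = merge_sort(arr[:s])
    b = merge_sort(arr[s:])
    ia = ib = 0
    new_arr = []
    while len(new_arr) < len(arr):
        # == compares values, so it stays correct for lengths above 256.
        # The ia < len(a) guard is an addition that is not in the original script.
        if ib == len(b) or (ia < len(a) and a[ia] <= b[ib]):
            new_arr.append(a[ia])
            ia += 1
        else:
            new_arr.append(b[ib])
            ib += 1
    return new_arr


result = merge_sort(range(1500, -1, -1))
print(result[:5], result[-5:])
```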

Resources