Basic first order logic inference fails for symmetric binary predicate - nlp

Super basic question. I am trying to express a symmetric relationship between two binary predicates (parent and child). But, with the following statement, my resolution prover allows me to prove anything. The converted CNF form makes sense to me as does the proof by resolution, but this should be an obvious case for false. What am I missing?
forall x,y (is-parent-of(x,y) <-> is-child-of(y,x))
I am using the nltk python library and the ResolutionProver prover. Here is the nltk code:
from nltk.sem import Expression as exp
from nltk.inference import ResolutionProver as prover
s = exp.fromstring('all x.(all y.(parentof(y, x) <-> childof(x, y)))')
q = exp.fromstring('foo(Bar)')
print prover().prove(q, [s], verbose=True)
output:
[1] {-foo(Bar)} A
[2] {-parentof(z9,z10), childof(z10,z9)} A
[3] {parentof(z11,z12), -childof(z12,z11)} A
[4] {} (2, 3)
True

Here is a quick fix for the ResolutionProver.
The issue that causes the prover to be unsound is that it does not implement the resolution rule correctly when there is more than one complementary literal. E.g. given the clauses {A B C} and {-A -B D} binary resolution would produce the clauses {A -A C D} and {B -B C D}. Both would be discarded as tautologies. The current NLTK implementation instead would produce {C D}.
This was probably introduced because clauses are represented in NLTK as lists, therefore identical literals may occur more than once within a clause. This rule does correctly produce an empty clause when applied to the clauses {A A} and {-A -A}, but in general this rule is not correct.
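To make the difference concrete, here is a minimal propositional sketch in plain Python (no NLTK). Literals are plain strings with a leading '-' for negation, and the helper names (negate, binary_resolvents, remove_all_complementary) are invented for this illustration; NLTK represents clauses differently.

```python
def negate(lit):
    """Return the complement of a literal written as 'A' or '-A'."""
    return lit[1:] if lit.startswith('-') else '-' + lit


def binary_resolvents(c1, c2):
    """Correct rule: one resolvent per complementary pair, removing only that pair."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]


def is_tautology(clause):
    """A clause containing both X and -X is always true and can be discarded."""
    return any(negate(lit) in clause for lit in clause)


def remove_all_complementary(c1, c2):
    """Unsound shortcut: drop every complementary pair in a single step."""
    dropped = {lit for lit in c1 if negate(lit) in c2}
    return (c1 - dropped) | (c2 - {negate(lit) for lit in dropped})


# The example from the text: {A B C} and {-A -B D}
c1 = frozenset({'A', 'B', 'C'})
c2 = frozenset({'-A', '-B', 'D'})
resolvents = binary_resolvents(c1, c2)
shortcut = remove_all_complementary(c1, c2)

# The two clauses from the question, with variables stripped away
parent = frozenset({'-parentof', 'childof'})
child = frozenset({'parentof', '-childof'})
question_resolvents = binary_resolvents(parent, child)
question_shortcut = remove_all_complementary(parent, child)
```

With the correct rule every resolvent in both cases is a tautology, whereas the shortcut derives {C D} in the first case and the empty clause in the second, which is what lets the prover "prove" anything.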
It seems that if we keep clauses free from repetitions of identical literals we can regain soundness with a few changes.
First define a function that removes identical literals.
Here is a naive implementation of such a function
import nltk.inference.resolution as res
def _simplify(clause):
    """
    Remove duplicate literals from a clause
    """
    duplicates = []
    for i, c in enumerate(clause):
        if i in duplicates:
            continue
        for j, d in enumerate(clause[i+1:], start=i+1):
            if j in duplicates:
                continue
            if c == d:
                duplicates.append(j)
    result = []
    for i, c in enumerate(clause):
        if i not in duplicates:
            result.append(clause[i])
    return res.Clause(result)
Now we can plug this function into some of the functions of the nltk.inference.resolution module.
def _iterate_first_fix(first, second, bindings, used, skipped, finalize_method, debug):
    """
    This method facilitates movement through the terms of 'self'
    """
    debug.line('unify(%s,%s) %s' % (first, second, bindings))
    if not len(first) or not len(second):  # if no more recursions can be performed
        return finalize_method(first, second, bindings, used, skipped, debug)
    else:
        # explore this 'self' atom
        result = res._iterate_second(first, second, bindings, used, skipped, finalize_method, debug+1)
        # skip this possible 'self' atom
        newskipped = (skipped[0]+[first[0]], skipped[1])
        result += res._iterate_first(first[1:], second, bindings, used, newskipped, finalize_method, debug+1)
        try:
            newbindings, newused, unused = res._unify_terms(first[0], second[0], bindings, used)
            # Unification found, so progress with this line of unification
            # put skipped and unused terms back into play for later unification.
            newfirst = first[1:] + skipped[0] + unused[0]
            newsecond = second[1:] + skipped[1] + unused[1]
            # We return immediately when `_unify_term()` is successful
            result += _simplify(finalize_method(newfirst, newsecond, newbindings, newused, ([], []), debug))
        except res.BindingException:
            pass
        return result
res._iterate_first = _iterate_first_fix
Similarly update res._iterate_second
def _iterate_second_fix(first, second, bindings, used, skipped, finalize_method, debug):
    """
    This method facilitates movement through the terms of 'other'
    """
    debug.line('unify(%s,%s) %s' % (first, second, bindings))
    if not len(first) or not len(second):  # if no more recursions can be performed
        return finalize_method(first, second, bindings, used, skipped, debug)
    else:
        # skip this possible pairing and move to the next
        newskipped = (skipped[0], skipped[1]+[second[0]])
        result = res._iterate_second(first, second[1:], bindings, used, newskipped, finalize_method, debug+1)
        try:
            newbindings, newused, unused = res._unify_terms(first[0], second[0], bindings, used)
            # Unification found, so progress with this line of unification
            # put skipped and unused terms back into play for later unification.
            newfirst = first[1:] + skipped[0] + unused[0]
            newsecond = second[1:] + skipped[1] + unused[1]
            # We return immediately when `_unify_term()` is successful
            result += _simplify(finalize_method(newfirst, newsecond, newbindings, newused, ([], []), debug))
        except res.BindingException:
            # the atoms could not be unified,
            pass
        return result
res._iterate_second = _iterate_second_fix
Finally, plug our function into the clausify() to ensure the inputs are repetition-free.
def clausify_simplify(expression):
    """
    Skolemize, clausify, and standardize the variables apart.
    """
    clause_list = []
    for clause in res._clausify(res.skolemize(expression)):
        for free in clause.free():
            if res.is_indvar(free.name):
                newvar = res.VariableExpression(res.unique_variable())
                clause = clause.replace(free, newvar)
        clause_list.append(_simplify(clause))
    return clause_list
res.clausify = clausify_simplify
After applying these changes the prover should run the standard tests and also deal correctly with the parentof/childof relationships.
print res.ResolutionProver().prove(q, [s], verbose=True)
output:
[1] {-foo(Bar)} A
[2] {-parentof(z144,z143), childof(z143,z144)} A
[3] {parentof(z146,z145), -childof(z145,z146)} A
[4] {childof(z145,z146), -childof(z145,z146)} (2, 3) Tautology
[5] {-parentof(z146,z145), parentof(z146,z145)} (2, 3) Tautology
[6] {childof(z145,z146), -childof(z145,z146)} (2, 3) Tautology
False
Update: Achieving correctness is not the end of the story. A more efficient solution would be to replace the container used to store literals in the Clause class with the one based on built-in Python hash-based sets, however that seems to require a more thorough rework of the prover implementation and introducing some performance testing infrastructure as well.

Related

Why there is difference between 'is' and '==' with various results in python [duplicate]

My Google-fu has failed me.
In Python, are the following two tests for equality equivalent?
n = 5
# Test one.
if n == 5:
    print 'Yay!'
# Test two.
if n is 5:
    print 'Yay!'
Does this hold true for objects where you would be comparing instances (a list say)?
Okay, so this kind of answers my question:
L = []
L.append(1)
if L == [1]:
    print 'Yay!'
# Holds true, but...
if L is [1]:
    print 'Yay!'
# Doesn't.
So == tests value where is tests to see if they are the same object?
is will return True if two variables point to the same object (in memory), == if the objects referred to by the variables are equal.
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
>>> b == a
True
# Make a new copy of list `a` via the slice operator,
# and assign it to variable `b`
>>> b = a[:]
>>> b is a
False
>>> b == a
True
In your case, the second test only works because Python caches small integer objects, which is an implementation detail. For larger integers, this does not work:
>>> 1000 is 10**3
False
>>> 1000 == 10**3
True
The same holds true for string literals:
>>> "a" is "a"
True
>>> "aa" is "a" * 2
True
>>> x = "a"
>>> "aa" is x * 2
False
>>> "aa" is intern(x*2)
True
Please see this question as well.
There is a simple rule of thumb to tell you when to use == or is.
== is for value equality. Use it when you would like to know if two objects have the same value.
is is for reference equality. Use it when you would like to know if two references refer to the same object.
In general, when you are comparing something to a simple type, you are usually checking for value equality, so you should use ==. For example, the intention of your example is probably to check whether n has a value equal to 5 (==), not whether n is literally referring to the same object as 5.
Something else to note: because of the way the CPython reference implementation works, you'll get unexpected and inconsistent results if you mistakenly use is to compare for reference equality on integers:
>>> a = 500
>>> b = 500
>>> a == b
True
>>> a is b
False
That's pretty much what we expected: a and b have the same value, but are distinct entities. But what about this?
>>> c = 200
>>> d = 200
>>> c == d
True
>>> c is d
True
This is inconsistent with the earlier result. What's going on here? It turns out the reference implementation of Python caches integer objects in the range -5..256 as singleton instances for performance reasons. Here's an example demonstrating this:
>>> for i in range(250, 260): a = i; print "%i: %s" % (i, a is int(str(i)));
...
250: True
251: True
252: True
253: True
254: True
255: True
256: True
257: False
258: False
259: False
This is another obvious reason not to use is: the behavior is left up to implementations when you're erroneously using it for value equality.
Is there a difference between == and is in Python?
Yes, they have a very important difference.
==: check for equality - the semantics are that equivalent objects (that aren't necessarily the same object) will test as equal. As the documentation says:
The operators <, >, ==, >=, <=, and != compare the values of two objects.
is: check for identity - the semantics are that the object (as held in memory) is the object. Again, the documentation says:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. Object identity is
determined using the id() function. x is not y yields the inverse
truth value.
Thus, the check for identity is the same as checking for the equality of the IDs of the objects. That is,
a is b
is the same as:
id(a) == id(b)
where id is the builtin function that returns an integer that "is guaranteed to be unique among simultaneously existing objects" (see help(id)) and where a and b are any arbitrary objects.
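As a quick illustration of that equivalence, the following sketch compares is against an explicit id() comparison for objects that are alive at the same time. The variable names are arbitrary.

```python
a = [1, 2, 3]
b = a          # a second name for the same list object
c = list(a)    # a new list with equal contents

same_ab = a is b                  # identity via the operator
same_ab_by_id = id(a) == id(b)    # identity via id()
same_ac = a is c
same_ac_by_id = id(a) == id(c)
equal_ac = a == c                 # value equality is a separate question
```

Here a and b share one identity, while c is equal to a but is a different object, so both identity checks on a and c are false.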
Other Usage Directions
You should use these comparisons for their semantics. Use is to check identity and == to check equality.
So in general, we use is to check for identity. This is usually useful when we are checking for an object that should only exist once in memory, referred to as a "singleton" in the documentation.
Use cases for is include:
None
enum values (when using Enums from the enum module)
usually modules
usually class objects resulting from class definitions
usually function objects resulting from function definitions
anything else that should only exist once in memory (all singletons, generally)
a specific object that you want by identity
Usual use cases for == include:
numbers, including integers
strings
lists
sets
dictionaries
custom mutable objects
other builtin immutable objects, in most cases
Again, the general use case for == is that the object you want may not be the same object; it may instead be an equivalent one.
PEP 8 directions
PEP 8, the official Python style guide for the standard library, also mentions two use-cases for is:
Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
Also, beware of writing if x when you really mean if x is not None --
e.g. when testing whether a variable or argument that defaults to None
was set to some other value. The other value might have a type (such
as a container) that could be false in a boolean context!
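A small sketch of the pitfall PEP 8 describes: an empty container is false in a boolean context, so a bare truthiness test cannot tell "argument omitted" from "empty list passed". The function names are hypothetical.

```python
def describe(items=None):
    """Treat only a missing argument as 'not given'."""
    if items is not None:
        return "given %d item(s)" % len(items)
    return "not given"


def describe_truthy(items=None):
    """Looks similar, but also treats empty containers as 'not given'."""
    if items:
        return "given %d item(s)" % len(items)
    return "not given"


with_identity_check = describe([])        # the empty list counts as an argument
with_truthiness = describe_truthy([])     # the empty list is silently ignored
```

The two functions agree whenever the argument is omitted or non-empty and differ only for empty containers, which is exactly the case that tends to slip through testing.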
Inferring equality from identity
If is is true, equality can usually be inferred - logically, if an object is itself, then it should test as equivalent to itself.
In most cases this logic is true, but it relies on the implementation of the __eq__ special method. As the docs say,
The default behavior for equality comparison (== and !=) is based on
the identity of the objects. Hence, equality comparison of instances
with the same identity results in equality, and equality comparison of
instances with different identities results in inequality. A
motivation for this default behavior is the desire that all objects
should be reflexive (i.e. x is y implies x == y).
and in the interests of consistency, recommends:
Equality comparison should be reflexive. In other words, identical
objects should compare equal:
x is y implies x == y
We can see that this is the default behavior for custom objects:
>>> class Object(object): pass
>>> obj = Object()
>>> obj2 = Object()
>>> obj == obj, obj is obj
(True, True)
>>> obj == obj2, obj is obj2
(False, False)
The contrapositive is also usually true: if two things test as not equal, you can usually infer that they are not the same object.
Since tests for equality can be customized, this inference does not always hold true for all types.
An exception
A notable exception is nan - it always tests as not equal to itself:
>>> nan = float('nan')
>>> nan
nan
>>> nan is nan
True
>>> nan == nan # !!!!!
False
Checking for identity can be a much quicker check than checking for equality (which might require recursively checking members).
But it cannot be substituted for equality where you may find more than one object as equivalent.
Note that comparing lists and tuples for equality assumes that identical objects are equal (because this is a fast check). This can create contradictions if the logic is inconsistent, as it is for nan:
>>> [nan] == [nan]
True
>>> (nan,) == (nan,)
True
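The shortcut only applies when the containers hold the very same nan object. The sketch below assumes CPython, where each float('nan') call creates a new float object, and uses math.isnan as the reliable way to test for nan.

```python
import math

nan_a = float('nan')
nan_b = float('nan')   # a second, distinct nan object (assumption: CPython)

same_object = [nan_a] == [nan_a]        # identity shortcut applies
distinct_objects = [nan_a] == [nan_b]   # falls back to ==, which nan fails
membership = nan_a in [nan_a]           # `in` uses the same identity shortcut
reliable_check = math.isnan(nan_a) and math.isnan(nan_b)
```

So whether two containers of nan compare equal depends on object identity, which is one more reason to test for nan with math.isnan rather than with == or is.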
A Cautionary Tale:
The question is attempting to use is to compare integers. You shouldn't assume that an instance of an integer is the same instance as one obtained by another reference. This story explains why.
A commenter had code that relied on the fact that small integers (-5 to 256 inclusive) are singletons in Python, instead of checking for equality.
Wow, this can lead to some insidious bugs. I had some code that checked if a is b, which worked as I wanted because a and b are typically small numbers. The bug only happened today, after six months in production, because a and b were finally large enough to not be cached. – gwg
It worked in development. It may have passed some unittests.
And it worked in production - until the code checked for an integer larger than 256, at which point it failed in production.
This is a production failure that could have been caught in code review or possibly with a style-checker.
Let me emphasize: do not use is to compare integers.
== determines if the values are equal, while is determines if they are the exact same object.
What's the difference between is and ==?
== and is are different comparisons! As others already said:
== compares the values of the objects.
is compares the references of the objects.
In Python names refer to objects, for example in this case value1 and value2 refer to an int instance storing the value 1000:
value1 = 1000
value2 = value1
Because value2 refers to the same object is and == will give True:
>>> value1 == value2
True
>>> value1 is value2
True
In the following example the names value1 and value2 refer to different int instances, even if both store the same integer:
>>> value1 = 1000
>>> value2 = 1000
Because the same value (integer) is stored == will be True, that's why it's often called "value comparison". However is will return False because these are different objects:
>>> value1 == value2
True
>>> value1 is value2
False
When to use which?
Generally is is a much faster comparison. That's why CPython caches (or maybe reuses would be the better term) certain objects like small integers, some strings, etc. But this should be treated as implementation detail that could (even if unlikely) change at any point without warning.
You should only use is if you:
want to check if two objects are really the same object (not just the same "value"). One example can be if you use a singleton object as constant.
want to compare a value to a Python constant. The constants in Python are:
None
True [1]
False [1]
NotImplemented
Ellipsis
__debug__
classes (for example int is int or int is float)
there could be additional constants in built-in modules or 3rd party modules (for example np.ma.masked from the NumPy module)
In every other case you should use == to check for equality.
Can I customize the behavior?
One aspect of == hasn't been mentioned in the other answers: it's part of Python's "Data model". That means its behavior can be customized using the __eq__ method. For example:
class MyClass(object):
    def __init__(self, val):
        self._value = val
    def __eq__(self, other):
        print('__eq__ method called')
        try:
            return self._value == other._value
        except AttributeError:
            raise TypeError('Cannot compare {0} to objects of type {1}'
                            .format(type(self), type(other)))
This is just an artificial example to illustrate that the method is really called:
>>> MyClass(10) == MyClass(10)
__eq__ method called
True
Note that by default (if no other implementation of __eq__ can be found in the class or the superclasses) __eq__ uses is:
class AClass(object):
    def __init__(self, value):
        self._value = value
>>> a = AClass(10)
>>> b = AClass(10)
>>> a == b
False
>>> a == a
True
So it's actually important to implement __eq__ if you want "more" than just reference-comparison for custom classes!
On the other hand you cannot customize is checks. It will always compare just if you have the same reference.
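If writing __eq__ by hand feels like boilerplate, one option in Python 3.7+ is the standard library's dataclasses module, which generates a field-by-field __eq__ for you. This is a sketch with a made-up Point class, not something the answer above prescribes.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # @dataclass generates __init__, __repr__ and a field-wise __eq__
    x: int
    y: int


p = Point(1, 2)
q = Point(1, 2)

equal_by_value = p == q   # uses the generated __eq__
same_object = p is q      # identity is unaffected by __eq__
```

The generated __eq__ changes what == means for Point, while is still answers only the question of whether p and q are one object.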
Will these comparisons always return a boolean?
Because __eq__ can be re-implemented or overridden, it's not limited to return True or False. It could return anything (but in most cases it should return a boolean!).
For example with NumPy arrays the == will return an array:
>>> import numpy as np
>>> np.arange(10) == 2
array([False, False, True, False, False, False, False, False, False, False], dtype=bool)
But is checks will always return True or False!
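The same effect can be shown without NumPy. The Vec class below is a hypothetical toy whose __eq__ returns a list of element-wise results, loosely imitating what arrays do.

```python
class Vec:
    """Toy container whose == compares element-wise (illustration only)."""

    def __init__(self, *values):
        self.values = list(values)

    def __eq__(self, other):
        # Deliberately returns a list instead of a single bool
        return [value == other for value in self.values]


elementwise = Vec(1, 2, 3) == 2          # whatever __eq__ returns
identity = Vec(1, 2, 3) is Vec(1, 2, 3)  # always a plain bool
```

Because == simply hands back whatever __eq__ returns, code such as `if Vec(1, 2, 3) == 2:` would test the truthiness of a list here, which is rarely what was meant.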
[1] As Aaron Hall mentioned in the comments:
Generally you shouldn't do any is True or is False checks because one normally uses these "checks" in a context that implicitly converts the condition to a boolean (for example in an if statement). So doing the is True comparison and the implicit boolean cast is doing more work than just doing the boolean cast - and you limit yourself to booleans (which isn't considered pythonic).
Like PEP8 mentions:
Don't compare boolean values to True or False using ==.
Yes: if greeting:
No: if greeting == True:
Worse: if greeting is True:
They are completely different. is checks for object identity, while == checks for equality (a notion that depends on the two operands' types).
It is only a lucky coincidence that "is" seems to work correctly with small integers (e.g. 5 is 4+1). That is because CPython optimizes the storage of integers in the range (-5 to 256) by making them singletons. This behavior is totally implementation-dependent and not guaranteed to be preserved under all manner of minor transformative operations.
For example, Python 3.5 also makes short strings singletons, but slicing them disrupts this behavior:
>>> "foo" + "bar" == "foobar"
True
>>> "foo" + "bar" is "foobar"
True
>>> "foo"[:] + "bar" == "foobar"
True
>>> "foo"[:] + "bar" is "foobar"
False
https://docs.python.org/library/stdtypes.html#comparisons
is tests for identity
== tests for equality
Each (small) integer value is mapped to a single value, so every 3 is identical and equal. This is an implementation detail, not part of the language spec though
Your answer is correct. The is operator compares the identity of two objects. The == operator compares the values of two objects.
An object's identity never changes once it has been created; you may think of it as the object's address in memory.
You can control comparison behaviour of object values by defining a rich comparison method like __eq__ (or, in Python 2 only, a __cmp__ method).
Have a look at Stack Overflow question Python's “is” operator behaves unexpectedly with integers.
What it mostly boils down to is that "is" checks to see if they are the same object, not just equal to each other (small integers, -5 to 256 in CPython, are a special case because they are cached).
In a nutshell, is checks whether two references point to the same object or not. == checks whether two objects have the same value or not.
a = [1, 2, 3]
b = a        # a and b point to the same object
c = list(a)  # c points to a different object
if a == b:
    print('#')     # output: #
if a is b:
    print('##')    # output: ##
if a == c:
    print('###')   # output: ###
if a is c:
    print('####')  # no output as c and a point to different objects
Since the other answers in this post explain in detail the difference between == and is for comparing objects or variables, I will focus on comparing strings with is and ==, which can give different results, and I urge programmers to use them carefully.
For string comparison, make sure to use == instead of is:
str = 'hello'
if (str is 'hello'):
    print ('str is hello')
if (str == 'hello'):
    print ('str == hello')
Out:
str is hello
str == hello
But in the below example == and is will get different results:
str2 = 'hello sam'
if (str2 is 'hello sam'):
    print ('str2 is hello sam')
if (str2 == 'hello sam'):
    print ('str2 == hello sam')
Out:
str2 == hello sam
Conclusion and Analysis:
Use is carefully when comparing strings.
Since is compares objects, and since in Python 3+ every value, including a string, is an object, let's see what happened in the examples above.
Python has an id function that returns a unique constant for an object during its lifetime. The Python interpreter uses this id behind the scenes when comparing two objects with the is keyword.
str = 'hello'
id('hello')
> 140039832615152
id(str)
> 140039832615152
But
str2 = 'hello sam'
id('hello sam')
> 140039832615536
id(str2)
> 140039832615792
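Whether two equal strings share an id depends on interning, which the interpreter applies automatically only in some cases. The sketch below builds the strings at runtime and uses sys.intern (Python 3) to force sharing. It deliberately makes no claim about `s1 is s2` before interning, since that is an implementation detail.

```python
import sys

parts = ['hello', ' ', 'sam']
s1 = ''.join(parts)   # built at runtime
s2 = ''.join(parts)   # equal value, built separately

equal_values = s1 == s2                  # value comparison
i1 = sys.intern(s1)
i2 = sys.intern(s2)
interned_identical = i1 is i2            # interned equal strings are one object
```

Explicit interning is what makes identity meaningful for strings; without it, use == and treat any `is` result on strings as accidental.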
As John Feminella said, most of the time you will use == and != because your objective is to compare values. I'd just like to categorise what you would do the rest of the time:
There is one and only one instance of NoneType, i.e. None is a singleton. Consequently foo == None and foo is None normally give the same result (unless foo's class overrides __eq__). However, the is test is faster and the Pythonic convention is to use foo is None.
If you are doing some introspection or mucking about with garbage collection or checking whether your custom-built string interning gadget is working or suchlike, then you probably have a use-case for foo is bar.
True and False are also (now) singletons, but there is no use-case for foo == True and no use case for foo is True.
Most of them already answered to the point. Just as an additional note (based on my understanding and experimenting but not from a documented source), the statement
== if the objects referred to by the variables are equal
from above answers should be read as
== if the objects referred to by the variables are equal and objects belonging to the same type/class
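I arrived at this conclusion based on the test below. Note that it does not hold for every pair of types: for example, 1 == 1.0 is True even though int and float are different classes.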
list1 = [1,2,3,4]
tuple1 = (1,2,3,4)
print(list1)
print(tuple1)
print(id(list1))
print(id(tuple1))
print(list1 == tuple1)
print(list1 is tuple1)
Here the contents of the list and tuple are same but the type/class are different.
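A few more comparisons help pin down how far this observation goes. The sketch below only uses built-in types; whether == looks at the type is decided by each type's own __eq__, not by the operator itself.

```python
list1 = [1, 2, 3, 4]
tuple1 = (1, 2, 3, 4)

list_vs_tuple = list1 == tuple1             # list and tuple never compare equal
after_conversion = tuple(list1) == tuple1   # same type, same contents
int_vs_float = 1 == 1.0                     # numeric types compare by value
set_vs_frozenset = {1, 2} == frozenset({1, 2})   # these two types do compare equal
```

So list and tuple refuse to compare equal across types, while numbers and set/frozenset compare by value across types.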

Representing a constant symbol in Sympy such that it is not a free_symbol

Application
I want to create a python function (e.g., Laplacian(expr)). The Laplacian operator is defined as taking the sum of the second partial derivatives of expr with respect to each variable (e.g., Laplacian(f(x,y,z)) is diff(f,x,x) + diff(f,y,y) + diff(f,z,z)). In the expression, there may be arbitrary constants c,k, etc that are not variables as far as the expression is concerned. Just as you cannot take the derivative diff(f,126), taking the derivative of the expression with respect to c is not defined.
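For reference, the definition in the paragraph above can be written out as follows for the three-variable example. The symbols f, x, y, z are the ones from the text, and c stands for any arbitrary constant that must not be treated as a variable.

```latex
\nabla^2 f(x, y, z)
  = \frac{\partial^2 f}{\partial x^2}
  + \frac{\partial^2 f}{\partial y^2}
  + \frac{\partial^2 f}{\partial z^2},
\qquad \text{with no } \frac{\partial^2 f}{\partial c^2} \text{ term for a constant } c.
```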
Need
I need to be able to extract the non-constant free symbols from an expression.
Problem
Though I can construct c = Symbol('c', constant=True, number=True) in Sympy, c.is_constant() evaluates to False. Similarly, g(c).is_constant() evaluates to false. For my application, the symbol c should have the exact same behavior as E.is_constant() == True and g(E).is_constant() == True, as it is a number.
Caveats
I cannot register c as a singleton, as it is only defined with respect to this particular proof or expression.
I cannot construct it in the same way values like E are constructed, as there is no specific numeric value for it to be assigned to.
I cannot simply add a constants keyword to Laplacian, as I do not know all such constants that may appear (just as it would not make sense to add constants=[1,2,3,4,...] to solve()).
I cannot simply add a variables keyword to Laplacian, as I do not know the variables that appear in the expression.
The desired usage is as follows:
>>> C = ... # somehow create the constant
>>> symbols_that_arent_constant_numbers(g(C))
set()
>>> symbols_that_arent_constant_numbers(g(C, x))
{x}
>>> g(C).is_constant()
True
Stretch goals: It would be awesome to have an arbitrary constant symbol that absorbs other constant terms in the same way that constantsimp operates. Consider introducing an integration constant c into an expression, and then multiplying that expression by I. As far as we are concerned algebraically, cI=c without losing any generality.
Note
Per Oscar Benjamin's comments on the question, the current best practice when constructing a sympy-style method (like Laplacian) is to pass a constants or variables keyword into the method. Bear that in mind when applying the following solution. Furthermore, free_symbols has many applications within Sympy, so using another class that has established semantics may have unexpected side-effects.
(I am not accepting my own solution in the event that a better one comes along, as Mr. Benjamin has pointed out there are many open related issues.)
Solution
Sympy provides a mechanism to create such a constant: sympy.physics.units.quantities.Quantity. Its behavior is equivalent to Symbol and singleton constants, but most notably it does not appear as a free symbol. This can help prevent code from interpreting it as a variable that may be differentiated, etc.
from sympy import Function, Symbol, solve
from sympy.physics.units.quantities import Quantity
# x and g are not defined in the original snippet; these definitions are assumed.
x = Symbol('x')
g = Function('g')
C = Quantity('C')
print("C constant? : ", C.is_constant())
print("C free symbols : ", C.free_symbols)
print("x constant? : ", x.is_constant())
print("g(C) constant? : ", g(C).is_constant())
print("g(x) constant? : ", g(x).is_constant())
print("g(C,x) constant : ", g(C,x).is_constant())
print("g(C) free symbols : ", g(C).free_symbols)
print("g(C,x) free symbols: ", g(C,x).free_symbols)
assert C.is_constant()
assert C.free_symbols == set([])
assert g(C).is_constant()
assert g(C, x).is_constant() == g(x).is_constant() # consistent interface
assert g(C).free_symbols == set([])
assert g(C, x).free_symbols == set([x])
assert [5/C] == solve(C*x -5, x)
The above snippet produces the following output when tested in sympy==1.5.1:
C constant? : True
C free symbols : set()
x constant? : False
g(C) constant? : True
g(x) constant? : None
g(C,x) constant : None
g(C) free symbols : set()
g(C,x) free symbols: {x}
Note that while g(C).is_constant()==True, we see that g(x).is_constant() == None, as well as g(C,x).is_constant() == None. Consequently, I only assert that those two applications have a consistent interface.

Recursive strategies with additional parameters in Hypothesis

Using recursive, I can generate simple ASTs, e.g.
from hypothesis import *
from hypothesis.strategies import *
def trees():
    base = integers(min_value=1, max_value=10).map(lambda n: 'x' + str(n))
    @composite
    def extend(draw, children):
        op = draw(sampled_from(['+', '-', '*', '/']))
        return (op, draw(children), draw(children))
    return recursive(base, extend)
Now I want to change it so I can generate boolean operations in addition to the arithmetical ones. My initial idea is to add a parameter to trees:
def trees(tpe):
    base = integers(min_value=1, max_value=10).map(lambda n: 'x' + str(n) + ': ' + tpe)
    @composite
    def extend(draw, children):
        if tpe == 'bool':
            op = draw(sampled_from(['&&', '||']))
            return (op, draw(children), draw(children))
        elif tpe == 'num':
            op = draw(sampled_from(['+', '-', '*', '/']))
            return (op, draw(children), draw(children))
    return recursive(base, extend)
Ok so far. But how do I mix them? That is, I also want comparison operators and the ternary operator, which would require "calling children with a different parameter", so to say.
The trees need to be well-typed: if the operation is '||' or '&&', both arguments need to be boolean, arguments to '+' or '<' need to be numbers, etc. If I only had two types, I could just use filter (given a type_of function):
if op in ('&&', '||'):
    bool_trees = children.filter(lambda x: type_of(x) == 'bool')
    return (op, draw(bool_trees), draw(bool_trees))
but in the real case it wouldn't be acceptable.
Does recursive support this? Or is there another way? Obviously, I can directly define trees recursively, but that runs into the standard problems.
You can simply describe trees where the operator is drawn from either set of operations - in this case trivially by sampling from ['&&', '||', '+', '-', '*', '/'].
def trees():
    return recursive(
        integers(min_value=1, max_value=10).map('x{}'.format),
        lambda node: tuples(sampled_from('&& || + - * /'.split()), node, node)
    )
But of course that won't be well-typed (except perhaps by rare coincidence). I think the best option for well-typed ASTs is:
For each type, define a strategy for trees which evaluate to that type. The base case is simply (a strategy for) a value of that type.
The extension is to pre-calculate the possible combinations of types and operations that would generate a value of this type, using mutual recursion via st.deferred. That would look something like...
bool_strat = deferred(
    lambda: one_of(
        booleans(),
        tuples(sampled_from(["and", "or"]), bool_strat, bool_strat),
        tuples(sampled_from(["==", "!=", "<", ...]), integer_strat, integer_strat),
    )
)
integer_strat = deferred(
    lambda: one_of(
        integers(),
        tuples(sampled_from("+ - * /".split()), integer_strat, integer_strat),
    )
)
any_type_ast = bool_strat | integer_strat
And it will work as if by magic :D
(on the other hand, this is a fair bit more complex - if your workaround is working for you, don't feel obliged to do this instead!)
If you're seeing problematic blowups in size - which should be very rare, as the engine has had a lot of work since that article was written - there's honestly not much to do about it. Threading a depth limit through the whole thing and decrementing it each step does work as a last resort, but it's not nice to work with.
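To show the shape of the "one generator per type, plus a depth limit" idea without depending on Hypothesis, here is a plain-Python sketch using the standard random module. Everything in it (gen_num, gen_bool, type_of, depth_of, the operator lists, the 0.3 leaf probability) is made up for illustration; it does not give you shrinking or any of the other things Hypothesis strategies provide.

```python
import random

BOOL_OPS = ["&&", "||"]
CMP_OPS = ["==", "!=", "<"]
NUM_OPS = ["+", "-", "*", "/"]


def gen_num(rng, depth):
    """Generate a tree that evaluates to a number, at most `depth` levels deep."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.randint(1, 10)
    return (rng.choice(NUM_OPS), gen_num(rng, depth - 1), gen_num(rng, depth - 1))


def gen_bool(rng, depth):
    """Generate a tree that evaluates to a boolean, at most `depth` levels deep."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice([True, False])
    if rng.random() < 0.5:
        # boolean operator: both children must be boolean trees
        return (rng.choice(BOOL_OPS), gen_bool(rng, depth - 1), gen_bool(rng, depth - 1))
    # comparison: both children must be numeric trees
    return (rng.choice(CMP_OPS), gen_num(rng, depth - 1), gen_num(rng, depth - 1))


def type_of(tree):
    """Return 'bool' or 'num', raising TypeError if the tree is ill-typed."""
    if isinstance(tree, bool):   # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(tree, int):
        return "num"
    op, left, right = tree
    expected = "bool" if op in BOOL_OPS else "num"
    if type_of(left) != expected or type_of(right) != expected:
        raise TypeError("ill-typed operands for %s" % op)
    return "num" if op in NUM_OPS else "bool"


def depth_of(tree):
    """Leaves have depth 0; an operator node is one deeper than its deepest child."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(tree[1]), depth_of(tree[2]))


example = gen_bool(random.Random(0), 3)
```

The mutual recursion between gen_bool and gen_num plays the same role as the mutually recursive deferred strategies, and the explicit depth argument is the "last resort" limit mentioned above.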
The solution I used for now is to adapt the generated trees so e.g. if a num tree is generated when the operation needs a bool, I also draw a comparison operator op and a constant const and return (op, tree, const):
def make_bool(tree, draw):
    if type_of(tree) == 'bool':
        return tree
    elif type_of(tree) == 'num':
        op = draw(sampled_from(comparison_ops))
        const = draw(integers())
        side = draw(booleans())
        return (op, tree, const) if side else (op, const, tree)
# in def extend:
if tpe == 'bool':
    op = draw(sampled_from(bool_ops + comparison_ops))
    if op in bool_ops:
        return (op, make_bool(draw(children), draw), make_bool(draw(children), draw))
    else:
        return (op, make_num(draw(children), draw), make_num(draw(children), draw))
Unfortunately, it's specific to ASTs and will mean specific kinds of trees are generated more often. So I'd still be happy to see better alternatives.

Python operating on big numbers causes margin errors [duplicate]

This question's answers are a community effort. Edit existing answers to improve this post. It is not currently accepting new answers or interactions.
My Google-fu has failed me.
In Python, are the following two tests for equality equivalent?
n = 5
# Test one.
if n == 5:
print 'Yay!'
# Test two.
if n is 5:
print 'Yay!'
Does this hold true for objects where you would be comparing instances (a list say)?
Okay, so this kind of answers my question:
L = []
L.append(1)
if L == [1]:
    print('Yay!')
# Holds true, but...
if L is [1]:
    print('Yay!')
# Doesn't.
So == tests value where is tests to see if they are the same object?
is will return True if two variables point to the same object (in memory), == if the objects referred to by the variables are equal.
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
>>> b == a
True
# Make a new copy of list `a` via the slice operator,
# and assign it to variable `b`
>>> b = a[:]
>>> b is a
False
>>> b == a
True
In your case, the second test only works because Python caches small integer objects, which is an implementation detail. For larger integers, this does not work:
>>> 1000 is 10**3
False
>>> 1000 == 10**3
True
The same holds true for string literals:
>>> "a" is "a"
True
>>> "aa" is "a" * 2
True
>>> x = "a"
>>> "aa" is x * 2
False
>>> "aa" is intern(x*2)
True
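The intern call above is the Python 2 builtin; in Python 3 the same function lives at sys.intern. A small sketch of the idea, assuming Python 3: interning two equal strings hands back one shared object, whereas whether the un-interned strings happen to be the same object is left to the implementation.

```python
import sys

x = "a"
doubled = x * 2      # built at run time from a variable
literal = "aa"

print(doubled == literal)                          # True: same value
print(sys.intern(doubled) is sys.intern(literal))  # True: interning yields one shared object
```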
Please see this question as well.
There is a simple rule of thumb to tell you when to use == or is.
== is for value equality. Use it when you would like to know if two objects have the same value.
is is for reference equality. Use it when you would like to know if two references refer to the same object.
In general, when you are comparing something to a simple type, you are usually checking for value equality, so you should use ==. For example, the intention of your example is probably to check whether n has a value equal to 5 (==), not whether n is literally referring to the same object as 5.
Something else to note: because of the way the CPython reference implementation works, you'll get unexpected and inconsistent results if you mistakenly use is to compare for reference equality on integers:
>>> a = 500
>>> b = 500
>>> a == b
True
>>> a is b
False
That's pretty much what we expected: a and b have the same value, but are distinct entities. But what about this?
>>> c = 200
>>> d = 200
>>> c == d
True
>>> c is d
True
This is inconsistent with the earlier result. What's going on here? It turns out the reference implementation of Python caches integer objects in the range -5..256 as singleton instances for performance reasons. Here's an example demonstrating this:
>>> for i in range(250, 260): a = i; print "%i: %s" % (i, a is int(str(i)));
...
250: True
251: True
252: True
253: True
254: True
255: True
256: True
257: False
258: False
259: False
This is another obvious reason not to use is: the behavior is left up to implementations when you're erroneously using it for value equality.
Is there a difference between == and is in Python?
Yes, they have a very important difference.
==: check for equality - the semantics are that equivalent objects (that aren't necessarily the same object) will test as equal. As the documentation says:
The operators <, >, ==, >=, <=, and != compare the values of two objects.
is: check for identity - the semantics are that the object (as held in memory) is the object. Again, the documentation says:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. Object identity is
determined using the id() function. x is not y yields the inverse
truth value.
Thus, the check for identity is the same as checking for the equality of the IDs of the objects. That is,
a is b
is the same as:
id(a) == id(b)
where id is the builtin function that returns an integer that "is guaranteed to be unique among simultaneously existing objects" (see help(id)) and where a and b are any arbitrary objects.
Other Usage Directions
You should use these comparisons for their semantics. Use is to check identity and == to check equality.
So in general, we use is to check for identity. This is usually useful when we are checking for an object that should only exist once in memory, referred to as a "singleton" in the documentation.
Use cases for is include:
None
enum values (when using Enums from the enum module)
usually modules
usually class objects resulting from class definitions
usually function objects resulting from function definitions
anything else that should only exist once in memory (all singletons, generally)
a specific object that you want by identity
Usual use cases for == include:
numbers, including integers
strings
lists
sets
dictionaries
custom mutable objects
other builtin immutable objects, in most cases
The general use case, again, for ==, is the object you want may not be the same object, instead it may be an equivalent one
PEP 8 directions
PEP 8, the official Python style guide for the standard library also mentions two use-cases for is:
Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
Also, beware of writing if x when you really mean if x is not None --
e.g. when testing whether a variable or argument that defaults to None
was set to some other value. The other value might have a type (such
as a container) that could be false in a boolean context!
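To make the PEP 8 warning concrete, here is a small sketch (the function names are invented for illustration). An empty list is a perfectly valid argument, but it is falsy, so a bare truth test treats it as if nothing had been passed.

```python
def describe(items=None):
    # Identity check: only the default None counts as "not passed".
    if items is not None:
        return "got a container with %d items" % len(items)
    return "nothing passed"


def describe_buggy(items=None):
    # Truth test: an empty container is falsy, so it is mistaken for None.
    if items:
        return "got a container with %d items" % len(items)
    return "nothing passed"


print(describe([]))        # got a container with 0 items
print(describe_buggy([]))  # nothing passed
```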
Inferring equality from identity
If is is true, equality can usually be inferred - logically, if an object is itself, then it should test as equivalent to itself.
In most cases this logic is true, but it relies on the implementation of the __eq__ special method. As the docs say,
The default behavior for equality comparison (== and !=) is based on
the identity of the objects. Hence, equality comparison of instances
with the same identity results in equality, and equality comparison of
instances with different identities results in inequality. A
motivation for this default behavior is the desire that all objects
should be reflexive (i.e. x is y implies x == y).
and in the interests of consistency, recommends:
Equality comparison should be reflexive. In other words, identical
objects should compare equal:
x is y implies x == y
We can see that this is the default behavior for custom objects:
>>> class Object(object): pass
>>> obj = Object()
>>> obj2 = Object()
>>> obj == obj, obj is obj
(True, True)
>>> obj == obj2, obj is obj2
(False, False)
The contrapositive is also usually true - if some things test as not equal, you can usually infer that they are not the same object.
Since tests for equality can be customized, this inference does not always hold true for all types.
An exception
A notable exception is nan - it always tests as not equal to itself:
>>> nan = float('nan')
>>> nan
nan
>>> nan is nan
True
>>> nan == nan # !!!!!
False
Checking for identity can be a much quicker check than checking for equality (which might require recursively checking members).
But it cannot be substituted for equality where you may find more than one object as equivalent.
Note that comparing lists and tuples for equality will assume that identical objects are equal (because this is a fast check). This can create contradictions if the logic is inconsistent - as it is for nan:
>>> [nan] == [nan]
True
>>> (nan,) == (nan,)
True
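A short sketch of why this happens: the language reference describes built-in containers as comparing elements with "x is y or x == y". With one shared nan object the identity shortcut succeeds; with two separately created nan objects it cannot, and the comparison falls through to nan == nan, which is False.

```python
nan = float('nan')
other_nan = float('nan')   # a second, distinct nan object

print([nan] == [nan])        # True: the same object on both sides
print([nan] == [other_nan])  # False: different objects, and nan != nan
print(nan in [nan])          # True: membership also checks identity first
print(nan in [other_nan])    # False
```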
A Cautionary Tale:
The question is attempting to use is to compare integers. You shouldn't assume that an instance of an integer is the same instance as one obtained by another reference. This story explains why.
A commenter had code that relied on the fact that small integers (-5 to 256 inclusive) are singletons in Python, instead of checking for equality.
Wow, this can lead to some insidious bugs. I had some code that checked if a is b, which worked as I wanted because a and b are typically small numbers. The bug only happened today, after six months in production, because a and b were finally large enough to not be cached. – gwg
It worked in development. It may have passed some unittests.
And it worked in production - until the code checked for an integer larger than 256, at which point it failed in production.
This is a production failure that could have been caught in code review or possibly with a style-checker.
Let me emphasize: do not use is to compare integers.
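On the style-checker point: since Python 3.8 the compiler itself flags is comparisons against literals with a SyntaxWarning. The helper below (a made-up name, assuming Python 3.8 or later) compiles a snippet and collects such warnings, which is one cheap way to catch the mistake before it reaches production.

```python
import warnings


def literal_is_warnings(source):
    """Compile `source` and return the SyntaxWarning messages it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [str(w.message) for w in caught
            if issubclass(w.category, SyntaxWarning)]


# Identity test against a literal: flagged by the compiler.
print(literal_is_warnings("n = 5\nif n is 5:\n    pass\n"))
# Equality test: nothing to report.
print(literal_is_warnings("n = 5\nif n == 5:\n    pass\n"))
```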
== determines if the values are equal, while is determines if they are the exact same object.
What's the difference between is and ==?
== and is are different comparisons! As others already said:
== compares the values of the objects.
is compares the references of the objects.
In Python names refer to objects, for example in this case value1 and value2 refer to an int instance storing the value 1000:
value1 = 1000
value2 = value1
Because value2 refers to the same object is and == will give True:
>>> value1 == value2
True
>>> value1 is value2
True
In the following example the names value1 and value2 refer to different int instances, even if both store the same integer:
>>> value1 = 1000
>>> value2 = 1000
Because the same value (integer) is stored == will be True, that's why it's often called "value comparison". However is will return False because these are different objects:
>>> value1 == value2
True
>>> value1 is value2
False
When to use which?
Generally is is a much faster comparison. That's why CPython caches (or maybe reuses would be the better term) certain objects like small integers, some strings, etc. But this should be treated as implementation detail that could (even if unlikely) change at any point without warning.
You should only use is if you:
want to check if two objects are really the same object (not just the same "value"). One example can be if you use a singleton object as constant.
want to compare a value to a Python constant. The constants in Python are:
None
True [1]
False [1]
NotImplemented
Ellipsis
__debug__
classes (for example int is int or int is float)
there could be additional constants in built-in modules or 3rd party modules. For example np.ma.masked from the NumPy module)
In every other case you should use == to check for equality.
Can I customize the behavior?
There is some aspect to == that hasn't been mentioned already in the other answers: It's part of Python's "Data model". That means its behavior can be customized using the __eq__ method. For example:
class MyClass(object):
    def __init__(self, val):
        self._value = val

    def __eq__(self, other):
        print('__eq__ method called')
        try:
            return self._value == other._value
        except AttributeError:
            raise TypeError('Cannot compare {0} to objects of type {1}'
                            .format(type(self), type(other)))
This is just an artificial example to illustrate that the method is really called:
>>> MyClass(10) == MyClass(10)
__eq__ method called
True
Note that by default (if no other implementation of __eq__ can be found in the class or the superclasses) __eq__ uses is:
class AClass(object):
    def __init__(self, value):
        self._value = value

>>> a = AClass(10)
>>> b = AClass(10)
>>> a == b
False
>>> a == a
True
So it's actually important to implement __eq__ if you want "more" than just reference-comparison for custom classes!
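A minimal sketch of such a class, with invented names (Money, _amount). Returning NotImplemented for unrelated types, rather than raising as the artificial example above does, lets Python fall back to its default handling, so comparing against a foreign object simply yields False.

```python
class Money(object):
    def __init__(self, amount):
        self._amount = amount

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented   # let Python try the other operand, then fall back
        return self._amount == other._amount

    def __hash__(self):
        # Defining __eq__ disables the inherited __hash__ in Python 3,
        # so restore hashing in a way that is consistent with equality.
        return hash(self._amount)


a = Money(10)
b = Money(10)
print(a == b)    # True: equal values
print(a is b)    # False: two separate objects
print(a == 10)   # False: unrelated type
```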
On the other hand you cannot customize is checks. It will always compare just if you have the same reference.
Will these comparisons always return a boolean?
Because __eq__ can be re-implemented or overridden, it's not limited to return True or False. It could return anything (but in most cases it should return a boolean!).
For example with NumPy arrays the == will return an array:
>>> import numpy as np
>>> np.arange(10) == 2
array([False, False, True, False, False, False, False, False, False, False], dtype=bool)
But is checks will always return True or False!
[1] As Aaron Hall mentioned in the comments:
Generally you shouldn't do any is True or is False checks because one normally uses these "checks" in a context that implicitly converts the condition to a boolean (for example in an if statement). So doing the is True comparison and the implicit boolean cast is doing more work than just doing the boolean cast - and you limit yourself to booleans (which isn't considered pythonic).
Like PEP8 mentions:
Don't compare boolean values to True or False using ==.
Yes: if greeting:
No: if greeting == True:
Worse: if greeting is True:
They are completely different. is checks for object identity, while == checks for equality (a notion that depends on the two operands' types).
It is only a lucky coincidence that "is" seems to work correctly with small integers (e.g. 5 is 4+1). That is because CPython optimizes the storage of integers in the range -5 to 256 by making them singletons. This behavior is totally implementation-dependent and not guaranteed to be preserved under all manner of minor transformative operations.
For example, Python 3.5 also makes short strings singletons, but slicing them disrupts this behavior:
>>> "foo" + "bar" == "foobar"
True
>>> "foo" + "bar" is "foobar"
True
>>> "foo"[:] + "bar" == "foobar"
True
>>> "foo"[:] + "bar" is "foobar"
False
https://docs.python.org/library/stdtypes.html#comparisons
is tests for identity
== tests for equality
Each (small) integer value is mapped to a single object, so every 3 is identical and equal. This is an implementation detail, not part of the language spec, though.
Your answer is correct. The is operator compares the identity of two objects. The == operator compares the values of two objects.
An object's identity never changes once it has been created; you may think of it as the object's address in memory.
You can control comparison behaviour of object values by defining a rich comparison method like __eq__ (or, in Python 2 only, a __cmp__ method).
Have a look at Stack Overflow question Python's “is” operator behaves unexpectedly with integers.
What it mostly boils down to is that "is" checks to see if they are the same object, not just equal to each other (the small integers from -5 to 256 are a special case in CPython).
In a nutshell, is checks whether two references point to the same object or not. == checks whether two objects have the same value or not.
a = [1, 2, 3]
b = a        # a and b point to the same object
c = list(a)  # c points to a different object
if a == b:
    print('#')     # output: #
if a is b:
    print('##')    # output: ##
if a == c:
    print('###')   # output: ###
if a is c:
    print('####')  # no output as c and a point to different objects
As the other answers in this post explain in detail the difference between == and is for comparing objects or variables, I will focus on comparing strings with is and ==, which can give different results, and I urge programmers to use them carefully.
For string comparison, make sure to use == instead of is:
str = 'hello'
if (str is 'hello'):
    print ('str is hello')
if (str == 'hello'):
    print ('str == hello')
Out:
str is hello
str == hello
But in the below example == and is will get different results:
str2 = 'hello sam'
if (str2 is 'hello sam'):
    print ('str2 is hello sam')
if (str2 == 'hello sam'):
    print ('str2 == hello sam')
Out:
str2 == hello sam
Conclusion and Analysis:
Use is carefully to compare between strings.
Since is compares objects, and since in Python 3+ every value, strings included, is an object, let's see what happened in the examples above.
In Python, the id function returns an integer that is unique and constant for an object during its lifetime. The is keyword in effect compares these ids.
str = 'hello'
id('hello')
> 140039832615152
id(str)
> 140039832615152
But
str2 = 'hello sam'
id('hello sam')
> 140039832615536
id(str2)
> 140039832615792
As John Feminella said, most of the time you will use == and != because your objective is to compare values. I'd just like to categorise what you would do the rest of the time:
There is one and only one instance of NoneType, i.e. None is a singleton. Consequently foo == None and foo is None normally give the same result (unless the class of foo overrides __eq__ in an unusual way). However, the is test is faster and the Pythonic convention is to use foo is None.
If you are doing some introspection or mucking about with garbage collection or checking whether your custom-built string interning gadget is working or suchlike, then you probably have a use-case for foo is bar.
True and False are also (now) singletons, but there is no use-case for foo == True and no use case for foo is True.
Most of them already answered to the point. Just as an additional note (based on my understanding and experimenting but not from a documented source), the statement
== if the objects referred to by the variables are equal
from above answers should be read as
== if the objects referred to by the variables are equal and objects belonging to the same type/class
I arrived at this conclusion based on the test below:
list1 = [1,2,3,4]
tuple1 = (1,2,3,4)
print(list1)
print(tuple1)
print(id(list1))
print(id(tuple1))
print(list1 == tuple1)
print(list1 is tuple1)
Here the contents of the list and tuple are the same but the type/class are different.
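Whether objects of different types can compare equal is decided by each type's __eq__, so the "same type" reading is not a universal rule. A quick check with built-in types:

```python
print([1, 2, 3, 4] == (1, 2, 3, 4))   # False: a list never compares equal to a tuple
print(1 == 1.0)                       # True: int and float compare by numeric value
print({1, 2} == frozenset({1, 2}))    # True: set and frozenset compare by members
print(True == 1)                      # True: bool is a subclass of int
```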

Python, perplexity about "is" operator on integers [duplicate]

Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
>>> 257 is 257
True # Yet the literal numbers compare properly
I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.
Based on the above, I can hypothesize that Python is internally implemented such that "small" integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don't know in advance whether they are numbers or not?
Take a look at this:
>>> a = 256
>>> b = 256
>>> id(a) == id(b)
True
>>> a = 257
>>> b = 257
>>> id(a) == id(b)
False
Here's what I found in the documentation for "Plain Integer Objects":
The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.
So, integers 256 are identical, but 257 are not. This is a CPython implementation detail, and not guaranteed for other Python implementations.
Python's “is” operator behaves unexpectedly with integers?
In summary - let me emphasize: Do not use is to compare integers.
This isn't behavior you should have any expectations about.
Instead, use == and != to compare for equality and inequality, respectively. For example:
>>> a = 1000
>>> a == 1000 # Test integers like this,
True
>>> a != 5000 # or this!
True
>>> a is 1000 # Don't do this! - Don't use `is` to test integers!!
False
Explanation
To know this, you need to know the following.
First, what does is do? It is a comparison operator. From the documentation:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.
And so the following are equivalent.
>>> a is b
>>> id(a) == id(b)
From the documentation:
id
Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.
Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.
So what is the use-case for is? PEP8 describes:
Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
The Question
You ask, and state, the following question (with code):
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.
But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.
>>> 257 is 257
True # Yet the literal numbers compare properly
Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the reference implementation of Python, which is CPython. Good for CPython.
It might be even better if CPython could do this globally, if it could do so cheaply (as there would be a cost in the lookup); perhaps another implementation might.
But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.
What is does
is checks that the id of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:
>>> a is b
is the same as
>>> id(a) == id(b)
Why would we want to use is then?
This can be a very fast check relative to say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here's an example (will work in Python 2 and 3) e.g.
SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))

foo()
Which prints:
no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz
And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.
I'm late but, you want some source with your answer? I'll try and word this in an introductory manner so more folks can follow along.
A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.
In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
(My italics)
Don't know about you but I see this and think: Let's find that array!
If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.
PyLong_FromLong deals with long objects so it shouldn't be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:
PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations

    CHECK_SMALL_INT(ival);

    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }

    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v;
    }
    // remainder of the function (multi-digit path) omitted
}
Now, we're no C master-code-haxxorz but we're also not dumb, we can see that CHECK_SMALL_INT(ival); peeking at us all seductively; we can understand it has something to do with this. Let's check it out:
#define CHECK_SMALL_INT(ival) \
do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
return get_small_int((sdigit)ival); \
} while(0)
So it's a macro that calls function get_small_int if the value ival satisfies the condition:
if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)
So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif
So our condition is if (-5 <= ival && ival < 257) call get_small_int.
Next let's look at get_small_int in all its glory (well, we'll just look at its body because that's where the interesting things are):
PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
Okay, declare a PyObject, assert that the previous condition holds and execute the assignment:
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
small_ints looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've known all along!:
/* Small integers are preallocated in this array so that they
can be shared.
The integers that are preallocated are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
So yup, this is our guy. When you want to create a new int in the range [-NSMALLNEGINTS, NSMALLPOSINTS), i.e. from -5 to 256 inclusive, you'll just get back a reference to an already existing object that has been preallocated.
Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.
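From the Python side, the cache can be probed with a sketch like the one below. It assumes CPython; the helper name shares_object is made up, and the results are an implementation detail that other interpreters or future versions may not reproduce. The integers are built from strings at run time so that the compiler cannot merge them as constants.

```python
def shares_object(n):
    """Return True if two independently created ints with value n are one object."""
    first = int(str(n))
    second = int(str(n))
    return first is second


# Probe values on both sides of the documented -5..256 range.
for n in (-6, -5, 0, 256, 257):
    print(n, shares_object(n))
```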
But, when are they allocated??
During initialization in _PyLong_Init Python will gladly enter in a for loop to do this for you:
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++, v++) {
Check out the source to read the loop body!
I hope my explanation has made you C things clearly now (pun obviously intended).
But, 257 is 257? What's up?
This is actually easier to explain, and I have attempted to do so already; it's due to the fact that Python will execute this interactive statement as a single block:
>>> 257 is 257
During compilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:
>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)
When CPython does the operation, it's now just going to load the exact same object:
>>> import dis
>>> dis.dis(codeObj)
1 0 LOAD_CONST 0 (257) # dis
3 LOAD_CONST 0 (257) # dis again
6 COMPARE_OP 8 (is)
So is will return True.
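The same merging can be observed without the interactive prompt. In this sketch (assuming CPython; the variable names are arbitrary) two assignments are compiled together as one unit, and the literal 257 shows up only once among the code object's constants.

```python
source = "a = 257\nb = 257\n"
code = compile(source, "<example>", "exec")
print(code.co_consts)   # 257 is listed a single time

namespace = {}
exec(code, namespace)
# Both names were loaded from that one constant, so they share an object.
print(namespace["a"] is namespace["b"])
```

Typing the two assignments as separate statements at the interactive prompt compiles them separately, which is why the question saw False there.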
It depends on whether you're looking to see if 2 things are equal, or the same object.
is checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency
In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144
You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__ and __ne__ methods.
As you can check in source file intobject.c, Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring to the cached small integer, not a new object. 257 is not a small integer, so it is calculated as a different object.
It is better to use == for that purpose.
I think your hypothesis is correct. Experiment with id (identity of object):
In [1]: id(255)
Out[1]: 146349024
In [2]: id(255)
Out[2]: 146349024
In [3]: id(257)
Out[3]: 146802752
In [4]: id(257)
Out[4]: 148993740
In [5]: a=255
In [6]: b=255
In [7]: c=257
In [8]: d=257
In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)
It appears that small numbers such as 255 are shared as a single cached object, while 257 and above are treated differently (in CPython the cached range runs from -5 to 256)!
There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example:
>>> a = ()
>>> b = ()
>>> a is b
True
But also, even non-pre-created values can be identical. Consider these examples:
>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True
And this isn't limited to int values:
>>> g, h = 42.23e100, 42.23e100
>>> g is h
True
Obviously, CPython doesn't come with a pre-created float value for 42.23e100. So, what's going on here?
The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren't merged. Since e and f are defined in the same statement, their values are merged.
You can see what's going on by disassembling the bytecode. Try defining a function that does e, f = 258, 258 and then calling dis.dis on it, and you'll see that there's a single constant value (258, 258)
>>> def f(): i, j = 258, 258
>>> dis.dis(f)
1 0 LOAD_CONST 2 ((258, 258))
2 UNPACK_SEQUENCE 2
4 STORE_FAST 0 (i)
6 STORE_FAST 1 (j)
8 LOAD_CONST 0 (None)
10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 258, (258, 258))
>>> id(f.__code__.co_consts[1]), id(f.__code__.co_consts[2][0]), id(f.__code__.co_consts[2][1])
(4305296480, 4305296480, 4305296480)
You may notice that the compiler has stored 258 as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. Which means that (non-empty) tuples actually don't end up merged:
>>> k, l = (1, 2), (1, 2)
>>> k is l
False
Put that in a function, dis it, and look at the co_consts—there's a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.
There's one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn't restricted to source code literals:
>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True
On the other hand, it is limited to the str type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rule is:
Do not write code that assumes two equal but separately-created immutable values are identical (don't use x is y, use x == y)
Do not write code that assumes two equal but separately-created immutable values are distinct (don't use x is not y, use x != y)
Or, in other words, only use is to test for documented singletons (like None) or for objects that are only created in one place in the code (like the _sentinel = object() idiom).
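The sentinel idiom mentioned above could look roughly like this. The function get_setting and its parameters are made up for illustration; the point is only that _sentinel is created exactly once, so testing it with is is safe and lets None remain a legitimate value.

```python
_sentinel = object()  # created in exactly one place, so identity is meaningful


def get_setting(settings, key, default=_sentinel):
    """Return settings[key]; fall back to default if one was passed,
    otherwise raise KeyError. None is a valid stored value and default."""
    value = settings.get(key, _sentinel)
    if value is not _sentinel:
        return value
    if default is _sentinel:
        raise KeyError(key)
    return default


print(get_setting({"debug": None}, "debug"))    # stored None is returned
print(get_setting({}, "debug", default=None))   # explicit default of None
```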
For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.
is is the identity operator (it behaves like id(a) == id(b)); two equal numbers simply aren't necessarily the same object. For performance reasons some small integers are cached, so they tend to be the same object (this is safe because they are immutable).
PHP's === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y), as per Paulo Freitas' comment. This will suffice for common numbers, but differs from is for classes that define __eq__ in an absurd manner:
class Unequal:
    def __eq__(self, other):
        return False
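To make the difference concrete, here is a short sketch that repeats the class so it runs on its own and compares an instance with itself both ways:

```python
class Unequal:
    def __eq__(self, other):
        return False


u = Unequal()

identical = u is u   # identity ignores __eq__ entirely
equal = u == u       # equality asks __eq__, which always says no
in_list = u in [u]   # list membership checks identity before equality

print(identical, equal, in_list)
```

So an "equality plus type" check in the style of === would report such an object as different from itself, while is would not.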
PHP apparently allows the same thing for "built-in" classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it's used as a number. Quite why you'd want to emulate Visual Basic's Now instead of showing that it is an evaluation with time.time() I don't know.
Greg Hewgill (OP) made one clarifying comment "My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value."
This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.
We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built-in number classes have a base class numbers.Number, but it has the same problem, since third-party types are only covered if they register with it (when this was written, NumPy's did not):
import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)
By the way, NumPy will produce separate instances of low numbers.
I don't actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it's certainly not portable. We'll just have to be less particular about what we test for now.
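For what it's worth, a rough approximation of the stated goal (value equality for numbers, identity for everything else) can be sketched with numbers.Number, accepting the incompleteness described above. The helper name same is hypothetical, and any numeric type that has not registered with the numbers ABCs falls through to the identity branch.

```python
import numbers


def same(a, b):
    """Compare by value when both operands are registered numbers,
    and by identity otherwise. Incomplete by design: numeric types that
    do not register with numbers.Number are compared by identity."""
    if isinstance(a, numbers.Number) and isinstance(b, numbers.Number):
        return a == b
    return a is b


big_a = int("1000")
big_b = int("1000")
shared = []

print(same(big_a, big_b))    # numbers: compared by value
print(same([], []))          # two distinct lists: compared by identity
print(same(shared, shared))  # the same list object
```

Note that bool is a subclass of int, so same(True, 1) compares by value as well; whether that is desirable depends on what you count as a number.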
In the end, this issue stems from Python not originally having a type tree with predicates like Scheme's number?, or Haskell's type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).
It also happens with strings:
>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
Now everything seems fine.
>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
That's expected too.
>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)
>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)
Now that's unexpected.
What’s New In Python 3.8: Changes in Python behavior:
The compiler now produces a SyntaxWarning when identity checks (is and
is not) are used with certain types of literals (e.g. strings, ints).
These can often work by accident in CPython, but are not guaranteed by
the language spec. The warning advises users to use equality tests (==
and !=) instead.
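You can observe that warning without running any comparison by compiling a snippet and recording what the compiler emits. A minimal sketch, assuming Python 3.8 or later; the helper name identity_literal_warnings is made up, and the exact warning text varies between versions, so only the category is checked.

```python
import warnings


def identity_literal_warnings(source):
    """Compile an expression and return the SyntaxWarnings the compiler raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        compile(source, "<demo>", "eval")
    return [w for w in caught if issubclass(w.category, SyntaxWarning)]


with_is = identity_literal_warnings("x is 'somestr'")
with_eq = identity_literal_warnings("x == 'somestr'")

for w in with_is:
    print("is:", w.message)
print("== produced", len(with_eq), "warnings")
```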

Resources