I've noticed that many operations on lists that modify the list's contents will return None, rather than returning the list itself. Examples:
>>> mylist = ['a', 'b', 'c']
>>> empty = mylist.clear()
>>> restored = mylist.extend(range(3))
>>> backwards = mylist.reverse()
>>> with_four = mylist.append(4)
>>> in_order = mylist.sort()
>>> without_one = mylist.remove(1)
>>> mylist
[0, 2, 4]
>>> [empty, restored, backwards, with_four, in_order, without_one]
[None, None, None, None, None, None]
What is the thought process behind this decision?
To me, it seems hampering, since it prevents "chaining" of list processing (e.g. mylist.reverse().append('a string')[:someLimit]). I imagine it might be that "The Powers That Be" decided that list comprehension is a better paradigm (a valid opinion), and so didn't want to encourage other methods - but it seems perverse to prevent an intuitive method, even if better alternatives exist.
This question is specifically about Python's design decision to return None from mutating list methods like .append. Novices often write incorrect code that expects .append (in particular) to return the same list that was just modified.
For the simple question of "how do I append to a list?" (or debugging questions that boil down to that problem), see Why does "x = x.append([i])" not work in a for loop?.
To get modified versions of the list, see:
For .sort: How can I get a sorted copy of a list?
For .reverse: How can I get a reversed copy of a list (avoid a separate statement when chaining a method after .reverse)?
The same issue applies to some methods of other built-in data types, e.g. set.discard (see How to remove specific element from sets inside a list using list comprehension) and dict.update (see Why doesn't a python dict.update() return the object?).
The same reasoning applies to designing your own APIs. See Is making in-place operations return the object a bad idea?.
The general design principle in Python is for functions that mutate an object in-place to return None. I'm not sure it would have been the design choice I'd have chosen, but it's basically to emphasise that a new object is not returned.
Guido van Rossum (our Python BDFL) states the design choice on the Python-Dev mailing list:
I'd like to explain once more why I'm so adamant that sort() shouldn't
return 'self'.
This comes from a coding style (popular in various other languages, I
believe especially Lisp revels in it) where a series of side effects
on a single object can be chained like this:
x.compress().chop(y).sort(z)
which would be the same as
x.compress()
x.chop(y)
x.sort(z)
I find the chaining form a threat to readability; it requires that the
reader must be intimately familiar with each of the methods. The
second form makes it clear that each of these calls acts on the same
object, and so even if you don't know the class and its methods very
well, you can understand that the second and third call are applied to
x (and that all calls are made for their side-effects), and not to
something else.
I'd like to reserve chaining for operations that return new values,
like string processing operations:
y = x.rstrip("\n").split(":").lower()
There are a few standard library modules that encourage chaining of
side-effect calls (pstat comes to mind). There shouldn't be any new
ones; pstat slipped through my filter when it was weak.
I can't speak for the developers, but I find this behavior very intuitive.
If a method works on the original object and modifies it in-place, it doesn't return anything, because there is no new information - you obviously already have a reference to the (now mutated) object, so why return it again?
If, however, a method or function creates a new object, then of course it has to return it.
So l.reverse() returns nothing (because now the list has been reversed, but the identifier l still points to that list), but reversed(l) has to return something new (an iterator over the elements in reverse order), because l still points to the old, unmodified list.
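A minimal sketch of that distinction, using only the built-in list.sort, sorted, list.reverse and reversed:
>>> l = [3, 1, 2]
>>> print(l.sort())          # mutates l in place, so nothing new to return
None
>>> l
[1, 2, 3]
>>> sorted([3, 1, 2])        # builds and returns a brand-new list
[1, 2, 3]
>>> l.reverse()              # in place again; returns None
>>> list(reversed(l))        # reversed() returns a new iterator; list() materialises it
[1, 2, 3]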
EDIT: I just learned from another answer that this principle is called Command-Query separation.
One could argue that the signature itself makes it clear that the function mutates the list rather than returning a new one: if the function returned a list, its behavior would have been much less obvious.
If you were sent here after asking for help fixing your code:
In the future, please try to look for problems in the code yourself, by carefully studying what happens when the code runs. Rather than giving up because there is an error message, check the result of each calculation, and see where the code starts working differently from what you expect.
If you had code calling a method like .append or .sort on a list, you will notice that the return value is None, while the list is modified in place. Study the example carefully:
>>> x = ['e', 'x', 'a', 'm', 'p', 'l', 'e']
>>> y = x.sort()
>>> print(y)
None
>>> print(x)
['a', 'e', 'e', 'l', 'm', 'p', 'x']
y got the special None value, because that is what was returned. x changed, because the sort happened in place.
It works this way on purpose, so that code like x.sort().reverse() breaks. See the other answers to understand why the Python developers wanted it that way.
To fix the problem
First, think carefully about the intent of the code. Should x change? Do we actually need a separate y?
Let's consider .sort first. If x should change, then call x.sort() by itself, without assigning the result anywhere.
If a sorted copy is needed instead, use y = sorted(x). See How can I get a sorted copy of a list? for details.
For other methods, we can get modified copies like so:
.clear -> there is no point to this; a "cleared copy" of the list is just an empty list. Just use y = [].
.append and .extend -> probably the simplest way is to use the + operator. To add multiple elements from a list l, use y = x + l rather than .extend. To add a single element e wrap it in a list first: y = x + [e]. Another way in 3.5 and up is to use unpacking: y = [*x, *l] for .extend, y = [*x, e] for .append. See also How to allow list append() method to return the new list for .append and How do I concatenate two lists in Python? for .extend.
.reverse -> First, consider whether an actual copy is needed. The built-in reversed gives you an iterator that can be used to loop over the elements in reverse order. To make an actual copy, simply pass that iterator to list: y = list(reversed(x)). See How can I get a reversed copy of a list (avoid a separate statement when chaining a method after .reverse)? for details.
.remove -> Figure out the index of the element that will be removed (using .index), then use slicing to find the elements before and after that point and put them together. As a function:
def without(a_list, value):
    index = a_list.index(value)
    return a_list[:index] + a_list[index+1:]
(We can translate .pop similarly to make a modified copy, though of course .pop actually returns an element from the list.)
See also A quick way to return list without a specific element in Python.
(If you plan to remove multiple elements, strongly consider using a list comprehension (or filter) instead. It will be much simpler than any of the workarounds needed for removing items from the list while iterating over it. This way also naturally gives a modified copy.)
For any of the above, of course, we can also make a modified copy by explicitly making a copy and then using the in-place method on the copy. The most elegant approach will depend on the context and on personal taste.
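For example, a sketch of that copy-then-mutate approach (x and value here are placeholder names):
y = x.copy()       # or y = x[:] on versions before 3.3
y.sort()           # y is a sorted copy; x is untouched
z = x.copy()
z.remove(value)    # z is a copy with the first occurrence of value removed
# removing *every* matching element reads best as a comprehension:
w = [item for item in x if item != value]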
As we know, a list in Python is a mutable object, and one of the characteristics of a mutable object is the ability to modify its state without assigning the new state to a new variable. We should look at this topic in a bit more depth to understand the root of the issue.
An object whose internal state can be changed is mutable. On the other hand, an immutable object doesn't allow any change once it has been created. Object mutability is one of Python's defining characteristics.
Every object in Python has three attributes:
Identity – This refers to the address that the object refers to in the computer’s memory.
Type – This refers to the kind of object that is created. For example integer, list, string etc.
Value – This refers to the value stored by the object. For example str = "a".
While ID and Type cannot be changed once it’s created, values can be changed for Mutable objects.
Let us discuss the code below step by step to see what this means in Python:
Creating a list which contains name of cities
cities = ['London', 'New York', 'Chicago']
Printing the location of the object created in the memory address in hexadecimal format
print(hex(id(cities)))
Output [1]: 0x1691d7de8c8
Adding a new city to the list cities
cities.append('Delhi')
Printing the elements from the list cities, separated by a comma
for city in cities:
    print(city, end=', ')
Output [2]: London, New York, Chicago, Delhi
Printing the location of the object created in the memory address in hexadecimal format
print(hex(id(cities)))
Output [3]: 0x1691d7de8c8
The above example shows us that we were able to change the internal state of the object cities by adding one more city 'Delhi' to it, yet, the memory address of the object did not change. This confirms that we did not create a new object, rather, the same object was changed or mutated. Hence, we can say that the object which is a type of list with reference variable name cities is a MUTABLE OBJECT.
The internal state of an immutable object, by contrast, cannot be changed. For instance, consider the code below and the error message it produces when trying to change the value of a tuple at index 0.
Creating a Tuple with variable name foo
foo = (1, 2)
Changing the index 0 value from 1 to 3
foo[0] = 3
TypeError: 'tuple' object does not support item assignment
We can conclude from these examples why a mutable object shouldn't return anything from an operation that modifies it: it changes its internal state directly, so there is no point in returning a new modified object. An immutable object, by contrast, has to return a new object representing the modified state after such an operation.
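A small sketch of that contrast (the exact id values will differ on your machine):
nums = [1, 2]
print(id(nums))      # some address
nums.append(3)       # in-place mutation: returns None, same object
print(id(nums))      # same address as before

pair = (1, 2)
print(id(pair))
pair = pair + (3,)   # a tuple can't change, so a brand-new tuple is built
print(id(pair))      # different address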
First of all, I should say that what I am suggesting is, without a doubt, bad programming practice, but if you want to use append in a lambda function and you don't care about code readability, there is a way to do just that.
Imagine you have a list of lists and you want to append an element to each inner list using map and lambda. Here is how you can do that:
my_list = [[1, 2, 3, 4],
           [3, 2, 1],
           [1, 1, 1]]
my_new_element = 10
new_list = list(map(lambda x: [x.append(my_new_element), x][1], my_list))
print(new_list)
How it works:
When the lambda computes its output, it first has to evaluate the expression [x.append(my_new_element), x]. To do that, the append call runs (returning None), so the expression evaluates to [None, x]; indexing it with [1] then yields x itself.
Using a custom function is more readable and the better option:
def append_my_list(input_list, new_element):
    input_list.append(new_element)
    return input_list
my_list = [[1, 2, 3, 4],
           [3, 2, 1],
           [1, 1, 1]]
my_new_element = 10
new_list = list(map(lambda x: append_my_list(x, my_new_element), my_list))
print(new_list)
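For comparison, a list comprehension avoids the lambda entirely (starting again from the original my_list). Note that this sketch builds new inner lists rather than mutating the originals, which may or may not be what you want:
new_list = [inner + [my_new_element] for inner in my_list]
print(new_list)  # [[1, 2, 3, 4, 10], [3, 2, 1, 10], [1, 1, 1, 10]]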
What's the difference between tuples/lists and what are their advantages/disadvantages?
Apart from tuples being immutable there is also a semantic distinction that should guide their usage. Tuples are heterogeneous data structures (i.e., their entries have different meanings), while lists are homogeneous sequences. Tuples have structure, lists have order.
Using this distinction makes code more explicit and understandable.
One example would be pairs of page and line number to reference locations in a book, e.g.:
my_location = (42, 11) # page number, line number
You can then use this as a key in a dictionary to store notes on locations. A list on the other hand could be used to store multiple locations. Naturally one might want to add or remove locations from the list, so it makes sense that lists are mutable. On the other hand it doesn't make sense to add or remove items from an existing location - hence tuples are immutable.
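For instance, a sketch of that usage (the notes dictionary and the second location are invented for illustration):
my_location = (42, 11)
notes = {my_location: "check this passage again"}   # tuple as dictionary key
locations = [my_location, (73, 2)]                  # list of locations; can grow and shrink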
There might be situations where you want to change items within an existing location tuple, for example when iterating through the lines of a page. But tuple immutability forces you to create a new location tuple for each new value. This seems inconvenient on the face of it, but using immutable data like this is a cornerstone of value types and functional programming techniques, which can have substantial advantages.
There are some interesting articles on this issue, e.g. "Python Tuples are Not Just Constant Lists" or "Understanding tuples vs. lists in Python". The official Python documentation also mentions this
"Tuples are immutable, and usually contain an heterogeneous sequence ...".
In a statically typed language like Haskell the values in a tuple generally have different types and the length of the tuple must be fixed. In a list the values all have the same type and the length is not fixed. So the difference is very obvious.
Finally there is the namedtuple in Python, which makes sense because a tuple is already supposed to have structure. This underlines the idea that tuples are a light-weight alternative to classes and instances.
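A brief namedtuple sketch of the location example above (the field names are just for illustration):
from collections import namedtuple

Location = namedtuple("Location", ["page", "line"])
my_location = Location(page=42, line=11)
print(my_location.page)   # 42 - still a tuple, but with readable field names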
Difference between list and tuple
Literal
someTuple = (1,2)
someList = [1,2]
Size
a = tuple(range(1000))
b = list(range(1000))
a.__sizeof__() # 8024
b.__sizeof__() # 9088
Due to their smaller size, operations on tuples are a bit faster, but the difference is not worth mentioning until you have a huge number of elements.
Permitted operations
b = [1,2]
b[0] = 3 # [3, 2]
a = (1,2)
a[0] = 3 # Error
That also means that you can't delete an element or sort a tuple.
However, you could add a new element to both a list and a tuple, with the only difference that since the tuple is immutable, you are not really adding an element but creating a new tuple, so its id will change
a = (1,2)
b = [1,2]
id(a) # 140230916716520
id(b) # 748527696
a += (3,) # (1, 2, 3)
b += [3] # [1, 2, 3]
id(a) # 140230916878160
id(b) # 748527696
Usage
As a list is mutable, it can't be used as a key in a dictionary, whereas a tuple can be used.
a = (1,2)
b = [1,2]
c = {a: 1} # OK
c = {b: 1} # Error
If you went for a walk, you could note your coordinates at any instant in an (x,y) tuple.
If you wanted to record your journey, you could append your location every few seconds to a list.
But you couldn't do it the other way around.
The key difference is that tuples are immutable. This means that you cannot change the values in a tuple once you have created it.
So if you're going to need to change the values use a List.
Benefits to tuples:
Slight performance improvement.
As a tuple is immutable it can be used as a key in a dictionary.
If you can't change it neither can anyone else, which is to say you don't need to worry about any API functions etc. changing your tuple without being asked.
Lists are mutable; tuples are not.
From docs.python.org/2/tutorial/datastructures.html
Tuples are immutable, and usually contain an heterogeneous sequence of
elements that are accessed via unpacking (see later in this section)
or indexing (or even by attribute in the case of namedtuples). Lists
are mutable, and their elements are usually homogeneous and are
accessed by iterating over the list.
This is an example of Python lists:
my_list = [0,1,2,3,4]
top_rock_list = ["Bohemian Rhapsody","Kashmir","Sweet Emotion", "Fortunate Son"]
This is an example of Python tuple:
my_tuple = (a,b,c,d,e)
celebrity_tuple = ("John", "Wayne", 90210, "Actor", "Male", "Dead")
Python lists and tuples are similar in that they both are ordered collections of values. Besides the shallow difference that lists are created using brackets "[ ... , ... ]" and tuples using parentheses "( ... , ... )", the core technical "hard coded in Python syntax" difference between them is that the elements of a particular tuple are immutable whereas lists are mutable (...so only tuples are hashable and can be used as dictionary/hash keys!). This gives rise to differences in how they can or can't be used (enforced a priori by syntax) and differences in how people choose to use them (encouraged as 'best practices', a posteriori, this is what smart programmers do). The main difference a posteriori in differentiating when tuples are used versus when lists are used lies in what meaning people give to the order of elements.
For tuples, 'order' signifies nothing more than just a specific 'structure' for holding information. What values are found in the first field can easily be switched into the second field as each provides values across two different dimensions or scales. They provide answers to different types of questions and are typically of the form: for a given object/subject, what are its attributes? The object/subject stays constant, the attributes differ.
For lists, 'order' signifies a sequence or a directionality. The second element MUST come after the first element because it's positioned in the 2nd place based on a particular and common scale or dimension. The elements are taken as a whole and mostly provide answers to a single question typically of the form, for a given attribute, how do these objects/subjects compare? The attribute stays constant, the object/subject differs.
There are countless examples of people in popular culture and programmers who don't conform to these differences and there are countless people who might use a salad fork for their main course. At the end of the day, it's fine and both can usually get the job done.
To summarize some of the finer details
Similarities:
Duplicates - Both tuples and lists allow for duplicates
Indexing, Selecting, & Slicing - Both tuples and lists index using integer values found within brackets. So, if you want the first 3 values of a given list or tuple, the syntax would be the same:
>>> my_list[0:3]
[0,1,2]
>>> my_tuple[0:3]
[a,b,c]
Comparing & Sorting - Two tuples or two lists are both compared by their first element, and if there is a tie, then by the second element, and so on. No further attention is paid to subsequent elements after earlier elements show a difference.
>>> [0,2,0,0,0,0]>[0,0,0,0,0,500]
True
>>> (0,2,0,0,0,0)>(0,0,0,0,0,500)
True
Differences: - A priori, by definition
Syntax - Lists use [], tuples use ()
Mutability - Elements in a given list are mutable, elements in a given tuple are NOT mutable.
# Lists are mutable:
>>> top_rock_list
['Bohemian Rhapsody', 'Kashmir', 'Sweet Emotion', 'Fortunate Son']
>>> top_rock_list[1]
'Kashmir'
>>> top_rock_list[1] = "Stairway to Heaven"
>>> top_rock_list
['Bohemian Rhapsody', 'Stairway to Heaven', 'Sweet Emotion', 'Fortunate Son']
# Tuples are NOT mutable:
>>> celebrity_tuple
('John', 'Wayne', 90210, 'Actor', 'Male', 'Dead')
>>> celebrity_tuple[5]
'Dead'
>>> celebrity_tuple[5]="Alive"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
Hashtables (Dictionaries) - As hashtables (dictionaries) require that its keys are hashable and therefore immutable, only tuples can act as dictionary keys, not lists.
#Lists CAN'T act as keys for hashtables(dictionaries)
>>> my_dict = {[a,b,c]:"some value"}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
#Tuples CAN act as keys for hashtables(dictionaries)
>>> my_dict = {("John","Wayne"): 90210}
>>> my_dict
{('John', 'Wayne'): 90210}
Differences - A posteriori, in usage
Homo vs. Heterogeneity of Elements - Generally, list objects are homogeneous and tuple objects are heterogeneous. That is, lists are used for objects/subjects of the same type (like all presidential candidates, or all songs, or all runners), whereas tuples are more for heterogeneous objects (although this is not enforced by the language).
Looping vs. Structures - Although both allow for looping (for x in my_list...), it only really makes sense to do it for a list. Tuples are more appropriate for structuring and presenting information ("%s %s residing in %s is an %s and presently %s" % ("John", "Wayne", 90210, "Actor", "Dead"))
It's been mentioned that the difference is largely semantic: people expect a tuple and list to represent different information. But this goes further than a guideline; some libraries actually behave differently based on what they are passed. Take NumPy for example (copied from another post where I ask for more examples):
>>> import numpy as np
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> idx = (1,1)
>>> a[idx]
4
>>> idx = [1,1]
>>> a[idx]
array([[3, 4, 5],
[3, 4, 5]])
The point is, while NumPy may not be part of the standard library, it's a major Python library, and within NumPy lists and tuples are completely different things.
Lists are for looping, tuples are for structures i.e. "%s %s" %tuple.
Lists are usually homogeneous, tuples are usually heterogeneous.
Lists are for variable length, tuples are for fixed length.
The values of a list can be changed at any time, but the values of a tuple can't be changed.
The advantages and disadvantages depend upon the use. If you have data which you never want to change, then you should use a tuple; otherwise, a list is the best option.
Difference between list and tuple
Tuples and lists are both seemingly similar sequence types in Python.
Literal syntax
We use parentheses () to construct tuples and square brackets [ ] to get a new list. Also, we can call the appropriate type to get the required structure (tuple or list).
someTuple = (4,6)
someList = [2,6]
Mutability
Tuples are immutable, while lists are mutable. This point is the basis for the following ones.
Memory usage
Due to mutability, you need more memory for lists and less memory for tuples.
Extending
You can add a new element to both tuples and lists with the only difference that the id of the tuple will be changed (i.e., we’ll have a new object).
Hashing
Tuples are hashable and lists are not. This means that a tuple can be used as a key in a dictionary, while a list cannot:
tup = (1,2)
list_ = [1,2]
c = {tup : 1} # ok
c = {list_ : 1} # error
Semantics
This point is more about best practice. You should use tuples as heterogeneous data structures, while lists are homogenous sequences.
Lists are intended to be homogeneous sequences, while tuples are heterogeneous data structures.
As people have already answered here, tuples are immutable while lists are mutable, but there is one important aspect of using tuples which we must remember:
If the tuple contains a list or a dictionary inside it, those can be changed even if the tuple itself is immutable.
For example, let's assume we have a tuple which contains a list and a dictionary as
my_tuple = (10,20,30,[40,50],{ 'a' : 10})
We can change the contents of the list:
my_tuple[3][0] = 400
my_tuple[3][1] = 500
which makes the tuple look like
(10, 20, 30, [400, 500], {'a': 10})
We can also change the dictionary inside the tuple:
my_tuple[4]['a'] = 500
which will make the overall tuple look like
(10, 20, 30, [400, 500], {'a': 500})
This happens because the list and the dictionary are themselves objects; the tuple only stores references to them, and those references do not change, even though the contents they point to do.
So the tuple remains immutable, without any exception.
PEP 484 -- Type Hints says that the types of the elements of a tuple can be individually typed, so that you can say Tuple[str, int, float]; but a list, with the List typing class, can take only one type parameter: List[str]. This hints that the real difference between the two is that the former is heterogeneous, whereas the latter is intrinsically homogeneous.
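A small sketch of those annotations (the variable names are placeholders; variable annotations like this require Python 3.6+):
from typing import List, Tuple

record: Tuple[str, int, float] = ("Alice", 30, 1.75)   # fixed structure, mixed types
names: List[str] = ["Alice", "Bob", "Carol"]           # any length, a single element type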
Also, the standard library mostly uses the tuple as a return value from such standard functions where the C would return a struct.
As people have already mentioned the differences, I will write about why tuples are preferred.
Why are tuples preferred?
Allocation optimization for small tuples
To reduce memory fragmentation and speed up allocations, Python reuses old tuples. If a tuple is no longer needed and has fewer than 20 items, instead of deleting it permanently Python moves it to a free list.
A free list is divided into 20 groups, where each group represents a
list of tuples of length n between 0 and 20. Each group can store up
to 2 000 tuples. The first (zero) group contains only 1 element and
represents an empty tuple.
>>> a = (1,2,3)
>>> id(a)
4427578104
>>> del a
>>> b = (1,2,4)
>>> id(b)
4427578104
In the example above we can see that a and b have the same id. That is
because we immediately occupied a destroyed tuple which was on the
free list.
Allocation optimization for lists
Since lists can be modified, Python does not use the same optimization as in tuples. However,
Python lists also have a free list, but it is used only for empty
objects. If an empty list is deleted or collected by GC, it can be
reused later.
>>> a = []
>>> id(a)
4465566792
>>> del a
>>> b = []
>>> id(b)
4465566792
Source: https://rushter.com/blog/python-lists-and-tuples/
Why are tuples more efficient than lists? -> https://stackoverflow.com/a/22140115
The most important difference is time! When you do not want to change the data, it is better to use a tuple. Here is an example of why:
import timeit
print(timeit.timeit(stmt='[1,2,3,4,5,6,7,8,9,10]', number=1000000)) #created list
print(timeit.timeit(stmt='(1,2,3,4,5,6,7,8,9,10)', number=1000000)) # created tuple
In this example we executed both statements 1 million times
Output :
0.136621
0.013722200000000018
Anyone can clearly notice the time difference.
A direct quotation from the documentation on 5.3. Tuples and Sequences:
Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of namedtuples). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.
In other words, TUPLES are used to store a group of elements whose contents/members will not change, while LISTS are used to store a group of elements whose members can change.
For instance, if I want to store the IP of my network in a variable, it's best I use a tuple, since the IP is fixed. Like this: my_ip = ('192.168.0.15', 33, 60). However, if I want to store a group of IPs of places I will visit in the next 6 months, then I should use a LIST, since I will keep updating it and adding new IPs to the group. Like this:
places_to_visit = [
    ('192.168.0.15', 33, 60),
    ('192.168.0.22', 34, 60),
    ('192.168.0.1', 34, 60),
    ('192.168.0.2', 34, 60),
    ('192.168.0.8', 34, 60),
    ('192.168.0.11', 34, 60)
]
First of all, they both are the non-scalar objects (also known as a compound objects) in Python.
Tuples, ordered sequence of elements (which can contain any object with no aliasing issue)
Immutable (tuple, int, float, str)
Concatenation using + (brand new tuple will be created of course)
Indexing
Slicing
Singleton: (3,) is a one-element tuple, whereas (3) is just the integer 3
List (Array in other languages), ordered sequence of values
Mutable
Singleton [3]
Cloning new_array = origin_array[:]
List comprehension [x**2 for x in range(1,7)] gives you
[1,4,9,16,25,36] (Not readable)
Using list may also cause an aliasing bug (two distinct paths
pointing to the same object).
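A quick sketch of that aliasing issue and how cloning with [:] avoids it (variable names are illustrative):
origin = [1, 2, 3]
alias = origin        # a second name for the same list object
clone = origin[:]     # an independent copy
alias.append(4)
print(origin)         # [1, 2, 3, 4] - mutated through the alias
print(clone)          # [1, 2, 3]    - unaffected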
Just a quick extension to list vs tuple responses:
Due to its dynamic nature, a list allocates more memory than is actually required. This is done to prevent a costly reallocation operation in case extra items are appended in the future.
On the other hand, being static and lightweight, a tuple does not reserve extra memory beyond what is required to store its elements.
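A rough way to observe that over-allocation (the exact byte counts are CPython implementation details and will vary by version and platform):
import sys

items = []
for i in range(10):
    items.append(i)
    print(len(items), sys.getsizeof(items))   # the size grows in jumps, not one slot at a time

print(sys.getsizeof(tuple(items)))            # the tuple is sized exactly for its 10 elements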
Lists are mutable and tuples are immutable.
Just consider this example.
a = ["1", "2", "ra", "sa"] #list
b = ("1", "2", "ra", "sa") #tuple
Now try to change an indexed value in the list and in the tuple.
a[2] = 1000
print(a)   # output: ['1', '2', 1000, 'sa']
b[2] = 1000   # TypeError: 'tuple' object does not support item assignment
This proves that the last line is invalid for the tuple: we attempted to update a tuple, which is not allowed.
Lists are mutable, whereas tuples are immutable. Accessing an element by index makes more sense in tuples than in lists, because the elements and their indices cannot be changed.
A list is mutable and a tuple is immutable. The main difference between mutable and immutable objects is memory usage when you are trying to append an item.
When you create a variable, some fixed memory is assigned to it. If it is a list, more memory is assigned than is actually used. E.g. if the current memory assignment is 100 bytes, when you want to append the 101st byte, maybe another 100 bytes will be assigned (200 bytes in total in this case).
However, if you know that you are not going to add new elements frequently, then you should use a tuple. A tuple is allocated exactly the memory it needs, and hence saves memory, especially when you use large blocks of memory.
Why aren't two zip objects (made using the same two iterables) equal?
list1 = [1, 2, 3, 4]
list2 = [5, 6, 7, 8]
a = zip(list1, list2)
b = zip(list1, list2)
print(a == b)
The above code is printing False.
Shouldn't it give True since the zip objects are similar?
The returned zip object is an iterator. To be able to test the value equality of it, potentially the entire iterator would need to be walked, which would exhaust it, so you'd never be able to use it after. And if only part of the iterator was walked during the test, it would be left in a "difficult to determine state", so it wouldn't be usable there either.
I'm not sure if zip is considered to be a "user-defined class", but note this line in the docs:
User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves)
In theory they could have made a workaround that would allow equality testing of a zip object (in cases where the object that was zipped wasn't itself an iterator), but there was likely no need. I can't think of a scenario where you would need to test such a thing, and where comparing the object forced as a list isn't an option.
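If you only care about the values produced, one option is to materialise both iterators and compare the results, keeping in mind that doing so consumes them:
list1 = [1, 2, 3, 4]
list2 = [5, 6, 7, 8]
a = zip(list1, list2)
b = zip(list1, list2)
print(list(a) == list(b))   # True - but a and b are now exhausted
print(list(a))              # [] - nothing left to iterate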
Simple problem.
I've got these lists:
alpha = [5,10,1,2]
beta = [1,5,2]
gamma = [5,2,87,100,1]
I thought I could sort them all with:
map(lambda x: list.sort(x),[alpha,beta,gamma])
Which doesn't work.
What is working though is:
a,b,c = map(lambda x: list.sort(x),[alpha,beta,gamma])
Can someone explain why I need to define a,b and c for this code to work?
Because map() is lazy (since Python 3). It returns a generator-like object that is only evaluated when its contents are asked for, for instance because you want to assign its individual elements to variables.
E.g., this also forces it to evaluate:
>>> list(map(lambda x: list.sort(x),[alpha,beta,gamma]))
But using map is a bit archaic, list comprehensions and generator comprehensions exist and are almost always more idiomatic. Also, list.sort(x) is a bit of an odd way to write x.sort() that may or may not work, avoid it.
[x.sort() for x in [alpha, beta, gamma]]
works as you expected.
But if you aren't interested in the result, building a list isn't relevant. What you really want is a simple for loop:
for x in [alpha, beta, gamma]:
x.sort()
Which is perfectly Pythonic, except that I maybe like this one even better in this fixed, simple case:
alpha.sort()
beta.sort()
gamma.sort()
can't get more explicit than that.
It is because list.sort() returns None; it sorts the list in place.
If you want to sort the lists and get the sorted versions back, do this:
a,b,c = (sorted(l) for l in [alpha, beta, gamma])
In [10]: alpha.sort()
In [11]: sorted(alpha)
Out[11]: [1, 2, 5, 10]
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
>>> 257 is 257
True # Yet the literal numbers compare properly
I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.
Based on the above, I can hypothesize that Python is internally implemented such that "small" integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don't know in advance whether they are numbers or not?
Take a look at this:
>>> a = 256
>>> b = 256
>>> id(a) == id(b)
True
>>> a = 257
>>> b = 257
>>> id(a) == id(b)
False
Here's what I found in the documentation for "Plain Integer Objects":
The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.
So, two names bound to 256 refer to the identical cached object, but two bound to 257 do not. This is a CPython implementation detail, and not guaranteed for other Python implementations.
Python's “is” operator behaves unexpectedly with integers?
In summary - let me emphasize: Do not use is to compare integers.
This isn't behavior you should have any expectations about.
Instead, use == and != to compare for equality and inequality, respectively. For example:
>>> a = 1000
>>> a == 1000 # Test integers like this,
True
>>> a != 5000 # or this!
True
>>> a is 1000 # Don't do this! - Don't use `is` to test integers!!
False
Explanation
To know this, you need to know the following.
First, what does is do? It is a comparison operator. From the documentation:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.
And so the following are equivalent.
>>> a is b
>>> id(a) == id(b)
From the documentation:
id
Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.
Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.
So what is the use-case for is? PEP8 describes:
Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
The Question
You ask, and state, the following question (with code):
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.
But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.
>>> 257 is 257
True # Yet the literal numbers compare properly
Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the reference implementation of Python, which is CPython. Good for CPython.
It might be even better if CPython could do this globally, if it could do so cheaply (as there would be a cost in the lookup); perhaps another implementation might.
But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.
What is does
is checks that the id of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:
>>> a is b
is the same as
>>> id(a) == id(b)
Why would we want to use is then?
This can be a very fast check relative to say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here's an example (will work in Python 2 and 3) e.g.
SENTINEL_SINGLETON = object() # this will only be created one time.
def foo(keyword_argument=None):
if keyword_argument is None:
print('no argument given to foo')
bar()
bar(keyword_argument)
bar('baz')
def bar(keyword_argument=SENTINEL_SINGLETON):
# SENTINEL_SINGLETON tells us if we were not passed anything
# as None is a legitimate potential argument we could get.
if keyword_argument is SENTINEL_SINGLETON:
print('no argument given to bar')
else:
print('argument to bar: {0}'.format(keyword_argument))
foo()
Which prints:
no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz
And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.
I'm late but, you want some source with your answer? I'll try and word this in an introductory manner so more folks can follow along.
A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.
In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
(My italics)
Don't know about you but I see this and think: Let's find that array!
If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.
PyLong_FromLong deals with long objects so it shouldn't be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:
PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations
    CHECK_SMALL_INT(ival);
    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }
    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v;
    }
Now, we're no C master-code-haxxorz but we're also not dumb, we can see that CHECK_SMALL_INT(ival); peeking at us all seductively; we can understand it has something to do with this. Let's check it out:
#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)
So it's a macro that calls function get_small_int if the value ival satisfies the condition:
if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)
So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif
So our condition is if (-5 <= ival && ival < 257) call get_small_int.
Next let's look at get_small_int in all its glory (well, we'll just look at its body because that's where the interesting things are):
PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
Okay, declare a PyObject, assert that the previous condition holds and execute the assignment:
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
small_ints looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've known all along!:
/* Small integers are preallocated in this array so that they
can be shared.
The integers that are preallocated are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
So yup, this is our guy. When you want to create a new int in the range [NSMALLNEGINTS, NSMALLPOSINTS) you'll just get back a reference to an already existing object that has been preallocated.
Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.
But, when are they allocated??
During initialization in _PyLong_Init Python will gladly enter in a for loop to do this for you:
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++, v++) {
Check out the source to read the loop body!
I hope my explanation has made you C things clearly now (pun obviously intended).
But, 257 is 257? What's up?
This is actually easier to explain, and I have attempted to do so already; it's due to the fact that Python will execute this interactive statement as a single block:
>>> 257 is 257
During compilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:
>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)
When CPython does the operation, it's now just going to load the exact same object:
>>> import dis
>>> dis.dis(codeObj)
  1           0 LOAD_CONST               0 (257)   # dis
              3 LOAD_CONST               0 (257)   # dis again
              6 COMPARE_OP               8 (is)
So is will return True.
It depends on whether you're looking to see if 2 things are equal, or the same object.
is checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency
In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144
You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__ and __ne__ methods.
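A minimal sketch of customising equality on your own class (the Point class here is invented for illustration):
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        # compare by value, not by identity
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

p, q = Point(1, 2), Point(1, 2)
print(p == q)   # True - equal values
print(p is q)   # False - distinct objects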
As you can check in the source file intobject.c, Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring to the cached small integer, not a new object. 257 is not a small integer, so it is created as a different object.
It is better to use == for that purpose.
I think your hypothesis is correct. Experiment with id (identity of object):
In [1]: id(255)
Out[1]: 146349024
In [2]: id(255)
Out[2]: 146349024
In [3]: id(257)
Out[3]: 146802752
In [4]: id(257)
Out[4]: 148993740
In [5]: a=255
In [6]: b=255
In [7]: c=257
In [8]: d=257
In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)
It appears that small numbers (those up to 256, in fact) are cached and reused, while anything above is treated differently!
There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example:
>>> a = ()
>>> b = ()
>>> a is b
True
But also, even non-pre-created values can be identical. Consider these examples:
>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True
And this isn't limited to int values:
>>> g, h = 42.23e100, 42.23e100
>>> g is h
True
Obviously, CPython doesn't come with a pre-created float value for 42.23e100. So, what's going on here?
The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren't merged. Since e and f are defined in the same statement, their values are merged.
You can see what's going on by disassembling the bytecode. Try defining a function that does e, f = 128, 128 and then calling dis.dis on it, and you'll see that there's a single constant value (128, 128)
>>> def f(): i, j = 128, 128
>>> dis.dis(f)
  1           0 LOAD_CONST               2 ((128, 128))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (i)
              6 STORE_FAST               1 (j)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> id(f.__code__.co_consts[1]), id(f.__code__.co_consts[2][0]), id(f.__code__.co_consts[2][1])
(4305296480, 4305296480, 4305296480)
You may notice that the compiler has stored 128 as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. Which means that (non-empty) tuples actually don't end up merged:
>>> k, l = (1, 2), (1, 2)
>>> k is l
False
Put that in a function, dis it, and look at the co_consts—there's a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.
There's one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn't restricted to source code literals:
>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True
On the other hand, it is limited to the str type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.
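If you actually need two equal strings to be the same object, you can request interning explicitly rather than relying on these rules; sys.intern is the documented way to do that in Python 3:
>>> import sys
>>> m = sys.intern('a string with spaces is not auto-interned')
>>> n = sys.intern('a string with spaces is not auto-interned')
>>> m is n
True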
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rule is:
Do not write code that assumes two equal but separately-created immutable values are identical (don't use x is y, use x == y)
Do not write code that assumes two equal but separately-created immutable values are distinct (don't use x is not y, use x != y)
Or, in other words, only use is to test for the documented singletons (like None) or that are only created in one place in the code (like the _sentinel = object() idiom).
For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.
is is the identity equality operator (functioning like id(a) == id(b)); it's just that two equal numbers aren't necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).
PHP's === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y) as per Paulo Freitas' comment. This will suffice for common numbers, but differ from is for classes that define __eq__ in an absurd manner:
class Unequal:
def __eq__(self, other):
return False
PHP apparently allows the same thing for "built-in" classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it's used as a number. Quite why you'd want to emulate Visual Basic's Now instead of showing that it is an evaluation with time.time() I don't know.
Greg Hewgill (OP) made one clarifying comment "My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value."
This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.
We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built in number classes have a base class numbers.Number, but it has the same problem:
import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)
By the way, NumPy will produce separate instances of low numbers.
I don't actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it's certainly not portable. We'll just have to be less particular about what we test for now.
In the end, this issue stems from Python not originally having a type tree with predicates like Scheme's number?, or Haskell's type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).
It also happens with strings:
>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
Now everything seems fine.
>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
That's expected too.
>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)
>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)
Now that's unexpected.
What’s New In Python 3.8: Changes in Python behavior:
The compiler now produces a SyntaxWarning when identity checks (is and
is not) are used with certain types of literals (e.g. strings, ints).
These can often work by accident in CPython, but are not guaranteed by
the language spec. The warning advises users to use equality tests (==
and !=) instead.