zip objects created using the same iterables not equal. Why? - python-3.x

Why two zip objects (made using same two iterables) aren't equal?
list1 = [1, 2, 3, 4]
list2 = [5, 6, 7, 8]
a = zip(list1, list2)
b = zip(list1, list2)
print(a == b)
The above code is printing False.
Shouldn't it give `True' since the zip objects are similar?

The returned zip object is an iterator. To be able to test the value equality of it, potentially the entire iterator would need to be walked, which would exhaust it, so you'd never be able to use it after. And if only part of the iterator was walked during the test, it would be left in a "difficult to determine state", so it wouldn't be usable there either.
I'm not sure if zip is considered to be a "user-defined class", but note this line in the docs:
User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves)
In theory they could have made a workaround that would allow equality testing of a zip object (in cases where the object that was zipped wasn't itself an iterator), but there was likely no need. I can't think of a scenario where you would need to test such a thing, and where comparing the object forced as a list isn't an option.

Related

adding two lists works but using append() retuns none [duplicate]

I've noticed that many operations on lists that modify the list's contents will return None, rather than returning the list itself. Examples:
>>> mylist = ['a', 'b', 'c']
>>> empty = mylist.clear()
>>> restored = mylist.extend(range(3))
>>> backwards = mylist.reverse()
>>> with_four = mylist.append(4)
>>> in_order = mylist.sort()
>>> without_one = mylist.remove(1)
>>> mylist
[0, 2, 4]
>>> [empty, restored, backwards, with_four, in_order, without_one]
[None, None, None, None, None, None]
What is the thought process behind this decision?
To me, it seems hampering, since it prevents "chaining" of list processing (e.g. mylist.reverse().append('a string')[:someLimit]). I imagine it might be that "The Powers That Be" decided that list comprehension is a better paradigm (a valid opinion), and so didn't want to encourage other methods - but it seems perverse to prevent an intuitive method, even if better alternatives exist.
This question is specifically about Python's design decision to return None from mutating list methods like .append. Novices often write incorrect code that expects .append (in particular) to return the same list that was just modified.
For the simple question of "how do I append to a list?" (or debugging questions that boil down to that problem), see Why does "x = x.append([i])" not work in a for loop?.
To get modified versions of the list, see:
For .sort: How can I get a sorted copy of a list?
For .reverse: How can I get a reversed copy of a list (avoid a separate statement when chaining a method after .reverse)?
The same issue applies to some methods of other built-in data types, e.g. set.discard (see How to remove specific element from sets inside a list using list comprehension) and dict.update (see Why doesn't a python dict.update() return the object?).
The same reasoning applies to designing your own APIs. See Is making in-place operations return the object a bad idea?.
The general design principle in Python is for functions that mutate an object in-place to return None. I'm not sure it would have been the design choice I'd have chosen, but it's basically to emphasise that a new object is not returned.
Guido van Rossum (our Python BDFL) states the design choice on the Python-Dev mailing list:
I'd like to explain once more why I'm so adamant that sort() shouldn't
return 'self'.
This comes from a coding style (popular in various other languages, I
believe especially Lisp revels in it) where a series of side effects
on a single object can be chained like this:
x.compress().chop(y).sort(z)
which would be the same as
x.compress()
x.chop(y)
x.sort(z)
I find the chaining form a threat to readability; it requires that the
reader must be intimately familiar with each of the methods. The
second form makes it clear that each of these calls acts on the same
object, and so even if you don't know the class and its methods very
well, you can understand that the second and third call are applied to
x (and that all calls are made for their side-effects), and not to
something else.
I'd like to reserve chaining for operations that return new values,
like string processing operations:
y = x.rstrip("\n").split(":").lower()
There are a few standard library modules that encourage chaining of
side-effect calls (pstat comes to mind). There shouldn't be any new
ones; pstat slipped through my filter when it was weak.
I can't speak for the developers, but I find this behavior very intuitive.
If a method works on the original object and modifies it in-place, it doesn't return anything, because there is no new information - you obviously already have a reference to the (now mutated) object, so why return it again?
If, however, a method or function creates a new object, then of course it has to return it.
So l.reverse() returns nothing (because now the list has been reversed, but the identfier l still points to that list), but reversed(l) has to return the newly generated list because l still points to the old, unmodified list.
EDIT: I just learned from another answer that this principle is called Command-Query separation.
One could argue that the signature itself makes it clear that the function mutates the list rather than returning a new one: if the function returned a list, its behavior would have been much less obvious.
If you were sent here after asking for help fixing your code:
In the future, please try to look for problems in the code yourself, by carefully studying what happens when the code runs. Rather than giving up because there is an error message, check the result of each calculation, and see where the code starts working differently from what you expect.
If you had code calling a method like .append or .sort on a list, you will notice that the return value is None, while the list is modified in place. Study the example carefully:
>>> x = ['e', 'x', 'a', 'm', 'p', 'l', 'e']
>>> y = x.sort()
>>> print(y)
None
>>> print(x)
['a', 'e', 'e', 'l', 'm', 'p', 'x']
y got the special None value, because that is what was returned. x changed, because the sort happened in place.
It works this way on purpose, so that code like x.sort().reverse() breaks. See the other answers to understand why the Python developers wanted it that way.
To fix the problem
First, think carefully about the intent of the code. Should x change? Do we actually need a separate y?
Let's consider .sort first. If x should change, then call x.sort() by itself, without assigning the result anywhere.
If a sorted copy is needed instead, use y = x.sorted(). See How can I get a sorted copy of a list? for details.
For other methods, we can get modified copies like so:
.clear -> there is no point to this; a "cleared copy" of the list is just an empty list. Just use y = [].
.append and .extend -> probably the simplest way is to use the + operator. To add multiple elements from a list l, use y = x + l rather than .extend. To add a single element e wrap it in a list first: y = x + [e]. Another way in 3.5 and up is to use unpacking: y = [*x, *l] for .extend, y = [*x, e] for .append. See also How to allow list append() method to return the new list for .append and How do I concatenate two lists in Python? for .extend.
.reverse -> First, consider whether an actual copy is needed. The built-in reversed gives you an iterator that can be used to loop over the elements in reverse order. To make an actual copy, simply pass that iterator to list: y = list(reversed(x)). See How can I get a reversed copy of a list (avoid a separate statement when chaining a method after .reverse)? for details.
.remove -> Figure out the index of the element that will be removed (using .index), then use slicing to find the elements before and after that point and put them together. As a function:
def without(a_list, value):
index = a_list.index(value)
return a_list[:index] + a_list[index+1:]
(We can translate .pop similarly to make a modified copy, though of course .pop actually returns an element from the list.)
See also A quick way to return list without a specific element in Python.
(If you plan to remove multiple elements, strongly consider using a list comprehension (or filter) instead. It will be much simpler than any of the workarounds needed for removing items from the list while iterating over it. This way also naturally gives a modified copy.)
For any of the above, of course, we can also make a modified copy by explicitly making a copy and then using the in-place method on the copy. The most elegant approach will depend on the context and on personal taste.
As we know list in python is a mutable object and one of characteristics of mutable object is the ability to modify the state of this object without the need to assign its new state to a variable. we should demonstrate more about this topic to understand the root of this issue.
An object whose internal state can be changed is mutable. On the other hand, immutable doesn’t allow any change in the object once it has been created. Object mutability is one of the characteristics that makes Python a dynamically typed language.
Every object in python has three attributes:
Identity – This refers to the address that the object refers to in the computer’s memory.
Type – This refers to the kind of object that is created. For example integer, list, string etc.
Value – This refers to the value stored by the object. For example str = "a".
While ID and Type cannot be changed once it’s created, values can be changed for Mutable objects.
let us discuss the below code step-by-step to depict what it means in Python:
Creating a list which contains name of cities
cities = ['London', 'New York', 'Chicago']
Printing the location of the object created in the memory address in hexadecimal format
print(hex(id(cities)))
Output [1]: 0x1691d7de8c8
Adding a new city to the list cities
cities.append('Delhi')
Printing the elements from the list cities, separated by a comma
for city in cities:
print(city, end=', ')
Output [2]: London, New York, Chicago, Delhi
Printing the location of the object created in the memory address in hexadecimal format
print(hex(id(cities)))
Output [3]: 0x1691d7de8c8
The above example shows us that we were able to change the internal state of the object cities by adding one more city 'Delhi' to it, yet, the memory address of the object did not change. This confirms that we did not create a new object, rather, the same object was changed or mutated. Hence, we can say that the object which is a type of list with reference variable name cities is a MUTABLE OBJECT.
While the immutable object internal state can not be changed. For instance, consider the below code and associated error message with it, while trying to change the value of a Tuple at index 0
Creating a Tuple with variable name foo
foo = (1, 2)
Changing the index 0 value from 1 to 3
foo[0] = 3
TypeError: 'tuple' object does not support item assignment
We can conclude from the examples why mutable object shouldn't return anything when executing operations on it because it's modifying the internal state of the object directly and there is no point in returning new modified object. unlike immutable object which should return new object of the modified state after executing operations on it.
First of All, I should tell that what I am suggesting is without a doubt, a bad programming practice but if you want to use append in lambda function and you don't care about the code readability, there is way to just do that.
Imagine you have a list of lists and you want to append a element to each inner lists using map and lambda. here is how you can do that:
my_list = [[1, 2, 3, 4],
[3, 2, 1],
[1, 1, 1]]
my_new_element = 10
new_list = list(map(lambda x: [x.append(my_new_element), x][1], my_list))
print(new_list)
How it works:
when lambda wants to calculate to output, first it should calculate the [x.append(my_new_element), x] expression. To calculate this expression the append function will run and the result of expression will be [None, x] and by specifying that you want the second element of the list the result of [None,x][1] will be x
Using custom function is more readable and the better option:
def append_my_list(input_list, new_element):
input_list.append(new_element)
return input_list
my_list = [[1, 2, 3, 4],
[3, 2, 1],
[1, 1, 1]]
my_new_element = 10
new_list = list(map(lambda x: append_my_list(x, my_new_element), my_list))
print(new_list)

Difference between tuples and lists [duplicate]

What's the difference between tuples/lists and what are their advantages/disadvantages?
Apart from tuples being immutable there is also a semantic distinction that should guide their usage. Tuples are heterogeneous data structures (i.e., their entries have different meanings), while lists are homogeneous sequences. Tuples have structure, lists have order.
Using this distinction makes code more explicit and understandable.
One example would be pairs of page and line number to reference locations in a book, e.g.:
my_location = (42, 11) # page number, line number
You can then use this as a key in a dictionary to store notes on locations. A list on the other hand could be used to store multiple locations. Naturally one might want to add or remove locations from the list, so it makes sense that lists are mutable. On the other hand it doesn't make sense to add or remove items from an existing location - hence tuples are immutable.
There might be situations where you want to change items within an existing location tuple, for example when iterating through the lines of a page. But tuple immutability forces you to create a new location tuple for each new value. This seems inconvenient on the face of it, but using immutable data like this is a cornerstone of value types and functional programming techniques, which can have substantial advantages.
There are some interesting articles on this issue, e.g. "Python Tuples are Not Just Constant Lists" or "Understanding tuples vs. lists in Python". The official Python documentation also mentions this
"Tuples are immutable, and usually contain an heterogeneous sequence ...".
In a statically typed language like Haskell the values in a tuple generally have different types and the length of the tuple must be fixed. In a list the values all have the same type and the length is not fixed. So the difference is very obvious.
Finally there is the namedtuple in Python, which makes sense because a tuple is already supposed to have structure. This underlines the idea that tuples are a light-weight alternative to classes and instances.
Difference between list and tuple
Literal
someTuple = (1,2)
someList = [1,2]
Size
a = tuple(range(1000))
b = list(range(1000))
a.__sizeof__() # 8024
b.__sizeof__() # 9088
Due to the smaller size of a tuple operation, it becomes a bit faster, but not that much to mention about until you have a huge number of elements.
Permitted operations
b = [1,2]
b[0] = 3 # [3, 2]
a = (1,2)
a[0] = 3 # Error
That also means that you can't delete an element or sort a tuple.
However, you could add a new element to both list and tuple with the only difference that since the tuple is immutable, you are not really adding an element but you are creating a new tuple, so the id of will change
a = (1,2)
b = [1,2]
id(a) # 140230916716520
id(b) # 748527696
a += (3,) # (1, 2, 3)
b += [3] # [1, 2, 3]
id(a) # 140230916878160
id(b) # 748527696
Usage
As a list is mutable, it can't be used as a key in a dictionary, whereas a tuple can be used.
a = (1,2)
b = [1,2]
c = {a: 1} # OK
c = {b: 1} # Error
If you went for a walk, you could note your coordinates at any instant in an (x,y) tuple.
If you wanted to record your journey, you could append your location every few seconds to a list.
But you couldn't do it the other way around.
The key difference is that tuples are immutable. This means that you cannot change the values in a tuple once you have created it.
So if you're going to need to change the values use a List.
Benefits to tuples:
Slight performance improvement.
As a tuple is immutable it can be used as a key in a dictionary.
If you can't change it neither can anyone else, which is to say you don't need to worry about any API functions etc. changing your tuple without being asked.
Lists are mutable; tuples are not.
From docs.python.org/2/tutorial/datastructures.html
Tuples are immutable, and usually contain an heterogeneous sequence of
elements that are accessed via unpacking (see later in this section)
or indexing (or even by attribute in the case of namedtuples). Lists
are mutable, and their elements are usually homogeneous and are
accessed by iterating over the list.
This is an example of Python lists:
my_list = [0,1,2,3,4]
top_rock_list = ["Bohemian Rhapsody","Kashmir","Sweet Emotion", "Fortunate Son"]
This is an example of Python tuple:
my_tuple = (a,b,c,d,e)
celebrity_tuple = ("John", "Wayne", 90210, "Actor", "Male", "Dead")
Python lists and tuples are similar in that they both are ordered collections of values. Besides the shallow difference that lists are created using brackets "[ ... , ... ]" and tuples using parentheses "( ... , ... )", the core technical "hard coded in Python syntax" difference between them is that the elements of a particular tuple are immutable whereas lists are mutable (...so only tuples are hashable and can be used as dictionary/hash keys!). This gives rise to differences in how they can or can't be used (enforced a priori by syntax) and differences in how people choose to use them (encouraged as 'best practices,' a posteriori, this is what smart programers do). The main difference a posteriori in differentiating when tuples are used versus when lists are used lies in what meaning people give to the order of elements.
For tuples, 'order' signifies nothing more than just a specific 'structure' for holding information. What values are found in the first field can easily be switched into the second field as each provides values across two different dimensions or scales. They provide answers to different types of questions and are typically of the form: for a given object/subject, what are its attributes? The object/subject stays constant, the attributes differ.
For lists, 'order' signifies a sequence or a directionality. The second element MUST come after the first element because it's positioned in the 2nd place based on a particular and common scale or dimension. The elements are taken as a whole and mostly provide answers to a single question typically of the form, for a given attribute, how do these objects/subjects compare? The attribute stays constant, the object/subject differs.
There are countless examples of people in popular culture and programmers who don't conform to these differences and there are countless people who might use a salad fork for their main course. At the end of the day, it's fine and both can usually get the job done.
To summarize some of the finer details
Similarities:
Duplicates - Both tuples and lists allow for duplicates
Indexing, Selecting, & Slicing - Both tuples and lists index using integer values found within brackets. So, if you want the first 3 values of a given list or tuple, the syntax would be the same:
>>> my_list[0:3]
[0,1,2]
>>> my_tuple[0:3]
[a,b,c]
Comparing & Sorting - Two tuples or two lists are both compared by their first element, and if there is a tie, then by the second element, and so on. No further attention is paid to subsequent elements after earlier elements show a difference.
>>> [0,2,0,0,0,0]>[0,0,0,0,0,500]
True
>>> (0,2,0,0,0,0)>(0,0,0,0,0,500)
True
Differences: - A priori, by definition
Syntax - Lists use [], tuples use ()
Mutability - Elements in a given list are mutable, elements in a given tuple are NOT mutable.
# Lists are mutable:
>>> top_rock_list
['Bohemian Rhapsody', 'Kashmir', 'Sweet Emotion', 'Fortunate Son']
>>> top_rock_list[1]
'Kashmir'
>>> top_rock_list[1] = "Stairway to Heaven"
>>> top_rock_list
['Bohemian Rhapsody', 'Stairway to Heaven', 'Sweet Emotion', 'Fortunate Son']
# Tuples are NOT mutable:
>>> celebrity_tuple
('John', 'Wayne', 90210, 'Actor', 'Male', 'Dead')
>>> celebrity_tuple[5]
'Dead'
>>> celebrity_tuple[5]="Alive"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
Hashtables (Dictionaries) - As hashtables (dictionaries) require that its keys are hashable and therefore immutable, only tuples can act as dictionary keys, not lists.
#Lists CAN'T act as keys for hashtables(dictionaries)
>>> my_dict = {[a,b,c]:"some value"}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
#Tuples CAN act as keys for hashtables(dictionaries)
>>> my_dict = {("John","Wayne"): 90210}
>>> my_dict
{('John', 'Wayne'): 90210}
Differences - A posteriori, in usage
Homo vs. Heterogeneity of Elements - Generally list objects are homogenous and tuple objects are heterogeneous. That is, lists are used for objects/subjects of the same type (like all presidential candidates, or all songs, or all runners) whereas although it's not forced by), whereas tuples are more for heterogenous objects.
Looping vs. Structures - Although both allow for looping (for x in my_list...), it only really makes sense to do it for a list. Tuples are more appropriate for structuring and presenting information (%s %s residing in %s is an %s and presently %s % ("John","Wayne",90210, "Actor","Dead"))
It's been mentioned that the difference is largely semantic: people expect a tuple and list to represent different information. But this goes further than a guideline; some libraries actually behave differently based on what they are passed. Take NumPy for example (copied from another post where I ask for more examples):
>>> import numpy as np
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> idx = (1,1)
>>> a[idx]
4
>>> idx = [1,1]
>>> a[idx]
array([[3, 4, 5],
[3, 4, 5]])
The point is, while NumPy may not be part of the standard library, it's a major Python library, and within NumPy lists and tuples are completely different things.
Lists are for looping, tuples are for structures i.e. "%s %s" %tuple.
Lists are usually homogeneous, tuples are usually heterogeneous.
Lists are for variable length, tuples are for fixed length.
The values of list can be changed any time but the values of tuples can't be change.
The advantages and disadvantages depends upon the use. If you have such a data which you never want to change then you should have to use tuple, otherwise list is the best option.
Difference between list and tuple
Tuples and lists are both seemingly similar sequence types in Python.
Literal syntax
We use parenthesis () to construct tuples and square brackets [ ] to get a new list. Also, we can use call of the appropriate type to get required structure — tuple or list.
someTuple = (4,6)
someList = [2,6]
Mutability
Tuples are immutable, while lists are mutable. This point is the base the for the following ones.
Memory usage
Due to mutability, you need more memory for lists and less memory for tuples.
Extending
You can add a new element to both tuples and lists with the only difference that the id of the tuple will be changed (i.e., we’ll have a new object).
Hashing
Tuples are hashable and lists are not. It means that you can use a tuple as a key in a dictionary. The list can't be used as a key in a dictionary, whereas a tuple can be used
tup = (1,2)
list_ = [1,2]
c = {tup : 1} # ok
c = {list_ : 1} # error
Semantics
This point is more about best practice. You should use tuples as heterogeneous data structures, while lists are homogenous sequences.
Lists are intended to be homogeneous sequences, while tuples are heterogeneous data structures.
As people have already answered here that tuples are immutable while lists are mutable, but there is one important aspect of using tuples which we must remember
If the tuple contains a list or a dictionary inside it, those can be changed even if the tuple itself is immutable.
For example, let's assume we have a tuple which contains a list and a dictionary as
my_tuple = (10,20,30,[40,50],{ 'a' : 10})
we can change the contents of the list as
my_tuple[3][0] = 400
my_tuple[3][1] = 500
which makes new tuple looks like
(10, 20, 30, [400, 500], {'a': 10})
we can also change the dictionary inside tuple as
my_tuple[4]['a'] = 500
which will make the overall tuple looks like
(10, 20, 30, [400, 500], {'a': 500})
This happens because list and dictionary are the objects and these objects are not changing, but the contents its pointing to.
So the tuple remains immutable without any exception
The PEP 484 -- Type Hints says that the types of elements of a tuple can be individually typed; so that you can say Tuple[str, int, float]; but a list, with List typing class can take only one type parameter: List[str], which hints that the difference of the 2 really is that the former is heterogeneous, whereas the latter intrinsically homogeneous.
Also, the standard library mostly uses the tuple as a return value from such standard functions where the C would return a struct.
As people have already mentioned the differences I will write about why tuples.
Why tuples are preferred?
Allocation optimization for small tuples
To reduce memory fragmentation and speed up allocations, Python reuses old tuples. If a
tuple no longer needed and has less than 20 items instead of deleting
it permanently Python moves it to a free list.
A free list is divided into 20 groups, where each group represents a
list of tuples of length n between 0 and 20. Each group can store up
to 2 000 tuples. The first (zero) group contains only 1 element and
represents an empty tuple.
>>> a = (1,2,3)
>>> id(a)
4427578104
>>> del a
>>> b = (1,2,4)
>>> id(b)
4427578104
In the example above we can see that a and b have the same id. That is
because we immediately occupied a destroyed tuple which was on the
free list.
Allocation optimization for lists
Since lists can be modified, Python does not use the same optimization as in tuples. However,
Python lists also have a free list, but it is used only for empty
objects. If an empty list is deleted or collected by GC, it can be
reused later.
>>> a = []
>>> id(a)
4465566792
>>> del a
>>> b = []
>>> id(b)
4465566792
Source: https://rushter.com/blog/python-lists-and-tuples/
Why tuples are efficient than lists? -> https://stackoverflow.com/a/22140115
The most important difference is time ! When you do not want to change the data inside the list better to use tuple ! Here is the example why use tuple !
import timeit
print(timeit.timeit(stmt='[1,2,3,4,5,6,7,8,9,10]', number=1000000)) #created list
print(timeit.timeit(stmt='(1,2,3,4,5,6,7,8,9,10)', number=1000000)) # created tuple
In this example we executed both statements 1 million times
Output :
0.136621
0.013722200000000018
Any one can clearly notice the time difference.
A direction quotation from the documentation on 5.3. Tuples and Sequences:
Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of namedtuples). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.
In other words, TUPLES are used to store group of elements where the contents/members of the group would not change while LISTS are used to store group of elements where the members of the group can change.
For instance, if i want to store IP of my network in a variable, it's best i used a tuple since the the IP is fixed. Like this my_ip = ('192.168.0.15', 33, 60). However, if I want to store group of IPs of places I would visit in the next 6 month, then I should use a LIST, since I will keep updating and adding new IP to the group. Like this
places_to_visit = [
('192.168.0.15', 33, 60),
('192.168.0.22', 34, 60),
('192.168.0.1', 34, 60),
('192.168.0.2', 34, 60),
('192.168.0.8', 34, 60),
('192.168.0.11', 34, 60)
]
First of all, they both are the non-scalar objects (also known as a compound objects) in Python.
Tuples, ordered sequence of elements (which can contain any object with no aliasing issue)
Immutable (tuple, int, float, str)
Concatenation using + (brand new tuple will be created of course)
Indexing
Slicing
Singleton (3,) # -> (3) instead of (3) # -> 3
List (Array in other languages), ordered sequence of values
Mutable
Singleton [3]
Cloning new_array = origin_array[:]
List comprehension [x**2 for x in range(1,7)] gives you
[1,4,9,16,25,36] (Not readable)
Using list may also cause an aliasing bug (two distinct paths
pointing to the same object).
Just a quick extension to list vs tuple responses:
Due to dynamic nature, list allocates more bit buckets than the actual memory required. This is done to prevent costly reallocation operation in case extra items are appended in the future.
On the other hand, being static, lightweight tuple object does not reserve extra memory required to store them.
Lists are mutable and tuples are immutable.
Just consider this example.
a = ["1", "2", "ra", "sa"] #list
b = ("1", "2", "ra", "sa") #tuple
Now change index values of list and tuple.
a[2] = 1000
print a #output : ['1', '2', 1000, 'sa']
b[2] = 1000
print b #output : TypeError: 'tuple' object does not support item assignment.
Hence proved the following code is invalid with tuple, because we attempted to update a tuple, which is not allowed.
Lists are mutable. whereas tuples are immutable. Accessing an offset element with index makes more sense in tuples than lists, Because the elements and their index cannot be changed.
List is mutable and tuples is immutable. The main difference between mutable and immutable is memory usage when you are trying to append an item.
When you create a variable, some fixed memory is assigned to the variable. If it is a list, more memory is assigned than actually used. E.g. if current memory assignment is 100 bytes, when you want to append the 101th byte, maybe another 100 bytes will be assigned (in total 200 bytes in this case).
However, if you know that you are not frequently add new elements, then you should use tuples. Tuples assigns exactly size of the memory needed, and hence saves memory, especially when you use large blocks of memory.

What kind of comprehension is this?

One of my students found that, for ell (a list of string) and estr (a string), the following expression is True iff a member of ell is contained in estr:
any(t in estr for t in ell)
Can anyone explain why this is syntactically legal, and if so what does the comprehension generate?
This is a generator expression.
func_that_takes_any_iterable(i for i in iterable)
It's like a list comprehension, but a generator, meaning it only produces one element at a time:
>>> a = [i for i in range(10)]
>>> b = (i for i in range(10))
>>> print(a)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> print(b)
<generator object <genexpr> at 0x7fb9113fae40>
>>> print(list(b))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> print(list(b))
[]
When using a generator expression in isolation, the parentheses are required for syntax reasons. When creating one as an argument to a function, the syntax allows for the extra parentheses to be absent.
Most functions don't care what type of iterable they're given - list, tuple, dict, generator, etc - and generators are perfectly valid. They're also marginally more memory-efficient than list comprehensions, since they don't generate the entire thing up front. For all() and any(), this is especially good, since those methods short-circuit as soon as they would return False and True respectively.
As of python 3.8, the syntactic restrictions for the walrus operator := are similar - in isolation, it must be used inside its own set of parentheses, but inside another expression it can generally be used without them.
This is syntactically legal because it's not a list. Anything of the form (x for x in array) is a generator. Think of it like lazy list, which will generate answer only when you ask.
Now generator are also an iterable so it's perfectly fine for it to be inside an any() function.
willBeTrue = any(True for i in range(20))
So this will basically generate 20 Trues and any function looks for any true value; thus will return True.
Now coming to your expression:
ans = any(t in estr for t in ell)
This {t in estr} returns a boolean value. Now generator makes len(ell) amount of those boolean value and any thinks good.. since atleast one is true I return True.

List defined in __init__ seems to be global [duplicate]

Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
def foo(a=[]):
a.append(5)
return a
Python novices would expect this function called with no parameter to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):
>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)
Edit:
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:
>>> def a():
... print("a executed")
... return []
...
>>>
>>> def b(x=a()):
... x.append(5)
... print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function, or "together" with it?
Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
Actually, this is not a design flaw, and it is not because of internals or performance. It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.
As soon as you think of it this way, then it completely makes sense: a function is an object being evaluated on its definition; default parameters are kind of "member data" and therefore their state may change from one call to the other - exactly as in any other object.
In any case, the effbot (Fredrik Lundh) has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.
Suppose you have the following code
fruits = ("apples", "bananas", "loganberries")
def eat(food=fruits):
...
When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, that it will be equal to the tuple ("apples", "bananas", "loganberries")
However, suppose later on in the code, I do something like
def some_random_function():
global fruits
fruits = ("blueberries", "mangos")
then if default parameters were bound at function execution rather than function declaration, I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your foo function above was mutating the list.
The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose in Java I have the following code:
StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) ); // does this work?
Now, does my map use the value of the StringBuffer key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of the Map using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map (this is actually why Python doesn't allow its mutable built-in data types to be used as dictionary keys).
Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, then that would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they're writing.
I personally like Python's current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.
The relevant part of the documentation:
Default parameter values are evaluated from left to right when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function, e.g.:
def whats_on_the_telly(penguin=None):
if penguin is None:
penguin = []
penguin.append("property of the zoo")
return penguin
I know nothing about the Python interpreter inner workings (and I'm not an expert in compilers and interpreters either) so don't blame me if I propose anything unsensible or impossible.
Provided that python objects are mutable I think that this should be taken into account when designing the default arguments stuff.
When you instantiate a list:
a = []
you expect to get a new list referenced by a.
Why should the a=[] in
def x(a=[]):
instantiate a new list on function definition and not on invocation?
It's just like you're asking "if the user doesn't provide the argument then instantiate a new list and use it as if it was produced by the caller".
I think this is ambiguous instead:
def x(a=datetime.datetime.now()):
user, do you want a to default to the datetime corresponding to when you're defining or executing x?
In this case, as in the previous one, I'll keep the same behaviour as if the default argument "assignment" was the first instruction of the function (datetime.now() called on function invocation).
On the other hand, if the user wanted the definition-time mapping he could write:
b = datetime.datetime.now()
def x(a=b):
I know, I know: that's a closure. Alternatively Python might provide a keyword to force definition-time binding:
def x(static a=b):
Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well... when the functions is defined.
Compare this:
class BananaBunch:
bananas = []
def addBanana(self, banana):
self.bananas.append(banana)
This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it's added to all instances of that class. The reason is exactly the same.
It's just "How It Works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.
Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll grok python much better.
That said it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.
Why don't you introspect?
I'm really surprised no one has performed the insightful introspection offered by Python (2 and 3 apply) on callables.
Given a simple little function func defined as:
>>> def func(a = []):
... a.append(5)
When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. While this compilation step is done, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the top answer mentioned: the list a can now be considered a member of the function func.
So, let's do some introspection, a before and after to examine how the list gets expanded inside the function object. I'm using Python 3.x for this, for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).
Function Before Execution:
>>> def func(a = []):
... a.append(5)
...
After Python executes this definition it will take any default parameters specified (a = [] here) and cram them in the __defaults__ attribute for the function object (relevant section: Callables):
>>> func.__defaults__
([],)
O.k, so an empty list as the single entry in __defaults__, just as expected.
Function After Execution:
Let's now execute this function:
>>> func()
Now, let's see those __defaults__ again:
>>> func.__defaults__
([5],)
Astonished? The value inside the object changes! Consecutive calls to the function will now simply append to that embedded list object:
>>> func(); func(); func()
>>> func.__defaults__
([5, 5, 5, 5],)
So, there you have it, the reason why this 'flaw' happens, is because default arguments are part of the function object. There's nothing weird going on here, it's all just a bit surprising.
The common solution to combat this is to use None as the default and then initialize in the function body:
def func(a = None):
# or: a = [] if a is None else a
if a is None:
a = []
Since the function body is executed anew each time, you always get a fresh new empty list if no argument was passed for a.
To further verify that the list in __defaults__ is the same as that used in the function func you can just change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__) and you'll see how these are indeed refering to the same list instance:
>>> def func(a = []):
... a.append(5)
... return id(a)
>>>
>>> id(func.__defaults__[0]) == func()
True
All with the power of introspection!
* To verify that Python evaluates the default arguments during compilation of the function, try executing the following:
def bar(a=input('Did you just see me without calling the function?')):
pass # use raw_input in Py2
as you'll notice, input() is called before the process of building the function and binding it to the name bar is made.
I used to think that creating the objects at runtime would be the better approach. I'm less certain now, since you do lose some useful features, though it may be worth it regardless simply to prevent newbie confusion. The disadvantages of doing so are:
1. Performance
def foo(arg=something_expensive_to_compute())):
...
If call-time evaluation is used, then the expensive function is called every time your function is used without an argument. You'd either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.
2. Forcing bound parameters
A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:
funcs = [ lambda i=i: i for i in range(10)]
This returns a list of functions that return 0,1,2,3... respectively. If the behaviour is changed, they will instead bind i to the call-time value of i, so you would get a list of functions that all returned 9.
The only way to implement this otherwise would be to create a further closure with the i bound, ie:
def make_func(i): return lambda: i
funcs = [make_func(i) for i in range(10)]
3. Introspection
Consider the code:
def foo(a='test', b=100, c=[]):
print a,b,c
We can get information about the arguments and defaults using the inspect module, which
>>> inspect.getargspec(foo)
(['a', 'b', 'c'], None, None, ('test', 100, []))
This information is very useful for things like document generation, metaprogramming, decorators etc.
Now, suppose the behaviour of defaults could be changed so that this is the equivalent of:
_undefined = object() # sentinel value
def foo(a=_undefined, b=_undefined, c=_undefined)
if a is _undefined: a='test'
if b is _undefined: b=100
if c is _undefined: c=[]
However, we've lost the ability to introspect, and see what the default arguments are. Because the objects haven't been constructed, we can't ever get hold of them without actually calling the function. The best we could do is to store off the source code and return that as a string.
5 points in defense of Python
Simplicity: The behavior is simple in the following sense:
Most people fall into this trap only once, not several times.
Consistency: Python always passes objects, not names.
The default parameter is, obviously, part of the function
heading (not the function body). It therefore ought to be evaluated
at module load time (and only at module load time, unless nested), not
at function call time.
Usefulness: As Frederik Lundh points out in his explanation
of "Default Parameter Values in Python", the
current behavior can be quite useful for advanced programming.
(Use sparingly.)
Sufficient documentation: In the most basic Python documentation,
the tutorial, the issue is loudly announced as
an "Important warning" in the first subsection of Section
"More on Defining Functions".
The warning even uses boldface,
which is rarely applied outside of headings.
RTFM: Read the fine manual.
Meta-learning: Falling into the trap is actually a very
helpful moment (at least if you are a reflective learner),
because you will subsequently better understand the point
"Consistency" above and that will
teach you a great deal about Python.
This behavior is easy explained by:
function (class etc.) declaration is executed only once, creating all default value objects
everything is passed by reference
So:
def x(a=0, b=[], c=[], d=0):
a = a + 1
b = b + [1]
c.append(1)
print a, b, c
a doesn't change - every assignment call creates new int object - new object is printed
b doesn't change - new array is build from default value and printed
c changes - operation is performed on same object - and it is printed
1) The so-called problem of "Mutable Default Argument" is in general a special example demonstrating that:
"All functions with this problem suffer also from similar side effect problem on the actual parameter,"
That is against the rules of functional programming, usually undesiderable and should be fixed both together.
Example:
def foo(a=[]): # the same problematic function
a.append(5)
return a
>>> somevar = [1, 2] # an example without a default parameter
>>> foo(somevar)
[1, 2, 5]
>>> somevar
[1, 2, 5] # usually expected [1, 2]
Solution: a copy
An absolutely safe solution is to copy or deepcopy the input object first and then to do whatever with the copy.
def foo(a=[]):
a = a[:] # a copy
a.append(5)
return a # or everything safe by one line: "return a + [5]"
Many builtin mutable types have a copy method like some_dict.copy() or some_set.copy() or can be copied easy like somelist[:] or list(some_list). Every object can be also copied by copy.copy(any_object) or more thorough by copy.deepcopy() (the latter useful if the mutable object is composed from mutable objects). Some objects are fundamentally based on side effects like "file" object and can not be meaningfully reproduced by copy. copying
Example problem for a similar SO question
class Test(object): # the original problematic class
def __init__(self, var1=[]):
self._var1 = var1
somevar = [1, 2] # an example without a default parameter
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])
print somevar # [1, 2, [1]] but usually expected [1, 2]
print t2._var1 # [1, 2, [1]] but usually expected [1, 2]
It shouldn't be neither saved in any public attribute of an instance returned by this function. (Assuming that private attributes of instance should not be modified from outside of this class or subclasses by convention. i.e. _var1 is a private attribute )
Conclusion:
Input parameters objects shouldn't be modified in place (mutated) nor they should not be binded into an object returned by the function. (If we prefere programming without side effects which is strongly recommended. see Wiki about "side effect" (The first two paragraphs are relevent in this context.)
.)
2)
Only if the side effect on the actual parameter is required but unwanted on the default parameter then the useful solution is def ...(var1=None): if var1 is None: var1 = [] More..
3) In some cases is the mutable behavior of default parameters useful.
What you're asking is why this:
def func(a=[], b = 2):
pass
isn't internally equivalent to this:
def func(a=None, b = None):
a_default = lambda: []
b_default = lambda: 2
def actual_func(a=None, b=None):
if a is None: a = a_default()
if b is None: b = b_default()
return actual_func
func = func()
except for the case of explicitly calling func(None, None), which we'll ignore.
In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?
One answer is probably right there--it would effectively turn every function with default parameters into a closure. Even if it's all hidden away in the interpreter and not a full-blown closure, the data's got to be stored somewhere. It'd be slower and use more memory.
This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.
>>> def foo(a):
a.append(5)
print a
>>> a = [5]
>>> foo(a)
[5, 5]
>>> foo(a)
[5, 5, 5]
>>> foo(a)
[5, 5, 5, 5]
>>> foo(a)
[5, 5, 5, 5, 5]
No default values in sight in this code, but you get exactly the same problem.
The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn't expect this. Code like this would be fine if the function was called something like append_5; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn't return the list (since the caller already has a reference to that list; the one it just passed in).
Your original foo, with a default argument, shouldn't be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we're in Python or not and whether there are default arguments involved or not.
If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.
Python: The Mutable Default Argument
Default arguments get evaluated at the time the function is compiled into a function object. When used by the function, multiple times by that function, they are and remain the same object.
When they are mutable, when mutated (for example, by adding an element to it) they remain mutated on consecutive calls.
They stay mutated because they are the same object each time.
Equivalent code:
Since the list is bound to the function when the function object is compiled and instantiated, this:
def foo(mutable_default_argument=[]): # make a list the default argument
"""function that uses a list"""
is almost exactly equivalent to this:
_a_list = [] # create a list in the globals
def foo(mutable_default_argument=_a_list): # make it the default argument
"""function that uses a list"""
del _a_list # remove globals name binding
Demonstration
Here's a demonstration - you can verify that they are the same object each time they are referenced by
seeing that the list is created before the function has finished compiling to a function object,
observing that the id is the same each time the list is referenced,
observing that the list stays changed when the function that uses it is called a second time,
observing the order in which the output is printed from the source (which I conveniently numbered for you):
example.py
print('1. Global scope being evaluated')
def create_list():
'''noisily create a list for usage as a kwarg'''
l = []
print('3. list being created and returned, id: ' + str(id(l)))
return l
print('2. example_function about to be compiled to an object')
def example_function(default_kwarg1=create_list()):
print('appending "a" in default default_kwarg1')
default_kwarg1.append("a")
print('list with id: ' + str(id(default_kwarg1)) +
' - is now: ' + repr(default_kwarg1))
print('4. example_function compiled: ' + repr(example_function))
if __name__ == '__main__':
print('5. calling example_function twice!:')
example_function()
example_function()
and running it with python example.py:
1. Global scope being evaluated
2. example_function about to be compiled to an object
3. list being created and returned, id: 140502758808032
4. example_function compiled: <function example_function at 0x7fc9590905f0>
5. calling example_function twice!:
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a']
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a', 'a']
Does this violate the principle of "Least Astonishment"?
This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.
The usual instruction to new Python users:
But this is why the usual instruction to new users is to create their default arguments like this instead:
def example_function_2(default_kwarg=None):
if default_kwarg is None:
default_kwarg = []
This uses the None singleton as a sentinel object to tell the function whether or not we've gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [], as the default.
As the tutorial section on control flow says:
If you don’t want the default to be shared between subsequent calls,
you can write the function like this instead:
def f(a, L=None):
if L is None:
L = []
L.append(a)
return L
Already busy topic, but from what I read here, the following helped me realizing how it's working internally:
def bar(a=[]):
print id(a)
a = a + [1]
print id(a)
return a
>>> bar()
4484370232
4484524224
[1]
>>> bar()
4484370232
4484524152
[1]
>>> bar()
4484370232 # Never change, this is 'class property' of the function
4484523720 # Always a new object
[1]
>>> id(bar.func_defaults[0])
4484370232
The shortest answer would probably be "definition is execution", therefore the whole argument makes no strict sense. As a more contrived example, you may cite this:
def a(): return []
def b(x=a()):
print x
Hopefully it's enough to show that not executing the default argument expressions at the execution time of the def statement isn't easy or doesn't make sense, or both.
I agree it's a gotcha when you try to use default constructors, though.
It's a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?
def print_tuple(some_tuple=(1,2,3)):
print some_tuple
print_tuple() #1
print_tuple((1,2,3)) #2
I'll give you a hint. Here's the disassembly (see http://docs.python.org/library/dis.html):
#1
0 LOAD_GLOBAL 0 (print_tuple)
3 CALL_FUNCTION 0
6 POP_TOP
7 LOAD_CONST 0 (None)
10 RETURN_VALUE
#2
0 LOAD_GLOBAL 0 (print_tuple)
3 LOAD_CONST 4 ((1, 2, 3))
6 CALL_FUNCTION 1
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs ?)
As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it's a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn't C. In C you have constants that are pretty much free. In Python you don't have this benefit.
This behavior is not surprising if you take the following into consideration:
The behavior of read-only class attributes upon assignment attempts, and that
Functions are objects (explained well in the accepted answer).
The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.
(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:
...all variables found outside of the innermost scope are
read-only (an attempt to write to such a variable will simply create a
new local variable in the innermost scope, leaving the identically
named outer variable unchanged).
Look back to the original example and consider the above points:
def foo(a=[]):
a.append(5)
return a
Here foo is an object and a is an attribute of foo (available at foo.func_defs[0]). Since a is a list, a is mutable and is thus a read-write attribute of foo. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.
Calling foo without overriding a default uses that default's value from foo.func_defs. In this case, foo.func_defs[0] is used for a within function object's code scope. Changes to a change foo.func_defs[0], which is part of the foo object and persists between execution of the code in foo.
Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:
def foo(a, L=None):
if L is None:
L = []
L.append(a)
return L
Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:
When the foo function object is instantiated, foo.func_defs[0] is set to None, an immutable object.
When the function is executed with defaults (with no parameter specified for L in the function call), foo.func_defs[0] (None) is available in the local scope as L.
Upon L = [], the assignment cannot succeed at foo.func_defs[0], because that attribute is read-only.
Per (1), a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.func_defs[0] thus remains unchanged for future invocations of foo.
A simple workaround using None
>>> def bar(b, data=None):
... data = data or []
... data.append(b)
... return data
...
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3, [34])
[34, 3]
>>> bar(3, [34])
[34, 3]
It may be true that:
Someone is using every language/library feature, and
Switching the behavior here would be ill-advised, but
it is entirely consistent to hold to both of the features above and still make another point:
It is a confusing feature and it is unfortunate in Python.
The other answers, or at least some of them either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.
It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano's opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,
The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If you understand the language much better by trying to trace out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other language. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working--not because of a design fl--I mean, hidden logic puzzle--that cuts against the intuitions of programmers who are drawn to Python because it Just Works.
I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).
As others have extensively commented, the list parameter is bound to the function when it is defined as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list which may have been altered by any other calls to the function. Worse yet, two parameters are using this function's shared parameter at the same time oblivious to the changes made by the other.
Wrong Method (probably...):
def foo(list_arg=[5]):
return list_arg
a = foo()
a.append(6)
>>> a
[5, 6]
b = foo()
b.append(7)
# The value of 6 appended to variable 'a' is now part of the list held by 'b'.
>>> b
[5, 6, 7]
# Although 'a' is expecting to receive 6 (the last element it appended to the list),
# it actually receives the last element appended to the shared list.
# It thus receives the value 7 previously appended by 'b'.
>>> a.pop()
7
You can verify that they are one and the same object by using id:
>>> id(a)
5347866528
>>> id(b)
5347866528
Per Brett Slatkin's "Effective Python: 59 Specific Ways to Write Better Python", Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)
The convention for achieving the desired result in Python is to
provide a default value of None and to document the actual behaviour
in the docstring.
This implementation ensures that each call to the function either receives the default list or else the list passed to the function.
Preferred Method:
def foo(list_arg=None):
"""
:param list_arg: A list of input values.
If none provided, used a list with a default value of 5.
"""
if not list_arg:
list_arg = [5]
return list_arg
a = foo()
a.append(6)
>>> a
[5, 6]
b = foo()
b.append(7)
>>> b
[5, 7]
c = foo([10])
c.append(11)
>>> c
[10, 11]
There may be legitimate use cases for the 'Wrong Method' whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.
The solutions here are:
Use None as your default value (or a nonce object), and switch on that to create your values at runtime; or
Use a lambda as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).
The second option is nice because users of the function can pass in a callable, which may be already existing (such as a type)
You can get round this by replacing the object (and therefore the tie with the scope):
def foo(a=[]):
a = list(a)
a.append(5)
return a
Ugly, but it works.
When we do this:
def foo(a=[]):
...
... we assign the argument a to an unnamed list, if the caller does not pass the value of a.
To make things simpler for this discussion, let's temporarily give the unnamed list a name. How about pavlo ?
def foo(a=pavlo):
...
At any time, if the caller doesn't tell us what a is, we reuse pavlo.
If pavlo is mutable (modifiable), and foo ends up modifying it, an effect we notice the next time foo is called without specifying a.
So this is what you see (Remember, pavlo is initialized to []):
>>> foo()
[5]
Now, pavlo is [5].
Calling foo() again modifies pavlo again:
>>> foo()
[5, 5]
Specifying a when calling foo() ensures pavlo is not touched.
>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]
So, pavlo is still [5, 5].
>>> foo()
[5, 5, 5]
I sometimes exploit this behavior as an alternative to the following pattern:
singleton = None
def use_singleton():
global singleton
if singleton is None:
singleton = _make_singleton()
return singleton.use_me()
If singleton is only used by use_singleton, I like the following pattern as a replacement:
# _make_singleton() is called only once when the def is executed
def use_singleton(singleton=_make_singleton()):
return singleton.use_me()
I've used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.
Since I don't think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
Every other answer explains why this is actually a nice and desired behavior, or why you shouldn't be needing this anyway. Mine is for those stubborn ones who want to exercise their right to bend the language to their will, not the other way around.
We will "fix" this behavior with a decorator that will copy the default value instead of reusing the same instance for each positional argument left at its default value.
import inspect
from copy import deepcopy # copy would fail on deep arguments like nested dicts
def sanify(function):
def wrapper(*a, **kw):
# store the default values
defaults = inspect.getargspec(function).defaults # for python2
# construct a new argument list
new_args = []
for i, arg in enumerate(defaults):
# allow passing positional arguments
if i in range(len(a)):
new_args.append(a[i])
else:
# copy the value
new_args.append(deepcopy(arg))
return function(*new_args, **kw)
return wrapper
Now let's redefine our function using this decorator:
#sanify
def foo(a=[]):
a.append(5)
return a
foo() # '[5]'
foo() # '[5]' -- as desired
This is particularly neat for functions that take multiple arguments. Compare:
# the 'correct' approach
def bar(a=None, b=None, c=None):
if a is None:
a = []
if b is None:
b = []
if c is None:
c = []
# finally do the actual work
with
# the nasty decorator hack
#sanify
def bar(a=[], b=[], c=[]):
# wow, works right out of the box!
It's important to note that the above solution breaks if you try to use keyword args, like so:
foo(a=[4])
The decorator could be adjusted to allow for that, but we leave this as an exercise for the reader ;)
This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)
I'm gonna give you what I see as a useful example.
def example(errors=[]):
# statements
# Something went wrong
mistake = True
if mistake:
tryToFixIt(errors)
# Didn't work.. let's try again
tryToFixItAnotherway(errors)
# This time it worked
return errors
def tryToFixIt(err):
err.append('Attempt to fix it')
def tryToFixItAnotherway(err):
err.append('Attempt to fix it by another way')
def main():
for item in range(2):
errors = example()
print '\n'.join(errors)
main()
prints the following
Attempt to fix it
Attempt to fix it by another way
Attempt to fix it
Attempt to fix it by another way
This is not a design flaw. Anyone who trips over this is doing something wrong.
There are 3 cases I see where you might run into this problem:
You intend to modify the argument as a side effect of the function. In this case it never makes sense to have a default argument. The only exception is when you're abusing the argument list to have function attributes, e.g. cache={}, and you wouldn't be expected to call the function with an actual argument at all.
You intend to leave the argument unmodified, but you accidentally did modify it. That's a bug, fix it.
You intend to modify the argument for use inside the function, but didn't expect the modification to be viewable outside of the function. In that case you need to make a copy of the argument, whether it was the default or not! Python is not a call-by-value language so it doesn't make the copy for you, you need to be explicit about it.
The example in the question could fall into category 1 or 3. It's odd that it both modifies the passed list and returns it; you should pick one or the other.
Just change the function to be:
def notastonishinganymore(a = []):
'''The name is just a joke :)'''
a = a[:]
a.append(5)
return a
TLDR: Define-time defaults are consistent and strictly more expressive.
Defining a function affects two scopes: the defining scope containing the function, and the execution scope contained by the function. While it is pretty clear how blocks map to scopes, the question is where def <name>(<args=defaults>): belongs to:
... # defining scope
def name(parameter=default): # ???
... # execution scope
The def name part must evaluate in the defining scope - we want name to be available there, after all. Evaluating the function only inside itself would make it inaccessible.
Since parameter is a constant name, we can "evaluate" it at the same time as def name. This also has the advantage it produces the function with a known signature as name(parameter=...):, instead of a bare name(...):.
Now, when to evaluate default?
Consistency already says "at definition": everything else of def <name>(<args=defaults>): is best evaluated at definition as well. Delaying parts of it would be the astonishing choice.
The two choices are not equivalent, either: If default is evaluated at definition time, it can still affect execution time. If default is evaluated at execution time, it cannot affect definition time. Choosing "at definition" allows expressing both cases, while choosing "at execution" can express only one:
def name(parameter=defined): # set default at definition time
...
def name(parameter=default): # delay default until execution time
parameter = default if parameter is None else parameter
...
Yes, this is a design flaw in Python
I've read all the other answers and I'm not convinced. This design does violate the principle of least astonishment.
The defaults could have been designed to be evaluated when the function is called, rather than when the function is defined. This is how Javascript does it:
function foo(a=[]) {
a.push(5);
return a;
}
console.log(foo()); // [5]
console.log(foo()); // [5]
console.log(foo()); // [5]
As further evidence that this is a design flaw, Python core developers are currently discussing introducing new syntax to fix this problem. See this article: Late-bound argument defaults for Python.
For even more evidence that this a design flaw, if you Google "Python gotchas", this design is mentioned as a gotcha, usually the first gotcha in the list, in the first 9 Google results (1, 2, 3, 4, 5, 6, 7, 8, 9). In contrast, if you Google "Javascript gotchas", the behaviour of default arguments in Javascript is not mentioned as a gotcha even once.
Gotchas, by definition, violate the principle of least astonishment. They astonish. Given there are superiour designs for the behaviour of default argument values, the inescapable conclusion is that Python's behaviour here represents a design flaw.

Where to use the core-python map function?

The best way to build efficiently is to understand the toolkit one is building with. However, while trying to understand the core functions of python, it occurred to me that the map function gave similar, if not the same, results as a generic generator expression.
Take the next bit of code as a simplified example.
These two objects, mapped and generated, behave astoundingly similar in whatever situation you throw them.
def concatenate(string1 = "", string2 = ""):
return string1.join(" ", string2)
foo = ["One", "Two"]
bar = ["Blue", "Green"]
mapped = map(concatenate, foo, bar)
generated = (concatenate(string1 = a, string2 = b) for a, b in zip(foo, bar))
Okay, I know that it is a longer line of code, but I find it hard to believe that's all of map's reason of existence, so in my quest to understand python.
What does map still do in python? Is it really just a relic of olden times, and if not, where can I best put this tool to use?
The reason both exist is because list comprehension returns a new list where as map (in Python3) returns a generator thus map is more memory efficient if you don't need the resulting list right away.
This can be considered a technicality though since often when you use list comprehension to do something a map can do, you overwrite the original variable:
a = [1, 2, 3]
# The following line creates a new list but
# since we assign that list to `a` we give the old list to
# the garbage collector
a = [x**2 for x in a]
# or
a = list(map(lambda x: x**2, a))
# both of which are basically the same.
The power of map can come into play if you aren't working with the same variable:
a = [1, 2, 3] # If we want to save this then we don't want to overwrite it
b = [x**2 for x in a] # A full new list is now in b
c = map(lambda x: x**2, a) # c is just a generator object.
print(a) # [1, 2, 3]
print(b) # [1, 4, 9]
print(c) # <map object ...>
for x in c:
print(x)
# We never create a full list from c, we just use each object as we go thus we save memory.
In the previous example b is a whole new list of memory where as c is just a generator object. When working with such a small a, b and c are probably pretty close in memory but if a was large c would be significantly less memory than b.
Most common use cases of map involve the first case, thus map has no real benefit but in the second case it is much more beneficial to use map.
For the record "one way to do something" from the zen is sorta vague. I could implement a merge sort the right way (i.e. perfect speed and space) but my code my look completely different then someone else who implemented merge sort. "one way" doesn't really mean only one way to code it, it more implies there is only one methodology to be used or one process.

Resources