Unexpected difference in behaviour with/out let block - scope

I have been using Flux.jl and have been confused by the differences when running code within a let block and without. The following example runs without error:
using Flux
p = rand(2)
function f(x)
f, b = p
x*f + b
end
data = reduce(hcat, [[x, f(x)] for x in 0:0.1:1.0])
p = rand(2)
θ = params(p)
loss(y) = sum((y .- f.(data[1,:])).^2)
for n in 1:1000
grads = Flux.gradient(θ) do
loss(data[2,:])
end
Flux.Optimise.update!(ADAM(), θ, grads)
end
However, wrapping the same code in a let block does not work as I expect:
using Flux
let
...
end
and produces the stacktrace:
MethodError: objects of type Float64 are not callable
Maybe you forgot to use an operator such as [36m*, ^, %, / etc. [39m?
Stacktrace:
[1] macro expansion
# ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0 [inlined]
[2] _pullback(ctx::Zygote.Context, f::Float64, args::Float64)
# Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:9
[3] (::Zygote.var"#1100#1104"{Zygote.Context, Float64})(x::Float64)
# Zygote ~/.julia/packages/Zygote/bJn8I/src/lib/broadcast.jl:186
[4] _broadcast_getindex_evalf
# ./broadcast.jl:670 [inlined]
[5] _broadcast_getindex
# ./broadcast.jl:643 [inlined]
[6] getindex
# ./broadcast.jl:597 [inlined]
[7] copy
# ./broadcast.jl:899 [inlined]
[8] materialize
# ./broadcast.jl:860 [inlined]
[9] _broadcast
# ~/.julia/packages/Zygote/bJn8I/src/lib/broadcast.jl:163 [inlined]
[10] adjoint
# ~/.julia/packages/Zygote/bJn8I/src/lib/broadcast.jl:186 [inlined]
[11] _pullback
# ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:65 [inlined]
[12] _apply
# ./boot.jl:814 [inlined]
[13] adjoint
# ~/.julia/packages/Zygote/bJn8I/src/lib/lib.jl:200 [inlined]
[14] _pullback
# ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:65 [inlined]
[15] _pullback
# ./broadcast.jl:1297 [inlined]
[16] _pullback(::Zygote.Context, ::typeof(Base.Broadcast.broadcasted), ::Float64, ::Vector{Float64})
# Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
[17] _pullback
# ./In[198]:32 [inlined]
[18] _pullback(::Zygote.Context, ::var"#loss#155", ::Vector{Float64}, ::Vector{Float64})
# Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
[19] _pullback
# ./In[198]:37 [inlined]
[20] _pullback(::Zygote.Context, ::var"#152#156"{var"#loss#155", Matrix{Float64}})
# Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
[21] pullback(f::Function, ps::Zygote.Params)
# Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface.jl:351
[22] gradient(f::Function, args::Zygote.Params)
# Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface.jl:75
[23] top-level scope
# In[198]:36
[24] eval
# ./boot.jl:373 [inlined]
[25] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
# Base ./loading.jl:1196
Whereas I had expected them to both behave identically (at least in isolation). I cannot tell much myself from the stacktrace, since I have no experience with the implementation of Flux.jl or Zygote.jl. But the problem seems to be something to do with the definition of the function f, since changing the definition of f to:
function f(x)
a, b = p
x*a + b
end
Allows both the let and letless versions to work. Of course, I could fix it like this and call it a day. But I am curious if anyone knows why the two versions work differently?
Note
(#v1.7) pkg> status Flux
Status `~/.julia/environments/v1.7/Project.toml`
[587475ba] Flux v0.12.8

This is very weird. So the main difference is that the first version is in the global scope, and the second version is in a local scope (the let block). For a repeatable local scope, we can put the code in a function, or rather just enough of the code to see the problem:
function g()
p = [1.0, 10.0] # for repeatability
function f(x)
f, b = p
x*f + b
end
println(f)
println(f(1))
println(f)
println(f(2))
end
The results:
julia> g()
f
11.0
1.0
ERROR: MethodError: objects of type Float64 are not callable
So f starts off as a function, and its first call does its job (11.0). But that call reassigns p[1] to f, so by the second call, calling f, or 1.0, fails.
I think this is rooted in Julia's scoping rules.
In the global scope version, the function name f is global, and it just happens to use a local variable f. The full rules are in the link and worth a read, but rest assured, assigning a local variable in a particular local scope does not affect anything in the global scope or in the other isolated local scopes.
In the let block version, everything is nested in one local scope. The function name f and the function variable f are both local. When you assign to a local variable that already exists in the current or enclosing local scopes, the existing variable is used. Just defining f(x) didn't run f, b = p immediately, but upon calling f(1), it ran and reassigned the local f.
Before reviewing the scoping rules, I myself didn't expect reassigning a function name to be possible, let alone by its own call. I suppose the implicit const of method definitions only applies to global scope. Local function names are no different from any other local variables, which are routinely reassigned in nested local scopes like for-loops, comprehensions, and function bodies.
As you've demonstrated, changing the variable name gets around this local reuse, but you could also explicitly create a new local variable in a nested local scope, separate from any that may exist in enclosing local scopes:
function f(x)
local f
f, b = p
x*f + b
end
P.S. An attempt at explaining the scope rules in an order that's easier to follow than the docs, though still intimidatingly long.
A global scope contains global variables and exists at the level of a file or in a module block. A global scope can enclose multiple isolated local scopes, which are created by any other block except if and begin. Any local scope contains its local variables and can enclose other local scopes.
If scope A encloses scope B, A is an enclosing/outer scope relative to B as the nested/inner scope. If scope B encloses a scope C, scope A also encloses scope C.
If a nested scope does not assign a variable of a particular name, it can use an existing variable of the same name from any enclosing scope.
If a nested scope assigns a variable, and no enclosing scope has a variable with the same name, then a new variable is created for that nested scope.
If a nested scope assigns a variable, and an enclosing local scope has a preexisting variable with the same name, then that enclosing local scope's variable is reassigned, by default. Simple example is my_sum = 0; for i in 1:3 my_sum += i end. To create a new variable for that nested scope, instead, use the local <name> statement in the nested local scope .
If a nested scope assigns a variable, and only the global scope has a preexisting variable with the same name, then a new local variable is created, by default. This difference is because unlike a local scope, a global scope can be scattered across multiple included files. Nobody wants to pore over scattered files to prevent one file's code from accidentally reassigning another file's global variable. To reassign a global variable, instead, use the global <name> statement in the nested local scope.
Be aware that each iteration of loops and comprehensions makes its own local scope with its own local variables, though they share enclosing scopes' variables. The docs give an example of anonymous functions capturing iteration-local variables: for j = 1:2 Fs[j] = ()->j end; note that if the rule was different so j is the same variable across iterations (like in Python), then all the anonymous functions return the same j value, not very useful.
At one point for v1.0, they tried to make interactive and non-interactive scope rules consistent, but people complained because they missed pasting local scope code into the global scope of the REPL or notebook. So some blocks were designated "soft" local scopes, and in interactive contexts, soft scopes reassign global variables, by default. In non-interactive contexts (.jl files, eval()), soft scopes act like hard scopes but print warnings. The hard/soft stuff really complicates things, so try to understand the other rules before applying this angle.

Related

Counter-Intuitive : Local variables acting like global variables [duplicate]

Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
def foo(a=[]):
a.append(5)
return a
Python novices would expect this function called with no parameter to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):
>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)
Edit:
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:
>>> def a():
... print("a executed")
... return []
...
>>>
>>> def b(x=a()):
... x.append(5)
... print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function, or "together" with it?
Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
Actually, this is not a design flaw, and it is not because of internals or performance. It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.
As soon as you think of it this way, then it completely makes sense: a function is an object being evaluated on its definition; default parameters are kind of "member data" and therefore their state may change from one call to the other - exactly as in any other object.
In any case, the effbot (Fredrik Lundh) has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.
Suppose you have the following code
fruits = ("apples", "bananas", "loganberries")
def eat(food=fruits):
...
When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, that it will be equal to the tuple ("apples", "bananas", "loganberries")
However, suppose later on in the code, I do something like
def some_random_function():
global fruits
fruits = ("blueberries", "mangos")
then if default parameters were bound at function execution rather than function declaration, I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your foo function above was mutating the list.
The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose in Java I have the following code:
StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) ); // does this work?
Now, does my map use the value of the StringBuffer key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of the Map using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map (this is actually why Python doesn't allow its mutable built-in data types to be used as dictionary keys).
Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, then that would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they're writing.
I personally like Python's current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.
The relevant part of the documentation:
Default parameter values are evaluated from left to right when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function, e.g.:
def whats_on_the_telly(penguin=None):
if penguin is None:
penguin = []
penguin.append("property of the zoo")
return penguin
I know nothing about the Python interpreter inner workings (and I'm not an expert in compilers and interpreters either) so don't blame me if I propose anything unsensible or impossible.
Provided that python objects are mutable I think that this should be taken into account when designing the default arguments stuff.
When you instantiate a list:
a = []
you expect to get a new list referenced by a.
Why should the a=[] in
def x(a=[]):
instantiate a new list on function definition and not on invocation?
It's just like you're asking "if the user doesn't provide the argument then instantiate a new list and use it as if it was produced by the caller".
I think this is ambiguous instead:
def x(a=datetime.datetime.now()):
user, do you want a to default to the datetime corresponding to when you're defining or executing x?
In this case, as in the previous one, I'll keep the same behaviour as if the default argument "assignment" was the first instruction of the function (datetime.now() called on function invocation).
On the other hand, if the user wanted the definition-time mapping he could write:
b = datetime.datetime.now()
def x(a=b):
I know, I know: that's a closure. Alternatively Python might provide a keyword to force definition-time binding:
def x(static a=b):
Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well... when the functions is defined.
Compare this:
class BananaBunch:
bananas = []
def addBanana(self, banana):
self.bananas.append(banana)
This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it's added to all instances of that class. The reason is exactly the same.
It's just "How It Works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.
Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll grok python much better.
That said it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.
Why don't you introspect?
I'm really surprised no one has performed the insightful introspection offered by Python (2 and 3 apply) on callables.
Given a simple little function func defined as:
>>> def func(a = []):
... a.append(5)
When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. While this compilation step is done, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the top answer mentioned: the list a can now be considered a member of the function func.
So, let's do some introspection, a before and after to examine how the list gets expanded inside the function object. I'm using Python 3.x for this, for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).
Function Before Execution:
>>> def func(a = []):
... a.append(5)
...
After Python executes this definition it will take any default parameters specified (a = [] here) and cram them in the __defaults__ attribute for the function object (relevant section: Callables):
>>> func.__defaults__
([],)
O.k, so an empty list as the single entry in __defaults__, just as expected.
Function After Execution:
Let's now execute this function:
>>> func()
Now, let's see those __defaults__ again:
>>> func.__defaults__
([5],)
Astonished? The value inside the object changes! Consecutive calls to the function will now simply append to that embedded list object:
>>> func(); func(); func()
>>> func.__defaults__
([5, 5, 5, 5],)
So, there you have it, the reason why this 'flaw' happens, is because default arguments are part of the function object. There's nothing weird going on here, it's all just a bit surprising.
The common solution to combat this is to use None as the default and then initialize in the function body:
def func(a = None):
# or: a = [] if a is None else a
if a is None:
a = []
Since the function body is executed anew each time, you always get a fresh new empty list if no argument was passed for a.
To further verify that the list in __defaults__ is the same as that used in the function func you can just change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__) and you'll see how these are indeed refering to the same list instance:
>>> def func(a = []):
... a.append(5)
... return id(a)
>>>
>>> id(func.__defaults__[0]) == func()
True
All with the power of introspection!
* To verify that Python evaluates the default arguments during compilation of the function, try executing the following:
def bar(a=input('Did you just see me without calling the function?')):
pass # use raw_input in Py2
as you'll notice, input() is called before the process of building the function and binding it to the name bar is made.
I used to think that creating the objects at runtime would be the better approach. I'm less certain now, since you do lose some useful features, though it may be worth it regardless simply to prevent newbie confusion. The disadvantages of doing so are:
1. Performance
def foo(arg=something_expensive_to_compute())):
...
If call-time evaluation is used, then the expensive function is called every time your function is used without an argument. You'd either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.
2. Forcing bound parameters
A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:
funcs = [ lambda i=i: i for i in range(10)]
This returns a list of functions that return 0,1,2,3... respectively. If the behaviour is changed, they will instead bind i to the call-time value of i, so you would get a list of functions that all returned 9.
The only way to implement this otherwise would be to create a further closure with the i bound, ie:
def make_func(i): return lambda: i
funcs = [make_func(i) for i in range(10)]
3. Introspection
Consider the code:
def foo(a='test', b=100, c=[]):
print a,b,c
We can get information about the arguments and defaults using the inspect module, which
>>> inspect.getargspec(foo)
(['a', 'b', 'c'], None, None, ('test', 100, []))
This information is very useful for things like document generation, metaprogramming, decorators etc.
Now, suppose the behaviour of defaults could be changed so that this is the equivalent of:
_undefined = object() # sentinel value
def foo(a=_undefined, b=_undefined, c=_undefined)
if a is _undefined: a='test'
if b is _undefined: b=100
if c is _undefined: c=[]
However, we've lost the ability to introspect, and see what the default arguments are. Because the objects haven't been constructed, we can't ever get hold of them without actually calling the function. The best we could do is to store off the source code and return that as a string.
5 points in defense of Python
Simplicity: The behavior is simple in the following sense:
Most people fall into this trap only once, not several times.
Consistency: Python always passes objects, not names.
The default parameter is, obviously, part of the function
heading (not the function body). It therefore ought to be evaluated
at module load time (and only at module load time, unless nested), not
at function call time.
Usefulness: As Frederik Lundh points out in his explanation
of "Default Parameter Values in Python", the
current behavior can be quite useful for advanced programming.
(Use sparingly.)
Sufficient documentation: In the most basic Python documentation,
the tutorial, the issue is loudly announced as
an "Important warning" in the first subsection of Section
"More on Defining Functions".
The warning even uses boldface,
which is rarely applied outside of headings.
RTFM: Read the fine manual.
Meta-learning: Falling into the trap is actually a very
helpful moment (at least if you are a reflective learner),
because you will subsequently better understand the point
"Consistency" above and that will
teach you a great deal about Python.
This behavior is easy explained by:
function (class etc.) declaration is executed only once, creating all default value objects
everything is passed by reference
So:
def x(a=0, b=[], c=[], d=0):
a = a + 1
b = b + [1]
c.append(1)
print a, b, c
a doesn't change - every assignment call creates new int object - new object is printed
b doesn't change - new array is build from default value and printed
c changes - operation is performed on same object - and it is printed
1) The so-called problem of "Mutable Default Argument" is in general a special example demonstrating that:
"All functions with this problem suffer also from similar side effect problem on the actual parameter,"
That is against the rules of functional programming, usually undesiderable and should be fixed both together.
Example:
def foo(a=[]): # the same problematic function
a.append(5)
return a
>>> somevar = [1, 2] # an example without a default parameter
>>> foo(somevar)
[1, 2, 5]
>>> somevar
[1, 2, 5] # usually expected [1, 2]
Solution: a copy
An absolutely safe solution is to copy or deepcopy the input object first and then to do whatever with the copy.
def foo(a=[]):
a = a[:] # a copy
a.append(5)
return a # or everything safe by one line: "return a + [5]"
Many builtin mutable types have a copy method like some_dict.copy() or some_set.copy() or can be copied easy like somelist[:] or list(some_list). Every object can be also copied by copy.copy(any_object) or more thorough by copy.deepcopy() (the latter useful if the mutable object is composed from mutable objects). Some objects are fundamentally based on side effects like "file" object and can not be meaningfully reproduced by copy. copying
Example problem for a similar SO question
class Test(object): # the original problematic class
def __init__(self, var1=[]):
self._var1 = var1
somevar = [1, 2] # an example without a default parameter
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])
print somevar # [1, 2, [1]] but usually expected [1, 2]
print t2._var1 # [1, 2, [1]] but usually expected [1, 2]
It shouldn't be neither saved in any public attribute of an instance returned by this function. (Assuming that private attributes of instance should not be modified from outside of this class or subclasses by convention. i.e. _var1 is a private attribute )
Conclusion:
Input parameters objects shouldn't be modified in place (mutated) nor they should not be binded into an object returned by the function. (If we prefere programming without side effects which is strongly recommended. see Wiki about "side effect" (The first two paragraphs are relevent in this context.)
.)
2)
Only if the side effect on the actual parameter is required but unwanted on the default parameter then the useful solution is def ...(var1=None): if var1 is None: var1 = [] More..
3) In some cases is the mutable behavior of default parameters useful.
What you're asking is why this:
def func(a=[], b = 2):
pass
isn't internally equivalent to this:
def func(a=None, b = None):
a_default = lambda: []
b_default = lambda: 2
def actual_func(a=None, b=None):
if a is None: a = a_default()
if b is None: b = b_default()
return actual_func
func = func()
except for the case of explicitly calling func(None, None), which we'll ignore.
In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?
One answer is probably right there--it would effectively turn every function with default parameters into a closure. Even if it's all hidden away in the interpreter and not a full-blown closure, the data's got to be stored somewhere. It'd be slower and use more memory.
This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.
>>> def foo(a):
a.append(5)
print a
>>> a = [5]
>>> foo(a)
[5, 5]
>>> foo(a)
[5, 5, 5]
>>> foo(a)
[5, 5, 5, 5]
>>> foo(a)
[5, 5, 5, 5, 5]
No default values in sight in this code, but you get exactly the same problem.
The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn't expect this. Code like this would be fine if the function was called something like append_5; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn't return the list (since the caller already has a reference to that list; the one it just passed in).
Your original foo, with a default argument, shouldn't be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we're in Python or not and whether there are default arguments involved or not.
If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.
Python: The Mutable Default Argument
Default arguments get evaluated at the time the function is compiled into a function object. When used by the function, multiple times by that function, they are and remain the same object.
When they are mutable, when mutated (for example, by adding an element to it) they remain mutated on consecutive calls.
They stay mutated because they are the same object each time.
Equivalent code:
Since the list is bound to the function when the function object is compiled and instantiated, this:
def foo(mutable_default_argument=[]): # make a list the default argument
"""function that uses a list"""
is almost exactly equivalent to this:
_a_list = [] # create a list in the globals
def foo(mutable_default_argument=_a_list): # make it the default argument
"""function that uses a list"""
del _a_list # remove globals name binding
Demonstration
Here's a demonstration - you can verify that they are the same object each time they are referenced by
seeing that the list is created before the function has finished compiling to a function object,
observing that the id is the same each time the list is referenced,
observing that the list stays changed when the function that uses it is called a second time,
observing the order in which the output is printed from the source (which I conveniently numbered for you):
example.py
print('1. Global scope being evaluated')
def create_list():
'''noisily create a list for usage as a kwarg'''
l = []
print('3. list being created and returned, id: ' + str(id(l)))
return l
print('2. example_function about to be compiled to an object')
def example_function(default_kwarg1=create_list()):
print('appending "a" in default default_kwarg1')
default_kwarg1.append("a")
print('list with id: ' + str(id(default_kwarg1)) +
' - is now: ' + repr(default_kwarg1))
print('4. example_function compiled: ' + repr(example_function))
if __name__ == '__main__':
print('5. calling example_function twice!:')
example_function()
example_function()
and running it with python example.py:
1. Global scope being evaluated
2. example_function about to be compiled to an object
3. list being created and returned, id: 140502758808032
4. example_function compiled: <function example_function at 0x7fc9590905f0>
5. calling example_function twice!:
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a']
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a', 'a']
Does this violate the principle of "Least Astonishment"?
This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.
The usual instruction to new Python users:
But this is why the usual instruction to new users is to create their default arguments like this instead:
def example_function_2(default_kwarg=None):
if default_kwarg is None:
default_kwarg = []
This uses the None singleton as a sentinel object to tell the function whether or not we've gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [], as the default.
As the tutorial section on control flow says:
If you don’t want the default to be shared between subsequent calls,
you can write the function like this instead:
def f(a, L=None):
if L is None:
L = []
L.append(a)
return L
Already busy topic, but from what I read here, the following helped me realizing how it's working internally:
def bar(a=[]):
print id(a)
a = a + [1]
print id(a)
return a
>>> bar()
4484370232
4484524224
[1]
>>> bar()
4484370232
4484524152
[1]
>>> bar()
4484370232 # Never change, this is 'class property' of the function
4484523720 # Always a new object
[1]
>>> id(bar.func_defaults[0])
4484370232
The shortest answer would probably be "definition is execution", therefore the whole argument makes no strict sense. As a more contrived example, you may cite this:
def a(): return []
def b(x=a()):
print x
Hopefully it's enough to show that not executing the default argument expressions at the execution time of the def statement isn't easy or doesn't make sense, or both.
I agree it's a gotcha when you try to use default constructors, though.
It's a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?
def print_tuple(some_tuple=(1,2,3)):
print some_tuple
print_tuple() #1
print_tuple((1,2,3)) #2
I'll give you a hint. Here's the disassembly (see http://docs.python.org/library/dis.html):
#1
0 LOAD_GLOBAL 0 (print_tuple)
3 CALL_FUNCTION 0
6 POP_TOP
7 LOAD_CONST 0 (None)
10 RETURN_VALUE
#2
0 LOAD_GLOBAL 0 (print_tuple)
3 LOAD_CONST 4 ((1, 2, 3))
6 CALL_FUNCTION 1
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs ?)
As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it's a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn't C. In C you have constants that are pretty much free. In Python you don't have this benefit.
This behavior is not surprising if you take the following into consideration:
The behavior of read-only class attributes upon assignment attempts, and that
Functions are objects (explained well in the accepted answer).
The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.
(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:
...all variables found outside of the innermost scope are
read-only (an attempt to write to such a variable will simply create a
new local variable in the innermost scope, leaving the identically
named outer variable unchanged).
Look back to the original example and consider the above points:
def foo(a=[]):
a.append(5)
return a
Here foo is an object and a is an attribute of foo (available at foo.func_defs[0]). Since a is a list, a is mutable and is thus a read-write attribute of foo. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.
Calling foo without overriding a default uses that default's value from foo.func_defs. In this case, foo.func_defs[0] is used for a within function object's code scope. Changes to a change foo.func_defs[0], which is part of the foo object and persists between execution of the code in foo.
Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:
def foo(a, L=None):
if L is None:
L = []
L.append(a)
return L
Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:
When the foo function object is instantiated, foo.func_defs[0] is set to None, an immutable object.
When the function is executed with defaults (with no parameter specified for L in the function call), foo.func_defs[0] (None) is available in the local scope as L.
Upon L = [], the assignment cannot succeed at foo.func_defs[0], because that attribute is read-only.
Per (1), a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.func_defs[0] thus remains unchanged for future invocations of foo.
A simple workaround using None
>>> def bar(b, data=None):
... data = data or []
... data.append(b)
... return data
...
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3, [34])
[34, 3]
>>> bar(3, [34])
[34, 3]
It may be true that:
Someone is using every language/library feature, and
Switching the behavior here would be ill-advised, but
it is entirely consistent to hold to both of the features above and still make another point:
It is a confusing feature and it is unfortunate in Python.
The other answers, or at least some of them either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.
It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano's opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,
The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If you understand the language much better by trying to trace out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other language. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working--not because of a design fl--I mean, hidden logic puzzle--that cuts against the intuitions of programmers who are drawn to Python because it Just Works.
I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).
As others have extensively commented, the list parameter is bound to the function when it is defined as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list which may have been altered by any other calls to the function. Worse yet, two parameters are using this function's shared parameter at the same time oblivious to the changes made by the other.
Wrong Method (probably...):
def foo(list_arg=[5]):
return list_arg
a = foo()
a.append(6)
>>> a
[5, 6]
b = foo()
b.append(7)
# The value of 6 appended to variable 'a' is now part of the list held by 'b'.
>>> b
[5, 6, 7]
# Although 'a' is expecting to receive 6 (the last element it appended to the list),
# it actually receives the last element appended to the shared list.
# It thus receives the value 7 previously appended by 'b'.
>>> a.pop()
7
You can verify that they are one and the same object by using id:
>>> id(a)
5347866528
>>> id(b)
5347866528
Per Brett Slatkin's "Effective Python: 59 Specific Ways to Write Better Python", Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)
The convention for achieving the desired result in Python is to
provide a default value of None and to document the actual behaviour
in the docstring.
This implementation ensures that each call to the function either receives the default list or else the list passed to the function.
Preferred Method:
def foo(list_arg=None):
"""
:param list_arg: A list of input values.
If none provided, used a list with a default value of 5.
"""
if not list_arg:
list_arg = [5]
return list_arg
a = foo()
a.append(6)
>>> a
[5, 6]
b = foo()
b.append(7)
>>> b
[5, 7]
c = foo([10])
c.append(11)
>>> c
[10, 11]
There may be legitimate use cases for the 'Wrong Method' whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.
The solutions here are:
Use None as your default value (or a nonce object), and switch on that to create your values at runtime; or
Use a lambda as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).
The second option is nice because users of the function can pass in a callable, which may be already existing (such as a type)
You can get round this by replacing the object (and therefore the tie with the scope):
def foo(a=[]):
a = list(a)
a.append(5)
return a
Ugly, but it works.
When we do this:
def foo(a=[]):
...
... we assign the argument a to an unnamed list, if the caller does not pass the value of a.
To make things simpler for this discussion, let's temporarily give the unnamed list a name. How about pavlo ?
def foo(a=pavlo):
...
At any time, if the caller doesn't tell us what a is, we reuse pavlo.
If pavlo is mutable (modifiable), and foo ends up modifying it, an effect we notice the next time foo is called without specifying a.
So this is what you see (Remember, pavlo is initialized to []):
>>> foo()
[5]
Now, pavlo is [5].
Calling foo() again modifies pavlo again:
>>> foo()
[5, 5]
Specifying a when calling foo() ensures pavlo is not touched.
>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]
So, pavlo is still [5, 5].
>>> foo()
[5, 5, 5]
I sometimes exploit this behavior as an alternative to the following pattern:
singleton = None
def use_singleton():
global singleton
if singleton is None:
singleton = _make_singleton()
return singleton.use_me()
If singleton is only used by use_singleton, I like the following pattern as a replacement:
# _make_singleton() is called only once when the def is executed
def use_singleton(singleton=_make_singleton()):
return singleton.use_me()
I've used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.
Since I don't think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
Every other answer explains why this is actually a nice and desired behavior, or why you shouldn't be needing this anyway. Mine is for those stubborn ones who want to exercise their right to bend the language to their will, not the other way around.
We will "fix" this behavior with a decorator that will copy the default value instead of reusing the same instance for each positional argument left at its default value.
import inspect
from copy import deepcopy # copy would fail on deep arguments like nested dicts
def sanify(function):
def wrapper(*a, **kw):
# store the default values
defaults = inspect.getargspec(function).defaults # for python2
# construct a new argument list
new_args = []
for i, arg in enumerate(defaults):
# allow passing positional arguments
if i in range(len(a)):
new_args.append(a[i])
else:
# copy the value
new_args.append(deepcopy(arg))
return function(*new_args, **kw)
return wrapper
Now let's redefine our function using this decorator:
#sanify
def foo(a=[]):
a.append(5)
return a
foo() # '[5]'
foo() # '[5]' -- as desired
This is particularly neat for functions that take multiple arguments. Compare:
# the 'correct' approach
def bar(a=None, b=None, c=None):
if a is None:
a = []
if b is None:
b = []
if c is None:
c = []
# finally do the actual work
with
# the nasty decorator hack
#sanify
def bar(a=[], b=[], c=[]):
# wow, works right out of the box!
It's important to note that the above solution breaks if you try to use keyword args, like so:
foo(a=[4])
The decorator could be adjusted to allow for that, but we leave this as an exercise for the reader ;)
This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)
I'm gonna give you what I see as a useful example.
def example(errors=[]):
# statements
# Something went wrong
mistake = True
if mistake:
tryToFixIt(errors)
# Didn't work.. let's try again
tryToFixItAnotherway(errors)
# This time it worked
return errors
def tryToFixIt(err):
err.append('Attempt to fix it')
def tryToFixItAnotherway(err):
err.append('Attempt to fix it by another way')
def main():
for item in range(2):
errors = example()
print '\n'.join(errors)
main()
prints the following
Attempt to fix it
Attempt to fix it by another way
Attempt to fix it
Attempt to fix it by another way
This is not a design flaw. Anyone who trips over this is doing something wrong.
There are 3 cases I see where you might run into this problem:
You intend to modify the argument as a side effect of the function. In this case it never makes sense to have a default argument. The only exception is when you're abusing the argument list to have function attributes, e.g. cache={}, and you wouldn't be expected to call the function with an actual argument at all.
You intend to leave the argument unmodified, but you accidentally did modify it. That's a bug, fix it.
You intend to modify the argument for use inside the function, but didn't expect the modification to be viewable outside of the function. In that case you need to make a copy of the argument, whether it was the default or not! Python is not a call-by-value language so it doesn't make the copy for you, you need to be explicit about it.
The example in the question could fall into category 1 or 3. It's odd that it both modifies the passed list and returns it; you should pick one or the other.
Just change the function to be:
def notastonishinganymore(a = []):
'''The name is just a joke :)'''
a = a[:]
a.append(5)
return a
TLDR: Define-time defaults are consistent and strictly more expressive.
Defining a function affects two scopes: the defining scope containing the function, and the execution scope contained by the function. While it is pretty clear how blocks map to scopes, the question is where def <name>(<args=defaults>): belongs to:
... # defining scope
def name(parameter=default): # ???
... # execution scope
The def name part must evaluate in the defining scope - we want name to be available there, after all. Evaluating the function only inside itself would make it inaccessible.
Since parameter is a constant name, we can "evaluate" it at the same time as def name. This also has the advantage it produces the function with a known signature as name(parameter=...):, instead of a bare name(...):.
Now, when to evaluate default?
Consistency already says "at definition": everything else of def <name>(<args=defaults>): is best evaluated at definition as well. Delaying parts of it would be the astonishing choice.
The two choices are not equivalent, either: If default is evaluated at definition time, it can still affect execution time. If default is evaluated at execution time, it cannot affect definition time. Choosing "at definition" allows expressing both cases, while choosing "at execution" can express only one:
def name(parameter=defined): # set default at definition time
...
def name(parameter=default): # delay default until execution time
parameter = default if parameter is None else parameter
...
Yes, this is a design flaw in Python
I've read all the other answers and I'm not convinced. This design does violate the principle of least astonishment.
The defaults could have been designed to be evaluated when the function is called, rather than when the function is defined. This is how Javascript does it:
function foo(a=[]) {
a.push(5);
return a;
}
console.log(foo()); // [5]
console.log(foo()); // [5]
console.log(foo()); // [5]
As further evidence that this is a design flaw, Python core developers are currently discussing introducing new syntax to fix this problem. See this article: Late-bound argument defaults for Python.
For even more evidence that this a design flaw, if you Google "Python gotchas", this design is mentioned as a gotcha, usually the first gotcha in the list, in the first 9 Google results (1, 2, 3, 4, 5, 6, 7, 8, 9). In contrast, if you Google "Javascript gotchas", the behaviour of default arguments in Javascript is not mentioned as a gotcha even once.
Gotchas, by definition, violate the principle of least astonishment. They astonish. Given there are superiour designs for the behaviour of default argument values, the inescapable conclusion is that Python's behaviour here represents a design flaw.

function to print occurences of substring [duplicate]

What exactly are the Python scoping rules?
If I have some code:
code1
class Foo:
code2
def spam.....
code3
for code4..:
code5
x()
Where is x found? Some possible choices include the list below:
In the enclosing source file
In the class namespace
In the function definition
In the for loop index variable
Inside the for loop
Also there is the context during execution, when the function spam is passed somewhere else. And maybe lambda functions pass a bit differently?
There must be a simple reference or algorithm somewhere. It's a confusing world for intermediate Python programmers.
Actually, a concise rule for Python Scope resolution, from Learning Python, 3rd. Ed.. (These rules are specific to variable names, not attributes. If you reference it without a period, these rules apply.)
LEGB Rule
Local — Names assigned in any way within a function (def or lambda), and not declared global in that function
Enclosing-function — Names assigned in the local scope of any and all statically enclosing functions (def or lambda), from inner to outer
Global (module) — Names assigned at the top-level of a module file, or by executing a global statement in a def within the file
Built-in (Python) — Names preassigned in the built-in names module: open, range, SyntaxError, etc
So, in the case of
code1
class Foo:
code2
def spam():
code3
for code4:
code5
x()
The for loop does not have its own namespace. In LEGB order, the scopes would be
L: Local in def spam (in code3, code4, and code5)
E: Any enclosing functions (if the whole example were in another def)
G: Were there any x declared globally in the module (in code1)?
B: Any builtin x in Python.
x will never be found in code2 (even in cases where you might expect it would, see Antti's answer or here).
Essentially, the only thing in Python that introduces a new scope is a function definition. Classes are a bit of a special case in that anything defined directly in the body is placed in the class's namespace, but they are not directly accessible from within the methods (or nested classes) they contain.
In your example there are only 3 scopes where x will be searched in:
spam's scope - containing everything defined in code3 and code5 (as well as code4, your loop variable)
The global scope - containing everything defined in code1, as well as Foo (and whatever changes after it)
The builtins namespace. A bit of a special case - this contains the various Python builtin functions and types such as len() and str(). Generally this shouldn't be modified by any user code, so expect it to contain the standard functions and nothing else.
More scopes only appear when you introduce a nested function (or lambda) into the picture.
These will behave pretty much as you'd expect however. The nested function can access everything in the local scope, as well as anything in the enclosing function's scope. eg.
def foo():
x=4
def bar():
print x # Accesses x from foo's scope
bar() # Prints 4
x=5
bar() # Prints 5
Restrictions:
Variables in scopes other than the local function's variables can be accessed, but can't be rebound to new parameters without further syntax. Instead, assignment will create a new local variable instead of affecting the variable in the parent scope. For example:
global_var1 = []
global_var2 = 1
def func():
# This is OK: It's just accessing, not rebinding
global_var1.append(4)
# This won't affect global_var2. Instead it creates a new variable
global_var2 = 2
local1 = 4
def embedded_func():
# Again, this doen't affect func's local1 variable. It creates a
# new local variable also called local1 instead.
local1 = 5
print local1
embedded_func() # Prints 5
print local1 # Prints 4
In order to actually modify the bindings of global variables from within a function scope, you need to specify that the variable is global with the global keyword. Eg:
global_var = 4
def change_global():
global global_var
global_var = global_var + 1
Currently there is no way to do the same for variables in enclosing function scopes, but Python 3 introduces a new keyword, "nonlocal" which will act in a similar way to global, but for nested function scopes.
There was no thorough answer concerning Python3 time, so I made an answer here. Most of what is described here is detailed in the 4.2.2 Resolution of names of the Python 3 documentation.
As provided in other answers, there are 4 basic scopes, the LEGB, for Local, Enclosing, Global and Builtin. In addition to those, there is a special scope, the class body, which does not comprise an enclosing scope for methods defined within the class; any assignments within the class body make the variable from there on be bound in the class body.
Especially, no block statement, besides def and class, create a variable scope. In Python 2 a list comprehension does not create a variable scope, however in Python 3 the loop variable within list comprehensions is created in a new scope.
To demonstrate the peculiarities of the class body
x = 0
class X(object):
y = x
x = x + 1 # x is now a variable
z = x
def method(self):
print(self.x) # -> 1
print(x) # -> 0, the global x
print(y) # -> NameError: global name 'y' is not defined
inst = X()
print(inst.x, inst.y, inst.z, x) # -> (1, 0, 1, 0)
Thus unlike in function body, you can reassign the variable to the same name in class body, to get a class variable with the same name; further lookups on this name resolve
to the class variable instead.
One of the greater surprises to many newcomers to Python is that a for loop does not create a variable scope. In Python 2 the list comprehensions do not create a scope either (while generators and dict comprehensions do!) Instead they leak the value in the function or the global scope:
>>> [ i for i in range(5) ]
>>> i
4
The comprehensions can be used as a cunning (or awful if you will) way to make modifiable variables within lambda expressions in Python 2 - a lambda expression does create a variable scope, like the def statement would, but within lambda no statements are allowed. Assignment being a statement in Python means that no variable assignments in lambda are allowed, but a list comprehension is an expression...
This behaviour has been fixed in Python 3 - no comprehension expressions or generators leak variables.
The global really means the module scope; the main python module is the __main__; all imported modules are accessible through the sys.modules variable; to get access to __main__ one can use sys.modules['__main__'], or import __main__; it is perfectly acceptable to access and assign attributes there; they will show up as variables in the global scope of the main module.
If a name is ever assigned to in the current scope (except in the class scope), it will be considered belonging to that scope, otherwise it will be considered to belonging to any enclosing scope that assigns to the variable (it might not be assigned yet, or not at all), or finally the global scope. If the variable is considered local, but it is not set yet, or has been deleted, reading the variable value will result in UnboundLocalError, which is a subclass of NameError.
x = 5
def foobar():
print(x) # causes UnboundLocalError!
x += 1 # because assignment here makes x a local variable within the function
# call the function
foobar()
The scope can declare that it explicitly wants to modify the global (module scope) variable, with the global keyword:
x = 5
def foobar():
global x
print(x)
x += 1
foobar() # -> 5
print(x) # -> 6
This also is possible even if it was shadowed in enclosing scope:
x = 5
y = 13
def make_closure():
x = 42
y = 911
def func():
global x # sees the global value
print(x, y)
x += 1
return func
func = make_closure()
func() # -> 5 911
print(x, y) # -> 6 13
In python 2 there is no easy way to modify the value in the enclosing scope; usually this is simulated by having a mutable value, such as a list with length of 1:
def make_closure():
value = [0]
def get_next_value():
value[0] += 1
return value[0]
return get_next_value
get_next = make_closure()
print(get_next()) # -> 1
print(get_next()) # -> 2
However in python 3, the nonlocal comes to rescue:
def make_closure():
value = 0
def get_next_value():
nonlocal value
value += 1
return value
return get_next_value
get_next = make_closure() # identical behavior to the previous example.
The nonlocal documentation says that
Names listed in a nonlocal statement, unlike those listed in a global statement, must refer to pre-existing bindings in an enclosing scope (the scope in which a new binding should be created cannot be determined unambiguously).
i.e. nonlocal always refers to the innermost outer non-global scope where the name has been bound (i.e. assigned to, including used as the for target variable, in the with clause, or as a function parameter).
Any variable that is not deemed to be local to the current scope, or any enclosing scope, is a global variable. A global name is looked up in the module global dictionary; if not found, the global is then looked up from the builtins module; the name of the module was changed from python 2 to python 3; in python 2 it was __builtin__ and in python 3 it is now called builtins. If you assign to an attribute of builtins module, it will be visible thereafter to any module as a readable global variable, unless that module shadows them with its own global variable with the same name.
Reading the builtin module can also be useful; suppose that you want the python 3 style print function in some parts of file, but other parts of file still use the print statement. In Python 2.6-2.7 you can get hold of the Python 3 print function with:
import __builtin__
print3 = __builtin__.__dict__['print']
The from __future__ import print_function actually does not import the print function anywhere in Python 2 - instead it just disables the parsing rules for print statement in the current module, handling print like any other variable identifier, and thus allowing the print the function be looked up in the builtins.
A slightly more complete example of scope:
from __future__ import print_function # for python 2 support
x = 100
print("1. Global x:", x)
class Test(object):
y = x
print("2. Enclosed y:", y)
x = x + 1
print("3. Enclosed x:", x)
def method(self):
print("4. Enclosed self.x", self.x)
print("5. Global x", x)
try:
print(y)
except NameError as e:
print("6.", e)
def method_local_ref(self):
try:
print(x)
except UnboundLocalError as e:
print("7.", e)
x = 200 # causing 7 because has same name
print("8. Local x", x)
inst = Test()
inst.method()
inst.method_local_ref()
output:
1. Global x: 100
2. Enclosed y: 100
3. Enclosed x: 101
4. Enclosed self.x 101
5. Global x 100
6. global name 'y' is not defined
7. local variable 'x' referenced before assignment
8. Local x 200
The scoping rules for Python 2.x have been outlined already in other answers. The only thing I would add is that in Python 3.0, there is also the concept of a non-local scope (indicated by the 'nonlocal' keyword). This allows you to access outer scopes directly, and opens up the ability to do some neat tricks, including lexical closures (without ugly hacks involving mutable objects).
EDIT: Here's the PEP with more information on this.
Python resolves your variables with -- generally -- three namespaces available.
At any time during execution, there
are at least three nested scopes whose
namespaces are directly accessible:
the innermost scope, which is searched
first, contains the local names; the
namespaces of any enclosing functions,
which are searched starting with the
nearest enclosing scope; the middle
scope, searched next, contains the
current module's global names; and the
outermost scope (searched last) is the
namespace containing built-in names.
There are two functions: globals and locals which show you the contents two of these namespaces.
Namespaces are created by packages, modules, classes, object construction and functions. There aren't any other flavors of namespaces.
In this case, the call to a function named x has to be resolved in the local name space or the global namespace.
Local in this case, is the body of the method function Foo.spam.
Global is -- well -- global.
The rule is to search the nested local spaces created by method functions (and nested function definitions), then search global. That's it.
There are no other scopes. The for statement (and other compound statements like if and try) don't create new nested scopes. Only definitions (packages, modules, functions, classes and object instances.)
Inside a class definition, the names are part of the class namespace. code2, for instance, must be qualified by the class name. Generally Foo.code2. However, self.code2 will also work because Python objects look at the containing class as a fall-back.
An object (an instance of a class) has instance variables. These names are in the object's namespace. They must be qualified by the object. (variable.instance.)
From within a class method, you have locals and globals. You say self.variable to pick the instance as the namespace. You'll note that self is an argument to every class member function, making it part of the local namespace.
See Python Scope Rules, Python Scope, Variable Scope.
Where is x found?
x is not found as you haven't defined it. :-) It could be found in code1 (global) or code3 (local) if you put it there.
code2 (class members) aren't visible to code inside methods of the same class — you would usually access them using self. code4/code5 (loops) live in the same scope as code3, so if you wrote to x in there you would be changing the x instance defined in code3, not making a new x.
Python is statically scoped, so if you pass ‘spam’ to another function spam will still have access to globals in the module it came from (defined in code1), and any other containing scopes (see below). code2 members would again be accessed through self.
lambda is no different to def. If you have a lambda used inside a function, it's the same as defining a nested function. In Python 2.2 onwards, nested scopes are available. In this case you can bind x at any level of function nesting and Python will pick up the innermost instance:
x= 0
def fun1():
x= 1
def fun2():
x= 2
def fun3():
return x
return fun3()
return fun2()
print fun1(), x
2 0
fun3 sees the instance x from the nearest containing scope, which is the function scope associated with fun2. But the other x instances, defined in fun1 and globally, are not affected.
Before nested_scopes — in Python pre-2.1, and in 2.1 unless you specifically ask for the feature using a from-future-import — fun1 and fun2's scopes are not visible to fun3, so S.Lott's answer holds and you would get the global x:
0 0
The Python name resolution only knows the following kinds of scope:
builtins scope which provides the Builtin Functions, such as print, int, or zip,
module global scope which is always the top-level of the current module,
three user-defined scopes that can be nested into each other, namely
function closure scope, from any enclosing def block, lambda expression or comprehension.
function local scope, inside a def block, lambda expression or comprehension,
class scope, inside a class block.
Notably, other constructs such as if, for, or with statements do not have their own scope.
The scoping TLDR: The lookup of a name begins at the scope in which the name is used, then any enclosing scopes (excluding class scopes), to the module globals, and finally the builtins – the first match in this search order is used.
The assignment to a scope is by default to the current scope – the special forms nonlocal and global must be used to assign to a name from an outer scope.
Finally, comprehensions and generator expressions as well as := asignment expressions have one special rule when combined.
Nested Scopes and Name Resolution
These different scopes build a hierarchy, with builtins then global always forming the base, and closures, locals and class scope being nested as lexically defined. That is, only the nesting in the source code matters, not for example the call stack.
print("builtins are available without definition")
some_global = "1" # global variables are at module scope
def outer_function():
some_closure = "3.1" # locals and closure are defined the same, at function scope
some_local = "3.2" # a variable becomes a closure if a nested scope uses it
class InnerClass:
some_classvar = "3.3" # class variables exist *only* at class scope
def inner_function(self):
some_local = "3.2" # locals can replace outer names
print(some_closure) # closures are always readable
return InnerClass
Even though class creates a scope and may have nested classes, functions and comprehensions, the names of the class scope are not visible to enclosed scopes. This creates the following hierarchy:
┎ builtins [print, ...]
┗━┱ globals [some_global]
┗━┱ outer_function [some_local, some_closure]
┣━╾ InnerClass [some_classvar]
┗━╾ inner_function [some_local]
Name resolution always starts at the current scope in which a name is accessed, then goes up the hierarchy until a match is found. For example, looking up some_local inside outer_function and inner_function starts at the respective function - and immediately finds the some_local defined in outer_function and inner_function, respectively. When a name is not local, it is fetched from the nearest enclosing scope that defines it – looking up some_closure and print inside inner_function searches until outer_function and builtins, respectively.
Scope Declarations and Name Binding
By default, a name belongs to any scope in which it is bound to a value. Binding the same name again in an inner scope creates a new variable with the same name - for example, some_local exists separately in both outer_function and inner_function. As far as scoping is concerned, binding includes any statement that sets the value of a name – assignment statements, but also the iteration variable of a for loop, or the name of a with context manager. Notably, del also counts as name binding.
When a name must refer to an outer variable and be bound in an inner scope, the name must be declared as not local. Separate declarations exists for the different kinds of enclosing scopes: nonlocal always refers to the nearest closure, and global always refers to a global name. Notably, nonlocal never refers to a global name and global ignores all closures of the same name. There is no declaration to refer to the builtin scope.
some_global = "1"
def outer_function():
some_closure = "3.2"
some_global = "this is ignored by a nested global declaration"
def inner_function():
global some_global # declare variable from global scope
nonlocal some_closure # declare variable from enclosing scope
message = " bound by an inner scope"
some_global = some_global + message
some_closure = some_closure + message
return inner_function
Of note is that function local and nonlocal are resolved at compile time. A nonlocal name must exist in some outer scope. In contrast, a global name can be defined dynamically and may be added or removed from the global scope at any time.
Comprehensions and Assignment Expressions
The scoping rules of list, set and dict comprehensions and generator expressions are almost the same as for functions. Likewise, the scoping rules for assignment expressions are almost the same as for regular name binding.
The scope of comprehensions and generator expressions is of the same kind as function scope. All names bound in the scope, namely the iteration variables, are locals or closures to the comprehensions/generator and nested scopes. All names, including iterables, are resolved using name resolution as applicable inside functions.
some_global = "global"
def outer_function():
some_closure = "closure"
return [ # new function-like scope started by comprehension
comp_local # names resolved using regular name resolution
for comp_local # iteration targets are local
in "iterable"
if comp_local in some_global and comp_local in some_global
]
An := assignment expression works on the nearest function, class or global scope. Notably, if the target of an assignment expression has been declared nonlocal or global in the nearest scope, the assignment expression honors this like a regular assignment.
print(some_global := "global")
def outer_function():
print(some_closure := "closure")
However, an assignment expression inside a comprehension/generator works on the nearest enclosing scope of the comprehension/generator, not the scope of the comprehension/generator itself. When several comprehensions/generators are nested, the nearest function or global scope is used. Since the comprehension/generator scope can read closures and global variables, the assignment variable is readable in the comprehension as well. Assigning from a comprehension to a class scope is not valid.
print(some_global := "global")
def outer_function():
print(some_closure := "closure")
steps = [
# v write to variable in containing scope
(some_closure := some_closure + comp_local)
# ^ read from variable in containing scope
for comp_local in some_global
]
return some_closure, steps
While the iteration variable is local to the comprehension in which it is bound, the target of the assignment expression does not create a local variable and is read from the outer scope:
┎ builtins [print, ...]
┗━┱ globals [some_global]
┗━┱ outer_function [some_closure]
┗━╾ <listcomp> [comp_local]
In Python,
any variable that is assigned a value is local to the block in which
the assignment appears.
If a variable can't be found in the current scope, please refer to the LEGB order.

List defined in __init__ seems to be global [duplicate]

Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
def foo(a=[]):
a.append(5)
return a
Python novices would expect this function called with no parameter to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):
>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)
Edit:
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:
>>> def a():
... print("a executed")
... return []
...
>>>
>>> def b(x=a()):
... x.append(5)
... print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function, or "together" with it?
Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
Actually, this is not a design flaw, and it is not because of internals or performance. It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.
As soon as you think of it this way, then it completely makes sense: a function is an object being evaluated on its definition; default parameters are kind of "member data" and therefore their state may change from one call to the other - exactly as in any other object.
In any case, the effbot (Fredrik Lundh) has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.
Suppose you have the following code
fruits = ("apples", "bananas", "loganberries")
def eat(food=fruits):
...
When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, that it will be equal to the tuple ("apples", "bananas", "loganberries")
However, suppose later on in the code, I do something like
def some_random_function():
global fruits
fruits = ("blueberries", "mangos")
then if default parameters were bound at function execution rather than function declaration, I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your foo function above was mutating the list.
The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose in Java I have the following code:
StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) ); // does this work?
Now, does my map use the value of the StringBuffer key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of the Map using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map (this is actually why Python doesn't allow its mutable built-in data types to be used as dictionary keys).
Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, then that would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they're writing.
I personally like Python's current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.
The relevant part of the documentation:
Default parameter values are evaluated from left to right when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function, e.g.:
def whats_on_the_telly(penguin=None):
if penguin is None:
penguin = []
penguin.append("property of the zoo")
return penguin
I know nothing about the Python interpreter inner workings (and I'm not an expert in compilers and interpreters either) so don't blame me if I propose anything unsensible or impossible.
Provided that python objects are mutable I think that this should be taken into account when designing the default arguments stuff.
When you instantiate a list:
a = []
you expect to get a new list referenced by a.
Why should the a=[] in
def x(a=[]):
instantiate a new list on function definition and not on invocation?
It's just like you're asking "if the user doesn't provide the argument then instantiate a new list and use it as if it was produced by the caller".
I think this is ambiguous instead:
def x(a=datetime.datetime.now()):
user, do you want a to default to the datetime corresponding to when you're defining or executing x?
In this case, as in the previous one, I'll keep the same behaviour as if the default argument "assignment" was the first instruction of the function (datetime.now() called on function invocation).
On the other hand, if the user wanted the definition-time mapping he could write:
b = datetime.datetime.now()
def x(a=b):
I know, I know: that's a closure. Alternatively Python might provide a keyword to force definition-time binding:
def x(static a=b):
Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well... when the functions is defined.
Compare this:
class BananaBunch:
bananas = []
def addBanana(self, banana):
self.bananas.append(banana)
This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it's added to all instances of that class. The reason is exactly the same.
It's just "How It Works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.
Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll grok python much better.
That said it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.
Why don't you introspect?
I'm really surprised no one has performed the insightful introspection offered by Python (2 and 3 apply) on callables.
Given a simple little function func defined as:
>>> def func(a = []):
... a.append(5)
When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. While this compilation step is done, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the top answer mentioned: the list a can now be considered a member of the function func.
So, let's do some introspection, a before and after to examine how the list gets expanded inside the function object. I'm using Python 3.x for this, for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).
Function Before Execution:
>>> def func(a = []):
... a.append(5)
...
After Python executes this definition it will take any default parameters specified (a = [] here) and cram them in the __defaults__ attribute for the function object (relevant section: Callables):
>>> func.__defaults__
([],)
O.k, so an empty list as the single entry in __defaults__, just as expected.
Function After Execution:
Let's now execute this function:
>>> func()
Now, let's see those __defaults__ again:
>>> func.__defaults__
([5],)
Astonished? The value inside the object changes! Consecutive calls to the function will now simply append to that embedded list object:
>>> func(); func(); func()
>>> func.__defaults__
([5, 5, 5, 5],)
So, there you have it, the reason why this 'flaw' happens, is because default arguments are part of the function object. There's nothing weird going on here, it's all just a bit surprising.
The common solution to combat this is to use None as the default and then initialize in the function body:
def func(a = None):
# or: a = [] if a is None else a
if a is None:
a = []
Since the function body is executed anew each time, you always get a fresh new empty list if no argument was passed for a.
To further verify that the list in __defaults__ is the same as that used in the function func you can just change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__) and you'll see how these are indeed refering to the same list instance:
>>> def func(a = []):
... a.append(5)
... return id(a)
>>>
>>> id(func.__defaults__[0]) == func()
True
All with the power of introspection!
* To verify that Python evaluates the default arguments during compilation of the function, try executing the following:
def bar(a=input('Did you just see me without calling the function?')):
pass # use raw_input in Py2
as you'll notice, input() is called before the process of building the function and binding it to the name bar is made.
I used to think that creating the objects at runtime would be the better approach. I'm less certain now, since you do lose some useful features, though it may be worth it regardless simply to prevent newbie confusion. The disadvantages of doing so are:
1. Performance
def foo(arg=something_expensive_to_compute())):
...
If call-time evaluation is used, then the expensive function is called every time your function is used without an argument. You'd either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.
2. Forcing bound parameters
A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:
funcs = [ lambda i=i: i for i in range(10)]
This returns a list of functions that return 0,1,2,3... respectively. If the behaviour is changed, they will instead bind i to the call-time value of i, so you would get a list of functions that all returned 9.
The only way to implement this otherwise would be to create a further closure with the i bound, ie:
def make_func(i): return lambda: i
funcs = [make_func(i) for i in range(10)]
3. Introspection
Consider the code:
def foo(a='test', b=100, c=[]):
print a,b,c
We can get information about the arguments and defaults using the inspect module, which
>>> inspect.getargspec(foo)
(['a', 'b', 'c'], None, None, ('test', 100, []))
This information is very useful for things like document generation, metaprogramming, decorators etc.
Now, suppose the behaviour of defaults could be changed so that this is the equivalent of:
_undefined = object() # sentinel value
def foo(a=_undefined, b=_undefined, c=_undefined)
if a is _undefined: a='test'
if b is _undefined: b=100
if c is _undefined: c=[]
However, we've lost the ability to introspect, and see what the default arguments are. Because the objects haven't been constructed, we can't ever get hold of them without actually calling the function. The best we could do is to store off the source code and return that as a string.
5 points in defense of Python
Simplicity: The behavior is simple in the following sense:
Most people fall into this trap only once, not several times.
Consistency: Python always passes objects, not names.
The default parameter is, obviously, part of the function
heading (not the function body). It therefore ought to be evaluated
at module load time (and only at module load time, unless nested), not
at function call time.
Usefulness: As Frederik Lundh points out in his explanation
of "Default Parameter Values in Python", the
current behavior can be quite useful for advanced programming.
(Use sparingly.)
Sufficient documentation: In the most basic Python documentation,
the tutorial, the issue is loudly announced as
an "Important warning" in the first subsection of Section
"More on Defining Functions".
The warning even uses boldface,
which is rarely applied outside of headings.
RTFM: Read the fine manual.
Meta-learning: Falling into the trap is actually a very
helpful moment (at least if you are a reflective learner),
because you will subsequently better understand the point
"Consistency" above and that will
teach you a great deal about Python.
This behavior is easy explained by:
function (class etc.) declaration is executed only once, creating all default value objects
everything is passed by reference
So:
def x(a=0, b=[], c=[], d=0):
a = a + 1
b = b + [1]
c.append(1)
print a, b, c
a doesn't change - every assignment call creates new int object - new object is printed
b doesn't change - new array is build from default value and printed
c changes - operation is performed on same object - and it is printed
1) The so-called problem of "Mutable Default Argument" is in general a special example demonstrating that:
"All functions with this problem suffer also from similar side effect problem on the actual parameter,"
That is against the rules of functional programming, usually undesiderable and should be fixed both together.
Example:
def foo(a=[]): # the same problematic function
a.append(5)
return a
>>> somevar = [1, 2] # an example without a default parameter
>>> foo(somevar)
[1, 2, 5]
>>> somevar
[1, 2, 5] # usually expected [1, 2]
Solution: a copy
An absolutely safe solution is to copy or deepcopy the input object first and then to do whatever with the copy.
def foo(a=[]):
a = a[:] # a copy
a.append(5)
return a # or everything safe by one line: "return a + [5]"
Many builtin mutable types have a copy method like some_dict.copy() or some_set.copy() or can be copied easy like somelist[:] or list(some_list). Every object can be also copied by copy.copy(any_object) or more thorough by copy.deepcopy() (the latter useful if the mutable object is composed from mutable objects). Some objects are fundamentally based on side effects like "file" object and can not be meaningfully reproduced by copy. copying
Example problem for a similar SO question
class Test(object): # the original problematic class
def __init__(self, var1=[]):
self._var1 = var1
somevar = [1, 2] # an example without a default parameter
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])
print somevar # [1, 2, [1]] but usually expected [1, 2]
print t2._var1 # [1, 2, [1]] but usually expected [1, 2]
It shouldn't be neither saved in any public attribute of an instance returned by this function. (Assuming that private attributes of instance should not be modified from outside of this class or subclasses by convention. i.e. _var1 is a private attribute )
Conclusion:
Input parameters objects shouldn't be modified in place (mutated) nor they should not be binded into an object returned by the function. (If we prefere programming without side effects which is strongly recommended. see Wiki about "side effect" (The first two paragraphs are relevent in this context.)
.)
2)
Only if the side effect on the actual parameter is required but unwanted on the default parameter then the useful solution is def ...(var1=None): if var1 is None: var1 = [] More..
3) In some cases is the mutable behavior of default parameters useful.
What you're asking is why this:
def func(a=[], b = 2):
pass
isn't internally equivalent to this:
def func(a=None, b = None):
a_default = lambda: []
b_default = lambda: 2
def actual_func(a=None, b=None):
if a is None: a = a_default()
if b is None: b = b_default()
return actual_func
func = func()
except for the case of explicitly calling func(None, None), which we'll ignore.
In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?
One answer is probably right there--it would effectively turn every function with default parameters into a closure. Even if it's all hidden away in the interpreter and not a full-blown closure, the data's got to be stored somewhere. It'd be slower and use more memory.
This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.
>>> def foo(a):
a.append(5)
print a
>>> a = [5]
>>> foo(a)
[5, 5]
>>> foo(a)
[5, 5, 5]
>>> foo(a)
[5, 5, 5, 5]
>>> foo(a)
[5, 5, 5, 5, 5]
No default values in sight in this code, but you get exactly the same problem.
The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn't expect this. Code like this would be fine if the function was called something like append_5; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn't return the list (since the caller already has a reference to that list; the one it just passed in).
Your original foo, with a default argument, shouldn't be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we're in Python or not and whether there are default arguments involved or not.
If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.
Python: The Mutable Default Argument
Default arguments get evaluated at the time the function is compiled into a function object. When used by the function, multiple times by that function, they are and remain the same object.
When they are mutable, when mutated (for example, by adding an element to it) they remain mutated on consecutive calls.
They stay mutated because they are the same object each time.
Equivalent code:
Since the list is bound to the function when the function object is compiled and instantiated, this:
def foo(mutable_default_argument=[]): # make a list the default argument
"""function that uses a list"""
is almost exactly equivalent to this:
_a_list = [] # create a list in the globals
def foo(mutable_default_argument=_a_list): # make it the default argument
"""function that uses a list"""
del _a_list # remove globals name binding
Demonstration
Here's a demonstration - you can verify that they are the same object each time they are referenced by
seeing that the list is created before the function has finished compiling to a function object,
observing that the id is the same each time the list is referenced,
observing that the list stays changed when the function that uses it is called a second time,
observing the order in which the output is printed from the source (which I conveniently numbered for you):
example.py
print('1. Global scope being evaluated')
def create_list():
'''noisily create a list for usage as a kwarg'''
l = []
print('3. list being created and returned, id: ' + str(id(l)))
return l
print('2. example_function about to be compiled to an object')
def example_function(default_kwarg1=create_list()):
print('appending "a" in default default_kwarg1')
default_kwarg1.append("a")
print('list with id: ' + str(id(default_kwarg1)) +
' - is now: ' + repr(default_kwarg1))
print('4. example_function compiled: ' + repr(example_function))
if __name__ == '__main__':
print('5. calling example_function twice!:')
example_function()
example_function()
and running it with python example.py:
1. Global scope being evaluated
2. example_function about to be compiled to an object
3. list being created and returned, id: 140502758808032
4. example_function compiled: <function example_function at 0x7fc9590905f0>
5. calling example_function twice!:
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a']
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a', 'a']
Does this violate the principle of "Least Astonishment"?
This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.
The usual instruction to new Python users:
But this is why the usual instruction to new users is to create their default arguments like this instead:
def example_function_2(default_kwarg=None):
if default_kwarg is None:
default_kwarg = []
This uses the None singleton as a sentinel object to tell the function whether or not we've gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [], as the default.
As the tutorial section on control flow says:
If you don’t want the default to be shared between subsequent calls,
you can write the function like this instead:
def f(a, L=None):
if L is None:
L = []
L.append(a)
return L
Already busy topic, but from what I read here, the following helped me realizing how it's working internally:
def bar(a=[]):
print id(a)
a = a + [1]
print id(a)
return a
>>> bar()
4484370232
4484524224
[1]
>>> bar()
4484370232
4484524152
[1]
>>> bar()
4484370232 # Never change, this is 'class property' of the function
4484523720 # Always a new object
[1]
>>> id(bar.func_defaults[0])
4484370232
The shortest answer would probably be "definition is execution", therefore the whole argument makes no strict sense. As a more contrived example, you may cite this:
def a(): return []
def b(x=a()):
print x
Hopefully it's enough to show that not executing the default argument expressions at the execution time of the def statement isn't easy or doesn't make sense, or both.
I agree it's a gotcha when you try to use default constructors, though.
It's a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?
def print_tuple(some_tuple=(1,2,3)):
print some_tuple
print_tuple() #1
print_tuple((1,2,3)) #2
I'll give you a hint. Here's the disassembly (see http://docs.python.org/library/dis.html):
#1
0 LOAD_GLOBAL 0 (print_tuple)
3 CALL_FUNCTION 0
6 POP_TOP
7 LOAD_CONST 0 (None)
10 RETURN_VALUE
#2
0 LOAD_GLOBAL 0 (print_tuple)
3 LOAD_CONST 4 ((1, 2, 3))
6 CALL_FUNCTION 1
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs ?)
As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it's a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn't C. In C you have constants that are pretty much free. In Python you don't have this benefit.
This behavior is not surprising if you take the following into consideration:
The behavior of read-only class attributes upon assignment attempts, and that
Functions are objects (explained well in the accepted answer).
The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.
(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:
...all variables found outside of the innermost scope are
read-only (an attempt to write to such a variable will simply create a
new local variable in the innermost scope, leaving the identically
named outer variable unchanged).
Look back to the original example and consider the above points:
def foo(a=[]):
a.append(5)
return a
Here foo is an object and a is an attribute of foo (available at foo.func_defs[0]). Since a is a list, a is mutable and is thus a read-write attribute of foo. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.
Calling foo without overriding a default uses that default's value from foo.func_defs. In this case, foo.func_defs[0] is used for a within function object's code scope. Changes to a change foo.func_defs[0], which is part of the foo object and persists between execution of the code in foo.
Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:
def foo(a, L=None):
if L is None:
L = []
L.append(a)
return L
Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:
When the foo function object is instantiated, foo.func_defs[0] is set to None, an immutable object.
When the function is executed with defaults (with no parameter specified for L in the function call), foo.func_defs[0] (None) is available in the local scope as L.
Upon L = [], the assignment cannot succeed at foo.func_defs[0], because that attribute is read-only.
Per (1), a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.func_defs[0] thus remains unchanged for future invocations of foo.
A simple workaround using None
>>> def bar(b, data=None):
... data = data or []
... data.append(b)
... return data
...
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3, [34])
[34, 3]
>>> bar(3, [34])
[34, 3]
It may be true that:
Someone is using every language/library feature, and
Switching the behavior here would be ill-advised, but
it is entirely consistent to hold to both of the features above and still make another point:
It is a confusing feature and it is unfortunate in Python.
The other answers, or at least some of them either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.
It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano's opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,
The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If you understand the language much better by trying to trace out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other language. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working--not because of a design fl--I mean, hidden logic puzzle--that cuts against the intuitions of programmers who are drawn to Python because it Just Works.
I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).
As others have extensively commented, the list parameter is bound to the function when it is defined as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list which may have been altered by any other calls to the function. Worse yet, two parameters are using this function's shared parameter at the same time oblivious to the changes made by the other.
Wrong Method (probably...):
def foo(list_arg=[5]):
return list_arg
a = foo()
a.append(6)
>>> a
[5, 6]
b = foo()
b.append(7)
# The value of 6 appended to variable 'a' is now part of the list held by 'b'.
>>> b
[5, 6, 7]
# Although 'a' is expecting to receive 6 (the last element it appended to the list),
# it actually receives the last element appended to the shared list.
# It thus receives the value 7 previously appended by 'b'.
>>> a.pop()
7
You can verify that they are one and the same object by using id:
>>> id(a)
5347866528
>>> id(b)
5347866528
Per Brett Slatkin's "Effective Python: 59 Specific Ways to Write Better Python", Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)
The convention for achieving the desired result in Python is to
provide a default value of None and to document the actual behaviour
in the docstring.
This implementation ensures that each call to the function either receives the default list or else the list passed to the function.
Preferred Method:
def foo(list_arg=None):
"""
:param list_arg: A list of input values.
If none provided, used a list with a default value of 5.
"""
if not list_arg:
list_arg = [5]
return list_arg
a = foo()
a.append(6)
>>> a
[5, 6]
b = foo()
b.append(7)
>>> b
[5, 7]
c = foo([10])
c.append(11)
>>> c
[10, 11]
There may be legitimate use cases for the 'Wrong Method' whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.
The solutions here are:
Use None as your default value (or a nonce object), and switch on that to create your values at runtime; or
Use a lambda as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).
The second option is nice because users of the function can pass in a callable, which may be already existing (such as a type)
You can get round this by replacing the object (and therefore the tie with the scope):
def foo(a=[]):
a = list(a)
a.append(5)
return a
Ugly, but it works.
When we do this:
def foo(a=[]):
...
... we assign the argument a to an unnamed list, if the caller does not pass the value of a.
To make things simpler for this discussion, let's temporarily give the unnamed list a name. How about pavlo ?
def foo(a=pavlo):
...
At any time, if the caller doesn't tell us what a is, we reuse pavlo.
If pavlo is mutable (modifiable), and foo ends up modifying it, an effect we notice the next time foo is called without specifying a.
So this is what you see (Remember, pavlo is initialized to []):
>>> foo()
[5]
Now, pavlo is [5].
Calling foo() again modifies pavlo again:
>>> foo()
[5, 5]
Specifying a when calling foo() ensures pavlo is not touched.
>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]
So, pavlo is still [5, 5].
>>> foo()
[5, 5, 5]
I sometimes exploit this behavior as an alternative to the following pattern:
singleton = None
def use_singleton():
global singleton
if singleton is None:
singleton = _make_singleton()
return singleton.use_me()
If singleton is only used by use_singleton, I like the following pattern as a replacement:
# _make_singleton() is called only once when the def is executed
def use_singleton(singleton=_make_singleton()):
return singleton.use_me()
I've used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.
Since I don't think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
Every other answer explains why this is actually a nice and desired behavior, or why you shouldn't be needing this anyway. Mine is for those stubborn ones who want to exercise their right to bend the language to their will, not the other way around.
We will "fix" this behavior with a decorator that will copy the default value instead of reusing the same instance for each positional argument left at its default value.
import inspect
from copy import deepcopy # copy would fail on deep arguments like nested dicts
def sanify(function):
def wrapper(*a, **kw):
# store the default values
defaults = inspect.getargspec(function).defaults # for python2
# construct a new argument list
new_args = []
for i, arg in enumerate(defaults):
# allow passing positional arguments
if i in range(len(a)):
new_args.append(a[i])
else:
# copy the value
new_args.append(deepcopy(arg))
return function(*new_args, **kw)
return wrapper
Now let's redefine our function using this decorator:
#sanify
def foo(a=[]):
a.append(5)
return a
foo() # '[5]'
foo() # '[5]' -- as desired
This is particularly neat for functions that take multiple arguments. Compare:
# the 'correct' approach
def bar(a=None, b=None, c=None):
if a is None:
a = []
if b is None:
b = []
if c is None:
c = []
# finally do the actual work
with
# the nasty decorator hack
#sanify
def bar(a=[], b=[], c=[]):
# wow, works right out of the box!
It's important to note that the above solution breaks if you try to use keyword args, like so:
foo(a=[4])
The decorator could be adjusted to allow for that, but we leave this as an exercise for the reader ;)
This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)
I'm gonna give you what I see as a useful example.
def example(errors=[]):
# statements
# Something went wrong
mistake = True
if mistake:
tryToFixIt(errors)
# Didn't work.. let's try again
tryToFixItAnotherway(errors)
# This time it worked
return errors
def tryToFixIt(err):
err.append('Attempt to fix it')
def tryToFixItAnotherway(err):
err.append('Attempt to fix it by another way')
def main():
for item in range(2):
errors = example()
print '\n'.join(errors)
main()
prints the following
Attempt to fix it
Attempt to fix it by another way
Attempt to fix it
Attempt to fix it by another way
This is not a design flaw. Anyone who trips over this is doing something wrong.
There are 3 cases I see where you might run into this problem:
You intend to modify the argument as a side effect of the function. In this case it never makes sense to have a default argument. The only exception is when you're abusing the argument list to have function attributes, e.g. cache={}, and you wouldn't be expected to call the function with an actual argument at all.
You intend to leave the argument unmodified, but you accidentally did modify it. That's a bug, fix it.
You intend to modify the argument for use inside the function, but didn't expect the modification to be viewable outside of the function. In that case you need to make a copy of the argument, whether it was the default or not! Python is not a call-by-value language so it doesn't make the copy for you, you need to be explicit about it.
The example in the question could fall into category 1 or 3. It's odd that it both modifies the passed list and returns it; you should pick one or the other.
Just change the function to be:
def notastonishinganymore(a = []):
'''The name is just a joke :)'''
a = a[:]
a.append(5)
return a
TLDR: Define-time defaults are consistent and strictly more expressive.
Defining a function affects two scopes: the defining scope containing the function, and the execution scope contained by the function. While it is pretty clear how blocks map to scopes, the question is where def <name>(<args=defaults>): belongs to:
... # defining scope
def name(parameter=default): # ???
... # execution scope
The def name part must evaluate in the defining scope - we want name to be available there, after all. Evaluating the function only inside itself would make it inaccessible.
Since parameter is a constant name, we can "evaluate" it at the same time as def name. This also has the advantage it produces the function with a known signature as name(parameter=...):, instead of a bare name(...):.
Now, when to evaluate default?
Consistency already says "at definition": everything else of def <name>(<args=defaults>): is best evaluated at definition as well. Delaying parts of it would be the astonishing choice.
The two choices are not equivalent, either: If default is evaluated at definition time, it can still affect execution time. If default is evaluated at execution time, it cannot affect definition time. Choosing "at definition" allows expressing both cases, while choosing "at execution" can express only one:
def name(parameter=defined): # set default at definition time
...
def name(parameter=default): # delay default until execution time
parameter = default if parameter is None else parameter
...
Yes, this is a design flaw in Python
I've read all the other answers and I'm not convinced. This design does violate the principle of least astonishment.
The defaults could have been designed to be evaluated when the function is called, rather than when the function is defined. This is how Javascript does it:
function foo(a=[]) {
a.push(5);
return a;
}
console.log(foo()); // [5]
console.log(foo()); // [5]
console.log(foo()); // [5]
As further evidence that this is a design flaw, Python core developers are currently discussing introducing new syntax to fix this problem. See this article: Late-bound argument defaults for Python.
For even more evidence that this a design flaw, if you Google "Python gotchas", this design is mentioned as a gotcha, usually the first gotcha in the list, in the first 9 Google results (1, 2, 3, 4, 5, 6, 7, 8, 9). In contrast, if you Google "Javascript gotchas", the behaviour of default arguments in Javascript is not mentioned as a gotcha even once.
Gotchas, by definition, violate the principle of least astonishment. They astonish. Given there are superiour designs for the behaviour of default argument values, the inescapable conclusion is that Python's behaviour here represents a design flaw.

python user define function is replacing variable value [duplicate]

Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
def foo(a=[]):
a.append(5)
return a
Python novices would expect this function called with no parameter to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):
>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)
Edit:
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:
>>> def a():
... print("a executed")
... return []
...
>>>
>>> def b(x=a()):
... x.append(5)
... print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function, or "together" with it?
Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
Actually, this is not a design flaw, and it is not because of internals or performance. It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.
As soon as you think of it this way, then it completely makes sense: a function is an object being evaluated on its definition; default parameters are kind of "member data" and therefore their state may change from one call to the other - exactly as in any other object.
In any case, the effbot (Fredrik Lundh) has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.
Suppose you have the following code
fruits = ("apples", "bananas", "loganberries")
def eat(food=fruits):
...
When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, that it will be equal to the tuple ("apples", "bananas", "loganberries")
However, suppose later on in the code, I do something like
def some_random_function():
global fruits
fruits = ("blueberries", "mangos")
then if default parameters were bound at function execution rather than function declaration, I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your foo function above was mutating the list.
The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose in Java I have the following code:
StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) ); // does this work?
Now, does my map use the value of the StringBuffer key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of the Map using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map (this is actually why Python doesn't allow its mutable built-in data types to be used as dictionary keys).
Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, then that would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they're writing.
I personally like Python's current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.
The relevant part of the documentation:
Default parameter values are evaluated from left to right when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function, e.g.:
def whats_on_the_telly(penguin=None):
if penguin is None:
penguin = []
penguin.append("property of the zoo")
return penguin
I know nothing about the Python interpreter inner workings (and I'm not an expert in compilers and interpreters either) so don't blame me if I propose anything unsensible or impossible.
Provided that python objects are mutable I think that this should be taken into account when designing the default arguments stuff.
When you instantiate a list:
a = []
you expect to get a new list referenced by a.
Why should the a=[] in
def x(a=[]):
instantiate a new list on function definition and not on invocation?
It's just like you're asking "if the user doesn't provide the argument then instantiate a new list and use it as if it was produced by the caller".
I think this is ambiguous instead:
def x(a=datetime.datetime.now()):
user, do you want a to default to the datetime corresponding to when you're defining or executing x?
In this case, as in the previous one, I'll keep the same behaviour as if the default argument "assignment" was the first instruction of the function (datetime.now() called on function invocation).
On the other hand, if the user wanted the definition-time mapping he could write:
b = datetime.datetime.now()
def x(a=b):
I know, I know: that's a closure. Alternatively Python might provide a keyword to force definition-time binding:
def x(static a=b):
Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well... when the functions is defined.
Compare this:
class BananaBunch:
bananas = []
def addBanana(self, banana):
self.bananas.append(banana)
This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it's added to all instances of that class. The reason is exactly the same.
It's just "How It Works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.
Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll grok python much better.
That said it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.
Why don't you introspect?
I'm really surprised no one has performed the insightful introspection offered by Python (2 and 3 apply) on callables.
Given a simple little function func defined as:
>>> def func(a = []):
... a.append(5)
When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. While this compilation step is done, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the top answer mentioned: the list a can now be considered a member of the function func.
So, let's do some introspection, a before and after to examine how the list gets expanded inside the function object. I'm using Python 3.x for this, for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).
Function Before Execution:
>>> def func(a = []):
... a.append(5)
...
After Python executes this definition it will take any default parameters specified (a = [] here) and cram them in the __defaults__ attribute for the function object (relevant section: Callables):
>>> func.__defaults__
([],)
O.k, so an empty list as the single entry in __defaults__, just as expected.
Function After Execution:
Let's now execute this function:
>>> func()
Now, let's see those __defaults__ again:
>>> func.__defaults__
([5],)
Astonished? The value inside the object changes! Consecutive calls to the function will now simply append to that embedded list object:
>>> func(); func(); func()
>>> func.__defaults__
([5, 5, 5, 5],)
So, there you have it, the reason why this 'flaw' happens, is because default arguments are part of the function object. There's nothing weird going on here, it's all just a bit surprising.
The common solution to combat this is to use None as the default and then initialize in the function body:
def func(a = None):
# or: a = [] if a is None else a
if a is None:
a = []
Since the function body is executed anew each time, you always get a fresh new empty list if no argument was passed for a.
To further verify that the list in __defaults__ is the same as that used in the function func you can just change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__) and you'll see how these are indeed refering to the same list instance:
>>> def func(a = []):
... a.append(5)
... return id(a)
>>>
>>> id(func.__defaults__[0]) == func()
True
All with the power of introspection!
* To verify that Python evaluates the default arguments during compilation of the function, try executing the following:
def bar(a=input('Did you just see me without calling the function?')):
pass # use raw_input in Py2
as you'll notice, input() is called before the process of building the function and binding it to the name bar is made.
I used to think that creating the objects at runtime would be the better approach. I'm less certain now, since you do lose some useful features, though it may be worth it regardless simply to prevent newbie confusion. The disadvantages of doing so are:
1. Performance
def foo(arg=something_expensive_to_compute())):
...
If call-time evaluation is used, then the expensive function is called every time your function is used without an argument. You'd either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.
2. Forcing bound parameters
A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:
funcs = [ lambda i=i: i for i in range(10)]
This returns a list of functions that return 0,1,2,3... respectively. If the behaviour is changed, they will instead bind i to the call-time value of i, so you would get a list of functions that all returned 9.
The only way to implement this otherwise would be to create a further closure with the i bound, ie:
def make_func(i): return lambda: i
funcs = [make_func(i) for i in range(10)]
3. Introspection
Consider the code:
def foo(a='test', b=100, c=[]):
print a,b,c
We can get information about the arguments and defaults using the inspect module, which
>>> inspect.getargspec(foo)
(['a', 'b', 'c'], None, None, ('test', 100, []))
This information is very useful for things like document generation, metaprogramming, decorators etc.
Now, suppose the behaviour of defaults could be changed so that this is the equivalent of:
_undefined = object() # sentinel value
def foo(a=_undefined, b=_undefined, c=_undefined)
if a is _undefined: a='test'
if b is _undefined: b=100
if c is _undefined: c=[]
However, we've lost the ability to introspect, and see what the default arguments are. Because the objects haven't been constructed, we can't ever get hold of them without actually calling the function. The best we could do is to store off the source code and return that as a string.
5 points in defense of Python
Simplicity: The behavior is simple in the following sense:
Most people fall into this trap only once, not several times.
Consistency: Python always passes objects, not names.
The default parameter is, obviously, part of the function
heading (not the function body). It therefore ought to be evaluated
at module load time (and only at module load time, unless nested), not
at function call time.
Usefulness: As Frederik Lundh points out in his explanation
of "Default Parameter Values in Python", the
current behavior can be quite useful for advanced programming.
(Use sparingly.)
Sufficient documentation: In the most basic Python documentation,
the tutorial, the issue is loudly announced as
an "Important warning" in the first subsection of Section
"More on Defining Functions".
The warning even uses boldface,
which is rarely applied outside of headings.
RTFM: Read the fine manual.
Meta-learning: Falling into the trap is actually a very
helpful moment (at least if you are a reflective learner),
because you will subsequently better understand the point
"Consistency" above and that will
teach you a great deal about Python.
This behavior is easy explained by:
function (class etc.) declaration is executed only once, creating all default value objects
everything is passed by reference
So:
def x(a=0, b=[], c=[], d=0):
a = a + 1
b = b + [1]
c.append(1)
print a, b, c
a doesn't change - every assignment call creates new int object - new object is printed
b doesn't change - new array is build from default value and printed
c changes - operation is performed on same object - and it is printed
1) The so-called problem of "Mutable Default Argument" is in general a special example demonstrating that:
"All functions with this problem suffer also from similar side effect problem on the actual parameter,"
That is against the rules of functional programming, usually undesiderable and should be fixed both together.
Example:
def foo(a=[]): # the same problematic function
a.append(5)
return a
>>> somevar = [1, 2] # an example without a default parameter
>>> foo(somevar)
[1, 2, 5]
>>> somevar
[1, 2, 5] # usually expected [1, 2]
Solution: a copy
An absolutely safe solution is to copy or deepcopy the input object first and then to do whatever with the copy.
def foo(a=[]):
a = a[:] # a copy
a.append(5)
return a # or everything safe by one line: "return a + [5]"
Many builtin mutable types have a copy method like some_dict.copy() or some_set.copy() or can be copied easy like somelist[:] or list(some_list). Every object can be also copied by copy.copy(any_object) or more thorough by copy.deepcopy() (the latter useful if the mutable object is composed from mutable objects). Some objects are fundamentally based on side effects like "file" object and can not be meaningfully reproduced by copy. copying
Example problem for a similar SO question
class Test(object): # the original problematic class
def __init__(self, var1=[]):
self._var1 = var1
somevar = [1, 2] # an example without a default parameter
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])
print somevar # [1, 2, [1]] but usually expected [1, 2]
print t2._var1 # [1, 2, [1]] but usually expected [1, 2]
It shouldn't be neither saved in any public attribute of an instance returned by this function. (Assuming that private attributes of instance should not be modified from outside of this class or subclasses by convention. i.e. _var1 is a private attribute )
Conclusion:
Input parameters objects shouldn't be modified in place (mutated) nor they should not be binded into an object returned by the function. (If we prefere programming without side effects which is strongly recommended. see Wiki about "side effect" (The first two paragraphs are relevent in this context.)
.)
2)
Only if the side effect on the actual parameter is required but unwanted on the default parameter then the useful solution is def ...(var1=None): if var1 is None: var1 = [] More..
3) In some cases is the mutable behavior of default parameters useful.
What you're asking is why this:
def func(a=[], b = 2):
pass
isn't internally equivalent to this:
def func(a=None, b = None):
a_default = lambda: []
b_default = lambda: 2
def actual_func(a=None, b=None):
if a is None: a = a_default()
if b is None: b = b_default()
return actual_func
func = func()
except for the case of explicitly calling func(None, None), which we'll ignore.
In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?
One answer is probably right there--it would effectively turn every function with default parameters into a closure. Even if it's all hidden away in the interpreter and not a full-blown closure, the data's got to be stored somewhere. It'd be slower and use more memory.
This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.
>>> def foo(a):
a.append(5)
print a
>>> a = [5]
>>> foo(a)
[5, 5]
>>> foo(a)
[5, 5, 5]
>>> foo(a)
[5, 5, 5, 5]
>>> foo(a)
[5, 5, 5, 5, 5]
No default values in sight in this code, but you get exactly the same problem.
The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn't expect this. Code like this would be fine if the function was called something like append_5; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn't return the list (since the caller already has a reference to that list; the one it just passed in).
Your original foo, with a default argument, shouldn't be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we're in Python or not and whether there are default arguments involved or not.
If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.
Python: The Mutable Default Argument
Default arguments get evaluated at the time the function is compiled into a function object. When used by the function, multiple times by that function, they are and remain the same object.
When they are mutable, when mutated (for example, by adding an element to it) they remain mutated on consecutive calls.
They stay mutated because they are the same object each time.
Equivalent code:
Since the list is bound to the function when the function object is compiled and instantiated, this:
def foo(mutable_default_argument=[]): # make a list the default argument
"""function that uses a list"""
is almost exactly equivalent to this:
_a_list = [] # create a list in the globals
def foo(mutable_default_argument=_a_list): # make it the default argument
"""function that uses a list"""
del _a_list # remove globals name binding
Demonstration
Here's a demonstration - you can verify that they are the same object each time they are referenced by
seeing that the list is created before the function has finished compiling to a function object,
observing that the id is the same each time the list is referenced,
observing that the list stays changed when the function that uses it is called a second time,
observing the order in which the output is printed from the source (which I conveniently numbered for you):
example.py
print('1. Global scope being evaluated')
def create_list():
'''noisily create a list for usage as a kwarg'''
l = []
print('3. list being created and returned, id: ' + str(id(l)))
return l
print('2. example_function about to be compiled to an object')
def example_function(default_kwarg1=create_list()):
print('appending "a" in default default_kwarg1')
default_kwarg1.append("a")
print('list with id: ' + str(id(default_kwarg1)) +
' - is now: ' + repr(default_kwarg1))
print('4. example_function compiled: ' + repr(example_function))
if __name__ == '__main__':
print('5. calling example_function twice!:')
example_function()
example_function()
and running it with python example.py:
1. Global scope being evaluated
2. example_function about to be compiled to an object
3. list being created and returned, id: 140502758808032
4. example_function compiled: <function example_function at 0x7fc9590905f0>
5. calling example_function twice!:
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a']
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a', 'a']
Does this violate the principle of "Least Astonishment"?
This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.
The usual instruction to new Python users:
But this is why the usual instruction to new users is to create their default arguments like this instead:
def example_function_2(default_kwarg=None):
if default_kwarg is None:
default_kwarg = []
This uses the None singleton as a sentinel object to tell the function whether or not we've gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [], as the default.
As the tutorial section on control flow says:
If you don’t want the default to be shared between subsequent calls,
you can write the function like this instead:
def f(a, L=None):
if L is None:
L = []
L.append(a)
return L
Already busy topic, but from what I read here, the following helped me realizing how it's working internally:
def bar(a=[]):
print id(a)
a = a + [1]
print id(a)
return a
>>> bar()
4484370232
4484524224
[1]
>>> bar()
4484370232
4484524152
[1]
>>> bar()
4484370232 # Never change, this is 'class property' of the function
4484523720 # Always a new object
[1]
>>> id(bar.func_defaults[0])
4484370232
The shortest answer would probably be "definition is execution", therefore the whole argument makes no strict sense. As a more contrived example, you may cite this:
def a(): return []
def b(x=a()):
print x
Hopefully it's enough to show that not executing the default argument expressions at the execution time of the def statement isn't easy or doesn't make sense, or both.
I agree it's a gotcha when you try to use default constructors, though.
It's a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?
def print_tuple(some_tuple=(1,2,3)):
print some_tuple
print_tuple() #1
print_tuple((1,2,3)) #2
I'll give you a hint. Here's the disassembly (see http://docs.python.org/library/dis.html):
#1
0 LOAD_GLOBAL 0 (print_tuple)
3 CALL_FUNCTION 0
6 POP_TOP
7 LOAD_CONST 0 (None)
10 RETURN_VALUE
#2
0 LOAD_GLOBAL 0 (print_tuple)
3 LOAD_CONST 4 ((1, 2, 3))
6 CALL_FUNCTION 1
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs ?)
As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it's a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn't C. In C you have constants that are pretty much free. In Python you don't have this benefit.
This behavior is not surprising if you take the following into consideration:
The behavior of read-only class attributes upon assignment attempts, and that
Functions are objects (explained well in the accepted answer).
The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.
(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:
...all variables found outside of the innermost scope are
read-only (an attempt to write to such a variable will simply create a
new local variable in the innermost scope, leaving the identically
named outer variable unchanged).
Look back to the original example and consider the above points:
def foo(a=[]):
a.append(5)
return a
Here foo is an object and a is an attribute of foo (available at foo.func_defs[0]). Since a is a list, a is mutable and is thus a read-write attribute of foo. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.
Calling foo without overriding a default uses that default's value from foo.func_defs. In this case, foo.func_defs[0] is used for a within function object's code scope. Changes to a change foo.func_defs[0], which is part of the foo object and persists between execution of the code in foo.
Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:
def foo(a, L=None):
if L is None:
L = []
L.append(a)
return L
Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:
When the foo function object is instantiated, foo.func_defs[0] is set to None, an immutable object.
When the function is executed with defaults (with no parameter specified for L in the function call), foo.func_defs[0] (None) is available in the local scope as L.
Upon L = [], the assignment cannot succeed at foo.func_defs[0], because that attribute is read-only.
Per (1), a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.func_defs[0] thus remains unchanged for future invocations of foo.
A simple workaround using None
>>> def bar(b, data=None):
... data = data or []
... data.append(b)
... return data
...
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3, [34])
[34, 3]
>>> bar(3, [34])
[34, 3]
It may be true that:
Someone is using every language/library feature, and
Switching the behavior here would be ill-advised, but
it is entirely consistent to hold to both of the features above and still make another point:
It is a confusing feature and it is unfortunate in Python.
The other answers, or at least some of them either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.
It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano's opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,
The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If you understand the language much better by trying to trace out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other language. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working--not because of a design fl--I mean, hidden logic puzzle--that cuts against the intuitions of programmers who are drawn to Python because it Just Works.
I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).
As others have extensively commented, the list parameter is bound to the function when it is defined as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list which may have been altered by any other calls to the function. Worse yet, two parameters are using this function's shared parameter at the same time oblivious to the changes made by the other.
Wrong Method (probably...):
def foo(list_arg=[5]):
return list_arg
a = foo()
a.append(6)
>>> a
[5, 6]
b = foo()
b.append(7)
# The value of 6 appended to variable 'a' is now part of the list held by 'b'.
>>> b
[5, 6, 7]
# Although 'a' is expecting to receive 6 (the last element it appended to the list),
# it actually receives the last element appended to the shared list.
# It thus receives the value 7 previously appended by 'b'.
>>> a.pop()
7
You can verify that they are one and the same object by using id:
>>> id(a)
5347866528
>>> id(b)
5347866528
Per Brett Slatkin's "Effective Python: 59 Specific Ways to Write Better Python", Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)
The convention for achieving the desired result in Python is to
provide a default value of None and to document the actual behaviour
in the docstring.
This implementation ensures that each call to the function either receives the default list or else the list passed to the function.
Preferred Method:
def foo(list_arg=None):
"""
:param list_arg: A list of input values.
If none provided, used a list with a default value of 5.
"""
if not list_arg:
list_arg = [5]
return list_arg
a = foo()
a.append(6)
>>> a
[5, 6]
b = foo()
b.append(7)
>>> b
[5, 7]
c = foo([10])
c.append(11)
>>> c
[10, 11]
There may be legitimate use cases for the 'Wrong Method' whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.
The solutions here are:
Use None as your default value (or a nonce object), and switch on that to create your values at runtime; or
Use a lambda as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).
The second option is nice because users of the function can pass in a callable, which may be already existing (such as a type)
You can get round this by replacing the object (and therefore the tie with the scope):
def foo(a=[]):
a = list(a)
a.append(5)
return a
Ugly, but it works.
When we do this:
def foo(a=[]):
...
... we assign the argument a to an unnamed list, if the caller does not pass the value of a.
To make things simpler for this discussion, let's temporarily give the unnamed list a name. How about pavlo ?
def foo(a=pavlo):
...
At any time, if the caller doesn't tell us what a is, we reuse pavlo.
If pavlo is mutable (modifiable), and foo ends up modifying it, an effect we notice the next time foo is called without specifying a.
So this is what you see (Remember, pavlo is initialized to []):
>>> foo()
[5]
Now, pavlo is [5].
Calling foo() again modifies pavlo again:
>>> foo()
[5, 5]
Specifying a when calling foo() ensures pavlo is not touched.
>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]
So, pavlo is still [5, 5].
>>> foo()
[5, 5, 5]
I sometimes exploit this behavior as an alternative to the following pattern:
singleton = None
def use_singleton():
global singleton
if singleton is None:
singleton = _make_singleton()
return singleton.use_me()
If singleton is only used by use_singleton, I like the following pattern as a replacement:
# _make_singleton() is called only once when the def is executed
def use_singleton(singleton=_make_singleton()):
return singleton.use_me()
I've used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.
Since I don't think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
Every other answer explains why this is actually a nice and desired behavior, or why you shouldn't be needing this anyway. Mine is for those stubborn ones who want to exercise their right to bend the language to their will, not the other way around.
We will "fix" this behavior with a decorator that will copy the default value instead of reusing the same instance for each positional argument left at its default value.
import inspect
from copy import deepcopy # copy would fail on deep arguments like nested dicts
def sanify(function):
def wrapper(*a, **kw):
# store the default values
defaults = inspect.getargspec(function).defaults # for python2
# construct a new argument list
new_args = []
for i, arg in enumerate(defaults):
# allow passing positional arguments
if i in range(len(a)):
new_args.append(a[i])
else:
# copy the value
new_args.append(deepcopy(arg))
return function(*new_args, **kw)
return wrapper
Now let's redefine our function using this decorator:
#sanify
def foo(a=[]):
a.append(5)
return a
foo() # '[5]'
foo() # '[5]' -- as desired
This is particularly neat for functions that take multiple arguments. Compare:
# the 'correct' approach
def bar(a=None, b=None, c=None):
if a is None:
a = []
if b is None:
b = []
if c is None:
c = []
# finally do the actual work
with
# the nasty decorator hack
#sanify
def bar(a=[], b=[], c=[]):
# wow, works right out of the box!
It's important to note that the above solution breaks if you try to use keyword args, like so:
foo(a=[4])
The decorator could be adjusted to allow for that, but we leave this as an exercise for the reader ;)
This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)
I'm gonna give you what I see as a useful example.
def example(errors=[]):
# statements
# Something went wrong
mistake = True
if mistake:
tryToFixIt(errors)
# Didn't work.. let's try again
tryToFixItAnotherway(errors)
# This time it worked
return errors
def tryToFixIt(err):
err.append('Attempt to fix it')
def tryToFixItAnotherway(err):
err.append('Attempt to fix it by another way')
def main():
for item in range(2):
errors = example()
print '\n'.join(errors)
main()
prints the following
Attempt to fix it
Attempt to fix it by another way
Attempt to fix it
Attempt to fix it by another way
This is not a design flaw. Anyone who trips over this is doing something wrong.
There are 3 cases I see where you might run into this problem:
You intend to modify the argument as a side effect of the function. In this case it never makes sense to have a default argument. The only exception is when you're abusing the argument list to have function attributes, e.g. cache={}, and you wouldn't be expected to call the function with an actual argument at all.
You intend to leave the argument unmodified, but you accidentally did modify it. That's a bug, fix it.
You intend to modify the argument for use inside the function, but didn't expect the modification to be viewable outside of the function. In that case you need to make a copy of the argument, whether it was the default or not! Python is not a call-by-value language so it doesn't make the copy for you, you need to be explicit about it.
The example in the question could fall into category 1 or 3. It's odd that it both modifies the passed list and returns it; you should pick one or the other.
Just change the function to be:
def notastonishinganymore(a = []):
'''The name is just a joke :)'''
a = a[:]
a.append(5)
return a
TLDR: Define-time defaults are consistent and strictly more expressive.
Defining a function affects two scopes: the defining scope containing the function, and the execution scope contained by the function. While it is pretty clear how blocks map to scopes, the question is where def <name>(<args=defaults>): belongs to:
... # defining scope
def name(parameter=default): # ???
... # execution scope
The def name part must evaluate in the defining scope - we want name to be available there, after all. Evaluating the function only inside itself would make it inaccessible.
Since parameter is a constant name, we can "evaluate" it at the same time as def name. This also has the advantage it produces the function with a known signature as name(parameter=...):, instead of a bare name(...):.
Now, when to evaluate default?
Consistency already says "at definition": everything else of def <name>(<args=defaults>): is best evaluated at definition as well. Delaying parts of it would be the astonishing choice.
The two choices are not equivalent, either: If default is evaluated at definition time, it can still affect execution time. If default is evaluated at execution time, it cannot affect definition time. Choosing "at definition" allows expressing both cases, while choosing "at execution" can express only one:
def name(parameter=defined): # set default at definition time
...
def name(parameter=default): # delay default until execution time
parameter = default if parameter is None else parameter
...
Yes, this is a design flaw in Python
I've read all the other answers and I'm not convinced. This design does violate the principle of least astonishment.
The defaults could have been designed to be evaluated when the function is called, rather than when the function is defined. This is how Javascript does it:
function foo(a=[]) {
a.push(5);
return a;
}
console.log(foo()); // [5]
console.log(foo()); // [5]
console.log(foo()); // [5]
As further evidence that this is a design flaw, Python core developers are currently discussing introducing new syntax to fix this problem. See this article: Late-bound argument defaults for Python.
For even more evidence that this a design flaw, if you Google "Python gotchas", this design is mentioned as a gotcha, usually the first gotcha in the list, in the first 9 Google results (1, 2, 3, 4, 5, 6, 7, 8, 9). In contrast, if you Google "Javascript gotchas", the behaviour of default arguments in Javascript is not mentioned as a gotcha even once.
Gotchas, by definition, violate the principle of least astonishment. They astonish. Given there are superiour designs for the behaviour of default argument values, the inescapable conclusion is that Python's behaviour here represents a design flaw.

scope of julia variables: re-assigning within a loop in open expression

I am struggling with re-assigning a variable in a loop in Julia. I have a following example:
infile = "test.txt"
feature = "----"
for ln in 1:3
println(feature)
feature = "+"
end
open(infile) do f
#if true
# println(feature)
# feature = "----"
println(feature)
for ln in 1:5 #eachline(f)
println("feature")
#fails here
println(feature)
# because of this line:
feature = "+"
end
end
It fails if I do reassignment within the loop. I see that the issue with the variable scope, and because nested scopes are involved. The reference says that the loops introduce 'soft' scope. I cannot find out from the manual what scope open expression belongs to, but seemingly it screws things up, as if I replace open with if true, things run smoothly.
Do I understand correctly that open introduces 'hard' scope and that it is the reason why re-assignment retroactively makes the variable undefined?
You should think of
open("file") do f
...
end
as
open(function (f)
...
end, "file")
that is, do introduces the same kind of hard scope as a function or -> would.
So to be able to write to feature from the function, you need to do
open(infile) do f
global feature # this means: use the global `feature`
println(feature)
for ln in 1:5
println("feature")
println(feature)
feature = "+"
end
end
Note that this is only the case in top-level (module) scope; once inside a function, there are no hard scopes.
(The for loop in this case is a red herring; regardless of the soft scope of the loop, the access to feature will be limited by the hard scope of the anonymous function introduced by do.)

Resources