Instantiation time for mutable default arguments of closures in Python
My understanding is that when Python parses the source code of a function, it compiles it to bytecode but doesn't run this bytecode before the function is called (which is why illegal variable names in functions do not throw an exception unless you call the function).
Default arguments are not instantiated during this initial setup of the function, but only when the function is called for the first time, regardless of whether the arguments are supplied or not. This same instance of the default argument is used for all future calls, which can be seen by using a mutable type as a default argument.
If we put the function inside of another function, however, the default argument now seems to be re-instantiated each time the outer function is called, as the following code shows:
def f(x):
    def g(y, a=[]):
        a.append(y)
        return a

    for y in range(x, x + 2):
        print('calling g from f:', g(y))
    return g(y + 1)

for x in range(2):
    print('calling f from module scope:', f(x))
This prints out
calling g from f: [0]
calling g from f: [0, 1]
calling f from module scope: [0, 1, 2]
calling g from f: [1]
calling g from f: [1, 2]
calling f from module scope: [1, 2, 3]
Does this mean that every time `f` is called, the bytecode of `g` is rebuilt? This behavior seems unnecessary, and weird, since the bytecode of `f` (which includes `g`?) is only built once. Or perhaps it is only the default argument of `g` which is reinstantiated at each call to `f`?
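First misconception: "when Python parses the source code of a function, it compiles it to bytecode but doesn't run this bytecode before the function is called (which is why illegal variable names in functions do not throw an exception unless you call the function)." To be clear, the part about compiling first and running later is right; the misconception is that "illegal variable names in functions do not throw an exception unless you call the function". An illegal name is a syntax error and is rejected when the function is compiled. What is not caught until the function is executed is a reference to a legal name that was never assigned.
Check out this simple test, which fails as soon as the definition is entered, without `g` ever being called: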
In [1]: def g(a):
   ...:     123onetwothree = a
  File "<ipython-input-5-48a83ac30c7b>", line 2
    123onetwothree = a
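The difference between the two kinds of error can be sketched in a plain script. This is only an illustration: `bad_source`, `h`, and `undefined_name` are made-up names, and `compile()` stands in for what the interpreter does when it reads a `def`.

```python
# An illegal identifier is rejected when the source is compiled,
# before any function object exists and before anything is called.
bad_source = "def g(a):\n    123onetwothree = a\n"
try:
    compile(bad_source, "<example>", "exec")
    compiled_ok = True
except SyntaxError:
    compiled_ok = False

# A legal but never-assigned name is different: the definition succeeds...
def h():
    return undefined_name  # hypothetical name, not defined anywhere

# ...and the NameError only appears once the body actually runs.
try:
    h()
    call_ok = True
except NameError:
    call_ok = False

print(compiled_ok, call_ok)
```

Both flags end up `False`, but for different reasons and at different times: the first failure happens at compile time, the second at call time.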
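Second misconception: "default arguments are not instantiated during this initial setup of the function, but only when the function is called for the first time...". This is incorrect. Default values are evaluated once, when the `def` statement is executed, and stored on the function object, as `f.__defaults__` shows before `f` has ever been called: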
In [7]: def f(x=[]):
   ...:     print(x)
   ...:     x.append(1)
   ...:     print(x)
   ...:

In [8]: f.__defaults__
Out[8]: ([],)

In [9]: f()
[]
[1]

In [10]: f.__defaults__
Out[10]: ([1],)
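Another way to see when the default is evaluated is to give the default expression a visible side effect. A minimal sketch, where `make_default` and `calls` are hypothetical helpers introduced only for this demonstration:

```python
calls = []

def make_default():
    # Record every evaluation of the default expression.
    calls.append("evaluated")
    return []

# The default expression runs right here, when the def statement executes.
def f(x=make_default()):
    x.append(1)
    return x

count_after_def = len(calls)      # f has not been called yet

first = f()
second = f()
count_after_calls = len(calls)    # calling f does not re-evaluate the default

print(count_after_def, count_after_calls, first is second)
```

The counter is already 1 before the first call and stays at 1 afterwards, and both calls return the same list object.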
As for your example, every time you run `f` the default argument is reinstantiated because you define `g` inside `f`. The best way to think of it is to think of the `def` statement as a constructor for `function` objects, and the default arguments as parameters to this constructor. Every time you run `def some_function` it is like calling the constructor all over again, and the function is redefined as if you had written `g = function(a=[])` in the body of `f`.
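The constructor analogy can be made fairly literal with `types.FunctionType`, which builds a function object from an existing code object plus a tuple of defaults. This is a rough sketch of the idea, not what the interpreter literally runs; `template` and `make_g` are names invented for the example.

```python
import types

def template(y, a):
    a.append(y)
    return a

def make_g():
    # Roughly what executing "def g(y, a=[])" does each time:
    # reuse the already-compiled code, evaluate the default expression
    # again (a new list), and construct a new function object from both.
    return types.FunctionType(template.__code__, globals(), "g", ([],))

g1 = make_g()
g2 = make_g()

r1 = g1(0)
r1 = g1(1)     # g1 keeps accumulating into its own default list
r2 = g2(5)     # g2 was constructed with a different, fresh list

print(r1, r2, g1.__code__ is g2.__code__)
```

Each call to `make_g` plays the role of one execution of the `def` statement: same code, new function object, new default list.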
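In response to a comment: default expressions are evaluated when the `def` statement runs, so an undefined name in one fails immediately, without `f` ever being called: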
In [11]: def f(x=h()): pass
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-11-ecbb4a9f8673> in <module>()
----> 1 def f(x=h()): pass
NameError: name 'h' is not defined
The inner function is rebuilt using the existing bytecode for the inner function. It's easy to see using `dis`.
>>> import dis
>>> def make_func():
...     def my_func():
...         pass
...     return my_func
>>> dis.dis(make_func.__code__)
  3           0 LOAD_CONST               1 (<code object my_func at [...]", line 3>)
              3 MAKE_FUNCTION            0
              6 STORE_FAST               0 (my_func)

  5           9 LOAD_FAST                0 (my_func)
             12 RETURN_VALUE
Now if you do:
>>> f1 = make_func()
>>> f2 = make_func()
>>> f1 is f2
False
>>> f1.__code__ is f2.__code__
True
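The same check extends to default arguments. In this small variation on `make_func` (the `a=[]` parameter is added here purely for illustration), the code object is shared while each function gets its own default list:

```python
def make_func():
    def my_func(a=[]):
        return a
    return my_func

f1 = make_func()
f2 = make_func()

same_code = f1.__code__ is f2.__code__                   # compiled once, reused
same_default = f1.__defaults__[0] is f2.__defaults__[0]  # rebuilt per def execution

print(same_code, same_default)
```

So what is recreated on every call to the outer function is the function object and its defaults, not the bytecode.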
Just look at the bytecode for `f` with `dis`:
dis(f)
  2           0 BUILD_LIST               0
              3 LOAD_CONST               1 (<code object g at 0x7febd88411e0, file "<ipython-input-21-f2ef9ebb6765>", line 2>)
              6 LOAD_CONST               2 ('f.<locals>.g')
              9 MAKE_FUNCTION            1
             12 STORE_FAST               1 (g)

  6          15 SETUP_LOOP              46 (to 64)
             18 LOAD_GLOBAL              0 (range)
             21 LOAD_FAST                0 (x)
             24 LOAD_FAST                0 (x)
             27 LOAD_CONST               3 (2)
             30 BINARY_ADD
             31 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             34 GET_ITER
        >>   35 FOR_ITER                25 (to 63)
             38 STORE_FAST               2 (y)
(snipped for brevity)
The code object loaded for `g`:
              3 LOAD_CONST               1 (<code object g at 0x7febd88411e0, file "<ipython-input-21-f2ef9ebb6765>", line 2>)
doesn't contain any mutable structures; it just contains the executable code and other immutable information. You could take a peek at it too:
dis(f.__code__.co_consts[1])
  3           0 LOAD_FAST                1 (a)
              3 LOAD_ATTR                0 (append)
              6 LOAD_FAST                0 (y)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 POP_TOP

  4          13 LOAD_FAST                1 (a)
             16 RETURN_VALUE
Every time `f` is called, `BUILD_LIST` creates a fresh list for the default and `MAKE_FUNCTION` is executed, which re-creates the function object from the bytecode that already exists there.
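The same observation can be checked programmatically instead of by reading the listing. This is a sketch under the assumption that the running CPython still uses the opcode names `BUILD_LIST` and `MAKE_FUNCTION` (offsets and surrounding instructions differ between versions); the `f` below is a shortened version of the one in the question.

```python
import dis
import types

def f(x):
    def g(y, a=[]):
        a.append(y)
        return a
    return g(x)

# Opcode names that appear in f's own bytecode (not in g's).
opnames = [ins.opname for ins in dis.get_instructions(f)]

# g's compiled body is stored once, as a constant of f's code object.
inner_code = [c for c in f.__code__.co_consts if isinstance(c, types.CodeType)]

print("BUILD_LIST" in opnames, "MAKE_FUNCTION" in opnames)
print(len(inner_code), inner_code[0].co_name)
print(f(1), f(2))
```

The list-building and function-making instructions belong to `f`, so they run on every call to `f`, while the code object for `g` sits unchanged in `f.__code__.co_consts`.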