How shall I understand this deep binding example in Python?
From Programming Language Pragmatics by Scott,
Figure 3.14 Deep binding in Python. At right is a conceptual view of the run-time stack. Referencing environments captured in closures are shown as dashed boxes and arrows. When B is called via formal parameter P, two instances of I exist. Because the closure for P was created in the initial invocation of A, B's static link (solid arrow) points to the frame of that earlier invocation. B uses that invocation's instance of I in its print statement, and the output is a 1.
The catch is that a running program may have more than one instance of an object that is declared within a recursive subroutine. A closure in a language with static scoping captures the current instance of every object, at the time the closure is created. When the closure's subroutine is called, it will find these captured instances, even if newer instances have subsequently been created by recursive calls.
So basically, the quote tries to explain that the following program (which is the same as the one in the screenshot) prints out 1:
def A(I, P):
    def B():
        print(I)
    # body of A:
    if I > 1:
        P()
    else:
        A(2, B)

def C():
    pass # do nothing

A(1, C) # main program
I don't quite understand the reasoning: "because the closure for P was created in the initial invocation of A, B's static link (solid arrow) points to the frame of that earlier invocation", and "When the closure's subroutine is called, it will find these captured instances". So I modified the example, and the new example prints out 2 instead of 1:
def A(I, P):
    def B():
        print(I)
    if I > 2:
        P()
    elif I > 1:
        A(3, B)
    else:
        A(2, B)

def C():
    pass

A(1, C)
Another modified example prints 1:
def A(I, P):
    def B():
        print(I)
    if I > 2:
        P()
    elif I > 1:
        A(3, P)
    else:
        A(2, B)

def C():
    pass

A(1, C)
So how can I know which closure matters?
Generally, is a closure created whenever a function is passed as an argument to another function?
Thanks.
This has become too long for a comment, hence I am adding it as an answer.
I should say that I am answering this from the perspective of someone who learned about these ideas in other languages: I mostly write Python now, but it's possible my terminology is 'wrong' (for which read 'right, but latter-day languages like Python got it wrong'...). In particular I have intentionally skated over a bunch of Python-specific detail, and avoided dealing with things like the mutability of bindings and the Python 3 nonlocal hack.
I also think the book's usage of the term 'deep binding' is confusing and perhaps wrong: see the end for a note on this. I have therefore largely ignored it.
Bindings, scope and extent
There are three important concepts:

- a binding is an association between a name and the thing it refers to;
- the scope of a binding is the region of the program over which it is visible;
- the extent of a binding is how long the binding lasts.

Various kinds of things get bound in Python (the names bound by handlers in try: ... except: ... constructs, for instance), but I'll only talk about variable bindings. There are several options for scope and extent. For variable bindings, Python has:

- lexical scope: a reference to a variable refers to a binding which is visible at that point in the source code;
- indefinite extent: a binding lasts for as long as anything can still refer to it.
(Another common scope is dynamic: a binding with dynamic scope is visible to any code for which it is visible in the source code and any code that is 'down the stack' from that code. Another common extent is definite, which means that a binding goes away the moment control leaves the construct which established it. Bindings for exception handlers have dynamic scope and definite extent in Python, for instance.)
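As a small illustration of that aside (my own sketch, not from the book): which except clause handles an exception depends on the call stack at the moment the exception is raised, not on where the raising code sits in the source, and the handler stops applying as soon as control leaves the try construct.

def raiser():
    # raiser's body is not lexically inside any try block
    raise ValueError("boom")

def caller():
    try:
        raiser()                    # but a handler is active 'up the stack'
    except ValueError:
        return "handled in caller"

print(caller())                     # -> handled in caller

try:
    raiser()
except ValueError:
    print("handled at top level")   # a different handler applies here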
What lexical scope means is that you can (almost) tell by reading the source what bindings a bit of code is referring to.
So consider this:
def outer():
    a = 2
    def inner(b):
        return a + b
    return inner(2)
There are two bindings here: a is bound in outer and b is bound in inner (actually, there are three: inner is also bound, to a function, in outer). Each of these two bindings is referenced once: in inner (and the inner binding is referenced once, in outer).

And the important thing is that, by reading the code, you can tell what the reference to a in inner is to: it's to the binding established in outer. That's what 'lexical' means: you (and the compiler, if it is smart enough) can tell what bindings exist by looking at the source.
This is only almost true. Consider this fragment:
def maybe(x):
    return x + y
There is one binding created, in maybe, but two references: x refers to that binding, while y refers to a binding which is not known to exist. But it may exist: there may be a top-level binding of y which would make this code work. So there is a special caveat around lexical bindings: there is a 'top-level' environment which all definitions can 'see' and which can contain bindings. So if the fragment above were enlarged to read

y = 4

def maybe(x):
    return x + y

then this code is fine, because maybe can 'see' the top-level environment (really, in Python, this is the bindings in the module in which it's defined).
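As a quick check of that (my addition, not part of the original fragment): with the module-level binding of y in place the free reference resolves, and removing it makes the same call fail.

print(maybe(2))        # -> 6: x is the parameter binding, y is the module-level one

del y                  # remove the module-level binding again
try:
    maybe(2)
except NameError as e:
    print(e)           # -> name 'y' is not defined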
Indefinite extent & first-class functions
For the above examples, the results would be the same with definite or indefinite extent. This stops being true if you consider first-class functions, which Python has. It stops being the case because functions are objects which, by being called, can reference bindings. Consider this:
def outer(x):
    def inner(y):
        return x + y
    return inner
There are three bindings here: outer binds x and inner, and inner binds y (and can see x and inner). So, now, let add4 = outer(4): what should add4(3) return (or, equivalently, what should outer(4)(3) return)? Well, the answer is 7. And this is the case because the binding of x exists for as long as it can be referenced, or in other words, it exists for as long as any instances of inner exist, because they reference it. This is what 'indefinite extent' means.

(If Python had only definite extent, then outer(4)(3) would be some kind of error, since it would reference a binding which no longer exists. Languages with only definite extent can't really have first-class functions in any useful way.)
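Spelled out as code (my addition, using the outer defined just above):

add4 = outer(4)        # the binding of x to 4 outlives this call to outer
print(add4(3))         # -> 7
print(outer(4)(3))     # -> 7: a fresh binding of x, captured by a fresh inner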
Something that's important to understand is that lexical scope tells you which bindings are visible, but the actual instances of those bindings which are visible are, of course, dynamic. So, if you consider something like this:
def adder(n):
    return lambda e: e + n

a1 = adder(12)
a2 = adder(15)
then a1 and a2 reference different bindings of n: a1(0) is 12 while a2(0) is 15. So by reading the source code you can tell which bindings are captured, but you need to run it to know which instances of them are captured -- what the values of the variables are, in other words.
Compare that with this:
def signaller(initial):
    s = [initial]
    def setter(new):
        s[0] = new
        return new
    def getter():
        return s[0]
    return (setter, getter)

(str, gtr) = signaller(0)
Now, str and gtr capture the same binding of s, so str(1) will cause gtr() to return 1.
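Run it and you can see the shared binding in action (my addition; note that the name str here shadows the built-in str, which is harmless for this demonstration but a poor choice of name):

print(gtr())     # -> 0: the initial value stored in the shared list
print(str(1))    # -> 1: setter writes through the shared binding of s
print(gtr())     # -> 1: getter sees the change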
Closures
So that's actually all there is to know. Except that there is some special terminology which people use, in particular the term 'closure'.
A closure is simply a function which refers to some lexical bindings outside its own definition. Such a function is said to 'close over' these bindings.
It is a fair question to ask why this term is needed at all: all you actually need to understand is the scope rules, from which everything else follows. I think the reason the term exists is partly historical and partly pragmatic.
The example code
The example code was
def A(I, P):
    def B():
        print(I)
    # body of A:
    if I > 1:
        P()
    else:
        A(2, B)

def C():
    pass # do nothing

A(1, C) # main program
So, when A is called, there is a local function, B, which can see the binding of I (and also P & B itself, but it does not refer to these, so we can ignore them). Each call to A creates new bindings for I, P & B, and these bindings are different for each call. This includes recursive calls, which is the trick that is being done here to confuse you.
So, what does A(1, C) do?

1. It binds I to 1, and B to a closure which can see this binding of I. It also binds P to the global (module) value of C, which is a function, but nothing refers to this binding.
2. It then calls A recursively (because I is 1) with arguments of 2 & the value of B, which is the closure that just got created.
3. In this recursive call, I is now bound to 2, and P is bound to the closure from the outer call. A new closure is created and bound to B. Nothing refers to this binding.
4. P is then called (I is now greater than 1). It is the closure created in the outer call, and the binding it can see for I is the binding visible to it, whose value is 1.
5. So it prints 1, and we're done.

You can see what is happening by changing the definition to print some useful information:
from __future__ import print_function # Python 2

def A(I, P):
    def B():
        print(I)
    print("{I} {P} {B}".format(I=I, P=P, B=B))
    if I > 1:
        print("calling {P}".format(P=P))
        P()
    else:
        print("recursing with (2, {B})".format(B=B))
        A(2, B)

def C():
    pass

A(1, C)
This prints (for example):
1 <function C at 0x7f7a03768e60> <function B at 0x7f7a03768d70>
recursing with (2, <function B at 0x7f7a03768d70>)
2 <function B at 0x7f7a03768d70> <function B at 0x7f7a03651a28>
calling <function B at 0x7f7a03768d70>
1
Note that, in the inner call, there are two functions which identify themselves as B: one of them is the same as the function created in the outer call (which will be called), while the other one is the just-created closure, which is never referenced again.
Deep binding
I think the book's usage of the term 'deep binding' is at best confusing and in fact probably outright wrong: it is possible that this term has changed meaning, but it certainly did not originally mean what the book thinks it means.
The terms 'deep binding' and 'shallow binding' refer to implementation strategies for languages with dynamic scope. In a system with dynamic scope, a 'free' variable reference (one which is not bound by a particular bit of code) is looked up by searching, dynamically, up the call stack until a binding for it is found. So in a language with dynamic binding you can't tell by looking at a bit of code what bindings it can see (and neither can the compiler!), because the bindings it can see depend on what the call stack looks like at the moment it is run.
Dynamic binding is great for things like exception handlers but usually terrible for most variable bindings.
One reason it's terrible is that a naïve implementation technique makes it inherently slow, and a clever implementation technique needs to be very clever to work in a system with more than one thread of control.
Deep binding is the naïve implementation technique. In deep binding, variable access works the way you think: the system searches up the stack until it finds the binding for the name it's looking for, and then uses that. If the stack is deep and the binding is a long way up it, this is slow.
Shallow binding is the clever implementation technique. This works by, instead of storing the current binding in the stack, storing the previous value in the stack, and smashing the current value into a slot associated with the variable name which is always in the same place. So now looking up a binding just involves looking up the name: there is no search. But creating and destroying bindings may be slower since the old values need to be stashed away. Additionally, shallow binding is not obviously safe in the presence of multiple threads of control: if all the threads share the slot for the binding then catastrophe will follow. So instead each thread needs to have its own slot, or slots need to be indexed by thread as well as name somehow.
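To make the difference concrete, here is a toy sketch of my own (not from the book, and not Baker's implementation) of a dynamically-scoped variable store implemented both ways; the deep_* and shallow_* names are purely illustrative.

# Deep binding: bindings live on a stack of (name, value) pairs, and a
# lookup searches from the top down until it finds the name.
env_stack = []

def deep_bind(name, value):
    env_stack.append((name, value))

def deep_unbind():
    env_stack.pop()

def deep_lookup(name):
    for n, v in reversed(env_stack):   # this search is what makes it slow
        if n == name:
            return v
    raise NameError(name)

# Shallow binding: the *current* value of each name sits in a fixed slot,
# and what goes on the stack is the value being shadowed, so a lookup is a
# single dictionary access with no search.
_unbound = object()                    # sentinel for 'no previous binding'
slots = {}
shadow_stack = []

def shallow_bind(name, value):
    shadow_stack.append((name, slots.get(name, _unbound)))
    slots[name] = value

def shallow_unbind():
    name, old = shadow_stack.pop()
    if old is _unbound:
        del slots[name]
    else:
        slots[name] = old

def shallow_lookup(name):
    return slots[name]

# Both behave the same way; only the cost model differs.
deep_bind("x", 1); deep_bind("x", 2)
print(deep_lookup("x"))                # -> 2
deep_unbind()
print(deep_lookup("x"))                # -> 1

shallow_bind("x", 1); shallow_bind("x", 2)
print(shallow_lookup("x"))             # -> 2
shallow_unbind()
print(shallow_lookup("x"))             # -> 1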
I suspect that, for cases where dynamic scope is used such as exception handlers, systems use deep binding because it is much simpler to get right in multithreaded systems and performance is not critical.
Here is the classic early paper on deep & shallow binding, by Henry Baker.