What is the difference between a 'closure' and a 'lambda'?

Could someone explain? I understand the basic concepts behind them but I often see them used interchangeably and I get confused.

And now that we're here, how do they differ from a regular function?


A lambda is just an anonymous function - a function defined with no name. In some languages, such as Scheme, they are equivalent to named functions. In fact, the function definition is re-written as binding a lambda to a variable internally. In other languages, like Python, there are some (rather needless) distinctions between them, but they behave the same way otherwise.

A closure is any function which closes over the environment in which it was defined. This means that it can access variables not in its parameter list. Examples:

def func(): return h
def anotherfunc(h):
   return func()

This will cause an error, because func does not close over the environment in anotherfunc - h is undefined. func only closes over the global environment. This will work:

def anotherfunc(h):
    def func(): return h
    return func()

Because here, func is defined in anotherfunc , and in python 2.3 and greater (or some number like this) when they almost got closures correct (mutation still doesn't work), this means that it closes over anotherfunc 's environment and can access variables inside of it. In Python 3.1+, mutation works too when using the nonlocal keyword.

Another important point - func will continue to close over anotherfunc 's environment even when it's no longer being evaluated in anotherfunc . This code will also work:

def anotherfunc(h):
    def func(): return h
    return func

print anotherfunc(10)()

This will print 10.

This, as you notice, has nothing to do with lambdas - they are two different (although related) concepts.


When most people think of functions, they think of named functions :

function foo() { return "This string is returned from the 'foo' function"; }

These are called by name, of course:

foo(); //returns the string above

With lambda expressions, you can have anonymous functions :

 @foo = lambda() {return "This is returned from a function without a name";}

With the above example, you can call the lambda through the variable it was assigned to:

foo();

More useful than assigning anonymous functions to variables, however, are passing them to or from higher-order functions, ie, functions that accept/return other functions. In a lot of these cases, naming a function is unecessary:

function filter(list, predicate) 
 { @filteredList = [];
   for-each (@x in list) if (predicate(x)) filteredList.add(x);
   return filteredList;
 }

//filter for even numbers
filter([0,1,2,3,4,5,6], lambda(x) {return (x mod 2 == 0)}); 

A closure may be a named or anonymous function, but is known as such when it "closes over" variables in the scope where the function is defined, ie, the closure will still refer to the environment with any outer variables that are used in the closure itself. Here's a named closure:

@x = 0;

function incrementX() { x = x + 1;}

incrementX(); // x now equals 1

That doesn't seem like much but what if this was all in another function and you passed incrementX to an external function?

function foo()
 { @x = 0;

   function incrementX() 
    { x = x + 1;
      return x;
    }

   return incrementX;
 }

@y = foo(); // y = closure of incrementX over foo.x
y(); //returns 1 (y.x == 0 + 1)
y(); //returns 2 (y.x == 1 + 1)

This is how you get stateful objects in functional programming. Since naming "incrementX" isn't needed, you can use a lambda in this case:

function foo()
 { @x = 0;

   return lambda() 
           { x = x + 1;
             return x;
           };
 }

There is a lot of confusion around lambdas and closures, even in the answers to this StackOverflow question here. Instead of asking random programmers who learned about closures from practice with certain programming languages or other clueless programmers, take a journey to the source (where it all begun). And since lambdas and closures come from Lambda Calculus invented by Alonzo Church back in the '30s before first electronic computers even existed, this is the source I'm talking about.

Lambda Calculus is the simplest programming language in the world. The only things you can do in it:

  • APPLICATION: Applying one expression to another, denoted fx .
    (Think of it as a function call, where f is the function and x is its only parameter)
  • ABSTRACTION: Binds a symbol occurring in an expression to mark that this symbol is just a "slot", a blank box waiting to be filled with value, a "variable" as it were. It is done by prepending a Greek letter λ (lambda), then the symbolic name (eg x ), then a dot . before the expression. This then converts the expression into a function expecting one parameter.
    For example: λx.x+2 takes the expression x+2 and tells that the symbol x in this expression is a bound variable – it can be substituted with a value you supply as a parameter.
    Note that the function defined this way is anonymous – it doesn't have a name, so you can't refer to it yet, but you can immediately call it (remember application?) by supplying it the parameter it is waiting for, like this: (λx.x+2) 7 . Then the expression (in this case a literal value) 7 is substituted as x in the subexpression x+2 of the applied lambda, so you get 7+2 , which then reduces to 9 by common arithmetics rules.
  • So we've solved one of the mysteries:
    lambda is the anonymous function from the example above, λx.x+2 .


    In different programming languages, the syntax for functional abstraction (lambda) may differ. For example, in JavaScript it looks like this:

    function(x) { return x+2; }
    

    and you can immediately apply it to some parameter like this:

    function(x) { return x+2; } (7)
    

    or you can store this anonymous function (lambda) into some variable:

    var f = function(x) { return x+2; }
    

    which effectively gives it a name f , allowing you to refer to it and call it multiple times later, eg:

    alert(  f(7) + f(10)  );   // should print 21 in the message box
    

    But you didn't have to name it. You could call it immediately:

    alert(  function(x) { return x+2; } (7)  );  // should print 9 in the message box
    

    In LISP, lambdas are made like this:

    (lambda (x) (+ x 2))
    

    and you can call such a lambda by applying it immediately to a parameter:

    (  (lambda (x) (+ x 2))  7  )
    


    OK, now it's time to solve the other mystery: what is a closure. In order to do that, let's talk about symbols (variables) in lambda expressions.

    As I said, what the lambda abstraction does is binding a symbol in its subexpression, so that it becomes a substitutible parameter. Such a symbol is called bound. But what if there are other symbols in the expression? For example: λx.x/y+2 . In this expression, the symbol x is bound by the lambda abstraction λx. preceding it. But the other symbol, y , is not bound – it is free. We don't know what it is and where it comes from, so we don't know what it means and what value it represents, and therefore we cannot evaluate that expression until we figure out what y means.

    In fact, the same goes with the other two symbols, 2 and + . It's just that we are so familiar with these two symbols that we usually forget that the computer doesn't know them and we need to tell it what they mean by defining them somewhere, eg in a library or the language itself.

    You can think of the free symbols as defined somewhere else, outside the expression, in its "surrounding context", which is called its environment . The environment might be a bigger expression that this expression is a part of (as Qui-Gon Jinn said: "There's always a bigger fish" ;) ), or in some library, or in the language itself (as a primitive).

    This lets us divide lambda expressions into two categories:

  • CLOSED expressions: every symbol that occurs in these expressions is bound by some lambda abstraction. In other words, they are self-contained; they don't require any surrounding context to be evaluated. They are also called combinators.
  • OPEN expressions: some symbols in these expressions are not bound – that is, some of the symbols occurring in them are free and they require some external information, and thus they cannot be evaluated until you supply the definitions of these symbols.
  • You can CLOSE an open lambda expression by supplying the environment , which defines all these free symbols by binding them to some values (which may be numbers, strings, anonymous functions aka lambdas, whatever…).

    And here comes the closure part:
    The closure of a lambda expression is this particular set of symbols defined in the outer context (environment) that give values to the free symbols in this expression, making them non-free anymore. It turns an open lambda expression, which still contains some "undefined" free symbols, into a closed one, which doesn't have any free symbols anymore.

    For example, if you have the following lambda expression: λx.x/y+2 , the symbol x is bound, while the symbol y is free, therefore the expression is open and cannot be evaluated unless you say what y means (and the same with + and 2 , which are also free). But suppose that you also have an environment like this:

    {  y: 3,  +: [built-in addition],  2: [built-in number],  q: 42,  w: 5  }
    

    This environment supplies definitions for all the "undefined" (free) symbols from our lambda expression ( y , + , 2 ), and several extra symbols ( q , w ). The symbols that we need to be defined are this subset of the environment:

    {  y: 3,  +: [built-in addition],  2: [built-in number]  }
    

    and this is precisely the closure of our lambda expression :>

    In other words, it closes an open lambda expression. This is where the name closure came from in the first place, and this is why so many people's answers in this thread are not quite correct :P


    So why are they mistaken? Why do so many of them say that closures are some data structures in memory, or some features of the languages they use, or why do they confuse closures with lambdas? :P

    Well, the corporate marketoids of Sun/Oracle, Microsoft, Google etc. are to blame, because that's what they called these constructs in their languages (Java, C#, Go etc.). They often call "closures" what are supposed to be just lambdas. Or they call "closures" a particular technique they used to implement lexical scoping, that is, the fact that a function can access the variables that were defined in its outer scope at the time of its definition. They often say that the function "encloses" these variables, that is, captures them into some data structure to save them from being destroyed after the outer function finishes executing. But this is just made-up post factum "folklore etymology" and marketing, which only makes things more confusing, because every language vendor uses its own terminology.

    And it's even worse because of the fact that there's always a bit of truth in what they say, which does not allow you to easily dismiss it as false :P Let me explain:

    If you want to implement a language that uses lambdas as first-class citizens, you need to allow them to use symbols defined in their surrounding context (that is, to use free variables in your lambdas). And these symbols must be there even when the surrounding function returns. The problem is that these symbols are bound to some local storage of the function (usually on the call stack), which won't be there anymore when the function returns. Therefore, in order for a lambda to work the way you expect, you need to somehow "capture" all these free variables from its outer context and save them for later, even when the outer context will be gone. That is, you need to find the closure of your lambda (all these external variables it uses) and store it somewhere else (either by making a copy, or by preparing space for them upfront, somewhere else than on the stack). The actual method you use to achieve this goal is an "implementation detail" of your language. What's important here is the closure, which is the set of free variables from the environment of your lambda that need to be saved somewhere.

    It didn't took too long for people to start calling the actual data structure they use in their language's implementations to implement closure as the "closure" itself. The structure usually looks something like this:

    Closure {
       [pointer to the lambda function's machine code],
       [pointer to the lambda function's environment]
    }
    

    and these data structures are being passed around as parameters to other functions, returned from functions, and stored in variables, to represent lambdas, and allowing them to access their enclosing environment as well as the machine code to run in that context. But it's just a way (one of many) to implement closure, not the closure itself.

    As I explained above, the closure of a lambda expression is the subset of definitions in its environment that give values to the free variables contained in that lambda expression, effectively closing the expression (turning an open lambda expression, which cannot be evaluated yet, into a closed lambda expression, which can then be evaluated, since all the symbols contained in it are now defined).

    Anything else is just a "cargo cult" and "voo-doo magic" of programmers and language vendors unaware of the real roots of these notions.

    I hope that answers your questions. But if you had any follow-up questions, feel free to ask them in the comments, and I'll try to explain it better.

    链接地址: http://www.djcxy.com/p/1456.html

    上一篇: LINQ在特定属性上的Distinct()

    下一篇: '封'和'lambda'有什么区别?