What Exactly Is Contained Within a Obj._Closure_

What exactly is contained within a obj.__closure__?

Closure cells refer to values needed by the function but are taken from the surrounding scope.

When Python compiles a nested function, it notes any variables that it references but are only defined in a parent function (not globals) in the code objects for both the nested function and the parent scope. These are the co_freevars and co_cellvars attributes on the __code__ objects of these functions, respectively.

Then, when you actually create the nested function (which happens when the parent function is executed), those references are then used to attach a closure to the nested function.

A function closure holds a tuple of cells, one each for each free variable (named in co_freevars); cells are special references to local variables of a parent scope, that follow the values those local variables point to. This is best illustrated with an example:

def foo():
def bar():
print(spam)

spam = 'ham'
bar()
spam = 'eggs'
bar()
return bar

b = foo()
b()

In the above example, the function bar has one closure cell, which points to spam in the function foo. The cell follows the value of spam. More importantly, once foo() completes and bar is returned, the cell continues to reference the value (the string eggs) even though the variable spam inside foo no longer exists.

Thus, the above code outputs:

>>> b=foo()
ham
eggs
>>> b()
eggs

and b.__closure__[0].cell_contents is 'eggs'.

Note that the closure is dereferenced when bar() is called; the closure doesn't capture the value here. That makes a difference when you produce nested functions (with lambda expressions or def statements) that reference the loop variable:

def foo():
bar = []
for spam in ('ham', 'eggs', 'salad'):
bar.append(lambda: spam)
return bar

for bar in foo():
print bar()

The above will print salad three times in a row, because all three lambda functions reference the spam variable, not the value it was bound to when the function object was created. By the time the for loop finishes, spam was bound to 'salad', so all three closures will resolve to that value.

Array-like object in javascript

The author of the code used empty JavaScript object as a basis of a array like object, i.e. the one that can be accessed by index and has a length property.

There could be two reasons that I could think of:

  1. memory use - if array is backed by implementation that allocates n elements, when it reaches it's limits it grows by some factor to increase it's capacity, and thusly wastes capacity - length of memory
  2. cpu time - implementor choose insertion speed over random access speed -- a return from this method is more likely to be iterated over sequentially than random accessed, and resizing the array in insertion causes allocation-copy-deallocation which has a cpu penalty

I'm betting that similar code would be found in other JavaScript libraries, and that it's result of benchmarking and finding the best to fit solution across different browsers.

edited after comment by Justin

Upon further googling it appears that array-like objects are common among JavaScript developers: checkout JavaScript: the definitive guide by David Flanagan, it has a whole sub-chapter on Array-like objects. Also these guys mention them.

No mention of why should one prefer array-like vs array object. This could be a good SO question.

So a third option could be the key: compliance with norms of JavaScript API's.

JavaScript closure and the this object

As soon as you call:

return func(par);

You're creating a new scope (with its own this) and in this case because you haven't specified an object, this === window usually or undefined in strict mode. The called function does not inherit whatever this was in the calling scope.

Ways to set a value for this are:

myobj.func(par);  // this === myobj

or

func.call(myobj, ...)  // this === myobj

There are also:

  • apply
  • bind
  • arrow functions, where this is set to the same value as the outer execution context (also see MDN:Arrow functions )

What exactly happens in the JVM when invoking an object's instance method?

Ordinary instance method invocations get compiled to invokevirtual instructions.

This has been described in JVMS, §3.7. Invoking Methods:

The normal method invocation for a instance method dispatches on the run-time type of the object. (They are virtual, in C++ terms.) Such an invocation is implemented using the invokevirtual instruction, which takes as its argument an index to a run-time constant pool entry giving the internal form of the binary name of the class type of the object, the name of the method to invoke, and that method's descriptor (§4.3.3). To invoke the addTwo method, defined earlier as an instance method, we might write:

int add12and13() {
return addTwo(12, 13);
}

This compiles to:

Method int add12and13()
0 aload_0 // Push local variable 0 (this)
1 bipush 12 // Push int constant 12
3 bipush 13 // Push int constant 13
5 invokevirtual #4 // Method Example.addtwo(II)I
8 ireturn // Return int on top of operand stack;
// it is the int result of addTwo()

The invocation is set up by first pushing a reference to the current instance, this, on to the operand stack. The method invocation's arguments, int values 12 and 13, are then pushed. When the frame for the addTwo method is created, the arguments passed to the method become the initial values of the new frame's local variables. That is, the reference for this and the two arguments, pushed onto the operand stack by the invoker, will become the initial values of local variables 0, 1, and 2 of the invoked method.


It’s up to the particular JVM implementation, how to perform the invocation at runtime, but using a vtable is very common. This basically matches the graphic in your question. The reference to the receiver object, which will become the this reference for the invoked method, is used to retrieve a method table.

In the HotSpot JVM, the metadata structure is called Klass (actually a common name, even across different implementations). See “Object header layout” on the OpenJDK Wiki:

An object header consists of a native-sized mark word, a klass word, a 32-bit length word (if the object is an array), a 32-bit gap (if required by alignment rules), and then zero or more instance fields, array elements, or metadata fields. (Interesting Trivia: Klass metaobjects contain a C++ vtable immediately after the klass word.)

When resolving a symbolic reference to a method, its corresponding index in the table will be identified and remembered for subsequent invocations, as it never changes. Then, the entry of the actual object’s class can be used for the invocation. Subclasses will have the entries of the superclass, new methods appended to the end, with the entries of overridden methods replaced.


This is the simple, unoptimized scenario. Most runtime optimizations work better when methods are inlined, to have the context of caller and callee in one piece of code to transform. Therefore, the HotSpot JVM will attempt inlining even for invokevirtual instructions to potentially overridable methods. As the wiki says:

  • Virtual (and interface) invocations are often demoted to "special" invocations, if the class hierarchy permits it. A dependency is registered in case further class loading spoils things.
  • Virtual (and interface) invocations with a lopsided type profile are compiled with an optimistic check in favor of the historically common type (or two types).
  • Depending on the profile, a failure of the optimistic check will either deoptimize or run through a (slow) vtable/itable call.
  • On the fast path of an optimistically typed call, inlining is common. The best case is a de facto monomorphic call which is inlined. Such calls, if back-to-back, will perform the receiver type check only once.

This aggressive or optimistic inlining will sometime require Deoptimization but will usually yield an overall higher performance.

How to determine if object is in array

Use something like this:

function containsObject(obj, list) {
var i;
for (i = 0; i < list.length; i++) {
if (list[i] === obj) {
return true;
}
}

return false;
}

In this case, containsObject(car4, carBrands) is true. Remove the carBrands.push(car4); call and it will return false instead. If you later expand to using objects to store these other car objects instead of using arrays, you could use something like this instead:

function containsObject(obj, list) {
var x;
for (x in list) {
if (list.hasOwnProperty(x) && list[x] === obj) {
return true;
}
}

return false;
}

This approach will work for arrays too, but when used on arrays it will be a tad slower than the first option.

What is a 'Closure'?

Variable scope

When you declare a local variable, that variable has a scope. Generally, local variables exist only within the block or function in which you declare them.

function() {
var a = 1;
console.log(a); // works
}
console.log(a); // fails

If I try to access a local variable, most languages will look for it in the current scope, then up through the parent scopes until they reach the root scope.

var a = 1;
function() {
console.log(a); // works
}
console.log(a); // works

When a block or function is done with, its local variables are no longer needed and are usually blown out of memory.

This is how we normally expect things to work.

A closure is a persistent local variable scope

A closure is a persistent scope which holds on to local variables even after the code execution has moved out of that block. Languages which support closure (such as JavaScript, Swift, and Ruby) will allow you to keep a reference to a scope (including its parent scopes), even after the block in which those variables were declared has finished executing, provided you keep a reference to that block or function somewhere.

The scope object and all its local variables are tied to the function and will persist as long as that function persists.

This gives us function portability. We can expect any variables that were in scope when the function was first defined to still be in scope when we later call the function, even if we call the function in a completely different context.

For example

Here's a really simple example in JavaScript that illustrates the point:

outer = function() {
var a = 1;
var inner = function() {
console.log(a);
}
return inner; // this returns a function
}

var fnc = outer(); // execute outer to get inner
fnc();

Here I have defined a function within a function. The inner function gains access to all the outer function's local variables, including a. The variable a is in scope for the inner function.

Normally when a function exits, all its local variables are blown away. However, if we return the inner function and assign it to a variable fnc so that it persists after outer has exited, all of the variables that were in scope when inner was defined also persist. The variable a has been closed over -- it is within a closure.

Note that the variable a is totally private to fnc. This is a way of creating private variables in a functional programming language such as JavaScript.

As you might be able to guess, when I call fnc() it prints the value of a, which is "1".

In a language without closure, the variable a would have been garbage collected and thrown away when the function outer exited. Calling fnc would have thrown an error because a no longer exists.

In JavaScript, the variable a persists because the variable scope is created when the function is first declared and persists for as long as the function continues to exist.

a belongs to the scope of outer. The scope of inner has a parent pointer to the scope of outer. fnc is a variable which points to inner. a persists as long as fnc persists. a is within the closure.

Further reading (watching)

I made a YouTube video looking at this code with some practical examples of usage.

What do lambda function closures capture?

What do the closures capture exactly?

Closures in Python use lexical scoping: they remember the name and scope of the closed-over variable where it is created. However, they are still late binding: the name is looked up when the code in the closure is used, not when the closure is created. Since all the functions in your example are created in the same scope and use the same variable name, they always refer to the same variable.

There are at least two ways to get early binding instead:

  1. The most concise, but not strictly equivalent way is the one recommended by Adrien Plisson. Create a lambda with an extra argument, and set the extra argument's default value to the object you want preserved.

  2. More verbosely but also more robustly, we can create a new scope for each created lambda:

    >>> adders = [0,1,2,3]
    >>> for i in [0,1,2,3]:
    ... adders[i] = (lambda b: lambda a: b + a)(i)
    ...
    >>> adders[1](3)
    4
    >>> adders[2](3)
    5

    The scope here is created using a new function (another lambda, for brevity), which binds its argument, and passing the value you want to bind as the argument. In real code, though, you most likely will have an ordinary function instead of the lambda to create the new scope:

    def createAdder(x):
    return lambda y: y + x
    adders = [createAdder(i) for i in range(4)]


Related Topics



Leave a reply



Submit