Accessing Class Variables from a List Comprehension in the Class Definition

Accessing class variables from a list comprehension in the class definition

Class scope and list, set or dictionary comprehensions, as well as generator expressions do not mix.

The why; or, the official word on this

In Python 3, list comprehensions were given a proper scope (local namespace) of their own, to prevent their local variables bleeding over into the surrounding scope (see List comprehension rebinds names even after scope of comprehension. Is this right?). That's great when using such a list comprehension in a module or in a function, but in classes, scoping is a little, uhm, strange.

This is documented in pep 227:

Names in class scope are not accessible. Names are resolved in
the innermost enclosing function scope. If a class definition
occurs in a chain of nested scopes, the resolution process skips
class definitions.

and in the class compound statement documentation:

The class’s suite is then executed in a new execution frame (see section Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains only function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [4] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary.

Emphasis mine; the execution frame is the temporary scope.

Because the scope is repurposed as the attributes on a class object, allowing it to be used as a nonlocal scope as well leads to undefined behaviour; what would happen if a class method referred to x as a nested scope variable, then manipulates Foo.x as well, for example? More importantly, what would that mean for subclasses of Foo? Python has to treat a class scope differently as it is very different from a function scope.

Last, but definitely not least, the linked Naming and binding section in the Execution model documentation mentions class scopes explicitly:

The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:

class A:
a = 42
b = list(a + i for i in range(10))

So, to summarize: you cannot access the class scope from functions, list comprehensions or generator expressions enclosed in that scope; they act as if that scope does not exist. In Python 2, list comprehensions were implemented using a shortcut, but in Python 3 they got their own function scope (as they should have had all along) and thus your example breaks. Other comprehension types have their own scope regardless of Python version, so a similar example with a set or dict comprehension would break in Python 2.

# Same error, in Python 2 or 3
y = {x: x for i in range(1)}

The (small) exception; or, why one part may still work

There's one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable. In your example, it's the range(1):

y = [x for i in range(1)]
# ^^^^^^^^

Thus, using x in that expression would not throw an error:

# Runs fine
y = [i for i in range(x)]

This only applies to the outermost iterable; if a comprehension has multiple for clauses, the iterables for inner for clauses are evaluated in the comprehension's scope:

# NameError
y = [i for i in range(1) for j in range(x)]
# ^^^^^^^^^^^^^^^^^ -----------------
# outer loop inner, nested loop

This design decision was made in order to throw an error at genexp creation time instead of iteration time when creating the outermost iterable of a generator expression throws an error, or when the outermost iterable turns out not to be iterable. Comprehensions share this behavior for consistency.

Looking under the hood; or, way more detail than you ever wanted

You can see this all in action using the dis module. I'm using Python 3.3 in the following examples, because it adds qualified names that neatly identify the code objects we want to inspect. The bytecode produced is otherwise functionally identical to Python 3.2.

To create a class, Python essentially takes the whole suite that makes up the class body (so everything indented one level deeper than the class <name>: line), and executes that as if it were a function:

>>> import dis
>>> def foo():
... class Foo:
... x = 5
... y = [x for i in range(1)]
... return Foo
...
>>> dis.dis(foo)
2 0 LOAD_BUILD_CLASS
1 LOAD_CONST 1 (<code object Foo at 0x10a436030, file "<stdin>", line 2>)
4 LOAD_CONST 2 ('Foo')
7 MAKE_FUNCTION 0
10 LOAD_CONST 2 ('Foo')
13 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
16 STORE_FAST 0 (Foo)

5 19 LOAD_FAST 0 (Foo)
22 RETURN_VALUE

The first LOAD_CONST there loads a code object for the Foo class body, then makes that into a function, and calls it. The result of that call is then used to create the namespace of the class, its __dict__. So far so good.

The thing to note here is that the bytecode contains a nested code object; in Python, class definitions, functions, comprehensions and generators all are represented as code objects that contain not only bytecode, but also structures that represent local variables, constants, variables taken from globals, and variables taken from the nested scope. The compiled bytecode refers to those structures and the python interpreter knows how to access those given the bytecodes presented.

The important thing to remember here is that Python creates these structures at compile time; the class suite is a code object (<code object Foo at 0x10a436030, file "<stdin>", line 2>) that is already compiled.

Let's inspect that code object that creates the class body itself; code objects have a co_consts structure:

>>> foo.__code__.co_consts
(None, <code object Foo at 0x10a436030, file "<stdin>", line 2>, 'Foo')
>>> dis.dis(foo.__code__.co_consts[1])
2 0 LOAD_FAST 0 (__locals__)
3 STORE_LOCALS
4 LOAD_NAME 0 (__name__)
7 STORE_NAME 1 (__module__)
10 LOAD_CONST 0 ('foo.<locals>.Foo')
13 STORE_NAME 2 (__qualname__)

3 16 LOAD_CONST 1 (5)
19 STORE_NAME 3 (x)

4 22 LOAD_CONST 2 (<code object <listcomp> at 0x10a385420, file "<stdin>", line 4>)
25 LOAD_CONST 3 ('foo.<locals>.Foo.<listcomp>')
28 MAKE_FUNCTION 0
31 LOAD_NAME 4 (range)
34 LOAD_CONST 4 (1)
37 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
40 GET_ITER
41 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
44 STORE_NAME 5 (y)
47 LOAD_CONST 5 (None)
50 RETURN_VALUE

The above bytecode creates the class body. The function is executed and the resulting locals() namespace, containing x and y is used to create the class (except that it doesn't work because x isn't defined as a global). Note that after storing 5 in x, it loads another code object; that's the list comprehension; it is wrapped in a function object just like the class body was; the created function takes a positional argument, the range(1) iterable to use for its looping code, cast to an iterator. As shown in the bytecode, range(1) is evaluated in the class scope.

From this you can see that the only difference between a code object for a function or a generator, and a code object for a comprehension is that the latter is executed immediately when the parent code object is executed; the bytecode simply creates a function on the fly and executes it in a few small steps.

Python 2.x uses inline bytecode there instead, here is output from Python 2.7:

  2           0 LOAD_NAME                0 (__name__)
3 STORE_NAME 1 (__module__)

3 6 LOAD_CONST 0 (5)
9 STORE_NAME 2 (x)

4 12 BUILD_LIST 0
15 LOAD_NAME 3 (range)
18 LOAD_CONST 1 (1)
21 CALL_FUNCTION 1
24 GET_ITER
>> 25 FOR_ITER 12 (to 40)
28 STORE_NAME 4 (i)
31 LOAD_NAME 2 (x)
34 LIST_APPEND 2
37 JUMP_ABSOLUTE 25
>> 40 STORE_NAME 5 (y)
43 LOAD_LOCALS
44 RETURN_VALUE

No code object is loaded, instead a FOR_ITER loop is run inline. So in Python 3.x, the list generator was given a proper code object of its own, which means it has its own scope.

However, the comprehension was compiled together with the rest of the python source code when the module or script was first loaded by the interpreter, and the compiler does not consider a class suite a valid scope. Any referenced variables in a list comprehension must look in the scope surrounding the class definition, recursively. If the variable wasn't found by the compiler, it marks it as a global. Disassembly of the list comprehension code object shows that x is indeed loaded as a global:

>>> foo.__code__.co_consts[1].co_consts
('foo.<locals>.Foo', 5, <code object <listcomp> at 0x10a385420, file "<stdin>", line 4>, 'foo.<locals>.Foo.<listcomp>', 1, None)
>>> dis.dis(foo.__code__.co_consts[1].co_consts[2])
4 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (i)
12 LOAD_GLOBAL 0 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE

This chunk of bytecode loads the first argument passed in (the range(1) iterator), and just like the Python 2.x version uses FOR_ITER to loop over it and create its output.

Had we defined x in the foo function instead, x would be a cell variable (cells refer to nested scopes):

>>> def foo():
... x = 2
... class Foo:
... x = 5
... y = [x for i in range(1)]
... return Foo
...
>>> dis.dis(foo.__code__.co_consts[2].co_consts[2])
5 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (i)
12 LOAD_DEREF 0 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE

The LOAD_DEREF will indirectly load x from the code object cell objects:

>>> foo.__code__.co_cellvars               # foo function `x`
('x',)
>>> foo.__code__.co_consts[2].co_cellvars # Foo class, no cell variables
()
>>> foo.__code__.co_consts[2].co_consts[2].co_freevars # Refers to `x` in foo
('x',)
>>> foo().y
[2]

The actual referencing looks the value up from the current frame data structures, which were initialized from a function object's .__closure__ attribute. Since the function created for the comprehension code object is discarded again, we do not get to inspect that function's closure. To see a closure in action, we'd have to inspect a nested function instead:

>>> def spam(x):
... def eggs():
... return x
... return eggs
...
>>> spam(1).__code__.co_freevars
('x',)
>>> spam(1)()
1
>>> spam(1).__closure__
>>> spam(1).__closure__[0].cell_contents
1
>>> spam(5).__closure__[0].cell_contents
5

So, to summarize:

  • List comprehensions get their own code objects in Python 3, and there is no difference between code objects for functions, generators or comprehensions; comprehension code objects are wrapped in a temporary function object and called immediately.
  • Code objects are created at compile time, and any non-local variables are marked as either global or as free variables, based on the nested scopes of the code. The class body is not considered a scope for looking up those variables.
  • When executing the code, Python has only to look into the globals, or the closure of the currently executing object. Since the compiler didn't include the class body as a scope, the temporary function namespace is not considered.

A workaround; or, what to do about it

If you were to create an explicit scope for the x variable, like in a function, you can use class-scope variables for a list comprehension:

>>> class Foo:
... x = 5
... def y(x):
... return [x for i in range(1)]
... y = y(x)
...
>>> Foo.y
[5]

The 'temporary' y function can be called directly; we replace it when we do with its return value. Its scope is considered when resolving x:

>>> foo.__code__.co_consts[1].co_consts[2]
<code object y at 0x10a5df5d0, file "<stdin>", line 4>
>>> foo.__code__.co_consts[1].co_consts[2].co_cellvars
('x',)

Of course, people reading your code will scratch their heads over this a little; you may want to put a big fat comment in there explaining why you are doing this.

The best work-around is to just use __init__ to create an instance variable instead:

def __init__(self):
self.y = [self.x for i in range(1)]

and avoid all the head-scratching, and questions to explain yourself. For your own concrete example, I would not even store the namedtuple on the class; either use the output directly (don't store the generated class at all), or use a global:

from collections import namedtuple
State = namedtuple('State', ['name', 'capital'])

class StateDatabase:
db = [State(*args) for args in [
('Alabama', 'Montgomery'),
('Alaska', 'Juneau'),
# ...
]]

Scope of class variable with list comprehension

This isn't so much about the variable scope in list comprehensions as about the way classes work. In Python 3 (but not in Python 2!), list comprehensions don't affect the scope around them:

>>> [i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'i' is not defined
>>> i = 0
>>> [i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
0

However, when you do that in a class, it won't look for b among the class attributes the way it would in the local scope of a module or function. To do what you are trying to do, use the @property decorator:

>>> class a:
... s = 'python'
... b = 'py'
... @property
... def c(self):
... return [x for x in self.s if x in self.b]
...
>>> A = a()
>>> A.c
['p', 'y']

Also, remember that strings are iterable too (they're just lists of their component characters), so no need to explicitly make b a list.

Reference class variable in a comprehension of another class variable

A basic list comprehension has the following syntax

[expression for var in iterable]

When a list comprehension occurs inside a class, the attributes of the class
can be used in iterable. This is true in Python2 and Python3.

However, the attributes of the class can be used (i.e. accessed) in expression in Python2 but not in Python3.

The story is a bit different for generator expressions:

(expression for var in iterable)

While the class attributes can still be accessed from iterable, the class attributes are not accessible from expression. (This is true for Python2 and Python3).

This can all be summarized as follows:

                             Python2      Python3
Can access class attributes
--------------------------------------------------
list comp. iterable Y Y
list comp. expression Y N
gen expr. iterable Y Y
gen expr. expression N N
dict comp. iterable Y Y
dict comp. expression N N

(Dict comprehensions behave the same as generator expressions in this respect.)


Now how does this relate to your question:

In your example,

second_d = dict((k,first_d[k]) for k in (2,3))

a NameError occurs because first_d is not accessible from the expression part of a generator expression.

A workaround for Python2 would be to change the generator expression to a list comprehension:

second_d = dict([(k,first_d[k]) for k in (2,3)])

However, I don't find this a very comfortable solution since this code will fail in Python3.

You could do as Joel Cornett suggests:

second_d = {k: v for k, v in first_d.items() if k in (2, 3)}

since this uses first_d in the iterable rather than the expression part of the dict comprehension. But this may loop through many more items than necessary if first_d contains many items. Neverthess, this solution might be just fine if first_d is small.

In general, you can avoid this problem by defining a helper function which can be defined inside or outside the class:

def partial_dict(dct, keys):
return {k:dct[k] for k in keys}

class Example(object):
first_d = {1:1,2:2,3:3,4:4}
second_d = partial_dict(first_d, (2,3))

class Example2(object):
a = [1,2,3,4,5]
b = [2,4]
def myfunc(A, B):
return [x for x in A if x not in B]
c = myfunc(a, b)

print(Example().second_d)
# {2: 2, 3: 3}

print(Example2().c)
# [1, 3, 5]

Functions work because they define a local scope and
variables in this local scope can be accessed from within the dict comprehension.

This was explained here, but I am not entirely comfortable with this since it does not explain why the expression part behaves differently than the iterable part of a list comprehension, generator expression or dict comprehension.

Thus I can not explain (completely) why Python behaves this way, only that this is the way it appears to behave.

Python list comprehension with class member values

Use itertools.product() and a list comprehension:

import itertools

resultList = [f for f, b in itertools.product(FooList, BarList) if f.KeyFrameTime == b.KeyFrameTime]

Python list comprehension using a class variable throws NameError

This dictionary comprehension will be suitable for your needs:

order2 = {choice[0]: idx for idx, choice in enumerate(choices)}

NameError referencing class variables in comprehension

Note on Resolution of Names in Python:

Class definition blocks and arguments to exec() and eval() are special
in the context of name resolution. A class definition is an executable
statement that may use and define names. These references follow the
normal rules for name resolution with an exception that unbound local
variables are looked up in the global namespace. The namespace of the
class definition becomes the attribute dictionary of the class.
The scope of names defined in a class block is limited to the class
block; it does not extend to the code blocks of methods – this
includes comprehensions and generator expressions since they are
implemented using a function scope. This means that the following will
fail.

You have declared a variable ID inside the class definition that makes it an class or static variable. So its scope is limited the to class block, and hence you can't access it inside the comprehensions.

you can read more about it at python docs

Consider the examples,

Example #1:

class A:
ID = "This is class A."
print(ID)

Now when you execute >>>A() the output will be This is class A which is totally fine because the scope of variable ID is limited to class A


Example #2:

class B:
L = [ 1, 2, 3, 4, 5, 6, 7]
print([i * 2 for i in L])

Now when you execute >>>B() the output will be [2, 4, 6, 8, 10, 12, 14] which is totally fine because the scope of list L is limited to class B


Example #3:

class C:
L = [ 1, 2, 3, 4, 5, 6, 7]
print([L[i] * 2 for i in range(7)])

Now executing >>>C() will raise a NameError stating that name L is not defined.

Why is one class variable not defined in list comprehension but another is?

data is the source of the list comprehension; it is the one parameter that is passed to the nested scope created.

Everything in the list comprehension is run in a separate scope (as a function, basically), except for the iterable used for the left-most for loop. You can see this in the byte code:

>>> def foo():
... return [i for i in data]
...
>>> dis.dis(foo)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x105390390, file "<stdin>", line 2>)
3 LOAD_CONST 2 ('foo.<locals>.<listcomp>')
6 MAKE_FUNCTION 0
9 LOAD_GLOBAL 0 (data)
12 GET_ITER
13 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
16 RETURN_VALUE

The <listcomp> code object is called like a function, and iter(data) is passed in as the argument (CALL_FUNCTION is executed with 1 positional argument, the GET_ITER result).

The <listcomp> code object looks for that one argument:

>>> dis.dis(foo.__code__.co_consts[1])
2 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (i)
12 LOAD_FAST 1 (i)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE

The LOAD_FAST call refers to the first and only positional argument passed in; it is unnamed here because there never was a function definition to give it a name.

Any additional names used in the list comprehension (or set or dict comprehension, or generator expression, for that matter) are either locals, closures or globals, not parameters.

If you go back to my answer to that question, look for the section titled The (small) exception; or, why one part may still work; I tried to cover this specific point there:

There's one part of a comprehension or generator expression that executes in the surrounding scope, regardless of Python version. That would be the expression for the outermost iterable.



Related Topics



Leave a reply



Submit