Why Does Defining __getitem__ on a Class Make It Iterable in Python

Why does defining __getitem__ on a class make it iterable in python?

If you take a look at PEP 234, which defines iterators, it says:

1. An object can be iterated over with "for" if it implements
__iter__() or __getitem__().

2. An object can function as an iterator if it implements next().

What's the difference between __iter__ and __getitem__?

Yes, this is an intended design. It is documented, well-tested, and relied upon by sequence types such as str.

The __getitem__ version is a legacy from before Python had modern iterators. The idea was that any sequence (something that is indexable and has a length) would automatically be iterable through the series s[0], s[1], s[2], ... until IndexError or StopIteration is raised.
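The StopIteration case is easy to miss. Here is a minimal sketch, using a hypothetical Countdown class, in which __getitem__ signals the end with StopIteration instead of IndexError; the legacy iteration still ends cleanly.

```python
class Countdown:
    """Hypothetical legacy-style iterable: only __getitem__, no __iter__."""

    def __getitem__(self, index):
        if index >= 3:
            # StopIteration ends the legacy sequence iteration,
            # just like IndexError does.
            raise StopIteration
        return 3 - index


print(list(Countdown()))  # [3, 2, 1]
```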

In Python 2.7, for example, strings are iterable because of the __getitem__ method (the str type does not have an __iter__ method).

In contrast, the iterator protocol lets any class be iterable without necessarily being indexable (dicts and sets, for example).
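A minimal sketch of that direction: a hypothetical Letters class is iterable through __iter__ alone, so it works with list() and for loops, but indexing it fails because it has no __getitem__.

```python
class Letters:
    """Hypothetical iterable that is not indexable."""

    def __init__(self, text):
        self._text = text

    def __iter__(self):
        # Hand out a fresh iterator over the characters on every call.
        return iter(self._text)


letters = Letters("abc")
print(list(letters))  # ['a', 'b', 'c']

try:
    letters[0]
except TypeError:
    # No __getitem__, so subscription is not supported.
    print("Letters objects cannot be indexed")
```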

Here is how to make an iterable class using the legacy style for sequences:

>>> class A:
        def __getitem__(self, index):
            if index >= 10:
                raise IndexError
            return index * 111

>>> list(A())
[0, 111, 222, 333, 444, 555, 666, 777, 888, 999]

Here is how to make an iterable using the __iter__ approach:

>>> class B:
        def __iter__(self):
            yield 10
            yield 20
            yield 30

>>> list(B())
[10, 20, 30]

For those who are interested in the details, the relevant code is in Objects/iterobject.c:

static PyObject *
iter_iternext(PyObject *iterator)
{
    seqiterobject *it;
    PyObject *seq;
    PyObject *result;

    assert(PySeqIter_Check(iterator));
    it = (seqiterobject *)iterator;
    seq = it->it_seq;
    if (seq == NULL)
        return NULL;

    result = PySequence_GetItem(seq, it->it_index);
    if (result != NULL) {
        it->it_index++;
        return result;
    }
    if (PyErr_ExceptionMatches(PyExc_IndexError) ||
        PyErr_ExceptionMatches(PyExc_StopIteration))
    {
        PyErr_Clear();
        Py_DECREF(seq);
        it->it_seq = NULL;
    }
    return NULL;
}

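and in Objects/abstract.c (this is the Python 2 version; PyInstance_Check handles instances of old-style classes, which Python 3 removed):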

int
PySequence_Check(PyObject *s)
{
    if (s == NULL)
        return 0;
    if (PyInstance_Check(s))
        return PyObject_HasAttrString(s, "__getitem__");
    if (PyDict_Check(s))
        return 0;
    return s->ob_type->tp_as_sequence &&
        s->ob_type->tp_as_sequence->sq_item != NULL;
}
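For readers who prefer Python to C, here is a rough Python model of iter_iternext. SeqIter is a hypothetical name; the real sequence iterator is implemented in C, and this sketch only mirrors the control flow above: fetch seq[index], advance on success, and on IndexError or StopIteration drop the sequence reference so the iterator stays exhausted.

```python
class SeqIter:
    """Hypothetical Python model of CPython's sequence iterator."""

    def __init__(self, seq):
        self.seq = seq    # plays the role of it->it_seq
        self.index = 0    # plays the role of it->it_index

    def __iter__(self):
        return self

    def __next__(self):
        if self.seq is None:
            # Already exhausted (it_seq == NULL): stay exhausted.
            raise StopIteration
        try:
            result = self.seq[self.index]
        except (IndexError, StopIteration):
            # Mirror Py_DECREF(seq); it->it_seq = NULL;
            self.seq = None
            raise StopIteration
        self.index += 1
        return result


data = [10, 20]
it = SeqIter(data)
print(list(it))   # [10, 20]
data.append(30)
print(list(it))   # [] because the iterator dropped its reference to data
```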

Why does making a class iterable produce this output?

Because, for objects that define __getitem__ but not __iter__, the for loop works by passing successive indices (0, 1, 2, ...) to the object's __getitem__ method. See the effbot. (Under the covers it is a bit more indirect: the for loop always calls iter() on the object, and if the object does not provide __iter__, iter() returns a sequence iterator, and it is that iterator which calls the underlying object's __getitem__.)
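A small sketch of the precedence this implies, using a hypothetical class Both that defines both methods: iter() (and therefore the for loop) uses __iter__ whenever it exists and never falls back to __getitem__, while indexing still goes to __getitem__.

```python
class Both:
    """Hypothetical class defining both protocols."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError
        return "getitem"

    def __iter__(self):
        # Because __iter__ exists, iteration uses it exclusively.
        yield "iter"


print(list(Both()))   # ['iter']
print(Both()[0])      # getitem
```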

Is there a way to disable iteration for classes that define a __getitem__ method without putting constraints on the key?

If the goal is to stop the object from being iterable, you can simply make its __iter__ method raise an error:

class Test:
    def __getitem__(self, key):
        if key > 9:
            raise KeyError
        return key
    def __iter__(self):
        raise TypeError('Object is not iterable')

Test run:

>>> t = Test()
>>> for x in t:
        print(x)

Traceback (most recent call last):
File "<pyshell#126>", line 1, in <module>
for x in t:
File "<pyshell#122>", line 7, in __iter__
raise TypeError('Object is not iterable')
TypeError: Object is not iterable

But __getitem__ will still work:

>>> t[0]
0
>>> t[1]
1
>>> t[10]
Traceback (most recent call last):
File "<pyshell#129>", line 1, in <module>
t[10]
File "<pyshell#122>", line 4, in __getitem__
raise KeyError
KeyError

How do dunder methods __getitem__ and __len__ provide iteration?

An iterable is an object whose class defines either __iter__ or __getitem__; there is no need for __len__.
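A quick check of the "no need for __len__" point, using a hypothetical NoLen class: iteration works through __getitem__ alone, even though len() on the same object raises TypeError.

```python
class NoLen:
    """Hypothetical class with __getitem__ but no __len__."""

    def __getitem__(self, idx):
        if idx >= 3:
            raise IndexError(idx)
        return idx * 2


obj = NoLen()
print(list(obj))  # [0, 2, 4]; iteration works without __len__

try:
    len(obj)
except TypeError:
    print("len() is not supported")
```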

The difference between the __iter__ implementation and the __getitem__ implementation is:
With __iter__, the for loop calls __iter__ once and then calls __next__ on the object it returned (the iterator) until StopIteration is raised, which is where the loop stops.
With __getitem__, the loop evaluates obj[idx] starting from 0 (always) and incrementing idx by one on each iteration, until IndexError is raised.

For instance:

class GetItem:
    def __getitem__(self, idx):
        if idx == 10:
            raise IndexError
        return idx

for i in GetItem():
    print(i)

The result will be

0
1
2
...
9

because as soon as the index gets to 10, it raises IndexError and the loop stops.
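To see the "always starts from zero" behaviour directly, a hypothetical Recorder class can log every index the loop asks for. Each for loop gets a new iterator, so a second loop starts again at 0.

```python
class Recorder:
    """Hypothetical class that logs which indices the for loop requests."""

    def __init__(self):
        self.requested = []

    def __getitem__(self, idx):
        self.requested.append(idx)
        if idx == 3:
            raise IndexError
        return idx


r = Recorder()
for _ in r:
    pass
print(r.requested)  # [0, 1, 2, 3]

# A second loop restarts at index 0.
for _ in r:
    pass
print(r.requested)  # [0, 1, 2, 3, 0, 1, 2, 3]
```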

With __iter__, on the other hand:

class Iter:
    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        self.n += 1
        if self.n == 10:
            raise StopIteration
        return self.n

for i in Iter():
    print(i)

Here you need to keep track of the state yourself (this loop prints 1 through 9), whereas with __getitem__ the index is tracked for you, which makes it convenient for counting, indexing and the like.
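One common way to avoid storing loop state on the instance is to write __iter__ as a generator function. This is an alternative technique, not part of the answer above: a sketch with a hypothetical IterGen class that yields the same values as Iter (1 through 9), keeping the counter as a local variable inside the generator so each iteration starts fresh.

```python
class IterGen:
    """Hypothetical generator-based version of Iter."""

    def __iter__(self):
        # n lives in the generator's frame, not on self.
        n = 1
        while n < 10:
            yield n
            n += 1


obj = IterGen()
print(list(obj))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(obj))  # the same list again
```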

Iterating over dictionary using __getitem__ in python

A for loop works with iterators, objects you can pass to next. An object is an iterator if it has a __next__ method.

Neither of your classes does, so Python will first pass your object to iter to get an iterator. The first thing iter tries to do is call the object's __iter__ method.

Neither of your classes defines __iter__ either, so iter then checks whether the object defines __getitem__. Both of your classes do, so iter returns an object of type iterator, whose __next__ method can be imagined to be something like

def __next__(self):
    try:
        rv = self.thing.__getitem__(self.i)
    except IndexError:
        raise StopIteration
    self.i += 1
    return rv

(The iterator holds a reference to the thing which defined __getitem__, as well as the value of i to track state between calls to __next__. i is presumed to be initialized to 0.)

For Array, this works, because it has integer indices. For Dictionary, though, 0 is not a key, so instead of an IndexError you get a KeyError, which the __next__ method does not catch.
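A minimal reproduction of that failure, with a hypothetical DictionaryNoIter class shaped like the Dictionary class below but without __iter__: the first index requested is 0, and the resulting KeyError escapes the loop instead of ending it.

```python
class DictionaryNoIter:
    """Hypothetical dict-backed class with __getitem__ but no __iter__."""

    def __init__(self):
        self.dictionary = {'a': 1, 'b': 2, 'c': 3}

    def __getitem__(self, key):
        return self.dictionary[key]


try:
    for item in DictionaryNoIter():
        print(item)
except KeyError as exc:
    # Index 0 was requested; KeyError is not treated as "end of sequence".
    print("KeyError:", exc)  # KeyError: 0
```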

This is alluded to in the documentation for __getitem__:

Note: for loops expect that an IndexError will be raised for illegal indexes to allow proper detection of the end of the sequence.

To make your Dictionary class iterable, define __iter__:

class Dictionary:
    def __init__(self):
        self.dictionary = {'a' : 1, 'b' : 2, 'c': 3}

    def __getitem__(self,key):
        return self.dictionary[key]

    def __iter__(self):
        return iter(self.dictionary)

dict.__iter__ returns an object of type dict_keyiterator, which yields the dict's keys; each of those keys can then be passed to Dictionary.__getitem__.
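A short usage sketch of the fixed class. The class is repeated so the snippet runs on its own; the loop pairs each key from the dict_keyiterator with a lookup through __getitem__ (insertion order is preserved for dicts in Python 3.7+).

```python
class Dictionary:
    def __init__(self):
        self.dictionary = {'a': 1, 'b': 2, 'c': 3}

    def __getitem__(self, key):
        return self.dictionary[key]

    def __iter__(self):
        # Delegate iteration to the underlying dict, which yields its keys.
        return iter(self.dictionary)


d = Dictionary()
for key in d:
    print(key, d[key])  # a 1, then b 2, then c 3
print(type(iter(d)).__name__)  # dict_keyiterator
```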

How can I make a class in python support __getitem__, but not allow iteration?

I think a slightly better solution would be to raise a TypeError rather than a plain exception (this is what normally happens with a non-iterable class):

class A(object):
    # show what happens with a non-iterable class with no __getitem__
    pass

class B(object):
    def __getitem__(self, k):
        return k
    def __iter__(self):
        raise TypeError('%r object is not iterable'
                        % self.__class__.__name__)

Testing:

>>> iter(A())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'A' object is not iterable
>>> iter(B())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "iter.py", line 9, in __iter__
% self.__class__.__name__)
TypeError: 'B' object is not iterable
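In Python 3 there is also a shorter option, documented in the data model reference: setting a special method to None marks that operation as unavailable, so iter() raises TypeError without falling back to __getitem__. A sketch with a hypothetical NotIterable class (this does not apply to Python 2):

```python
class NotIterable:
    def __getitem__(self, k):
        return k

    # Setting __iter__ to None disables iteration; iter() and for loops
    # raise TypeError instead of falling back to __getitem__.
    __iter__ = None


obj = NotIterable()
print(obj[5])  # 5
try:
    iter(obj)
except TypeError as exc:
    print(exc)  # 'NotIterable' object is not iterable
```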
