Is There Any Built-In Way to Get the Length of an Iterable in Python

Is there any built-in way to get the length of an iterable in python?

Short of iterating through the iterable and counting the number of iterations, no. That's what makes it an iterable and not a list. This isn't really even a python-specific problem. Look at the classic linked-list data structure. Finding the length is an O(n) operation that involves iterating the whole list to find the number of elements.

As mcrute mentioned above, you can probably reduce your function to:

def count_iterable(i):
return sum(1 for e in i)

Of course, if you're defining your own iterable object you can always implement __len__ yourself and keep an element count somewhere.

Getting number of elements in an iterator in Python

No. It's not possible.

Example:

import random

def gen(n):
for i in xrange(n):
if random.randint(0, 1) == 0:
yield i

iterator = gen(10)

Length of iterator is unknown until you iterate through it.

How can I find length of iterator in python without exhausting it?

To get a copy of an iterator so that you can operate on it independently of the original, you can use itertools.tee. You can test if an iterator is empty by seeing if it throws StopIteration.

So you could do something like:

def isempty(it):
try:
itcpy = itertools.tee(it,1)[0]
itcpy.next()
return False
except StopIteration:
return True

def empty_iterator():
if False:
yield

it = empty_iterator()
if not isempty(it):
# won't print
print(len(list(it)))

it = xrange(4)
if not isempty(it):
# will print
print(len(list(it)))

Are there relevant iterables with no length in Python?

A file has no length:

>>> with open("test") as f:
... print(len(f))

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type '_io.TextIOWrapper' has no len()

Iterating through a file like that in open iterates over lines, i.e. chunks of text delimited by newline characters. To know how many lines there are, the file would have to be read entirely and then iterated through - depending on the size of the file this could take a long time or the computer could run out of RAM.

Does the len() built-in function iterates through the collection to calculate its length, or does it access a collection's attribute?

Python built-in collections cache the length in an attribute. Using len() will not iterate over all elements to count them, no.

Keeping the length as an attribute is cheap and easy to maintain. Given that the Python built-in collection types are used so widely, it'd be unwise for them not to do this.

Python built-in types with a length are typically built on top of the PyObject_VAR_HEAD struct, which includes an ob_size entry. The Py_SIZE macro can then be used to implement the object.__len__ method (e.g. the PySequenceMethods.sq_length slot in the C-API). See the listobject.c implementation for list_length for example.

Length of generator output

There isn't one because you can't do it in the general case - what if you have a lazy infinite generator? For example:

def fib():
a, b = 0, 1
while True:
a, b = b, a + b
yield a

This never terminates but will generate the Fibonacci numbers. You can get as many Fibonacci numbers as you want by calling next().

If you really need to know the number of items there are, then you can't iterate through them linearly one time anyway, so just use a different data structure such as a regular list.

Understanding len function with iterators

The biggest reason is that it reduces type safety.

How many programs have you written where you actually needed to consume an iterable just to know how many elements it had, throwing away anything else?

I, in quite a few years of coding in Python, never needed that. It's a non-sensical operation in normal programs. An iterator may not have a length (e.g. infinite iterators or generators that expects inputs via send()), so asking for it doesn't make much sense. The fact that len(an_iterator) produces an error means that you can find bugs in your code. You can see that in a certain part of the program you are calling len on the wrong thing, or maybe your function actually needs a sequence instead of an iterator as you expected.

Removing such errors would create a new class of bugs where people, calling len, erroneously consume an iterator, or use an iterator as if it were a sequence without realizing.

If you really need to know the length of an iterator, what's wrong with len(list(iterator))? The extra 6 characters? It's trivial to write your own version that works for iterators, but, as I said, 99% of the time this simply means that something with your code is wrong, because such an operation doesn't make much sense.

The second reason is that, with that change, you are violating two nice properties of len that currently hold for all (known) containers:

  • It is known to be cheap on all containers ever implemented in Python (all built-ins, standard library, numpy & scipy and all other big third party libraries do this on both dynamically sized and static sized containers). So when you see len(something) you know that the len call is cheap. Making it work with iterators would mean that suddenly all programs might become inefficient due to computations of the length.

    Also note that you can, trivially, implement O(1) __len__ on every container. The cost to pre-compute the length is often negligible, and generally worth paying.
    The only exception would be if you implement immutable containers that have part of their internal representation shared with other instances (to save memory). However, I don't know of any implementation that does this, and most of the time you can achieve better than O(n) time anyway.

    In summary: currently everybody implements __len__ in O(1) and it's easy to continue to do so. So there is an expectation for calls to len to be O(1). Even if it's not part of the standard. Python developers intentionally avoid C/C++'s style legalese in their documentation and trust the users. In this case, if your __len__ isn't O(1), it's expected that you document that.

  • It is known to be not destructive. Any sensible implementation of __len__ doesn't change its argument. So you can be sure that len(x) == len(x), or that n = len(x);len(list(x)) == n.

    Even this property is not defined in the documentation, however it's expected by everyone, and currently, nobody violates it.

Such properties are good, because you can reason and make assumptions about code using them.
They can help you ensure the correctness of a piece of code, or understand its asymptotic complexity. The change you propose would make it much harder to look at some code and understand whether it's correct or what would be it's complexity, because you have to keep in mind the special cases.

In summary, the change you are proposing has one, really small, pro: saving few characters in very particular situations, but it has several, big, disadvantages which would impact a huge portion of existing code.


An other minor reason. If len consumes iterators I'm sure that some people would start to abuse this for its side-effects (replacing the already ugly use of map or list-comprehensions). Suddenly people can write code like:

len(print(something) for ... in ...)

to print text, which is really just ugly. It doesn't read well. Stateful code should be relagated to statements, since they provide a visual cue of side-effects.



Related Topics



Leave a reply



Submit