What's the Shortest Way to Count the Number of Items in a Generator/Iterator

What's the shortest way to count the number of items in a generator/iterator?

Calls to itertools.imap() in Python 2 or map() in Python 3 can be replaced by equivalent generator expressions:

sum(1 for dummy in it)

This also uses a lazy generator, so it avoids materializing a full list of all iterator elements in memory.
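A minimal sketch of the idiom in use. The helper name `count_items` and the sample generator are made up for illustration. Note that counting this way exhausts the iterator, so nothing is left to read afterwards.

```python
def count_items(iterable):
    """Count items by consuming the iterable one element at a time."""
    return sum(1 for _ in iterable)


squares = (n * n for n in range(5))
total = count_items(squares)   # walks the whole generator
leftover = list(squares)       # the generator is now exhausted

print(total, leftover)
```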

Length of generator output

There isn't one because you can't do it in the general case - what if you have a lazy infinite generator? For example:

def fib():
    a, b = 0, 1
    while True:
        a, b = b, a + b
        yield a

This never terminates but will generate the Fibonacci numbers. You can get as many Fibonacci numbers as you want by calling next().
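As a sketch of taking a bounded number of values from such a generator without calling next() by hand, itertools.islice can cut off the infinite stream. The variable name `first_ten` is only illustrative.

```python
from itertools import islice


def fib():
    a, b = 0, 1
    while True:
        a, b = b, a + b
        yield a


# islice stops after 10 items, so the infinite generator is never exhausted
first_ten = list(islice(fib(), 10))
print(first_ten)
```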

If you really need to know the number of items there are, then you can't iterate through them linearly one time anyway, so just use a different data structure such as a regular list.

How to len(generator())

Generators have no length; they aren't collections, after all.

Generators are functions with an internal state (and fancy syntax). You can repeatedly call them to get a sequence of values, so you can use them in a loop. But they don't contain any elements, so asking for the length of a generator is like asking for the length of a function.
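To see this concretely, the sketch below calls len() on a generator object and catches the resulting TypeError. The generator `numbers` and the flag `has_len` are made-up names for the example.

```python
def numbers():
    yield 1
    yield 2


gen = numbers()
try:
    len(gen)
    has_len = True
except TypeError:
    # Generator objects do not implement __len__
    has_len = False

print(has_len)
```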

if functions in Python are objects, couldn't I assign the length to a
variable of this object that would be accessible to the new generator?

Functions are objects and do accept new attributes, but the generator objects they return do not: assigning an attribute to a generator object raises AttributeError. The reason is probably to keep such a basic object as efficient as possible.

You can however simply return (generator, length) pairs from your functions or wrap the generator in a simple object like this:

class GeneratorLen(object):
    def __init__(self, gen, length):
        self.gen = gen
        self.length = length

    def __len__(self):
        return self.length

    def __iter__(self):
        return self.gen

def some_generator():  # stand-in for any generator that yields one item
    yield "item"

g = some_generator()
h = GeneratorLen(g, 1)
print(len(h), list(h))

How to count the items in a generator consumed by other code

Here is another way, using itertools.count():

import itertools
import re

def generator():
    for i in range(10):
        yield i

def process(l):
    for i in l:
        if i == 5:
            break

def counter_value(counter):
    # repr(counter) looks like 'count(6)'; pull the number out of it
    return int(re.search(r'\d+', repr(counter)).group(0))

counter = itertools.count()
# zip() advances the counter once for every item pulled from generator()
process(i for i, v in zip(generator(), counter))

print("Element consumed by process is : %d " % counter_value(counter))
# output: Element consumed by process is : 6

Hope this was helpful.
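An alternative that avoids parsing repr() is to wrap the iterable in a small counting iterator. This is only a sketch, not part of the answer above; the class name `CountingIterator` and its `count` attribute are hypothetical.

```python
class CountingIterator:
    """Wrap an iterable and count how many items have been handed out."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)  # StopIteration propagates when exhausted
        self.count += 1
        return item


def process(items):
    # Same consumer as above: stops early when it sees 5
    for i in items:
        if i == 5:
            break


counted = CountingIterator(range(10))
process(counted)
print(counted.count)
```

The count is available at any time, including after the consumer stops early, and does not depend on the string form of an itertools object.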

Getting number of elements in an iterator in Python

No. It's not possible.

Example:

import random

def gen(n):
    for i in range(n):
        if random.randint(0, 1) == 0:
            yield i

iterator = gen(10)

The length of the iterator is unknown until you iterate through it.

Need a fast way to count and sum an iterable in a single pass

Thanks for all the great answers, but I decided to use my original count_and_sum function, called as follows:

>>> cc, cs = count_and_sum(c.width for c in cols if not c.hide) 

As explained in the edits to my original question this turned out to be the fastest and most readable solution.
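The original count_and_sum function is not reproduced here. A plausible single-pass shape, offered purely as an assumption about what it might look like, is a plain loop that keeps two running totals; the `widths` data is invented for the example.

```python
def count_and_sum(iterable):
    """Return (number of items, sum of items) in one pass."""
    count = 0
    total = 0
    for value in iterable:
        count += 1
        total += value
    return count, total


# Hypothetical data standing in for the column widths in the call above
widths = [10, 20, 30]
cc, cs = count_and_sum(w for w in widths if w > 10)
print(cc, cs)
```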

Is there any built-in way to get the length of an iterable in python?

Short of iterating through the iterable and counting the number of iterations, no. That's what makes it an iterable and not a list. This isn't really even a Python-specific problem. Look at the classic linked-list data structure: finding the length is an O(n) operation that involves iterating the whole list to count the elements.

As mcrute mentioned above, you can probably reduce your function to:

def count_iterable(i):
    return sum(1 for e in i)

Of course, if you're defining your own iterable object you can always implement __len__ yourself and keep an element count somewhere.
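As a rough illustration of that last point, here is a small linked list that tracks its own size so len() is O(1). The class `LinkedList` and its `push` method are made up for this sketch.

```python
class LinkedList:
    """Singly linked list that keeps an element count alongside the nodes."""

    def __init__(self):
        self._head = None   # each node is a (value, next_node) tuple
        self._size = 0

    def push(self, value):
        # Insert at the front and update the stored count
        self._head = (value, self._head)
        self._size += 1

    def __iter__(self):
        node = self._head
        while node is not None:
            value, node = node
            yield value

    def __len__(self):
        return self._size   # no traversal needed


items = LinkedList()
for n in (1, 2, 3):
    items.push(n)
print(len(items), list(items))
```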

Python - Count Elements in Iterator Without Consuming

I have not been able to come up with an exact solution (because iterators may be immutable types), but here are my best attempts. I believe the second should be faster, according to the documentation (final paragraph of itertools.tee).

Option 1

import itertools

def it_count(it):
    tmp_it, new_it = itertools.tee(it)
    return sum(1 for _ in tmp_it), new_it

Option 2

def it_count2(it):
    lst = list(it)
    return len(lst), lst

It functions well, but has the slight annoyance of returning the pair rather than simply the count.

ita = iter([1, 2, 3])
count, ita = it_count(ita)
print(count)

Output: 3

count, ita = it_count2(ita)
print(count)

Output: 3

count, ita = it_count(ita)
print(count)

Output: 3

print(list(ita))

Output: [1, 2, 3]

