How to Create a "Clone"-Able Enumerator for External Iteration

How can I create an enumerator that does certain things after iteration?

You could do something like this.

def foo a, &pr
  if pr
    a.map(&pr).join
  else
    o = Object.new
    o.instance_variable_set :@a, a
    # each forwards every element (plus any extra arguments) to the caller's block
    def o.each *y
      foo(@a.map { |z| yield z, *y }) { |e| e }
    end
    o.to_enum
  end
end

Then we have

enum = foo([1,2,3])
enum.each { |x| 2 * x } # "246"

or

enum = foo([1,2,3])
enum.with_index { |x, i| x * i } # "026"

Inspiration was drawn from the Enumerator documentation. Note that everything you expect of an enumerator, as described in your question, holds here, because .to_enum takes care of all that. enum is now a legitimate Enumerator!

enum.class # Enumerator
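
As a minimal sketch of what .to_enum provides, assume a hypothetical Countdown class (not part of the original answer) that defines nothing but each. Wrapping it with to_enum should give external iteration and the Enumerable methods at the same time:

```ruby
# Hypothetical class for illustration only; it defines just #each.
class Countdown
  def initialize(from)
    @from = from
  end

  def each
    @from.downto(1) { |n| yield n }
  end
end

enum = Countdown.new(3).to_enum

first   = enum.next                                   # external iteration
second  = enum.next
doubled = enum.map { |n| n * 2 }                      # Enumerable method, restarts from the beginning
tagged  = enum.with_index.map { |n, i| "#{i}:#{n}" }  # wrapping enumerator
```

Internal iteration (map, with_index) runs each from the start again, so it is not affected by how far next has already advanced.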

How does Ruby's Enumerator object iterate externally over an internal iterator?

It's not exactly magic, but it is beautiful nonetheless. Instead of making a copy of some sort, a Fiber is used to first execute each on the target enumerable object. After receiving the next object of each, the Fiber yields this object and thereby returns control back to where the Fiber was resumed initially.

It's beautiful because this approach doesn't require a copy or other form of "backup" of the enumerable object, as one could imagine obtaining by, for example, calling #to_a on the enumerable. Cooperative scheduling with fibers makes it possible to switch contexts exactly when needed, without keeping any form of lookahead.

It all happens in the C code for Enumerator. A pure Ruby version that would show roughly the same behavior could look like this:

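class MyEnumerator
  def initialize(enumerable)
    @fiber = Fiber.new do
      # Hand each item back to whoever called #next, then pause.
      enumerable.each { |item| Fiber.yield item }
      # Raising here (instead of testing the return value of resume)
      # keeps nil and false items from being mistaken for the end.
      raise StopIteration, "iteration reached an end"
    end
  end

  # Resumes the fiber until it yields the next item.
  def next
    @fiber.resume
  end
end

class MyEnumerable
  def each
    yield 1
    yield 2
    yield 3
  end
end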

e = MyEnumerator.new(MyEnumerable.new)
puts e.next # => 1
puts e.next # => 2
puts e.next # => 3
puts e.next # => StopIteration is raised
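
To illustrate the "no lookahead" point with the built-in Enumerator, here is a small sketch. NoisySource and its log are hypothetical names introduced only for this example; the class records each element at the moment it is produced, so the log shows how far each has actually run:

```ruby
# Hypothetical source that logs every element as it is produced.
class NoisySource
  attr_reader :log

  def initialize
    @log = []
  end

  def each
    1.upto(3) do |n|
      @log << "producing #{n}"
      yield n
    end
  end
end

source = NoisySource.new
e = source.to_enum

first = e.next
log_after_first = source.log.dup    # only the first element has been produced

second = e.next
log_after_second = source.log.dup   # each has advanced exactly one more step
```

Each call to next lets each run only as far as the next yield, which is the behavior the Fiber-based explanation predicts.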

Enumerator without collection

With the current setup, Enumerator.new(&:map), there is no way to pass a collection to the enumerator, and you can't change an enumerator's collection once it has been instantiated.

The current code only appears to work because the block is not executed immediately; you only see the error when you start iterating (or retrieving items).

enumerator = Enumerator.new(&:map)
enumerator.take(1)
# NoMethodError (undefined method `map' for #<Enumerator::Yielder:0x00000000055b6e90>)

This is because Enumerator::new yields an Enumerator::Yielder, which doesn't have a map method.

The above could also be written as:

enumerator = Enumerator.new { |yielder| yielder.map }

If you would like to create an enumerator from a collection, the easiest way is to call each without a block. Other methods, like map, also return an enumerator when no block is given.

enumerator = [1, 2, 3].each
#=> #<Enumerator: [1, 2, 3]:each>

If for some reason you still want to create the enumerator by hand, it could look like this:

enumerator = Enumerator.new { |yielder| [1, 2, 3].each { |number| yielder << number } }

If the intent was to preselect a method of iterating before you receive the collection, you can do so in the following manner:

# assuming both the collection and block are passed by the user
map = :map.to_proc
result = map.call(collection, &block)

# which is equivalent to
result = collection.map(&block)
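
A runnable sketch of the same idea, preselecting the iteration method and supplying the collection later. The helper name iterate_with is hypothetical and not part of Ruby; it relies only on Object#to_enum accepting a method name:

```ruby
# Hypothetical helper: remembers an iteration method name and
# builds the enumerator once a collection is available.
def iterate_with(method_name)
  ->(collection) { collection.to_enum(method_name) }
end

mapper = iterate_with(:map)           # no collection yet

enum    = mapper.call([1, 2, 3])      # enumerator bound to [1, 2, 3] via :map
doubled = enum.each { |n| n * 2 }     # runs map with the given block

labelled = mapper.call(%w[a b]).with_index { |s, i| "#{i}:#{s}" }

# The Symbol#to_proc variant from the text, with concrete values:
block  = proc { |n| n + 1 }
result = :map.to_proc.call([1, 2, 3], &block)
```

Unlike Enumerator.new(&:map), the collection is an explicit argument here, so the same preselected method can be reused with different collections.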

How to use an enumerator

The main distinction between an Enumerator and most other data structures in the Ruby core library (Array, Hash) and standard library (Set, and SortedSet until Ruby 3.0 moved it into a separate gem) is that an Enumerator can be infinite. You cannot have an Array of all even numbers or a stream of zeroes or all prime numbers, but you can definitely have such an Enumerator:

evens = Enumerator.new do |y|
  i = -2
  y << (i += 2) while true
end

evens.take(10)
# => [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

zeroes = [0].cycle

zeroes.take(10)
# => [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

So, what can you do with such an Enumerator? Well, three things, basically.

  1. Enumerator mixes in Enumerable. Therefore, you can use all Enumerable methods such as map, inject, all?, any?, none?, select, reject and so forth. Just be aware that an Enumerator may be infinite whereas map returns an Array, so trying to map an infinite Enumerator may create an infinitely large Array and take an infinite amount of time.

  2. There are wrapping methods which somehow "enrich" an Enumerator and return a new Enumerator. For example, Enumerator#with_index adds a "loop counter" to the block and Enumerator#with_object adds a memo object.

  3. You can use an Enumerator just like you would use it in other languages for external iteration by using the Enumerator#next method which will give you either the next value (and move the Enumerator forward) or raise a StopIteration exception if the Enumerator is finite and you have reached the end.
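
The three uses listed above can be sketched as follows. The use of lazy in the first case is an addition not mentioned in the list; it is one way to keep map from trying to build an infinite Array:

```ruby
evens = Enumerator.new do |y|
  i = -2
  loop { y << (i += 2) }
end

# 1. Enumerable methods. lazy defers map so that first(4) can stop early.
squares = evens.lazy.map { |n| n * n }.first(4)

# 2. Wrapping methods. with_index(1) pairs each value with a counter starting at 1.
numbered = evens.with_index(1).take(3)

# 3. External iteration with next, on a finite enumerator.
finite = [10, 20].each
a = finite.next
b = finite.next
reached_end = begin
  finite.next
  false
rescue StopIteration
  true
end
```

Here squares is [0, 4, 16, 36], numbered is [[0, 1], [2, 2], [4, 3]], and reached_end is true because the third call to next raises StopIteration.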

E.g., an infinite range: (1..1.0/0), which can also be written (1..Float::INFINITY).

Is there a built-in way to check if #next or #peek will raise StopIteration?

You can rescue StopIteration explicitly, but you can also rely on the fact that the loop method rescues StopIteration internally and simply exits the loop. (Inside loop, raise StopIteration has much the same effect as break.)

This code simply exits the loop when you try to peek past the end:

a = %w(a b c d e).to_enum

loop do
  print a.peek
  a.next
end

The code outputs abcde. (It also transparently raises and rescues StopIteration.)

So, if you want to simply ignore the StopIteration exception when you try to peek past the end, just use loop.

Of course, once you peek past the end, you'll get dumped out of the loop. If you don't want that, you can use while and rescue to customize behavior. For example, if you want to avoid exiting if you peek past the end, and exit when you iterate past the end using next, you could do something like this:

a = %w(a b c d e).to_enum

while true
  begin
    print a.peek
  rescue StopIteration
    print "\nTried to peek past the end of the enum.\nWe're gonna overlook that.\n"
  end
  x = a.next rescue $!
  break if x.class == StopIteration
end

p 'All done!'

The last two lines in the loop do the same thing as this, which you could use instead:

begin
  a.next
rescue StopIteration
  break
end
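
If a predicate is still wanted, the same rescue can be wrapped in a small helper. The name more? is hypothetical; as far as I know, Enumerator itself has no such built-in method. A minimal sketch:

```ruby
# Hypothetical helper: true if enum.next would return a value,
# false if it would raise StopIteration.
def more?(enum)
  enum.peek   # does not advance the enumerator
  true
rescue StopIteration
  false
end

letters = %w(a b).to_enum
collected = []
collected << letters.next while more?(letters)

before_end = more?(%w(x).to_enum)
after_end  = more?(letters)
```

Note that peek has to produce the next element in order to answer, so for enumerators whose each has side effects (reading from a file or socket, for instance) the check itself triggers that work.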

A point to make is that handling StopIteration is Ruby's intended way of dealing with reaching the end of an iterator. Quoting from The Ruby Programming Language by David Flanagan and Yukihiro Matsumoto (Matz):

External iterators are quite simple to use: just call next each time you want another
element. When there are no more elements left, next will raise a StopIteration exception.
This may seem unusual—an exception is raised for an expected termination
condition rather than an unexpected and exceptional event. (StopIteration is a descendant
of StandardError and IndexError; note that it is one of the only exception
classes that does not have the word “error” in its name.) Ruby follows Python in this
external iteration technique. By treating loop termination as an exception, it makes
your looping logic extremely simple; there is no need to check the return value of
next for a special end-of-iteration value, and there is no need to call some kind of
next? predicate before calling next.

Ruby peek with include? acts like next

It's not the .include?, it's how you get your enumerator (a new one each time). Observe:

@file.each_line.peek # => "Extension Date\n"
@file.each_line.peek # => "State\n"
@file.each_line.peek # => "CO\n"
@file.each_line.peek # => "COLORADO\n"
@file.each_line.peek # => "\n"

The problem here is that each call to each_line returns a new enumerator, and calling peek on that new enumerator reads a line from the file. Since the file position is maintained between invocations, the next new enumerator reads one more line, and so on.

Get the enumerator once and hold on to it.

enum = @file.each_line

enum.peek # => "Extension Date\n"
enum.peek # => "Extension Date\n"
enum.peek # => "Extension Date\n"
enum.peek # => "Extension Date\n"
enum.peek.include?('foo') # => false
enum.peek # => "Extension Date\n"
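
The difference can be reproduced without a real file. This sketch assumes a StringIO standing in for @file (the sample contents are made up); StringIO#each_line also returns an enumerator when called without a block:

```ruby
require "stringio"

# Stand-in for @file, with made-up contents.
io = StringIO.new("first\nsecond\nthird\n")

# A fresh enumerator per call: every peek reads from the current position.
fresh = [io.each_line.peek, io.each_line.peek]

io.rewind

# One enumerator, held on to: peek keeps returning the same buffered line.
enum = io.each_line
held = [enum.peek, enum.peek]
contains_foo = enum.peek.include?("foo")
still_first  = enum.peek
```

With a fresh enumerator each time, the two peeks see different lines; with a single enumerator, peek returns "first\n" every time until next is called.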

