How to Make a Ruby Enumerator That Does Lazy Iteration Through Two Other Enumerators

How can I make a Ruby enumerator that does lazy iteration through two other enumerators?

This seems to work just how I want:

enums.lazy.flat_map{|enum| enum.lazy }

Here's the demonstration. Define these yielding methods with side effects:

def test_enum
  return enum_for __method__ unless block_given?
  puts 'hi'
  yield 1
  puts 'hi again'
  yield 2
end

def test_enum2
  return enum_for __method__ unless block_given?
  puts :a
  yield :a
  puts :b
  yield :b
end

concated_enum = [test_enum, test_enum2].lazy.flat_map{|en| en.lazy }

Then call next on the result, showing that the side effects happen lazily:

[5] pry(main)> concated_enum.next
hi
=> 1
[6] pry(main)> concated_enum.next
hi again
=> 2
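
The pry session above shows the first enumerator being consumed lazily. A minimal sketch of the same idea that also checks the second enumerator is not started until it is needed; the names first_enum, second_enum and log are made up for this illustration:

```ruby
# Record side effects in an array instead of printing them.
log = []

first_enum = Enumerator.new do |y|
  log << :first_started
  y << 1
  y << 2
end

second_enum = Enumerator.new do |y|
  log << :second_started
  y << :a
  y << :b
end

combined = [first_enum, second_enum].lazy.flat_map { |e| e.lazy }

head = combined.first(2)      # => [1, 2]
log_after_head = log.dup      # => [:first_started]

everything = combined.to_a    # => [1, 2, :a, :b]
```

Only after the first enumerator is exhausted does the second one begin running.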

Built-in way to concatenate two Enumerators

You can define a new enumerator, iterating through your existing enumerators. Something like:

enum = Enumerator.new { |y|
  enum1.each { |e| y << e }
  enum2.each { |e| y << e }
}
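
On Ruby 2.6 and later there is also a built-in for this: Enumerator#+ and Enumerable#chain both return an Enumerator::Chain. A short sketch, assuming Ruby 2.6+; the variable names are only for illustration:

```ruby
letters = %w[a b].each                          # Enumerator over an array
numbers = Enumerator.new { |y| y << 1; y << 2 }

chained = letters + numbers                     # Enumerator::Chain
same    = letters.chain(numbers)                # equivalent spelling

all = chained.to_a                              # => ["a", "b", 1, 2]

# The chain only pulls what it needs, so an infinite tail is fine.
endless = numbers.chain(10..Float::INFINITY)
prefix  = endless.first(4)                      # => [1, 2, 10, 11]
```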

How can I create an enumerator that does certain things after iteration?

You could do something like this.

def foo(a, &pr)
  if pr
    a.map(&pr).join
  else
    o = Object.new
    o.instance_variable_set :@a, a
    def o.each(*y)
      # Parentheses are needed so the second block goes to foo, not to map
      foo(@a.map { |z| yield z, *y }) { |e| e }
    end
    o.to_enum
  end
end

Then we have

enum = foo([1,2,3])
enum.each { |x| 2 * x } # "246"

or

enum = foo([1,2,3])
enum.with_index { |x, i| x * i } # "026"

Inspiration was drawn from the Enumerator documentation. Note that everything you expect from an enumerator still holds, because .to_enum takes care of all that. enum is now a legitimate Enumerator!

enum.class # Enumerator

Ruby Enumerator-based lazy flatten method

  1. This doesn't seem lazy to me, as you are still performing old (non-lazy) flatten beneath.
  2. Enumerator is Enumerable, so I think you don't need to handle it separately.
  3. I would expect lazy_flatten to be method on Enumerable.

Here's how I would implement it:

module Enumerable
  def lazy_flatten
    Enumerator.new do |yielder|
      each do |element|
        if element.is_a? Enumerable
          element.lazy_flatten.each do |e|
            yielder.yield(e)
          end
        else
          yielder.yield(element)
        end
      end
    end
  end
end
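
A usage sketch for the method above. The definition is restated in compact form so the snippet runs on its own, and the sample data is invented. One caveat worth noting: anything that is Enumerable gets flattened, including Range and Hash, while String does not.

```ruby
module Enumerable
  # Same logic as above, restated so this snippet is self-contained.
  def lazy_flatten
    Enumerator.new do |yielder|
      each do |element|
        if element.is_a?(Enumerable)
          element.lazy_flatten.each { |e| yielder.yield(e) }
        else
          yielder.yield(element)
        end
      end
    end
  end
end

# The infinite range at the end is never fully walked.
nested = [1, [2, [3, 4]], (5..Float::INFINITY)]
first_six = nested.lazy_flatten.first(6)     # => [1, 2, 3, 4, 5, 6]

# Hashes are Enumerable too, so their pairs are flattened as well.
with_hash = [1, { a: 2 }].lazy_flatten.to_a  # => [1, :a, 2]
```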

Enumerator as an infinite generator in Ruby

I think I've found something that you may find interesting.

This article, 'Ruby 2.0 Works Hard So You Can Be Lazy' by Pat Shaughnessy, explains the ideas behind eager and lazy evaluation, and also explains how they relate to the "framework classes" like Enumerable, Generator or Yielder. It is mostly focused on explaining how to achieve lazy evaluation, but it is still quite detailed.



Original Source: 'Ruby 2.0 Works Hard So You Can Be Lazy' by Pat Shaughnessy

Ruby 2.0 implements lazy evaluation using an object called Enumerator::Lazy. What makes this special is that it plays both roles! It is an enumerator, and also contains a series of Enumerable methods. It calls each to obtain data from an enumeration source, and it yields data to the rest of an enumeration.
Since Enumerator::Lazy plays both roles, you can chain them up together to produce a single enumeration.

This is the key to lazy evaluation in Ruby. Each value from the data source is yielded to my block, and then the result is immediately passed along down the enumeration chain. This enumeration is not eager – the Enumerator::Lazy#collect method does not collect the values into an array. Instead, each value is passed one at a time along the chain of Enumerator::Lazy objects, via repeated yields. If I had chained together a series of calls to collect or other Enumerator::Lazy methods, each value would be passed along the chain from one of my blocks to the next, one at a time.

Enumerable#first both starts the iteration by calling each on the lazy enumerators, and ends the iteration by raising an exception when it has enough values.

At the end of the day, this is the key idea behind lazy evaluation: the function or method at the end of a calculation chain starts the execution process, and the program’s flow works backwards through the chain of function calls until it obtains just the data inputs it needs. Ruby achieves this using a chain of Enumerator::Lazy objects.
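
A small sketch of that backwards-driven flow (not taken from the article; the seen array is only there to record which inputs were actually pulled from the source):

```ruby
seen = []

result = (1..Float::INFINITY).lazy
  .map    { |x| seen << x; x * 2 }   # runs once per value pulled
  .select { |x| (x % 3).zero? }
  .first(3)                          # drives the chain, then stops it

result  # => [6, 12, 18]
seen    # => [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Although the source is infinite, only nine inputs are ever evaluated, because first stops the iteration as soon as it has three values.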

How to use an enumerator

The main distinction between an Enumerator and most other data structures in the Ruby core library (Array, Hash) and standard library (Set, SortedSet) is that an Enumerator can be infinite. You cannot have an Array of all even numbers or a stream of zeroes or all prime numbers, but you can definitely have such an Enumerator:

evens = Enumerator.new do |y|
  i = -2
  y << (i += 2) while true
end

evens.take(10)
# => [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

zeroes = [0].cycle

zeroes.take(10)
# => [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

So, what can you do with such an Enumerator? Well, three things, basically.

  1. Enumerator mixes in Enumerable. Therefore, you can use all Enumerable methods such as map, inject, all?, any?, none?, select, reject and so forth. Just be aware that an Enumerator may be infinite whereas map returns an Array, so trying to map an infinite Enumerator may create an infinitely large Array and take an infinite amount of time.

  2. There are wrapping methods which somehow "enrich" an Enumerator and return a new Enumerator. For example, Enumerator#with_index adds a "loop counter" to the block and Enumerator#with_object adds a memo object.

  3. You can use an Enumerator just like you would use it in other languages for external iteration by using the Enumerator#next method which will give you either the next value (and move the Enumerator forward) or raise a StopIteration exception if the Enumerator is finite and you have reached the end.

E.g., an infinite range: (1..1.0/0), which can also be written (1..Float::INFINITY)
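
The three uses listed above can be sketched roughly as follows. The evens enumerator is redefined here so the snippet is self-contained; the other variable names are illustrative only:

```ruby
evens = Enumerator.new do |y|
  i = 0
  loop do
    y << i
    i += 2
  end
end

# 1. Enumerable methods: go through #lazy when the source is infinite,
#    otherwise map would try to build an infinite Array.
squares = evens.lazy.map { |n| n * n }.first(4)    # => [0, 4, 16, 36]

# 2. Wrapping methods return a new Enumerator.
indexed = evens.with_index(1).take(3)              # => [[0, 1], [2, 2], [4, 3]]

# 3. External iteration with #next; a finite Enumerator raises
#    StopIteration once it is exhausted.
finite = [1, 2].each
a = finite.next                                    # => 1
b = finite.next                                    # => 2
finished = begin
  finite.next
  false
rescue StopIteration
  true
end

# An infinite range works the same way.
first_even = (1..Float::INFINITY).lazy.select(&:even?).first(3)  # => [2, 4, 6]
```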

What's the best way to return an Enumerator::Lazy when your class doesn't define #each?

I think you should return a normal Enumerator using to_enum:

class Calendar
  # ...
  def each_from(first)
    return to_enum(:each_from, first) unless block_given?
    loop do
      yield first if include?(first)
      first += step
    end
  end
end

This is what most rubyists would expect. Even though it's an infinite Enumerable, it is still usable, for example:

Calendar.new.each_from(1.year.from_now).first(10) # => [...first ten dates...]

If they actually need a lazy enumerator, they can call lazy themselves:

Calendar.new.each_from(1.year.from_now)
  .lazy
  .map{...}
  .take_while{...}
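
Since Calendar is only outlined here, the same pattern can be sketched with a hypothetical stand-in class. Multiples, its step argument, and the numbers below are invented for illustration and are not part of the original question:

```ruby
# Hypothetical class: yields every multiple of `step` starting at `first`.
class Multiples
  def initialize(step)
    @step = step
  end

  def each_from(first)
    # Return a plain (non-lazy) Enumerator when no block is given.
    return to_enum(:each_from, first) unless block_given?

    current = first
    loop do
      yield current if (current % @step).zero?
      current += 1
    end
  end
end

threes = Multiples.new(3)

# Infinite, but still usable eagerly with first(n).
eager = threes.each_from(10).first(3)            # => [12, 15, 18]

# Callers opt in to laziness only when they need it.
lazy_result = threes.each_from(10)
  .lazy
  .map { |n| n * 10 }
  .take_while { |n| n < 200 }
  .to_a                                          # => [120, 150, 180]
```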

If you really want to return a lazy enumerator, you can call lazy from your method:

  # ...
  def each_from(first)
    return to_enum(:each_from, first).lazy unless block_given?
    #...

I would not recommend it though, since it would be unexpected (IMO), could be overkill, and will be less performant.

Finally, there are a couple of misconceptions in your question:

  • All methods of Enumerable assume an each, not just lazy.

  • You can define an each method that requires a parameter if you like and include Enumerable. Most methods of Enumerable won't work, but each_with_index and a couple of others will forward arguments so these would be usable immediately.

  • The Enumerator.new without a block is gone because to_enum is what one should use. Note that the block form remains. There's also a constructor for Lazy, but it's meant to start from an existing Enumerable.

  • You state that to_enum never creates a lazy enumerator, but that's not entirely true. Enumerator::Lazy#to_enum is specialized to return a lazy enumerator. Any user method on Enumerable that calls to_enum will keep a lazy enumerator lazy.
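
Two of the points above can be checked with a short sketch. Countdown is a made-up class whose each requires an argument:

```ruby
# An each that takes a parameter; each_with_index forwards its
# arguments to each, so it works out of the box.
class Countdown
  include Enumerable

  def each(from)
    return to_enum(:each, from) unless block_given?
    from.downto(1) { |i| yield i }
  end
end

pairs = Countdown.new.each_with_index(3).to_a   # => [[3, 0], [2, 1], [1, 2]]

# to_enum on a plain object gives a plain Enumerator,
# while Enumerator::Lazy#to_enum keeps the result lazy.
plain = (1..3).to_enum
lazy  = (1..3).lazy.to_enum

plain.class   # => Enumerator
lazy.class    # => Enumerator::Lazy
```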

Enumerator::Lazy and Garbage Collection

When you iterate over a plain old array, the array keeps a reference to every element, so the garbage collector cannot reclaim any of them while the array is alive.
You can help the garbage collector by writing nil into an array position once you no longer need that element, so that the object in that position becomes eligible for collection.

When you use a lazy enumerator correctly, you do not iterate over an array of hashes. Instead you enumerate the hashes themselves, handling one after the other, and each one is read on demand.

So you have the chance to use much less memory, provided your further processing does not keep the objects in memory anyway.

The structure may look like this:

enum = Enumerator.new do |yielder|
  csv.read(...) do
    ...
    yielder.yield hash
  end
end

enum.lazy.map{|hash| do_something(hash); nil}.count

You also need to make sure that you do not generate the array again in the last step of the chain.
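
A runnable sketch of that structure, under some assumptions: the CSV data is an in-memory StringIO with invented contents, and the lines are split by hand rather than with a CSV library, so only one row hash exists at a time:

```ruby
require 'stringio'

# Stand-in for a large file; the contents are made up.
io = StringIO.new("id,name\n1,a\n2,b\n3,c\n4,d\n")

rows = Enumerator.new do |yielder|
  io.rewind
  header = io.gets.chomp.split(',')
  io.each_line do |line|
    # Build one hash per line and hand it straight to the consumer.
    yielder << header.zip(line.chomp.split(',')).to_h
  end
end

# count consumes the chain one element at a time and never builds an
# array, unlike ending the chain with to_a or force.
even_ids = rows.lazy
  .map    { |hash| hash['id'].to_i }
  .select(&:even?)
  .count                                  # => 2
```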


