Enumerator as an Infinite Generator in Ruby

Enumerator as an infinite generator in Ruby

I think I've found something that you may find interesting.

This article: 'Ruby 2.0 Works Hard So You Can Be Lazy' by Pat Shaughnessy explains the ideas behind Eager and Lazy evaluation, and also explains how that relates to the "framework classes" like Enumerale, Generator or Yielder. It is mostly focused on explaining how to achieve LazyEvaluation, but still, it's quite detailed.



Original Source: 'Ruby 2.0 Works Hard So You Can Be Lazy' by Pat Shaughnessy

Ruby 2.0 implements lazy evaluation using an object called Enumerator::Lazy. What makes this special is that it plays both roles! It is an enumerator, and also contains a series of Enumerable methods. It calls each to obtain data from an enumeration source, and it yields data to the rest of an enumeration.
Since Enumerator::Lazy plays both roles, you can chain them up together to produce a single enumeration.

This is the key to lazy evaluation in Ruby. Each value from the data source is yielded to my block, and then the result is immediately passed along down the enumeration chain. This enumeration is not eager – the Enumerator::Lazy#collect method does not collect the values into an array. Instead, each value is passed one at a time along the chain of Enumerator::Lazy objects, via repeated yields. If I had chained together a series of calls to collect or other Enumerator::Lazy methods, each value would be passed along the chain from one of my blocks to the next, one at a time

Enumerable#first both starts the iteration by calling each on the lazy enumerators, and ends the iteration by raising an exception when it has enough values.

At the end of the day, this is the key idea behind lazy evaluation: the function or method at the end of a calculation chain starts the execution process, and the program’s flow works backwards through the chain of function calls until it obtains just the data inputs it needs. Ruby achieves this using a chain of Enumerator::Lazy objects.

Infinite generator speed in ruby - Method vs Enumerator vs Lazy

The latter two forms both have block bindings, which does have some overhead; when you create an Enumerator with a block, Ruby converts the block to a Proc and assigns it to an Enumerator::Generator, which then iterates by calling the proc. This has overhead that calling a method directly doesn't.

If we eliminate the block forms, the performance penalty is also eliminated:

def method
return to_enum __method__ unless block_given?
n = 0
loop do
n += 1
yield n ** 2
end
end

def method_sans_enum
n = 0
loop do
n += 1
yield n ** 2
end
end

method_enum = Enumerator.new(self, :method_sans_enum)

enum = Enumerator.new do |yielder|
n = 0
loop do
n += 1
yielder.yield n ** 2
end
end

lazy = (1..Float::INFINITY).lazy.map do |n|
n ** 2
end

p method.take 50
p enum.take 50
p method_enum.take 50
p lazy.first 50

require 'benchmark/ips'
Benchmark.ips do |bm|
bm.report('Method'){ method.take 50 }
bm.report('Enumerator'){ enum.take 50 }
bm.report('Enumerator 2'){ method_enum.take 50 }
bm.report('Lazy'){ lazy.first 50 }
bm.compare!
end

And results:

      Method    10.874k i/100ms
Enumerator 6.152k i/100ms
Enumerator 2 11.733k i/100ms
Lazy 3.885k i/100ms

Comparison:
Enumerator 2: 132050.2 i/s
Method: 124784.1 i/s - 1.06x slower
Enumerator: 65961.9 i/s - 2.00x slower
Lazy: 40063.6 i/s - 3.30x slower

Invoking procs involves overhead that invoking methods doesn't; for example:

class Foo
def meth; end
end

instance = Foo.new
pr = instance.method(:meth).to_proc

require 'benchmark/ips'
Benchmark.ips do |bm|
bm.report('Method'){ instance.meth }
bm.report('Proc'){ pr.call }
bm.compare!
end

Results:

Calculating -------------------------------------
Method 121.016k i/100ms
Proc 104.612k i/100ms
-------------------------------------------------
Method 6.823M (± 0.1%) i/s - 34.127M
Proc 3.443M (± 6.4%) i/s - 17.156M

Comparison:
Method: 6822666.0 i/s
Proc: 3442578.2 i/s - 1.98x slower

Calling a method that has been converted to a proc is 2x slower than calling the method directly - just nearly the exact performance deviation that you observed.

Enumerator behavior with generator block

Ok, I think I finally got it. What happens is the following:

Internally, e.take(n) on enumerator is interpreted by Ruby as something like that:

result = []
e.each do |item|
result << item
break if result.size == 2
end
result

Returning to the way we initialized the enumerator, the whole block, given with the constructor is the responsibility of instance of Enumerator::Generator class, that was created along with the enumerator itself. The y passed to this block is the instance of Enumerator::Yielder class.

When each is called on enumerator, the generator code is executed first, and when it comes to the y << total part, y passes (yields, as if from method definition, but not quite) this total value to the block, given with the each call. When this block finishes execution and returns, control goes back to the generator code, which performs another loop, and pushes new value to the yielder, which yields it to the each block again etc etc. And when the condition in the each block becomes true, everything stops, thus leaving part of the original array unchanged. So no need to use Fibers in this, just some cool yielding of control back and forth between two blocks of code.

Though it's worth noting that "each block" I was talking about is not an actual ruby block, it's C function take_i, being treated as if it was a block, passed to the each method.

You can read more about this, with diagrams and some extra information here: http://patshaughnessy.net/2013/4/3/ruby-2-0-works-hard-so-you-can-be-lazy

Infinite Enumerable

Brad,

Here are two ways you could produce the hash. I will use the following as an example:

class ContestStanding
def checkit
puts "hi"
end
end

my_keys = [1,2,3]

Use Enumerable#each_with_object

h = my_keys.each_with_object({}) { |k,h| h[k] = ContestStanding.new }
#=> {1=>#<ContestStanding:0x000001010efdd8>,
# 2=>#<ContestStanding:0x000001010efdb0>,
# 3=>#<ContestStanding:0x000001010efd88>}
h[1].checkit #=> "hi"

each_with_object creates and empty array which is referenced by the block parameter h. The first value passed into the block (and assigned to the block parameter k) is my_keys.first => 1, so have

h[1] = ContestStanding.new

The other elements of the hash are created similarly.

Use Array.zip

Hash[my_keys.zip(Array.new(my_keys.size) {ContestStanding.new})]
#=> {1=>#<ContestStanding:0x0000010280f720>,
# 2=>#<ContestStanding:0x0000010280f6f8>,
# 3=>#<ContestStanding:0x0000010280f6d0>}

or, for Ruby v2.0+

my_keys.zip(Array.new(my_keys.size) {ContestStanding.new}).to_h
#=> {1=>#<ContestStanding:0x0000010184bd48>,
# 2=>#<ContestStanding:0x0000010184bd20>,
# 3=>#<ContestStanding:0x0000010184bcf8>}

Here the following steps are performed:

a = Array.new(my_keys.size) {ContestStanding.new}
#=> [#<ContestStanding:0x0000010185b248>,
# #<ContestStanding:0x0000010185b220>,
# #<ContestStanding:0x0000010185b1f8>]
b = my_keys.zip(a)
#=> [[1, #<ContestStanding:0x0000010185b248>],
# [2, #<ContestStanding:0x0000010185b220>],
# [3, #<ContestStanding:0x0000010185b1f8>]]
b.to_h
#=> {1=>#<ContestStanding:0x0000010185b248>,
# 2=>#<ContestStanding:0x0000010185b220>,
# 3=>#<ContestStanding:0x0000010185b1f8>}

Your solution

I found your solution interesting. This is one one way of explaining how it works:

enum = Enumerator.new { |y| loop { y << ContestStanding.new } }
#=> #<Enumerator: #<Enumerator::Generator:0x000001011a9530>:each>
a1 = my_keys.size.times.with_object([]) { |k,a| a << enum.next }
#=> [#<ContestStanding:0x000001018820a0>,
# #<ContestStanding:0x00000101882028>,
# #<ContestStanding:0x00000101881fb0>
a2 = my_keys.zip(a1)
#=> [[1, #<ContestStanding:0x000001018820a0>],
# [2, #<ContestStanding:0x00000101882028>],
# [3, #<ContestStanding:0x00000101881fb0>]]
Hash[a2]
#=> {1=>#<ContestStanding:0x000001018820a0>,
# 2=>#<ContestStanding:0x00000101882028>,
# 3=>#<ContestStanding:0x00000101881fb0>}

What's the common fast way of expressing the infinite enumerator `(1..Inf)` in Ruby?

Use #to_enum or #lazy to convert your Range to an Enumerable. For example:

(1..Float::INFINITY).to_enum
(1..Float::INFINITY).lazy

infinite enumerator rewind

This should work:

n = Enumerator.new do |y|
number = 1
loop do
y.yield number
number += 1
end
end

n.next #=> 1
n.next #=> 2
n.next #=> 3
n.rewind
n.next #=> 1

Enumerator new and yield

Consider the following code:

fib = Enumerator.new do |y| 
puts "Enter enumerator"
a = b = 1
loop do
puts "Inside loop"
y << a
puts "y: #{y.inspect}, a: #{a}, b: #{b}"
a, b = b, a + b
end
end
puts fib.take(5)

It prints:

# Enter enumerator
# Inside loop
# y: #<Enumerator::Yielder:0x000000059a27e8>, a: 1, b: 1
# Inside loop
# y: #<Enumerator::Yielder:0x000000059a27e8>, a: 1, b: 2
# Inside loop
# y: #<Enumerator::Yielder:0x000000059a27e8>, a: 2, b: 3
# Inside loop
# y: #<Enumerator::Yielder:0x000000059a27e8>, a: 3, b: 5
# Inside loop

# 1
# 1
# 2
# 3
# 5

Apparently, this output actually gives hints on all the question you’ve stated. Note, that we entered a yielder only once. Let’s dig into:

Why loop is infinite?

Because Fibonacci’s number sequence is infinite. This enumerator is intended to be used with Enumerable#take (see an example above.)

What is Enumerator::Yielder?

It is an abstraction. It’s method yield actually calls back the block of callee, passing a parameter as block parameters.

What does << method does?

Yields once. In other words, it calls back the caller code, passing it’s parameter to the caller’s block. In this particular example, it will call back each block, passing a from Yielder instance as block parameter (e as I named it there.)

Why does infinite loop happen when y << a is removed?

Because there are no yields happened. In my example, the callee will stop after yielding five (5 as parameter of take) times.

How can I make a ruby enumerator that does lazy iteration through two other enumerators?

This seems to work just how I want;

enums.lazy.flat_map{|enum| enum.lazy }

Here's the demonstration. Define these yielding methods with side-effects;

def test_enum
return enum_for __method__ unless block_given?
puts 'hi'
yield 1
puts 'hi again'
yield 2
end

def test_enum2
return enum_for __method__ unless block_given?
puts :a
yield :a
puts :b
yield :b
end

concated_enum = [test_enum, test_enum2].lazy.flat_map{|en| en.lazy }

Then call next on the result, showing that the side effects happen lazily;

[5] pry(main)> concated_enum.next
hi
=> 1
[6] pry(main)> concated_enum.next
hi again
=> 2


Related Topics



Leave a reply



Submit