Ruby - Compare Two Enumerators Elegantly

Ruby - Compare two Enumerators elegantly

Just a slight refactoring to your code, assuming that your streams do not include an element :eof.

def equal_streams?(s1, s2)
  loop do
    e1 = s1.next rescue :eof
    e2 = s2.next rescue :eof
    return false unless e1 == e2
    return true if e1 == :eof
  end
end

Using a keyword like loop should be faster than using a method like each.

Best way to interleave two enums in ruby?

Here's a shorter and more general approach:

def combine_enums_with_ratio(ratios)
  return enum_for(__method__, ratios) unless block_given?

  counts = ratios.transform_values { |value| Rational(1, value) }

  until counts.empty?
    begin
      enum, _ = counts.min_by(&:last)
      yield enum.next
      counts[enum] += Rational(1, ratios[enum])
    rescue StopIteration
      counts.delete(enum)
    end
  end
end

Instead of two enums, it takes a hash of enum => ratio pairs.

At first, it creates a counts hash using the ratio's reciprocal, i.e. enum_a => 3, enum_b => 2 becomes:

counts = { enum_a => 1/3r, enum_b => 1/2r }

Then, within a loop, it fetches the hash's minimum value, which is enum_a in the above example. It yields its next value and increment its counts ratio value:

counts[enum_a] += 1/3r

counts #=> {:enum_a=>(2/3), :enum_b=>(1/2)}

On the next iteration, enum_b has the smallest value, so its next value will be yielded and its ratio be incremented:

counts[enum_b] += 1/2r

counts #=> {:enum_a=>(2/3), :enum_b=>(1/1)}

If you keep incrementing enum_a by (1/3) and enum_b by (1/2), the yield ratio of their elements will be 3:2.

Finally, the rescue clause handles enums running out of elements. If this happens, that enum is removed from the counts hash.

Once the counts hash is empty, the loop stops.

Example usage with 3 enums:

enum_a = (1..10).each
enum_b = ('a'..'f').each
enum_c = %i[foo bar baz].each

combine_enums_with_ratio(enum_a => 3, enum_b => 2, enum_c => 1).to_a
#=> [1, "a", 2, 3, "b", :foo, 4, "c", 5, 6, "d", :bar, 7, "e", 8, 9, "f", :baz, 10]
#    <--------------------->  <--------------------->  <--------------------->
#             3:2:1                    3:2:1                     3:2:1

Built in way to concatenate two Enumerators

You can define a new enumerator, iterating through your existing enumerators. Something like:

enum = Enumerator.new { |y|
  enum1.each { |e| y << e }
  enum2.each { |e| y << e }
}

Fibers vs. explicit enumerators

I would use Enumerator, it allows you to use take, take_while, even each if your sequence is finite. While Fiber is designed for light weight concurrency and is pretty limited as enumerator.

prime_enum.take(ARGV[0].to_i).each { |x| puts x }

prime_enum.take_while { |x| x < ARGV[0].to_i }.each { |x| puts x }

Mapping enumerators

The only way I know of doing this is to do the following:

a = [[1, 2, 3], [4, 5, 6]]
a.map { |b| b.map(&:succ) } # => [[2, 3, 4], [5, 6, 7]]

Mainly because of the combination of Array#map/Enumerable#map and Symbol#to_proc, you cannot pass a second variable to the block that #map yields for, and thus pass another variable to the inner #map:

a.map(1) { |b, c| c } # c => 1, but this doesn't work :(

So you have to use the block syntax; Symbol#to_proc actually returns a proc that takes any number of arguments (you can test this by doing :succ.to_proc.arity, which returns -1). The first argument is used as the receiver, and the next few arguments are used as arguments to the method - this is demonstrated in [1, 2, 3].inject(&:+). However,

:map.to_proc.call([[1, 2, 3], [4, 5, 6]], &:size) #=> [3, 3]

How? :map.to_proc creates this:

:map.to_proc # => proc { |receiver, *args, &block| receiver.send(:map, *args, &block) }

This is then called with the array of arrays as an argument, with this block:

:size.to_proc # => proc { |receiver, *args, &block| receiver.send(:size, *args, &block) }

This results in .map { |receiver| receiver.size } being effectively called.

This all leads to this - since #map doesn't take extra arguments, and passes them to the block as parameters, you have to use a block.

How can I make a ruby enumerator that does lazy iteration through two other enumerators?

This seems to work just how I want;

enums.lazy.flat_map{|enum| enum.lazy }

Here's the demonstration. Define these yielding methods with side-effects;

def test_enum
  return enum_for __method__ unless block_given?
  puts 'hi'
  yield 1
  puts 'hi again'
  yield 2
end  

def test_enum2
  return enum_for __method__ unless block_given?
  puts :a
  yield :a
  puts :b
  yield :b
end  

concated_enum = [test_enum, test_enum2].lazy.flat_map{|en| en.lazy }

Then call next on the result, showing that the side effects happen lazily;

[5] pry(main)> concated_enum.next
hi
=> 1
[6] pry(main)> concated_enum.next
hi again
=> 2

Ruby - Optimize the comparison of two arrays with duplicates

A similar question was posted a few weeks ago and I got the accepted answer with something like:

def is_subset?(a,b)
  !a.find{|x| a.count(x) > b.count(x)}
end

Benchmark Update

require 'benchmark'

def random_char
  ('a'..'z').to_a.sample
end

A = 8.times.map{random_char}
B = 8.times.map{random_char}

def ary_subset?(a,b) # true iff a is subset of b
  a_counts = a.reduce(Hash.new(0)) { |m,v| m[v] += 1; m }
  b_counts = b.reduce(Hash.new(0)) { |m,v| m[v] += 1; m }
  a_counts.all? { |a_key,a_ct| a_ct <= b_counts[a_key] }
end

Benchmark.bm do |x|
  x.report('me')     {100000.times{is_subset?(A,B)}}
  x.report('dbenhur'){100000.times{ary_subset?(A,B)}}
end

       user     system      total        real
me  0.375000   0.000000   0.375000 (  0.384022)
dbenhur  2.558000   0.000000   2.558000 (  2.550146)

ruby enumerators: immediately skip multiple iterations (or start iterating from n)

I first construct a helper method, change_base, with three arguments:

off, the base-10 offset into the sequence of repeated permutations of the given array arr,
m, a number system base; and
p, the permutation size.

The method performs three steps to construct an array off_m:

converts off to base m (radix m);
separates the digits of the base m value into an array; and
if necessary, pads the array with leading 0s to make it of size p.

By setting m = arr.size, each digit of off_m is an offset into arr, so off_m maps the base-10 offset to a unique permutation of size p.

def change_base(m, p, off)
  arr = off.to_s(m).chars.map { |c| c.to_i(m) }
  arr.unshift(*[0]*(p-arr.size)) 
end

Some examples:

change_base(16, 2, 32)
  #=> [2, 0]
change_base(16, 3, 255)
  #=> [0, 15, 15]
change_base(36, 4, 859243)
  #=> [18, 14, 35, 31]
18*36**3 + 14*36**2 + 35*36**1 + 31  
  #=> 859243

This implementation of change_base requires that m <= 36. I assume that will be sufficient, but algorithms are available to convert base-10 numbers to numbers with arbitrarily-large bases.

We now construct a method which accepts the given array, arr, the size of each permutation, p and a given base-10 offset into the sequence of permutations. The method returns a permutation, namely, an array of size p whose elements are elements of arr.

def offset_to_perm(arr, p, off)
  arr.values_at(*change_base(arr.size, p, off))
end

We can now try this with an example.

arr = (0..3).to_a
p = 2

(arr.size**p).times do |off|
  print "perm for off = "
  print " " if off < 10
  print "#{off}: "
  p offset_to_perm(arr, p, off)
end

perm for off =  0: [0, 0]
perm for off =  1: [0, 1]
perm for off =  2: [0, 2]
perm for off =  3: [0, 3]
perm for off =  4: [0, 1]
perm for off =  5: [1, 1]
perm for off =  6: [2, 1]
perm for off =  7: [3, 1]
perm for off =  8: [0, 2]
perm for off =  9: [1, 2]
perm for off = 10: [2, 2]
perm for off = 11: [3, 2]
perm for off = 12: [0, 3]
perm for off = 13: [1, 3]
perm for off = 14: [2, 3]
perm for off = 15: [3, 3]

If we wish to begin at, say, offset 5, we can write:

i = 5
p offset_to_perm(arr, p, i)
[1, 1]
i = i.next #=> 6
p offset_to_perm(arr, p, i)
[2, 1]
...

Does Ruby's Enumerable#zip create arrays internally?

I'll be using 1.9.2-p0 as that's what I have on hand.

The rb_check_array_type function looks like this:

VALUE
rb_check_array_type(VALUE ary)
{
    return rb_check_convert_type(ary, T_ARRAY, "Array", "to_ary");  
}

And rb_check_convert_type looks like this:

VALUE
rb_check_convert_type(VALUE val, int type, const char *tname, const char *method)
{
    VALUE v;

    /* always convert T_DATA */
    if (TYPE(val) == type && type != T_DATA) return val;
    v = convert_type(val, tname, method, FALSE);
    if (NIL_P(v)) return Qnil;
    if (TYPE(v) != type) {
        const char *cname = rb_obj_classname(val);
        rb_raise(rb_eTypeError, "can't convert %s to %s (%s#%s gives %s)",
                 cname, tname, cname, method, rb_obj_classname(v));
    }
    return v;
}

Note the convert_type call. This looks a lot like C version of Array.try_convert and try_convert just happens to look like this:

/*   
 *  call-seq:
 *     Array.try_convert(obj) -> array or nil
 *
 *  Try to convert <i>obj</i> into an array, using +to_ary+ method. 
 *  Returns converted array or +nil+ if <i>obj</i> cannot be converted
 *  for any reason. This method can be used to check if an argument is an
 *  array.
 *   
 *     Array.try_convert([1])   #=> [1]
 *     Array.try_convert("1")   #=> nil
 *
 *     if tmp = Array.try_convert(arg)
 *       # the argument is an array
 *     elsif tmp = String.try_convert(arg)
 *       # the argument is a string
 *     end
 *
 */
static VALUE
rb_ary_s_try_convert(VALUE dummy, VALUE ary)
{
    return rb_check_array_type(ary);
}

So, yes, the first loop is looking for anything in argv that is not an array and setting the allary flag if it finds such a thing.

In enum.c, we see this:

id_each = rb_intern("each");

So id_each is an internal reference for the Ruby each iterator method. And in vm_eval.c, we have this:

/*!  
 * Calls a method 
 * \param recv   receiver of the method
 * \param mid    an ID that represents the name of the method
 * \param n      the number of arguments
 * \param ...    arbitrary number of method arguments  
 *
 * \pre each of arguments after \a n must be a VALUE.
 */
VALUE
rb_funcall(VALUE recv, ID mid, int n, ...)

So this:

argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));

Is calling to_enum (with, essentially, the default argument) on whatever is in argv[i].

So, the end result of the first for and if blocks is that argv is either full of arrays or full of enumerators rather than possibly being a mix of the two. But note how the logic works: if something is found that isn't an array, then everything becomes an enumerator. The first part of the enum_zip function will wrap arrays in enumerators (which is essentially free or at least cheap enough not to worry about) but won't expand enumerators into arrays (which could be quite expensive). Earlier versions might have gone the other way (prefer arrays over enumerators), I'll leave that as an exercise for the reader or historians.

The next part:

if (!rb_block_given_p()) {
    result = rb_ary_new();
}

Creates a new empty array and leaves it in result if zip is being called without a block. And here we should note what zip returns:

enum.zip(arg, ...) → an_array_of_array
enum.zip(arg, ...) {|arr| block } → nil

If there is a block, then there is nothing to return and result can stay as Qnil; if there isn't a block, then we need an array in result so that an array can be returned.

From parse.c, we see that NODE_DOT2 is a double-dot range but it looks like they're just using the new node as a simple three element struct; rb_new_node just allocates an object, sets some bits, and assigns three values in a struct:

NODE*
rb_node_newnode(enum node_type type, VALUE a0, VALUE a1, VALUE a2)
{
    NODE *n = (NODE*)rb_newobj();

    n->flags |= T_NODE;
    nd_set_type(n, type);

    n->u1.value = a0;
    n->u2.value = a1;
    n->u3.value = a2;

    return n;
}

nd_set_type is just a bit fiddling macro. Now we have memo as just a three element struct. This use of NODE_DOT2 appears to be a convenient kludge.

The rb_block_call function appears to be the core internal iterator. And we see our friend id_each again so we'll be doing an each iteration. Then we see a choice between zip_i and zip_ary; this is where the inner arrays are created and pushed onto result. The only difference between zip_i and zip_ary appears to be the StopIteration exception handling in zip_i.

At this point we've done the zipping and we either have the array of arrays in result (if there was no block) or we have Qnil in result (if there was a block).

Executive Summary: The first loop explicitly avoids expanding enumerators into arrays. The zip_i and zip_ary calls will only work with non-temporary arrays if they have to build an array of arrays as a return value. So, if you call zip with at least one non-array enumerator and use the block form, then it is enumerators all the way down and the "problem with zip is that it creates arrays internally" does not happen. Reviewing 1.8 or other Ruby implementations is left as an exercise for the reader.

Loop through array and compare consecutive elements in Ruby

Here is a "direct" translation of the python code to ruby - note that the syntax is almost identical! (The only subtle differences are elif vs elsif and the print/puts syntax.)

amounts = [10, 9, 10, 3, 100, 100, 90, 80, 10, 30, 10]
for i in 0..(amounts.count)-2 do
  if amounts[i+1] > amounts[i]
    puts "up #{amounts[i+1]-amounts[i]}"
  elsif amounts[i+1] < amounts[i]
    puts "down #{amounts[i]-amounts[i+1]}"
  else
    puts "stay"
  end
end

There was nothing wrong with using Array#at, but it's just an alias for Array#[]. So if your question is about "translating" the code, then I don't see the point in making changes-for-the-sake-of-changes.

Note that I also fixed the off-by-one error that's present in both versions of your code -- you can only run this up to len(amounts)-2, not len(amounts)-1, or you'll go out of bounds!

However, this is not really written in "the ruby way". for loops are rarely used in ruby, because the language has other highly expressive iterators. Here is how I would have written it, using Enumerable#each_cons:

amounts = [10, 9, 10, 3, 100, 100, 90, 80, 10, 30, 10]
amounts.each_cons(2) do |first, second|
  if first < second 
    puts "up #{second - first}"
  elsif first > second
    puts "down #{first - second}"
  else
    puts "stay"
  end
end

# Output:
down 1
up 1
down 7
up 97
stay
down 10
down 10
down 70
up 20
down 20

One advantage to this syntax is that off-by-one errors (like you had in the python!) aren't really possible, because you're just saying "loop over all elements" without having to worry about tracking the indexes yourself.

This is actually a really good example for appreciating the elegance of ruby, if code is written in "the ruby way" rather than merely a direct translation.

Ruby - Compare Two Enumerators Elegantly