Is There Something Like Python Generators in Ruby?

Ruby's yield keyword is something very different from the Python keyword with the same name, so don't be confused by it. Ruby's yield keyword is syntactic sugar for calling a block associated with a method.

The closest equivalent is Ruby's Enumerator class. For example, the equivalent of this Python code:

def eternal_sequence():
    i = 0
    while True:
        yield i
        i += 1

is this:

def eternal_sequence
  Enumerator.new do |enum|
    i = 0
    while true
      enum.yield i # <- Notice that this is the yield method of the enumerator, not the yield keyword
      i += 1
    end
  end
end
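To see the generator-like behaviour in action, the sketch below consumes the enumerator in three ways. It repeats the eternal_sequence method from above so that it runs on its own; because the sequence never ends, each call asks for a bounded number of values (calling to_a on it would never return).

```ruby
def eternal_sequence
  Enumerator.new do |enum|
    i = 0
    while true
      enum.yield i
      i += 1
    end
  end
end

# Internal iteration: take stops pulling values after the first five.
p eternal_sequence.take(5)                        # prints [0, 1, 2, 3, 4]

# External iteration, Python-style: each call to next resumes the block
# and runs it until the following enum.yield.
seq = eternal_sequence
p seq.next                                        # prints 0
p seq.next                                        # prints 1

# A lazy chain filters the infinite stream without materializing it.
p eternal_sequence.lazy.select(&:even?).first(3)  # prints [0, 2, 4]
```

The next-based form is the one that corresponds most directly to calling next() on a Python generator.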

You can also create Enumerators for existing enumeration methods with enum_for. For example, ('a'..'z').enum_for(:each_with_index) gives you an enumerator of the lowercase letters along with their position in the alphabet. Since Ruby 1.9, you get this for free with the standard Enumerable methods: called without a block, a method like each_with_index returns an enumerator, so you can just write ('a'..'z').each_with_index to get it.
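A small sketch of both forms from the paragraph above, plus a common idiom for giving your own iteration method the same "no block, get an Enumerator" behaviour. The Countdown class is a made-up example for illustration, not something from the standard library.

```ruby
# Explicit: wrap an existing iteration method in an Enumerator.
letters = ('a'..'z').enum_for(:each_with_index)
p letters.first(3)   # prints [["a", 0], ["b", 1], ["c", 2]]

# Implicit: each_with_index called without a block returns an Enumerator.
pairs = ('a'..'z').each_with_index
p pairs.class        # prints Enumerator
p pairs.to_a.last    # prints ["z", 25]

# Hypothetical class following the same convention as the core methods.
class Countdown
  def initialize(from)
    @from = from
  end

  # Yields @from down to 1; without a block, returns an Enumerator
  # over this same method instead.
  def each
    return enum_for(:each) unless block_given?

    @from.downto(1) { |i| yield i }
  end
end

p Countdown.new(3).each.to_a   # prints [3, 2, 1]
```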

Python yield vs Ruby yield

In Ruby, yield is a shortcut used to call an anonymous function. Ruby has a special syntax for passing an anonymous function to a method; that syntax is known as a block. Because the function has no name, you use the keyword yield to call it:

def do_stuff(val)
  puts "Started executing do_stuff"
  yield(val + 3)
  yield(val + 4)
  puts "Finished executing do_stuff"
end

do_stuff(10) {|x| puts x+3}  #<= This is a block, which is an anonymous function
                             #   that is passed as an additional argument to the
                             #   method do_stuff

--output:--
Started executing do_stuff
16
17
Finished executing do_stuff

In Python, when you see yield inside a function definition, that means the function is a generator. A generator is a special type of function that can be paused mid-execution and resumed later. Here's an example:

def do_stuff(val):
    print("Started execution of do_stuff()")

    yield val + 3
    print("Line after 'yield val + 3'")
    yield val + 4
    print("Line after 'yield val + 4'")

    print("Finished executing do_stuff()")


my_gen = do_stuff(10)

val = next(my_gen)
print("--received {} from generator".format(val))

output:

Started execution of do_stuff()
--received 13 from generator

More code:

val = next(my_gen)    
print("--received {} from generator".format(val))

output:

Line after 'yield val + 3'
--received 14 from generator

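From the output, you can see that yield causes a result to be returned; then execution is immediately paused. When you call next() again on the generator, execution continues until the next yield statement is encountered, which returns a value, and then execution pauses again. A third call to next() would print the two remaining lines and then raise StopIteration, because the function ends without reaching another yield.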
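For comparison, here is a sketch of the same generator written with Ruby's Enumerator, the closest equivalent mentioned at the top. The structure of do_stuff is copied from the Python example; each call to next runs the block up to the following << and pauses there, and once the block runs off the end, next raises StopIteration just as Python's next() does.

```ruby
def do_stuff(val)
  Enumerator.new do |y|
    puts "Started execution of do_stuff"
    y << val + 3                  # pauses here on the first next
    puts "Line after 'yield val + 3'"
    y << val + 4                  # pauses here on the second next
    puts "Line after 'yield val + 4'"
    puts "Finished executing do_stuff"
  end
end

gen = do_stuff(10)
puts "--received #{gen.next} from generator"   # 13
puts "--received #{gen.next} from generator"   # 14

begin
  gen.next                        # runs the rest of the block, then raises
rescue StopIteration
  puts "--generator exhausted"
end
```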

Python yield (migrating from Ruby): How can I write a function without arguments and only with yield to do prints?

yield in Ruby and yield in Python are two very different things.

In Ruby, yield runs the block passed to the method.

Ruby:

def three
  yield
  yield
  yield
end

three { puts 'hello' } # runs the block (prints "hello") three times

In Python, yield hands a value out of a generator (a function that uses yield) and suspends the function until the next value is requested. So it is something completely different; what you most likely want in Python is to pass a function as an argument to the function.

Python:

def three(func):
    func()
    func()
    func()

three(lambda: print('hello')) # runs function (prints "hello") three times
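Going the other way, a Ruby block is essentially that function argument made implicit. This sketch writes the Ruby three with an explicit &block parameter, and a hypothetical three_with method that takes a lambda as an ordinary argument, so the shape matches the Python version.

```ruby
# The block is captured as a Proc and called explicitly: same effect as yield.
def three(&block)
  block.call
  block.call
  block.call
end

three { puts 'hello' }            # prints "hello" three times

# Hypothetical variant: a lambda passed as a normal argument, like three(func).
def three_with(func)
  3.times { func.call }
end

three_with(-> { puts 'hello' })   # prints "hello" three times
```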

Python Generators

The code you provided (below) is a generator that yields None three times:

def three():
    yield
    yield
    yield

g = three() #=> <generator object three at 0x7fa3e31cb0a0>
next(g) #=> None
next(g) #=> None
next(g) #=> None
next(g) # raises StopIteration

The only way I can imagine using it to print "Hello" three times is as an iterator:

for _ in three():
    print('Hello')

Ruby Analogy

You can do a similar thing in Ruby using Enumerator.new:

def three
  Enumerator.new do |e|
    e.yield # or e << nil
    e.yield # or e << nil
    e.yield # or e << nil
  end
end

g = three
g.next #=> nil
g.next #=> nil
g.next #=> nil
g.next # raises StopIteration

three.each do
  puts 'Hello'
end

Does Ruby have something like Python's list comprehensions?

The common way in Ruby is to combine Enumerable and Array methods to achieve the same result:

digits.product(chars).select{ |d, ch| d >= 2 && ch == 'a' }.map(&:join)

This is only 4 or so characters longer than the list comprehension and just as expressive (IMHO of course, but since list comprehensions are just a special application of the list monad, one could argue that it's probably possible to adequately rebuild that using Ruby's collection methods), while not needing any special syntax.
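The question's digits and chars are not shown here, so this sketch assumes small sample values for both; the Python comprehension in the comment is likewise an assumed reconstruction. flat_map is shown as an alternative to product because it mirrors the nested for clauses of a comprehension more literally.

```ruby
digits = [1, 2, 3]   # hypothetical sample data
chars  = %w[a b]     # hypothetical sample data

# Assumed Python: [str(d) + ch for d in digits for ch in chars if d >= 2 and ch == 'a']
# product builds every [d, ch] pair, like two nested `for` clauses.
result = digits.product(chars).select { |d, ch| d >= 2 && ch == 'a' }.map(&:join)
p result    # prints ["2a", "3a"]

# The same pipeline spelled with flat_map instead of product.
result2 = digits.flat_map { |d| chars.map { |ch| [d, ch] } }
                .select { |d, ch| d >= 2 && ch == 'a' }
                .map(&:join)
p result2   # prints ["2a", "3a"]
```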

Ruby equivalent of Python's dict comprehension

Let's take a couple of steps back and ignore the specifics of Ruby and Python for now.

Mathematical set-builder notation

The concept of comprehension originally comes from mathematical set-builder notation, e.g. something like this: E = { n ∈ ℕ | 2∣n } which defines E to be the set of all even natural numbers, as does E = { 2n | n ∈ ℕ }.
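The two definitions of E line up with the two halves of a comprehension: the first filters ℕ by a predicate, the second maps a function over ℕ. A sketch of both in Ruby, assuming ℕ starts at 1 and using a lazy endless range (Ruby 2.6+) so that only the first few elements are ever computed:

```ruby
# E = { n in N | 2 divides n }: filter the naturals by a predicate.
by_filter = (1..).lazy.select { |n| n % 2 == 0 }.first(5)

# E = { 2n | n in N }: map a function over the naturals.
by_map = (1..).lazy.map { |n| 2 * n }.first(5)

p by_filter   # prints [2, 4, 6, 8, 10]
p by_map      # prints [2, 4, 6, 8, 10]
```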

List comprehensions in Programming Languages

This set-builder notation inspired similar constructs in many programming languages all the way back to 1969, although it wasn't until the 1970s that Phil Wadler coined the term comprehensions for these. List comprehensions ended up being implemented in Miranda in the early 1980s, which was a hugely influential programming language.

However, it is important to understand that these comprehensions do not add any new semantic features to the world of programming languages. In general, there is no program you can write with a comprehension that you cannot also write without. Comprehensions provide a very convenient syntax for expressing these kinds of transformations, but they don't do anything that couldn't also be achieved with the standard recursion patterns like fold, map, scan, unfold, and friends.

So, let's first look at how the various features of Python's comprehensions compare to the standard recursion patterns, and then see how those recursion patterns are available in Ruby.

Python

[Note: I will use Python list comprehension syntax here, but it doesn't really matter since list, set, dict comprehensions and generator expressions all work the same. I will also use the convention from functional programming to use single-letter variables for collection elements and the plural for collections, i.e. x for an element and xs for "a collection of x-es".]

Transforming each element the same way

[f(x) for x in xs]

This transforms each element of the original collection using a transformation function into a new element of a new collection. This new collection has the same number of elements as the original collection and there is a 1:1 correspondence between the elements of the original collection and the elements of the new collection.

One could say that each element of the original collection is mapped to an element of the new collection. Hence, this is typically called map in many programming languages, and in fact, it is called that in Python as well:

map(f, xs)

The same, but nested

Python allows you to have multiple for ... in clauses in a single comprehension. This is more or less equivalent to having nested mappings which then get flattened into a single collection:

[f(x, y) for x in xs for y in ys]
# or
[f(y) for ys in xs for y in ys]

This combination of mapping and then flattening the collection is commonly known as flatMap (when applied to collections) or bind (when applied to Monads).

Filtering

The last operation that Python comprehensions support is filtering:

[x for x in xs if p(x)]

This will filter the collection xs into a collection which contains a subset of the original elements which satisfy the predicate p. This operation is commonly known as filter.

Combine as you like

Obviously, you can combine all of these, i.e. you can have a comprehension with multiple nested generators that filter out some elements and then transform them.

Ruby

Ruby also provides all of the recursion patterns (or collection operations) mentioned above, and many more. In Ruby, an object that can be iterated over is called an enumerable, and the Enumerable mixin in the core library provides a lot of useful and powerful collection operations.

Ruby was originally heavily inspired by Smalltalk, and some of the older names of Ruby's original collection operations still go back to this Smalltalk heritage. In the Smalltalk collections framework, there is an in-joke about all the collection methods rhyming with each other; thus, the fundamental collection methods in Smalltalk are called [listed here with their more standard equivalents from functional programming]:

  • collect:, which "collects" all elements returned from a block into a new collection, i.e. this is the equivalent to map.
  • select:, which "selects" all elements that satisfy a block, i.e. this is the equivalent to filter.
  • reject:, which "rejects" all elements that satisfy a block, i.e. this is the opposite of select: and thus equivalent to what is sometimes called filterNot.
  • detect:, which "detects" whether an element that satisfies a block is inside the collection, i.e. this is similar to contains, except that it actually returns the matching element, so it is more like findFirst.
  • inject:into: … where the nice naming schema breaks down somewhat …: it does "inject" a starting value "into" a block but that's a somewhat strained connection to what it actually does. This is the equivalent to fold.

So, Ruby has all of those, and more, and it uses some of the original naming, but thankfully, it also provides aliases.

Map

In Ruby, map is originally named Enumerable#collect but is also available as Enumerable#map, which is the name preferred by most Rubyists.

As mentioned above, this is also available in Python as the map built-in function.

FlatMap

In Ruby, flatMap is originally named Enumerable#collect_concat but is also available as Enumerable#flat_map, which is the name preferred by most Rubyists.
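The Python comprehensions from the map and flatMap discussion above translate roughly as follows. The sample collections and the transformations (squaring, adding) are hypothetical stand-ins for xs, ys and f.

```ruby
xs = [1, 2, 3]   # hypothetical sample data
ys = [10, 20]

# [f(x) for x in xs], with f(x) = x * x; collect is the same method.
squares = xs.map { |x| x * x }
p squares   # prints [1, 4, 9]

# [f(x, y) for x in xs for y in ys], with f(x, y) = x + y
sums = xs.flat_map { |x| ys.map { |y| x + y } }
p sums      # prints [11, 21, 12, 22, 13, 23]

# [y for ys in xss for y in ys]: flattening one level; collect_concat is the same method.
xss = [[1, 2], [3], []]
flat = xss.flat_map { |inner| inner }
p flat      # prints [1, 2, 3]
```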

Filter

In Ruby, filter is originally named Enumerable#select, which is the name preferred by most Rubyists, but it is also available as Enumerable#find_all and, since Ruby 2.6, as Enumerable#filter.

FilterNot

In Ruby, filterNot is named Enumerable#reject.

FindFirst

In Ruby, findFirst is originally named Enumerable#detect, but is also available as Enumerable#find.
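A sketch of the filtering-related methods on hypothetical sample data, with the matching Python comprehension in a comment where one exists:

```ruby
xs = [1, 2, 3, 4, 5, 6]   # hypothetical sample data

# [x for x in xs if x % 2 == 0]; find_all is the same method.
evens = xs.select { |x| x.even? }

# [x for x in xs if not x % 2 == 0]: the "filterNot" direction.
odds = xs.reject(&:even?)

# find (alias detect) returns the first match, or nil if nothing matches.
first_big = xs.find { |x| x > 3 }
none      = xs.find { |x| x > 100 }

p evens       # prints [2, 4, 6]
p odds        # prints [1, 3, 5]
p first_big   # prints 4
p none        # prints nil
```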

Fold

In Ruby, fold is originally named Enumerable#inject, but is also available as Enumerable#reduce.

It also exists in Python as functools.reduce.
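A short sketch of fold in Ruby on hypothetical data. Python's functools.reduce takes the same pieces (a function, an iterable, an optional initial value), just in a different order:

```ruby
xs = [1, 2, 3, 4]   # hypothetical sample data

# Fold with an explicit starting value: ((((0 + 1) + 2) + 3) + 4)
total = xs.inject(0) { |acc, x| acc + x }

# reduce is the same method; a Symbol can name the binary operation.
product = xs.reduce(1, :*)

# Without a starting value, the first element becomes the initial accumulator.
longest = %w[ab abcd abc].reduce { |a, b| a.length >= b.length ? a : b }

p total     # prints 10
p product   # prints 24
p longest   # prints "abcd"
```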

Unfold

In Ruby, unfold is named Enumerator::produce (available since Ruby 2.7).
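A few sketches of Enumerator.produce. It starts from a seed value and repeatedly applies the block to the previous value; since the result is infinite unless the block raises StopIteration, the examples either take a bounded prefix or stop explicitly.

```ruby
# Seed 1, each step doubles the previous value.
powers = Enumerator.produce(1) { |x| x * 2 }.take(5)

# State can be carried in an array: each step maps [a, b] to [b, a + b].
fibs = Enumerator.produce([0, 1]) { |a, b| [b, a + b] }.take(6).map(&:first)

# Raising StopIteration from the block ends the sequence.
halvings = Enumerator.produce(10) { |x| raise StopIteration if x <= 1; x / 2 }.to_a

p powers     # prints [1, 2, 4, 8, 16]
p fibs       # prints [0, 1, 1, 2, 3, 5]
p halvings   # prints [10, 5, 2, 1]
```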

Scan

Scan is unfortunately not available in Ruby. It is available in Python as itertools.accumulate.

A deep dive into recursion patterns

Armed with our nomenclature from above, we now know that what you wrote is called a fold:

squares = original.inject({}) do |squared, (name, value)|
  squared[name] = value ** 2
  squared
end

What you wrote here works. And that sentence I just wrote is actually surprisingly deep! Fold has a very powerful property: everything that can be expressed as iterating over a collection can be expressed as a fold. In other words, everything that can be expressed as recursing over a collection (in a functional language), everything that can be expressed as looping / iterating over a collection (in an imperative language), everything that can be expressed using any of the aforementioned functions (map, filter, find), everything that can be expressed using Python's comprehensions, and everything that can be expressed using some of the additional functions we haven't discussed yet (e.g. groupBy) can be expressed using fold.

If you have fold, you don't need anything else! If you were to remove every method from Enumerable except Enumerable#inject, you could still write everything you could write before; you could actually re-implement all the methods you just removed only by using Enumerable#inject. In fact, I did that once for fun as an exercise. You could also implement the missing scan operation mentioned above.
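To make the claim concrete, here is a sketch that rebuilds map and select, plus the missing scan, using nothing but inject. The FoldOnly module and its method names are made up for this illustration; scan here behaves like Python's itertools.accumulate with an initial value.

```ruby
# Hypothetical module: every method is written using only inject.
module FoldOnly
  module_function

  def map(xs)
    xs.inject([]) { |acc, x| acc + [yield(x)] }
  end

  def select(xs)
    xs.inject([]) { |acc, x| yield(x) ? acc + [x] : acc }
  end

  # Like inject, but keeps every intermediate accumulator.
  def scan(xs, initial)
    xs.inject([initial]) { |acc, x| acc + [yield(acc.last, x)] }
  end
end

mapped   = FoldOnly.map([1, 2, 3]) { |x| x * 10 }
selected = FoldOnly.select([1, 2, 3, 4], &:even?)
running  = FoldOnly.scan([1, 2, 3, 4], 0) { |acc, x| acc + x }

p mapped     # prints [10, 20, 30]
p selected   # prints [2, 4]
p running    # prints [0, 1, 3, 6, 10]
```

Building a new array on every step keeps the sketch purely functional, at the cost of quadratic copying; it is meant to show expressiveness, not performance.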

It is not necessarily obvious that fold really is general, but think of it this way: a collection can be either empty or not. fold has two arguments, one which tells it what to do when the collection is empty, and one which tells it what to do when the collection is not empty. Those are the only two cases, so every possible case is handled. Therefore, fold can do everything!

Or a different viewpoint: a collection is a stream of instructions, either the EMPTY instruction or the ELEMENT(value) instruction. fold is a skeleton interpreter for that instruction set, and you as a programmer supply the interpretation of both those instructions: the two arguments to fold are exactly those interpretations. [I owe this eye-opening view of fold as an interpreter, and of a collection as an instruction stream, to Rúnar Bjarnason, co-author of Functional Programming in Scala and co-designer of the Unison Programming Language. Unfortunately, I cannot find the original talk anymore, but The Interpreter Pattern Revisited presents a much more general idea that should also get it across.]
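The "two arguments" view can be sketched directly. The hypothetical fold_right below takes one value for the EMPTY case and one block for the ELEMENT(value) case; swapping in different interpretations turns the same traversal into a sum, a length, a copy, or a map. Note that this is a right fold written recursively for clarity, whereas Ruby's inject is a left fold.

```ruby
# Hypothetical fold: on_empty interprets EMPTY, on_element interprets ELEMENT(value).
def fold_right(xs, on_empty, &on_element)
  return on_empty if xs.empty?

  head, *tail = xs
  on_element.call(head, fold_right(tail, on_empty, &on_element))
end

# Four interpretations of the same instruction stream.
sum     = fold_right([1, 2, 3], 0)  { |x, rest| x + rest }
length  = fold_right([1, 2, 3], 0)  { |_, rest| 1 + rest }
copy    = fold_right([1, 2, 3], []) { |x, rest| [x] + rest }
doubled = fold_right([1, 2, 3], []) { |x, rest| [x * 2] + rest }

p sum       # prints 6
p length    # prints 3
p copy      # prints [1, 2, 3]
p doubled   # prints [2, 4, 6]
```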

Note that the way you are using fold here is somewhat awkward, because you are using mutation (i.e. a side-effect) for an operation that is deeply rooted in functional programming. Fold uses the return value of one iteration as the starting value for the next iteration. But the operation you are doing is a mutation which doesn't actually return a useful value for the next iteration. That's why you have to then return the accumulator which you just modified.

If you were to express this in a functional way using Hash#merge, without mutation, it would look cleaner:

squares = original.inject({}) do |squared, (name, value)|
  squared.merge({ name => value ** 2 })
end

However, for the specific use-case where instead of returning a new accumulator on each iteration and using that for the next iteration, you want to just mutate the same accumulator over and over again, Ruby offers a different variant of fold under the name Enumerable#each_with_object, which completely ignores the return value of the block and just passes the same accumulator object every time. Confusingly, the order of the arguments in the block is reversed between Enumerable#inject (accumulator first, element second) and Enumerable#each_with_object (element first, accumulator second):

squares = original.each_with_object({}) do |(name, value), squared|
  squared[name] = value ** 2
end

However, it turns out, we can make this even simpler. I explained above that fold is general, i.e. it can solve every problem. Then why do we have those other operations in the first place? We have them for the same reason that we have subroutines, conditionals, exceptions, and loops, even though we could do everything with just GOTO: expressivity.

If you read some code using only GOTO, you have to "reverse engineer" what every particular usage of GOTO means: is it checking a condition, is it doing something multiple times? By having different, more specialized constructs, you can recognize at a glance what a particular piece of code does.

The same applies to these collection operations. In your case, for example, you are transforming each element of the original collection into a new element of the result collection. But you have to actually read and understand what the block does in order to recognize this.

However, as we discussed above, there is a more specialized operation available which does this: map. And everybody who sees map immediately understands "oh, this is mapping each element 1:1 to a new element", without having to even look at what the block does. So, we can write your code like this instead:

squares = original.map do |name, value|
  [name, value ** 2]
end.to_h

Note: Ruby's collection operations are for the most part not type-preserving, i.e. transforming a collection will typically not yield the same type of collection. Instead, in general, collection operations mostly return Arrays, which is why we have to call Array#to_h here at the end.
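A quick sketch of what "for the most part" means, on a hypothetical Hash: the generic Enumerable#map returns an Array, while some Hash methods, such as Hash#select, override the Enumerable versions and return a Hash.

```ruby
original = { a: 1, b: 2, c: 3 }   # hypothetical sample data

# Enumerable#map hands back an Array of [key, value] pairs ...
mapped = original.map { |name, value| [name, value ** 2] }
p mapped.class                          # prints Array
p mapped.to_h == { a: 1, b: 4, c: 9 }   # prints true

# ... but Hash#select keeps the Hash type.
big = original.select { |_name, value| value > 1 }
p big.class                             # prints Hash
```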

As you can see, because this operation is more specialized than fold (which can do everything), it is both simpler to read and also simpler to write (i.e. the inside of the block, the part that you as the programmer have to write, is simpler than what you had above).

But we are actually not done! It turns out that for this particular case, where we only want to transform the values of a Hash, there is actually an even more specialized operation available: Hash#transform_values:

squares = original.transform_values do |value|
  value ** 2
end

Epilogue

One of the things programmers do most often is iterate over collections. Practically every program ever written in any programming language does this in some form or another. Therefore, it is very valuable to study the operations your particular programming language offers for doing this.

In Ruby, this means studying the Enumerable mixin as well as the additional methods provided by Array and Hash.

Also, study Enumerators and how to combine them.

But it is also very helpful to study the history of where these operations come from, which is mostly functional programming. If you understand the history of those operations, you will be able to quickly familiarize yourself with collection operations in many languages, since they all borrow from that same history, e.g. ECMAScript, Python, .NET LINQ, Java Streams, C++ STL algorithms, Swift, and many more.

What does Ruby have that Python doesn't, and vice versa?

You can have code in the class definition in both Ruby and Python. However, in Ruby you have a reference to the class (self). In Python you don't have a reference to the class, as the class isn't defined yet.

An example:

class Kaka
puts self
end

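self in this case is the class, and this code would print out "Kaka". In Python, the class object does not exist yet while the class body is executing, so there is no way to access the class itself from the class definition body (since Python 3.3 the body can see its own name through __qualname__, but not the class object).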
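A sketch of what having self in the class body makes possible: since the body runs with the class as self, it can call methods on the class while the class is being defined. The method names foo and bar are illustrative only.

```ruby
class Kaka
  puts self         # prints Kaka
  puts self.class   # prints Class

  # self is the class object, so the body can define methods programmatically.
  %w[foo bar].each do |name|
    define_method(name) { "#{name}!" }
  end
end

p Kaka.new.foo   # prints "foo!"
p Kaka.new.bar   # prints "bar!"
```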


