Does Ruby Have Something Like Python's List Comprehensions

List comprehension in Ruby

If you really want to, you can create an Array#comprehend method like this:

class Array
  def comprehend(&block)
    return self if block.nil?
    self.collect(&block).compact
  end
end

some_array = [1, 2, 3, 4, 5, 6]
new_array = some_array.comprehend {|x| x * 3 if x % 2 == 0}
puts new_array

Prints:

6
12
18

I would probably just do it the way you did though.
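On Ruby 2.7 or later, the built-in Enumerable#filter_map covers the same ground without reopening Array. A minimal sketch reusing the sample array from above; note that filter_map drops both nil and false block results, whereas collect followed by compact drops only nil:

```ruby
some_array = [1, 2, 3, 4, 5, 6]

# Keep only the even numbers and triple them in a single pass.
new_array = some_array.filter_map { |x| x * 3 if x.even? }

puts new_array
```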

Does Ruby have something like Python's list comprehensions?

The common way in Ruby is to properly combine Enumerable and Array methods to achieve the same:

digits.product(chars).select{ |d, ch| d >= 2 && ch == 'a' }.map(&:join)

This is only 4 or so characters longer than the list comprehension and just as expressive (IMHO of course, but since list comprehensions are just a special application of the list monad, one could argue that it's probably possible to adequately rebuild that using Ruby's collection methods), while not needing any special syntax.
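To make the one-liner above concrete, here is a sketch with sample data. The contents of digits and chars are assumptions made for illustration, since the question's data is not shown here:

```ruby
digits = [1, 2, 3]   # assumed sample data
chars  = %w[a b]     # assumed sample data

result = digits
  .product(chars)                          # every [digit, char] pair (the nested "for"s)
  .select { |d, ch| d >= 2 && ch == 'a' }  # the "if" clause
  .map(&:join)                             # the output expression

p result  # => ["2a", "3a"]
```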

Ruby equivalent of Python's dict comprehension

Let's take a couple of steps back and ignore the specifics of Ruby and Python for now.

Mathematical set-builder notation

The concept of comprehension originally comes from mathematical set-builder notation, e.g. something like this: E = { n ∈ ℕ | 2∣n } which defines E to be the set of all even natural numbers, as does E = { 2n | n ∈ ℕ }.

List comprehensions in Programming Languages

This set-builder notation inspired similar constructs in many programming languages all the way back to 1969, although it wasn't until the 1970s that Phil Wadler coined the term comprehensions for these. List comprehensions ended up being implemented in Miranda in the early 1980s, which was a hugely influential programming language.

However, it is important to understand that these comprehensions do not add any new semantic features to the world of programming languages. In general, there is no program you can write with a comprehension that you cannot also write without. Comprehensions provide a very convenient syntax for expressing these kinds of transformations, but they don't do anything that couldn't also be achieved with the standard recursion patterns like fold, map, scan, unfold, and friends.

So, let's first look at how the various features of Python's comprehensions compare to the standard recursion patterns, and then see how those recursion patterns are available in Ruby.

Python

[Note: I will use Python list comprehension syntax here, but it doesn't really matter since list, set, dict comprehensions and generator expressions all work the same. I will also use the convention from functional programming to use single-letter variables for collection elements and the plural for collections, i.e. x for an element and xs for "a collection of x-es".]

Transforming each element the same way

[f(x) for x in xs]

This transforms each element of the original collection using a transformation function into a new element of a new collection. This new collection has the same number of elements as the original collection and there is a 1:1 correspondence between the elements of the original collection and the elements of the new collection.

One could say that each element of the original collection is mapped to an element of the new collection. Hence, this is typically called map in many programming languages, and in fact, it is called that in Python as well:

map(f, xs)

The same, but nested

Python allows you to have multiple for / ins in a single comprehension. This is more or less equivalent to having nested mappings which then get flattened into a single collection:

[f(x, y) for x in xs for y in ys]
# or
[f(y) for ys in xs for y in ys]

This combination of mapping and then flattening the collection is commonly known as flatMap (when applied to collections) or bind (when applied to Monads).

Filtering

The last operation that Python comprehensions support is filtering:

[x for x in xs if p(x)]

This will filter the collection xs into a collection which contains a subset of the original elements which satisfy the predicate p. This operation is commonly known as filter.

Combine as you like

Obviously, you can combine all of these, i.e. you can have a comprehension with multiple nested generators that filter out some elements and then transform them.

Ruby

Ruby also provides all of the recursion patterns (or collection operations) mentioned above, and many more. In Ruby, an object that can be iterated over is called an enumerable, and the Enumerable mixin in the core library provides a lot of useful and powerful collection operations.

Ruby was originally heavily inspired by Smalltalk, and some of the older names of Ruby's original collection operations still go back to this Smalltalk heritage. In the Smalltalk collections framework, there is an in-joke about all the collection methods rhyming with each other; thus, the fundamental collection methods in Smalltalk are called [listed here with their more standard equivalents from functional programming]:

  • collect:, which "collects" all elements returned from a block into a new collection, i.e. this is the equivalent to map.
  • select:, which "selects" all elements that satisfy a block, i.e. this is the equivalent to filter.
  • reject:, which "rejects" all elements that satisfy a block, i.e. this is the opposite of select: and thus equivalent to what is sometimes called filterNot.
  • detect:, which "detects" whether an element which satisfies a block is inside the collection, i.e. this is the equivalent to contains. Except, it actually returns the element as well, so it is more like findFirst.
  • inject:into: … where the nice naming schema breaks down somewhat …: it does "inject" a starting value "into" a block but that's a somewhat strained connection to what it actually does. This is the equivalent to fold.

So, Ruby has all of those, and more, and it uses some of the original naming, but thankfully, it also provides aliases.

Map

In Ruby, map is originally named Enumerable#collect but is also available as Enumerable#map, which is the name preferred by most Rubyists.

As mentioned above, this is also available in Python as the map built-in function.
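A minimal sketch of both names in use; the sample array is made up for illustration:

```ruby
xs = [1, 2, 3]

# Equivalent to Python's [x * 2 for x in xs]
doubled   = xs.map     { |x| x * 2 }
collected = xs.collect { |x| x * 2 }  # same method under its Smalltalk-style name

p doubled  # => [2, 4, 6]
```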

FlatMap

In Ruby, flatMap is originally named Enumerable#collect_concat but is also available as Enumerable#flat_map, which is the name preferred by most Rubyists.
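As a sketch, here is how the two nested Python forms shown earlier might look with flat_map. The sample data is made up for illustration:

```ruby
xs = [1, 2]
ys = [10, 20]

# Equivalent to Python's [x + y for x in xs for y in ys]
sums = xs.flat_map { |x| ys.map { |y| x + y } }
p sums  # => [11, 21, 12, 22]

# Equivalent to Python's [y * 10 for inner in nested for y in inner]
nested = [[1, 2], [3], []]
scaled = nested.flat_map { |inner| inner.map { |y| y * 10 } }
p scaled  # => [10, 20, 30]
```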

Filter

In Ruby, filter is originally named Enumerable#select, which is the name preferred by most Rubyists, but is also available as Enumerable#find_all.

FilterNot

In Ruby, filterNot is named Enumerable#reject.
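A short sketch of select and reject side by side, using made-up sample data:

```ruby
xs = (1..10).to_a

# Equivalent to Python's [x for x in xs if x % 2 == 0]
evens = xs.select(&:even?)

# The complement: keep everything the predicate does NOT match
odds = xs.reject(&:even?)

p evens  # => [2, 4, 6, 8, 10]
p odds   # => [1, 3, 5, 7, 9]
```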

FindFirst

In Ruby, findFirst is originally named Enumerable#detect, but is also available as Enumerable#find.
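For illustration, with made-up data: find returns the first matching element, or nil when nothing matches.

```ruby
xs = [3, 8, 12, 5]

first_big = xs.find   { |x| x > 7 }    # stops at the first match
missing   = xs.detect { |x| x > 100 }  # same method, older name

p first_big  # => 8
p missing    # => nil
```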

Fold

In Ruby, fold is originally named Enumerable#inject, but is also available as Enumerable#reduce.

It also exists in Python as functools.reduce.
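A minimal sketch of a fold in Ruby, with made-up sample data. The first argument is the starting accumulator, and the block combines the accumulator with each element:

```ruby
xs = [1, 2, 3, 4]

sum     = xs.inject(0) { |acc, x| acc + x }
product = xs.reduce(1) { |acc, x| acc * x }

# Shorthand: pass the name of a binary method instead of a block
sum_short = xs.reduce(:+)

p sum      # => 10
p product  # => 24
```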

Unfold

In Ruby, unfold is named Enumerator::produce.
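A small sketch of unfolding, assuming Ruby 2.7 or later (where Enumerator.produce was introduced). The enumerator is infinite, so it has to be cut off with something like take:

```ruby
# Each value is produced by applying the block to the previous one.
powers_of_two = Enumerator.produce(1) { |x| x * 2 }.take(5)
p powers_of_two  # => [1, 2, 4, 8, 16]

# The "state" can be a pair; here it is two consecutive Fibonacci numbers.
fibonacci = Enumerator.produce([0, 1]) { |a, b| [b, a + b] }.take(6).map(&:first)
p fibonacci  # => [0, 1, 1, 2, 3, 5]
```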

Scan

Scan is unfortunately not available in Ruby. It is available in Python as itertools.accumulate.

A deep dive into recursion patterns

Armed with our nomenclature from above, we now know that what you wrote is called a fold:

squares = original.inject({}) do |squared, (name, value)|
  squared[name] = value ** 2
  squared
end

What you wrote here works. And that sentence I just wrote is actually surprisingly deep! Because fold has a very powerful property: everything which can be expressed as iterating over a collection can be expressed as a fold. In other words, everything that can be expressed as recursing over a collection (in a functional language), everything that can be expressed as looping / iterating over a collection (in an imperative language), everything that can be expressed using any of the aforementioned functions (map, filter, find), everything that can be expressed using Python's comprehensions, everything that can be expressed using some of the additional functions we haven't discussed yet (e.g. groupBy) can be expressed using fold.

If you have fold, you don't need anything else! If you were to remove every method from Enumerable except Enumerable#inject, you could still write everything you could write before; you could actually re-implement all the methods you just removed only by using Enumerable#inject. In fact, I did that once for fun as an exercise. You could also implement the missing scan operation mentioned above.
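To illustrate the claim, here is a rough sketch of map, select, and the missing scan written with nothing but Enumerable#inject. The helper names (map_via_inject and so on) are hypothetical, invented for this example, and the implementations favor clarity over efficiency:

```ruby
# Hypothetical helpers: each one relies only on Enumerable#inject.

def map_via_inject(enum)
  enum.inject([]) { |acc, x| acc + [yield(x)] }
end

def select_via_inject(enum)
  enum.inject([]) { |acc, x| yield(x) ? acc + [x] : acc }
end

# scan is a fold that also keeps every intermediate accumulator.
def scan_via_inject(enum, initial)
  enum.inject([initial]) { |acc, x| acc + [yield(acc.last, x)] }
end

p map_via_inject([1, 2, 3]) { |x| x * 2 }             # => [2, 4, 6]
p select_via_inject(1..6) { |x| x.even? }             # => [2, 4, 6]
p scan_via_inject([1, 2, 3, 4], 0) { |s, x| s + x }   # => [0, 1, 3, 6, 10]
```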

It is not necessarily obvious that fold really is general, but think of it this way: a collection can be either empty or not. fold has two arguments, one which tells it what to do when the collection is empty, and one which tells it what to do when the collection is not empty. Those are the only two cases, so every possible case is handled. Therefore, fold can do everything!
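The two cases can be made explicit by writing a fold by hand. This is only a sketch: fold is a hypothetical name, it accepts Arrays only, and it is a left fold, which is what Enumerable#inject does:

```ruby
# Hypothetical hand-written left fold over an Array.
def fold(collection, initial, &step)
  # Empty case: nothing left to process, so the accumulator is the answer.
  return initial if collection.empty?

  # Non-empty case: combine the first element, then fold the rest.
  head, *tail = collection
  fold(tail, step.call(initial, head), &step)
end

p fold([1, 2, 3], 0) { |acc, x| acc + x }  # => 6
p fold([], 0) { |acc, x| acc + x }         # => 0
```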

Or a different viewpoint: a collection is a stream of instructions, either the EMPTY instruction or the ELEMENT(value) instruction. fold is a skeleton interpreter for that instruction set, and you as a programmer supply the interpretation of both instructions: the two arguments to fold are the interpretations of those instructions. [I was introduced to this eye-opening interpretation of fold as an interpreter and a collection as an instruction stream by Rúnar Bjarnason, co-author of Functional Programming in Scala and co-designer of the Unison Programming Language. Unfortunately, I cannot find the original talk anymore, but The Interpreter Pattern Revisited presents a much more general idea that should also bring it across.]

Note that the way you are using fold here is somewhat awkward, because you are using mutation (i.e. a side-effect) for an operation that is deeply rooted in functional programming. Fold uses the return value of one iteration as the starting value for the next iteration. But the operation you are doing is a mutation which doesn't actually return a useful value for the next iteration. That's why you have to then return the accumulator which you just modified.

If you were to express this in a functional way using Hash#merge, without mutation, it would look cleaner:

squares = original.inject({}) do |squared, (name, value)|
  squared.merge({ name => value ** 2 })
end

However, for the specific use-case where instead of returning a new accumulator on each iteration and using that for the next iteration, you want to just mutate the same accumulator over and over again, Ruby offers a different variant of fold under the name Enumerable#each_with_object, which completely ignores the return value of the block and just passes the same accumulator object every time. Confusingly, the order of the arguments in the block is reversed between Enumerable#inject (accumulator first, element second) and Enumerable#each_with_object (element first, accumulator second):

squares = original.each_with_object({}) do |(name, value), squared|
  squared[name] = value ** 2
end

However, it turns out, we can make this even simpler. I explained above that fold is general, i.e. it can solve every problem. Then why do we have those other operations in the first place? We have them for the same reason that we have subroutines, conditionals, exceptions, and loops, even though we could do everything with just GOTO: expressivity.

If you read some code using only GOTO, you have to "reverse engineer" what every particular usage of GOTO means: is it checking a condition, is it doing something multiple times? By having different, more specialized constructs, you can recognize at a glance what a particular piece of code does.

The same applies to these collection operations. In your case, for example, you are transforming each element of the original collection into a new element of the result collection. But, you have to actually read and understand what the block does, in order to recognize this.

However, as we discussed above, there is a more specialized operation available which does this: map. And everybody who sees map immediately understands "oh, this is mapping each element 1:1 to a new element", without having to even look at what the block does. So, we can write your code like this instead:

squares = original.map do |name, value|
  [name, value ** 2]
end.to_h

Note: Ruby's collection operations are for the most part not type-preserving, i.e. transforming a collection will typically not yield the same type of collection. Instead, in general, collection operations mostly return Arrays, which is why we have to call Array#to_h here at the end.

As you can see, because this operation is more specialized than fold (which can do everything), it is both simpler to read and also simpler to write (i.e. the inside of the block, the part that you as the programmer have to write, is simpler than what you had above).

But we are actually not done! It turns out that for this particular case, where we only want to transform the values of a Hash, there is actually an even more specialized operation available: Hash#transform_values:

squares = original.transform_values do |value|
  value ** 2
end
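As a quick sanity check, the variants discussed in this section can be run side by side. The contents of original are an assumption made for this sketch, since the question's data is not shown here:

```ruby
original = { a: 2, b: 3 }  # assumed sample data

via_inject = original.inject({}) do |squared, (name, value)|
  squared.merge(name => value ** 2)
end

via_each_with_object = original.each_with_object({}) do |(name, value), squared|
  squared[name] = value ** 2
end

via_map = original.map { |name, value| [name, value ** 2] }.to_h

via_transform_values = original.transform_values { |value| value ** 2 }

p via_transform_values  # => {:a=>4, :b=>9} (newer Rubies print {a: 4, b: 9})
```

All four produce the same Hash; they differ only in how much of the intent is visible from the method name alone.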

Epilogue

One of the things programmers do most often is iterate over collections. Practically every program ever written in any programming language does this in some form or another. Therefore, it is very valuable to study the operations your particular programming language offers for doing this.

In Ruby, this means studying the Enumerable mixin as well as the additional methods provided by Array and Hash.

Also, study Enumerators and how to combine them.
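Two small sketches of what combining Enumerators can look like, with made-up data: chaining an enumerator returned by a blockless call, and using a lazy enumerator over an unbounded range:

```ruby
# each_with_index without a block returns an Enumerator, which map can consume.
labelled = %w[a b c].each_with_index.map { |ch, i| "#{i}:#{ch}" }
p labelled  # => ["0:a", "1:b", "2:c"]

# lazy defers the work, so an infinite source is fine as long as we stop early.
squares_of_evens = (1..Float::INFINITY).lazy
  .select(&:even?)
  .map { |x| x * x }
  .first(3)
p squares_of_evens  # => [4, 16, 36]
```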

But it is also very helpful to study the history of where these operations come from, which is mostly functional programming. If you understand the history of those operations, you will be able to quickly familiarize yourself with collection operations in many languages, since they all borrow from that same history, e.g. ECMAScript, Python, .NET LINQ, Java Streams, C++ STL algorithms, Swift, and many more.

List comprehension in Haskell, Python and Ruby

For Haskell I like

let s n = sum [0,n..999] in s 3 + s 5 - s 15

or

sum $ filter ((>1).(gcd 15)) [0..999]

For fun the Rube-Goldberg version:

import Data.Bits

sum $ zipWith (*) [1..999] $ zipWith (.|.) (cycle [0,0,1]) (cycle [0,0,0,0,1])

Okay, explanation time.

The first version defines a little function s that sums up all multiples of n up to 999. If we sum all multiples of 3 and all multiples of 5, we included all multiples of 15 twice (once in every list), hence we need to subtract them one time.

The second version uses the fact that 3 and 5 are primes. If a number contains one or both of the factors 3 and 5, the gcd of this number and 15 will be 3, 5 or 15, so in every case the gcd will be bigger than one. For other numbers without a common factor with 15 the gcd becomes 1. This is a nice trick to test both conditions in one step. But be careful, it won't work for arbitrary numbers, e.g. if we had 4 and 9, the test gcd x 36 > 1 won't work, as gcd 6 36 == 6, but neither mod 6 4 == 0 nor mod 6 9 == 0.

The third version is quite funny. cycle repeats a list over and over. cycle [0,0,1] codes the "divisibility pattern" for 3, and cycle [0,0,0,0,1] does the same for 5. Then we "or" both lists together using zipWith, which gives us [0,0,1,0,1,1,0,0,1,1,0,1...]. Now we use zipWith again to multiply this with the actual numbers, resulting in [0,0,3,0,5,6,0,0,9,10,0,12...]. Then we just add it up.
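For comparison, the first two Haskell versions might be sketched in Ruby as follows. This is a translation written for illustration, not part of the original answer, and the names limit and sum_of_multiples are made up:

```ruby
limit = 999

# Version 1: inclusion-exclusion. Multiples of 15 are counted twice, so subtract them once.
sum_of_multiples = ->(n) { 0.step(limit, n).sum }
inclusion_exclusion = sum_of_multiples.(3) + sum_of_multiples.(5) - sum_of_multiples.(15)

# Version 2: the gcd trick, which works because 3 and 5 are prime.
via_gcd = (0..limit).select { |n| n.gcd(15) > 1 }.sum

# The direct formulation, as a cross-check.
direct = (0..limit).select { |n| n % 3 == 0 || n % 5 == 0 }.sum

p inclusion_exclusion  # => 233168
```

The caveat about 4 and 9 carries over as well: 6.gcd(36) is 6, yet 6 is divisible by neither 4 nor 9.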

Knowing different ways to do the same thing might be wasteful for other languages, but for Haskell it is essential. You need to spot patterns, pick up tricks and idioms, and play around a lot in order to gain the mental flexibility to use this language effectively. Challenges like the Project Euler problems are a good opportunity to do so.

Erlang vs Ruby list comprehensions

First off, your data structures aren't equivalent. The equivalent Ruby data structure to your Erlang example would be more like

weather = [[:toronto, :rain], [:montreal, :storms], [:london, :fog], 
[:paris, :sun], [:boston, :fog], [:vancouver, :snow]]

Secondly, yes, Ruby doesn't have list comprehensions, and its pattern matching (case/in) only arrived in Ruby 2.7. So, the example will probably be more complex. Your list comprehension first filters all foggy cities, then projects the name. Let's do the same in Ruby:

weather.select {|_, weather| weather == :fog }.map(&:first)
# => [:london, :boston]

However, Ruby is centered around objects, but you are using abstract data types. With a more object-oriented data abstraction, the code would probably look more like

weather.select(&:foggy?).map(&:city)

which isn't too bad, is it?
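One way such an object-oriented abstraction could look is sketched below. The Forecast struct and its city, condition, and foggy? members are hypothetical names chosen to match the one-liner above:

```ruby
# Hypothetical value object; the names are assumptions made for this sketch.
Forecast = Struct.new(:city, :condition) do
  def foggy?
    condition == :fog
  end
end

weather = [
  Forecast.new(:toronto, :rain),   Forecast.new(:montreal, :storms),
  Forecast.new(:london, :fog),     Forecast.new(:paris, :sun),
  Forecast.new(:boston, :fog),     Forecast.new(:vancouver, :snow)
]

foggy_cities = weather.select(&:foggy?).map(&:city)
p foggy_cities  # => [:london, :boston]
```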

Python list comprehension = Ruby select / reject on index rather than element

The method order is relevant:

arr.each_with_index.select { |e, i| i % 3 == 0 }
#=> [[10, 0], [40, 3], [70, 6], [100, 9]]

versus:

arr.select.each_with_index { |e, i| i % 3 == 0 }
#=> [10, 40, 70, 100]

Since select returns an enumerator, you could also use Enumerator#with_index:

arr.select.with_index { |e, i| i % 3 == 0 }
#=> [10, 40, 70, 100]

Regarding your slice equivalent, you can use map (or its alias collect) to collect the items in an array:

(0...arr.length).step(3).map { |e| arr[e] }
#=> [10, 40, 70, 100]

or values_at to fetch the items at the given indices:

arr.values_at(*(0...arr.length).step(3))
#=> [10, 40, 70, 100]

* turns the argument into an array (via to_a) and then into an argument list, i.e.:

arr.values_at(*(0...arr.length).step(3))
arr.values_at(*(0...arr.length).step(3).to_a)
arr.values_at(*[0, 3, 6, 9])
arr.values_at(0, 3, 6, 9)

Slightly shorter:

arr.values_at(*0.step(arr.size - 1, 3))
#=> [10, 40, 70, 100]
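The snippets above can be tried with a concrete array. The contents of arr are inferred from the outputs shown, so treat them as an assumption. The sketch also adds each_slice as another way to express Python's arr[::3], and uses an exclusive range so that an index equal to the array length is never requested:

```ruby
arr = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # assumed from the outputs above

by_index  = arr.select.with_index { |_e, i| i % 3 == 0 }
by_values = arr.values_at(*(0...arr.length).step(3))
by_slice  = arr.each_slice(3).map(&:first)  # first element of every group of three

p by_slice  # => [10, 40, 70, 100]

# Why the exclusive range matters: with an inclusive upper bound, an array
# whose length is a multiple of 3 picks up a trailing nil.
p [1, 2, 3].values_at(*(0..3).step(3))   # => [1, nil]
p [1, 2, 3].values_at(*(0...3).step(3))  # => [1]
```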

What does Ruby have that Python doesn't, and vice versa?

You can have code in the class definition in both Ruby and Python. However, in Ruby you have a reference to the class (self). In Python you don't have a reference to the class, as the class isn't defined yet.

An example:

class Kaka
  puts self
end

self in this case is the class, and this code would print out "Kaka". In Python, the class object does not exist until the class body has finished executing, so the body cannot reference the class itself.


