Python Equivalent of Ruby's .Select

Python has a built-in filter function (in Python 3 it returns a lazy iterator, so wrap it in list() if you need an actual list):

lst = [1, 2, 3, 4, 5, 6]
filtered = filter(lambda x: x < 5, lst)

But list comprehensions might flow better, especially when combining with map operations:

mapped_and_filtered = [x*2 for x in lst if x < 5]
# compare to:
mapped_and_filtered = map(lambda y: y*2, filter(lambda x: x < 5, lst))
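
As a minimal sketch of the comparison above under Python 3, the snippet below materializes the filter/map results with list() and adds a reject-style comprehension. The names via_builtins, via_comprehension and rejected are illustrative only and do not come from the original answer.

```python
lst = [1, 2, 3, 4, 5, 6]

# Ruby: lst.select { |x| x < 5 }
filtered = list(filter(lambda x: x < 5, lst))

# Ruby: lst.select { |x| x < 5 }.map { |x| x * 2 }
via_builtins = list(map(lambda y: y * 2, filter(lambda x: x < 5, lst)))
via_comprehension = [x * 2 for x in lst if x < 5]

# Ruby: lst.reject { |x| x < 5 } (negate the condition)
rejected = [x for x in lst if not x < 5]

print(filtered, via_builtins, via_comprehension, rejected)
```

Both spellings produce the same list; the comprehension simply avoids the nested lambdas.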

Python list comprehension = Ruby select / reject on index rather than element

The method order is relevant:

arr.each_with_index.select { |e, i| i % 3 == 0 }
#=> [[10, 0], [40, 3], [70, 6], [100, 9]]

versus:

arr.select.each_with_index { |e, i| i % 3 == 0 }
#=> [10, 40, 70, 100]

Since select returns an enumerator, you could also use Enumerator#with_index:

arr.select.with_index { |e, i| i % 3 == 0 }
#=> [10, 40, 70, 100]

Regarding your slice equivalent, you can use map (or its alias collect) to collect the items in an array:

(0..arr.length).step(3).map { |e| arr[e] }
#=> [10, 40, 70, 100]

or values_at to fetch the items at the given indices:

arr.values_at(*(0..arr.length).step(3))
#=> [10, 40, 70, 100]

* turns the argument into an array (via to_a) and then into an argument list, i.e.:

arr.values_at(*(0..arr.length).step(3))
arr.values_at(*(0..arr.length).step(3).to_a)
arr.values_at(*[0, 3, 6, 9])
arr.values_at(0, 3, 6, 9)

Slightly shorter:

arr.values_at(*0.step(arr.size, 3))
#=> [10, 40, 70, 100]
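
For reference, here is a rough Python sketch of the same index-based selections. The contents of arr are an assumption inferred from the outputs shown above (ten values, 10 through 100); the variable names are illustrative.

```python
# Assumed input: the original arr is not shown, this matches the outputs above.
arr = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Like arr.select.with_index { |e, i| i % 3 == 0 }
by_index = [e for i, e in enumerate(arr) if i % 3 == 0]

# Like arr.each_with_index.select { |e, i| i % 3 == 0 }, which keeps the pairs
pairs = [(e, i) for i, e in enumerate(arr) if i % 3 == 0]

# The slice form: start at 0, take every third element
by_slice = arr[::3]

print(by_index, pairs, by_slice)
```

The slice is the shortest form when the condition is a fixed stride; the enumerate version generalizes to any predicate on the index.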

Numpy equivalents of Ruby array functions

Initially this looked like a bincount or histogram, but the output is the bins where each value fits, not the number of items per bin:

In [3]: eq_width_bin(data,3)                                                    
Out[3]: [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]

Your bins:

In [10]: np.linspace(np.min(data),np.max(data),4)                               
Out[10]: array([ 10.,  50.,  90., 130.])

we can identify the bin for each value with a simple integer division:

In [12]: (data-10)//40                                                          
Out[12]: array([1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 3, 1])

and correct that 3 with:

In [16]: np.minimum((data-10)//40,2)                                            
Out[16]: array([1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1])

But that doesn't answer your question about .select, .collect, .inject and .sort_by. Offhand I'm not familiar with those (though I was a fan of Squeak years ago, and dabbled in Ruby a bit). They sound more like iterators, such as those collected in itertools.

With numpy we don't usually take an iterative approach. Rather we try to look at the arrays as a whole, doing things like division and min/max for the whole thing.

===

searchsorted also works for this problem:

In [19]: np.searchsorted(Out[10],data)                                              
Out[19]: array([2, 3, 2, 1, 1, 0, 2, 2, 3, 3, 3, 2])

In [21]: np.maximum(0,np.searchsorted(Out[10],data)-1)                              
Out[21]: array([1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1])
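
If NumPy isn't available, the same lookup can be sketched with the standard library: bisect.bisect_left returns the same insertion point as searchsorted does with its default side='left'. The data values below are hypothetical, chosen only so that the minimum is 10 and the maximum is 130 as in the edges above; bin_index is an illustrative helper name.

```python
from bisect import bisect_left

edges = [10.0, 50.0, 90.0, 130.0]

# Hypothetical data: the original values are not shown in the answer.
data = [60, 100, 70, 20, 30, 10, 80, 55, 95, 110, 130, 75]

def bin_index(value, edges):
    # Insertion point minus one is the bin; clamp at 0 so the minimum
    # value (insertion point 0) lands in the first bin instead of -1.
    return max(0, bisect_left(edges, value) - 1)

bins = [bin_index(v, edges) for v in data]
print(bins)
```

This is a per-element loop, so it trades the whole-array style for portability.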

A (possibly) cleaner expression of your Python loop:

def foo(i, edges):
    for j,n in enumerate(edges):
        if i<n:
            return j-1
    return j-1

In [34]: edges = np.linspace(np.min(data),np.max(data),4).tolist()
In [35]: [foo(i,edges) for i in data]                                           
Out[35]: [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]

I converted edges to a list, because it's faster to iterate that way.

With itertools:

In [55]: [len(list(itertools.takewhile(lambda x: x<i,edges)))-1 for i in data]  
Out[55]: [1, 2, 1, 0, 0, -1, 1, 1, 2, 2, 2, 1]

===

Another approach

In [45]: edges = np.linspace(np.min(data),np.max(data),4)                       
In [46]: data[:,None]<edges
Out[46]: 
array([[False, False,  True,  True],
       [False, False, False,  True],
       [False, False,  True,  True],
       [False,  True,  True,  True],
       [False,  True,  True,  True],
       [False,  True,  True,  True],
       [False, False,  True,  True],
       [False, False,  True,  True],
       [False, False, False,  True],
       [False, False, False,  True],
       [False, False, False, False],
       [False, False,  True,  True]])
In [47]: np.argmax(data[:,None]<edges, axis=1)-1                                
Out[47]: array([ 1,  2,  1,  0,  0,  0,  1,  1,  2,  2, -1,  1])

That -1 needs cleaning (the row where there's no True).

edit

Lists have an index method; with that we can get an expression that's a lot like your last Ruby line. Looks like list comprehension is a lot like the Ruby collect:

In [88]: [[i<j for i in edges].index(False)-1 for j in data]                    
Out[88]: [1, 2, 1, 0, 0, -1, 1, 1, 2, 2, 2, 1]

Does Ruby have something like Python's list comprehensions?

The common way in Ruby is to properly combine Enumerable and Array methods to achieve the same:

digits.product(chars).select{ |d, ch| d >= 2 && ch == 'a' }.map(&:join)

This is only about four characters longer than the list comprehension and just as expressive, without needing any special syntax. (IMHO of course, but since list comprehensions are just a special application of the list monad, one could argue that it's probably possible to adequately rebuild them using Ruby's collection methods.)
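
For comparison, a Python comprehension along the lines of that Ruby chain might look like the sketch below. The contents of digits and chars are assumptions made for illustration, since the question's actual inputs aren't shown here.

```python
from itertools import product

# Hypothetical inputs, not taken from the original question.
digits = [1, 2, 3]
chars = ["a", "b"]

# Nested-loop comprehension: the form the Ruby chain is being compared to
nested = [f"{d}{ch}" for d in digits for ch in chars if d >= 2 and ch == "a"]

# Closer transliteration of digits.product(chars).select { ... }.map(&:join)
chained = [
    "".join(map(str, pair))
    for pair in product(digits, chars)
    if pair[0] >= 2 and pair[1] == "a"
]

print(nested, chained)
```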

List comprehension in Ruby

If you really want to, you can create an Array#comprehend method like this:

class Array
  def comprehend(&block)
    return self if block.nil?
    self.collect(&block).compact
  end
end

some_array = [1, 2, 3, 4, 5, 6]
new_array = some_array.comprehend {|x| x * 3 if x % 2 == 0}
puts new_array

Prints:

6
12
18

I would probably just do it the way you did though.
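
To make the collect-then-compact idea concrete, here is a small Python sketch of the same two steps next to the direct comprehension that comprehend imitates. The names mapped, compacted and direct are illustrative.

```python
some_array = [1, 2, 3, 4, 5, 6]

# Step 1 (collect): odd elements map to None, like a Ruby block returning nil
mapped = [x * 3 if x % 2 == 0 else None for x in some_array]

# Step 2 (compact): drop the None placeholders
compacted = [y for y in mapped if y is not None]

# The comprehension does both in one pass
direct = [x * 3 for x in some_array if x % 2 == 0]

print(mapped, compacted, direct)
```

One consequence of the two-step form: a block that legitimately returns nil for a kept element loses that element, which the filtering comprehension avoids.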

What is the Clojure equivalent of Ruby's select?

(filter #(or (zero? (mod % 3)) (zero? (mod % 5))) (range 1000))

List comprehension in Haskell, Python and Ruby

For Haskell I like

let s n = sum [0,n..999] in s 3 + s 5 - s 15

sum $ filter ((>1).(gcd 15)) [0..999]

For fun the Rube-Goldberg version:

import Data.Bits

sum $ zipWith (*) [1..999] $ zipWith (.|.) (cycle [0,0,1]) (cycle [0,0,0,0,1])

Okay, explanation time.

The first version defines a little function s that sums up all multiples of n up to 999. If we sum all multiples of 3 and all multiples of 5, we included all multiples of 15 twice (once in every list), hence we need to subtract them one time.

The second version uses the fact that 3 and 5 are primes. If a number contains one or both of the factors 3 and 5, the gcd of this number and 15 will be 3, 5 or 15, so in every case the gcd will be bigger than one. For other numbers without a common factor with 15 the gcd becomes 1. This is a nice trick to test both conditions in one step. But be careful, it won't work for arbitrary numbers: e.g. if we had 4 and 9, the test gcd x 36 > 1 wouldn't work, as gcd 6 36 == 6, but neither mod 6 4 == 0 nor mod 6 9 == 0.

The third version is quite funny. cycle repeats a list over and over. cycle [0,0,1] codes the "divisibility pattern" for 3, and cycle [0,0,0,0,1] does the same for 5. Then we "or" both lists together using zipWith, which gives us [0,0,1,0,1,1,0,0,1,1,0,1...]. Now we use zipWith again to multiply this with the actual numbers, resulting in [0,0,3,0,5,6,0,0,9,10,0,12...]. Then we just add it up.
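
The three versions explained above can be cross-checked with a short Python sketch. This is a transliteration for illustration, not part of the original answer; LIMIT, sum_multiples and the other names are made up here.

```python
from itertools import cycle
from math import gcd

LIMIT = 1000  # exclusive upper bound, matching [0..999]

def sum_multiples(n):
    # Sum of 0, n, 2n, ... below LIMIT, like `sum [0,n..999]`
    return sum(range(0, LIMIT, n))

# Version 1: add both sums, then subtract the doubly counted multiples of 15
inclusion_exclusion = sum_multiples(3) + sum_multiples(5) - sum_multiples(15)

# Version 2: gcd with 15 exceeds 1 exactly when 3 or 5 divides x
gcd_trick = sum(x for x in range(LIMIT) if gcd(15, x) > 1)

# Version 3: OR the two divisibility patterns, then multiply by the numbers
pattern = sum(
    n * (by3 | by5)
    for n, by3, by5 in zip(range(1, LIMIT), cycle([0, 0, 1]), cycle([0, 0, 0, 0, 1]))
)

# The caveat for non-prime divisors: 6 passes the gcd test against 36,
# yet it is divisible by neither 4 nor 9.
false_positive = gcd(6, 36) > 1 and 6 % 4 != 0 and 6 % 9 != 0

print(inclusion_exclusion, gcd_trick, pattern, false_positive)
```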

Knowing different ways to do the same thing might be wasteful for other languages, but for Haskell it is essential. You need to spot patterns, pick up tricks and idioms, and play around a lot in order to gain the mental flexibility to use this language effectively. Challenges like the project Euler problems are a good opportunity to do so.

In Ruby, does reject or select combined with each result in multiple iterations?

You can use Enumerator::Lazy to process it in one iteration:

arr.lazy.reject { |e| e == :c }.each { |e| handle(e) }

This also changes the order of invocation: the first element passes through both blocks, then the second element, and so on:

arr.lazy.reject { |e|
  puts "filtering #{e}"; e == :c
}.each { |e|
  puts "handling #{e}"
}

Output:

filtering a
handling a
filtering b
handling b
filtering c  # <- c doesn't make it to the 2nd block
filtering d
handling d

The non-lazy approach passes all elements to the first block and the results to the second block:

arr.reject { |e|
  puts "filtering #{e}"; e == :c
}.each { |e|
  puts "handling #{e}"
}

Output:

filtering a
filtering b
filtering c
filtering d
handling a
handling b
handling d
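
The same lazy versus eager ordering can be sketched in Python, where a generator expression plays a role similar to Enumerator::Lazy and a list comprehension behaves like the plain reject. The run helper and its log list are illustrative devices for capturing the call order, not part of the original answer.

```python
def run(lazy):
    log = []
    arr = ["a", "b", "c", "d"]

    def keep(e):
        # Record each filter call so the interleaving is visible
        log.append(f"filtering {e}")
        return e != "c"

    if lazy:
        kept = (e for e in arr if keep(e))   # evaluated on demand
    else:
        kept = [e for e in arr if keep(e)]   # evaluated up front

    for e in kept:
        log.append(f"handling {e}")
    return log

print(run(lazy=True))
print(run(lazy=False))
```

With lazy=True the log alternates between filtering and handling, as in the first Ruby output; with lazy=False all filtering entries come first.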
