Python equivalent of Ruby's .select
Python has a built-in filter function. In Python 3 it returns a lazy iterator, so wrap it in list() if you need a list:
lst = [1, 2, 3, 4, 5, 6]
filtered = filter(lambda x: x < 5, lst)
But list comprehensions often read better, especially when combined with a map operation:
mapped_and_filtered = [x*2 for x in lst if x < 5]
# compare to:
mapped_and_filtered = map(lambda y: y*2, filter(lambda x: x < 5, lst))
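A small sketch of both Ruby operations side by side, reusing lst from above. Ruby's reject has no identically named built-in in Python; itertools.filterfalse is the closest standard-library counterpart, and a negated condition in a comprehension does the same job:

```python
from itertools import filterfalse

lst = [1, 2, 3, 4, 5, 6]

# Ruby's select: keep the elements for which the predicate is true.
# list() materialises the lazy iterator that filter returns in Python 3.
selected = list(filter(lambda x: x < 5, lst))

# Ruby's reject: keep the elements for which the predicate is false.
rejected = list(filterfalse(lambda x: x < 5, lst))

# The same two operations written as list comprehensions.
selected_lc = [x for x in lst if x < 5]
rejected_lc = [x for x in lst if not x < 5]

print(selected, rejected)  # [1, 2, 3, 4] [5, 6]
```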
Python list comprehension = Ruby select / reject on index rather than element
The method order is relevant:
arr.each_with_index.select { |e, i| i % 3 == 0 }
#=> [[10, 0], [40, 3], [70, 6], [100, 9]]
versus:
arr.select.each_with_index { |e, i| i % 3 == 0 }
#=> [10, 40, 70, 100]
Since select without a block returns an enumerator, you could also use Enumerator#with_index:
arr.select.with_index { |e, i| i % 3 == 0 }
#=> [10, 40, 70, 100]
Regarding your slice equivalent, you can use map (or its alias collect) to collect the items in an array. The exclusive range (...) keeps the last index inside the array even when the length is a multiple of 3:
(0...arr.length).step(3).map { |e| arr[e] }
#=> [10, 40, 70, 100]
or values_at to fetch the items at the given indices:
arr.values_at(*(0...arr.length).step(3))
#=> [10, 40, 70, 100]
The splat operator * turns the argument into an array (via to_a) and then into an argument list, i.e.:
arr.values_at(*(0...arr.length).step(3))
arr.values_at(*(0...arr.length).step(3).to_a)
arr.values_at(*[0, 3, 6, 9])
arr.values_at(0, 3, 6, 9)
Slightly shorter:
arr.values_at(*0.step(arr.size - 1, 3))
#=> [10, 40, 70, 100]
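For comparison, a sketch of the Python side. It assumes arr holds 10, 20, ..., 100, which is consistent with the outputs shown above but is not stated in the original. Filtering on the index can be done with enumerate inside a comprehension, and for a fixed stride extended slicing is shorter:

```python
# Assumed input: the outputs above are consistent with arr = 10, 20, ..., 100.
arr = list(range(10, 101, 10))

# Like arr.each_with_index.select: keeps (element, index) pairs.
pairs = [(e, i) for i, e in enumerate(arr) if i % 3 == 0]

# Like arr.select.with_index: keeps only the elements.
every_third = [e for i, e in enumerate(arr) if i % 3 == 0]

# Extended slicing gives the same elements for a fixed stride.
sliced = arr[::3]

# Rough counterpart of values_at for an arbitrary list of indices.
picked = [arr[i] for i in (0, 3, 6, 9)]

print(every_third)  # [10, 40, 70, 100]
```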
Numpy equivalents of Ruby array functions
Initially this looked like a bincount or histogram, but the output is the bin that each value falls into, not the number of items per bin:
In [3]: eq_width_bin(data,3)
Out[3]: [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]
Your bins:
In [10]: np.linspace(np.min(data),np.max(data),4)
Out[10]: array([ 10., 50., 90., 130.])
We can identify the bin for each value with a simple integer division:
In [12]: (data-10)//40
Out[12]: array([1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 3, 1])
and fold that 3 (produced by the maximum value) into the last bin with:
In [16]: np.minimum((data-10)//40,2)
Out[16]: array([1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1])
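The same arithmetic can be sketched without NumPy for an arbitrary number of bins. The function name eq_width_bins and the sample data are assumptions made up for illustration (the original data is not shown); the data was only chosen to span 10 to 130 like the bin edges above. The sketch also assumes the values are not all equal, otherwise the bin width would be zero:

```python
def eq_width_bins(values, n_bins):
    """Return the 0-based equal-width bin index of each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Floor division finds the bin; min() folds the maximum value,
    # which would otherwise land in bin n_bins, into the last bin.
    return [min(int((v - lo) // width), n_bins - 1) for v in values]

# Hypothetical data spanning 10..130, so the edges are 10, 50, 90, 130.
data = [60, 100, 70, 20, 30, 10, 55, 80, 95, 120, 130, 65]
bins = eq_width_bins(data, 3)
print(bins)  # [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]
```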
But that doesn't answer your question about .select, .collect, .inject and .sort_by. Offhand I'm not familiar with those (though I was a fan of Squeak years ago, and dabbled in Ruby a bit). They sound more like iterators, such as those collected in itertools.
With numpy we don't usually take an iterative approach. Rather, we try to look at the arrays as a whole, doing things like division and min/max on the whole array.
===
searchsorted also works for this problem:
In [19]: np.searchsorted(Out[10],data)
Out[19]: array([2, 3, 2, 1, 1, 0, 2, 2, 3, 3, 3, 2])
In [21]: np.maximum(0,np.searchsorted(Out[10],data)-1)
Out[21]: array([1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1])
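In plain Python, bisect.bisect_left from the standard library performs the same left-sided search as searchsorted with its default side. A minimal sketch, where the data values are hypothetical (the original data is not shown) and the edges are the ones computed above:

```python
from bisect import bisect_left

edges = [10.0, 50.0, 90.0, 130.0]

# Hypothetical data spanning the same range as the edges.
data = [60, 100, 70, 20, 30, 10, 55, 80, 95, 120, 130, 65]

# bisect_left returns the insertion point; subtracting 1 gives the bin.
# max(0, ...) clamps the minimum value, whose insertion point is 0.
bins = [max(0, bisect_left(edges, v) - 1) for v in data]
print(bins)  # [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]
```

One difference from the integer division: a value that sits exactly on an interior edge (50 or 90 here) is placed in the lower bin by the left-sided search, whereas the division places it in the upper one.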
A (possibly) cleaner expression of your Python loop:
def foo(i, edges):
    for j,n in enumerate(edges):
        if i<n:
            return j-1
    return j-1
In [34]: edges = np.linspace(np.min(data),np.max(data),4).tolist()
In [35]: [foo(i,edges) for i in data]
Out[35]: [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]
I converted edges to a list, because it's faster to iterate that way.
With itertools (here the minimum value comes out as -1 and would need clamping to 0):
In [55]: [len(list(itertools.takewhile(lambda x: x<i,edges)))-1 for i in data]
Out[55]: [1, 2, 1, 0, 0, -1, 1, 1, 2, 2, 2, 1]
===
Another approach
In [45]: edges = np.linspace(np.min(data),np.max(data),4)
In [46]: data[:,None]<edges
Out[46]:
array([[False, False,  True,  True],
       [False, False, False,  True],
       [False, False,  True,  True],
       [False,  True,  True,  True],
       [False,  True,  True,  True],
       [False,  True,  True,  True],
       [False, False,  True,  True],
       [False, False,  True,  True],
       [False, False, False,  True],
       [False, False, False,  True],
       [False, False, False, False],
       [False, False,  True,  True]])
In [47]: np.argmax(data[:,None]<edges, axis=1)-1
Out[47]: array([ 1, 2, 1, 0, 0, 0, 1, 1, 2, 2, -1, 1])
That -1 needs cleaning (it comes from the row where there's no True).
Edit
Lists have an index method; with that we can get an expression that's a lot like your last Ruby line. It looks like a list comprehension is a lot like Ruby's collect:
In [88]: [[i<j for i in edges].index(False)-1 for j in data]
Out[88]: [1, 2, 1, 0, 0, -1, 1, 1, 2, 2, 2, 1]
Does Ruby have something like Python's list comprehensions?
The common way in Ruby is to properly combine Enumerable and Array methods to achieve the same:
digits.product(chars).select{ |d, ch| d >= 2 && ch == 'a' }.map(&:join)
This is only 4 or so characters longer than the list comprehension and just as expressive (IMHO of course, but since list comprehensions are just a special application of the list monad, one could argue that it's probably possible to adequately rebuild that using Ruby's collection methods), while not needing any special syntax.
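For reference, a sketch of the Python list comprehension that the Ruby chain corresponds to. The contents of digits and chars are not given in the original, so the values below are assumptions chosen only for illustration:

```python
from itertools import product

# Assumed sample inputs; the original does not show them.
digits = [1, 2, 3]
chars = ["a", "b"]

# Nested comprehension: cartesian product, filter, then join.
pairs = [f"{d}{ch}" for d in digits for ch in chars if d >= 2 and ch == "a"]

# The same pipeline with an explicit product, closer to the Ruby chain.
pairs_product = [f"{d}{ch}" for d, ch in product(digits, chars)
                 if d >= 2 and ch == "a"]

print(pairs)  # ['2a', '3a']
```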
List comprehension in Ruby
If you really want to, you can create an Array#comprehend method like this:
class Array
  def comprehend(&block)
    return self if block.nil?
    self.collect(&block).compact
  end
end
some_array = [1, 2, 3, 4, 5, 6]
new_array = some_array.comprehend {|x| x * 3 if x % 2 == 0}
puts new_array
Prints:
6
12
18
I would probably just do it the way you did though.
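For comparison, a sketch of what the same call looks like as a Python list comprehension. The comprehension filters on the condition itself, whereas comprehend maps first and then drops nil results with compact; the second form below mimics that map-then-compact behaviour and is shown only to make the difference visible:

```python
some_array = [1, 2, 3, 4, 5, 6]

# Filter and map in one expression.
new_array = [x * 3 for x in some_array if x % 2 == 0]

# Rough imitation of collect followed by compact: map, then drop None.
mapped = [x * 3 if x % 2 == 0 else None for x in some_array]
compacted = [y for y in mapped if y is not None]

print(new_array)  # [6, 12, 18]
```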
What is the Clojure equivalent of Ruby's select?
(filter #(or (zero? (mod % 3)) (zero? (mod % 5))) (range 1000))
List comprehension in Haskell, Python and Ruby
For Haskell I like
let s n = sum [0,n..999] in s 3 + s 5 - s 15
or
sum $ filter ((>1).(gcd 15)) [0..999]
For fun the Rube-Goldberg version:
import Data.Bits
sum $ zipWith (*) [1..999] $ zipWith (.|.) (cycle [0,0,1]) (cycle [0,0,0,0,1])
Okay, explanation time.
The first version defines a little function s that sums up all multiples of n up to 999. If we sum all multiples of 3 and all multiples of 5, we count all multiples of 15 twice (once in each list), so we need to subtract them once.
The second version uses the fact that 3 and 5 are primes. If a number contains one or both of the factors 3 and 5, the gcd of this number and 15 will be 3, 5 or 15, so in every case the gcd will be bigger than one. For other numbers, which have no common factor with 15, the gcd is 1. This is a nice trick to test both conditions in one step. But be careful, it won't work for arbitrary numbers: if we had 4 and 9, the test gcd x 36 > 1 wouldn't work, as gcd 6 36 == 6, but neither mod 6 4 == 0 nor mod 6 9 == 0.
The third version is quite funny. cycle repeats a list over and over. cycle [0,0,1] codes the "divisibility pattern" for 3, and cycle [0,0,0,0,1] does the same for 5. Then we "or" both lists together using zipWith, which gives us [0,0,1,0,1,1,0,0,1,1,0,1...]. Now we use zipWith again to multiply this with the actual numbers, resulting in [0,0,3,0,5,6,0,0,9,10,0,12...]. Then we just add it up.
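For readers more at home in Python, the three versions can be transcribed roughly as follows. This is only a sketch; the names s, v1, v2 and v3 are illustrative and do not come from the original:

```python
from itertools import cycle
from math import gcd

def s(n):
    """Sum of all multiples of n below 1000."""
    return sum(range(0, 1000, n))

# Version 1: inclusion-exclusion, multiples of 15 are counted twice.
v1 = s(3) + s(5) - s(15)

# Version 2: a number shares a factor with 15 iff gcd(15, x) > 1.
v2 = sum(x for x in range(1000) if gcd(15, x) > 1)

# Version 3: "or" the two cyclic divisibility patterns, then multiply
# each flag with its number. zip stops when range(1, 1000) is exhausted.
v3 = sum(n * (a | b)
         for n, a, b in zip(range(1, 1000), cycle([0, 0, 1]),
                            cycle([0, 0, 0, 0, 1])))

print(v1, v2, v3)  # 233168 233168 233168

# The gcd trick fails for non-coprime divisors such as 4 and 9:
# 6 passes the gcd test against 36 but is divisible by neither.
assert gcd(6, 36) > 1 and 6 % 4 != 0 and 6 % 9 != 0
```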
Knowing different ways to do the same thing might be wasteful in other languages, but in Haskell it is essential. You need to spot patterns, pick up tricks and idioms, and play around a lot in order to gain the mental flexibility to use this language effectively. Challenges like the Project Euler problems are a good opportunity to do so.
In Ruby, does reject or select combined with each result in multiple iterations?
You can use Enumerator::Lazy to process it in one iteration:
arr.lazy.reject { |e| e == :c }.each { |e| handle(e) }
This will also change the order of invocation. The first element passes through both blocks, then the second element, and so on:
arr.lazy.reject { |e|
  puts "filtering #{e}"; e == :c
}.each { |e|
  puts "handling #{e}"
}
Output:
filtering a
handling a
filtering b
handling b
filtering c # <- c doesn't make it to the 2nd block
filtering d
handling d
The non-lazy approach passes all elements to the first block and the results to the second block:
arr.reject { |e|
  puts "filtering #{e}"; e == :c
}.each { |e|
  puts "handling #{e}"
}
Output:
filtering a
filtering b
filtering c
filtering d
handling a
handling b
handling d
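The same contrast exists in Python between a generator expression (lazy) and a list comprehension (eager). A minimal sketch, assuming arr holds the four elements a to d as in the output above; the helpers keep and handle and the log list are made up for illustration and record the call order instead of printing it:

```python
arr = ["a", "b", "c", "d"]
log = []

def keep(e):
    log.append(f"filtering {e}")
    return e != "c"  # reject "c", like the Ruby block

def handle(e):
    log.append(f"handling {e}")

# Lazy: the generator yields one surviving element at a time,
# so filtering and handling are interleaved.
for e in (x for x in arr if keep(x)):
    handle(e)
lazy_log = list(log)

log.clear()

# Eager: the list is built completely before the loop starts.
for e in [x for x in arr if keep(x)]:
    handle(e)
eager_log = list(log)

print(lazy_log)
print(eager_log)
```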