Ruby Equivalent of Numpy

Numpy equivalents of Ruby array functions

Initially this looked like a bincount or histogram, but the output is the bins where each value fits, not the number of items per bin:

In [3]: eq_width_bin(data,3)                                                    
Out[3]: [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]

Your bins:

In [10]: np.linspace(np.min(data),np.max(data),4)                               
Out[10]: array([ 10.,  50.,  90., 130.])

Since the bins start at 10 and are 40 wide, we can identify the bin for each value with a simple integer division:

In [12]: (data-10)//40                                                          
Out[12]: array([1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 3, 1])

The maximum value lands exactly on the last edge and gets index 3; clamp it back to 2 with:

In [16]: np.minimum((data-10)//40,2)                                            
Out[16]: array([1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1])

But that doesn't answer your question about .select, .collect, .inject and .sort_by. Offhand I'm not familiar with those (though I was a fan of Squeak years ago, and dabbled in Ruby a bit). They sound more like iterators, such as those collected in itertools.

With numpy we don't usually take an iterative approach. Rather we try to look at the arrays as a whole, doing things like division and min/max for the whole thing.
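To make the whole-array idea concrete, here is a minimal sketch of the integer-division binning wrapped in a function. The name `equal_width_bins` and the sample values are made up for illustration (the original `data` is not shown here); the values were only chosen so that the minimum is 10 and the maximum is 130, as in the session above. It also assumes the data is not constant, since the bin width would then be zero.

```python
import numpy as np

def equal_width_bins(data, n_bins):
    """Return the 0-based equal-width bin index of every value in data."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    width = (hi - lo) / n_bins            # 40.0 for the sample below
    idx = ((data - lo) // width).astype(int)
    # The maximum sits on the last edge and would get index n_bins;
    # fold it back into the last bin.
    return np.minimum(idx, n_bins - 1)

# Hypothetical sample data, not the data from the question.
sample = [60, 100, 70, 20, 30, 10, 80, 55, 95, 110, 130, 75]
bins = equal_width_bins(sample, 3)
print(bins.tolist())   # [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]
```

No Python-level loop is involved: the subtraction, division and clamp each act on the whole array at once.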

===

searchsorted also works for this problem:

In [19]: np.searchsorted(Out[10],data)                                              
Out[19]: array([2, 3, 2, 1, 1, 0, 2, 2, 3, 3, 3, 2])

In [21]: np.maximum(0,np.searchsorted(Out[10],data)-1)                              
Out[21]: array([1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1])

A (possibly) cleaner expression of your Python loop:

def foo(i, edges):
    for j,n in enumerate(edges):
        if i<n:
            return j-1
    return j-1

In [34]: edges = np.linspace(np.min(data),np.max(data),4).tolist()
In [35]: [foo(i,edges) for i in data]                                           
Out[35]: [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]

I converted edges to a list, because it's faster to iterate that way.

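With itertools (here the minimum value comes out as -1, because no edge is smaller than it, so it would still need clamping to 0):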

In [55]: [len(list(itertools.takewhile(lambda x: x<i,edges)))-1 for i in data]  
Out[55]: [1, 2, 1, 0, 0, -1, 1, 1, 2, 2, 2, 1]

===

Another approach

In [45]: edges = np.linspace(np.min(data),np.max(data),4)                       
In [46]: data[:,None]<edges
Out[46]: 
array([[False, False,  True,  True],
       [False, False, False,  True],
       [False, False,  True,  True],
       [False,  True,  True,  True],
       [False,  True,  True,  True],
       [False,  True,  True,  True],
       [False, False,  True,  True],
       [False, False,  True,  True],
       [False, False, False,  True],
       [False, False, False,  True],
       [False, False, False, False],
       [False, False,  True,  True]])
In [47]: np.argmax(data[:,None]<edges, axis=1)-1                                
Out[47]: array([ 1,  2,  1,  0,  0,  0,  1,  1,  2,  2, -1,  1])

That -1 needs cleaning (the row where there's no True).
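One possible cleanup, sketched under the assumption that a row without any True belongs to the maximum value and should go into the last bin. The sample values are made up (the original `data` is not shown here) and only share the minimum of 10 and maximum of 130.

```python
import numpy as np

# Hypothetical sample data with minimum 10 and maximum 130.
data = np.array([60, 100, 70, 20, 30, 10, 80, 55, 95, 110, 130, 75])
edges = np.linspace(data.min(), data.max(), 4)   # [10., 50., 90., 130.]

mask = data[:, None] < edges          # shape (12, 4): one row per value
first_true = np.argmax(mask, axis=1)  # 0 when a row has no True at all
last_bin = len(edges) - 2             # index of the last bin (2 here)

# Rows without any True belong to the maximum value: use the last bin.
bins = np.where(mask.any(axis=1), first_true - 1, last_bin)
print(bins.tolist())   # [1, 2, 1, 0, 0, 0, 1, 1, 2, 2, 2, 1]
```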

edit

Lists have an index method; with that we can get an expression that's a lot like your last Ruby line. It looks like a list comprehension is a lot like Ruby's collect (the minimum value again comes out as -1):

In [88]: [[i<j for i in edges].index(False)-1 for j in data]                    
Out[88]: [1, 2, 1, 0, 0, -1, 1, 1, 2, 2, 2, 1]
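For reference, here is a rough sketch of how the four Ruby methods from the question are commonly spelled in plain Python. The mapping is approximate, and the sample list is made up for illustration.

```python
from functools import reduce

values = [200, 100, 150, 500, 25]   # hypothetical sample data

# Ruby: values.select { |x| x < 200 }
selected = [x for x in values if x < 200]

# Ruby: values.collect { |x| x * 2 }   (collect is an alias of map)
collected = [x * 2 for x in values]

# Ruby: values.inject(0) { |acc, x| acc + x }
injected = reduce(lambda acc, x: acc + x, values, 0)

# Ruby: values.sort_by { |x| -x }
sorted_by = sorted(values, key=lambda x: -x)

print(selected)    # [100, 150, 25]
print(collected)   # [400, 200, 300, 1000, 50]
print(injected)    # 975
print(sorted_by)   # [500, 200, 150, 100, 25]
```

These all work on plain lists; on numpy arrays the whole-array operations shown earlier are usually the better fit.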

Anything like SciPy in Ruby?

There's nothing quite as mature or well done as SciPy, but check out SciRuby and Numerical Ruby.

What is the equivalent of numpy.random.choice([0,1], p=[0.2, 0.8]) in ruby?

You could use https://github.com/fl00r/pickup

A simple example is:

require 'pickup'
headings = {
  A: 40,
  B: 20,
  C: 40,
}
pickup = Pickup.new(headings)
pickup.pick
#=> A
pickup.pick
#=> B
pickup.pick
#=> A
pickup.pick
#=> C
pickup.pick
#=> C

In your case you have two options, 0 and 1, with weights 20 and 80. The same approach also works when there are more than two options.
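As a language-neutral illustration of what a weighted pick does, here is a minimal Python sketch of the cumulative-weight technique. It is not taken from the pickup gem, whose internals are not verified here; `weighted_pick` is a hypothetical helper name.

```python
import bisect
import itertools
import random

def weighted_pick(weights, rng=random):
    """Pick one key from a {key: weight} dict, proportionally to its weight."""
    keys = list(weights)
    cumulative = list(itertools.accumulate(weights[k] for k in keys))
    # Draw a point in [0, total) and find the first cumulative sum above it.
    point = rng.random() * cumulative[-1]
    idx = bisect.bisect_right(cumulative, point)
    return keys[min(idx, len(keys) - 1)]   # guard against float rounding

rng = random.Random(0)   # seeded so repeated runs behave the same
draws = [weighted_pick({0: 20, 1: 80}, rng) for _ in range(10_000)]
share_of_ones = draws.count(1) / len(draws)
print(share_of_ones)     # close to 0.8
```

The weights do not have to sum to 100; only their ratios matter.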

Scientific Programming with Ruby

Linear algebra is at the heart of most large-scale scientific computing. LAPACK is the gold standard for linear algebra libraries, first written in FORTRAN.

There's a port to Ruby here. Once you have that, the rest is incidental, but there are also plotting routines in Ruby.

Python Equivalent to Ruby's chunk_while?

Python doesn't have an equivalent function. You'll have to write your own.

Here's my implementation, using an iterator and the yield statement:

def chunk_while(iterable, predicate):
    itr = iter(iterable)

    try:
        prev_value = next(itr)
    except StopIteration:
        # if the iterable is empty, yield nothing
        return

    chunk = [prev_value]
    for value in itr:
        # if the predicate returns False, start a new chunk
        if not predicate(prev_value, value):
            yield chunk
            chunk = []

        chunk.append(value)
        prev_value = value

    # don't forget to yield the final chunk
    if chunk:
        yield chunk

Which can be used like so:

>>> list(chunk_while([1, 3, 2, 5, 5], lambda prev, next_: next_ <= prev + 2))
[[1, 3, 2], [5, 5]]

Ruby NArray.to_na() and Python numpy.array()

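You can do this with numpy's frombuffer function, which reinterprets the raw bytes of the string as an array. (Older code used np.fromstring for this, but its binary mode is deprecated.)

import numpy as np

line = "#!/usr/bin/ruby\n#\n#  Gen"
# Encode to bytes first; the 24 bytes become three 8-byte floats.
array = np.frombuffer(line.encode(), dtype=float)
print(array)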

Executing the above code results in

[  9.05457127e+164   3.30197767e-258   6.15310337e+223]
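The odd-looking numbers are simply the 24 bytes of the string reinterpreted as three 8-byte floats. A minimal sketch that makes this visible, using `np.frombuffer` on an encoded copy of the string (the assumption here is that the text is plain ASCII, so one character is one byte):

```python
import numpy as np

line = "#!/usr/bin/ruby\n#\n#  Gen"
raw = line.encode("ascii")                        # 24 bytes

as_bytes = np.frombuffer(raw, dtype=np.uint8)     # one element per byte
as_floats = np.frombuffer(raw, dtype=np.float64)  # 24 / 8 = 3 elements

print(as_bytes[:4].tolist())   # [35, 33, 47, 117] -> '#', '!', '/', 'u'
print(as_floats.shape)         # (3,)

# Nothing is lost: the float view converts back to the original text.
round_trip = as_floats.tobytes().decode("ascii")
print(round_trip == line)      # True
```

The string length has to be a multiple of the element size, otherwise `np.frombuffer` raises a ValueError.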

Python equivalent of ruby's `map.with_index`?

Use enumerate, which yields tuples pairing each index with its original value:

>>> array = [200,100,150,500,25,650,175]
>>> enumerate(array)
<enumerate object at 0x7f4ad46d0190>
>>> list(enumerate(array))
[(0, 200), (1, 100), (2, 150), (3, 500), (4, 25), (5, 650), (6, 175)]

Combining it with a list comprehension:

>>> [i for i, x in enumerate(array) if x < 200]
[1, 2, 4, 6]
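Ruby's with_index also accepts an offset, as in `map.with_index(1)`; the closest Python spelling is the `start` argument of `enumerate`. A small sketch using the same list:

```python
array = [200, 100, 150, 500, 25, 650, 175]

# Ruby: array.map.with_index(1) { |x, i| [i, x] }
pairs = [(i, x) for i, x in enumerate(array, start=1)]
print(pairs[:3])    # [(1, 200), (2, 100), (3, 150)]

# Ruby: array.map.with_index { |x, i| x * i }
scaled = [x * i for i, x in enumerate(array)]
print(scaled[:3])   # [0, 100, 300]
```

Note the argument order: Ruby's block receives the value first and the index second, while `enumerate` yields the index first.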