Python (or NumPy) Equivalent of match() in R

What is the equivalent to R's match() for python Pandas/numpy?

Edit:

If the urls in every right dataframe are unique, you can turn each right dataframe into a Series of klass values indexed by url, then get the klass of every url in left by indexing into that Series.

import numpy as np
import pandas as pd

left = pd.DataFrame({'url': ['foo.com', 'bar.com', 'foo.com', 'tmp', 'foo.com'], 'action': [0, 1, 0, 2, 4]})
left["klass"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com', 'tmp'], 'klass': [10, 20]})
right2 = pd.DataFrame({'url': ['bar.com'], 'klass': [30]})

# reindex looks up each left.url in the url-indexed Series; urls missing from it become NaN
left["klass"] = left.klass.combine_first(right1.set_index('url').klass.reindex(left.url).reset_index(drop=True))
left["klass"] = left.klass.combine_first(right2.set_index('url').klass.reindex(left.url).reset_index(drop=True))

print(left)
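The same lookup can probably be written more compactly with Series.map. This is only a sketch, assuming the urls are unique across all right dataframes taken together; the name lookup is made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'url': ['foo.com', 'bar.com', 'foo.com', 'tmp', 'foo.com'],
                     'action': [0, 1, 0, 2, 4]})
right1 = pd.DataFrame({'url': ['foo.com', 'tmp'], 'klass': [10, 20]})
right2 = pd.DataFrame({'url': ['bar.com'], 'klass': [30]})

# One url -> klass Series built from all right dataframes
# (hypothetical name; the urls must be unique for map to work).
lookup = pd.concat([right1, right2]).set_index('url')['klass']

# map looks up each url in the index of lookup; unmatched urls would become NaN.
left['klass'] = left['url'].map(lookup)
print(left)
```

With several right dataframes this avoids one combine_first call per dataframe, at the cost of requiring that no url appears in more than one of them.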

Is this what you want?

import numpy as np
import pandas as pd
left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'], 'action': [0, 1, 0]})
left["class"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com'], 'class': [0]})
right2 = pd.DataFrame({'url': ['bar.com'], 'class': [ 1]})

pd.merge(left.drop("class", axis=1), pd.concat([right1, right2]), on="url")

output:

   action      url  class
0       0  foo.com      0
1       1  foo.com      0
2       0  bar.com      1

If the class column in left is not all NaN, you can combine_first it with the result.
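A minimal sketch of that combine_first step, assuming left already holds some class values that should win over the looked-up ones. The value 5.0 is made up for illustration, and how="left" is my choice so that the merged rows stay aligned with left.

```python
import numpy as np
import pandas as pd

# left already has one known class (5.0 is an invented example value).
left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'],
                     'action': [0, 1, 0],
                     'class': [np.nan, 5.0, np.nan]})
right = pd.DataFrame({'url': ['foo.com', 'bar.com'], 'class': [0, 1]})

# A left merge keeps the row order of left, so with unique urls in right
# the merged index lines up with left's default index.
merged = left.drop(columns='class').merge(right, on='url', how='left')

# Keep the existing values and fill only the NaN holes from the merge.
left['class'] = left['class'].combine_first(merged['class'])
print(left)
```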

Python equivalence of R's match() for indexing

You can first use drop_duplicates and then either boolean indexing with isin or merge.

Python counts from 0, so add 1 to get the same output as R.

A = pd.DataFrame({'c':['a','b']})
B = pd.DataFrame({'c':['c','c','b','b','c','b','a','a']})

B = B.drop_duplicates('c')
print (B)
   c
0  c
2  b
6  a

print (B[B.c.isin(A.c)])
   c
2  b
6  a

print (B[B.c.isin(A.c)].index)
Int64Index([2, 6], dtype='int64')

print (pd.merge(B.reset_index(), A))
   index  c
0      2  b
1      6  a

print (pd.merge(B.reset_index(), A)['index'])
0    2
1    6
Name: index, dtype: int64
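Note that the steps above return the positions in the order of B, not in the order of A. If you need them in the order of A, the way match(A$c, B$c) returns them, something like the helper below might do. It is only a sketch: r_match and nomatch are names I made up, and -1 stands in for R's NA.

```python
import numpy as np

def r_match(x, table, nomatch=-1):
    """Hypothetical helper: 0-based position of the first occurrence
    of each element of x in table, or nomatch if it is absent."""
    first_pos = {}
    for pos, value in enumerate(table):
        # setdefault keeps the first position seen for each value
        first_pos.setdefault(value, pos)
    return np.array([first_pos.get(value, nomatch) for value in x])

A = ['a', 'b']
B = ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']

positions = r_match(A, B)
print(positions)      # 0-based positions
print(positions + 1)  # 1-based, as R would report them
```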

simpler python equivalent of R-style grep, including multiple things to match

Perhaps you're looking for the re module?

import re
pattern = re.compile("oa|sch")
[i for i in range(len(df.columns)) if pattern.search(df.columns[i])]
# [1, 2, 3, 4]

Maybe not the nicest compared to R's vectorization, but the list comprehension should be fine.

And if you wanted to concatenate strings together, you could do something like

"|".join(("oa", "sch"))
# 'oa|sch'
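If you would rather avoid the explicit loop, pandas' string methods on the column Index may get closer to R's vectorized grep. A rough sketch, assuming string column names; the column names here are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; only the column names matter here.
df = pd.DataFrame(columns=['id', 'oat', 'school', 'boat', 'x'])

pattern = "|".join(["oa", "sch"])

# Boolean mask, one entry per column name (regex search, like grep).
mask = df.columns.str.contains(pattern, regex=True)

# 0-based positions of the matching columns.
positions = np.flatnonzero(mask)
print(positions)
print(df.columns[mask].tolist())
```

If the things to match may contain regex metacharacters, pass each one through re.escape before joining.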

Python equivalent of (matrix)*(vector) in R

I am the OP.
I was looking for a quick and easy solution, but I guess there is no straightforward functionality in Python that allows us to do this. So, I had to make a function that multiplies a matrix with a vector in the same manner that R does:

import numpy as np

def R_product(X, c):
    """
    Computes the regular R product
    (not same as the matrix product) between
    a 2D Numpy Array X, and a numpy vector c.

    Args:
       X: 2D Numpy Array
       c: A Numpy vector

    Returns: the output of X*c in R.
             (This is different than X %*% c in R)
    """
    X_nrow = X.shape[0]
    X_ncol = X.shape[1]
    X_dummy = np.zeros(shape=((X_nrow * X_ncol),1))
    nrow = X_dummy.shape[0]
    nc = nrow // len(c)
    Y = np.zeros(shape=(nrow,1))

    for j in range(X_ncol):
        for u in range(X_nrow):
            X_element = X[u,j]
                
            if u == X_nrow - 1:
                idx = X_nrow * (j+1) - 1 
            else:
                idx = X_nrow * j + (u+1) - 1
                    
            X_dummy[idx,0] = X_element

    for i in range(nc):
        for j in range(len(c)):
            Y[(i*len(c)+j):(i*len(c)+j+1),:] = (X_dummy[(i*len(c)+j):(i*len(c)+j+1),:]) * c[j]
     
    for z in range(nrow-nc*len(c)):
        Y[(nc*len(c)+z):(nc*len(c)+z+1),:] = (X_dummy[(nc*len(c)+z):(nc*len(c)+z+1),:]) * c[z]

    return Y.reshape(X_ncol, X_nrow).transpose() # the answer I am looking for

Should work.
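For what it's worth, the loops can probably be replaced by a column-major flatten plus np.resize, which repeats c cyclically the way R recycles a vector. This is a sketch, not a drop-in replacement: r_product_vectorized is a made-up name, and it returns an array of X's shape.

```python
import numpy as np

def r_product_vectorized(X, c):
    """Hypothetical vectorized take on R's X * c (element-wise with recycling)."""
    X = np.asarray(X)
    c = np.asarray(c).ravel()
    # R stores matrices column by column, so flatten in Fortran order.
    flat = X.ravel(order="F")
    # np.resize repeats c cyclically (or truncates it) to the requested length.
    recycled = np.resize(c, flat.size)
    # Multiply element-wise and restore the original shape column by column.
    return (flat * recycled).reshape(X.shape, order="F")

X = np.array([[1, 2, 3],
              [4, 5, 6]])
result_len2 = r_product_vectorized(X, [10, 100])
result_len3 = r_product_vectorized(X, [1, 2, 3])
print(result_len2)
print(result_len3)
```

When len(c) equals the number of rows of X, this reduces to plain broadcasting, X * c[:, None].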

Difference of cov and cor between R and Python

This is because np.cov treats each row as a variable by default, whereas R's cov treats each column as a variable. Either comment out the line X = np.transpose(X) # byrow=FALSE, or use np.cov(X, rowvar=False).

np.cov(X, rowvar=False)
array([[ 1.75      , -1.75      , -1.5       ],
       [-1.75      ,  2.33333333,  3.66666667],
       [-1.5       ,  3.66666667,  9.33333333]])

The difference is explained in the respective documentation:

Python:

help(np.cov)

rowvar : bool, optional
If rowvar is True (default), then each row represents a
variable, with observations in the columns. Otherwise, the relationship
is transposed: each column represents a variable, while the rows
contain observations.

R:

?cov

var, cov and cor compute the variance of x and the covariance or
correlation of x and y if these are vectors. If x and y are matrices
then the covariances (or correlations) between the columns of x and
the columns of y are computed.
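To see the two conventions side by side, here is a small sketch on a made-up 3x2 matrix (not the data from the question): three observations in the rows, two variables in the columns.

```python
import numpy as np

# Invented data: 3 observations (rows) of 2 variables (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 9.0]])

# NumPy default: every row is a variable, so the result is 3x3.
by_row = np.cov(X)

# R-style: every column is a variable, so the result is 2x2.
by_col = np.cov(X, rowvar=False)

# Transposing the input is equivalent to rowvar=False.
by_col_transposed = np.cov(X.T)

print(by_row.shape)
print(by_col)
```

np.corrcoef takes the same rowvar argument, so the same fix applies when comparing against R's cor.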