Python (or NumPy) Equivalent of R's match()

What is the equivalent of R's match() in Python pandas/NumPy?

Edit:

If the url values in every right DataFrame are unique, you can turn each right DataFrame into a Series of klass indexed by url, and then get the class of every url in left by indexing that Series with left.url.

import numpy as np
import pandas as pd

left = pd.DataFrame({'url': ['foo.com', 'bar.com', 'foo.com', 'tmp', 'foo.com'], 'action': [0, 1, 0, 2, 4]})
left["klass"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com', 'tmp'], 'klass': [10, 20]})
right2 = pd.DataFrame({'url': ['bar.com'], 'klass': [30]})

# Look up every url of left in each url-indexed Series; urls missing from that table give NaN.
# reindex() is used instead of [] because current pandas raises KeyError for missing labels.
left["klass"] = left.klass.combine_first(right1.set_index('url').klass.reindex(left.url).reset_index(drop=True))
left["klass"] = left.klass.combine_first(right2.set_index('url').klass.reindex(left.url).reset_index(drop=True))

print(left)
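If every url appears in only one right table, the two combine_first passes can probably be collapsed into a single lookup: stack the right tables into one url-indexed Series and pass it to Series.map. This is a sketch using Series.map, a swapped-in technique rather than the answer's own; it assumes the stacked urls are unique (map raises otherwise), and urls absent from every right table come back as NaN.

```python
import pandas as pd

left = pd.DataFrame({'url': ['foo.com', 'bar.com', 'foo.com', 'tmp', 'foo.com'],
                     'action': [0, 1, 0, 2, 4]})
right1 = pd.DataFrame({'url': ['foo.com', 'tmp'], 'klass': [10, 20]})
right2 = pd.DataFrame({'url': ['bar.com'], 'klass': [30]})

# One url -> klass lookup Series; assumes each url occurs in only one right table
lookup = pd.concat([right1, right2]).set_index('url')['klass']

# map() looks up each url of left in the lookup Series (NaN where a url is missing)
left['klass'] = left['url'].map(lookup)
print(left)
```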

Is this what you want?

import numpy as np
import pandas as pd

left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'], 'action': [0, 1, 0]})
left["class"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com'], 'class': [0]})
right2 = pd.DataFrame({'url': ['bar.com'], 'class': [1]})

# Stack the lookup tables, then join them to left on url (dropping the placeholder column first)
pd.merge(left.drop("class", axis=1), pd.concat([right1, right2]), on="url")

Output:

   action      url  class
0       0  foo.com      0
1       1  foo.com      0
2       0  bar.com      1

If the class column in left is not entirely NaN, you can combine_first it with the result.
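A sketch of that combine_first step, assuming left already holds some class values that should take priority over the lookup (the partial values and the baz.com row below are hypothetical). It also uses how='left' in the merge, which is an assumption not in the answer: a left join keeps every row of left in its original order, so the merged result lines up with left's default index and combine_first only fills the NaN gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical left table where one class value is already known
left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com', 'baz.com'],
                     'action': [0, 1, 0, 3],
                     'class': [np.nan, 5, np.nan, np.nan]})
lookup = pd.concat([pd.DataFrame({'url': ['foo.com'], 'class': [0]}),
                    pd.DataFrame({'url': ['bar.com'], 'class': [1]})])

# Left join: all rows of left, same order; urls without a match get NaN
merged = pd.merge(left.drop('class', axis=1), lookup, on='url', how='left')

# Existing values in left win; the merged lookup only fills the NaN gaps
left['class'] = left['class'].combine_first(merged['class'])
print(left)
```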

Python equivalent of R's match() for indexing

First use drop_duplicates, then use either boolean indexing with isin or merge.

Python counts from 0, so add 1 to get the same output as R.

import pandas as pd

A = pd.DataFrame({'c':['a','b']})
B = pd.DataFrame({'c':['c','c','b','b','c','b','a','a']})

# Keep only the first occurrence of each value (the original row labels survive)
B = B.drop_duplicates('c')
print(B)
   c
0  c
2  b
6  a

print(B[B.c.isin(A.c)])
   c
2  b
6  a

print(B[B.c.isin(A.c)].index)
Int64Index([2, 6], dtype='int64')

print(pd.merge(B.reset_index(), A))
   index  c
0      2  b
1      6  a

print(pd.merge(B.reset_index(), A)['index'])
0    2
1    6
Name: index, dtype: int64
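The results above come out in B's order. If you need R's exact match(x, table) semantics (one result per element of x, 1-based position of its first occurrence in table, NA when absent), a sketch along the same lines uses pandas Index.get_indexer on the de-duplicated table. The helper name r_match is hypothetical, and NaN stands in for R's NA.

```python
import numpy as np
import pandas as pd

def r_match(x, table):
    """Hypothetical helper mimicking R's match(x, table)."""
    table = pd.Series(table).reset_index(drop=True)  # labels = 0-based positions
    first = table.drop_duplicates()                   # first occurrence of each value
    pos = pd.Index(first.to_numpy()).get_indexer(x)   # -1 where an element of x is absent
    out = np.full(len(pos), np.nan)                   # NaN plays the role of R's NA
    hit = pos >= 0
    out[hit] = first.index.to_numpy()[pos[hit]] + 1   # convert to 1-based, like R
    return out

print(r_match(['a', 'b'], ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']))
```

For the data in this question, R's match(c('a', 'b'), c('c','c','b','b','c','b','a','a')) gives 7 3, and the sketch should return the same positions as floats.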

Simpler Python equivalent of R-style grep, including multiple things to match

Perhaps you're looking for the re module?

import re
pattern = re.compile("oa|sch")
# df is the asker's DataFrame; collect the positions of the matching column names
[i for i, col in enumerate(df.columns) if pattern.search(col)]
# [1, 2, 3, 4]

It is not as neat as R's vectorized grep, but the list comprehension works fine.

And if you want to build the pattern by joining several strings, you could do something like

"|".join(("oa", "sch"))
# 'oa|sch'
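For something closer to R's vectorized grep, a sketch using pandas' string accessor on the column Index is below. The DataFrame and its column names are hypothetical. re.escape is an addition: it keeps any regex metacharacters in the search terms from being interpreted as syntax. As with the list comprehension, the positions are 0-based, so add 1 for R-style output.

```python
import re
import numpy as np
import pandas as pd

# Hypothetical DataFrame; only the column names matter here
df = pd.DataFrame(columns=['id', 'boat', 'school', 'road', 'schema'])

terms = ['oa', 'sch']
pattern = '|'.join(map(re.escape, terms))   # 'oa|sch', with metacharacters escaped

# Test all column names at once (str.contains treats the pattern as a regex by default)
mask = np.asarray(df.columns.str.contains(pattern), dtype=bool)
positions = np.flatnonzero(mask)            # 0-based positions, like grep(pattern, x) - 1
names = list(df.columns[mask])              # matching names, like grep(..., value = TRUE)
print(positions, names)
```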

Python equivalent of (matrix)*(vector) in R

I am the OP.
I was looking for a quick and easy solution, but there seems to be no straightforward built-in function in Python that does this, so I wrote a function that multiplies a matrix by a vector the same way R does:

import numpy as np

def R_product(X, c):
    """
    Computes the regular R product
    (not the same as the matrix product) between
    a 2D NumPy array X and a NumPy vector c.

    Args:
        X: 2D NumPy array
        c: A NumPy vector

    Returns: the output of X*c in R.
    (This is different from X %*% c in R.)
    """
    X_nrow = X.shape[0]
    X_ncol = X.shape[1]
    # Column vector holding X flattened in column-major (R) order
    X_dummy = np.zeros(shape=((X_nrow * X_ncol), 1))
    nrow = X_dummy.shape[0]
    nc = nrow // len(c)  # number of complete repetitions of c
    Y = np.zeros(shape=(nrow, 1))

    for j in range(X_ncol):
        for u in range(X_nrow):
            X_element = X[u, j]

            if u == X_nrow - 1:
                idx = X_nrow * (j+1) - 1
            else:
                idx = X_nrow * j + (u+1) - 1

            X_dummy[idx, 0] = X_element

    # Multiply element-wise, recycling c over the complete repetitions
    for i in range(nc):
        for j in range(len(c)):
            Y[(i*len(c)+j):(i*len(c)+j+1), :] = (X_dummy[(i*len(c)+j):(i*len(c)+j+1), :]) * c[j]

    # Leftover elements when len(c) does not divide the number of elements of X
    for z in range(nrow - nc*len(c)):
        Y[(nc*len(c)+z):(nc*len(c)+z+1), :] = (X_dummy[(nc*len(c)+z):(nc*len(c)+z+1), :]) * c[z]

    return Y.reshape(X_ncol, X_nrow).transpose()  # the answer I am looking for

Should work.
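A shorter sketch, using np.resize as a swapped-in technique rather than the loops above: R stores a matrix column by column and recycles c along that order, so flattening X in Fortran (column-major) order, repeating c to the same length, multiplying, and reshaping back in Fortran order should reproduce R's X * c. The function name r_elementwise_product is hypothetical, and unlike R it does not warn when the number of elements of X is not a multiple of len(c).

```python
import numpy as np

def r_elementwise_product(X, c):
    """Hypothetical helper mimicking R's X * c for a matrix X and a vector c."""
    X = np.asarray(X)
    flat = X.ravel(order="F")                          # column-major, like R's storage
    c_recycled = np.resize(np.asarray(c), flat.size)   # repeats c cyclically to full length
    return (flat * c_recycled).reshape(X.shape, order="F")

X = np.array([[1, 3, 5],
              [2, 4, 6]])   # the same matrix as matrix(1:6, nrow = 2) in R
result = r_elementwise_product(X, [10, 20])
print(result)
```

Here the first row is multiplied by 10 and the second by 20, as in R. When len(c) equals the number of rows, the recycling always lines up with the rows, so the result is the same as plain broadcasting: X * np.asarray(c)[:, None].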

Difference of cov and cor between R and Python

This is because NumPy's cov treats each row as a variable by default, while R's cov treats each column as a variable. Either comment out X = np.transpose(X) # byrow=FALSE, or use np.cov(X, rowvar=False).

np.cov(X, rowvar=False)
array([[ 1.75      , -1.75      , -1.5       ],
       [-1.75      ,  2.33333333,  3.66666667],
       [-1.5       ,  3.66666667,  9.33333333]])

The difference is explained in the respective documentation:

Python:

help(np.cov)

rowvar : bool, optional
If rowvar is True (default), then each row represents a
variable, with observations in the columns. Otherwise, the relationship
is transposed: each column represents a variable, while the rows
contain observations.

R:

?cov

var, cov and cor compute the variance of x and the covariance or
correlation of x and y if these are vectors. If x and y are matrices
then the covariances (or correlations) between the columns of x and
the columns of y are computed.
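The same orientation issue applies to correlation. A sketch with hypothetical data (four observations of three variables, laid out the way R would read them): passing rowvar=False to both np.cov and np.corrcoef treats columns as variables, which matches R's cov(X) and cor(X), and pandas' DataFrame.cov and DataFrame.corr already work column-wise.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows are observations, columns are variables (R's convention)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 1.0, 0.0],
              [2.0, 5.0, 7.0],
              [3.0, 3.0, 1.0]])

cov_r_style = np.cov(X, rowvar=False)       # like R's cov(X)
cor_r_style = np.corrcoef(X, rowvar=False)  # like R's cor(X)

# Equivalent: transpose so that each row becomes a variable (NumPy's default)
same_cov = np.allclose(cov_r_style, np.cov(X.T))

# pandas treats columns as variables out of the box
df = pd.DataFrame(X)
print(same_cov,
      np.allclose(cov_r_style, df.cov().to_numpy()),
      np.allclose(cor_r_style, df.corr().to_numpy()))
```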


