What is the equivalent to R's match() for python Pandas/numpy?
Edit:
If the urls in each right dataframe are unique, you can turn the right dataframe into a Series of class values indexed by url, then get the class of every url in left by indexing into it.
import numpy as np
import pandas as pd
left = pd.DataFrame({'url': ['foo.com', 'bar.com', 'foo.com', 'tmp', 'foo.com'], 'action': [0, 1, 0, 2, 4]})
left["klass"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com', 'tmp'], 'klass': [10, 20]})
right2 = pd.DataFrame({'url': ['bar.com'], 'klass': [30]})
# reindex (rather than label indexing) gives NaN for urls missing from a right dataframe
left["klass"] = left.klass.combine_first(right1.set_index('url').klass.reindex(left.url).reset_index(drop=True))
left["klass"] = left.klass.combine_first(right2.set_index('url').klass.reindex(left.url).reset_index(drop=True))
print(left)
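If the goal is only to look up klass by url, a shorter route may be Series.map. The sketch below assumes the urls are unique across all right dataframes taken together (not just within each one); the name lookup is introduced here for illustration only.

```python
import pandas as pd

left = pd.DataFrame({'url': ['foo.com', 'bar.com', 'foo.com', 'tmp', 'foo.com'],
                     'action': [0, 1, 0, 2, 4]})
right1 = pd.DataFrame({'url': ['foo.com', 'tmp'], 'klass': [10, 20]})
right2 = pd.DataFrame({'url': ['bar.com'], 'klass': [30]})

# Build a single url -> klass lookup Series (assumes no url appears twice).
lookup = pd.concat([right1, right2]).set_index('url')['klass']

# map() yields NaN for any url that is missing from the lookup.
left['klass'] = left['url'].map(lookup)
print(left)
```

If a url could appear in more than one right dataframe, you would have to decide which value wins (for example with drop_duplicates) before building the lookup.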
Is this what you want?
import numpy as np
import pandas as pd
left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'], 'action': [0, 1, 0]})
left["class"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com'], 'class': [0]})
right2 = pd.DataFrame({'url': ['bar.com'], 'class': [ 1]})
pd.merge(left.drop("class", axis=1), pd.concat([right1, right2]), on="url")
output:
   action      url  class
0       0  foo.com      0
1       1  foo.com      0
2       0  bar.com      1
If the class column in left is not all NaN, you can combine_first it with the result.
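A minimal sketch of that last step, assuming left already holds one known class value (the 5.0 below is made up for illustration) and that urls are unique across the right dataframes, so a left merge keeps one row per row of left:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'],
                     'action': [0, 1, 0],
                     'class': [np.nan, 5.0, np.nan]})  # 5.0 is a hypothetical existing value
right1 = pd.DataFrame({'url': ['foo.com'], 'class': [0]})
right2 = pd.DataFrame({'url': ['bar.com'], 'class': [1]})

# how="left" keeps every row of left, in order, even when a url has no match.
merged = pd.merge(left.drop("class", axis=1), pd.concat([right1, right2]),
                  on="url", how="left")

# Keep the values already in left and fill only the gaps from the merge result.
left["class"] = left["class"].combine_first(merged["class"])
print(left)
```

combine_first aligns on the index, so this relies on left having a default 0..n-1 index, which is what the merge result carries.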
Python equivalence of R's match() for indexing
You can first use drop_duplicates and then boolean indexing with isin, or merge.
Python counts from 0, so add 1 to get the same output as R.
import pandas as pd
A = pd.DataFrame({'c':['a','b']})
B = pd.DataFrame({'c':['c','c','b','b','c','b','a','a']})
B = B.drop_duplicates('c')
print (B)
c
0 c
2 b
6 a
print (B[B.c.isin(A.c)])
c
2 b
6 a
print (B[B.c.isin(A.c)].index)
Int64Index([2, 6], dtype='int64')
print (pd.merge(B.reset_index(), A))
index c
0 2 b
1 6 a
print (pd.merge(B.reset_index(), A)['index'])
0 2
1 6
Name: index, dtype: int64
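Note that both approaches above return the positions in the order they occur in B, whereas R's match(A$c, B$c) returns them in the order of A. A rough sketch of a helper that keeps the order of A could look like this; r_match is a hypothetical name, not a pandas function:

```python
import pandas as pd

def r_match(x, table):
    """Hypothetical helper: 0-based position of the first match of each
    element of x in table, NaN where there is no match."""
    table = pd.Series(list(table))
    first = table.drop_duplicates()   # keeps first occurrences; index = original position
    lookup = pd.Series(first.index, index=first.values)
    return pd.Series(list(x)).map(lookup)

A = pd.DataFrame({'c': ['a', 'b']})
B = pd.DataFrame({'c': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

pos = r_match(A.c, B.c)
print(pos.tolist())        # 0-based positions, in the order of A
print((pos + 1).tolist())  # add 1 to compare with R's 1-based result
```

Unmatched values come back as NaN, which plays the role of R's NA here.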
simpler python equivalent of R-style grep, including multiple things to match
Perhaps you're looking for the re module?
import re
pattern = re.compile("oa|sch")
[i for i in range(len(df.columns)) if pattern.search(df.columns[i])]
# [1, 2, 3, 4]
Maybe not the nicest compared to R's vectorization, but the list comprehension should be fine.
And if you wanted to concatenate strings together, you could do something like
"|".join(("oa", "sch"))
# 'oa|sch'
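If df is a pandas DataFrame, something closer to R's vectorized grep may be possible through the .str accessor on the column index. This is only a sketch; the column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column names, chosen only to illustrate the pattern.
df = pd.DataFrame(columns=['id', 'boat', 'school', 'name'])

pattern = "|".join(("oa", "sch"))        # 'oa|sch'
mask = df.columns.str.contains(pattern)  # regex search on every column name
positions = np.flatnonzero(mask)         # 0-based positions of the matching columns

print(positions.tolist())
print(df.columns[mask].tolist())
```

The boolean mask can also be used directly, e.g. df.loc[:, mask], if you want the matching columns rather than their positions.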
Python equivalent of (matrix)*(vector) in R
I am the OP.
I was looking for a quick and easy solution, but I guess there is no straightforward functionality in Python that allows us to do this. So, I had to make a function that multiplies a matrix with a vector in the same manner that R does:
import numpy as np

def R_product(X, c):
    """
    Computes the regular R product
    (not same as the matrix product) between
    a 2D Numpy Array X, and a numpy vector c.
    Args:
        X: 2D Numpy Array
        c: A Numpy vector
    Returns: the output of X*c in R.
    (This is different than X %*% c in R)
    """
    X_nrow = X.shape[0]
    X_ncol = X.shape[1]
    X_dummy = np.zeros(shape=((X_nrow * X_ncol), 1))
    nrow = X_dummy.shape[0]
    nc = nrow // len(c)
    Y = np.zeros(shape=(nrow, 1))
    # Flatten X column by column (R stores matrices in column-major order)
    for j in range(X_ncol):
        for u in range(X_nrow):
            X_element = X[u, j]
            if u == X_nrow - 1:
                idx = X_nrow * (j+1) - 1
            else:
                idx = X_nrow * j + (u+1) - 1
            X_dummy[idx, 0] = X_element
    # Multiply by c, recycling c over the flattened matrix
    for i in range(nc):
        for j in range(len(c)):
            Y[(i*len(c)+j):(i*len(c)+j+1), :] = (X_dummy[(i*len(c)+j):(i*len(c)+j+1), :]) * c[j]
    # Handle the leftover elements when len(c) does not divide the matrix size
    for z in range(nrow - nc*len(c)):
        Y[(nc*len(c)+z):(nc*len(c)+z+1), :] = (X_dummy[(nc*len(c)+z):(nc*len(c)+z+1), :]) * c[z]
    return Y.reshape(X_ncol, X_nrow).transpose()  # the answer I am looking for
Should work.
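For comparison, the same recycling behaviour can probably be sketched more compactly with NumPy's column-major ravel and np.resize, which repeats c cyclically to the required length. The name r_style_product is hypothetical, and this is an alternative sketch rather than the function above.

```python
import numpy as np

def r_style_product(X, c):
    """Hypothetical compact version: element-wise X*c as R computes it,
    recycling c down the columns of X."""
    X = np.asarray(X)
    c = np.asarray(c)
    flat = X.ravel(order="F")            # column-major, the order R uses
    recycled = np.resize(c, flat.size)   # repeat c cyclically to the size of X
    return (flat * recycled).reshape(X.shape, order="F")

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])
c = np.array([10, 100])
print(r_style_product(X, c))
```

When len(c) equals the number of rows of X, this reduces to ordinary broadcasting, X * c[:, None].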
Difference of cov and cor between R and Python
This is because numpy treats each row as a variable by default, while R treats each column as a variable. Either comment out X = np.transpose(X) # byrow=FALSE, or use np.cov(X, rowvar=False).
np.cov(X, rowvar=False)
array([[ 1.75 , -1.75 , -1.5 ],
[-1.75 , 2.33333333, 3.66666667],
[-1.5 , 3.66666667, 9.33333333]])
The difference is explained in the respective documentation (emphasis mine):
Python:
help(np.cov)
rowvar : bool, optional
If rowvar is True (default), then each row represents a
variable, with observations in the columns. Otherwise, the relationship
is transposed: each column represents a variable, while the rows
contain observations.
R:
?cov
var, cov and cor compute the variance of x and the covariance or
correlation of x and y if these are vectors. If x and y are matrices
then the covariances (or correlations) between the columns of x and
the columns of y are computed.
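A small sketch of the two conventions side by side. The matrix below is made-up data (4 observations of 3 variables), not the X from the question; np.corrcoef accepts the same rowvar argument as np.cov.

```python
import numpy as np

# Hypothetical data: 4 observations (rows) of 3 variables (columns),
# laid out the way R's cov() and cor() expect.
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.0, 3.0],
              [3.0, 5.0, 2.0],
              [4.0, 3.0, 7.0]])

cov_cols = np.cov(X, rowvar=False)       # columns as variables, like R's cov(X)
cor_cols = np.corrcoef(X, rowvar=False)  # columns as variables, like R's cor(X)

# With the default rowvar=True, NumPy would instead treat the 4 rows as variables.
cov_rows = np.cov(X)

print(cov_cols.shape, cov_rows.shape)
print(np.allclose(cov_cols, np.cov(X.T)))  # transposing is equivalent to rowvar=False
```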