Minimal Example of Rpy2 Regression Using Pandas Data Frame

The R and Python versions are not strictly identical: you build a data frame in Python/rpy2, whereas in R you use vectors (without a data frame).

Otherwise, the conversion shipping with rpy2 appears to be working here:

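import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

# R packages providing as_symbol()/summary() and lm()
base = importr('base')
stats = importr('stats')

pandas2ri.activate()
robjects.globalenv['dataframe'] = dataframe
M = stats.lm('y~x', data=base.as_symbol('dataframe'))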

The result:

>>> print(base.summary(M).rx2('coefficients'))
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

pandas dataframe to rpy2 dataframe generates me unwanted data

Fixed: apparently there is a bug in rpy2 in the conversion of my data.frame.
pandas2ri.py2ri treated the column "fips" (which should have contained strings) as a list, and therefore created N*N extra values (where N is the number of records).
To solve this, I converted each column separately and checked whether the result was a string vector. If it was not, I converted the column to strings first.
Last but not least, I combined all the column vectors into a single data frame:

import rpy2.rinterface as rinterface
import rpy2.rlike.container as rlc
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri

sh = df.shape
rlc_ordDict = [0] * sh[1]

for index, column in enumerate(df):
    r_vector = pandas2ri.py2ri(df[column])

    # If the type of the column is a string
    if type(r_vector) == rinterface.StrSexpVector:
        rlc_ordDict[index] = (column, r_vector)

    # If the type of a column is not a string, make it a string
    else:
        r_vector = pandas2ri.py2ri(df[column].apply(str))
        rlc_ordDict[index] = (column, r_vector)

od = rlc.OrdDict(rlc_ordDict)
df = robjects.DataFrame(od)

How to convert a rpy2 matrix object into a Pandas data frame?

Maybe this should happen automatically during conversion, but in the meantime the row and column names can easily be obtained from the R object and added to the pandas DataFrame. For example, the column names of the R matrix are documented at: https://rpy2.github.io/doc/v2.9.x/html/vector.html#rpy2.robjects.vectors.Matrix.colnames

pandas and rpy2: Why does ezANOVA work via robjects.r but not robjects.packages.importr?

In the easy version you are passing symbol names as strings, which is not the same as passing symbols.

Check the use of as_symbol in "Minimal example of rpy2 regression using pandas data frame".

Regression by group and display output in python

Here is a mock-up you can build on. The idea is to write your own regression function and pass each group's data frame to it with apply.

Let me know what you think.

import pandas as pd
import statsmodels.api as sm

def GroupRegress(data, yvar, xvars):
    Y = data[yvar]
    # Copy so that adding the intercept column does not write to a slice of the original frame
    X = data[xvars].copy()
    X['intercept'] = 1.
    result = sm.OLS(Y, X).fit()
    return result.params

df = pd.DataFrame({'group': [1,1,1,2,2,2],
                   'Y': [9,5,3,1,2,3],
                   'X': [3,4,1,6,4,9]
                  })
df

df.groupby('group').apply(GroupRegress, 'Y', ['X'])

Result below:

              X  intercept
group
1      1.000000        3.0
2      0.236842        0.5
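
As a sanity check, the per-group coefficients can be reproduced by hand with the closed-form solution for a least-squares fit with one predictor (slope = Sxy / Sxx, intercept = mean(Y) - slope * mean(X)). The sketch below uses only the standard library and the same six rows as the example. The helper name simple_ols is made up for illustration and is not part of pandas or statsmodels.

```python
from collections import defaultdict
from statistics import mean

def simple_ols(xs, ys):
    """Closed-form least squares for y = slope * x + intercept (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Same rows as the DataFrame in the example
rows = {'group': [1, 1, 1, 2, 2, 2],
        'Y': [9, 5, 3, 1, 2, 3],
        'X': [3, 4, 1, 6, 4, 9]}

# Collect the X and Y values of each group
groups = defaultdict(lambda: ([], []))
for g, x, y in zip(rows['group'], rows['X'], rows['Y']):
    groups[g][0].append(x)
    groups[g][1].append(y)

# Map each group to its (slope, intercept) pair
coefficients = {g: simple_ols(xs, ys) for g, (xs, ys) in groups.items()}
for g, (slope, intercept) in sorted(coefficients.items()):
    print(g, round(slope, 6), round(intercept, 6))
```

Up to floating-point rounding, the slopes and intercepts agree with the X and intercept columns returned by the groupby/apply call.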

Running R's aov() mixed effects model from Python using rpy2

[Voting up just because you have a nice small and self-contained example.]

The R equivalent of what you are doing with rpy2 is the following (and it returns the same error):

> mixed <- aov("result ~ group*session + covar + Error(as.factor(subject)/session)",data=df)
Error: $ operator is invalid for atomic vectors

Formula objects are not the same as strings:

> class(y ~ x)
[1] "formula"
> class("y ~ x")
[1] "character"

rpy2 has a constructor to build R formulae from Python strings:

from rpy2.robjects import Formula
fml = Formula("y ~ x")

Pass this to aov() instead of the string.


