Minimal Example of Rpy2 Regression Using Pandas Data Frame

Minimal example of rpy2 regression using pandas data frame

The R and Python versions are not strictly identical, because you build a data frame in Python/rpy2 whereas in R you use vectors (without a data frame).

Otherwise, the conversion shipping with rpy2 appears to be working here:

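import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr
base, stats = importr('base'), importr('stats')
pandas2ri.activate()
robjects.globalenv['dataframe'] = dataframe  # `dataframe` is the pandas data frame
M = stats.lm('y~x', data=base.as_symbol('dataframe'))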

The result:

>>> print(base.summary(M).rx2('coefficients'))
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

pandas dataframe to rpy2 dataframe generates me unwanted data

Fixed: apparently there is a bug in rpy2 during the conversion of my data.frame.
pandas2ri.py2ri thought that the column "fips" (which should have been strings) was a list, and therefore it created N*N extra values (where N is the number of records).
To solve this, I converted each column separately and checked whether the result was a string vector. If it wasn't, I cast the column to strings and converted it again.
Last but not least, I combined all the column vectors into a single R data frame:

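            import rpy2.rinterface as rinterface
            import rpy2.rlike.container as rlc
            import rpy2.robjects as robjects
            from rpy2.robjects import pandas2ri

            # One (column name, R vector) pair per column, in column order
            sh = df.shape
            rlc_ordDict = [0] * sh[1]

            for index, column in enumerate(df):
                r_vector = pandas2ri.py2ri(df[column])

                # If the type of the column is a string
                if type(r_vector) == rinterface.StrSexpVector:
                    rlc_ordDict[index] = (column, r_vector)

                # If the type of a column is not a string, make it a string
                else:
                    r_vector = pandas2ri.py2ri(df[column].apply(str))
                    rlc_ordDict[index] = (column, r_vector)

            # Build the R data frame from the ordered (name, vector) pairs
            od = rlc.OrdDict(rlc_ordDict)
            df = robjects.DataFrame(od)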
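
If every column is meant to end up as a string anyway, the cast could also be done on the pandas side before any conversion. The sketch below only illustrates that idea and is not the fix described above: the sample frame and its values are hypothetical, only the standard pandas call `DataFrame.astype` is used, and the rpy2 conversion itself is left out.

```python
import pandas as pd

# Hypothetical sample data; "fips" holds codes that must stay strings.
sample = pd.DataFrame({"fips": ["01001", "01003"], "count": [12, 7]})

# Cast every column to str in one step, so each column is uniformly
# made of strings before it is handed to a converter.
as_strings = sample.astype(str)

# Check that every cell is now a Python str.
all_str = all(isinstance(v, str) for col in as_strings for v in as_strings[col])
print(all_str)
print(as_strings["fips"].tolist())
```

Whether this avoids the conversion problem depends on the rpy2 version in use, so it is worth inspecting the resulting R data frame before relying on it.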

How to convert a rpy2 matrix object into a Pandas data frame?

Maybe this should happen automatically during conversion, but in the meantime the row and column names can easily be obtained from the R object and added to the pandas DataFrame. For example, the column names of the R matrix are documented at: https://rpy2.github.io/doc/v2.9.x/html/vector.html#rpy2.robjects.vectors.Matrix.colnames
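
A minimal sketch of that approach follows. It assumes the R matrix has both row and column names set and converts cleanly to a 2-D NumPy array; it relies on the `rownames` and `colnames` attributes from the page linked above, and the helper name `matrix_to_dataframe` is made up for illustration.

```python
import numpy as np
import pandas as pd

def matrix_to_dataframe(r_matrix):
    """Build a pandas DataFrame from an rpy2 Matrix, keeping its names.

    Assumes `r_matrix` converts to a 2-D NumPy array and has non-empty
    `rownames` and `colnames` (no handling of missing names here).
    """
    values = np.asarray(r_matrix)
    return pd.DataFrame(values,
                        index=list(r_matrix.rownames),
                        columns=list(r_matrix.colnames))
```

Applied to a named matrix such as the coefficient table of a fitted model, this should give a frame whose index and columns carry the R row and column names.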

pandas and rpy2: Why does ezANOVA work via robjects.r but not robjects.packages.importr?

In the easy version you are passing symbol names as strings. A string is not the same as a symbol.

Check the use of as_symbol in "Minimal example of rpy2 regression using pandas data frame".

Regression by group and display output in python

I will show a mock-up so you can build the rest. The idea is to write your own regression function and pass each group's data frame to it using apply.

Let me know what you think.

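import pandas as pd
import statsmodels.api as sm

def GroupRegress(data, yvar, xvars):
    Y = data[yvar]
    # Copy so that adding a column does not touch the group's original data
    X = data[xvars].copy()
    # Constant column so that the model estimates an intercept
    X['intercept'] = 1.
    result = sm.OLS(Y, X).fit()
    return result.params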

df = pd.DataFrame({'group': [1,1,1,2,2,2],
                   'Y': [9,5,3,1,2,3],
                   'X': [3,4,1,6,4,9]
                  })
df

df.groupby('group').apply(GroupRegress, 'Y', ['X'])

Result below:

              X  intercept
group
1      1.000000        3.0
2      0.236842        0.5
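
As a cross-check that does not depend on statsmodels, the same per-group slope and intercept can be recomputed with `numpy.polyfit`, which fits a degree-1 polynomial by least squares. This is only a sketch using the sample data above and a single predictor; the helper name `fit_line` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 2],
                   'Y': [9, 5, 3, 1, 2, 3],
                   'X': [3, 4, 1, 6, 4, 9]})

def fit_line(data):
    # polyfit returns coefficients from the highest degree down: (slope, intercept)
    slope, intercept = np.polyfit(data['X'], data['Y'], deg=1)
    return pd.Series({'X': slope, 'intercept': intercept})

# Loop over the groups explicitly and collect one row of coefficients per group
check = pd.DataFrame({key: fit_line(part) for key, part in df.groupby('group')}).T
print(check.round(6))
```

The coefficients should agree with the statsmodels result up to floating-point rounding.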

Running R's aov() mixed effects model from Python using rpy2

[Voting up just because you have a nice small and self-contained example.]

The R equivalent of what you are doing with rpy2 is the following (and it returns the same error):

> mixed <- aov("result ~ group*session + covar + Error(as.factor(subject)/session)",data=df)
Error: $ operator is invalid for atomic vectors

Formula objects are not the same as strings:

> class(y ~ x)
[1] "formula"
> class("y ~ x")
[1] "character"

rpy2 has a constructor to build R formulae from Python strings:

from rpy2.robjects import Formula
fml = Formula("y ~ x")

Pass this to aov() instead of the string.
