How to Convert R Dataframe Back to Pandas Using Rpy2

rpy2 does not convert back to pandas

In R, calling source() with its default arguments on a script without named functions returns a list of two named components, $value and $visible, where:

  • $value is the last displayed or defined object, which in your case is the far_df data frame (in R, a data.frame is a classed object built on top of a list);
  • $visible is a logical vector indicating whether the last object was displayed, which in your case is TRUE. It would be FALSE had you ended the script with far_df <- tidy.sts(surveil_ts_4_far).

In fact, your Python error confirms this: it shows a list of [ListSexpVector, BoolSexpVector].

Therefore, since you only want the first item, index the result for it, either by position or by name.

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

r_raw = ro.r['source']('farrington.R')     # IN R: r_raw <- source('farrington.R')
r_df = r_raw[0]                            # by position; IN R: r_df <- r_raw[[1]]
r_df = r_raw[r_raw.names.index('value')]   # or by name; IN R: r_df <- r_raw$value

with localconverter(ro.default_converter + pandas2ri.converter):
    pd_from_r_df = ro.conversion.rpy2py(r_df)
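The by-name lookup can be wrapped in a small helper that fails with a clear message when the component is missing, instead of an unexplained index error. This is a minimal sketch; the name source_value is hypothetical, and it assumes only that the object has a `names` attribute and supports integer indexing, as the rpy2 list returned by ro.r['source'](...) does.

```python
def source_value(source_result):
    """Return the `value` component of the list returned by R's source().

    Hypothetical helper: works on any object exposing a `names` attribute
    and integer indexing, such as the rpy2 list vector returned by
    ro.r['source']('farrington.R').
    """
    names = list(source_result.names)  # e.g. ['value', 'visible']
    if 'value' not in names:
        raise KeyError("no 'value' component; got names %r" % (names,))
    # Python indexing is 0-based, whereas R's equivalent is r_raw[[1]]
    return source_result[names.index('value')]
```

With it, the lookup becomes r_df = source_value(r_raw).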

Use rpy2 with pandas dataframe

You are almost there. To run R functions, you need to convert the pandas DataFrame to an R data frame. Once we have the R object, we can call the functions as shown below.

import numpy as np
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr  # import R's "base" package
from rpy2.robjects import pandas2ri  # install any dependency package if you get an error like "module not found"
from rpy2.robjects.conversion import localconverter

base = importr('base')

# Create pandas df
df = pd.DataFrame(np.random.randn(5, 2),  # 5 rows, 2 columns
                  columns=["A", "B"],     # name of columns
                  index=["Max", "Nathy", "Tom", "Joe", "Kathy"])

# Convert pandas to R (rpy2 >= 3.0 replaced pandas2ri.py2ri with py2rpy)
with localconverter(ro.default_converter + pandas2ri.converter):
    r_df = ro.conversion.py2rpy(df)
print(type(r_df))  # an rpy2 DataFrame wrapping an R data.frame

# Calling a function from the base package
print(base.summary(r_df))

Converting a Pandas DataFrame to R dataframe using Rpy2

Unfortunately, this is going to be difficult: the Python-to-R conversion is better than it used to be, but it isn't perfect, and rpy2 is still hard to use on Windows, which it looks like you're using.

This is a bit of a hack, but as a workaround you might try setting the name and time variables when you create the pd.DataFrame, before you convert it into R.

Once it's in R, you'll need to use R functions to operate on the data frame rather than your Python functions. Even your getter and setter will need to be passed into the R environment, in a way that looks more like this:

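import rpy2.robjects as robjects

# Define an R function f() in R's global environment and then call it;
# myfunct holds the value of the last expression, f(3)
myfunct = robjects.r('''
f <- function(r, verbose=FALSE) {
    if (verbose) {
        cat("I am calling f().\n")
    }
    2 * pi * r
}
f(3)
''')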

from here.

But just to check that your DataFrame is being converted appropriately in the first place, you might start your debugging by running this:

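import pandas as pd
import numpy as np
from datetime import datetime
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

n = 10
df = pd.DataFrame({
    "timestamp": [datetime.now() for t in range(n)],
    "value": np.random.uniform(-1, 1, n)
})

# pandas.rpy.common (com.convert_to_r_dataframe) has been removed from pandas;
# rpy2's pandas2ri converter provides the same conversion
with localconverter(ro.default_converter + pandas2ri.converter):
    r_dataframe = ro.conversion.py2rpy(df)
print(r_dataframe)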

Does that produce something that looks like R's printout of a data frame, like this?

            timestamp        value
0 2014-06-03 15:02:20 -0.36672....
1 2014-06-03 15:02:20 -0.89136....
2 2014-06-03 15:02:20 0.509215....
3 2014-06-03 15:02:20 0.862909....
4 2014-06-03 15:02:20 0.389879....
5 2014-06-03 15:02:20 -0.80607....
6 2014-06-03 15:02:20 -0.97116....
7 2014-06-03 15:02:20 0.376419....
8 2014-06-03 15:02:20 0.848243....
9 2014-06-03 15:02:20 0.446798....

Example peeled from here and here.

rpy2 How to assign R dataframe to value/values

One way to achieve what you want is:

r_df[r_df.colnames.index('col1')] = base.as_Date(r_df.rx2('col1'), '%Y-%m-%d')

Why is something like r_df['col1'] not implemented? Because R can be peculiar, and many design choices in rpy2 prefer a slight annoyance to a source of very hard-to-debug issues. In this case, column names in an R data frame are not required to be unique, and getting an item by name returns only the first column with that name. For example:

import rpy2.robjects as ro
dataf = ro.r('data.frame(x=1:3, x=4:6, check.names=FALSE)')

print(dataf)
#   x x
# 1 1 4
# 2 2 5
# 3 3 6

dataf.rx2('x')
# R object with classes: ('RTYPES.INTSXP',) mapped to:
# [1, 2, 3]

The Python method index(), available on list, tuple, and other sequences, is likewise documented to return the index of the first match.
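The same first-match rule can be seen with plain Python lists. The pure-Python sketch below mimics the duplicated names of the data frame above: index() silently reports only the first position, while a hypothetical helper all_indices collects every position, which is one way to detect ambiguous names (for example on list(dataf.colnames)) before relying on index().

```python
# Column names as R reports them for
# data.frame(x=1:3, x=4:6, check.names=FALSE)
names = ['x', 'x']

# list.index() returns only the first match, like rx2('x') in rpy2
first = names.index('x')


def all_indices(seq, name):
    """Hypothetical helper: every position at which `name` occurs in seq."""
    return [i for i, item in enumerate(seq) if item == name]


positions = all_indices(names, 'x')
if len(positions) > 1:
    print("'x' is ambiguous; it occurs at positions", positions)
```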
