Loading .RData Files into Python

Loading .RData files into Python

People ask this sort of thing on the R-help and R-devel lists, and the usual answer is that the code is the documentation for the .RData file format. So reimplementing it in any other language is hard++.

I think the only reasonable way is to install rpy2 and use R's load function from that, converting to appropriate Python objects as you go. The .RData file can contain structured objects as well as plain tables, so watch out.

Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/

Quicky:

>>> import rpy2.robjects as robjects
>>> robjects.r['load'](".RData")

The objects are now loaded into the R workspace.

>>> robjects.r['y']
<FloatVector - Python:0x24c6560 / R:0xf1f0e0>
[0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]

That's a simple numeric vector. d is a data frame, and I can index it to get columns:

>>> robjects.r['d'][0]
<IntVector - Python:0x24c9248 / R:0xbbc6c0>
[ 1, 2, 3, ..., 8, 9, 10]
>>> robjects.r['d'][1]
<FloatVector - Python:0x24c93b0 / R:0xf1f230>
[0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]
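If you don't already know which objects a file contains, note that R's load() returns the names of the objects it restored. A minimal sketch of using that from Python, assuming rpy2 and R are installed (the helper name load_rdata_objects is made up for illustration):

```python
from pathlib import Path


def load_rdata_objects(path):
    """Load an .RData file through R and return {name: rpy2 object}."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")

    # Imported here so the path check above works even without rpy2 installed.
    import rpy2.robjects as robjects

    # R's load() returns a character vector with the names of the restored objects.
    # as_posix() hands R a forward-slash path on every platform.
    names = robjects.r['load'](path.resolve().as_posix())

    # Look each name up in the R workspace, as in the session above.
    return {name: robjects.r[name] for name in names}
```

Calling load_rdata_objects(".RData") on the file from the session above should give a dictionary with keys such as 'y' and 'd'. The values are still rpy2 wrappers (FloatVector, data frame and so on), so converting them to plain Python objects remains a separate step.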

How to load R's .rdata files into Python?

There is a new Python package, pyreadr, that makes it very easy to import RData and Rds files into Python:

import pyreadr

result = pyreadr.read_r('mtcars_nms.rdata')

mtcars = result['mtcars_nms']

It does not depend on having R or other external dependencies installed.
It is a wrapper around the C library librdata, therefore it is very fast.

You can install it very easily with pip:

pip install pyreadr

The repo is here: https://github.com/ofajardo/pyreadr

Disclaimer: I am the developer.
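Since read_r returns a dictionary-like object keyed by R object name, a quick way to see what a file holds is to loop over it. A rough sketch, assuming pyreadr is installed (the helper name summarize_rdata is hypothetical, not part of pyreadr):

```python
from pathlib import Path


def summarize_rdata(path):
    """Return {object name: (rows, columns)} for the tables pyreadr reads from a file."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")

    # Imported here so the path check above works even without pyreadr installed.
    import pyreadr

    # read_r maps each R object name to a pandas DataFrame.
    result = pyreadr.read_r(str(path))
    return {name: df.shape for name, df in result.items()}
```

For the example above, summarize_rdata('mtcars_nms.rdata') would be expected to have a single key, 'mtcars_nms'.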

Reading .RData files into python using rpy2

Ok, it seems I have understood the issue here.

When specifying the path to the .RData file, I used the standard Windows directory separator ("\"), and r.load() did not recognize the path. When I switched to the "/" directory separator, the .RData file loaded successfully.
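One likely reason for this (an assumption, since the original code isn't shown) is that backslashes in an ordinary Python string literal start escape sequences, so the path is altered before R ever sees it. A small sketch using only the standard library, with made-up example paths:

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Rewrite a Windows-style path with "/" separators."""
    return PureWindowsPath(windows_path).as_posix()


# Without the r prefix, "\n" and "\t" become a newline and a tab.
broken = "C:\new\table.RData"
print("\n" in broken, "\t" in broken)  # True True

# A raw string keeps the backslashes; as_posix() then swaps in "/".
print(to_forward_slashes(r"C:\new\table.RData"))  # C:/new/table.RData
```

The converted string can then be passed to robjects.r['load'] in place of the backslash version.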

How to access a matrix in an .Rdata file in Python using rpy2

You can find the solution in two other Stack Overflow questions/answers: one shows how to load a variable from an RData file, and the other shows how to convert an R matrix to a NumPy array.

Combined, the solution looks like this:

import rpy2.robjects as robjects
import numpy as np

# load your file
robjects.r['load']('fname.RData')

# retrieve the matrix that was loaded from the file
matrix = robjects.r['fname']

# turn the R matrix into a numpy array
a = np.array(matrix)

print(a)

For instance, if you'd started by running the following code in R:

fname <- matrix(1:9, nrow = 3)
save(fname, file = "fname.RData")

The above Python code would print:

[[1 4 7]
 [2 5 8]
 [3 6 9]]
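The first row reads 1, 4, 7 rather than 1, 2, 3 because R's matrix() fills column by column by default. A pure-Python sketch of that fill order (the function name is hypothetical, and unlike R it does not recycle values when the length is not a multiple of nrow):

```python
def fill_column_major(values, nrow):
    """Arrange a flat sequence into rows the way R's matrix() fills by default."""
    values = list(values)
    if nrow <= 0 or len(values) % nrow != 0:
        raise ValueError("len(values) must be a multiple of a positive nrow")
    ncol = len(values) // nrow
    # Element (r, c) comes from position c * nrow + r of the flat sequence.
    return [[values[c * nrow + r] for c in range(ncol)] for r in range(nrow)]


for row in fill_column_major(range(1, 10), nrow=3):
    print(row)
# [1, 4, 7]
# [2, 5, 8]
# [3, 6, 9]
```

With NumPy, np.arange(1, 10).reshape(3, 3, order="F") produces the same layout.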

