Loading .RData files into Python
People ask this sort of thing on the R-help and R-devel lists, and the usual answer is that the code is the documentation for the .RData
file format. So any other implementation in any other language is hard++.
I think the only reasonable way is to install rpy2 and use R's load
function from that, converting to appropriate Python objects as you go. The .RData
file can contain structured objects as well as plain tables, so watch out.
Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/
Quicky:
>>> import rpy2.robjects as robjects
>>> robjects.r['load'](".RData")
objects are now loaded into the R workspace.
>>> robjects.r['y']
<FloatVector - Python:0x24c6560 / R:0xf1f0e0>
[0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]
That's a simple numeric vector. d is a data frame, and I can index it to get columns:
>>> robjects.r['d'][0]
<IntVector - Python:0x24c9248 / R:0xbbc6c0>
[ 1, 2, 3, ..., 8, 9, 10]
>>> robjects.r['d'][1]
<FloatVector - Python:0x24c93b0 / R:0xf1f230>
[0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]
How to load R's .rdata files into Python?
There is a new Python package, pyreadr, that makes it very easy to import RData and Rds files into Python:
import pyreadr
result = pyreadr.read_r('mtcars_nms.rdata')
mtcars = result['mtcars_nms']
It does not depend on having R or other external dependencies installed.
It is a wrapper around the C library librdata, therefore it is very fast.
You can install it very easily with pip:
pip install pyreadr
The repo is here: https://github.com/ofajardo/pyreadr
Disclaimer: I am the developer.
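read_r returns a dictionary-like object keyed by the names of the objects stored in the file, so a misspelled name surfaces as a bare KeyError. A small hypothetical helper (pick_object is not part of pyreadr) can make that failure more informative. It works on any mapping, so this sketch runs without pyreadr installed:

```python
def pick_object(result, name):
    """Fetch one object from a pyreadr-style result mapping by name."""
    if name in result:
        return result[name]
    # List what the file actually contained so a typo is easy to spot.
    available = ", ".join(repr(key) for key in result)
    raise KeyError(f"{name!r} not found; objects in file: {available}")


# Stand-in for pyreadr.read_r(...); with pyreadr the values would be pandas DataFrames.
fake_result = {"mtcars_nms": "<data frame>"}
print(pick_object(fake_result, "mtcars_nms"))
```

With pyreadr installed, the same call would be pick_object(pyreadr.read_r('mtcars_nms.rdata'), 'mtcars_nms').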
Reading .RData files into python using rpy2
OK, it seems I have understood the issue here.
When specifying the path to the .RData file, I used the standard Windows directory separator ("\"), and r.load() did not recognize the path, most likely because a backslash is treated as an escape character in string literals. When I used "/" as the directory separator instead, the .RData file loaded successfully.
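A minimal sketch of that fix using only the standard library: pathlib.PureWindowsPath can rewrite a backslash-separated path with forward slashes before it is handed to R. The helper name and the path below are hypothetical.

```python
from pathlib import PureWindowsPath


def to_r_path(windows_path):
    """Return the path with "/" separators, which R accepts on Windows."""
    return PureWindowsPath(windows_path).as_posix()


# Hypothetical location; the raw string keeps Python from treating "\" as an escape.
r_path = to_r_path(r"C:\Users\me\data\results.RData")
print(r_path)  # C:/Users/me/data/results.RData
```

The converted string can then be passed to robjects.r['load'](r_path).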
How to access a matrix in an .Rdata file in Python using rpy2
You can find the solution in two other Stack Overflow questions/answers: one shows how to load a variable from an RData file, and the other shows how to convert an R matrix to a numpy array.
Combined, the solution looks like this:
import rpy2.robjects as robjects
import numpy as np
# load your file
robjects.r['load']('fname.RData')
# retrieve the matrix that was loaded from the file
matrix = robjects.r['fname']
# turn the R matrix into a numpy array
a = np.array(matrix)
print(a)
For instance, if you'd started by running the following code in R:
fname <- matrix(1:9, nrow = 3)
save(fname, file = "fname.RData")
The above Python code would print:
[[1 4 7]
 [2 5 8]
 [3 6 9]]
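The values run down the columns because R's matrix() fills column by column by default. A small pure-Python sketch reproduces the same layout without R or numpy. The helper name is made up for illustration, and unlike R it rejects input that does not divide evenly into rows.

```python
def fill_column_major(values, nrow):
    """Arrange values into rows, filling column by column as R's matrix() does by default."""
    values = list(values)
    if len(values) % nrow != 0:
        # Simplification: R would recycle values with a warning instead.
        raise ValueError("number of values must be a multiple of nrow")
    ncol = len(values) // nrow
    # Element (i, j) of the matrix is the input value at index j * nrow + i.
    return [[values[j * nrow + i] for j in range(ncol)] for i in range(nrow)]


rows = fill_column_major(range(1, 10), nrow=3)
for row in rows:
    print(row)
# [1, 4, 7]
# [2, 5, 8]
# [3, 6, 9]
```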