Listing Contents of an R Data File Without Loading

Examining contents of .rdata file by attaching into a new environment - possible?

You can suppress the masking warning by setting warn.conflicts=FALSE in the call to attach. If an object is masked by one in the global environment, you can use get to retrieve it from your attached data.

x <- 1:10
save(x, file="x.rData")
#attach("x.rData", pos=2, warn.conflicts=FALSE)
attach("x.rData", pos=2)
(x <- 1)
# [1] 1
(x <- get("x", pos=2))
# [1] 1 2 3 4 5 6 7 8 9 10
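
To list what a file contains with this approach, one option is to attach it under an explicit name and call ls() on that entry of the search path. The following is only a sketch: the name "peek" and the temporary file are assumptions made for illustration, and attach still reads the whole file, just not into the global environment.

```r
# Save two objects to a temporary file, then remove them from the workspace
rdata_path <- file.path(tempdir(), "x.rData")
x <- 1:10
y <- letters[1:3]
save(x, y, file = rdata_path)
rm(x, y)

# Attach the file under an explicit (hypothetical) name so it is easy to find
attach(rdata_path, name = "peek", warn.conflicts = FALSE)

contents <- ls("peek")                  # names of the objects stored in the file
x_from_file <- get("x", pos = "peek")   # fetch one object from the attached data

# Remove the entry from the search path once done
detach("peek")
```

Detaching afterwards keeps the search path clean, so repeated inspections do not pile up attached copies.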

Get specific object from Rdata file

.RData files don't have an index (the contents are serialized as one big pairlist). You could hack a way to go through the pairlist and assign only entries you like, but it's not easy since you can't do it at the R level.

However, you can simply convert the .RData file into a lazy-load database which serializes each entry separately and creates an index. The nice thing is that the loading will be on-demand:

# convert .RData -> .rdb/.rdx
e = local({load("New.RData"); environment()})
tools:::makeLazyLoadDB(e, "New")

Loading the DB then only loads the index but not the contents. The contents are loaded as they are used:

lazyLoad("New")
ls()
x # if you had x in the New.RData it will be fetched now from New.rdb

Just like with load() you can specify an environment to load into so you don't need to pollute the global workspace etc.
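
As a rough sketch of that last point, the database can be loaded into its own environment, and the filter argument of lazyLoad can restrict which names are made available. The file names, the objects x and big, and the temporary directory are assumptions for illustration; tools:::makeLazyLoadDB is an unexported function, so its behaviour may change between R versions.

```r
# Build a small .RData file in a temporary directory
db_dir <- tempdir()
rdata_path <- file.path(db_dir, "New.RData")
x <- 1:10
big <- rnorm(1e5)
save(x, big, file = rdata_path)
rm(x, big)

# Convert .RData -> .rdb/.rdx (this step still loads everything once)
e <- new.env()
load(rdata_path, envir = e)
tools:::makeLazyLoadDB(e, file.path(db_dir, "New"))

# Load the index into a separate environment instead of the workspace
db_env <- new.env()
lazyLoad(file.path(db_dir, "New"), envir = db_env)
ls(db_env)      # names are available immediately
db_env$x        # x is fetched from New.rdb only now

# Expose only the objects whose names pass the filter
only_x <- new.env()
lazyLoad(file.path(db_dir, "New"), envir = only_x,
         filter = function(nm) nm == "x")
```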

Can I access R data objects' attributes without fully loading objects from file?

You can't "really" do it, but you could modify the code in my cgwtools::lsdata function.

function (fnam = ".Rdata")
{
    x <- load(fnam, envir = environment())
    return(x)
}

This loads the file, which takes time and briefly uses memory, and then the local environment disappears when the function returns. To check attributes, add an argument for the items you want to inspect, add a line inside the function such as y <- attributes(your_item), and return list(x = x, y = y).
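
One way that modification might look is sketched below. The function name lsdata_attrs and its items argument are hypothetical and not part of cgwtools; the file is still loaded in full, only into a throwaway environment.

```r
# Hypothetical variant of lsdata: returns the names of all stored objects (x)
# and the attributes of the requested ones (y)
lsdata_attrs <- function(fnam = ".RData", items = character()) {
  env <- new.env()
  x <- load(fnam, envir = env)          # names of everything in the file
  found <- intersect(items, x)          # ignore requested names that are absent
  y <- lapply(mget(found, envir = env), attributes)
  list(x = x, y = y)
}

# Usage on a small temporary file
path <- file.path(tempdir(), "attrs.RData")
df <- data.frame(a = 1:3, b = c("u", "v", "w"))
m <- matrix(1:6, nrow = 2)
save(df, m, file = path)

res <- lsdata_attrs(path, items = "m")
res$x          # names of all stored objects
res$y$m$dim    # dimensions of the stored matrix
```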

load .Rdata file into list()

Use load inside mget, with an environment other than .GlobalEnv.

d1 <- d2 <- d3 <- d4 <- data.frame()
save(d1, d2, d3, d4, file="test.rda")
rm(d1, d2, d3, d4)

x <- mget(load("test.rda", envir=(NE. <- new.env())), envir=NE.)
ls()
# [1] "NE." "x"
x
# $d1
# data frame with 0 columns and 0 rows
#
# $d2
# data frame with 0 columns and 0 rows
#
# $d3
# data frame with 0 columns and 0 rows
#
# $d4
# data frame with 0 columns and 0 rows
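
The same idea can be wrapped in a small function so that no helper environment such as NE. is left behind in the workspace. This is a sketch only; the name load_as_list and the temporary file are assumptions for illustration.

```r
# Hypothetical helper: load every object in an .RData/.rda file into a named list
load_as_list <- function(path) {
  env <- new.env()
  nms <- load(path, envir = env)   # load() returns the names it created
  mget(nms, envir = env)
}

# Usage on a temporary file
rda_path <- file.path(tempdir(), "test.rda")
d1 <- data.frame(a = 1:2)
d2 <- data.frame(b = c("u", "v"))
save(d1, d2, file = rda_path)
rm(d1, d2)

objs <- load_as_list(rda_path)
names(objs)
```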

R show variable list/header of Stata or SAS file in R without loading the complete dataset

See below for the dta solution, which you can update to SAS using read_sas.

library(haven)

# read in first row of dta
dta_head <- read_dta("my_data.dta",
                     n_max = 1)

# get variable names of dta
dta_names <- names(dta_head)

After examining the variable names of your dta file, you can remove the n_max = 1 option and read the full file, possibly adding the col_select option to specify the subset of variables you wish to read in.
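
A possible sketch of that second step follows. The helper read_dta_subset, the temporary file, and the column names id, score, and grp are made up for illustration; the demo is skipped when haven is not installed.

```r
# Hypothetical helper: read only the named variables from a .dta file
read_dta_subset <- function(path, vars) {
  haven::read_dta(path, col_select = tidyselect::all_of(vars))
}

# Demo on a small temporary file
if (requireNamespace("haven", quietly = TRUE)) {
  dta_path <- tempfile(fileext = ".dta")
  demo_df <- data.frame(id = 1:3,
                        score = c(2.5, 3.1, 4),
                        grp = c("a", "b", "c"))
  haven::write_dta(demo_df, dta_path)

  # Step 1: peek at the variable names by reading a single row
  dta_names <- names(haven::read_dta(dta_path, n_max = 1))

  # Step 2: read all rows, but only the variables of interest
  dta_subset <- read_dta_subset(dta_path, c("id", "score"))
}
```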

Read datasets from .Rdata file in loop

I would argue that this is possible, but would require some parallel processing capabilities. Each worker would load the .RData file and output the desired object. Merging the result would probably be pretty straightforward.

I can't provide code for your data because I don't know its structure, but I would do something along the lines of the chunk of code below. Note that I'm on Windows, so your workflow may differ. Make sure you are not short on memory, since each worker loads a whole file. Also, snowfall is not the only interface for using multiple cores.

# load library snowfall and set up working directory
# to where the RData files are
library(snowfall)
working.dir <- "/path/to/dir/with/files"
setwd(working.dir)

# initialise the workers and export the working directory.
# The working directory could be hard-coded into the function,
# rendering this step moot
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")
sfExport(list = c("working.dir")) # export every variable the workers need, except x

# read filenames and step through each, returning only the
# desired object
lofs <- list.files(pattern = "\\.RData$") # escape the dot and anchor the extension
inres <- sfSapply(x = lofs, fun = function(x, wd = working.dir) {
  setwd(wd)
  load(x)
  return(Dataset_of_interest)
}, simplify = FALSE)
sfStop()

# you could post-process the data with rbind(), cbind(), c()...
result <- do.call("rbind", inres)
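
For comparison, here is a sequential sketch of the same loop in base R, which may be enough when the files are small. The helper extract_object, the demo directory, and the demo data are assumptions for illustration; each file is loaded into a throwaway environment so that only the requested object is kept.

```r
# Hypothetical helper: pull one named object out of an .RData file
extract_object <- function(file, name) {
  env <- new.env()
  load(file, envir = env)
  get(name, envir = env, inherits = FALSE)
}

# Demo: two temporary files, each holding a data frame plus an unrelated object
demo_dir <- tempfile("rdata_demo")
dir.create(demo_dir)
for (i in 1:2) {
  Dataset_of_interest <- data.frame(file_id = i, value = c(10, 20) * i)
  other <- "ignored"
  save(Dataset_of_interest, other,
       file = file.path(demo_dir, paste0("part", i, ".RData")))
}

# Step through each file, keeping only the desired object
lofs <- list.files(demo_dir, pattern = "\\.RData$", full.names = TRUE)
inres <- lapply(lofs, extract_object, name = "Dataset_of_interest")
result <- do.call("rbind", inres)
```

Because the helper takes a full path, it does not depend on the working directory, which should make it easier to hand the same function to a parallel apply later.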

