Load a Dataset into R with data() Using a Variable Instead of the Dataset Name

Load dataset from R package using data(), assign it directly to a variable?

Using a helper function:

# define this function
getdata <- function(...) {
  e <- new.env()
  name <- data(..., envir = e)[1]
  e[[name]]
}

# now load your data by calling getdata()
x <- getdata("faithful")

Or using an anonymous function:

x <- (function(...) { e <- new.env(); get(data(..., envir = e)[1], envir = e) })("faithful")
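Both versions above pass the dataset name as a literal string. The documentation of data() describes its ... arguments as literal strings or names, and its list argument as a character vector, so when the name is held in a variable it is probably safer to go through list =. A minimal sketch under that assumption; get_dataset and dataset_name are hypothetical names introduced here for illustration:

```r
# Hypothetical helper: load one dataset by a name stored in a character variable
get_dataset <- function(name, package = NULL) {
  stopifnot(is.character(name), length(name) == 1L)
  e <- new.env()
  # 'list =' takes a character vector, so a variable can be passed safely
  data(list = name, package = package, envir = e)
  # some data files create objects under other names, so check before returning
  if (!exists(name, envir = e, inherits = FALSE)) {
    stop("no object named '", name, "' was loaded")
  }
  e[[name]]
}

dataset_name <- "faithful"
x <- get_dataset(dataset_name)
dim(x)  # 272 rows, 2 columns
```

Because the data are loaded into a throwaway environment, no object called faithful is created in the workspace; only x is assigned.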

Lazy evaluation

You should also consider lazy-loading your data by adding LazyData: true to the DESCRIPTION file of your package.

If you use RStudio, after running data("faithful") you'll see in the Environment panel that the "faithful" data.frame is shown as a "promise" (another, less common name is "thunk") and is greyed out. That means it is lazily evaluated by R and has not yet been loaded into memory. You can even lazily assign the "x" variable with the delayedAssign() function:

data("faithful")              # lazy load "faithful"
delayedAssign("x", faithful)  # lazily assign "x" with a reference to "faithful"
rm(faithful)                  # remove "faithful"

At this point, nothing has been loaded into memory yet.

summary(x)                    # now x has been loaded and evaluated
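To see that a promise really is evaluated only on first use, here is a small sketch that attaches a visible side effect to the delayed expression. The names evaluated, y, was_forced_before and y_value are made up for this illustration:

```r
evaluated <- FALSE

# The braced expression is stored as a promise; it does not run yet
delayedAssign("y", {
  evaluated <- TRUE
  mean(faithful$eruptions)
})

was_forced_before <- evaluated  # FALSE: the promise has not been forced
y_value <- y                    # the first use of 'y' forces the promise
evaluated                       # TRUE: the expression has now run
```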


How to load and use data file with R whose name is in a variable?

If you're trying to figure out how to access data programmatically when you only have the object's name in a character vector, you can use get().

library(ChIPpeakAnno)
assembly <- 'TSS.human.NCBI36'
data(list=c(assembly))

# Now store the data into 'dat'
dat <- get(assembly)
# Now you can use 'dat' anywhere you would normally use TSS.human.NCBI36
head(start(dat))
#[1] 1873 4274 20229 24417 24417 42912
head(start(TSS.human.NCBI36))
#[1] 1873 4274 20229 24417 24417 42912
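The same idea extends to several names at once. A minimal sketch using datasets that ship with base R rather than ChIPpeakAnno; dataset_names and datasets are illustrative names, and mget() is used as the vectorised counterpart of get():

```r
dataset_names <- c("faithful", "mtcars")

# Load everything into a private environment instead of the workspace
e <- new.env()
data(list = dataset_names, envir = e)

# mget() returns a named list with one element per requested name
datasets <- mget(dataset_names, envir = e)
names(datasets)
sapply(datasets, nrow)  # faithful: 272, mtcars: 32
```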

R: Using the names function on a dataset created within a loop

A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above):

import <- lapply(files, read.csv, header=FALSE)

Then if you want to operate on each data.frame in the list using a loop, you easily can:

for (i in seq_along(import)) names(import[[i]]) <- c('xxx', 'yyy')
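A self-contained sketch of the list approach, assuming two small headerless CSV files written to a temporary directory (the file names and contents are invented for the example). It also names the list elements after the files, so each data.frame can still be reached by name:

```r
# Create two throwaway CSV files without header rows
tmp <- tempfile("csvdemo")
dir.create(tmp)
writeLines(c("1,4", "2,5", "3,6"), file.path(tmp, "first.csv"))
writeLines(c("7,9", "8,10"), file.path(tmp, "second.csv"))

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read every file into one list and set the column names while reading
import <- lapply(files, read.csv, header = FALSE, col.names = c("xxx", "yyy"))

# Name each element after its file, minus the extension
names(import) <- tools::file_path_sans_ext(basename(files))

import$first
```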

How can I load an object into a variable name that I specify from an R data file?

If you're saving just a single object, don't use an .RData file; use an .rds file instead:

x <- 5
saveRDS(x, "x.rds")
y <- readRDS("x.rds")
all.equal(x, y)
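If the file already exists as .RData and cannot be re-saved, one possible workaround is to load it into a separate environment and pull the object out under a name of your choosing. A sketch assuming the file holds a single object; rdata_file, e and loaded_names are illustrative names:

```r
# Create an example .RData file holding one object
rdata_file <- tempfile(fileext = ".RData")
x <- 5
save(x, file = rdata_file)
rm(x)

# load() returns the names of the objects it restored
e <- new.env()
loaded_names <- load(rdata_file, envir = e)

# Copy the first (here: only) restored object into a variable of our choice
y <- e[[loaded_names[1]]]
y
```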

Load R datasets in dataframes

Calling data(AirPassengers) adds a promise for AirPassengers to your global environment. The data are actually loaded the first time you use the AirPassengers object. You can either use AirPassengers directly, like any other object, or copy it to another variable, e.g.:

data(AirPassengers)
dat <- AirPassengers

If you run class on AirPassengers you will see that AirPassengers is not a data.frame. Try looking at the differences here:

class(AirPassengers)
# [1] "ts"
data(mtcars)
class(mtcars)
# [1] "data.frame"
summary(mtcars)
#       mpg             cyl             disp             hp
#  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
#  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
#  Median :19.20   Median :6.000   Median :196.3   Median :123.0
#  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
#  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
#  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
#       drat             wt             qsec             vs
#  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
#  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
#  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
#  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
#  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
#  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
#        am              gear            carb
#  Min.   :0.0000   Min.   :3.000   Min.   :1.000
#  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
#  Median :0.0000   Median :4.000   Median :2.000
#  Mean   :0.4062   Mean   :3.688   Mean   :2.812
#  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
#  Max.   :1.0000   Max.   :5.000   Max.   :8.000

AirPassengers is a specific type of vector called a time series. See ?ts for more information.
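If a data.frame is what you actually need, the time series can be unpacked into columns. A minimal sketch using time() and cycle() from the stats package; the column names time, month and passengers are chosen here purely for illustration:

```r
air <- data.frame(
  time = as.numeric(time(AirPassengers)),    # decimal year, e.g. 1949.000
  month = as.integer(cycle(AirPassengers)),  # position within the year, 1 to 12
  passengers = as.numeric(AirPassengers)     # the observed values
)

class(air)  # "data.frame"
nrow(air)   # 144 monthly observations
head(air, 3)
```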

A function that returns a dataset

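Give this a try:

loadDataSet <- function(name, pkg) {
  # load into a private environment so the global environment is left untouched
  e <- new.env()
  data(list = name, package = pkg, envir = e)
  get(name, envir = e)
}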

loadDataSet("acme", "boot")

R Function to import data set and pipeline create variables based on field name/existence

Solution

Simply use regex to change the column names:

library(readr)  # provides read_table()

temp_set <- read_table(input_path)

names(temp_set) <- gsub(x = names(temp_set), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX")

Or equivalently with `names<-`() in the dplyr workflow:

temp_set <- read_table(input_path) %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))

Regex

The pattern = "^(.+)(\\d{4,4})$" breaks each name into two capturing groups:

  1. Any prefix of positive length: .+
  2. Some year comprised of 4 digits: \\d{4,4}

The replacement = "\\1XXXX" then prepends the first group (\\1) to the code (XXXX); so the code essentially "replaces" the year.
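The behaviour of the pattern can be checked on a plain character vector before touching any data. In this sketch the vector nms is made up; its last two entries show that a name with fewer than four trailing digits is left alone, while a name ending in five digits loses only its last four:

```r
nms <- c("MSA2003", "var2019", "other_var", "x12", "pop20195")

renamed <- gsub(x = nms, pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX")
renamed
# "MSAXXXX" "varXXXX" "other_var" "x12" "pop2XXXX"
```

The last result follows from .+ being greedy but still having to leave exactly four digits for the second group, so the extra leading digit stays in the prefix.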

Example

Here are two possible cases, where the MSAXXXX column starts as MSA2003 and as MSA2013 respectively:

case_1 <- data.frame(
  MSA2003 = c(41929, 33820, 27642, 88111),
  var2019 = c(41929, 33820, 27642, 88111),
  other_var = 1:4
)
case_1
#>   MSA2003 var2019 other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

case_2 <- data.frame(
  MSA2013 = c(41929, 33820, 27642, 88111),
  var2009 = c(41929, 33820, 27642, 88111),
  other_var = 1:4
)
case_2
#>   MSA2013 var2009 other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

Notice how the solution standardizes all variables with years in their names, yet leaves the other variables untouched:

library(dplyr)

case_1 %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#>   MSAXXXX varXXXX other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

case_2 %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#>   MSAXXXX varXXXX other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

