Load a Dataset into R with data() Using a Variable Instead of the Dataset Name

Load dataset from R package using data(), assign it directly to a variable?

Using a helper function:

# define this function
getdata <- function(...)
{
    e <- new.env()
    name <- data(..., envir = e)[1]
    e[[name]]
}

# now load your data calling getdata()
x <- getdata("faithful")

Or using an anonymous function:

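# look the object up in the same environment that data() loaded it into
x <- (function(...) { e <- new.env(); get(data(..., envir = e)[1], envir = e) })("faithful")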
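
When the dataset name is itself stored in a variable, the same idea can be sketched with the list argument of data(). This is only a minimal sketch: load_dataset and dataset_name are hypothetical names introduced for illustration, and the example assumes the built-in datasets package is available.

```r
# Hypothetical helper: load a dataset whose name is held in a character variable.
load_dataset <- function(name, package = NULL) {
  e <- new.env()
  # data() only warns when a dataset is missing, so check explicitly afterwards
  suppressWarnings(data(list = name, package = package, envir = e))
  if (!exists(name, envir = e, inherits = FALSE)) {
    stop("dataset not found: ", name)
  }
  get(name, envir = e, inherits = FALSE)
}

dataset_name <- "faithful"        # the name lives in a variable
x <- load_dataset(dataset_name)   # x now holds the faithful data.frame
```

Because the data is loaded into a throwaway environment, nothing named "faithful" is created in your workspace.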

Lazy evaluation

You should also consider lazy-loading your data by adding LazyData: true to the DESCRIPTION file of your package.

If you use RStudio, after running data("faithful") you'll see in the Environment panel that the "faithful" data.frame is shown as a "promise" (a less common name is "thunk") and is greyed out. That means it is lazily evaluated by R and has not yet been loaded into memory. You can even lazy-load the "x" variable with the delayedAssign() function:

data("faithful")              # lazy load "faithful"
delayedAssign("x", faithful)  # lazy assign "x" with a reference to "faithful"
rm(faithful)                  # remove "faithful"

At this point nothing has been loaded into memory yet.

summary(x)                    # now x has been loaded and evaluated
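
To see the delayed evaluation for yourself, here is a small sketch that records the moment the promise is forced. The names env, evaluated and lazy_x are made up for this illustration, and a separate environment is used only to keep the demo out of your workspace.

```r
env <- new.env()
env$evaluated <- FALSE

delayedAssign(
  "lazy_x",
  {
    evaluated <- TRUE        # side effect marks the moment of evaluation
    datasets::faithful
  },
  eval.env = env,
  assign.env = env
)

before <- env$evaluated      # FALSE: the promise has not been forced yet
n_rows <- nrow(env$lazy_x)   # first access forces the promise
after <- env$evaluated       # TRUE: the expression has now been evaluated
```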

How to load and use data file with R whose name is in a variable?

If you're trying to access data programmatically and only have the object's name in a character vector, you can use get().

library(ChIPpeakAnno)
assembly <- 'TSS.human.NCBI36'
data(list = assembly)

# Now store the data into 'dat'
dat <- get(assembly)
# Now you can use 'dat' anywhere you would normally use TSS.human.NCBI36
head(start(dat))
#[1]  1873  4274 20229 24417 24417 42912
head(start(TSS.human.NCBI36))
#[1]  1873  4274 20229 24417 24417 42912
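
If you have several names in the character vector, mget() does the same lookup for all of them at once. A rough sketch, using two built-in datasets instead of ChIPpeakAnno so it runs without extra packages (dataset_names, e and loaded are illustrative names):

```r
dataset_names <- c("faithful", "mtcars")

e <- new.env()
data(list = dataset_names, envir = e)      # load every named dataset into e

# mget() returns a named list with one element per requested object
loaded <- mget(dataset_names, envir = e)

head(loaded[["mtcars"]])
```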

R: Using the names function on a dataset created within a loop

A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above):

import <- lapply(files, read.csv, header=FALSE)

Then if you want to operate on each data.frame in the list using a loop, you easily can:

for (i in seq_along(import)) names(import[[i]]) <- c('xxx', 'yyy')
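
Here is a self-contained sketch of the same approach that you can run as-is. The two CSV files are throwaway files written to a temporary directory purely for the demo, and setNames() is used as a loop-free alternative for the renaming step.

```r
# create two small header-less CSV files to stand in for the real ones
tmp_dir <- tempfile("csv_demo_")
dir.create(tmp_dir)
write.table(data.frame(1:3, 4:6), file.path(tmp_dir, "a.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(data.frame(7:9, 10:12), file.path(tmp_dir, "b.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

import <- lapply(files, read.csv, header = FALSE)     # one data.frame per file
import <- lapply(import, setNames, c("xxx", "yyy"))   # rename columns in each
names(import) <- basename(files)                      # label elements by file
```

Naming the list elements after the files lets you write import[["a.csv"]] instead of keeping track of positions.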

How can I load an object into a variable name that I specify from an R data file?

If you're saving just a single object, don't use an .RData file; use an .rds file:

x <- 5
saveRDS(x, "x.rds")
y <- readRDS("x.rds")
all.equal(x, y)
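
If you are stuck with an existing .RData file, one possible workaround is to load() it into a scratch environment and pull the object out under whatever name you like. A sketch, assuming the file contains a single object (rdata_path, e, loaded_names and z are illustrative names):

```r
x <- 5
rdata_path <- tempfile(fileext = ".RData")
save(x, file = rdata_path)

e <- new.env()
loaded_names <- load(rdata_path, envir = e)   # load() returns the object names
z <- e[[loaded_names[1]]]                     # assign it to a name of your choice
```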

Load R datasets in dataframes

Calling data(AirPassengers) adds a promise pointing to AirPassengers to your global environment. Once you use the AirPassengers object it will be loaded into your global environment. You can either just use the AirPassengers object like you would any object, or you can copy it to another variable, e.g.:

data(AirPassengers)
dat <- AirPassengers

If you run class on AirPassengers you will see that AirPassengers is not a data.frame. Try looking at the differences here:

class(AirPassengers)
# [1] "ts"
data(mtcars)
class(mtcars)
# [1] "data.frame"
summary(mtcars)
#     mpg             cyl             disp             hp       
# Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
# 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
# Median :19.20   Median :6.000   Median :196.3   Median :123.0  
# Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
# 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
# Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
# drat             wt             qsec             vs        
# Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
# 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
# Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
# Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
# 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
# Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
# am              gear            carb      
# Min.   :0.0000   Min.   :3.000   Min.   :1.000  
# 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
# Median :0.0000   Median :4.000   Median :2.000  
# Mean   :0.4062   Mean   :3.688   Mean   :2.812  
# 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
# Max.   :1.0000   Max.   :5.000   Max.   :8.000

AirPassengers is a specific type of vector called a time-series. Look at ?ts for more information.
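
If you do need AirPassengers as a data.frame, one way to sketch the conversion is to pair the observation times with the values. The column names time and passengers are my own choice, not something the dataset defines.

```r
ap_df <- data.frame(
  time = as.numeric(time(AirPassengers)),    # fractional years, e.g. 1949.000
  passengers = as.numeric(AirPassengers)     # the observed values
)

class(ap_df)
head(ap_df)
```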

A function that returns a dataset

Give this a try

loadDataSet <- function(name, pkg) {
  # load the dataset into a private environment, then return it by name
  e <- new.env()
  data(list = name, package = pkg, envir = e)
  get(name, envir = e)
}

loadDataSet("acme", "boot")

R Function to import data set and pipeline create variables based on field name/existence

Solution

Simply use regex to change the column names:

library(readr)  # provides read_table()

temp_set <- read_table(input_path)

names(temp_set) <- gsub(x = names(temp_set), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX")

Or equivalently with `names<-`() in the dplyr workflow:

temp_set <- read_table(input_path) %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))

Regex

The pattern = "^(.+)(\\d{4,4})$" breaks each name into two capturing groups:

Any prefix of positive length: .+
A year made up of exactly 4 digits: \\d{4,4} (equivalent to \\d{4})

The replacement = "\\1XXXX" then prepends the first group (\\1) to the code (XXXX); so the code essentially "replaces" the year.
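
To check the pattern on its own, you can run it against a plain character vector first. The vector old_names below is made up for illustration; it includes names that should not match (too few digits, or no prefix before the digits).

```r
pattern <- "^(.+)(\\d{4})$"
old_names <- c("MSA2003", "var2019", "other_var", "x123", "2020")

new_names <- gsub(pattern = pattern, replacement = "\\1XXXX", x = old_names)
new_names
#> [1] "MSAXXXX"   "varXXXX"   "other_var" "x123"      "2020"
```

"x123" is left alone because it ends in only three digits, and "2020" is left alone because .+ needs at least one character before the four digits.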

Example

Here are two possible cases, where the MSAXXXX column starts as MSA2003 and as MSA2013 respectively:

case_1 <- data.frame(
  MSA2003 = c(41929, 33820, 27642, 88111),
  var2019 = c(41929, 33820, 27642, 88111),
  other_var = 1:4
)
case_1
#>   MSA2003 var2019 other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

case_2 <- data.frame(
  MSA2013 = c(41929, 33820, 27642, 88111),
  var2009 = c(41929, 33820, 27642, 88111),
  other_var = 1:4
)
case_2
#>   MSA2013 var2009 other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

Notice how the solution standardizes all variables with years in their names, yet leaves the other variables untouched:

library(dplyr)

case_1 %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#>   MSAXXXX varXXXX other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4

case_2 %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#>   MSAXXXX varXXXX other_var
#> 1   41929   41929         1
#> 2   33820   33820         2
#> 3   27642   27642         3
#> 4   88111   88111         4
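
If you'd rather not depend on dplyr, the same renaming can be sketched in base R with setNames(). The helper name standardize_year_names is hypothetical, and the two data frames are shortened versions of the cases above. Once both have the same column names they can be stacked with rbind().

```r
# Hypothetical helper: replace a trailing 4-digit year in each column name
standardize_year_names <- function(df) {
  setNames(df, sub("^(.+)\\d{4}$", "\\1XXXX", names(df)))
}

case_1 <- data.frame(MSA2003 = c(41929, 33820), var2019 = c(41929, 33820), other_var = 1:2)
case_2 <- data.frame(MSA2013 = c(27642, 88111), var2009 = c(27642, 88111), other_var = 3:4)

# after standardizing, both frames share column names and can be combined
combined <- rbind(standardize_year_names(case_1), standardize_year_names(case_2))
names(combined)
```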