Load dataset from R package using data(), assign it directly to a variable?
Using a helper function:
# define this function
getdata <- function(...) {
  e <- new.env()
  name <- data(..., envir = e)[1]
  e[[name]]
}
# now load your data calling getdata()
x <- getdata("faithful")
Or using an anonymous function:
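# look the object up in the same environment that data() loaded it into
x <- (function(...) { e <- new.env(); get(data(..., envir = e)[1], envir = e) })("faithful")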
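To check that the helper approach leaves the global environment untouched, a small sketch like the following can be run. It repeats the getdata() helper so the block stands on its own, and it uses the built-in mtcars dataset and the variable name cars purely as illustrative choices.

```r
# helper repeated from the answer above so this block is self-contained
getdata <- function(...) {
  e <- new.env()
  name <- data(..., envir = e)[1]
  e[[name]]
}

cars <- getdata("mtcars")   # the dataset lands in a variable of our choosing
class(cars)                 # "data.frame"

# the dataset was loaded into a throwaway environment, so in a fresh session
# no object called "mtcars" has been created in the global environment
exists("mtcars", envir = globalenv(), inherits = FALSE)
```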
Lazy evaluation
You should also consider lazy loading your data by adding LazyData: true to the DESCRIPTION file of your package.
If you use RStudio, after running data("faithful") you'll see in the Environment panel that the "faithful" data.frame is labelled a "promise" (another, less common name is "thunk") and is greyed out. That means it is lazily evaluated by R and has not yet been loaded into memory. You can even lazily assign the "x" variable with the delayedAssign() function:
data("faithful") # lazy load "faithful"
delayedAssign("x", faithful) # lazy assign "x" with a reference to "faithful"
rm(faithful) # remove "faithful"
# nothing has been loaded into memory yet
summary(x) # now x has been loaded and evaluated
Learn more about lazy evaluation here.
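One way to see for yourself when a promise is actually evaluated is to give it a visible side effect. The sketch below is only an illustration: the names evaluated and n_rows are made up, and the side effect exists solely to reveal the moment of evaluation.

```r
evaluated <- FALSE

# the expression is stored as a promise and is not run yet
delayedAssign("n_rows", {
  evaluated <- TRUE   # side effect that reveals when evaluation happens
  nrow(faithful)
})

evaluated   # FALSE: the promise has not been forced
n_rows      # 272: the first access forces the promise
evaluated   # TRUE: the expression has now been run
```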
How to load and use data file with R whose name is in a variable?
If you're trying to access data programmatically when you only have the object's name in a character vector, you can use get().
library(ChIPpeakAnno)
assembly <- 'TSS.human.NCBI36'
data(list=c(assembly))
# Now store the data into 'dat'
dat <- get(assembly)
# Now you can use 'dat' anywhere you would normally use TSS.human.NCBI36
head(start(dat))
#[1] 1873 4274 20229 24417 24417 42912
head(start(TSS.human.NCBI36))
#[1] 1873 4274 20229 24417 24417 42912
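The same idea works without ChIPpeakAnno installed. As a sketch, assuming the built-in datasets package and the illustrative names dataset_names and e, data() can load into a separate environment, and get() (or mget() for several names at once) can then read from it:

```r
dataset_names <- c("ToothGrowth", "women")   # names held in a character vector

e <- new.env()
data(list = dataset_names, package = "datasets", envir = e)

dat <- get(dataset_names[1], envir = e)    # one object, looked up by name
all_dat <- mget(dataset_names, envir = e)  # a named list with both objects

dim(dat)         # 60 3
names(all_dat)   # "ToothGrowth" "women"
```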
R: Using the names function on a dataset created within a loop
A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above):
import <- lapply(files, read.csv, header=FALSE)
Then if you want to operate on each data.frame in the list using a loop, you easily can:
for (i in seq_along(import)) names(import[[i]]) <- c('xxx', 'yyy')
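For a self-contained illustration of the list approach, the sketch below writes two small CSV files with made-up values to a temporary directory, reads them into a list, labels each element with its file name, and renames the columns with lapply() instead of a loop. The file names a.csv and b.csv are hypothetical.

```r
# create two throwaway CSV files (made-up data, no header row)
tmp <- tempfile("csvdemo")
dir.create(tmp)
write.table(data.frame(1:3, 4:6), file.path(tmp, "a.csv"), sep = ",",
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(7:8, 9:10), file.path(tmp, "b.csv"), sep = ",",
            row.names = FALSE, col.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
import <- lapply(files, read.csv, header = FALSE)

names(import) <- basename(files)                     # label each element by its file
import <- lapply(import, setNames, c("xxx", "yyy"))  # rename columns in every data.frame

names(import)             # "a.csv" "b.csv"
names(import[["a.csv"]])  # "xxx" "yyy"
```

Keeping the file name as the list name makes it easy to trace each data.frame back to its source.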
How can I load an object into a variable name that I specify from an R data file?
If you're just saving a single object, don't use an .RData file; use an .rds file:
x <- 5
saveRDS(x, "x.rds")
y <- readRDS("x.rds")
all.equal(x, y)
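If you are handed an existing .RData file and cannot switch formats, a possible workaround is to load it into a fresh environment and pull the object out under a name of your choosing. This is a sketch only; rdata_file, e and loaded are illustrative names, and the file is created here just so the example runs.

```r
x <- 5
rdata_file <- tempfile(fileext = ".RData")
save(x, file = rdata_file)

e <- new.env()
loaded <- load(rdata_file, envir = e)  # load() returns the names of the restored objects
y <- e[[loaded[1]]]                    # bind the first restored object to our own name

loaded   # "x"
y        # 5
```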
Load R datasets in dataframes
Calling data(AirPassengers) adds a promise pointing to AirPassengers to your global environment. Once you use the AirPassengers object, it will be loaded into your global environment. You can either just use the AirPassengers object like you would any object, or you can copy it to another variable, e.g.:
data(AirPassengers)
dat <- AirPassengers
If you run class on AirPassengers you will see that AirPassengers is not a data.frame. Try looking at the differences here:
class(AirPassengers)
# [1] "ts"
data(mtcars)
class(mtcars)
# [1] "data.frame"
summary(mtcars)
#       mpg             cyl             disp             hp
#  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
#  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
#  Median :19.20   Median :6.000   Median :196.3   Median :123.0
#  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
#  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
#  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
#       drat             wt             qsec             vs
#  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
#  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
#  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
#  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
#  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
#  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
#        am              gear            carb
#  Min.   :0.0000   Min.   :3.000   Min.   :1.000
#  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
#  Median :0.0000   Median :4.000   Median :2.000
#  Mean   :0.4062   Mean   :3.688   Mean   :2.812
#  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
#  Max.   :1.0000   Max.   :5.000   Max.   :8.000
AirPassengers is a specific type of vector called a time series. Look at ?ts for more information.
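If a data.frame is what you actually need, one possible conversion is sketched below. The column names time, month and count are arbitrary choices for illustration; time() and cycle() are the standard accessors for a ts object.

```r
passengers <- data.frame(
  time  = as.numeric(time(AirPassengers)),   # decimal year of each observation
  month = as.integer(cycle(AirPassengers)),  # position within the year, 1 to 12
  count = as.numeric(AirPassengers)          # the observed values
)

class(passengers)   # "data.frame"
nrow(passengers)    # 144
head(passengers, 3)
```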
A function that returns a dataset
Give this a try:
loadDataSet <- function(name, pkg) {
  # by default data() loads the dataset into the global environment
  data(list = name, package = pkg)
  # look the object up by its name and return it
  get(name)
}
loadDataSet("acme", "boot")
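A variation on this idea, sketched below under the hypothetical name load_dataset_checked, loads into a private environment so nothing is written to the global environment, and turns the warning that data() emits for an unknown dataset into an error. The built-in women dataset is used only as an example.

```r
load_dataset_checked <- function(name, pkg) {
  e <- new.env()
  # data() only warns when a dataset is missing; promote that warning to an error
  tryCatch(
    data(list = name, package = pkg, envir = e),
    warning = function(w) {
      stop("dataset '", name, "' not found in package '", pkg, "'")
    }
  )
  get(name, envir = e, inherits = FALSE)
}

women_df <- load_dataset_checked("women", "datasets")
dim(women_df)   # 15 2
```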
R Function to import data set and pipeline create variables based on field name/existence
Solution
Simply use regex to change the column names:
library(readr)  # provides read_table()
temp_set <- read_table(input_path)
names(temp_set) <- gsub(x = names(temp_set), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX")
Or equivalently with `names<-`() in the dplyr workflow:
library(dplyr)  # provides %>%
temp_set <- read_table(input_path) %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
Regex
The pattern = "^(.+)(\\d{4,4})$" breaks each name into two capturing groups:
- Any prefix of positive length: .+
- A year consisting of 4 digits: \\d{4,4}
The replacement = "\\1XXXX" then appends the code (XXXX) to the first group (\\1), so the code essentially "replaces" the year.
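To see how the pattern behaves on edge cases, it can be applied to a plain character vector. The names below are invented for illustration, and the quantifier is written as {4}, which matches exactly the same strings as {4,4}.

```r
col_names <- c("MSA2003", "var2019", "other_var", "id12345", "2003")

renamed <- gsub(pattern = "^(.+)(\\d{4})$", replacement = "\\1XXXX", x = col_names)
renamed
# "MSAXXXX" "varXXXX" "other_var" "id1XXXX" "2003"
```

Only the last four digits are replaced when a name ends in more than four ("id12345" becomes "id1XXXX"), and a name consisting of nothing but the year ("2003") is left alone because the pattern requires a non-empty prefix.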
Example
Here are two possible cases, where the MSAXXXX column starts as MSA2003 and as MSA2013 respectively:
case_1 <- data.frame(
  MSA2003 = c(41929, 33820, 27642, 88111),
  var2019 = c(41929, 33820, 27642, 88111),
  other_var = 1:4
)
case_1
#> MSA2003 var2019 other_var
#> 1 41929 41929 1
#> 2 33820 33820 2
#> 3 27642 27642 3
#> 4 88111 88111 4
case_2 <- data.frame(
  MSA2013 = c(41929, 33820, 27642, 88111),
  var2009 = c(41929, 33820, 27642, 88111),
  other_var = 1:4
)
case_2
#> MSA2013 var2009 other_var
#> 1 41929 41929 1
#> 2 33820 33820 2
#> 3 27642 27642 3
#> 4 88111 88111 4
Notice how the solution standardizes all variables with years in their names, yet leaves the other variables untouched:
library(dplyr)
case_1 %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#> MSAXXXX varXXXX other_var
#> 1 41929 41929 1
#> 2 33820 33820 2
#> 3 27642 27642 3
#> 4 88111 88111 4
case_2 %>%
  `names<-`(gsub(x = names(.), pattern = "^(.+)(\\d{4,4})$", replacement = "\\1XXXX"))
#> MSAXXXX varXXXX other_var
#> 1 41929 41929 1
#> 2 33820 33820 2
#> 3 27642 27642 3
#> 4 88111 88111 4
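If you would rather not depend on dplyr, the same renaming can be wrapped in a small base R function. This is a sketch: standardize_year_names and its code argument are hypothetical names, and case_3 is a made-up input.

```r
standardize_year_names <- function(df, code = "XXXX") {
  # swap a trailing 4-digit year for the code, leaving other names as they are
  names(df) <- sub("^(.+)\\d{4}$", paste0("\\1", code), names(df))
  df
}

case_3 <- data.frame(MSA1999 = 1:2, other_var = 3:4)
names(standardize_year_names(case_3))
# "MSAXXXX" "other_var"
```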