Loading Many Files at Once

Loading many files at once?

lapply works, but you have to specify that you want the objects loaded into .GlobalEnv; otherwise they're loaded into the temporary evaluation environment that lapply creates (and destroys).

lapply(file_names, load, envir = .GlobalEnv)
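A self-contained sketch of the same call, assuming two small hypothetical objects a and b saved to .RData files under tempdir(). Because load() returns the names of the objects it restores, wrapping the lapply() call in unlist() also gives a record of what was loaded.

```r
# Hypothetical example data: save two objects to separate .RData files
tmp_dir <- tempdir()
a <- 1:3
b <- letters[1:3]
save(a, file = file.path(tmp_dir, "a.RData"))
save(b, file = file.path(tmp_dir, "b.RData"))
rm(a, b)

file_names <- file.path(tmp_dir, c("a.RData", "b.RData"))

# envir = .GlobalEnv makes the restored objects outlive the lapply() call;
# load() returns the names of the objects it restored
loaded <- unlist(lapply(file_names, load, envir = .GlobalEnv))
print(loaded)
```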

Loading multiple files into R at the same time (with similar file names)

One solution is to parse the file names and assign them as names to elements in a list of data frames. We'll use some sample data that has monthly sales for beer brands across two years that were saved as CSV files into two subdirectories, year1 and year2.

We will use lapply() to read the files into a list of data frames, and then use the names() function to name each element by prefixing the file name (minus .csv) with its subdirectory, e.g. year1.beer.

fileList <- c("year1/beer.csv", "year2/beer.csv")

# read each CSV into one element of a list of data frames
data <- lapply(fileList, function(x) {
  read.csv(x)
})

# generate data set names to be assigned to elements in the list:
# "year1/beer.csv" splits into c("year1", "beer", "csv")
fileNameTokens <- strsplit(fileList, "/|[.]")

theNames <- unlist(lapply(fileNameTokens, function(x) {
  paste0(x[1], ".", x[2])
}))
names(data) <- theNames

# print first six rows of file 1 based on named extract
data[["year1.beer"]][1:6, ]

...and the output.

> data[["year1.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471

Next, we'll print the first few rows of the second file.

> # print first six rows of file 2 based on named extract
> data[["year2.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 23847
2     2 Budweiser 33847
3     3 Budweiser 44400
4     4 Budweiser 35333
5     5 Budweiser 18710
6     6 Budweiser 63108
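To try the named-list approach without the original beer data, the sketch below writes two small hypothetical CSV files (made-up Sales figures) into year1/ and year2/ under tempdir(). Because those are absolute paths, it swaps the strsplit() parsing, which assumes relative paths of the form year<x>/beer.csv, for basename(), dirname() and tools::file_path_sans_ext(); the resulting names are the same year<x>.beer form.

```r
# Hypothetical sample data written under tempdir(); not the original beer figures
root <- file.path(tempdir(), "beer_demo")
for (yr in c("year1", "year2")) {
  dir.create(file.path(root, yr), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(Month = 1:3, Item = "Budweiser", Sales = c(100, 200, 300)),
            file.path(root, yr, "beer.csv"), row.names = FALSE)
}

fileList <- file.path(root, c("year1", "year2"), "beer.csv")

# read each CSV into one element of a list of data frames
data <- lapply(fileList, read.csv)

# name = parent directory + "." + file name without extension, e.g. "year1.beer"
names(data) <- paste0(basename(dirname(fileList)), ".",
                      tools::file_path_sans_ext(basename(fileList)))

names(data)
data[["year1.beer"]]
```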

If you need to access the data frames directly, without going through the list names, you can assign them to the global environment from within the lapply() function using assign() with pos = 1, as noted in the other answer.

# alternate form, assigning directly to the global environment (pos = 1)
data <- lapply(fileList, function(x) {
  # x is the file name; parse it into tokens to generate the data set name
  fileNameTokens <- unlist(strsplit(x, "/|[.]"))
  assign(paste0(fileNameTokens[1], ".", fileNameTokens[2]), read.csv(x), pos = 1)
})
head(year1.beer)

...and the output.

> head(year1.beer)
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
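As an alternative to calling assign() inside lapply(), a named list such as the data list built in the first answer can be copied into the global environment in one step with base R's list2env(). The sketch uses a small hypothetical list so it runs on its own.

```r
# Hypothetical named list standing in for the list of data frames read earlier
data <- list(
  year1.beer = data.frame(Month = 1:2, Sales = c(10, 20)),
  year2.beer = data.frame(Month = 1:2, Sales = c(30, 40))
)

# create one object per list element, named after that element;
# invisible() suppresses the printed environment that list2env() returns
invisible(list2env(data, envir = .GlobalEnv))

head(year1.beer)
```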

The same assign() technique also works with RDS files, provided fileList holds the paths to .rds files (e.g. "year1/beer.rds") rather than CSVs:

data <- lapply(fileList, function(x) {
  # x is the file name; parse it into tokens to generate the data set name
  fileNameTokens <- unlist(strsplit(x, "/|[.]"))
  assign(paste0(fileNameTokens[1], ".", fileNameTokens[2]), readRDS(x), pos = 1)
})
head(year1.beer)

...and the output.

> head(year1.beer)
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
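Since readRDS() returns the stored object instead of restoring it under a fixed name, the list-based approach also carries over directly to RDS files. A self-contained sketch, assuming two hypothetical .rds files saved under tempdir():

```r
# Hypothetical example: save two data frames as .rds files
rds_dir <- file.path(tempdir(), "rds_demo")
dir.create(rds_dir, showWarnings = FALSE)
saveRDS(data.frame(Month = 1:3, Sales = c(5, 6, 7)), file.path(rds_dir, "year1.rds"))
saveRDS(data.frame(Month = 1:3, Sales = c(8, 9, 10)), file.path(rds_dir, "year2.rds"))

rds_files <- list.files(rds_dir, pattern = "\\.rds$", full.names = TRUE)

# readRDS() returns the object, so each element can be named after its file
rds_data <- lapply(rds_files, readRDS)
names(rds_data) <- tools::file_path_sans_ext(basename(rds_files))

str(rds_data)
```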

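Loading multiple .RData files and binding them into a single data.frame

Because load() returns the names of the objects it restores, you can use get() to return the data from the calling environment, or alternatively load the files into a new environment and bind the objects afterwards. Note that .RData files can contain multiple objects: the get() version expects one object per file, and in either case the objects need to be conformable. Under those assumptions, you could do: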

library(purrr)
library(dplyr)

df1 <- data.frame(X = 1:10)
df2 <- data.frame(X = 1:10)

save(df1, file = "df1.RData", compress = "xz")
save(df2, file = "df2.RData", compress = "xz")

list.files(pattern = "\\.RData$") %>%
map_df(~ get(load(file = .x)))

    X
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10
11  1
12  2
13  3
14  4
15  5
16  6
17  7
18  8
19  9
20 10
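If you would rather avoid the purrr and dplyr dependencies, base R can do the same with lapply() and do.call(rbind, ...). This sketch keeps the same assumptions (one conformable data frame per .RData file) and writes two hypothetical files to tempdir() so it runs on its own.

```r
# Hypothetical example files, each containing one conformable data frame
rdata_dir <- file.path(tempdir(), "rdata_demo")
dir.create(rdata_dir, showWarnings = FALSE)
df1 <- data.frame(X = 1:3)
df2 <- data.frame(X = 4:6)
save(df1, file = file.path(rdata_dir, "df1.RData"))
save(df2, file = file.path(rdata_dir, "df2.RData"))

rdata_files <- list.files(rdata_dir, pattern = "\\.RData$", full.names = TRUE)

# load() restores the object inside the anonymous function's environment and
# returns its name; get() then looks that name up in the same environment
combined <- do.call(rbind, lapply(rdata_files, function(f) get(load(f))))

combined
```

On recent purrr versions (1.0.0 and later), map_df() is superseded; the equivalent there is map() followed by list_rbind().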

Or:

temp_env <- new.env()

list.files(pattern = "\\.RData$") %>%
map(~load(file = .x, envir = temp_env))

bind_rows(as.list(temp_env))
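One caveat with the temp_env version: as.list() on an environment does not sort its result by default, so the rows are not guaranteed to come out in file order. A base R sketch that loads into a fresh environment and binds the objects in alphabetical order of their names, assuming hypothetical objects df1 and df2 saved under tempdir():

```r
# Hypothetical example files, each containing one data frame
env_dir <- file.path(tempdir(), "env_demo")
dir.create(env_dir, showWarnings = FALSE)
df1 <- data.frame(X = 1:3)
df2 <- data.frame(X = 4:6)
save(df1, file = file.path(env_dir, "df1.RData"))
save(df2, file = file.path(env_dir, "df2.RData"))

# load every file into a separate environment, away from the global workspace
temp_env <- new.env()
for (f in list.files(env_dir, pattern = "\\.RData$", full.names = TRUE)) {
  load(f, envir = temp_env)
}

# ls() returns names sorted alphabetically, so mget() yields a predictable order
objs <- mget(ls(temp_env), envir = temp_env)
combined <- do.call(rbind, unname(objs))
combined
```

With dplyr, as.list(temp_env, sorted = TRUE) gives the same ordering before bind_rows().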

How to load multiple files into one file using Informatica

Use an indirect file load, driven by a list file, to read all the files together. Then use a Sorter on col2 to order the data, and finally write the result to a target file.

The whole mapping should look like this:

SQ --> EXP --> SRT (key = col2) --> Target

A few things to note:

  • In the session, set the source file type to Indirect and give the list file name, filelist1.txt.
  • Run ls -1 file* > filelist1.txt as a pre-session command to create a file list containing all the required files.
  • Expression transformation: convert col2 to INTEGER if it comes through the Source Qualifier (SQ) as a string.
  • Sorter transformation: use col2 as the key column.

How to load multiple CSV files into separate objects (data frames) in R based on file name?

Solution for anyone curious...

# match only the file00<n>.csv files so the loop runs once per file/exfile pair
files <- list.files(pattern = "^file00[0-9]+\\.csv$")

# collect the data frames in lists so later iterations don't overwrite earlier ones
file_objects <- list()
exfile_objects <- list()

for (file in seq_along(files)) {
  # build the names "file00<n>" and "exfile00<n>" for this iteration
  file_name <- paste0("file00", file)
  ex_file_name <- paste0("exfile00", file)

  file_objects[[file_name]] <- read.csv(file = paste0(file_name, ".csv"), fileEncoding = "UTF-8-BOM")
  exfile_objects[[ex_file_name]] <- read.csv(file = paste0(ex_file_name, ".csv"), fileEncoding = "UTF-8-BOM")
}

Essentially, build the file name within the loop, then pass it to read.csv() on each iteration.
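A more general sketch that does not depend on the file00/exfile00 numbering: list every CSV in a folder and assign each one to an object named after its file, without the .csv extension. The folder and file names are hypothetical and written to tempdir() so the example runs on its own. Add fileEncoding = "UTF-8-BOM" to read.csv(), as in the answer above, if your files carry a byte-order mark; file names that are not syntactic R names (for example ones starting with a digit) need backticks to access afterwards.

```r
# Hypothetical CSV files written to a temporary folder
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2), file.path(csv_dir, "file001.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4), file.path(csv_dir, "exfile001.csv"), row.names = FALSE)

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# one object per file, named after the file without its extension
for (f in files) {
  assign(tools::file_path_sans_ext(basename(f)), read.csv(f))
}

file001
exfile001
```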
