How to Read in Multiple ".Xlsx" Files to R

How can I read multiple (excel) files into R?

With list.files you can create a list of all the filenames in your workingdirectory. Next you can use lapply to loop over that list and read each file with the read_excel function from the readxl package:

library(readxl)
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)

This method can off course also be used with other file reading functions like read.csv or read.table. Just replace read_excel with the appropriate file reading function and make sure you use the correct pattern in list.files.

If you also want to include the files in subdirectories, use:

file.list <- list.files(pattern='*.xlsx', recursive = TRUE)

Other possible packages for reading Excel-files: openxlsx & xlsx


Supposing the columns are the same for each file, you can bind them together in one dataframe with bind_rows from dplyr:

library(dplyr)
df <- bind_rows(df.list, .id = "id")

or with rbindlist from data.table:

library(data.table)
df <- rbindlist(df.list, idcol = "id")

Both have the option to add a id column for identifying the separate datasets.


Update: If you don't want a numeric identifier, just use sapply with simplify = FALSE to read the files in file.list:

df.list <- sapply(file.list, read.csv, simplify=FALSE)

When using bind_rows from dplyr or rbindlist from data.table, the id column now contains the filenames.

Even another approach is using the purrr-package:

library(purrr)
file.list <- list.files(pattern='*.csv')
file.list <- setNames(file.list, file.list) # only needed when you need an id-column with the file-names

df <- map_df(file.list, read.csv, .id = "id")

Other approaches to getting a named list: If you don't want just a numeric identifier, than you can assign the filenames to the dataframes in the list before you bind them together. There are several ways to do this:

# with the 'attr' function from base R
attr(df.list, "names") <- file.list
# with the 'names' function from base R
names(df.list) <- file.list
# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)

Now you can bind the list of dataframes together in one dataframe with rbindlist from data.table or bind_rows from dplyr. The id column will now contain the filenames instead of a numeric indentifier.

Reading all sheets in multiple excel files into R

You could try with readxl...

I've not tested this for the case of different workbooks with duplicate worksheet names.

There were a number of issues with your code:

  1. the list.files pattern included a . which is a reserved character so needs to be escaped with \\
  2. As @deschen pointed out the excel referring functions are from the openxlsx package
library(readxl)

files.list <- list.files(recursive = T, pattern = '*\\.xlsx$') #get files list from folder

for (i in seq_along(files.list)){

sheet_nm <- excel_sheets(files.list[i])

for (j in seq_along(sheet_nm)){

assign(x = sheet_nm[j], value = read_xlsx(path = files.list[i], sheet = sheet_nm[j]), envir = .GlobalEnv)
}

}

Created on 2022-01-31 by the reprex package (v2.0.1)

Read one worksheet from multiple excel files using purrr and readxl and add field

Supposing the two packs.xlsx files are in different subfolders:

library(readxl)

filenames <- list.files(pattern = "packs.xlsx", recursive = TRUE)
df <- lapply(filenames, function(fn) {
# get the sheet detail
xl <- read_excel(fn, sheet = "summary")

# add the filename as a field
xl$filename <- fn

# function return
xl
})

# if both summary sheets have the same format, you can combine them into one
fin <- do.call(rbind, df)

Read in Multiple Excel Files with Different Layouts in R

So, the easiest solution I can think of here is something like this.

First, import the xlsx files with colNames = FALSE like so:

data <- files %>%
map(~readWorkbook(file.path(getwd(), .), sheet = "Sheet1", colNames = FALSE))

Now all you need to do is
- remove the first row if it contains "Result" in the first column
- assign each xlsx file to its own data frame (optional)
- set the column names for each of these files (optional)

This can be done like so:

for(i in 1:length(data)){
data[[i]] %<>% filter(X1 != "Result") #Alternatively data[[i]] <- data[[i]] %>% filter(X1 != "Result")
assign(paste0("FileName", i), as.data.frame(data[[i]]))
names(paste0("FileName", i)) <- c("Names", "For", "Your", "Columns")
}

Please note the usage of the reverse pipe %<>% (from the package magrittr) in the first statement inside the for loop.

Note: this will remove any and all rows that contain the string "Result" in the first column.

Iteratively read, manipulate multiple excel files and append them into one dataframe using R

You could try:

library(fs)
library(readxl)

file_paths = list.files("./test/", pattern = "*.xlsx")

df = data.frame()

for(i in file_paths){
df_temp = read_xlsx(path=paste0("./test/", i))
df_temp$`1` = names(df_temp)[2]
names(df_temp) = c("date", "value", "name")
df = rbind(df, df_temp)
}

rm(df_temp)

Output:

> df
# A tibble: 21 x 3
date value name
<dttm> <dbl> <chr>
1 2021-01-07 00:00:00 76.5 J01-J05
2 2021-01-08 00:00:00 93.5 J01-J05
3 2021-01-15 00:00:00 305 J01-J05
4 2021-01-22 00:00:00 -289 J01-J05
5 2021-01-29 00:00:00 -242. J01-J05
6 2021-02-05 00:00:00 -266 J01-J05
7 2021-02-10 00:00:00 -240. J01-J05
8 2021-01-07 00:00:00 323 J01-J09
9 2021-01-08 00:00:00 318. J01-J09
10 2021-01-15 00:00:00 528. J01-J09
# ... with 11 more rows

Update, with function:

read_excel = function(name) {
df_temp = read_xlsx(path=paste0("./test/", name))
df_temp$`1` = names(df_temp)[2]
names(df_temp) = c("date", "value", "name")
return(df_temp)
}

df = do.call(rbind, lapply(file_paths, read_excel))


Related Topics



Leave a reply



Submit