Read Multiple Xlsx Files with Multiple Sheets into One R Data Frame

Importing multiple .xlsx files with multiple sheets in R

Suppose we have two files with two worksheets each:

library(tidyverse)
library(readxl)

list.files("~/data", full.names = TRUE)
#> [1] "/home/rstudio/data/data.xlsx" "/home/rstudio/data/data2.xlsx"
read_excel("/home/rstudio/data/data.xlsx", sheet = 1)
#> # A tibble: 2 x 2
#> a `1`
#> <chr> <dbl>
#> 1 b 2
#> 2 c NA
read_excel("/home/rstudio/data/data.xlsx", sheet = 2)
#> # A tibble: 2 x 2
#> d `4`
#> <chr> <dbl>
#> 1 e 5
#> 2 f 6

expand_grid(
  file = list.files("~/data", full.names = TRUE),
  sheet = seq(2)
) %>%
  transmute(data = file %>% map2(sheet, ~ read_excel(path = .x, sheet = .y))) %>%
  pull(data)
#> [[1]]
#> # A tibble: 2 x 2
#> a `1`
#> <chr> <dbl>
#> 1 b 2
#> 2 c NA
#>
#> [[2]]
#> # A tibble: 2 x 2
#> d `4`
#> <chr> <dbl>
#> 1 e 5
#> 2 f 6
#>
#> [[3]]
#> # A tibble: 2 x 2
#> a `1`
#> <chr> <dbl>
#> 1 b 2
#> 2 c NA
#>
#> [[4]]
#> # A tibble: 2 x 2
#> d `4`
#> <chr> <dbl>
#> 1 e 5
#> 2 f 6

Created on 2021-11-11 by the reprex package (v2.0.1)
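
Note that the grid above hard-codes seq(2), so it assumes every workbook has exactly two sheets. A minimal sketch that looks up the sheet names with readxl::excel_sheets() instead might look like this; the helper name read_sheet_grid and its dir argument are hypothetical, not part of the original answer:

```r
library(tidyverse)
library(readxl)

# Hypothetical helper: one row per (file, sheet) pair, with the contents of
# each sheet stored in a list-column called `data`.
read_sheet_grid <- function(dir) {
  tibble(file = list.files(dir, pattern = "\\.xlsx$", full.names = TRUE)) %>%
    mutate(sheet = map(file, excel_sheets)) %>% # sheet names of each workbook
    unnest(sheet) %>%                           # one row per sheet
    mutate(data = map2(file, sheet, ~ read_excel(path = .x, sheet = .y)))
}

# Usage, with the folder from the example above:
# read_sheet_grid("~/data") %>% pull(data)
```

Keeping the file and sheet columns next to the data also makes it easy to see afterwards where each table came from.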

Reading all sheets in multiple excel files into R

You could try with readxl...

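I've not tested this for the case of different workbooks with duplicate worksheet names. Because each sheet is assigned to a variable named after the sheet, a later sheet with the same name would overwrite the earlier one.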

There were a number of issues with your code:

  1. The list.files pattern is a regular expression, in which . matches any character, so a literal dot needs to be escaped with \\
  2. As @deschen pointed out, the Excel-related functions you used are from the openxlsx package

library(readxl)

files.list <- list.files(recursive = TRUE, pattern = "\\.xlsx$") # get the list of files from the folder

for (i in seq_along(files.list)) {
  sheet_nm <- excel_sheets(files.list[i]) # sheet names in this workbook

  for (j in seq_along(sheet_nm)) {
    # create one global variable per sheet, named after the sheet
    assign(x = sheet_nm[j], value = read_xlsx(path = files.list[i], sheet = sheet_nm[j]), envir = .GlobalEnv)
  }
}

Created on 2022-01-31 by the reprex package (v2.0.1)
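
If sheet names can repeat across workbooks, one option is to collect everything in a nested named list instead of calling assign(). This is only a sketch under that assumption; read_workbook and read_workbooks are hypothetical helper names:

```r
library(readxl)

# Hypothetical helper: read every sheet of one workbook into a list
# whose names are the sheet names.
read_workbook <- function(path) {
  sheets <- excel_sheets(path)
  out <- lapply(sheets, function(s) read_xlsx(path = path, sheet = s))
  names(out) <- sheets
  out
}

# Hypothetical helper: read every workbook under `dir` into a list keyed by
# file path, so identical sheet names in different files cannot collide.
read_workbooks <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.xlsx$", recursive = TRUE, full.names = TRUE)
  out <- lapply(files, read_workbook)
  names(out) <- files
  out
}

# Usage:
# all_data <- read_workbooks()
# all_data[[1]][["Sheet1"]]
```

This also keeps the global environment clean, since nothing is created outside the returned list.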

Read one worksheet from multiple excel files using purrr and readxl and add field

Supposing the two packs.xlsx files are in different subfolders:

library(readxl)

filenames <- list.files(pattern = "packs\\.xlsx$", recursive = TRUE)
df <- lapply(filenames, function(fn) {
  # read the summary sheet of this file
  xl <- read_excel(fn, sheet = "summary")

  # add the filename as a field
  xl$filename <- fn

  # function return
  xl
})

# if both summary sheets have the same format, you can combine them into one
fin <- do.call(rbind, df)
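
Since the question mentions purrr, roughly the same thing can be sketched with purrr::map() and dplyr::bind_rows(). Unlike rbind(), bind_rows() fills columns that are missing in one file with NA. The function name read_summaries and its arguments are hypothetical:

```r
library(readxl)
library(purrr)
library(dplyr)

# Hypothetical helper: read the given sheet from every packs.xlsx under `dir`
# and stack the results, recording the source path in a `filename` column.
read_summaries <- function(dir = ".", sheet = "summary") {
  filenames <- list.files(dir, pattern = "packs\\.xlsx$", recursive = TRUE, full.names = TRUE)
  filenames %>%
    set_names() %>%                           # name each element by its own path
    map(~ read_excel(.x, sheet = sheet)) %>%  # one tibble per file
    bind_rows(.id = "filename")               # list names become the filename column
}

# Usage:
# fin <- read_summaries()
```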

Read multiple xlsx files with multiple sheets into one R data frame

I would use a nested loop like this to go through each sheet of each file.
It might not be the fastest solution but it is the simplest.

library(xlsx)
files.list <- list.files(recursive = TRUE, pattern = "\\.xlsx$") # get the list of files from the folder

for (i in seq_along(files.list)) {
  wb <- loadWorkbook(files.list[i]) # select a file and load the workbook
  sheet <- getSheets(wb) # get the list of sheets

  for (j in seq_along(sheet)) {
    tmp <- read.xlsx(files.list[i], sheetIndex = j, colIndex = c(1:6, 8:10, 12:17, 19),
                     sheetName = NULL, startRow = 4, endRow = NULL,
                     as.data.frame = TRUE, header = FALSE)
    if (i == 1 && j == 1) dataset <- tmp else dataset <- rbind(dataset, tmp) # append to the previous rows
  }
}

You can clean NA values after the loading phase.
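
Calling rbind() inside the loop copies the growing data frame on every iteration. A possible variant, sketched here with readxl and dplyr rather than the xlsx package used above, collects the pieces in a list and binds them once, and also records which file and sheet each row came from. The function name read_all_sheets and the skip argument are assumptions for illustration, and the column selection from the original loop is not reproduced:

```r
library(readxl)
library(dplyr)

# Hypothetical helper: read every sheet of every .xlsx file under `dir`
# into a single data frame with extra `file` and `sheet` columns.
read_all_sheets <- function(dir = ".", skip = 0) {
  files <- list.files(dir, pattern = "\\.xlsx$", recursive = TRUE, full.names = TRUE)
  pieces <- list()
  for (f in files) {
    for (s in excel_sheets(f)) {
      tmp <- read_excel(f, sheet = s, skip = skip) # skip = rows to drop at the top
      tmp$file <- f   # source workbook
      tmp$sheet <- s  # source sheet
      pieces[[length(pieces) + 1]] <- tmp
    }
  }
  bind_rows(pieces) # bind once, after the loops
}

# Usage:
# dataset <- read_all_sheets()
```

bind_rows() matches columns by name, so this works best when the sheets share a layout; columns with the same name but incompatible types in different sheets would need to be converted first.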

Importing multiple excel sheets into one dataframe adding the sheet name as variable

Use bind_rows() from dplyr with .id = "sheet". The data from all sheets are row-bound together, and a new column, named after the value you pass to .id, records which sheet each row came from. import_list() is from the rio package and returns one data frame per sheet.

dplyr::bind_rows(
  rio::import_list("path/to/file/test.xlsx", setclass = "tbl"),
  .id = "sheet"
)


Test

Write out an excel file with 2 sheets named AUS and AUT:

openxlsx::write.xlsx(
  list(AUS = data.frame(x = 1:2, y = 3:4),
       AUT = data.frame(x = 5:6, y = 7:8)),
  file = "test.xlsx"
)

Then

dplyr::bind_rows(
  rio::import_list("test.xlsx", setclass = "tbl"),
  .id = "sheet"
)

# # A tibble: 4 × 3
# sheet x y
# <chr> <dbl> <dbl>
# 1 AUS 1 3
# 2 AUS 2 4
# 3 AUT 5 7
# 4 AUT 6 8

