How can I read multiple (excel) files into R?
With list.files
you can create a list of all the filenames in your workingdirectory. Next you can use lapply
to loop over that list and read each file with the read_excel
function from the readxl
package:
library(readxl)
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)
This method can off course also be used with other file reading functions like read.csv
or read.table
. Just replace read_excel
with the appropriate file reading function and make sure you use the correct pattern in list.files
.
If you also want to include the files in subdirectories, use:
file.list <- list.files(pattern='*.xlsx', recursive = TRUE)
Other possible packages for reading Excel-files: openxlsx & xlsx
Supposing the columns are the same for each file, you can bind them together in one dataframe with bind_rows
from dplyr:
library(dplyr)
df <- bind_rows(df.list, .id = "id")
or with rbindlist
from data.table:
library(data.table)
df <- rbindlist(df.list, idcol = "id")
Both have the option to add a id
column for identifying the separate datasets.
Update: If you don't want a numeric identifier, just use sapply
with simplify = FALSE
to read the files in file.list
:
df.list <- sapply(file.list, read.csv, simplify=FALSE)
When using bind_rows
from dplyr or rbindlist
from data.table, the id
column now contains the filenames.
Even another approach is using the purrr
-package:
library(purrr)
file.list <- list.files(pattern='*.csv')
file.list <- setNames(file.list, file.list) # only needed when you need an id-column with the file-names
df <- map_df(file.list, read.csv, .id = "id")
Other approaches to getting a named list: If you don't want just a numeric identifier, than you can assign the filenames to the dataframes in the list before you bind them together. There are several ways to do this:
# with the 'attr' function from base R
attr(df.list, "names") <- file.list
# with the 'names' function from base R
names(df.list) <- file.list
# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)
Now you can bind the list of dataframes together in one dataframe with rbindlist
from data.table or bind_rows
from dplyr. The id
column will now contain the filenames instead of a numeric indentifier.
R: Importing multiple excel files at once
Here what I like to do in this type of situation:
purrr::map_df(
.x = list.files(pattern = ".xlsx"),
.f = read_excel,
.id = "source"
)
Reading all sheets in multiple excel files into R
You could try with readxl
...
I've not tested this for the case of different workbooks with duplicate worksheet names.
There were a number of issues with your code:
- the list.files pattern included a
.
which is a reserved character so needs to be escaped with\\
- As @deschen pointed out the excel referring functions are from the
openxlsx
package
library(readxl)
files.list <- list.files(recursive = T, pattern = '*\\.xlsx$') #get files list from folder
for (i in seq_along(files.list)){
sheet_nm <- excel_sheets(files.list[i])
for (j in seq_along(sheet_nm)){
assign(x = sheet_nm[j], value = read_xlsx(path = files.list[i], sheet = sheet_nm[j]), envir = .GlobalEnv)
}
}
Created on 2022-01-31 by the reprex package (v2.0.1)
Read one worksheet from multiple excel files using purrr and readxl and add field
Supposing the two packs.xlsx files are in different subfolders:
library(readxl)
filenames <- list.files(pattern = "packs.xlsx", recursive = TRUE)
df <- lapply(filenames, function(fn) {
# get the sheet detail
xl <- read_excel(fn, sheet = "summary")
# add the filename as a field
xl$filename <- fn
# function return
xl
})
# if both summary sheets have the same format, you can combine them into one
fin <- do.call(rbind, df)
Read in Multiple Excel Files with Different Layouts in R
So, the easiest solution I can think of here is something like this.
First, import the xlsx
files with colNames = FALSE
like so:
data <- files %>%
map(~readWorkbook(file.path(getwd(), .), sheet = "Sheet1", colNames = FALSE))
Now all you need to do is
- remove the first row if it contains "Result" in the first column
- assign each xlsx
file to its own data frame (optional)
- set the column names for each of these files (optional)
This can be done like so:
for(i in 1:length(data)){
data[[i]] %<>% filter(X1 != "Result") #Alternatively data[[i]] <- data[[i]] %>% filter(X1 != "Result")
assign(paste0("FileName", i), as.data.frame(data[[i]]))
names(paste0("FileName", i)) <- c("Names", "For", "Your", "Columns")
}
Please note the usage of the reverse pipe %<>%
(from the package magrittr
) in the first statement inside the for loop.
Note: this will remove any and all rows that contain the string "Result" in the first column.
Related Topics
What Are the Differences Between "=" and "≪-" Assignment Operators
How to Debug "Contrasts Can Be Applied Only to Factors With 2 or More Levels" Error
Error in If/While (Condition) {: Missing Value Where True/False Needed
How to Select the Rows With Maximum Values in Each Group With Dplyr
What Specifically Are the Dangers of Eval(Parse(...))
Split Delimited Strings in a Column and Insert as New Rows
Find Complement of a Data Frame (Anti - Join)
Reshape Three Column Data Frame to Matrix ("Long" to "Wide" Format)
How to Implement Coalesce Efficiently in R
Combine Two Data Frames by Rows (Rbind) When They Have Different Sets of Columns
Why Is '[' Better Than 'Subset'
Aggregating by Unique Identifier and Concatenating Related Values into a String
Subset Rows Corresponding to Max Value by Group Using Data.Table
Is R'S Apply Family More Than Syntactic Sugar
Can Lists Be Created That Name Themselves Based on Input Object Names