How can I read multiple (excel) files into R?
With list.files
you can create a list of all the filenames in your workingdirectory. Next you can use lapply
to loop over that list and read each file with the read_excel
function from the readxl
package:
library(readxl)
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)
This method can off course also be used with other file reading functions like read.csv
or read.table
. Just replace read_excel
with the appropriate file reading function and make sure you use the correct pattern in list.files
.
If you also want to include the files in subdirectories, use:
file.list <- list.files(pattern='*.xlsx', recursive = TRUE)
Other possible packages for reading Excel-files: openxlsx & xlsx
Supposing the columns are the same for each file, you can bind them together in one dataframe with bind_rows
from dplyr:
library(dplyr)
df <- bind_rows(df.list, .id = "id")
or with rbindlist
from data.table:
library(data.table)
df <- rbindlist(df.list, idcol = "id")
Both have the option to add a id
column for identifying the separate datasets.
Update: If you don't want a numeric identifier, just use sapply
with simplify = FALSE
to read the files in file.list
:
df.list <- sapply(file.list, read.csv, simplify=FALSE)
When using bind_rows
from dplyr or rbindlist
from data.table, the id
column now contains the filenames.
Even another approach is using the purrr
-package:
library(purrr)
file.list <- list.files(pattern='*.csv')
file.list <- setNames(file.list, file.list) # only needed when you need an id-column with the file-names
df <- map_df(file.list, read.csv, .id = "id")
Other approaches to getting a named list: If you don't want just a numeric identifier, than you can assign the filenames to the dataframes in the list before you bind them together. There are several ways to do this:
# with the 'attr' function from base R
attr(df.list, "names") <- file.list
# with the 'names' function from base R
names(df.list) <- file.list
# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)
Now you can bind the list of dataframes together in one dataframe with rbindlist
from data.table or bind_rows
from dplyr. The id
column will now contain the filenames instead of a numeric indentifier.
Reading all sheets in multiple excel files into R
You could try with readxl
...
I've not tested this for the case of different workbooks with duplicate worksheet names.
There were a number of issues with your code:
- the list.files pattern included a
.
which is a reserved character so needs to be escaped with\\
- As @deschen pointed out the excel referring functions are from the
openxlsx
package
library(readxl)
files.list <- list.files(recursive = T, pattern = '*\\.xlsx$') #get files list from folder
for (i in seq_along(files.list)){
sheet_nm <- excel_sheets(files.list[i])
for (j in seq_along(sheet_nm)){
assign(x = sheet_nm[j], value = read_xlsx(path = files.list[i], sheet = sheet_nm[j]), envir = .GlobalEnv)
}
}
Created on 2022-01-31 by the reprex package (v2.0.1)
Read one worksheet from multiple excel files using purrr and readxl and add field
Supposing the two packs.xlsx files are in different subfolders:
library(readxl)
filenames <- list.files(pattern = "packs.xlsx", recursive = TRUE)
df <- lapply(filenames, function(fn) {
# get the sheet detail
xl <- read_excel(fn, sheet = "summary")
# add the filename as a field
xl$filename <- fn
# function return
xl
})
# if both summary sheets have the same format, you can combine them into one
fin <- do.call(rbind, df)
Read in Multiple Excel Files with Different Layouts in R
So, the easiest solution I can think of here is something like this.
First, import the xlsx
files with colNames = FALSE
like so:
data <- files %>%
map(~readWorkbook(file.path(getwd(), .), sheet = "Sheet1", colNames = FALSE))
Now all you need to do is
- remove the first row if it contains "Result" in the first column
- assign each xlsx
file to its own data frame (optional)
- set the column names for each of these files (optional)
This can be done like so:
for(i in 1:length(data)){
data[[i]] %<>% filter(X1 != "Result") #Alternatively data[[i]] <- data[[i]] %>% filter(X1 != "Result")
assign(paste0("FileName", i), as.data.frame(data[[i]]))
names(paste0("FileName", i)) <- c("Names", "For", "Your", "Columns")
}
Please note the usage of the reverse pipe %<>%
(from the package magrittr
) in the first statement inside the for loop.
Note: this will remove any and all rows that contain the string "Result" in the first column.
Iteratively read, manipulate multiple excel files and append them into one dataframe using R
You could try:
library(fs)
library(readxl)
file_paths = list.files("./test/", pattern = "*.xlsx")
df = data.frame()
for(i in file_paths){
df_temp = read_xlsx(path=paste0("./test/", i))
df_temp$`1` = names(df_temp)[2]
names(df_temp) = c("date", "value", "name")
df = rbind(df, df_temp)
}
rm(df_temp)
Output:
> df
# A tibble: 21 x 3
date value name
<dttm> <dbl> <chr>
1 2021-01-07 00:00:00 76.5 J01-J05
2 2021-01-08 00:00:00 93.5 J01-J05
3 2021-01-15 00:00:00 305 J01-J05
4 2021-01-22 00:00:00 -289 J01-J05
5 2021-01-29 00:00:00 -242. J01-J05
6 2021-02-05 00:00:00 -266 J01-J05
7 2021-02-10 00:00:00 -240. J01-J05
8 2021-01-07 00:00:00 323 J01-J09
9 2021-01-08 00:00:00 318. J01-J09
10 2021-01-15 00:00:00 528. J01-J09
# ... with 11 more rows
Update, with function:
read_excel = function(name) {
df_temp = read_xlsx(path=paste0("./test/", name))
df_temp$`1` = names(df_temp)[2]
names(df_temp) = c("date", "value", "name")
return(df_temp)
}
df = do.call(rbind, lapply(file_paths, read_excel))
Related Topics
Loop Character Values in Ggtitle
Create an Expression from a Function for Data.Table to Eval
Pandoc Insert Appendix After Bibliography
Enter New Column Names as String in Dplyr's Rename Function
Using ':=' in Data.Table to Sum the Values of Two Columns in R, Ignoring Nas
How to Extract Sheet Names from Excel File in R
Grouped Barplot with Cut Y Axis
How to Read \" Double-Quote Escaped Values with Read.Table in R
Subtracting Values Group-Wise by the Average of Each Group in R
R Reading in a Zip Data File Without Unzipping It
Return Most Frequent String Value for Each Group
R Convert Between Zoo Object and Data Frame, Results Inconsistent for Different Numbers of Columns
How to Use the Row.Names Attribute to Order the Rows of My Dataframe in R
Combining Multiple Complex Plots as Panels in a Single Figure