How to add filename when importing multiple Excel files in R?
Assuming that your vector of file names, file.list, looks something like this:
file.list <- c("20200101-foo1.xls", "20210101-foo2.xls")
The trick is to make df.list a named list. You can use stringr::str_replace to make the YYYYMMDD names. In this example, remove everything from the "-" onwards:
library(stringr)
file.list %>% str_replace("-.+", "")
[1] "20200101" "20210101"
So supply the names to the list like this:
names(df.list) <- file.list %>%
str_replace("-.+", "")
And now bind_rows (from dplyr) with the .id argument will create a column with the names:
df <- bind_rows(df.list, .id = "filename")
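To see the naming step in isolation, here is a minimal sketch that needs no Excel files. The two toy data frames are hypothetical stand-ins for the imported sheets, and the binding is done in base R as a rough counterpart of bind_rows with .id, not as a replacement for it:

```r
# Toy data frames stand in for the imported Excel sheets (hypothetical data)
file.list <- c("20200101-foo1.xls", "20210101-foo2.xls")
df.list <- list(data.frame(x = 1:2), data.frame(x = 3:5))

# Name each list element after the date part of its file name
names(df.list) <- sub("-.+", "", file.list)

# Base R counterpart of bind_rows(df.list, .id = "filename"):
# prepend the element name as a column, then stack the pieces
pieces <- Map(function(d, nm) data.frame(filename = nm, d),
              df.list, names(df.list))
df <- do.call(rbind, unname(pieces))
```

Each row of df now carries the YYYYMMDD part of the file it came from in the filename column.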
Importing multiple Excel files with filenames in R
Figured it out myself. The key was to use rbind.fill instead of rbind.
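library(plyr)  # provides rbind.fill
library(xlsx)  # provides read.xlsx
# Extra arguments to read.xlsx are passed on to data.frame(),
# so FILENAMEVAR = x adds a column holding the file name
df.list <- lapply(filenames, function(x) read.xlsx(file = x, sheetIndex = 1,
  colIndex = 1:4, as.data.frame = TRUE, header = FALSE, FILENAMEVAR = x))
final.df <- rbind.fill(df.list)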
Import multiple Excel files with names in R as a list
You can do:
names(data.list) <- sub('\\.xlsx$', '', basename(data))
Or without any regex:
names(data.list) <- tools::file_path_sans_ext(basename(data))
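As a quick check, both approaches can be tried on a couple of sample paths. The paths below are hypothetical, and data is assumed to be a character vector of file paths as in the question:

```r
# Hypothetical file paths standing in for the vector 'data'
data <- c("data/2020/sales.xlsx", "data/2021/costs.xlsx")

# Regex approach: drop a trailing ".xlsx" from the base name
with_regex <- sub("\\.xlsx$", "", basename(data))

# Extension-agnostic approach from the tools package (ships with R)
without_regex <- tools::file_path_sans_ext(basename(data))

identical(with_regex, without_regex)  # TRUE
```

The tools version has the advantage of also handling other extensions, such as .xls, without changing the code.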
Read a folder of excel files and import individual sheets as separate df's in R with Names
You can do the following (note that I'm using the openxlsx package for reading in Excel files, but you can replace that part with readxl of course):
library(openxlsx)
library(tidyverse)
Starting with your `files_list` we can do:
# using lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(as.list(files_list), function(x) readWorkbook(x, sheet = "Balance"))
# Create a vector of names based on the first word of the filename + "Balance"
# Note that we can't use empty space in object names, hence the underscore
df_names <- paste0(str_extract(basename(files_list), "[^ ]+"), "_Balance_df")
# Assign the names to our list of dfs
names(list_of_dfs) <- df_names
# Push the list elements (i.e. data frames) to the Global environment
# I highly recommend NOT doing this. I'd say in 99% of the cases it's better to continue working in the list structure or combine the individual dfs into one large df.
list2env(list_of_dfs, envir = .GlobalEnv)
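If you follow the recommendation and stay in the list structure, the named list is already enough for most tasks. A minimal sketch, using hypothetical file names and toy data frames in place of the sheets read with readWorkbook, and base R sub in place of str_extract:

```r
# Hypothetical inputs standing in for the real files and imported sheets
files_list <- c("reports/Alpha 2020.xlsx", "reports/Beta 2020.xlsx")
list_of_dfs <- list(data.frame(balance = c(10, 20)), data.frame(balance = 5))

# Same naming rule as above, written in base R:
# keep the text before the first space in the file name
names(list_of_dfs) <- paste0(sub(" .*", "", basename(files_list)), "_Balance_df")

# Access one data frame by name
alpha <- list_of_dfs[["Alpha_Balance_df"]]

# Apply the same operation to every data frame
totals <- sapply(list_of_dfs, function(d) sum(d$balance))
```

Here totals is a named vector, so the link between each result and its source file is kept without creating any global objects.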
Read one worksheet from multiple excel files using purrr and readxl and add field
Supposing the two packs.xlsx files are in different subfolders:
library(readxl)
# pattern is a regular expression: escape the dot and anchor the end of the name
filenames <- list.files(pattern = "packs\\.xlsx$", recursive = TRUE)
df <- lapply(filenames, function(fn) {
  # get the sheet detail
  xl <- read_excel(fn, sheet = "summary")
  # add the filename as a field
  xl$filename <- fn
  # function return
  xl
})
# if both summary sheets have the same format, you can combine them into one
fin <- do.call(rbind, df)
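Whether the sheets really share the same format can be checked before combining. The following is a conservative sketch of such a check; the toy data frames and their column names are hypothetical stand-ins for the list returned by lapply:

```r
# Hypothetical stand-ins for the list returned by lapply() above
df <- list(
  data.frame(pack = "a", n = 1, filename = "x/packs.xlsx"),
  data.frame(pack = "b", n = 2, filename = "y/packs.xlsx")
)

# TRUE only if every data frame has the same column names in the same order
same_format <- length(unique(lapply(df, names))) == 1

if (same_format) {
  fin <- do.call(rbind, df)
} else {
  stop("The summary sheets do not share the same columns")
}
```

Failing early with an explicit message is usually easier to debug than the error rbind raises when the column names differ.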
How can I read multiple (excel) files into R?
With list.files you can create a list of all the filenames in your working directory. Next you can use lapply to loop over that list and read each file with the read_excel function from the readxl package:
library(readxl)
file.list <- list.files(pattern = "\\.xlsx$")  # pattern is a regular expression, not a glob
df.list <- lapply(file.list, read_excel)
This method can of course also be used with other file reading functions like read.csv or read.table. Just replace read_excel with the appropriate file reading function and make sure you use the correct pattern in list.files.
If you also want to include the files in subdirectories, use:
file.list <- list.files(pattern = "\\.xlsx$", recursive = TRUE)
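Because the pattern argument of list.files is a regular expression, it helps to try it on a throwaway folder first. A small sketch; the folder and file names are made up for illustration:

```r
# Build a throwaway folder with a few empty files (hypothetical names)
tmp <- file.path(tempdir(), "pattern_demo")
dir.create(file.path(tmp, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(tmp, c("a.xlsx", "b.csv", "xlsx_notes.txt", "sub/c.xlsx")))

# "\\.xlsx$" matches a literal ".xlsx" at the end of the file name only,
# so "xlsx_notes.txt" is not picked up
top_level <- list.files(tmp, pattern = "\\.xlsx$")

# recursive = TRUE also descends into subfolders
all_levels <- list.files(tmp, pattern = "\\.xlsx$", recursive = TRUE)
```

With these files, top_level holds only a.xlsx, while all_levels also includes the file in the sub folder, with its relative path.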
Other possible packages for reading Excel-files: openxlsx & xlsx
Supposing the columns are the same for each file, you can bind them together in one dataframe with bind_rows from dplyr:
library(dplyr)
df <- bind_rows(df.list, .id = "id")
or with rbindlist from data.table:
library(data.table)
df <- rbindlist(df.list, idcol = "id")
Both have the option to add an id column for identifying the separate datasets.
Update: If you don't want a numeric identifier, just use sapply with simplify = FALSE to read the files in file.list:
df.list <- sapply(file.list, read_excel, simplify = FALSE)
When using bind_rows from dplyr or rbindlist from data.table, the id column now contains the filenames.
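The difference between lapply and sapply with simplify = FALSE can be seen in a small self-contained sketch. CSV files are used here only because base R can write them without extra packages; the folder, file names, and data are hypothetical:

```r
# Write two small CSV files to a temporary folder (hypothetical data)
tmp <- file.path(tempdir(), "sapply_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(tmp, "first.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(tmp, "second.csv"), row.names = FALSE)

file.list <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# lapply() returns an unnamed list; sapply(..., simplify = FALSE) names each
# element after the corresponding entry of file.list
unnamed <- lapply(file.list, read.csv)
named <- sapply(file.list, read.csv, simplify = FALSE)

is.null(names(unnamed))   # TRUE
basename(names(named))    # "first.csv" "second.csv"
```

It is these list names that bind_rows and rbindlist pick up when they fill the id column.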
Yet another approach uses the purrr package:
library(purrr)
file.list <- list.files(pattern = "\\.csv$")
file.list <- setNames(file.list, file.list) # only needed when you need an id-column with the file-names
df <- map_df(file.list, read.csv, .id = "id")
Other approaches to getting a named list: If you don't want just a numeric identifier, then you can assign the filenames to the dataframes in the list before you bind them together. There are several ways to do this:
# with the 'attr' function from base R
attr(df.list, "names") <- file.list
# with the 'names' function from base R
names(df.list) <- file.list
# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)
Now you can bind the list of dataframes together in one dataframe with rbindlist from data.table or bind_rows from dplyr. The id column will now contain the filenames instead of a numeric identifier.
How to import and merge multiple excel files in R?
The error message tells you everything. The path you're passing doesn't exist. Either set your working directory to F:/Spring 2019/Thesis_data/Kam_Thesis/data/water_level (see ?setwd) or pass the full paths, like F:/Spring 2019/Thesis_data/Kam_Thesis/data/water_level/barishal_sw183.xlsx, to read_excel.
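One way to get full paths without changing the working directory is the full.names argument of list.files. A minimal sketch, in which a temporary folder is a hypothetical stand-in for the data directory from the question:

```r
# Hypothetical folder standing in for the data directory from the question
data_dir <- file.path(tempdir(), "water_level")
dir.create(data_dir, showWarnings = FALSE)
file.create(file.path(data_dir, "barishal_sw183.xlsx"))

# Bare file names only resolve if data_dir is the working directory
bare <- list.files(data_dir, pattern = "\\.xlsx$")

# full.names = TRUE prepends the folder, so the paths work from anywhere
full <- list.files(data_dir, pattern = "\\.xlsx$", full.names = TRUE)

# Check before reading, to fail early with a clear message
stopifnot(all(file.exists(full)))
```

The vector full can then be passed to lapply with read_excel regardless of the current working directory.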