Importing Multiple Excel Files with Filenames in R

How to add filename when importing multiple Excel files in R?

Assuming that your vector of file names, file.list, looks something like this:

file.list <- c("20200101-foo1.xls", "20210101-foo2.xls")

The trick is to make df.list a named list. You can use stringr::str_replace to make the YYYYMMDD names. In this example, remove everything from the "-" onwards:

library(stringr)
file.list %>% str_replace("-.+", "")
[1] "20200101" "20210101"

So supply the names to the list like this:

names(df.list) <- file.list %>%
  str_replace("-.+", "")

And now bind_rows from dplyr, called with the .id argument, will create a column holding those names:

library(dplyr)
df <- bind_rows(df.list, .id = "filename")
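As a self-contained illustration of the named-list idea, the sketch below uses two small in-memory data frames in place of the imported Excel sheets (their contents are made up) and base R in place of stringr and dplyr. The Map/rbind step is only a rough base-R analogue of bind_rows with .id, not the same function.

```r
# File names as in the example above
file.list <- c("20200101-foo1.xls", "20210101-foo2.xls")

# Stand-ins for the imported sheets (hypothetical data)
df.list <- list(
  data.frame(value = c(1, 2)),
  data.frame(value = 3)
)

# Name each list element after the YYYYMMDD part of its file name
names(df.list) <- sub("-.+", "", file.list)

# Prepend the element name as a "filename" column, then stack the pieces
with.id <- Map(function(name, d) cbind(filename = name, d),
               names(df.list), df.list)
df <- do.call(rbind, unname(with.id))

print(df)
```

Each row of df carries the name of the list element it came from, which is the same information the .id column provides.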

Importing multiple Excel files with filenames in R

Figured it out myself. The key was to use rbind.fill instead of rbind.

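library(plyr)
library(xlsx)  # provides read.xlsx
# read.xlsx passes extra arguments on to data.frame(), so FILENAMEVAR = x adds a file-name column
df.list <- lapply(filenames, function(x) read.xlsx(file = x, sheetIndex = 1,
  colIndex = 1:4, as.data.frame = TRUE, header = FALSE, FILENAMEVAR = x))
final.df <- rbind.fill(df.list)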
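The reason rbind.fill helps is that it accepts data frames whose columns differ and fills the missing ones with NA, whereas rbind stops with an error. The sketch below approximates that behaviour in base R; rbind_fill_base is a hypothetical helper written for illustration, not part of plyr, and the two data frames are made up.

```r
# Hypothetical helper: a base-R approximation of plyr::rbind.fill
rbind_fill_base <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- NA  # column absent from this data frame
    }
    d[all_cols]       # same column order everywhere
  })
  do.call(rbind, padded)
}

a <- data.frame(x = 1:2, y = c("p", "q"))
b <- data.frame(x = 3:4, z = c(TRUE, FALSE))

combined <- rbind_fill_base(list(a, b))
print(combined)
```

Rows that come from a have NA in z, and rows that come from b have NA in y.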

Import multiple Excel files with names in R as a list

You can do:

names(data.list) <- sub('\\.xlsx$', '', basename(data))

Or without any regex:

names(data.list) <- tools::file_path_sans_ext(basename(data))
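To see what the two approaches return, here is a small sketch on made-up paths (the file names are assumptions chosen for illustration). The last path contains ".xlsx" twice, which shows that a pattern anchored with $ only strips the trailing extension.

```r
# Hypothetical paths; the last one contains ".xlsx" twice
data <- c("reports/2020/sales.xlsx", "reports/2021/costs.xlsx",
          "archive/old.xlsx.copy.xlsx")

# Anchored regex: only a trailing ".xlsx" is removed
by_regex <- sub("\\.xlsx$", "", basename(data))

# No regex: strips whatever the final extension is
by_tools <- tools::file_path_sans_ext(basename(data))

print(by_regex)
print(by_tools)
```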

Read a folder of excel files and import individual sheets as separate df's in R with Names

You can do the following (note that I'm using the openxlsx package for reading in Excel files, but you can replace that part with readxl of course):

library(openxlsx)
library(tidyverse)

Starting with your `files_list` we can do:

# using lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(as.list(files_list), function(x) readWorkbook(x, sheet = "Balance"))

# Create a vector of names based on the first word of the filename + "Balance"
# Note that we can't use empty space in object names, hence the underscore
df_names <- paste0(str_extract(basename(files_list), "[^ ]+"), "_Balance_df")

# Assign the names to our list of dfs
names(list_of_dfs) <- df_names

# Push the list elements (i.e. data frames) to the global environment.
# I highly recommend NOT doing this: in 99% of cases it's better to keep working
# in the list structure or to combine the individual dfs into one large df.
list2env(list_of_dfs, envir = .GlobalEnv)
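To illustrate why staying in the list is usually enough, the sketch below names a list the same way and then works with it directly. The file names and balance figures are invented, and base R's sub stands in for str_extract so the example runs without extra packages.

```r
# Hypothetical file names; the first word identifies the company
files_list <- c("data/Acme 2020 report.xlsx", "data/Globex 2020 report.xlsx")

# Stand-ins for the "Balance" sheets (made-up numbers)
list_of_dfs <- list(
  data.frame(account = c("cash", "debt"), amount = c(100, -40)),
  data.frame(account = c("cash", "debt"), amount = c(250, -90))
)

# First word of each file name, via base R instead of str_extract
first_word <- sub(" .*", "", basename(files_list))
names(list_of_dfs) <- paste0(first_word, "_Balance_df")

# Access one data frame by name without touching the global environment
acme <- list_of_dfs[["Acme_Balance_df"]]

# Apply the same computation to every data frame in the list
totals <- sapply(list_of_dfs, function(d) sum(d$amount))
print(totals)
```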

Read one worksheet from multiple excel files using purrr and readxl and add field

Supposing the two packs.xlsx files are in different subfolders:

library(readxl)

filenames <- list.files(pattern = "packs\\.xlsx$", recursive = TRUE)
df <- lapply(filenames, function(fn) {
  # get the sheet detail
  xl <- read_excel(fn, sheet = "summary")

  # add the filename as a field
  xl$filename <- fn

  # function return
  xl
})

# if both summary sheets have the same format, you can combine them into one
fin <- do.call(rbind, df)

How can I read multiple (excel) files into R?

With list.files you can create a vector of all the filenames in your working directory. Next you can use lapply to loop over that vector and read each file with the read_excel function from the readxl package:

library(readxl)
file.list <- list.files(pattern = '\\.xlsx$')  # pattern is a regular expression, not a wildcard
df.list <- lapply(file.list, read_excel)

This method can of course also be used with other file reading functions like read.csv or read.table. Just replace read_excel with the appropriate file reading function and make sure you use the correct pattern in list.files.
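The same list.files plus lapply pattern can be tried end to end with CSV files, which need no extra packages. The sketch below is only an illustration: it writes two made-up CSV files into a temporary folder (the folder and file names are assumptions) and reads them back.

```r
# Work in a temporary folder so the sketch does not touch real data
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)

write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3, value = 30),
          file.path(tmp, "b.csv"), row.names = FALSE)
writeLines("not a csv", file.path(tmp, "notes.txt"))

# Match names ending in ".csv"; full.names = TRUE is needed here because
# the files are not in the working directory
file.list <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

df.list <- lapply(file.list, read.csv)
print(length(df.list))
```

The notes.txt file is skipped because it does not match the pattern.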

If you also want to include the files in subdirectories, use:

file.list <- list.files(pattern = '\\.xlsx$', recursive = TRUE)
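One detail worth checking when choosing the pattern: list.files treats it as a regular expression, not as a shell-style wildcard. The sketch below, using made-up candidate names, compares an explicit regular expression with the one that glob2rx derives from a wildcard.

```r
# Hypothetical file names to test the patterns against
candidates <- c("a.xlsx", "sub/b.xlsx", "a.xlsx.bak", "notes_xlsx.txt")

# A regular expression that matches names ending in ".xlsx"
ends_with_xlsx <- grepl("\\.xlsx$", candidates)

# glob2rx() translates a shell-style wildcard into a regular expression
rx <- utils::glob2rx("*.xlsx")
same_result <- grepl(rx, candidates)

print(candidates[ends_with_xlsx])
```

Either form can be passed as the pattern argument; only the first two candidates match.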

Other possible packages for reading Excel-files: openxlsx & xlsx


Supposing the columns are the same for each file, you can bind them together in one dataframe with bind_rows from dplyr:

library(dplyr)
df <- bind_rows(df.list, .id = "id")

or with rbindlist from data.table:

library(data.table)
df <- rbindlist(df.list, idcol = "id")

Both have the option to add an id column for identifying the separate datasets.


Update: If you don't want a numeric identifier, just use sapply with simplify = FALSE to read the files in file.list:

df.list <- sapply(file.list, read.csv, simplify=FALSE)

When using bind_rows from dplyr or rbindlist from data.table, the id column now contains the filenames.
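The filenames end up in the id column because sapply, unlike lapply, names its result after a character input. A minimal sketch, with a hypothetical reader function standing in for read.csv so that no files are needed:

```r
file.list <- c("a.csv", "b.csv")

# Hypothetical stand-in for read.csv: returns a one-row data frame
reader <- function(path) data.frame(source = path)

with_lapply <- lapply(file.list, reader)
with_sapply <- sapply(file.list, reader, simplify = FALSE)

print(names(with_lapply))  # no names
print(names(with_sapply))  # the file names
```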

Yet another approach is to use the purrr package:

library(purrr)
file.list <- list.files(pattern = '\\.csv$')
file.list <- setNames(file.list, file.list) # only needed when you need an id-column with the file-names

df <- map_df(file.list, read.csv, .id = "id")

Other approaches to getting a named list: If you don't want just a numeric identifier, then you can assign the filenames to the dataframes in the list before you bind them together. There are several ways to do this:

# with the 'attr' function from base R
attr(df.list, "names") <- file.list
# with the 'names' function from base R
names(df.list) <- file.list
# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)

Now you can bind the list of dataframes together in one dataframe with rbindlist from data.table or bind_rows from dplyr. The id column will now contain the filenames instead of a numeric identifier.

How to import and merge multiple excel files in R?

The error message tells you everything: the path you're passing doesn't exist. Either set your working directory to F:/Spring 2019/Thesis_data/Kam_Thesis/data/water_level (see ?setwd), or pass the full paths, like F:/Spring 2019/Thesis_data/Kam_Thesis/data/water_level/barishal_sw183.xlsx, to read_excel.
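One way to get full paths without changing the working directory is the full.names argument of list.files. The sketch below shows the difference on a temporary folder; the folder name and the placeholder file are assumptions made for illustration, and the file is not a real workbook.

```r
# Create a throwaway folder with one placeholder file in it
data_dir <- file.path(tempdir(), "water_level_demo")  # hypothetical folder
dir.create(data_dir, showWarnings = FALSE)
writeLines("placeholder", file.path(data_dir, "station1.xlsx"))

# Bare names only resolve if the working directory is data_dir
bare_names <- list.files(data_dir, pattern = "\\.xlsx$")

# Full paths resolve from any working directory
full_paths <- list.files(data_dir, pattern = "\\.xlsx$", full.names = TRUE)

# Alternative: build the full paths from the bare names yourself
manual_paths <- file.path(data_dir, bare_names)

print(bare_names)
print(file.exists(full_paths))
```

Passing full_paths (or manual_paths) to the reading function avoids the "path does not exist" error.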


