Read All Worksheets in an Excel Workbook into an R List With Data.Frames

How to read in excel sheets into one data frame in R and skip certain lines

In your read_excel() call you are not passing the sheet you want to read, which is held in the sheetnames variable. Try the following:

library(readxl)
path <- "/Users/xxx/file.xlsx"
sheetnames <- excel_sheets(path)
mylist <- lapply(sheetnames, function(x)
  read_excel(path, sheet = x, col_names = TRUE, skip = 1))
# col_names is TRUE by default, so you can drop the anonymous function:
# mylist <- lapply(sheetnames, read_excel, path = path, skip = 1)

# name the dataframes
names(mylist) <- sheetnames

# use Map to add a Cluster column (the sheet name) to each data frame,
# then bind all the elements of the list into one data frame by rows
my_list <- Map(cbind, mylist, Cluster = names(mylist))
df <- do.call("rbind", my_list)
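
To see what the last two lines do without needing a workbook, here is a minimal sketch on two toy data frames. The names north and south and their values are made up for illustration; they stand in for the sheets returned by read_excel().

```r
# Toy stand-ins for the sheets read by read_excel(); names and values are hypothetical
mylist <- list(
  north = data.frame(id = 1:2, value = c(10, 20)),
  south = data.frame(id = 3:5, value = c(30, 40, 50))
)

# Add a Cluster column holding the list name (the sheet name) to each data frame
tagged <- Map(cbind, mylist, Cluster = names(mylist))

# Stack the tagged data frames by rows into a single data frame
df <- do.call(rbind, tagged)

# rbind() builds row names such as "north.1"; reset them if they are not wanted
rownames(df) <- NULL
df
```

Each row of df now carries the name of the list element it came from in the Cluster column.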

Import excel workbook with multiple sheets

You can use the readxl package. See the following example.

library(readxl)
path <- readxl_example("datasets.xls")
sheetnames <- excel_sheets(path)
mylist <- lapply(excel_sheets(path), read_excel, path = path)

# name the dataframes
names(mylist) <- sheetnames

Each sheet of the spreadsheet is stored as a data frame in the list, named after its sheet.

If you want to bring the data frames out of the list, use the next bit of code.

# Bring the dataframes to the global environment
list2env(mylist, envir = .GlobalEnv)
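
The read-and-name steps can also be wrapped in a small function. This is only a sketch: read_all_sheets is a hypothetical helper name, not part of readxl, and the example workbook is the datasets.xlsx file that ships with the package.

```r
library(readxl)

# Hypothetical helper: read every sheet of a workbook into a named list.
# Extra arguments (for example skip = 1) are passed on to read_excel().
read_all_sheets <- function(path, ...) {
  sheet_names <- excel_sheets(path)
  sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s, ...))
  stats::setNames(sheets, sheet_names)
}

path <- readxl_example("datasets.xlsx")  # example workbook shipped with readxl
all_sheets <- read_all_sheets(path)

# The list is named after the sheets in the workbook
names(all_sheets)
```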

Import excel sheets as individual dataframes into R

We can use the readxl package:

library(readxl)
my_sheet_names <- excel_sheets("my_file.xlsx")
my_sheets <- lapply(my_sheet_names, function(x) read_excel("my_file.xlsx", sheet = x))
names(my_sheets) <- my_sheet_names

This will give you a list of data frames, one for each of your sheets. You can then save them as individual data frames if desired:

list2env(my_sheets, envir=.GlobalEnv)
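
If you would rather not create objects in the global environment, two alternatives are sketched below. The my_sheets list here is a stand-in built from built-in datasets, and sheet_env is a hypothetical name.

```r
# Stand-in for the list returned by the lapply() call above
my_sheets <- list(iris = head(iris), mtcars = head(mtcars))

# Option 1: keep the list and pull out one data frame by name
dim(my_sheets[["mtcars"]])

# Option 2: unpack into a separate environment instead of .GlobalEnv
sheet_env <- list2env(my_sheets, envir = new.env())
ls(sheet_env)
head(sheet_env$iris, 2)
```

Both options keep the sheets grouped together, so a later lapply() over all of them remains possible.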

Read all worksheets (as dataframes) from multiple Excel workbooks of different structure

Since you mention the purrr package, some other tidyverse packages are worth considering.

  • dplyr for mutate(), when applying purrr::map() to a column of a data frame and storing the result as a list-column.
  • tidyr for unnest(), which expands a list-column so that each row inside a list-column becomes a row in the overall data frame.
  • tibble for nicely printed nested data frames

Sample files are needed to demonstrate. This code uses the openxlsx package to create one file containing two sheets (the built-in iris and mtcars datasets), and another file containing three sheets (adding the built-in attitude dataset).

library(openxlsx)

# Create two spreadsheet files, with different numbers of worksheets
write.xlsx(list(iris, mtcars, attitude), "three_sheets.xlsx")
write.xlsx(list(iris, mtcars), "two_sheets.xlsx")
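
The unnamed lists above produce default sheet names. As a side note, a minimal sketch of the named-list variant follows; it writes to a temporary file (tmp is a hypothetical name) so that the sample files used in the rest of this answer are left alone.

```r
library(openxlsx)

# Write to a temporary file so nothing in the working directory is touched
tmp <- tempfile(fileext = ".xlsx")

# The list names ("iris", "mtcars") are used as the worksheet names
write.xlsx(list(iris = iris, mtcars = mtcars), tmp)

# Check which sheet names ended up in the file
getSheetNames(tmp)
```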

Now a solution.

First, list the filenames, which will be passed to readxl::excel_sheets() for the names of the sheets within each file, and to readxl::read_excel() to import the data itself.

(paths <- list.files(pattern = "\\.xlsx$"))  # pattern is a regular expression, not a glob
#> [1] "three_sheets.xlsx" "two_sheets.xlsx"

(x <- tibble::tibble(path = paths))  # data_frame() is deprecated in favour of tibble()
#> # A tibble: 2 x 1
#> path
#> <chr>
#> 1 three_sheets.xlsx
#> 2 two_sheets.xlsx

'Map' the readxl::excel_sheets() function over each of the file paths, and store the results in a new list-column. Each row of the sheet_name column is a vector of sheet names. As expected, the first one has three sheet names, while the second has two.

(x <- dplyr::mutate(x, sheet_name = purrr::map(path, readxl::excel_sheets)))
#> # A tibble: 2 x 2
#> path sheet_name
#> <chr> <list>
#> 1 three_sheets.xlsx <chr [3]>
#> 2 two_sheets.xlsx <chr [2]>

We need to pass each filename and each sheet name into readxl::read_excel(path=, sheet=), so the next step is to have a data frame where each row gives a path and one sheet name. This is done using tidyr::unnest().

(x <- tidyr::unnest(x, sheet_name))
#> # A tibble: 5 x 2
#> path sheet_name
#> <chr> <chr>
#> 1 three_sheets.xlsx Sheet 1
#> 2 three_sheets.xlsx Sheet 2
#> 3 three_sheets.xlsx Sheet 3
#> 4 two_sheets.xlsx Sheet 1
#> 5 two_sheets.xlsx Sheet 2

Now each path and sheet name can be passed into readxl::read_excel(), using purrr::map2() rather than purrr::map() because we pass two arguments rather than one.

(x <- dplyr::mutate(x, data = purrr::map2(path, sheet_name,
                                           ~ readxl::read_excel(.x, .y))))
#> # A tibble: 5 x 3
#> path sheet_name data
#> <chr> <chr> <list>
#> 1 three_sheets.xlsx Sheet 1 <tibble [150 × 5]>
#> 2 three_sheets.xlsx Sheet 2 <tibble [32 × 11]>
#> 3 three_sheets.xlsx Sheet 3 <tibble [30 × 7]>
#> 4 two_sheets.xlsx Sheet 1 <tibble [150 × 5]>
#> 5 two_sheets.xlsx Sheet 2 <tibble [32 × 11]>

Now each dataset is in a separate row of the data column. We can look at just one of the datasets by subsetting that column.

x$data[3]
#> [[1]]
#> # A tibble: 30 x 7
#> rating complaints privileges learning raises critical advance
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 43.0 51.0 30.0 39.0 61.0 92.0 45.0
#> 2 63.0 64.0 51.0 54.0 63.0 73.0 47.0
#> 3 71.0 70.0 68.0 69.0 76.0 86.0 48.0
#> 4 61.0 63.0 45.0 47.0 54.0 84.0 35.0
#> 5 81.0 78.0 56.0 66.0 71.0 83.0 47.0
#> 6 43.0 55.0 49.0 44.0 54.0 49.0 34.0
#> 7 58.0 67.0 42.0 56.0 66.0 68.0 35.0
#> 8 71.0 75.0 50.0 55.0 70.0 66.0 41.0
#> 9 72.0 82.0 72.0 67.0 71.0 83.0 31.0
#> 10 67.0 61.0 45.0 47.0 62.0 80.0 41.0
#> # ... with 20 more rows
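
Note the [[1]] at the top of that output: single brackets return a one-element list rather than the dataset itself. A small sketch of the difference, using a plain list (data_col is a hypothetical stand-in for x$data):

```r
# Stand-in for the list-column x$data
data_col <- list(head(iris, 2), head(mtcars, 2), head(attitude, 2))

# Single brackets keep the list wrapper
class(data_col[3])    # "list"

# Double brackets extract the element itself
class(data_col[[3]])  # "data.frame"
```

So x$data[[3]] is the form to use when the dataset itself is needed, for example to pass it to another function.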

List of data.frame's to individual excel worksheets - R

Specify a sheet name for each list element.

library(xlsx)
file <- "usarrests.xlsx"
write.xlsx(USArrests, file, sheetName = "Sheet1")
write.xlsx(USArrests, file, sheetName = "Sheet2", append = TRUE)

A second approach, as suggested by @flodel, is to use addDataFrame. This is more or less an example from that function's help page.

file <- "usarrests.xlsx"
wb <- createWorkbook()
sheet1 <- createSheet(wb, sheetName = "Sheet1")
sheet2 <- createSheet(wb, sheetName = "Sheet2")

addDataFrame(USArrests, sheet = sheet1)
addDataFrame(USArrests * 2, sheet = sheet2)
saveWorkbook(wb, file = file)

Assuming you have a list of data.frames and a list of sheet names, you can use them pair-wise.

wb <- createWorkbook()
datas <- list(USArrests, USArrests * 2)
sheetnames <- paste0("Sheet", seq_along(datas)) # or names(datas) if provided
sheets <- lapply(sheetnames, createSheet, wb = wb)
void <- Map(addDataFrame, datas, sheets)
saveWorkbook(wb, file = file)

Import multiple sheets from excel spreadsheet into r

You are very close! You can use lapply and such to accomplish this using base R, but I routinely perform tasks like this using the purrr package.

library(purrr)
library(readxl)

sheets <- excel_sheets('data.xlsx')

sample_sheets <- sheets[grepl("samples", sheets)]

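# set_names() names each element after its sheet; .id stores that name in an "id" column
sheet_df <- map_dfr(set_names(sample_sheets), ~ read_excel(path = 'data.xlsx', sheet = .x), .id = "id")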

This does:

  1. Get the names of the sheets.
  2. Use grepl to subset the sheets to only those containing "samples" in the name.
  3. Use map_dfr to iterate over the sample sheets, reading each one in and assigning an id column equal to the name of the sheet, then bind all the results together by rows and return a data frame.
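
The subsetting in step 2 can be tried on its own. In this sketch the sheet names are made up, standing in for what excel_sheets() might return; it also shows how an anchored pattern narrows the match.

```r
# Hypothetical sheet names, as excel_sheets() might return them
sheets <- c("samples_2019", "notes", "old_samples", "samples_2020")

# Matches "samples" anywhere in the name, so "old_samples" is kept too
sheets[grepl("samples", sheets)]

# Matches only names that start with "samples"
sample_sheets <- sheets[grepl("^samples", sheets)]

# Naming the vector by itself means lapply() over it returns a list named by sheet
named_sheets <- stats::setNames(sample_sheets, sample_sheets)
names(lapply(named_sheets, nchar))
```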

Read a folder of excel files and import individual sheets as separate df's in R with Names

You can do the following (note that I'm using the openxlsx package for reading in Excel files, but you can replace that part with readxl of course):

library(openxlsx)
library(tidyverse)

Starting with your `files_list` we can do:

# using lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(as.list(files_list), function(x) readWorkbook(x, sheet = "Balance"))

# Create a vector of names based on the first word of the filename + "Balance"
# Note that we can't use empty space in object names, hence the underscore
df_names <- paste0(str_extract(basename(files_list), "[^ ]+"), "_Balance_df")

# Assign the names to our list of dfs
names(list_of_dfs) <- df_names

# Push the list elements (i.e. data frames) to the Global environment
# I highly recommend NOT doing this. I'd say in 99% of the cases it's better to continue working in the list structure or combine the individual dfs into one large df.
list2env(list_of_dfs, envir = .GlobalEnv)
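
The naming step can be checked in isolation. Below is a sketch with made-up file paths, using base R's sub() in place of stringr::str_extract(); it assumes, as the answer does, that the first word of each file name is followed by a space.

```r
# Hypothetical file paths standing in for files_list
files_list <- c("reports/Acme 2020 Q1.xlsx", "reports/Globex 2020 Q1.xlsx")

# Keep everything before the first space of the file name
first_word <- sub(" .*$", "", basename(files_list))

# Build the object names
df_names <- paste0(first_word, "_Balance_df")
df_names

# make.names() leaves syntactically valid names unchanged, so this checks validity
all(make.names(df_names) == df_names)
```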

Easy way to export multiple data.frame to multiple Excel worksheets

You can write to multiple sheets with the xlsx package. You just need to use a different sheetName for each data frame and add append=TRUE for every sheet after the first:

library(xlsx)
write.xlsx(dataframe1, file="filename.xlsx", sheetName="sheet1", row.names=FALSE)
write.xlsx(dataframe2, file="filename.xlsx", sheetName="sheet2", append=TRUE, row.names=FALSE)

Another option, one that gives you more control over formatting and where the data frame is placed, is to do everything within R/xlsx code and then save the workbook at the end. For example:

wb = createWorkbook()

sheet = createSheet(wb, "Sheet 1")

addDataFrame(dataframe1, sheet=sheet, startColumn=1, row.names=FALSE)
addDataFrame(dataframe2, sheet=sheet, startColumn=10, row.names=FALSE)

sheet = createSheet(wb, "Sheet 2")

addDataFrame(dataframe3, sheet=sheet, startColumn=1, row.names=FALSE)

saveWorkbook(wb, "My_File.xlsx")

In case you might find it useful, here are some interesting helper functions that make it easier to add formatting, metadata, and other features to spreadsheets using xlsx:
http://www.sthda.com/english/wiki/r2excel-read-write-and-format-easily-excel-files-using-r-software


