How to read in excel sheets into one data frame in R and skip certain lines
In the read_excel function you are not passing the sheet that you want to read, which is stored in the sheetnames variable. Try the following:
library(readxl)
path <- "/Users/xxx/file.xlsx"
sheetnames <- excel_sheets(path)
mylist <- lapply(sheetnames, function(x)
read_excel(path,x, col_names = TRUE,skip = 1))
#col_names is TRUE by default so you can use this without anonymous function like
#mylist <- lapply(sheetnames, read_excel, path = path, skip = 1)
# name the dataframes
names(mylist) <- sheetnames
# use Map to add a Cluster column holding each sheet name, then rbind all elements into one data frame
my_list <- Map(cbind, mylist, Cluster = names(mylist))
df <- do.call("rbind", my_list)
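To see what the Map and rbind steps produce without needing an Excel file, here is a minimal base-R sketch. The two small data frames and the sheet names "A" and "B" are made up and only stand in for the sheets returned by read_excel.

```r
# Hypothetical data frames standing in for sheets read by read_excel()
mylist <- list(
  A = data.frame(id = 1:2, value = c(10, 20)),
  B = data.frame(id = 3:4, value = c(30, 40))
)

# Add a Cluster column holding the name of each list element
my_list <- Map(cbind, mylist, Cluster = names(mylist))

# Stack the tagged data frames row-wise into a single data frame
df <- do.call("rbind", my_list)

# Drop the row names that rbind derives from the list names
rownames(df) <- NULL
print(df)
```

Note that rbind requires every sheet to have the same column names, so this approach fits workbooks whose sheets share one layout.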
Import excel workbook with multiple sheets
You can use the readxl package. See the following example.
library(readxl)
path <- readxl_example("datasets.xls")
sheetnames <- excel_sheets(path)
mylist <- lapply(excel_sheets(path), read_excel, path = path)
# name the dataframes
names(mylist) <- sheetnames
The spreadsheet will be captured in a list with the sheetname as the name of the dataframe in the list.
If you want to bring the dataframes out of the list use the next bit of code.
# Bring the dataframes to the global environment
list2env(mylist ,.GlobalEnv)
Import excel sheets as individual dataframes into R
We can use the readxl package:
library(readxl)
my_sheet_names <- excel_sheets("my_file.xlsx")
my_sheets <- lapply(my_sheet_names, function(x) read_excel("my_file.xlsx", sheet = x))
names(my_sheets) <- my_sheet_names
This will give you a list of dataframes, each being one of your sheets. You can then save them as individual dataframes if desired:
list2env(my_sheets, envir=.GlobalEnv)
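As a small illustration of what list2env does, here is a sketch with made-up data frames that copies the list elements into a fresh environment instead of the global one. The names first, second and sheet_env are hypothetical.

```r
# Hypothetical named list standing in for the imported sheets
my_sheets <- list(
  first  = data.frame(a = 1:3),
  second = data.frame(b = c("x", "y"))
)

# Copy each list element into a new environment as its own object
sheet_env <- new.env()
list2env(my_sheets, envir = sheet_env)

# The element names are now object names inside that environment
print(ls(sheet_env))
print(get("first", envir = sheet_env))
```

Using a dedicated environment keeps the imported objects from overwriting anything that already exists in the global environment under the same name.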
Read all worksheets (as dataframes) from multiple Excel workbooks of different structure
Since you mention the purrr package, some other tidyverse packages are worth considering.
- dplyr for mutate(), when applying purrr::map() to a column of a data frame and storing the result as a list-column.
- tidyr for unnest(), which expands a list-column so that each row inside a list-column becomes a row in the overall data frame.
- tibble for nicely printed nested data frames.
Sample files are needed to demonstrate. This code uses the openxlsx package to create one file containing two sheets (the built-in iris and mtcars datasets), and another file containing three sheets (adding the built-in attitude dataset).
library(openxlsx)
# Create two spreadsheet files, with different numbers of worksheets
write.xlsx(list(iris, mtcars, attitude), "three_sheets.xlsx")
write.xlsx(list(iris, mtcars), "two_sheets.xlsx")
Now a solution.
First, list the filenames, which will be passed to readxl::excel_sheets() for the names of the sheets within each file, and to readxl::read_excel() to import the data itself.
(paths <- list.files(pattern = "\\.xlsx$"))
#> [1] "three_sheets.xlsx" "two_sheets.xlsx"
(x <- tibble::tibble(path = paths))
#> # A tibble: 2 x 1
#> path
#> <chr>
#> 1 three_sheets.xlsx
#> 2 two_sheets.xlsx
'Map' the readxl::excel_sheets() function over each of the file paths, and store the results in a new list-column. Each row of the sheet_name column is a vector of sheet names. As expected, the first one has three sheet names, while the second has two.
(x <- dplyr::mutate(x, sheet_name = purrr::map(path, readxl::excel_sheets)))
#> # A tibble: 2 x 2
#> path sheet_name
#> <chr> <list>
#> 1 three_sheets.xlsx <chr [3]>
#> 2 two_sheets.xlsx <chr [2]>
We need to pass each filename and each sheet name into readxl::read_excel(path=, sheet=), so the next step is to have a data frame where each row gives a path and one sheet name. This is done using tidyr::unnest().
(x <- tidyr::unnest(x, cols = sheet_name))
#> # A tibble: 5 x 2
#> path sheet_name
#> <chr> <chr>
#> 1 three_sheets.xlsx Sheet 1
#> 2 three_sheets.xlsx Sheet 2
#> 3 three_sheets.xlsx Sheet 3
#> 4 two_sheets.xlsx Sheet 1
#> 5 two_sheets.xlsx Sheet 2
Now each path and sheet name can be passed into readxl::read_excel(), using purrr::map2() rather than purrr::map() because we pass two arguments rather than one.
(x <- dplyr::mutate(x, data = purrr::map2(path, sheet_name,
~ readxl::read_excel(.x, .y))))
#> # A tibble: 5 x 3
#> path sheet_name data
#> <chr> <chr> <list>
#> 1 three_sheets.xlsx Sheet 1 <tibble [150 × 5]>
#> 2 three_sheets.xlsx Sheet 2 <tibble [32 × 11]>
#> 3 three_sheets.xlsx Sheet 3 <tibble [30 × 7]>
#> 4 two_sheets.xlsx Sheet 1 <tibble [150 × 5]>
#> 5 two_sheets.xlsx Sheet 2 <tibble [32 × 11]>
Now each dataset is in a separate row of the data column. We can look at just one of the datasets by subsetting that column.
x$data[3]
#> [[1]]
#> # A tibble: 30 x 7
#> rating complaints privileges learning raises critical advance
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 43.0 51.0 30.0 39.0 61.0 92.0 45.0
#> 2 63.0 64.0 51.0 54.0 63.0 73.0 47.0
#> 3 71.0 70.0 68.0 69.0 76.0 86.0 48.0
#> 4 61.0 63.0 45.0 47.0 54.0 84.0 35.0
#> 5 81.0 78.0 56.0 66.0 71.0 83.0 47.0
#> 6 43.0 55.0 49.0 44.0 54.0 49.0 34.0
#> 7 58.0 67.0 42.0 56.0 66.0 68.0 35.0
#> 8 71.0 75.0 50.0 55.0 70.0 66.0 41.0
#> 9 72.0 82.0 72.0 67.0 71.0 83.0 31.0
#> 10 67.0 61.0 45.0 47.0 62.0 80.0 41.0
#> # ... with 20 more rows
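For comparison, the same path/sheet pairing can be sketched with readxl and base R only. This is an illustration rather than the approach above: it assumes the two example workbooks bundled with readxl as stand-ins for your own files, and the names pairs and pairs_data are made up.

```r
library(readxl)

# Two example workbooks shipped with readxl (stand-ins for your own files)
paths <- c(readxl_example("datasets.xlsx"), readxl_example("datasets.xls"))

# One row per (path, sheet) pair: the base-R counterpart of the unnest() step
pairs <- do.call(rbind, lapply(paths, function(p) {
  data.frame(path = p, sheet_name = excel_sheets(p), stringsAsFactors = FALSE)
}))

# Read every pair: the counterpart of the map2() step.
# The result is a plain list with one data frame per row of `pairs`.
pairs_data <- Map(read_excel, path = pairs$path, sheet = pairs$sheet_name)
names(pairs_data) <- paste(basename(pairs$path), pairs$sheet_name, sep = ":")

# Number of rows and columns of each imported sheet
print(t(sapply(pairs_data, dim)))
```

The list-column version keeps paths, sheet names and data together in one tibble, which is easier to filter. The plain list is enough when you only need the data frames themselves.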
List of data.frame's to individual excel worksheets - R
Specify sheet name for each list element.
library(xlsx)
file <- "usarrests.xlsx"
write.xlsx(USArrests, file, sheetName = "Sheet1")
write.xlsx(USArrests, file, sheetName = "Sheet2", append = TRUE)
A second approach, as suggested by @flodel, would be to use addDataFrame. This is more or less an example from the help page of that function.
file <- "usarrests.xlsx"
wb <- createWorkbook()
sheet1 <- createSheet(wb, sheetName = "Sheet1")
sheet2 <- createSheet(wb, sheetName = "Sheet2")
addDataFrame(USArrests, sheet = sheet1)
addDataFrame(USArrests * 2, sheet = sheet2)
saveWorkbook(wb, file = file)
Assuming you have a list of data.frames and a list of sheet names, you can use them pair-wise.
wb <- createWorkbook()
datas <- list(USArrests, USArrests * 2)
sheetnames <- paste0("Sheet", seq_along(datas)) # or names(datas) if provided
sheets <- lapply(sheetnames, createSheet, wb = wb)
void <- Map(addDataFrame, datas, sheets)
saveWorkbook(wb, file = file)
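The xlsx package depends on Java through rJava. If that is a problem, a possible alternative is the openxlsx package, whose write.xlsx accepts a named list and uses the names as sheet names. The sketch below is an assumption-laden illustration: the list names original and doubled and the temporary output path are made up.

```r
library(openxlsx)

# Named list of data frames: the names become the worksheet names
datas <- list(original = USArrests, doubled = USArrests * 2)

# Temporary output path, used only to keep the sketch self-contained
out <- tempfile(fileext = ".xlsx")
write.xlsx(datas, file = out)

# Check which worksheets were written
print(getSheetNames(out))
```

Both xlsx and openxlsx export a function called write.xlsx, so load only one of them in a session or call the function with an explicit prefix such as openxlsx::write.xlsx.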
Import multiple sheets from excel spreadsheet into r
You are very close! You can use lapply and such to accomplish this using base R, but I routinely perform tasks like this using the purrr package.
library(purrr)
library(readxl)
sheets <- excel_sheets('data.xlsx')
sample_sheets <- sheets[grepl("samples", sheets)]
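# set_names() names each element after itself so that .id can record the sheet name
sheet_df <- map_dfr(set_names(sample_sheets), ~read_excel(path = 'data.xlsx', sheet = .x), .id = "sheet")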
This does the following:
- Gets the names of the sheets.
- Uses grepl to subset the sheets to only those containing "samples" in the name.
- Uses map_dfr to iterate over the sample sheets, reading each one in and assigning an id column equal to the name of the sheet, then binds all the results together by rows and returns a data frame.
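The filter-and-bind steps listed above can also be sketched in base R without an Excel file. The workbook list below is hypothetical in-memory data standing in for a workbook, with element names playing the role of sheet names.

```r
# Hypothetical in-memory "workbook": element names act as sheet names
workbook <- list(
  samples_a = data.frame(x = 1:2),
  summary   = data.frame(x = 99L),
  samples_b = data.frame(x = 3:4)
)
sheets <- names(workbook)

# Keep only the sheets whose name contains the literal text "samples"
sample_sheets <- sheets[grepl("samples", sheets, fixed = TRUE)]

# Tag the rows of each kept sheet with its sheet name, then bind by rows
tagged <- lapply(sample_sheets, function(s) cbind(sheet = s, workbook[[s]]))
sheet_df <- do.call(rbind, tagged)
print(sheet_df)
```

Passing fixed = TRUE makes grepl treat the pattern as plain text, which avoids surprises if a sheet name pattern contains regular expression characters such as a dot or parentheses.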
Read a folder of excel files and import individual sheets as separate df's in R with Names
You can do the following (note that I'm using the openxlsx package for reading in Excel files, but you can replace that part with readxl of course):
library(openxlsx)
library(tidyverse)
Starting with your `files_list` we can do:
# using lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(as.list(files_list), function(x) readWorkbook(x, sheet = "Balance"))
# Create a vector of names based on the first word of the filename + "Balance"
# Note that we can't use empty space in object names, hence the underscore
df_names <- paste0(str_extract(basename(files_list), "[^ ]+"), "_Balance_df")
# Assign the names to our list of dfs
names(list_of_dfs) <- df_names
# Push the list elements (i.e. data frames) to the Global environment
# I highly recommend NOT doing this. I'd say in 99% of the cases it's better to continue working in the list structure or combine the individual dfs into one large df.
list2env(list_of_dfs, envir = .GlobalEnv)
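The naming step can be tried on its own with a few made-up file paths. This sketch uses base R instead of str_extract, and the paths in files_list are hypothetical.

```r
# Hypothetical file paths standing in for files_list
files_list <- c("reports/Alpha 2020.xlsx", "reports/Beta final v2.xlsx")

# File names without the directory part
file_names <- basename(files_list)

# First run of non-space characters in each file name
# (the base-R counterpart of str_extract(file_names, "[^ ]+"))
first_words <- regmatches(file_names, regexpr("[^ ]+", file_names))

# Build the data frame names
df_names <- paste0(first_words, "_Balance_df")
print(df_names)
```

If a file name contains no space, the pattern matches the whole name including the extension, so such files may need the extension stripped first, for example with tools::file_path_sans_ext.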
Easy way to export multiple data.frame to multiple Excel worksheets
You can write to multiple sheets with the xlsx package. You just need to use a different sheetName for each data frame and you need to add append=TRUE:
library(xlsx)
write.xlsx(dataframe1, file="filename.xlsx", sheetName="sheet1", row.names=FALSE)
write.xlsx(dataframe2, file="filename.xlsx", sheetName="sheet2", append=TRUE, row.names=FALSE)
Another option, one that gives you more control over formatting and where the data frame is placed, is to do everything within R/xlsx code and then save the workbook at the end. For example:
wb = createWorkbook()
sheet = createSheet(wb, "Sheet 1")
addDataFrame(dataframe1, sheet=sheet, startColumn=1, row.names=FALSE)
addDataFrame(dataframe2, sheet=sheet, startColumn=10, row.names=FALSE)
sheet = createSheet(wb, "Sheet 2")
addDataFrame(dataframe3, sheet=sheet, startColumn=1, row.names=FALSE)
saveWorkbook(wb, "My_File.xlsx")
In case you might find it useful, here are some interesting helper functions that make it easier to add formatting, metadata, and other features to spreadsheets using xlsx:
http://www.sthda.com/english/wiki/r2excel-read-write-and-format-easily-excel-files-using-r-software