How to load xlsx file using fread function?
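Here's how: fread cannot parse an XLSX file itself, but it can read the output of a command-line tool that converts the workbook to CSV, such as in2csv from csvkit:
# cmd= is available in recent data.table versions; older ones take the command as the first argument
my.dt <- fread(cmd = 'in2csv my.xls')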
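If csvkit is not installed, an R-only route is to read the sheet with readxl and convert the result to a data.table. This is a swapped-in alternative rather than fread itself. The sketch below assumes the readxl and data.table packages are installed and uses the example workbook that ships with readxl (`datasets.xlsx`, sheet `mtcars`) purely for illustration:

```r
library(readxl)
library(data.table)

# Example workbook bundled with readxl (stands in for your own file)
path <- readxl_example("datasets.xlsx")

# read_excel() returns a tibble; as.data.table() converts it
dt <- as.data.table(read_excel(path, sheet = "mtcars"))

print(class(dt))
```

After the conversion, `dt` supports the usual data.table syntax, just as if it had come from fread.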
data.table::fread Read all worksheets in an Excel workbook
I used openxlsx::read.xlsx the last time I needed to read many sheets from an XLSX.
#install.packages("openxlsx")
library(openxlsx)
#?openxlsx::read.xlsx
#using file chooser:
filename <- file.choose()
#or hard coded file name:
#filename <- "filename.xlsx"
#get all the sheet names from the workbook
SheetNames<-getSheetNames(filename)
# start with an empty result; rbind(NULL, df) simply returns df
input <- NULL
# loop through each sheet in the workbook
for (i in SheetNames) {
  # read the sheet named i
  tmp_sheet <- openxlsx::read.xlsx(filename, sheet = i)
  # append this sheet's rows to the data collected so far
  input <- rbind(input, tmp_sheet)
}
Note: This assumes each worksheet has an identical column structure and identical data types. You may need to standardize/normalize the data first (e.g. tmp_sheet <- as.data.frame(sapply(tmp_sheet, as.character), stringsAsFactors=FALSE)), or load each sheet into its own data frame and pre-process it further before merging.
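The normalization described in the note can be sketched as follows. This is only an illustration under stated assumptions: the two-sheet workbook is generated on the fly, and the helper name `read_sheet_as_text` and the extra `sheet` column are hypothetical additions, not part of the original answer.

```r
library(openxlsx)

# Build a small two-sheet workbook so the sketch is self-contained
# (file name and contents are made up for illustration).
demo_file <- tempfile(fileext = ".xlsx")
write.xlsx(
  list(
    first  = data.frame(id = 1:2, value = c(1.5, 2.5)),
    second = data.frame(id = 3:4, value = c("n/a", "7"))
  ),
  demo_file
)

# Read one sheet and coerce every column to character, so sheets whose
# columns have different types can still be stacked with rbind().
read_sheet_as_text <- function(sheet, path) {
  df <- read.xlsx(path, sheet = sheet)
  df[] <- lapply(df, as.character)
  df$sheet <- sheet  # remember which sheet each row came from
  df
}

sheet_names <- getSheetNames(demo_file)
all_sheets <- lapply(sheet_names, read_sheet_as_text, path = demo_file)
combined <- do.call(rbind, all_sheets)

print(combined)
```

Collecting the sheets in a list and binding once avoids growing a data frame inside a loop. Columns can be converted back to their natural types afterwards, for example with `type.convert(combined, as.is = TRUE)`.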
I can't read excel file using dt.fread from datatable AttributeError
The issue is that the datatable package has not yet been updated to work with xlrd > 1.2.0, so to make it work you have to install xlrd 1.2.0:
pip install xlrd==1.2.0
I hope it helped.
How to read tab separated file into data.table using fread?
This has been fixed recently in the devel version, v1.9.5 (will be soon available on CRAN as v1.9.6):
require(data.table) # v1.9.5+
fread("~/Downloads/tmp.txt")
#       V1   V2 V3
# 1:  Beth 4.00  0
# 2:   Dan 3.75  0
# 3: Kathy 4.00 10
# 4:  Mark 5.00 20
# 5:  Mary 5.50 22
# 6: Susie 4.25 18
See README.md on the project page for more info. fread gained a strip.white argument (among other features and bug fixes), which is TRUE by default.
Update: it also has a col.names argument now:
fread("~/Downloads/tmp.txt", col.names = c("Name", "PayRate", "HoursWorked"))
#     Name PayRate HoursWorked
# 1:  Beth    4.00           0
# 2:   Dan    3.75           0
# 3: Kathy    4.00          10
# 4:  Mark    5.00          20
# 5:  Mary    5.50          22
# 6: Susie    4.25          18
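To try this without the original file, here is a minimal self-contained sketch. It assumes data.table is installed and writes a small tab-separated temporary file whose three rows are copied from the output above; passing `sep` and `header` explicitly is a choice made for the example rather than something the answer requires.

```r
library(data.table)

# Write a small tab-separated file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("Beth\t4.00\t0", "Dan\t3.75\t0", "Kathy\t4.00\t10"), tmp)

# Read it back, naming the columns at read time
pay <- fread(
  tmp,
  sep = "\t",
  header = FALSE,
  col.names = c("Name", "PayRate", "HoursWorked")
)

print(pay)
```

fread normally detects the separator and the presence of a header on its own; spelling them out just makes the sketch deterministic.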
Fastest way to read large Excel xlsx files? To parallelize or not?
You could try to run it in parallel using the parallel package, but it is a bit hard to estimate how fast it will be without sample data:
library(parallel)
library(readxl)
excel_path <- ""
sheets <- excel_sheets(excel_path)
Make a cluster with a specified number of cores:
cl <- makeCluster(detectCores() - 1)
Use parLapplyLB to go through the Excel sheets and read them in parallel using load balancing:
parLapplyLB(cl, sheets, function(sheet, excel_path) {
readxl::read_excel(excel_path, sheet = sheet)
}, excel_path)
You can use the microbenchmark package to test how fast certain options are:
library(microbenchmark)
microbenchmark(
  lapply = {lapply(sheets, function(sheet) {
    read_excel(excel_path, sheet = sheet)
  })},
  parallel = {parLapplyLB(cl, sheets, function(sheet, excel_path) {
    readxl::read_excel(excel_path, sheet = sheet)
  }, excel_path)},
  times = 10
)
In my case, the parallel version is faster:
Unit: milliseconds
     expr       min        lq     mean    median        uq      max neval
   lapply 133.44857 167.61801 179.0888 179.84616 194.35048 226.6890    10
 parallel  58.94018  64.96452 118.5969  71.42688  80.48588 316.9914    10
The test file consists of 6 sheets, each containing this table:
   test test1 test3 test4 test5
1     1     1     1     1     1
2     2     2     2     2     2
3     3     3     3     3     3
4     4     4     4     4     4
5     5     5     5     5     5
6     6     6     6     6     6
7     7     7     7     7     7
8     8     8     8     8     8
9     9     9     9     9     9
10   10    10    10    10    10
11   11    11    11    11    11
12   12    12    12    12    12
13   13    13    13    13    13
14   14    14    14    14    14
15   15    15    15    15    15
Note: you can use stopCluster(cl) to shut down the workers when the process is finished.
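One way to make sure the workers are always shut down, even when a read fails, is to wrap the steps in a function and register the cleanup with `on.exit()`. The sketch below is an illustration only: the function names `read_one_sheet` and `read_all_sheets_parallel`, the default of two workers, and the use of readxl's bundled `datasets.xlsx` are assumptions made for the example.

```r
library(parallel)
library(readxl)

# Worker function defined at top level, so only the function itself
# (not a surrounding local environment) is shipped to the workers.
read_one_sheet <- function(sheet, excel_path) {
  readxl::read_excel(excel_path, sheet = sheet)
}

read_all_sheets_parallel <- function(excel_path, n_workers = 2) {
  sheets <- excel_sheets(excel_path)
  cl <- makeCluster(n_workers)
  # Stop the workers when the function exits, on success or on error
  on.exit(stopCluster(cl), add = TRUE)
  result <- parLapplyLB(cl, sheets, read_one_sheet, excel_path)
  names(result) <- sheets
  result
}

# Example workbook bundled with readxl (stands in for your own file)
tables <- read_all_sheets_parallel(readxl_example("datasets.xlsx"))
print(names(tables))
```

Starting a cluster has a fixed cost, so for small workbooks a plain lapply may still win; benchmarking on your own files is the only reliable guide.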
Cannot import XLSX file
At least two issues here:
- you have a bogus-looking tilde (~) at the beginning of your file name
- data.table::fread() reads "delimited" files (i.e., space or whitespace or tab or comma-separated), not XLSX files
Try e.g.
readxl::read_excel("C:/matly/Desktop/Grad School/Class 4/Customer.xlsx")
Other style points:
- read_excel never converts strings to factors (the equivalent of stringsAsFactors=FALSE); it returns a "tibble", which is almost (but not quite!) the same as a data frame
- using / as a path separator works cross-platform and is a little easier to read
- I'd strongly encourage you to change your working directory and use relative path names, e.g.
setwd("C:/matly/Desktop/Grad School/Class 4/")
readxl::read_excel("Customer.xlsx")
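To illustrate the "almost, but not quite" difference between a tibble and a data frame, here is a small sketch. It assumes readxl is installed and uses the `iris` sheet of the example workbook shipped with readxl in place of Customer.xlsx:

```r
library(readxl)

# Example workbook bundled with readxl (stands in for your own file)
path <- readxl_example("datasets.xlsx")

# Checking the path first gives a clearer error than a failed read
stopifnot(file.exists(path))

tb <- read_excel(path, sheet = "iris")  # a tibble
df <- as.data.frame(tb)                 # a plain data frame

# Single-column subsetting behaves differently:
one_col_tb <- tb[, 1]  # stays a one-column tibble
one_col_df <- df[, 1]  # drops to a plain vector

print(class(one_col_tb))
print(class(one_col_df))
```

If downstream code expects the base behaviour, converting once with `as.data.frame()` right after reading is usually the simplest fix.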