Fast reading and combining several files using data.table (with fread)
Use rbindlist(), which is designed to rbind a list of data.tables together:
mylist <- lapply(all.files, readdata)
mydata <- rbindlist( mylist )
And as @Roland says, do not set the key in each iteration of your function!
So, in summary, this is best:
l <- lapply(all.files, fread, sep=",")
dt <- rbindlist( l )
setkey( dt , ID, date )
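To make the pattern above concrete, here is a minimal, self-contained sketch. The temporary directory, the file names and the demo columns are made up for illustration; only fread, rbindlist and setkey come from the answer itself.

```r
library(data.table)

# Hypothetical demo data: three small CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "fread_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  fwrite(
    data.table(ID = i, date = as.IDate("2020-01-01") + 0:1, value = c(i, i * 10)),
    file.path(demo_dir, sprintf("part%d.csv", i))
  )
}

# full.names = TRUE returns paths that fread can open from any working directory.
all.files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, bind once, then set the key once on the combined table.
l  <- lapply(all.files, fread, sep = ",")
dt <- rbindlist(l)
setkey(dt, ID, date)
```

Setting the key a single time on the combined table sorts the data once, rather than once per file.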
How to read multiple files at once using 'fread' in R
First, you need to list all files that you want to read. Then, you could use a loop to capture the data in a list like so:
# pattern is a regular expression, so the dot must be escaped to match a literal "."
filelist <- list.files(pattern = "\\.snplist")
datalist <- list()
for (i in seq_along(filelist)) {
  datalist[[i]] <- fread(filelist[i])
}
Note we use seq_along instead of 1:length(filelist) to avoid errors in case filelist is empty (length 0).
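The difference is easy to check in isolation. The following sketch uses an empty character vector as a stand-in for a list.files() call that found nothing; the counter variables are only there for illustration.

```r
filelist <- character(0)          # stand-in for a list.files() call with no matches

bad_idx  <- 1:length(filelist)    # 1:0, i.e. the two values 1 and 0
good_idx <- seq_along(filelist)   # integer(0)

# Count how many times a loop body would run with each index vector.
n_bad <- 0L
for (i in bad_idx) n_bad <- n_bad + 1L

n_good <- 0L
for (i in good_idx) n_good <- n_good + 1L
```

With 1:length(filelist) the loop body runs twice (for i = 1 and i = 0), so fread would be called on a missing file name; with seq_along it does not run at all.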
Quick Read and Merge with Data.Table's Fread and Rbindlist
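You could do datatablelist = lapply(list.files("my/data/directory/", full.names = TRUE), fread) and then combine the resulting list of data.tables with rbindlist(). Without full.names = TRUE, list.files returns bare file names, and fread will only find them if that directory is also the working directory.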
Although lapply is cleaner than an explicit loop, your loop will work if you read the files directly into a list.
datatablelist = list()
# seq_along() is safe even when datafiles is empty
for (i in seq_along(datafiles)) {
  datatablelist[[datafiles[i]]] = fread(datafiles[i])
}
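Because the loop names each list element after its file, the combined table can record where every row came from. The sketch below is one way to do that, assuming a couple of made-up demo files in a temporary directory; the idcol argument of rbindlist fills the new column (here called "source", a name chosen for this example) from the list names.

```r
library(data.table)

# Hypothetical demo files.
demo_dir <- file.path(tempdir(), "fread_named_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(x = 1:2, y = c("a", "b")),      file.path(demo_dir, "a.csv"))
fwrite(data.table(x = 3:5, y = c("c", "d", "e")), file.path(demo_dir, "b.csv"))

datafiles <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read into a list and name each element after its file.
datatablelist <- lapply(datafiles, fread)
names(datatablelist) <- basename(datafiles)

# idcol adds a column holding the list name each row came from.
combined <- rbindlist(datatablelist, idcol = "source")
```

This keeps the provenance of each row without adding a column by hand inside the loop.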
read.csv faster than data.table::fread
data.table::fread's significant performance advantage becomes clear if you consider larger files. Here is a fully reproducible example.
Let's generate a CSV file consisting of 10^5 rows and 100 columns:
if (!file.exists("test.csv")) {
  set.seed(2017)
  df <- as.data.frame(matrix(runif(10^5 * 100), nrow = 10^5))
  write.csv(df, "test.csv", quote = FALSE)
}
We run a microbenchmark analysis (note that this may take a couple of minutes depending on your hardware):
library(microbenchmark)
res <- microbenchmark(
read.csv = read.csv("test.csv", header = TRUE, stringsAsFactors = FALSE, colClasses = "numeric"),
fread = data.table::fread("test.csv", sep = ",", stringsAsFactors = FALSE, colClasses = "numeric"),
times = 10)
res
# Unit: milliseconds
#      expr        min         lq       mean     median         uq        max
#  read.csv 17034.2886 17669.8653 19369.1286 18537.7057 20433.4933 23459.4308
#     fread   287.1108   311.6304   432.8106   356.6992   460.6167   888.6531
library(ggplot2)
autoplot(res)