R Data.Table Fread Command: How to Read Large Files with Irregular Separators

R data.table fread command: how to read large files with irregular separators?

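Preprocess the file with sed so that the separators become regular. This command strips leading blanks and replaces every run of blanks with a single comma:

sed 's/^[[:blank:]]*//;s/[[:blank:]]\{1,\}/,/g'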

Isn't it possible to collect all the fread results into one (temporary) file (adding a reference to the source) and process that file with sed (or another tool) in one pass, to avoid forking the tool at every iteration?
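The same normalization can also be sketched in R itself, which avoids spawning an external tool at all. This is a minimal illustration, not the original poster's method: the sample lines and the variable names raw_lines and normalized are made up for the example, and for a real file you would get the lines from readLines().

```r
library(data.table)

# Hypothetical sample input with leading blanks and irregular runs of blanks.
raw_lines <- c("  Date   Col1 Col2",
               "2014-01-01  123    12",
               " 2014-01-02 126  24")

# Same two steps as the sed command: strip leading blanks,
# then replace each run of blanks with a single comma.
normalized <- gsub("[[:blank:]]+", ",", sub("^[[:blank:]]+", "", raw_lines))

# fread accepts the cleaned lines directly through its text argument.
dt <- fread(text = normalized)
```

Because the cleaning happens on a character vector in memory, it runs once per file rather than once per line.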

How to read file with irregular space separated value using fread()?

As said in this post, this is now handled automatically by fread (see README for more):

require(data.table) #v1.9.5+
fread("~/Downloads/tmp.txt")
#           Date Col1 Col2
#  1: 2014-01-01  123   12
#  2: 2014-01-01  123   21
#  3: 2014-01-01  124   32
#  4: 2014-01-01  125   32
#  5: 2014-01-02  123   34
#  6: 2014-01-02  126   24
#  7: 2014-01-02  127   23
#  8: 2014-01-03  521   21
#  9: 2014-01-03  123   13
# 10: 2014-01-03  126   15

Either update to the latest devel version or wait until v1.9.6 hits CRAN.

R data.table fread cannot read in irregular column lengths when the larger rows do not appear early in a file

Note that what you have is not a csv file since it has no header. If we add a header it will work. First use fread to read it in as a single field per line giving the character vector Lines. From that compute the maximum number of fields n. Finally re-read Lines after prefixing it with a header.

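# Read each line as a single character field (sep = "" disables splitting).
Lines <- fread("shortLong.csv", sep = "", header = FALSE)[[1]]
# Maximum number of comma-separated fields found on any line.
n <- max(count.fields(textConnection(Lines), sep = ","))
# Prefix a header "1, 2, ..., n" and re-read, padding the short rows.
fread(text = c(toString(1:n), Lines), header = TRUE, fill = TRUE)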

R: Is there a way to fread data.table from file where some (irregular) lines are skipped?

Thanks to Frank,
Solution:

Step 1: Make lines that should be skipped blank (with external editor), then

Step 2: Run fread(text, blank.lines.skip=TRUE)
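If an external editor is inconvenient, the two steps can be sketched inside R. This is only an illustration under assumptions: the sample lines and the pattern "^(#|--)" that marks lines to skip are hypothetical, and you would substitute whatever rule identifies the irregular lines in your file.

```r
library(data.table)

# Hypothetical input: two irregular lines mixed in with the data.
raw_lines <- c("a,b", "1,2", "# comment to skip", "3,4", "-- junk --", "5,6")

# Step 1: blank out the lines that should be skipped
# (the pattern is an assumption made for this example).
skip <- grepl("^(#|--)", raw_lines)
raw_lines[skip] <- ""

# Step 2: let fread drop the blank lines.
dt <- fread(text = raw_lines, blank.lines.skip = TRUE)
```

Dropping the lines outright with raw_lines[!skip] would work just as well; blanking them simply mirrors the steps above.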

How to handle data with no space between separators when using fread in R

I have tried using read.table with the option skipNul = TRUE, and this
works perfectly. However, there doesn't seem to be any option similar
to skipNul for fread.

This has been fixed in dev 1.12.3 on 15 Apr 2019 (see NEWS):


  1. fread() now skips embedded NUL (\0), #3400. Thanks to Marcus Davy for reporting with examples, and Roy Storey for the initial PR.
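On versions older than that, one possible workaround is to strip the NUL bytes yourself before handing the data to fread. The sketch below is an assumption on my part, not something taken from the NEWS entry: it writes a small hypothetical file containing an embedded NUL, reads it back as raw bytes, drops the zero bytes, and parses what is left. It loads the whole file into memory, so it may not suit very large files.

```r
library(data.table)

# Hypothetical sample file with an embedded NUL byte in the second line.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("a,b\n1,"), as.raw(0), charToRaw("2\n3,4\n")), path)

# Read the file as raw bytes and drop every NUL (0x00) byte.
bytes <- readBin(path, what = "raw", n = file.size(path))
clean <- rawToChar(bytes[bytes != as.raw(0)])

# Parse the cleaned text.
dt <- fread(text = clean)
```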

R: fread multiple files with different decimal separators

It is not normal to have a file with two different decimal separators.

You should question the source of the file first.

Nevertheless, if you have such a file, the correct way to read it is to read the columns that use a comma as the decimal separator as strings, and then convert them to numeric.

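library(data.table)

# Build a sample file: column a uses a comma as the decimal separator.
dt <- data.table(a = c("1,4", "2,0", "4,5", "3,5", "6,9"), c = 10:14)
write.csv(dt, "dt.csv", row.names = FALSE)

# Read with "." as the decimal mark, so column a comes in as character.
dcsv <- fread("dt.csv", dec = ".")

# Replace the comma with a dot (and drop any stray quotes), then convert.
dcsv[, a := as.numeric(gsub("\"", "", gsub(",", ".", a)))]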

If you don't know whether a column uses a comma or a dot as its decimal separator, you can loop over the columns, test whether each one is a string column containing only digits and a comma, and convert only the columns that satisfy that condition.
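A minimal sketch of that loop follows. The helper is_comma_decimal and the sample table are hypothetical, and the regular expression is only one reasonable definition of "digits with an optional comma"; adjust it if your data has thousands separators or exponents.

```r
library(data.table)

# Hypothetical table: a is comma-decimal text, b is ordinary text, c is numeric.
dt <- data.table(a = c("1,4", "2,0"), b = c("x,y", "z"), c = c(1.5, 2.5))

# TRUE when a column is character and every non-missing value looks like
# digits with at most one comma used as the decimal mark.
is_comma_decimal <- function(x) {
  is.character(x) && all(grepl("^-?[0-9]+(,[0-9]+)?$", x[!is.na(x)]))
}

# Convert only the columns that pass the test, by reference.
cols <- names(dt)[vapply(dt, is_comma_decimal, logical(1))]
for (col in cols) {
  set(dt, j = col, value = as.numeric(sub(",", ".", dt[[col]], fixed = TRUE)))
}
```

Columns such as b, which contain a comma but are not numeric, are left untouched.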


