R data.table fread command: how to read large files with irregular separators?
For your sed step, use the following expression, which strips leading blanks and replaces each run of blanks with a single comma:
sed 's/^[[:blank:]]*//;s/[[:blank:]]\{1,\}/,/g'
Isn't it possible to collect all the results of fread into one (temporary) file (adding the source reference) and process that file with sed (or another tool), to avoid forking the tool at every iteration?
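If calling an external tool is undesirable, the same cleanup can be sketched in R itself rather than in sed. This is only an illustration under assumptions: the sample file and its contents are made up, and the two regular expressions mirror the sed expression above (strip leading blanks, then turn each run of blanks into one comma).

```r
library(data.table)

# Hypothetical sample file with leading blanks and irregular runs of spaces/tabs
tmp <- tempfile(fileext = ".txt")
writeLines(c("  a   b\tc", "1    2  3", "   4 5      6"), tmp)

lines <- readLines(tmp)
lines <- sub("^[[:blank:]]+", "", lines)    # strip leading blanks
lines <- gsub("[[:blank:]]+", ",", lines)   # each run of blanks becomes one comma

# Parse the cleaned lines as ordinary comma-separated data
dt <- fread(text = lines, sep = ",")
print(dt)
```

For very large files the readLines step holds the whole file in memory as text, so the sed pipeline may still be preferable there.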
How to read file with irregular space separated value using fread()?
As said in this post, this is now handled automatically by fread (see README for more):
require(data.table) #v1.9.5+
fread("~/Downloads/tmp.txt")
#           Date Col1 Col2
#  1: 2014-01-01  123   12
#  2: 2014-01-01  123   21
#  3: 2014-01-01  124   32
#  4: 2014-01-01  125   32
#  5: 2014-01-02  123   34
#  6: 2014-01-02  126   24
#  7: 2014-01-02  127   23
#  8: 2014-01-03  521   21
#  9: 2014-01-03  123   13
# 10: 2014-01-03  126   15
Either update to the latest devel version or wait until v1.9.6 hits CRAN.
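For a check that does not depend on a local tmp.txt, here is a minimal sketch with made-up lines passed through the text argument. It assumes a data.table version recent enough to support fread(text = ...), and it relies on fread detecting the space separator on its own and treating consecutive spaces as one.

```r
library(data.table)

# Hypothetical input: columns separated by an irregular number of spaces
lines <- c("Date        Col1 Col2",
           "2014-01-01   123   12",
           "2014-01-01 123    21",
           "2014-01-02    124 32")

# No sep argument: fread is left to detect the separator itself
dt <- fread(text = lines)
print(dt)
```

If the automatic detection picks something else on a real file, passing sep = " " explicitly is a reasonable next thing to try.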
R data.table fread cannot read in irregular column lengths when the larger rows do not appear early in a file
Note that what you have is not a csv file since it has no header. If we add a header it will work. First use fread to read it in as a single field per line, giving the character vector Lines. From that compute the maximum number of fields n. Finally re-read Lines after prefixing it with a header.
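library(data.table)
# Read each line as one field; header = FALSE keeps the first line as data
Lines <- fread("shortLong.csv", sep = "", header = FALSE)[[1]]
n <- max(count.fields(textConnection(Lines), sep = ","))  # most fields on any line
# Prefix a generated header 1..n, then re-read; fill = TRUE pads short rows with NA
fread(text = c(toString(1:n), Lines), header = TRUE, fill = TRUE)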
R: Is there a way to fread data.table from file where some (irregular) lines are skipped?
Thanks to Frank,
Solution:
Step 1: Make the lines that should be skipped blank (with an external editor), then
Step 2: Run fread(text, blank.lines.skip=TRUE)
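The two steps can also be sketched entirely in R, blanking the unwanted lines in memory instead of in an external editor. The rule used here (skip lines starting with "#") and the sample lines are assumptions for illustration only; substitute whatever identifies the irregular lines in the real file.

```r
library(data.table)

# Hypothetical input; in practice this would come from readLines("yourfile")
raw <- c("a,b",
         "1,2",
         "# note to skip",
         "3,4",
         "# another note",
         "5,6")

# Step 1: blank out the lines that should be skipped (assumed rule: leading "#")
raw[grepl("^#", raw)] <- ""

# Step 2: let fread drop the blank lines
dt <- fread(text = raw, blank.lines.skip = TRUE)
print(dt)
```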
How to handle data with no space between separators when using fread in R
I have tried using read.table with the option skipNul = TRUE, and this
works perfectly. However, there doesn't seem to be any option similar
to skipNul for fread.
This has been fixed in dev 1.12.3 on 15 Apr 2019 (see NEWS):
- fread() now skips embedded NUL (\0), #3400. Thanks to Marcus Davy for reporting with examples, and Roy Storey for the initial PR.
R: fread multiple files with different decimal separators
It is unusual for a file to mix two different decimal separators, so the first thing to do is question the source of the file.
Nevertheless, if you have such a file, a reliable way to read it is to read the columns that use a decimal comma as strings and then convert them to numeric.
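library(data.table)
# Example data: column a stores numbers as strings with a decimal comma
dt <- data.table(a = c("1,4", "2,0", "4,5", "3,5", "6,9"), c = 10:14)
write.csv(dt, "dt.csv", row.names = FALSE)
# Read with "." as the decimal mark, so column a comes in as character
dcsv <- fread("dt.csv", dec = ".")
# Swap the comma for a dot, drop any stray quotes, then convert to numeric
dcsv[, a := as.numeric(gsub("\"", "", gsub(",", ".", a)))]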
If you don't know whether a variable uses a comma or a dot as its decimal separator, you can loop over the variables, test whether each one is a string containing only digits and a comma, and convert only those that meet that condition.
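A minimal sketch of such a loop follows. The table, the helper name is_comma_decimal, and the exact pattern (optional minus sign, digits, one comma, digits) are assumptions made for illustration; a real file may need a looser or stricter test.

```r
library(data.table)

# Hypothetical table: a uses a decimal comma, b is ordinary text, c is already numeric
dcsv <- data.table(a = c("1,4", "2,0", "4,5"),
                   b = c("x", "y", "z"),
                   c = 10:12)

# TRUE when x is character and every non-missing value looks like digits,comma,digits
is_comma_decimal <- function(x) {
  is.character(x) && all(grepl("^-?[0-9]+,[0-9]+$", x[!is.na(x)]))
}

# Convert only the columns that pass the test; leave everything else untouched
for (col in names(dcsv)) {
  if (is_comma_decimal(dcsv[[col]])) {
    set(dcsv, j = col, value = as.numeric(sub(",", ".", dcsv[[col]], fixed = TRUE)))
  }
}
str(dcsv)
```

Columns that mix plain integers with comma decimals (for example "3" and "1,4") would not match this pattern, so the test would need relaxing for such data.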