fread and a quoted multi-line column value
UPDATE: Now fixed in v1.9.3 on GitHub:
- fread() now accepts line breaks inside quoted fields. Thanks to Clayton Stanley for highlighting.
This error has been reported before and it's on the list to do. But what's new here is the \n inside the quotes. I hadn't realised that was a use case giving rise to the error.
Many thanks for reporting. It'll be fixed.
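A minimal sketch of the case the fix covers, assuming a data.table version that includes it. The temporary file and the column names id and comment are made up for illustration. The first data row has a quoted field that spans two physical lines, so the file should be read as two rows rather than three.

```r
library(data.table)

# Hypothetical sample file: the quoted field in the first data row
# contains a line break, so it occupies two physical lines.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  'id,comment',
  '1,"first line',
  'second line"',
  '2,"single line"'
), tmp)

dt <- fread(tmp)

nrow(dt)                                  # number of logical rows read
grepl("\n", dt$comment[1], fixed = TRUE)  # does the field keep its line break?

unlink(tmp)
```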
A similar, though not identical, question is here:
data.table::fread and Unbalanced "
and the bug report is here:
https://r-forge.r-project.org/tracker/?group_id=240&atid=975&func=detail&aid=2694
Interpreting new line \n character when reading from file using fread in r
You are misinterpreting what fread is doing. Your input file contains a backslash followed by n, and that is exactly what the string returned by fread contains. However, when you print a string containing a backslash, the backslash is shown doubled. (Use cat() to print it if you don't want this.) Your strAsIntended variable doesn't contain a backslash; it contains a single newline character, which is displayed as \n when printed.
If you want to convert the \n in your input file into a newline character, use gsub or another substitution function. For example,
dt[[3]] <- gsub("\\n", "\n", dt[[3]], fixed = TRUE)  # [[3]] extracts the column as a character vector
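The difference between the two strings can be checked in base R. This is only a sketch: from_file and intended are illustrative names standing in for the value fread returned and for strAsIntended.

```r
# 'from_file' mimics what fread returns for a file containing a literal
# backslash followed by n; 'intended' holds a real newline character.
from_file <- "a\\nb"   # four characters: a, backslash, n, b
intended  <- "a\nb"    # three characters: a, newline, b

nchar(from_file)       # 4
nchar(intended)        # 3

print(from_file)       # print() escapes the backslash, so it appears doubled
cat(from_file, "\n")   # cat() writes the raw characters

# fixed = TRUE makes the pattern the literal two characters backslash + n
converted <- gsub("\\n", "\n", from_file, fixed = TRUE)
identical(converted, intended)  # TRUE
```

Comparing nchar() on both values is a quick way to tell an escaped \n from a real newline without relying on how the string is printed.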
Issue with double quotes and fread function
As @Arun kindly suggested, the data.table development version 1.9.5, currently on GitHub, may be of help here.
To install please follow this procedure (Rtools required):
# To install development version
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
I have tested it and can confirm that the newest version of data.table handles the double quotes without problems.
For further details and updates, see the data.table repository on GitHub.
Read CSV file with embedded double quotes and commas
You could try cleaning your data beforehand by replacing the double quotes with single quotes.
library(data.table)
x = readLines('my_file.csv')
y = gsub('","', "','", x) # replace the "," between adjacent fields with ','
y = gsub('^"|"$', "'", y) # replace the leading and trailing double quote on each line
z = paste(y, collapse='\n') # join the lines into one string for fread to parse
df = fread(z, quote="'")
df
SA SU CC CN POC PAC
1: NE R 0 H "B", O 1 8
2: A A 0 P E,5 8
I can't confirm that this is efficient since I don't know how big your file is, but it might be a worthwhile approach.
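To see what the two substitutions do before handing the result to fread, they can be run on a couple of in-memory lines. The sample lines below are hypothetical, loosely modelled on the output shown above, and this sketch uses base R only.

```r
# Hypothetical input lines: every field is wrapped in double quotes, and
# one field contains embedded double quotes and a comma.
x <- c('"SA","CN","PAC"',
       '"NE","H "B", O","8"')

# Step 1: turn each "," between adjacent fields into ','
y <- gsub('","', "','", x)

# Step 2: turn the leading and trailing double quote of each line into '
y <- gsub('^"|"$', "'", y)

# Only the field delimiters are rewritten; quotes inside a field are kept
writeLines(y)
```

One caveat of this design: a field that itself contains the exact sequence "," would be split in the wrong place, and a field containing a single quote would clash with the new quote character, so it is worth checking the data for both first.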