Fread and a Quoted Multi-Line Column Value


UPDATE: Now fixed in v1.9.3 on GitHub:

  • fread() now accepts line breaks inside quoted fields. Thanks to Clayton Stanley for highlighting.
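A minimal sketch of what this fix allows. It assumes a current data.table release (the `text =` argument of fread() is only available from 1.11.0), and the two-column CSV is made-up example data:

```r
library(data.table)

# Made-up CSV: the comment field of the first record contains a line break inside quotes.
csv <- 'id,comment\n1,"first line\nsecond line"\n2,"single line"'

dt <- fread(text = csv)

nrow(dt)                   # 2: the quoted line break does not start a new record
cat(dt$comment[1], "\n")   # prints "first line" and "second line" on separate lines
```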



This error has been reported before and is on the to-do list. What's new here is the \n inside the quotes; I hadn't realised that this use case also triggers the error.

Many thanks for reporting. It'll be fixed.

A similar, but not identical, question is here:

data.table::fread and Unbalanced "

and the bug report is here:

https://r-forge.r-project.org/tracker/?group_id=240&atid=975&func=detail&aid=2694

Interpreting new line \n character when reading from file using fread in r

You are misinterpreting what fread is doing. Your input file contains a backslash followed by n, and that is exactly what the string returned by fread contains. However, when you print() a string containing a backslash, the backslash is escaped, so it is displayed doubled. (Use cat() to print it if you don't want this.) Your strAsIntended variable doesn't contain a backslash at all; it contains a single newline character, which print() displays as \n.
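The distinction can be checked in base R without fread. In this sketch the two strings are made-up stand-ins for what fread returns and for strAsIntended:

```r
# Backslash followed by "n" (two characters), as stored in the input file
s_literal <- "first\\nsecond"
# A single real newline character, like strAsIntended
s_newline <- "first\nsecond"

print(s_literal)       # print() escapes the backslash, so it shows up doubled
cat(s_literal, "\n")   # cat() writes the raw characters: first\nsecond
cat(s_newline, "\n")   # the newline is rendered, so this spans two lines

nchar(s_literal)       # 13
nchar(s_newline)       # 12
```

The one-character difference in nchar() is the backslash: the file holds two characters where strAsIntended holds one.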

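If you want to convert the \n in your input file into a newline character, use gsub or another substitution function. Because fread returns a data.table, take the column with dt[[3]] rather than dt[,3] (which returns a one-column data.table, not a character vector). For example,

# Replace every literal backslash-n in the third column with a real newline,
# modifying the column by reference with data.table's set()
set(dt, j = 3L, value = gsub("\\n", "\n", dt[[3L]], fixed = TRUE))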
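When the column name is known, the same conversion can also be written with `:=`. This is a minimal sketch on made-up input, with a hypothetical `note` column standing in for the asker's third column, and it assumes data.table 1.11.0 or later for `text =`:

```r
library(data.table)

# Made-up input: in R source, first\\nsecond is the characters first\nsecond,
# i.e. a literal backslash followed by n, as in the asker's file.
raw <- 'id,note\n1,first\\nsecond\n2,plain'
dt <- fread(text = raw)

# Replace each literal backslash-n with a real newline, by reference.
dt[, note := gsub("\\n", "\n", note, fixed = TRUE)]

cat(dt$note[1], "\n")  # "first" and "second" now print on separate lines
```

fixed = TRUE makes gsub() match the two characters backslash and n literally, so no regular-expression escaping is needed.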

Issue with double quotes and fread function

As @Arun kindly suggested, the data.table development version 1.9.5, currently on GitHub, may help here.

To install it, follow this procedure (on Windows, Rtools is required to build the package from source):

# To install development version

library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)

I have tested it and can confirm that the newest version of data.table handles the double-quote issue without problems.
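After install_github() finishes (and R is restarted), a quick way to check which version is actually installed is base R's packageVersion(). A minimal sketch, assuming data.table is installed:

```r
# Version of the installed data.table package
v <- packageVersion("data.table")
print(v)

# TRUE when the installed version is at least the 1.9.5 development release
v >= "1.9.5"
```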

For further details and updates, see the data.table repository on GitHub (https://github.com/Rdatatable/data.table).

Read CSV file with embedded double quotes and commas

You could try cleaning your data beforehand by replacing the double quotes that delimit fields with single quotes.

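library(data.table)

x <- readLines('my_file.csv')
y <- gsub('","', "','", x)      # turn each "," field separator into ','
y <- gsub('^"|"$', "'", y)      # replace the leading and trailing double quote on each line
z <- paste(y, collapse = '\n')  # join the lines back into one string for fread to parse
df <- fread(text = z, quote = "'")  # `text =` needs data.table >= 1.11.0; older versions take fread(z, quote = "'")
df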

SA SU CC CN POC PAC
1: NE R 0 H "B", O 1 8
2: A A 0 P E,5 8

I can't confirm that this is efficient since I don't know how big your file is, but it might be a worthwhile approach.
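To see what the two substitutions do, here is a minimal end-to-end sketch on a made-up two-line file (not the asker's data) whose middle field contains unescaped double quotes and a comma. It assumes data.table 1.11.0 or later, and sets sep and header explicitly rather than relying on auto-detection:

```r
library(data.table)

# Made-up file: the comment field contains unescaped double quotes and a comma.
lines <- c('"id","comment","flag"',
           '"1","He said "hi", then left","x"')
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

x <- readLines(tmp)
y <- gsub('","', "','", x)   # quote-comma-quote separators become ','
y <- gsub('^"|"$', "'", y)   # the first and last double quote on each line become '
z <- paste(y, collapse = "\n")

dt <- fread(text = z, sep = ",", quote = "'", header = TRUE)
cat(dt$comment, "\n")        # He said "hi", then left
unlink(tmp)
```

The trick only works while single quotes never appear inside the data itself; a field containing an apostrophe would need a different quote character.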