Ruby Unable to Parse a CSV File: CSV::MalformedCSVError (Illegal Quoting in Line 1)

Ruby CSV - Illegal quoting in line 1. CSV::MalformedCSVError

I didn't find any way to read directly from a remote file if it contains a BOM. So I use Tempfile to create a temporary local copy and then call CSV.open on it with 'r:bom|utf-8':

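require 'open-uri' # provides URI.open; Kernel#open no longer opens URLs as of Ruby 3.0

doc = Document.find(doc_id)
base_name = "#{doc.name.split('.').first}_#{Time.now.to_i}"

# Destination for the processed output
path = "#{Rails.root.join('tmp')}/#{base_name}.csv"

# Download the remote file into a local Tempfile so it can be reopened with BOM handling
file = Tempfile.new([base_name, '.csv'])
file.binmode
file << URI.open(doc.file.url).read
file.close

CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
  # 'r:bom|utf-8' strips a leading UTF-8 BOM before the CSV parser sees the data
  CSV.open(file.path, 'r:bom|utf-8', headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n").each_with_index do |line, index|
    # do something
  end
end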

Now, it seems to parse the file.
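
If the downloaded content fits in memory, the temporary file may not be needed at all. The following is a minimal sketch, not the approach from the answer above: it assumes the raw bytes are already in a string, and the helper name parse_csv_bytes and the sample data are made up for illustration. It drops a leading UTF-8 BOM by hand and passes the rest to CSV.parse.

```ruby
require 'csv'

# The three bytes of a UTF-8 byte order mark
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper (not part of the CSV library): strips a leading
# UTF-8 BOM from raw bytes, then parses the remainder as UTF-8 CSV.
def parse_csv_bytes(bytes, **csv_options)
  data = bytes.dup.force_encoding(Encoding::BINARY)
  data = data.byteslice(UTF8_BOM.bytesize..-1) if data.start_with?(UTF8_BOM)
  CSV.parse(data.force_encoding(Encoding::UTF_8), **csv_options)
end

# Made-up sample standing in for the downloaded file body
raw = "\xEF\xBB\xBF\"name\";\"city\"\r\n\"Ann\";\"Oslo\"\r\n".b

table = parse_csv_bytes(raw, headers: :first_row, col_sep: ';', row_sep: "\r\n")
table.each { |row| puts "#{row['name']} lives in #{row['city']}" }
```

For large files, the Tempfile approach is probably still the better fit, since it avoids holding the whole document in memory twice.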

CSV.read Illegal quoting in line x

I had this problem in a line like 123,456,a"b"c

The problem is that the CSV parser expects a double quote, if one appears, to enclose the entire field between delimiters. A quote in the middle of an unquoted field, as in a"b"c, is rejected.

The solution was to use a quote character other than " that I was sure would not appear in my data:

CSV.read(filename, :quote_char => "|")
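
To see the effect without a file on disk, here is a small sketch that parses the problem line from a string. The helper name parse_with_quote_char is hypothetical; quote_char itself is the same CSV option used above.

```ruby
require 'csv'

# The kind of line described above: quotes inside an unquoted field
line = %(123,456,a"b"c\n)

# With the default quote character the parser is expected to reject the line
begin
  CSV.parse(line)
rescue CSV::MalformedCSVError => e
  puts "Default quote_char failed: #{e.message}"
end

# Hypothetical helper: parses text using an alternative quote character,
# so that " is treated as ordinary data.
def parse_with_quote_char(text, quote_char)
  CSV.parse(text, quote_char: quote_char)
end

p parse_with_quote_char(line, '|')
```

The trade-off is that fields genuinely wrapped in double quotes will now keep those quotes as part of their value, so this only suits data where " is never used for quoting.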

Rescue CSV::MalformedCSVError: Illegal quoting in line n

Your solution works. The expected result resides in the variable my_array.
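
The original question's code is not shown here, so the following is only a sketch of one common way to rescue the error, under the assumption that each record sits on a single line (quoted fields containing line breaks would not survive this). The helper name parse_lenient is hypothetical; my_array is reused from the answer above.

```ruby
require 'csv'

# Hypothetical helper: parses text one line at a time, keeps the rows that
# parse cleanly, and records the line numbers of the rows that do not.
def parse_lenient(text, **csv_options)
  rows = []
  bad_lines = []
  text.each_line.with_index(1) do |line, number|
    begin
      row = CSV.parse_line(line, **csv_options)
      rows << row if row
    rescue CSV::MalformedCSVError
      bad_lines << number
    end
  end
  [rows, bad_lines]
end

my_array, skipped = parse_lenient("1,2,3\n4,5,a\"b\"c\n7,8,9\n")
p my_array
puts "Skipped lines: #{skipped.inspect}"
```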

CSV::MalformedCSVError: Illegal quoting in line 1 with SmarterCSV

This is typically caused by unexpected Unicode characters in your file, such as a byte order mark (BOM) at the very start.

You can process a file that contains such characters by opening it with an explicit encoding:

f = File.open(file_path, "r:bom|utf-8")
data = SmarterCSV.process(f)
f.close

Here, data will contain the parsed rows.

Also refer to the official documentation on this: https://github.com/tilo/smarter_csv#notes-about-file-encodings

Illegal quoting in line 1 using Ruby CSV

Binary encoding of my file is below:

"\xFF\xFES\x00t\x00a\x00t\x00u\x00s\x00...

0xFF 0xFE is the byte order mark for UTF-16LE.

You have to specify the encoding when processing this file with CSV.foreach. The documentation says:

This method also understands an additional :encoding parameter that
you can use to specify the Encoding of the data in the file to be
read. You must provide this unless your data is in
Encoding::default_external(). CSV will use this to determine how to
parse the data. You may provide a second Encoding to have the data
transcoded as it is read. For example, encoding: "UTF-32BE:UTF-8"
would read UTF-32BE data from the file but transcode it to UTF-8
before CSV parses it.

Furthermore, you have to specify that a BOM is present. According to the IO.new docs:

If “BOM|UTF-8”, “BOM|UTF-16LE” or “BOM|UTF16-BE” are (...) present, the BOM is stripped

Applied to your file and example:

CSV.foreach(file, col_sep: "\t", encoding: "BOM|UTF-16LE:UTF-8", headers: true) do |row|
  # ...
end
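
To make it clearer what the encoding option is doing, here is a sketch that performs the same two steps by hand on a string of raw bytes: drop the UTF-16LE BOM, then transcode to UTF-8 before parsing. This is an illustration rather than a replacement for the option above; the helper name utf16le_to_utf8 and the sample row are made up.

```ruby
require 'csv'

# The two bytes of a UTF-16LE byte order mark
UTF16LE_BOM = "\xFF\xFE".b

# Hypothetical helper: strips a leading UTF-16LE BOM from raw bytes and
# transcodes the remainder from UTF-16LE to UTF-8.
def utf16le_to_utf8(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)
  data = data.byteslice(UTF16LE_BOM.bytesize..-1) if data.start_with?(UTF16LE_BOM)
  data.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
end

# Made-up sample shaped like the file in the question: BOM, then tab-separated UTF-16LE text
sample = UTF16LE_BOM + "Status\tCount\r\nopen\t3\r\n".encode(Encoding::UTF_16LE).b

CSV.parse(utf16le_to_utf8(sample), col_sep: "\t", headers: true).each do |row|
  puts "#{row['Status']}: #{row['Count']}"
end
```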

Illegal Quoting error with Ruby CSV parsing

The Illegal quoting error was caused by a byte order mark (BOM) at the very beginning of the file. It didn't show up in editors, but the Ruby CSV library choked on it unless :encoding => 'bom|utf-8' was set.

Once that was fixed, I still needed to remove all the '^M' (carriage return) characters by running :%s/\r//g in vim. After that, everything worked fine.
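
Both clean-up steps can also be done in Ruby instead of an editor. The sketch below assumes the file is small enough to read into memory; the helper name read_clean_csv and the sample file are hypothetical. It normalizes line endings to "\n" rather than deleting every "\r", which is slightly more conservative than the vim command.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: reads a file with BOM stripping enabled, normalizes
# "\r\n" and lone "\r" line endings to "\n", and parses the result.
def read_clean_csv(path, **csv_options)
  text = File.open(path, 'r:bom|utf-8', &:read)
  CSV.parse(text.gsub(/\r\n?/, "\n"), **csv_options)
end

# Made-up sample file: UTF-8 BOM, quoted header, Windows line endings
Tempfile.create(['sample', '.csv']) do |f|
  f.binmode
  f.write("\xEF\xBB\xBF\"id\",\"name\"\r\n\"1\",\"Ann\"\r\n".b)
  f.flush

  read_clean_csv(f.path, headers: true).each do |row|
    puts "#{row['id']}: #{row['name']}"
  end
end
```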


