Ruby CSV - Illegal quoting in line 1. CSV::MalformedCSVError
I didn't find any way to read directly from a remote file if it contains a BOM. So I use Tempfile to create a temporary file and then call CSV.open with 'r:bom|utf-8':
doc = Document.find(doc_id)
path = "#{Rails.root.join('tmp')}/#{doc.name.split('.').first}_#{Time.now.to_i}.csv"
file = Tempfile.new(["#{doc.name.split('.').first}_#{Time.now.to_i}", '.csv'])
file.binmode
# Fetching a URL with open needs require 'open-uri'; on Ruby 3.0+ call URI.open instead.
file << open(doc.file.url).read
file.close
# path is the output file; the temp file is read back with the BOM stripped.
CSV.open(path, 'w', headers: :first_row, col_sep: ';', row_sep: "\r\n", encoding: 'utf-8') do |csv|
  CSV.open(file.path, 'r:bom|utf-8', headers: :first_row, col_sep: ';', quote_char: "\"", row_sep: "\r\n").each_with_index do |line, index|
    # do something
  end
end
Now, it seems to parse the file.
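The temp-file approach above can be tried without Rails or a remote server. This is only a minimal sketch: the remote_bytes string stands in for the downloaded file (an assumption, not the original data), and the column names are made up.

```ruby
require 'csv'
require 'tempfile'

# Stand-in for the downloaded bytes (assumption): a UTF-8 BOM followed by
# semicolon-separated, CRLF-terminated rows.
remote_bytes = "\xEF\xBB\xBFname;city\r\nAnn;Oslo\r\nBob;Rome\r\n".b

rows = Tempfile.create(['document', '.csv']) do |file|
  file.binmode              # write the bytes unchanged
  file.write(remote_bytes)
  file.flush

  # 'r:bom|utf-8' strips the BOM before CSV sees the header row.
  CSV.open(file.path, 'r:bom|utf-8', headers: true, col_sep: ';', row_sep: "\r\n") do |csv|
    csv.map(&:to_h)
  end
end

p rows.first.keys
```

Tempfile.create with a block removes the file when the block returns, so nothing is left behind in the temp directory.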
CSV.read Illegal quoting in line x
I had this problem in a line like 123,456,a"b"c
The problem is that the CSV parser expects " characters, if they appear, to entirely surround the comma-delimited field.
Solution: use a quote character other than ", one that I was sure would not appear in my data:
CSV.read(filename, :quote_char => "|")
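To see the difference on the sample line from this answer, here is a small sketch using CSV.parse_line on a string instead of a file. The variable names are illustrative only.

```ruby
require 'csv'

line = '123,456,a"b"c'

# With the default quote character ("), the stray quotes inside a"b"c are rejected.
error_message =
  begin
    CSV.parse_line(line)
    nil
  rescue CSV::MalformedCSVError => e
    e.message
  end

# With a quote character that never occurs in the data, " is an ordinary character.
fields = CSV.parse_line(line, quote_char: '|')

puts error_message
p fields
```

This only helps when the chosen quote character really never appears in the data, since any | would then be treated as a quote.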
Rescue CSV::MalformedCSVError: Illegal quoting in line n
Your solution works. The expected result resides in the variable my_array.
CSV::MalformedCSVError: Illegal quoting in line 1 with SmarterCSV
This is due to illegal Unicode characters inside your file.
You can process a file containing such characters with:
f = File.open(file_path, "r:bom|utf-8")
data = SmarterCSV.process(f)
f.close
Here, data will contain the parsed data.
Also see the official documentation on this: https://github.com/tilo/smarter_csv#notes-about-file-encodings
Illegal quoting in line 1 using Ruby CSV
Binary encoding of my file is below:
"\xFF\xFES\x00t\x00a\x00t\x00u\x00s\x00...
0xFF 0xFE is the byte order mark for UTF-16LE.
You have to specify the encoding when processing this file with CSV.foreach:
This method also understands an additional :encoding parameter that you can use to specify the Encoding of the data in the file to be read. You must provide this unless your data is in Encoding::default_external(). CSV will use this to determine how to parse the data. You may provide a second Encoding to have the data transcoded as it is read. For example, encoding: "UTF-32BE:UTF-8" would read UTF-32BE data from the file but transcode it to UTF-8 before CSV parses it.
Furthermore, you have to specify that a BOM is present. According to the IO.new docs:
If “BOM|UTF-8”, “BOM|UTF-16LE” or “BOM|UTF16-BE” are (...) present, the BOM is stripped
Applied to your file and example:
CSV.foreach(file, col_sep: "\t", encoding: "BOM|UTF-16LE:UTF-8", headers: true) do |row|
  # ...
end
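A self-contained way to check this is to write a small UTF-16LE file with a BOM and read it back. This is a sketch under assumptions: the rows below are invented, and only the leading "Status" header mirrors the byte dump shown in the question.

```ruby
require 'csv'
require 'tempfile'

# Made-up sample data (assumption): tab-separated text encoded as UTF-16LE,
# preceded by the 0xFF 0xFE byte order mark.
text  = "Status\tCount\nopen\t3\nclosed\t5\n"
bytes = "\xFF\xFE".b + text.encode('UTF-16LE').b

rows = Tempfile.create(['utf16le', '.csv']) do |file|
  file.binmode
  file.write(bytes)
  file.flush

  collected = []
  # BOM| strips the byte order mark; UTF-16LE:UTF-8 transcodes while reading.
  CSV.foreach(file.path, col_sep: "\t", encoding: 'BOM|UTF-16LE:UTF-8', headers: true) do |row|
    collected << row.to_h
  end
  collected
end

p rows
```

Because the data is transcoded before parsing, the headers and fields come back as ordinary UTF-8 strings.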
Illegal Quoting error with Ruby CSV parsing
The problem causing the Illegal quoting error was a Byte-Order-Mark (BOM) at the very beginning of the file. It didn't show up in editors, but the Ruby CSV lib was choking on it unless :encoding => 'bom|utf-8' was set.
Once that was fixed, I still needed to remove all the '^M' characters by running %s/\r//g in vim. And everything was working fine after that.
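Both steps can also be done from Ruby rather than an editor. The sketch below assumes a made-up file with a UTF-8 BOM, quoted fields and CRLF line endings. It checks for the BOM bytes, shows the failure when the BOM is left in, then reads with 'bom|utf-8' and deletes the carriage returns (the Ruby counterpart of the vim substitution).

```ruby
require 'csv'
require 'tempfile'

# Made-up file contents (assumption): UTF-8 BOM, quoted fields, CRLF line endings.
bytes = "\xEF\xBB\xBF\"name\";\"age\"\r\n\"Ann\";\"30\"\r\n".b

has_bom = nil
error_message = nil
rows = nil

Tempfile.create(['with_bom', '.csv']) do |file|
  file.binmode
  file.write(bytes)
  file.flush

  # The BOM is invisible in most editors, but the first three bytes reveal it.
  has_bom = File.binread(file.path, 3) == "\xEF\xBB\xBF".b

  # Reading as plain UTF-8 keeps the BOM in front of the first quote.
  begin
    CSV.parse(File.read(file.path, encoding: 'utf-8'), col_sep: ';')
  rescue CSV::MalformedCSVError => e
    error_message = e.message
  end

  # 'bom|utf-8' strips the BOM; delete("\r") removes the ^M characters.
  clean = File.read(file.path, encoding: 'bom|utf-8').delete("\r")
  rows = CSV.parse(clean, col_sep: ';', headers: true).map(&:to_h)
end

p has_bom
puts error_message
p rows
```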