Ruby/Rails CSV Parsing, Invalid Byte Sequence in UTF-8


You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:

file = File.open("input_file", "r:ISO-8859-1")

The second argument tells Ruby to open the file read-only and to treat its contents as ISO-8859-1.
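As a minimal sketch of the mode string in action, the snippet below writes a small Latin-1 file and reads it back. The helper names and the sample data are hypothetical. It also uses the three-part form "r:ISO-8859-1:UTF-8", which is an extension of the line above: the extra part asks Ruby to transcode the content to UTF-8 while reading.

```ruby
require 'tempfile'

# Hypothetical helper: reads a Latin-1 file and returns its content as UTF-8.
def read_latin1_as_utf8(path)
  # "r:ISO-8859-1:UTF-8" = read mode, external encoding ISO-8859-1,
  # internal encoding UTF-8 (transcoded while reading).
  File.open(path, 'r:ISO-8859-1:UTF-8') { |f| f.read }
end

# Hypothetical sample: a file containing the Latin-1 byte 0xE9 ("é").
def latin1_sample_file
  file = Tempfile.new(['latin1', '.csv'])
  file.binmode
  file.write("name\ncaf\xE9\n".b)
  file.close
  file
end

sample = latin1_sample_file
text = read_latin1_as_utf8(sample.path)
puts text.encoding        # UTF-8
puts text.valid_encoding? # true
```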

Rails Import CSV Error: invalid byte sequence in UTF-8

Specify the encoding with the encoding option (here, read the file as ISO-8859-1 and transcode it to UTF-8):

CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  # your code here
end
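To see what the block receives, here is a small self-contained sketch. The helper names and the two-column sample file are made up for illustration; only the CSV.foreach call mirrors the answer above.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: collects each row of a Latin-1 CSV as a UTF-8 hash.
def rows_from_latin1_csv(path)
  rows = []
  CSV.foreach(path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
    rows << row.to_h
  end
  rows
end

# Hypothetical sample: header line plus one row with Latin-1 accented bytes.
def latin1_csv_sample
  file = Tempfile.new(['people', '.csv'])
  file.binmode
  file.write("name,city\nJos\xE9,M\xE1laga\n".b)
  file.close
  file
end

sample = latin1_csv_sample
rows_from_latin1_csv(sample.path).each { |row| puts row.inspect }
```

Every field arrives already transcoded, so the rest of the import code can work with ordinary UTF-8 strings.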

Ruby `CSV.read` error invalid byte sequence in UTF-8 (ArgumentError)

First of all, your encoding doesn't look right:

'社員番号'.force_encoding("Shift_JIS").encode!
#=> "\x{E7A4}\xBE\x{E593}\xA1\x{E795}\xAA\x{E58F}\xB7"

force_encoding keeps the bytes of the string unchanged and merely reinterprets them as Shift JIS, whereas you probably want to convert the string to Shift JIS:

'社員番号'.encode('Shift_JIS')
#=> "\x{8ED0}\x{88F5}\x{94D4}\x{8D86}"
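The difference can be checked by comparing bytes. This is a sketch with two hypothetical helper names: force_encoding leaves the bytes alone, while encode produces new bytes that round-trip back to the original text.

```ruby
# Hypothetical helper: relabels the existing bytes as Shift JIS (no conversion).
def relabel_as_sjis(str)
  str.dup.force_encoding('Shift_JIS')
end

# Hypothetical helper: converts the characters to their Shift JIS bytes.
def convert_to_sjis(str)
  str.encode('Shift_JIS')
end

utf8 = '社員番号'
relabeled = relabel_as_sjis(utf8)
converted = convert_to_sjis(utf8)

puts relabeled.bytes == utf8.bytes     # true: same bytes, different label
puts converted.bytes == utf8.bytes     # false: the bytes were rewritten
puts converted.encode('UTF-8') == utf8 # true: converting back restores the text
```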

Next, you can pass a filename to CSV.read, so instead of:

file = File.open(filename)
CSV.read(file)

You can just write:

CSV.read(filename)

That said, you could either work with Shift JIS encoded strings:

require 'csv'
str1 = '社員番号'.encode("Shift_JIS")
str2 = 'メールアドレス'.encode("Shift_JIS")
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS', headers: true)
csv[str1]
csv[str2]

Or – and that's what I would do – you could work with UTF-8 strings by specifying a second encoding:

require 'csv'
str1 = '社員番号'
str2 = 'メールアドレス'
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS:UTF-8', headers: true)
csv[str1]
csv[str2]

encoding: 'Shift_JIS:UTF-8' instructs CSV to read Shift JIS data and transcode it to UTF-8. It's equivalent to passing 'r:Shift_JIS:UTF-8' to File.open.
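The equivalence can be sketched as follows. The sample file, its contents, and the helper names are assumptions for illustration; the original SyainInfo.csv is not reproduced here.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample: a two-line CSV written to disk as Shift JIS bytes.
def write_sjis_sample
  file = Tempfile.new(['syain', '.csv'])
  file.close
  data = "社員番号,メールアドレス\n1,a@example.com\n".encode('Shift_JIS')
  File.binwrite(file.path, data)
  file
end

# Variant 1: let CSV open the file and transcode Shift JIS to UTF-8.
def read_via_csv(path)
  CSV.read(path, encoding: 'Shift_JIS:UTF-8', headers: true)
end

# Variant 2: open the file with the same encoding pair, then parse the text.
def read_via_file(path)
  File.open(path, 'r:Shift_JIS:UTF-8') { |f| CSV.parse(f.read, headers: true) }
end

sample = write_sjis_sample
table_a = read_via_csv(sample.path)
table_b = read_via_file(sample.path)

puts table_a.to_a == table_b.to_a # true
puts table_a['社員番号'].inspect   # ["1"]
```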

When importing CSV data, how to eliminate invalid byte sequence in UTF-8

Ruby 1.9's CSV has a new parser that works with m17n (multilingualization). The parser works in the Encoding of the IO or String object being read. The methods ::foreach, ::open, ::read, and ::readlines accept an optional :encoding option, with which you can specify the Encoding.

For example:

CSV.read('/path/to/file', :encoding => 'windows-1251:utf-8')

This reads the file as windows-1251 and converts all strings to UTF-8.

You can also use the more standard encoding name 'ISO-8859-1':

CSV.read('/..', :headers => true, :col_sep => ';', :encoding => 'ISO-8859-1')

CSV importing in Rails - invalid byte sequence in UTF-8 with non-english characters

I solved it with a different approach. This is a much easier way to import CSV files into a Rails 3 model than using an external gem:

require 'csv'
CSV.foreach('doc/socios_full.csv') do |row|
  record = Associate.new(
    :media_format => row[0],
    :group => row[0],
    :member => row[1],
    :family_relationship_code => row[2],
    :family_relationship_description => row[3],
    :last_name => row[4],
    :names => row[5],
    ...
  )
  record.save!
end

It works flawlessly, even with non-English characters (I just tried a 75k import file!). I hope it's helpful for someone.

Invalid byte sequence in UTF-8, CSV import, Rails 4

I solved this by saving the file as an MS-DOS CSV instead of the standard CSV or the Windows CSV format.

Reading CSV File - invalid byte sequence in UTF-8

I've found a solution to discard all invalid UTF-8 bytes from a string:

ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]

(taken from this blog post)
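Iconv is no longer part of the standard library as of Ruby 2.0. On Ruby 2.1 and later, String#scrub covers the same need without it. The following is a sketch of that swapped-in technique, not the original answer's method, and the helper name is hypothetical:

```ruby
# Hypothetical helper: returns a UTF-8 copy of str with invalid bytes removed.
def discard_invalid_utf8(str)
  # Label the bytes as UTF-8, then replace each invalid sequence with "".
  str.dup.force_encoding(Encoding::UTF_8).scrub('')
end

untrusted_string = "caf\xE9 au lait".b # 0xE9 alone is not valid UTF-8
valid_string = discard_invalid_utf8(untrusted_string)

puts valid_string                 # prints "caf au lait"
puts valid_string.valid_encoding? # true
```

Note that this silently drops data. If the file is really in another encoding such as ISO-8859-1, transcoding it keeps the accented characters instead of deleting them.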

Hope this helps.


