Ruby/Rails CSV parsing, invalid byte sequence in UTF-8
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file = File.open("input_file", "r:ISO-8859-1")
The second argument tells Ruby to open the file read-only and to treat its contents as ISO-8859-1.
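As a rough illustration of what the mode string changes, the sketch below writes a small ISO-8859-1 file and reads it back under two different labels. The sample text and the temporary directory are hypothetical, and it assumes plain Ruby where Encoding.default_internal is not set (Rails sets it, in which case the string may already come back as UTF-8).

```ruby
require 'tmpdir'

utf8_guess = nil
latin1 = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'input_file')
  # Hypothetical sample: "café" in ISO-8859-1, where é is the single byte 0xE9.
  File.binwrite(path, "caf\xE9\n".b)

  # Labeled as UTF-8, the lone 0xE9 byte is not a valid sequence.
  utf8_guess = File.open(path, 'r:UTF-8') { |f| f.read }

  # Labeled as ISO-8859-1, the very same bytes are valid.
  latin1 = File.open(path, 'r:ISO-8859-1') { |f| f.read }
end

puts utf8_guess.valid_encoding?  # false
puts latin1.valid_encoding?      # true

# Reading only labels the bytes; convert explicitly if the rest of
# the program expects UTF-8 strings.
converted = latin1.encode('UTF-8')
puts converted                   # café
```

Reading with the wrong label does not raise by itself; the "invalid byte sequence" error usually appears later, when something like a regexp match or the CSV parser touches the string.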
Rails Import CSV Error: invalid byte sequence in UTF-8
Specify the encoding with the encoding option:
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  # your code here
end
Ruby `CSV.read` error invalid byte sequence in UTF-8 (ArgumentError)
First of all, your encoding doesn't look right:
'社員番号'.force_encoding("Shift_JIS").encode!
#=> "\x{E7A4}\xBE\x{E593}\xA1\x{E795}\xAA\x{E58F}\xB7"
force_encoding takes the bytes from str1 and interprets them as Shift JIS, whereas you probably want to convert the string to Shift JIS:
'社員番号'.encode('Shift_JIS')
#=> "\x{8ED0}\x{88F5}\x{94D4}\x{8D86}"
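To make the difference concrete, here is a small sketch comparing the two calls on the same string. The variable names are only for illustration.

```ruby
text = '社員番号' # UTF-8 source string

# force_encoding relabels the existing bytes; nothing is converted.
# dup keeps the original string untouched.
relabeled = text.dup.force_encoding('Shift_JIS')

# encode transcodes: the characters stay the same, the bytes change.
converted = text.encode('Shift_JIS')

puts text.bytesize       # 12 (4 characters x 3 bytes in UTF-8)
puts relabeled.bytesize  # 12 (same bytes under a different label)
puts converted.bytesize  # 8  (4 characters x 2 bytes in Shift JIS)

puts relabeled.bytes == text.bytes      # true
puts converted.encode('UTF-8') == text  # true
```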
Next, you can pass a filename to CSV.read, so instead of:
file = File.open(filename)
CSV.read(file)
You can just write:
CSV.read(filename)
That said, you could either work with Shift JIS encoded strings:
require 'csv'
str1 = '社員番号'.encode("Shift_JIS")
str2 = 'メールアドレス'.encode("Shift_JIS")
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS', headers: true)
csv[str1]
csv[str2]
Or – and that's what I would do – you could work with UTF-8 strings by specifying a second encoding:
require 'csv'
str1 = '社員番号'
str2 = 'メールアドレス'
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS:UTF-8', headers: true)
csv[str1]
csv[str2]
encoding: 'Shift_JIS:UTF-8' instructs CSV to read Shift JIS data and transcode it to UTF-8. It's equivalent to passing 'r:Shift_JIS:UTF-8' to File.open.
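The equivalence can be checked with a sketch like the following. It writes a hypothetical stand-in for SyainInfo.csv (made-up row data) as Shift JIS bytes into a temporary directory, then reads it both ways.

```ruby
require 'csv'
require 'tmpdir'

rows_via_csv = nil
rows_via_file = nil

Dir.mktmpdir do |dir|
  # Hypothetical stand-in for SyainInfo.csv, stored as Shift JIS bytes.
  path = File.join(dir, 'SyainInfo.csv')
  data = "社員番号,メールアドレス\n1001,taro@example.com\n"
  File.binwrite(path, data.encode('Shift_JIS'))

  # Let CSV open the file and transcode Shift JIS to UTF-8.
  rows_via_csv = CSV.read(path, encoding: 'Shift_JIS:UTF-8', headers: true)

  # Open the file yourself with the equivalent mode string,
  # then hand the IO object to CSV.
  rows_via_file = File.open(path, 'r:Shift_JIS:UTF-8') do |f|
    CSV.new(f, headers: true).read
  end
end

# Both tables can be indexed with ordinary UTF-8 header strings.
puts rows_via_csv['社員番号'].inspect         # ["1001"]
puts rows_via_file['メールアドレス'].inspect   # ["taro@example.com"]
```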
When we import CSV data, how to eliminate invalid byte sequence in UTF-8
Ruby 1.9's CSV has a new parser that works with m17n (multilingualization). The parser works in the Encoding of the IO or String object being read. The methods ::foreach, ::open, ::read, and ::readlines accept an optional :encoding option, with which you can specify the Encoding.
For example:
CSV.read('/path/to/file', :encoding => 'windows-1251:utf-8')
This converts all strings to UTF-8.
You can also use the more standard encoding name 'ISO-8859-1':
CSV.read('/..', {:headers => true, :col_sep => ';', :encoding => 'ISO-8859-1'})
CSV importing in Rails - invalid byte sequence in UTF-8 with non-english characters
I solved it with a different approach. This is a much easier way to import CSV files into a Rails 3 model than using an external gem:
require 'csv'
CSV.foreach('doc/socios_full.csv') do |row|
  record = Associate.new(
    :media_format => row[0],
    :group => row[0],
    :member => row[1],
    :family_relationship_code => row[2],
    :family_relationship_description => row[3],
    :last_name => row[4],
    :names => row[5],
    ...
  )
  record.save!
end
It works flawlessly, even with non-English characters (I just tried a 75k import file!). Hope it's helpful for someone.
Invalid byte sequence in UTF-8, CSV import, Rails 4
I solved this by saving the file as an MS-DOS CSV, instead of the standard CSV file or the Windows CSV format.
Reading CSV File - invalid byte sequence in UTF-8
I've found a solution to discard all invalid UTF-8 bytes from a string:
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]
(taken from this blog post)
Hope this helps.
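Note that Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0. On Ruby 2.1 and later, String#scrub covers the same use case. The following is a minimal sketch, not a drop-in replacement for the snippet above; the sample input is made up.

```ruby
# Hypothetical untrusted input: UTF-8 text with one stray 0xE9 byte.
untrusted_string = "caf\xE9 au lait"

puts untrusted_string.valid_encoding?  # false

# With no argument, scrub replaces each invalid sequence with U+FFFD.
marked = untrusted_string.scrub

# With an empty replacement, the invalid bytes are dropped instead,
# which is close to what //IGNORE does in the Iconv version.
valid_string = untrusted_string.scrub('')

puts valid_string                  # caf au lait
puts valid_string.valid_encoding?  # true
```

Dropping bytes silently loses data, so if the file really is in another encoding such as ISO-8859-1, transcoding it is usually the better fix.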