Ruby 1.9: Invalid Byte Sequence in UTF-8

Ruby 1.9.3 Invalid byte sequence in UTF-8 explanation needed

I have 64-bit Cygwin, Ruby 2.0.0 and gem 2.4.1, and I was experiencing the same issue. gem install ..., gem update, every command ended with "ERROR: While executing gem ... (ArgumentError) invalid byte sequence in UTF-8".

I also had all locales set to "en_US.UTF-8".

I read somewhere that setting LANG to an empty string or "C.BINARY" should help. It didn't, but it was a good hint to start experimenting.

I finally solved it by setting both LANG and LC_ALL to an empty string. This caused all the other locale environment variables (LC_CTYPE, etc.) to be set automatically to "C.UTF-8", while LANG and LC_ALL remained empty.

Now gem is finally working.
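To check which locale settings Ruby actually sees before and after a change like this, a small diagnostic script can help. This is a sketch, not part of the original answer: locale_report is a hypothetical helper, and the values it prints depend on your shell and platform.

```ruby
# Hypothetical helper: the locale variables from the environment, plus the
# encodings Ruby derived from the locale when the interpreter started.
def locale_report
  {
    "LANG"             => ENV["LANG"],
    "LC_ALL"           => ENV["LC_ALL"],
    "LC_CTYPE"         => ENV["LC_CTYPE"],
    "locale_charmap"   => Encoding.locale_charmap,
    "default_external" => Encoding.default_external,
    "default_internal" => Encoding.default_internal
  }
end

locale_report.each do |name, value|
  puts format("%-17s %s", name, value.inspect)
end
```

Running it once with the failing settings and once with the working ones (for example LC_CTYPE=C.BINARY ruby locale_report.rb, where the file name is only an example) shows which of these values actually changed.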



UPDATE

It seems that LC_CTYPE specifically causes the issue when it is set to a UTF-8 locale, so setting it to C.BINARY should help. The other locale environment variables can stay set to UTF-8 without affecting this.

export LC_CTYPE=C.BINARY
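As far as I understand it, the reason LC_CTYPE matters is that Ruby derives Encoding.default_external from the locale (LC_CTYPE, or LC_ALL when that is set), and files read without an explicit encoding are tagged with it. If the bytes are not valid in that encoding, regex-based string methods raise exactly this ArgumentError. The sketch below does not touch the locale; it simulates the effect by tagging the same bytes first as UTF-8 and then as binary (ASCII-8BIT). The sample text is made up.

```ruby
# The same raw bytes, as they might come out of a file on disk.
# "\xE9" is "é" in ISO-8859-1, but a lone 0xE9 byte is invalid UTF-8.
raw = "caf\xE9 menu".b

# Roughly what you get when the locale says UTF-8.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)

# Roughly what you get with a binary (C.BINARY) locale.
as_binary = raw.dup.force_encoding(Encoding::ASCII_8BIT)

# Regex matching refuses strings with broken bytes in their own encoding.
def try_match(str)
  str =~ /menu/
rescue ArgumentError => e
  e.message
end

puts as_utf8.valid_encoding?   # false
puts try_match(as_utf8)        # invalid byte sequence in UTF-8
puts try_match(as_binary)      # 5
```

Tagging everything as binary silences the error but does not fix the data: non-ASCII text is then treated as raw bytes.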

Invalid Byte Sequence In UTF-8 Ruby

As Arie already answered, this error is caused by the invalid byte sequence \xC3.

If you are using Ruby 2.1+, you can also use String#scrub to replace invalid bytes with a replacement character of your choice (U+FFFD, shown as "�", by default for UTF-8 strings). For example:

a = "abce\xC3"
# => "abce\xC3"
a.scrub
# => "abce�"
a.scrub.sub("a","A")
# => "Abce�"
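String#scrub also takes a replacement string or a block, which is handy when you would rather drop the bad bytes or log them than show "�". A short sketch (the block form below uses unpack1, so it assumes Ruby 2.4 or later):

```ruby
a = "abce\xC3"   # 0xC3 starts a two-byte UTF-8 sequence that never finishes

a.valid_encoding?                              # => false
a.scrub("?")                                   # => "abce?"
a.scrub("")                                    # => "abce"  (bad bytes dropped)
a.scrub { |bad| "<#{bad.unpack1('H*')}>" }     # => "abce<c3>"  (bad bytes as hex)
```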

ArgumentError (invalid byte sequence in UTF-8): Ruby 1.9.3 render view

To fix this, I switched to the mysql2 gem, changed the adapter in my database.yml, and changed the encoding:

staging:
  adapter: mysql2
  database: data_basename
  username: root
  encoding: utf8

Ruby Invalid Byte Sequence in UTF-8

The combination of using

@file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil)

and the magic comment # encoding: UTF-8 at the top of the source file solved the issue.
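This works because every byte from 0x00 to 0xFF is a valid ISO-8859-1 character, so after force_encoding("ISO-8859-1") the conversion to UTF-8 cannot hit an invalid byte. The trade-off is that a file that really was UTF-8 comes out as mojibake. The sketch below shows both cases, plus File.read with an "external:internal" encoding, which performs the same conversion while reading. read_latin1_as_utf8 is a hypothetical helper and the file contents are made up; the magic comment is only needed on Ruby 1.9, since UTF-8 is the default source encoding from Ruby 2.0 on.

```ruby
# encoding: UTF-8
require "tempfile"

# Hypothetical helper mirroring the answer: read raw bytes, label them as
# ISO-8859-1, then transcode to UTF-8 (this step cannot fail for Latin-1).
def read_latin1_as_utf8(path)
  File.binread(path).force_encoding("ISO-8859-1").encode("UTF-8")
end

latin1_file = Tempfile.new("latin1")
latin1_file.close
File.binwrite(latin1_file.path, "caf\xE9".b)   # "café" saved as ISO-8859-1

utf8_file = Tempfile.new("utf8")
utf8_file.close
File.binwrite(utf8_file.path, "café")          # "café" saved as UTF-8

puts read_latin1_as_utf8(latin1_file.path)     # café
puts read_latin1_as_utf8(utf8_file.path)       # cafÃ©  (mojibake)

# The same conversion done by File.read itself.
puts File.read(latin1_file.path, encoding: "ISO-8859-1:UTF-8")   # café
```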

rake invalid byte sequence in UTF-8

Try saving the offending file (it could be any file that rake is trying to load) as UTF-8 with a BOM.
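If you want to do this from Ruby rather than from an editor, a UTF-8 BOM is just the three bytes EF BB BF at the start of the file, and Ruby can strip it again when reading via the "BOM|UTF-8" encoding. A minimal sketch; add_utf8_bom is a hypothetical helper and the file contents are made up. Whether rake needs the BOM in your case is the answer's suggestion, not something this code checks.

```ruby
require "tempfile"

UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: prepend a UTF-8 BOM unless the file already has one.
def add_utf8_bom(path)
  data = File.binread(path)
  File.binwrite(path, UTF8_BOM + data) unless data.start_with?(UTF8_BOM)
end

file = Tempfile.new("rake_input")
file.close
File.binwrite(file.path, "title: café\n")

add_utf8_bom(file.path)
add_utf8_bom(file.path)   # second call does nothing

puts File.binread(file.path).start_with?(UTF8_BOM)   # true

# "BOM|UTF-8" drops a leading BOM and tags the rest as UTF-8.
text = File.read(file.path, mode: "r:BOM|UTF-8")
puts text == "title: café\n"   # true
```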

Is there a way in ruby 1.9 to remove invalid byte sequences from strings?

"€foo\xA0".chars.select(&:valid_encoding?).join

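On Ruby 1.9 there was no String#scrub, and encoding a string to its own encoding did nothing, so another commonly suggested workaround was a round trip through UTF-16 with invalid: :replace. The sketch below compares it with the one-liner above; it uses UTF-16LE where the usual snippet says "UTF-16", which is my own variation.

```ruby
# encoding: UTF-8
dirty = "€foo\xA0"   # 0xA0 is a stray continuation byte, invalid on its own

# The answer's approach: keep only characters that are valid by themselves.
via_chars = dirty.chars.select(&:valid_encoding?).join

# Round trip: invalid bytes are replaced with "" on the way to UTF-16LE.
via_utf16 = dirty.encode("UTF-16LE", invalid: :replace, replace: "")
                 .encode("UTF-8")

puts via_chars == "€foo"   # true
puts via_utf16 == "€foo"   # true
```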
Ruby `CSV.read` error invalid byte sequence in UTF-8 (ArgumentError)

First of all, your encoding doesn't look right:

'社員番号'.force_encoding("Shift_JIS").encode!
#=> "\x{E7A4}\xBE\x{E593}\xA1\x{E795}\xAA\x{E58F}\xB7"

force_encoding does not convert anything: it takes the bytes of str1 and reinterprets them as Shift JIS, whereas you probably want to convert the string to Shift JIS:

'社員番号'.encode('Shift_JIS')
#=> "\x{8ED0}\x{88F5}\x{94D4}\x{8D86}"
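Comparing byte counts makes the difference concrete: force_encoding keeps the 12 UTF-8 bytes and only changes the label, while encode converts each character to its 2-byte Shift JIS code. A small sketch:

```ruby
utf8 = "社員番号"

relabeled = utf8.dup.force_encoding("Shift_JIS")   # same bytes, new label
converted = utf8.encode("Shift_JIS")               # new bytes

puts utf8.bytesize                       # 12 (4 characters x 3 bytes in UTF-8)
puts relabeled.bytesize                  # 12
puts converted.bytesize                  # 8  (4 characters x 2 bytes in Shift JIS)
puts relabeled.bytes == utf8.bytes       # true
puts converted.encode("UTF-8") == utf8   # true
```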

Next, you can pass a filename to CSV.read, so instead of:

file = File.open(filename)
CSV.read(file)

You can just write:

CSV.read(filename)

That said, you could either work with Shift JIS encoded strings:

require 'csv'
str1 = '社員番号'.encode("Shift_JIS")
str2 = 'メールアドレス'.encode("Shift_JIS")
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS', headers: true)
csv[str1]
csv[str2]

Or (and that's what I would do) you could work with UTF-8 strings by specifying a second encoding:

require 'csv'
str1 = '社員番号'
str2 = 'メールアドレス'
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS:UTF-8', headers: true)
csv[str1]
csv[str2]

encoding: 'Shift_JIS:UTF-8' instructs CSV to read Shift JIS data and transcode it to UTF-8. It's equivalent to passing 'r:Shift_JIS:UTF-8' as the mode to File.open.
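Here is a self-contained version of both options, using a made-up two-line file written to a temporary path instead of the real SyainInfo.csv (the sample row and e-mail address are hypothetical). It also shows the File.open form of the same transcoding.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data, saved as Shift JIS.
sjis_data = "社員番号,メールアドレス\n1,taro@example.com\n".encode("Shift_JIS")
file = Tempfile.new(["SyainInfo", ".csv"])
file.close
File.binwrite(file.path, sjis_data)

# Option 1: keep Shift JIS strings and look headers up with Shift JIS keys.
sjis_table = CSV.read(file.path, encoding: "Shift_JIS", headers: true)
p sjis_table["社員番号".encode("Shift_JIS")]   # ["1"]

# Option 2: transcode to UTF-8 while reading and use plain UTF-8 keys.
utf8_table = CSV.read(file.path, encoding: "Shift_JIS:UTF-8", headers: true)
p utf8_table["社員番号"]         # ["1"]
p utf8_table["メールアドレス"]   # ["taro@example.com"]

# The equivalent File.open mode string.
text = File.open(file.path, "r:Shift_JIS:UTF-8", &:read)
puts text.start_with?("社員番号,メールアドレス")   # true
```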


