Invalid byte sequence in UTF-8 (ArgumentError)
Your string is probably not valid UTF-8, so use:
unless file_content.valid_encoding?
  # Round-trip through UTF-16BE so that invalid bytes are replaced with "?"
  file_content = file_content.encode("UTF-16BE", invalid: :replace, replace: "?").encode("UTF-8")
end
file_content.gsub(/dr/i, 'med')
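Here is a self-contained sketch of the same round-trip wrapped in a helper. The name sanitize_utf8 and the sample string are hypothetical, and it assumes that replacing bad bytes with "?" is acceptable for your data.

```ruby
# sanitize_utf8 is a hypothetical helper name wrapping the UTF-16BE round-trip.
def sanitize_utf8(str)
  return str if str.valid_encoding?

  # Transcoding to a different encoding makes Ruby inspect every byte;
  # invalid sequences are replaced with "?" on the way to UTF-16BE.
  str.encode("UTF-16BE", invalid: :replace, replace: "?").encode("UTF-8")
end

# Hypothetical input: a UTF-8-labelled string containing a stray \xC3 byte.
raw = "Dr. \xC3 Smith".dup.force_encoding("UTF-8")
p raw.valid_encoding?          # false

clean = sanitize_utf8(raw)
p clean                        # "Dr. ? Smith"
p clean.gsub(/dr/i, "med")     # "med. ? Smith"
```

Valid strings are returned untouched, so the gsub runs on every input, not only on the broken ones.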
See "Ruby 2.0.0 String#Match ArgumentError: invalid byte sequence in UTF-8".
Invalid Byte Sequence In UTF-8 Ruby
As Arie already answered, this error is caused by the invalid byte sequence \xC3.
If you are using Ruby 2.1+, you can also use String#scrub to replace invalid bytes with a given replacement character (U+FFFD by default). Here:
a = "abce\xC3"
# => "abce\xC3"
a.scrub
# => "abce�"
a.scrub.sub("a","A")
# => "Abce�"
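scrub also takes an explicit replacement string or a block, and has an in-place variant. A short sketch (the block version uses String#unpack1, which needs Ruby 2.4+):

```ruby
a = "abce\xC3"

# scrub accepts an explicit replacement string...
b = a.scrub("?")
p b                       # "abce?"

# ...or a block that receives each invalid byte sequence and returns its replacement.
c = a.scrub { |bad| "<#{bad.unpack1('H*')}>" }
p c                       # "abce<c3>"

# scrub! modifies the receiver in place.
d = a.dup
d.scrub!
p d.valid_encoding?       # true
```

The block form is handy when you want to log or hex-dump the offending bytes instead of silently dropping them.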
`scan': invalid byte sequence in UTF-8 (ArgumentError)
The linked text file contains the following line:
Character set encoding: ISO-8859-1
If converting it isn't desired or possible, then you have to tell Ruby that this file is ISO-8859-1 encoded; otherwise the default external encoding is used (UTF-8 in your case). One way to do that is:
s = File.read('alice_in_wonderland.txt', encoding: 'ISO-8859-1')
s.encoding # => #<Encoding:ISO-8859-1>
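To see both the failure and the fix without the original alice_in_wonderland.txt, this sketch writes a small temporary file whose only non-ASCII character is the ISO-8859-1 byte 0xE9 ("é"). The file contents are made up for illustration.

```ruby
require "tempfile"

error_message = nil
latin1 = nil

Tempfile.create("latin1") do |f|
  f.binmode
  # Made-up content: "café au lait" in ISO-8859-1, where "é" is the single byte 0xE9.
  f.write("caf\xE9 au lait".b)
  f.flush

  # Reading with the wrong encoding succeeds, but regex methods such as scan raise.
  begin
    File.read(f.path, encoding: "UTF-8").scan(/\w+/)
  rescue ArgumentError => e
    error_message = e.message
  end

  # Declaring the real encoding yields a valid ISO-8859-1 string.
  latin1 = File.read(f.path, encoding: "ISO-8859-1")
end

puts error_message          # invalid byte sequence in UTF-8
p latin1.encoding           # #<Encoding:ISO-8859-1>
p latin1.valid_encoding?    # true
```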
Or even like this if you prefer your string UTF-8 encoded (see utf8everywhere.org):
s = File.read('alice_in_wonderland.txt', encoding: 'ISO-8859-1:UTF-8')
s.encoding # => #<Encoding:UTF-8>
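The "external:internal" form transcodes while reading, and the same spec also works as an open-mode string. A minimal sketch, again with a made-up temporary file:

```ruby
require "tempfile"

utf8 = nil
words = nil

Tempfile.create("latin1") do |f|
  f.binmode
  f.write("caf\xE9 au lait".b)   # made-up ISO-8859-1 bytes ("é" = 0xE9)
  f.flush

  # Read ISO-8859-1 bytes, hand back a UTF-8 string.
  utf8 = File.read(f.path, encoding: "ISO-8859-1:UTF-8")

  # Same idea with an open-mode string; regex matching is now safe.
  words = File.open(f.path, "r:ISO-8859-1:UTF-8") { |io| io.read.scan(/\p{L}+/) }
end

p utf8.encoding   # #<Encoding:UTF-8>
p utf8            # "café au lait"
p words           # ["café", "au", "lait"]
```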
Ruby Invalid Byte Sequence in UTF-8
The combination of using:
@file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil)
and the magic comment at the top of the source file:
#encoding: UTF-8
solved the issue.
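The two calls in that line do different things: force_encoding only relabels the bytes, while encode actually converts them. A small sketch with hypothetical Latin-1 bytes:

```ruby
# Hypothetical bytes as IO.read might return them from a Latin-1 file.
raw = "caf\xE9".b                                 # ASCII-8BIT, bytes 63 61 66 E9

latin1 = raw.dup.force_encoding("ISO-8859-1")     # relabels the bytes, changes nothing else
utf8   = latin1.encode("UTF-8")                   # transcodes: 0xE9 becomes 0xC3 0xA9

p raw.bytes     # [99, 97, 102, 233]
p latin1.bytes  # [99, 97, 102, 233]
p utf8.bytes    # [99, 97, 102, 195, 169]
p utf8          # "café"
```

Because every byte is valid in ISO-8859-1, this never raises; if the file is not really Latin-1 you get mojibake instead of an error.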
ArgumentError invalid byte sequence in UTF-8
You get these errors because the Zip gem assumes the filenames to be encoded in UTF-8 but they are actually in a different encoding.
To fix the error, you first have to find the correct encoding. Let's re-create the string from its bytes:
bytes = [111, 117, 116, 112, 117, 116, 50, 48, 50, 48, 49,
50, 48, 55, 95, 49, 52, 49, 54, 48, 50, 47, 87,
78, 83, 95, 85, 80, 151, 112, 131, 102, 129, 91,
131, 94, 46, 116, 120, 116]
string = bytes.pack('c*')
#=> "output20201207_141602/WNS_UP\x97p\x83f\x81[\x83^.txt"
We can now traverse Encoding.list and select those encodings that return the expected result:
Encoding.list.select do |enc|
s = string.encode('UTF-8', enc) rescue next
s.end_with?('WNS_UP用データ.txt')
end
#=> [
# #<Encoding:Windows-31J>,
# #<Encoding:Shift_JIS>,
# #<Encoding:SJIS-DoCoMo>,
# #<Encoding:SJIS-KDDI>,
# #<Encoding:SJIS-SoftBank>
# ]
All of the above encodings result in the correct output.
Back to your code, you could use:
path = entry.name.encode('UTF-8', 'Windows-31J')
#=> "output20201207_141602/WNS_UP用データ.txt"
ext = File.extname(path)
#=> ".txt"
file_name = File.basename(path)
#=> "WNS_UP用データ.txt"
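If an archive mixes UTF-8 and Shift_JIS names, a fallback helper might look like the sketch below. utf8_entry_name is hypothetical, it uses no rubyzip API, and it assumes that names which are not valid UTF-8 are Windows-31J.

```ruby
# Hypothetical helper: assumes entry names are either valid UTF-8 or, failing that,
# Windows-31J (typical for archives created on Japanese Windows).
def utf8_entry_name(name, fallback: "Windows-31J")
  as_utf8 = name.dup.force_encoding("UTF-8")
  return as_utf8 if as_utf8.valid_encoding?

  # Relabel with the fallback encoding, then transcode; this raises
  # an Encoding error if the guess is wrong.
  name.dup.force_encoding(fallback).encode("UTF-8")
end

raw  = "WNS_UP\x97p\x83f\x81[\x83^.txt".b   # the Shift_JIS bytes from above
name = utf8_entry_name(raw)
p name                              # "WNS_UP用データ.txt"
p File.extname(name)                # ".txt"
p utf8_entry_name("plain.txt".b)    # "plain.txt"
```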
The Zip gem also has an option to set an explicit encoding for non-ASCII file names. You might want to give it a try by setting Zip.force_entry_names_encoding = 'Windows-31J' (I haven't tried it myself).
File.readlines invalid byte sequence in UTF-8 (ArgumentError)
I am trying to get this solution working. I have seen people doing
.encode!('UTF-8', 'UTF-8', :invalid => :replace)
but it doesn't appear to work with File.readlines.
File.readlines returns an Array. Arrays don't have an encode method. On the other hand, strings do have an encode method.
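In other words, clean each line rather than the Array. A minimal sketch using String#scrub on a made-up temporary log file:

```ruby
require "tempfile"

lines = nil

Tempfile.create("log") do |f|
  f.binmode
  # Made-up log content with one stray \xC3 byte on the second line.
  f.write("ok line\nbad \xC3 line\n".b)
  f.flush

  # readlines returns an Array of Strings, so clean each String, not the Array.
  lines = File.readlines(f.path, encoding: "UTF-8").map { |line| line.scrub("?") }
end

p lines   # ["ok line\n", "bad ? line\n"]
```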
Could you please provide an example of the alternative above?
require 'csv'
CSV.foreach("log.csv", encoding: "utf-8") do |row|
  md = row[0].match(/watch\?v=/)
  puts row[0], row[1], row[3] if md
end
Or, passing the open mode explicitly (the block body stays the same):
CSV.foreach("log.csv", 'rb:utf-8') do |row|
If you need more speed on Ruby 1.8, use the fastercsv gem; since Ruby 1.9 the standard-library CSV is FasterCSV under the hood.
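If the CSV file is actually Latin-1 rather than UTF-8, CSV.foreach also accepts an "external:internal" encoding so the rows arrive as UTF-8. A sketch with a hypothetical two-row log (the URLs and values are made up):

```ruby
require "csv"
require "tempfile"

matches = []

Tempfile.create(["log", ".csv"]) do |f|
  f.binmode
  # Hypothetical log; "é" is stored as the ISO-8859-1 byte 0xE9.
  f.write("https://example.com/watch?v=abc,caf\xE9\nhttps://example.com/home,plain\n".b)
  f.flush

  # Transcode ISO-8859-1 to UTF-8 while reading so regex matching is safe.
  CSV.foreach(f.path, encoding: "ISO-8859-1:UTF-8") do |row|
    matches << row if row[0].match?(/watch\?v=/)
  end
end

p matches  # [["https://example.com/watch?v=abc", "café"]]
```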
This seems to have worked for me.
File.readlines('log.csv', :encoding => 'ISO-8859-1')
Yes, in order to read a file you have to know its encoding.
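When the encoding is unknown, a rough heuristic is to try a few candidates and take the first one under which the bytes are valid. This is only a guess: valid_encoding? cannot tell you the text is right, and guess_encoding plus its candidate list are hypothetical.

```ruby
# Hypothetical helper; the candidate list and its order are assumptions.
# ISO-8859-1 goes last because every byte sequence is "valid" in it.
def guess_encoding(bytes, candidates = %w[UTF-8 Windows-31J ISO-8859-1])
  candidates.map { |name| Encoding.find(name) }.find do |enc|
    bytes.dup.force_encoding(enc).valid_encoding?
  end
end

p guess_encoding("caf\xC3\xA9".b)        # #<Encoding:UTF-8>
p guess_encoding("\x97p\x83f".b)         # #<Encoding:Windows-31J>
p guess_encoding("caf\xE9 au lait".b)    # #<Encoding:ISO-8859-1>
```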