How to specify output file encoding in Ruby?
Here's an example that outputs a file in the UTF-16LE encoding:
open("data.txt", "w:UTF-16LE")
Ruby looks at the encoding of the string you are writing and transcodes it as necessary. Here's a very detailed blog post describing the mechanics with excellent examples (see the section called "The Default External and Internal Encodings").
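To see the transcoding happen, here is a minimal sketch that writes a UTF-8 string through a "w:UTF-16LE" handle and then inspects the raw bytes on disk. The helper name write_utf16le and the temporary file name are made up for illustration.

```ruby
require "tmpdir"

# Hypothetical helper: writes `text` to `path` as UTF-16LE
# and returns the raw bytes that ended up on disk.
def write_utf16le(path, text)
  File.open(path, "w:UTF-16LE") { |f| f.write(text) }
  File.binread(path)
end

path = File.join(Dir.tmpdir, "utf16le_demo.txt")  # assumed scratch location
raw_bytes = write_utf16le(path, "hi")             # "hi" is a UTF-8 string literal
p raw_bytes.bytes                                 # two bytes per character: [104, 0, 105, 0]
```

The source string stays UTF-8 in memory; only the bytes written to the file are UTF-16LE.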
Write string with encoding of UTF-8 to a file
Ruby usually derives its default external encoding (Encoding.default_external) from the locale environment, such as the $LANG variable. If that is set to a UTF-8 locale, Ruby should read files as UTF-8 by default.
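Because that default depends on the environment, it can be safer to name the encoding explicitly. The sketch below prints the current default and then writes and reads a file with an explicit UTF-8 encoding; the file name is an assumption chosen for the example.

```ruby
require "tmpdir"

# The default external encoding depends on the environment the script runs in.
puts Encoding.default_external

path = File.join(Dir.tmpdir, "utf8_demo.txt")  # assumed scratch location
File.write(path, "café\n", encoding: "UTF-8")

# Passing the encoding explicitly makes the result independent of the locale.
text = File.read(path, encoding: "UTF-8")
puts text.encoding
puts text.valid_encoding?
```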
Weird output characters (Chinese characters) when using Ruby to read / write CSV
The problem is an encoding mismatch, which happens because the encoding is not explicitly specified in the read and write parts of the code. Read the input CSV as a binary file ("rb") with utf-16le encoding, and write the output in the same format.
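num_lines = ARGV[0].to_i  # ARGV values are strings; convert before doing arithmetic
file_in, file_out = ARGV[1], ARGV[2]  # assumption: the paths are passed as the 2nd and 3rd arguments
# ****** Specifying the right encodings <<<< this is the key
fh = File.open(file_in, "rb:utf-16le")
fw = File.open(file_out, "wb:utf-16le")
until (line = fh.gets).nil? || num_lines == 0
  fw.puts line
  num_lines -= 1
end
fh.close
fw.close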
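If the lines need to be processed as UTF-8 inside the program rather than copied through unchanged, the mode string can also name an internal encoding. This is a sketch under that assumption, not part of the original answer; the sample data and file name are invented for the demonstration.

```ruby
require "tmpdir"

path = File.join(Dir.tmpdir, "utf16le_lines.txt")  # assumed scratch location

# Create a small UTF-16LE sample file (binary mode avoids newline translation).
File.open(path, "wb:UTF-16LE") { |f| f.write("name,city\nLi,北京\n") }

# "rb:UTF-16LE:UTF-8" means: external encoding UTF-16LE, internal encoding UTF-8.
# Ruby transcodes each line to UTF-8 as it is read.
lines = File.open(path, "rb:UTF-16LE:UTF-8") { |f| f.each_line.to_a }

p lines.first.encoding
p lines
```

Using the block form of File.open also closes the handle automatically, even if an exception is raised while reading.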
Useful references:
- Working with encodings in Ruby 1.9
- CSV encodings
- Determining the encoding of a CSV file