How to Specify Output File Encoding in Ruby

How to specify output file encoding in Ruby?

Here's an example that outputs a file in the UTF-16LE encoding:

open("data.txt", "w:UTF-16LE")

Ruby looks at the encoding of the string you are writing and transcodes it to the file's encoding as necessary. Here's a very detailed blog post describing the mechanics with excellent examples (see the section called "The Default External and Internal Encodings").
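As a minimal sketch of that transcoding step, the snippet below writes a UTF-8 string through a "w:UTF-16LE" handle and then inspects the raw bytes. The temporary path and the sample text are assumptions made for illustration only.

```ruby
require "tmpdir"

# Hypothetical path for illustration; any writable location works.
path = File.join(Dir.tmpdir, "encoding_demo_utf16le.txt")

text = "h\u00e9llo"  # a UTF-8 string with one non-ASCII character

# "w:UTF-16LE" sets the file's external encoding; Ruby transcodes on write.
File.open(path, "w:UTF-16LE") { |f| f.write(text) }

raw = File.binread(path)     # raw bytes, no decoding applied
utf16_bytes = raw.bytesize   # each of these characters takes 2 bytes

# Decode the bytes as UTF-16LE and convert back to UTF-8 to confirm the round trip.
round_trip = raw.force_encoding("UTF-16LE").encode("UTF-8")
```

The string itself stays UTF-8 in memory; only the bytes that reach the file are UTF-16LE.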

Write string with encoding of UTF-8 to a file

Ruby usually takes its default external encoding from the locale environment, such as the LANG variable. If that is set to a UTF-8 locale, Ruby should read and write files as UTF-8 by default.
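If relying on the environment is not desirable, the encoding can also be stated explicitly on each call. The sketch below is one way to do that; the temporary path and sample string are assumptions for illustration.

```ruby
require "tmpdir"

# The encoding Ruby assumes for files opened without an explicit encoding.
default_name = Encoding.default_external.name

# Hypothetical path for illustration.
path = File.join(Dir.tmpdir, "encoding_demo_utf8.txt")

# Naming the encoding explicitly keeps the result independent of the locale.
File.write(path, "caf\u00e9\n", encoding: "UTF-8")
content = File.read(path, encoding: "UTF-8")
```

Printing default_name is a quick way to see what the current environment resolves to before deciding whether an explicit encoding is needed.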

Weird output characters (Chinese characters) when using Ruby to read / write CSV

The problem is an encoding mismatch, which happens because the encoding is not explicitly specified in the read and write parts of the code. Read the input CSV in binary mode ("rb") with the UTF-16LE encoding, and write the output in the same format.

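num_lines = ARGV[0].to_i  # ARGV values are strings, so convert before counting

# file_in and file_out are assumed to hold the input and output paths.
# ****** Specifying the right encodings <<<< this is the key
fh = File.open(file_in, "rb:utf-16le")
fw = File.open(file_out, "wb:utf-16le")

until (line = fh.gets).nil? or num_lines == 0
  fw.puts line
  num_lines -= 1
end

fh.close
fw.close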
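A variation on the same idea, offered as a sketch rather than the original answer's method: open both files in block form so they are closed automatically, and add an internal encoding ("rb:UTF-16LE:UTF-8") so each line arrives as a UTF-8 string that is easier to process. The file names, sample data, and max_lines value are assumptions made so the example is self-contained.

```ruby
require "tmpdir"

# Hypothetical file names for illustration.
dir      = Dir.mktmpdir
file_in  = File.join(dir, "in.csv")
file_out = File.join(dir, "out.csv")

# Create a small UTF-16LE sample file so the sketch can run on its own.
sample = "name,city\n\u674E,\u5317\u4EAC\nAna,S\u00E3o Paulo\n"
File.open(file_in, "wb:UTF-16LE") { |f| f.write(sample) }

max_lines = 2  # copy at most this many lines

# External encoding UTF-16LE, internal encoding UTF-8: lines are
# transcoded to UTF-8 on read and back to UTF-16LE on write.
File.open(file_in, "rb:UTF-16LE:UTF-8") do |fh|
  File.open(file_out, "wb:UTF-16LE") do |fw|
    fh.each_line.first(max_lines).each { |line| fw.write(line) }
  end
end

# Decode the output bytes to check what was written.
copied = File.binread(file_out).force_encoding("UTF-16LE").encode("UTF-8")
```

Because the output handle is still opened with "wb:UTF-16LE", the file on disk keeps the same encoding as the input even though the strings are UTF-8 while in memory.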

Useful references:

  • Working with encodings in Ruby 1.9
  • CSV encodings
  • Determining the encoding of a CSV file

