Ruby To_Yaml Utf8 String

This is probably a really bad idea as I'm sure YAML has its reasons for encoding the characters as it does, but it doesn't seem too hard to undo:

# Ruby 1.8-era code: yaml/encoding supplies YAML.unescape
require 'yaml'
require 'yaml/encoding'

text = "Ça va bien?"

puts text.to_yaml(:Encoding => :Utf8) # prints: --- "\xC3\x87a va bien?"
puts YAML.unescape(YAML.dump(text)) # prints: --- "Ça va bien?"
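
For comparison, on current Ruby versions where the standard YAML library is Psych, a UTF-8 string is normally written out without byte escapes, so no unescaping step should be needed. A minimal sketch, assuming a UTF-8 source file and only the standard library (the variable names are illustrative):

```ruby
require 'yaml'

text = "Ça va bien?"

# Psych writes UTF-8 text as-is instead of \x-style byte escapes.
yaml = YAML.dump(text)
puts yaml

# Loading the document gives back the original string.
restored = YAML.load(yaml)
puts restored == text
```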

In Ruby on Rails, how do I convert a '\X93'-style escaped string back to its original form?

Use ya2yaml:

require 'ya2yaml'
$KCODE = "UTF8"
"你好".ya2yaml #=> "--- 你好\n"

Rails 3 Routes, Yaml & Escaping UTF-8

I was able to store the original UTF-8 string, without converting it to Ruby's internal encoding, by using ya2yaml instead of to_yaml. However, I first had to encode the params hash's keys and values as UTF-8 (keys were arriving as ASCII-8BIT and values as UTF-8) before the YAML was generated properly:

def utf8_hash(some_hash) # convert hash keys & values to UTF-8 for proper translation
  new_hash = Hash.new
  some_hash.each do |key, value|
    new_hash[key.encode(Encoding::UTF_8)] = value.to_s.encode(Encoding::UTF_8)
  end
  new_hash
end

utf8_hash(params).ya2yaml
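
To see the effect of the normalization without installing the gem, here is a rough sketch that swaps in YAML.dump from the standard library in place of ya2yaml. The params hash is a made-up stand-in for Rails' params, and the helper is a slightly condensed variant of the one above; it assumes a Ruby version with Psych and String#b:

```ruby
require 'yaml'

# Variant of the helper above: re-encode every key and value as UTF-8.
def utf8_hash(some_hash)
  some_hash.each_with_object({}) do |(key, value), result|
    result[key.to_s.encode(Encoding::UTF_8)] = value.to_s.encode(Encoding::UTF_8)
  end
end

# Hypothetical input: an ASCII-only key labeled ASCII-8BIT, a UTF-8 value.
params = { "name".b => "你好" }

normalized = utf8_hash(params)
yaml = YAML.dump(normalized) # stand-in for normalized.ya2yaml
puts yaml
```

Note that encode only succeeds here because the binary key contains plain ASCII bytes; a binary string holding non-ASCII bytes would need force_encoding instead.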

Converting a YAML response w/ binary data to UTF-8 in Ruby 1.8.7

That appears to be a YAML document, not JSON, using YAML's binary data type (the !binary tag), which in turn stores the data base64-encoded.

Ruby's built in YAML parsing library should be able to parse the data for you:

> x = YAML.load('      response:
        job:
          unit_count: "1"
          slug: Answers
          lc_tgt: ja
          body_tgt: !binary |
            5Zue562U

          lc_src: en
          body_src: Answers
          job_id: "1948888"
      opstat: ok')
=> {"opstat"=>"ok", "response"=>{"job"=>{"slug"=>"Answers",
"unit_count"=>"1", "lc_tgt"=>"ja", "lc_src"=>"en", "body_tgt"=>"回答",
"job_id"=>"1948888", "body_src"=>"Answers"}}}
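
On Ruby versions that use Psych, a !binary value should come back as a string labeled ASCII-8BIT, so one extra step is needed to treat it as UTF-8 text. The following is a sketch under that assumption, using a trimmed copy of the document above (the variable names are illustrative):

```ruby
require 'yaml'

doc = <<~YAML
  response:
    job:
      lc_tgt: ja
      body_tgt: !binary |
        5Zue562U
  opstat: ok
YAML

data = YAML.load(doc)
body = data["response"]["job"]["body_tgt"]

# The base64 payload is decoded for us, but the bytes are labeled binary.
puts body.encoding

# Relabel the decoded bytes as UTF-8 to get readable text.
text = body.dup.force_encoding(Encoding::UTF_8)
puts text
```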

In order to produce YAML with UTF-8 directly embedded, instead of escaped as binary objects, you can use ya2yaml, "yet another to_yaml" implementation, which can produce output encoded as UTF-8. Install the ya2yaml gem, and then invoke it as:

> require 'ya2yaml'
> x.ya2yaml(:syck_compatible => true)

Rails: encoding woes with serialized hashes despite UTF8

This seems to have been caused by a difference in the behaviour of the two available YAML engines "syck" and "psych".
To set the YAML engine to syck:

YAML::ENGINE.yamler = 'syck'

To set the YAML engine back to psych:

YAML::ENGINE.yamler = 'psych'

The "syck" engine processes the strings as expected and converts them to hashes with proper Chinese strings. When the "psych" engine is used (the default in Ruby 1.9.3), the conversion results in garbled strings.

Adding the above line (the first of the two) to config/application.rb fixes this problem. The "syck" engine is no longer maintained, so I should probably only use this workaround to buy me some time to make the strings acceptable for "psych".

ruby 1.8.7 why .to_yaml converts some Strings to non-readable bytes

Whether YAML dumps a string as text or as binary depends on the ratio of ASCII to non-ASCII characters in it.

If you want to avoid !binary as much as possible, you should use the ya2yaml gem. It tries hard to dump strings as ASCII + escaped UTF-8.
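
On newer Ruby versions that use Psych rather than the 1.8.7 library, the choice appears to follow the string's encoding label instead of a character ratio: a string labeled ASCII-8BIT that contains non-ASCII bytes is dumped as !binary, while the same bytes labeled UTF-8 are dumped as text. A small sketch of that behaviour, using only the standard library:

```ruby
require 'yaml'

utf8_text = "回答"       # UTF-8 string literal
raw_bytes = utf8_text.b  # same bytes, labeled ASCII-8BIT

text_yaml   = YAML.dump(utf8_text)
binary_yaml = YAML.dump(raw_bytes)

puts text_yaml    # plain text
puts binary_yaml  # carries a !binary tag

# Relabeling the bytes as UTF-8 before dumping avoids the !binary tag.
fixed_yaml = YAML.dump(raw_bytes.dup.force_encoding(Encoding::UTF_8))
puts fixed_yaml
```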

The simplest way to puts sterling-pound in ruby from a yaml file

The external encoding is your issue; Ruby is assuming that any data read from external files is CP-850, rather than UTF-8.

You can solve this a few ways:

  1. Set Encoding.default_external = 'utf-8'. This will tell Ruby to read files as UTF-8 by default.
  2. Explicitly read your file as UTF-8, via open('file.yml', 'r:utf-8').
  3. Convert your string to UTF-8 before you pass it to your YAML parser.

You can do this via String#force_encoding, which tells Ruby to reinterpret the raw bytes with a different encoding:

require 'yaml'

text = open("file.yml").read
text.force_encoding("utf-8") # relabels the bytes as UTF-8; nothing is transcoded
YAML.load text
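
Options 2 and 3 can be tried side by side with a throwaway file. This is only a sketch: the file name and the price key are invented for the example, and it assumes the file on disk really contains UTF-8 bytes:

```ruby
require 'yaml'
require 'tmpdir'

# Write a small UTF-8 YAML file to a temporary directory.
path = File.join(Dir.mktmpdir, 'price.yml')
File.binwrite(path, "price: \"£5\"\n")

# Option 2: state the encoding when reading.
text = File.read(path, encoding: 'UTF-8')

# Option 3: read the raw bytes, then relabel them as UTF-8.
relabeled = File.binread(path).force_encoding('UTF-8')

puts YAML.load(text)['price']
puts YAML.load(relabeled)['price']
```

Both routes end with the same string; the difference is only whether the encoding is declared at read time or applied afterwards.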

