Rails: Encoding Woes with Serialized Hashes Despite Utf8

This seems to have been caused by a difference in the behaviour of the two available YAML engines "syck" and "psych".
To set the YAML engine to syck:

YAML::ENGINE.yamler = 'syck'

To set the YAML engine back to psych:

YAML::ENGINE.yamler = 'psych'

The "syck" engine processes the strings as expected and converts them to hashes with proper Chinese strings. When the "psych" engine is used (the default in Ruby 1.9.3), the conversion results in garbled strings.

Adding the above line (the first of the two) to config/application.rb fixes this problem. The "syck" engine is no longer maintained, so I should probably only use this workaround to buy me some time to make the strings acceptable for "psych".
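On Rubies that ship only psych, it can help to check whether a given hash survives a YAML round trip with its string encodings intact before relying on it. The following is a minimal sketch, not part of the original answer; the helper name round_trips? and the sample data are made up for illustration, and it only inspects the values of a flat hash.

```ruby
require 'yaml'

# Returns true when a flat hash survives a dump/load cycle unchanged,
# including the encoding of every string value.
def round_trips?(hash)
  loaded = YAML.load(YAML.dump(hash))
  return false unless loaded == hash

  hash.all? do |key, value|
    !value.is_a?(String) || loaded[key].encoding == value.encoding
  end
end

# The Chinese greeting is written with \u escapes so the literal is UTF-8
# regardless of the source file's encoding.
sample = { 'greeting' => "\u4F60\u597D" }
puts round_trips?(sample)
```

If the check fails for real data, the encoding of the offending string (for example ASCII-8BIT instead of UTF-8) is usually the first thing to inspect.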

Hash strings get improperly encoded

The default internal and external encodings are aimed at IO operations:

  • CSV
  • File data read from disk
  • File names from Dir
  • etc...

The easiest thing for you to do is to add a # encoding=utf-8 comment to tell Ruby that the source file is UTF-8 encoded. For example, if you run this:

# encoding=utf-8
H = { 'this' => 'that' }
puts H.keys.first.encoding

as a stand-alone Ruby script you'll get UTF-8, but if you run this:

H = { 'this' => 'that' }
puts H.keys.first.encoding

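you'll probably get US-ASCII on Ruby 1.9, where the default source encoding is US-ASCII. Ruby 2.0 and later default the source encoding to UTF-8, so there both scripts report UTF-8.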
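If the hash already exists in memory with strings in another encoding, the magic comment will not help, and the strings have to be transcoded instead. Below is a rough sketch under that assumption; the helper name deep_utf8 is hypothetical and not something Ruby or Rails provides.

```ruby
# Recursively transcode every string in a nested structure to UTF-8.
def deep_utf8(value)
  case value
  when String
    value.encode(Encoding::UTF_8)
  when Hash
    value.each_with_object({}) { |(k, v), out| out[deep_utf8(k)] = deep_utf8(v) }
  when Array
    value.map { |item| deep_utf8(item) }
  else
    value
  end
end

# Build a hash whose strings are explicitly US-ASCII, then normalize it.
ascii = { 'this'.encode(Encoding::US_ASCII) => 'that'.encode(Encoding::US_ASCII) }
utf8 = deep_utf8(ascii)
puts utf8.keys.first.encoding
```

Note that encode transcodes the bytes and raises if they cannot be converted. When the bytes are already valid UTF-8 and only the label is wrong, force_encoding on a copy of the string is the more appropriate tool.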

Existing data serialized as hash produces error when upgrading to Rails 5

From the fine manual:

serialize(attr_name, class_name_or_coder = Object)

[...] If class_name is specified, the serialized object must be of that class on assignment and retrieval. Otherwise SerializationTypeMismatch will be raised.

So when you say this:

serialize :social_media, Hash

ActiveRecord will require the unserialized social_media to be a Hash. However, as noted by vnbrs, ActionController::Parameters no longer subclasses Hash like it used to and you have a table full of serialized ActionController::Parameters instances. If you look at the raw YAML data in your social_media column, you'll see a bunch of strings like:

--- !ruby/object:ActionController::Parameters...

rather than Hashes like this:

---\n:key: value...

You should fix up all your existing data to have YAMLized Hashes in social_media rather than ActionController::Parameters and whatever else is in there. This process will be somewhat unpleasant:

  1. Pull each social_media out of the table as a string.
  2. Unpack that YAML string into a Ruby object: obj = YAML.load(str).
  3. Convert that object to a Hash: h = obj.to_unsafe_h.
  4. Write that Hash back to a YAML string: str = h.to_yaml.
  5. Put that string back into the database to replace the old one from (1).

Note the to_unsafe_h call in (3). Just calling to_h (or to_hash for that matter) on an ActionController::Parameters instance will give you an exception in Rails 5; you have to include a permit call to filter the parameters first:

h = params.to_h                   # Exception!
h = params.permit(:whatever).to_h # Indifferent access hash with one entry

If you use to_unsafe_h (or to_unsafe_hash) then you get the whole thing in a HashWithIndifferentAccess. Of course, if you really want a plain old Hash then you'd say:

h = obj.to_unsafe_h.to_h

to unwrap the indifferent access wrapper as well. This also assumes that you only have ActionController::Parameters in social_media so you might need to include an obj.respond_to?(:to_unsafe_hash) check to see how you unpack your social_media values.
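The conversion in steps (3) and (4), together with the respond_to? check, could look roughly like the sketch below. It runs without Rails, so FakeParameters is a hypothetical stand-in for ActionController::Parameters, and the helper names to_plain_hash and reserialize are made up for illustration.

```ruby
require 'yaml'

# Hypothetical stand-in for ActionController::Parameters so the sketch runs
# without Rails; the real class provides to_unsafe_h itself.
class FakeParameters
  def initialize(data)
    @data = data
  end

  def to_unsafe_h
    @data.dup
  end
end

# Step 3: convert whatever came out of the column into a plain Hash.
def to_plain_hash(obj)
  if obj.respond_to?(:to_unsafe_h)
    obj.to_unsafe_h.to_h
  elsif obj.respond_to?(:to_h)
    obj.to_h
  else
    raise TypeError, "cannot convert #{obj.class} to Hash"
  end
end

# Step 4: write the plain Hash back out as YAML.
def reserialize(obj)
  to_plain_hash(obj).to_yaml
end

puts reserialize(FakeParameters.new({ 'twitter' => 'example_club' }))
puts reserialize({ 'facebook' => 'example_club' })
```

The point of going through a plain Hash is that the resulting YAML no longer carries a !ruby/object tag, so it loads as a Hash and satisfies serialize :social_media, Hash.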

You could do the above data migration through direct database access in a Rails migration. This could be really cumbersome depending on how nice the low level MySQL interface is. Alternatively, you could create a simplified model class in your migration, something sort of like this:

class YourMigration < ...
  class ModelHack < ApplicationRecord
    self.table_name = 'clubs'
    serialize :social_media
  end

  def up
    ModelHack.all.each do |m|
      # Update this to match your real data and what you want `h` to be.
      h = m.social_media.to_unsafe_h.to_h
      m.social_media = h
      m.save!
    end
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

You'd want to use find_in_batches or in_batches instead of all if you have a lot of Clubs, of course.


If your MySQL supports json columns and ActiveRecord works with MySQL's json columns (sorry, PostgreSQL guy here), then this might be a good time to change the column to json and run far away from serialize.

Rails 3.2, saving serialized hash will not save number_with_delimiter()

Yes, this is a Rails (ActiveSupport) bug that was eventually fixed in Rails 4.2.1. From the 4.2.1 release notes:

Fixed a roundtrip problem with AS::SafeBuffer where primitive-like strings will be dumped as primitives

When you use helper.number_with_delimiter, the resulting object looks and behaves like a String, but in reality it is an ActiveSupport::SafeBuffer.

helper.number_with_delimiter(123456).class # => ActiveSupport::SafeBuffer < String

When you use:

serialize :stuff, Hash

That means behind the scenes Rails is using YAML format to save the data to the database. There was a bug in SafeBuffer that caused SafeBuffers like "123" to be mistakenly converted to integers (i.e. 123) instead of remaining strings when saving and loading to/from YAML.

Again, this is now fixed as of Rails 4.2.1. You can see the fix here:

https://github.com/rails/rails/commit/debe7aedda3665702d1f99a3ffb4a123a6c44e9c
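For an application stuck on a version before the fix, one possible workaround is to convert string-like values to plain String objects before they reach the serialized hash. This is only a sketch of that idea and is not what the Rails commit does; TaggedString is a hypothetical stand-in for ActiveSupport::SafeBuffer, and plain_string_values is a made-up helper name.

```ruby
require 'yaml'

# Hypothetical stand-in for ActiveSupport::SafeBuffer: a String subclass.
class TaggedString < String; end

# Replace String subclasses in a flat hash with plain String copies.
def plain_string_values(hash)
  hash.each_with_object({}) do |(key, value), out|
    out[key] = value.is_a?(String) ? String.new(value) : value
  end
end

stuff = plain_string_values({ 'total' => TaggedString.new('123') })
loaded = YAML.load(YAML.dump(stuff))
puts loaded['total'].class
```

A plain String that looks like a number is quoted when dumped to YAML, so it comes back as a String rather than an Integer.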


