Rails: encoding woes with serialized hashes despite UTF8
This seems to have been caused by a difference in the behaviour of the two available YAML engines "syck" and "psych".
To set the YAML engine to syck:
YAML::ENGINE.yamler = 'syck'
To set the YAML engine back to psych:
YAML::ENGINE.yamler = 'psych'
The "syck" engine processes the strings as expected and converts them to hashes with proper Chinese strings. When the "psych" engine is used (the default in Ruby 1.9.3), the conversion results in garbled strings.
Adding the first of the two lines above (the one that selects syck) to config/application.rb fixes this problem. The "syck" engine is no longer maintained, so I should probably use this workaround only to buy some time while I make the strings acceptable to "psych".
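One way to check whether data is already acceptable to "psych" is to round-trip it through YAML and inspect the encodings of the strings that come back. The following is only a sketch: utf8_clean? is a hypothetical helper name, not part of Ruby or Rails, and the sample hash stands in for real column data.

```ruby
require 'yaml'

# Hypothetical helper: true if every String inside a nested Hash/Array is
# validly encoded and is either UTF-8 or plain ASCII.
def utf8_clean?(value)
  case value
  when String
    value.valid_encoding? &&
      (value.encoding == Encoding::UTF_8 || value.ascii_only?)
  when Hash
    value.all? { |k, v| utf8_clean?(k) && utf8_clean?(v) }
  when Array
    value.all? { |v| utf8_clean?(v) }
  else
    true
  end
end

# "\u4F60\u597D" is a two-character Chinese greeting, written with escapes
# so the example does not depend on the source file's encoding.
data = { 'greeting' => "\u4F60\u597D" }
restored = YAML.load(YAML.dump(data))

puts utf8_clean?(restored)
puts restored == data
```

If the helper reports false for values loaded from the database, those are the strings that would need re-encoding before syck can be dropped.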
hash strings get improperly encoded
The default internal and external encodings are aimed at IO operations:
- CSV
- File data read from disk
- File names from Dir
- etc...
The easiest thing for you to do is to add a # encoding=utf-8 comment to tell Ruby that the source file is UTF-8 encoded. For example, if you run this:
# encoding=utf-8
H = { 'this' => 'that' }
puts H.keys.first.encoding
as a stand-alone Ruby script you'll get UTF-8, but if you run this:
H = { 'this' => 'that' }
puts H.keys.first.encoding
you'll probably get US-ASCII (on Ruby 1.9; since Ruby 2.0 the default source encoding is UTF-8).
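When the magic comment cannot be added, a string's encoding can still be inspected and converted at run time. This is a minimal sketch using only core String methods; the variable names are illustrative.

```ruby
# encoding=utf-8
key = 'this'

puts __ENCODING__   # encoding Ruby assumes for this source file
puts key.encoding   # string literals inherit the source encoding

# Transcode explicitly when a string arrives in a different encoding.
ascii = key.encode('US-ASCII')
utf8  = ascii.encode('UTF-8')

puts ascii.encoding
puts utf8.encoding
```

String#encode transcodes the bytes, whereas String#force_encoding only relabels them, so encode is the safer choice when the current encoding label is known to be correct.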
Existing data serialized as hash produces error when upgrading to Rails 5
From the fine manual:
serialize(attr_name, class_name_or_coder = Object)
[...] If class_name is specified, the serialized object must be of that class on assignment and retrieval. Otherwise SerializationTypeMismatch will be raised.
So when you say this:
serialize :social_media, Hash
ActiveRecord will require the unserialized social_media to be a Hash. However, as noted by vnbrs, ActionController::Parameters no longer subclasses Hash like it used to, and you have a table full of serialized ActionController::Parameters instances. If you look at the raw YAML data in your social_media column, you'll see a bunch of strings like:
--- !ruby/object:ActionController::Parameters...
rather than Hashes like this:
---\n:key: value...
You should fix up all your existing data to have YAMLized Hashes in social_media rather than ActionController::Parameters and whatever else is in there. This process will be somewhat unpleasant:
1. Pull each social_media out of the table as a string.
2. Unpack that YAML string into a Ruby object: obj = YAML.load(str).
3. Convert that object to a Hash: h = obj.to_unsafe_h.
4. Write that Hash back to a YAML string: str = h.to_yaml.
5. Put that string back into the database to replace the old one from (1).
Note the to_unsafe_h call in (3). Just calling to_h (or to_hash for that matter) on an ActionController::Parameters instance will give you an exception in Rails 5; you have to include a permit call to filter the parameters first:
h = params.to_h # Exception!
h = params.permit(:whatever).to_h # Indifferent access hash with one entry
If you use to_unsafe_h (or to_unsafe_hash) then you get the whole thing in a HashWithIndifferentAccess. Of course, if you really want a plain old Hash then you'd say:
h = obj.to_unsafe_h.to_h
to unwrap the indifferent access wrapper as well. This also assumes that you only have ActionController::Parameters in social_media, so you might need to include an obj.respond_to?(:to_unsafe_hash) check to see how you unpack your social_media values.
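The respond_to? check and the conversion back to YAML might be sketched as follows. Everything here is illustrative: to_plain_hash and reserialize are hypothetical helper names, and FakeParams is a stand-in used only so the example runs without Rails; in a real application the object would be an ActionController::Parameters instance.

```ruby
require 'yaml'

# Hypothetical helper: turn whatever was deserialized into a plain Hash.
def to_plain_hash(obj)
  if obj.respond_to?(:to_unsafe_h)
    # Parameters-like objects: unwrap without a permit call.
    obj.to_unsafe_h.to_h
  elsif obj.respond_to?(:to_h)
    # Already a Hash (or nil, which becomes an empty Hash).
    obj.to_h
  else
    raise TypeError, "cannot convert #{obj.class} to Hash"
  end
end

# Hypothetical helper: the YAML string to write back to the column.
def reserialize(obj)
  to_plain_hash(obj).to_yaml
end

# Stand-in for ActionController::Parameters (an assumption for this demo).
class FakeParams
  def to_unsafe_h
    { 'twitter' => '@club' }
  end
end

puts reserialize(FakeParams.new)
puts reserialize({ 'facebook' => 'club' })
```

Raising on unknown types, rather than guessing, makes unexpected values in the column show up during the data fix instead of being silently rewritten.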
You could do the above data migration through direct database access in a Rails migration. This could be really cumbersome depending on how nice the low level MySQL interface is. Alternatively, you could create a simplified model class in your migration, something sort of like this:
class YourMigration < ...
  class ModelHack < ApplicationRecord
    self.table_name = 'clubs'
    serialize :social_media
  end

  def up
    ModelHack.all.each do |m|
      # Update this to match your real data and what you want `h` to be.
      h = m.social_media.to_unsafe_h.to_h
      m.social_media = h
      m.save!
    end
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end
You'd want to use find_in_batches or in_batches instead of all if you have a lot of Clubs, of course.
If your MySQL supports json columns and ActiveRecord works with MySQL's json columns (sorry, PostgreSQL guy here), then this might be a good time to change the column to json and run far away from serialize.
Rails 3.2, saving serialized hash will not save number_with_delimiter()
Yes, this is a Rails (ActiveSupport) bug that was eventually fixed in Rails 4.2.1. From the 4.2.1 release notes:
Fixed a roundtrip problem with AS::SafeBuffer where primitive-like strings will be dumped as primitives
When you use helper.number_with_delimiter, the resulting object looks and behaves like a String, but in reality it is an ActiveSupport::SafeBuffer.
helper.number_with_delimiter(123456).class # => ActiveSupport::SafeBuffer < String
When you use:
serialize :stuff, Hash
That means behind the scenes Rails is using YAML format to save the data to the database. There was a bug in SafeBuffer that caused SafeBuffers like "123" to be mistakenly converted to integers (i.e. 123) instead of remaining strings when saving and loading to/from YAML.
Again, this is now fixed as of Rails 4.2.1. You can see the fix here:
https://github.com/rails/rails/commit/debe7aedda3665702d1f99a3ffb4a123a6c44e9c
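The difference between a quoted and an unquoted scalar can be seen with plain Ruby and no Rails at all. This sketch only illustrates the YAML behaviour involved; the unquoted document is written by hand and is assumed to resemble what the affected versions stored.

```ruby
require 'yaml'

# A plain String that looks like a number is quoted when dumped,
# so it comes back as a String.
dumped = YAML.dump('stuff' => '123')
puts dumped
puts YAML.load(dumped)['stuff'].class

# The same value as an unquoted scalar is read back as an Integer.
unquoted = "---\nstuff: 123\n"
puts YAML.load(unquoted)['stuff'].class
```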