Character Encoding Issue in Rails V3/Ruby 1.9.2

Ruby has a notion of an external encoding and an internal encoding for each file. This allows you to work with a file's contents as UTF-8 in your program, even when the file is stored in a more esoteric format. If your default external encoding is UTF-8 (which it is if you're on Mac OS X), all of your file I/O is going to be in UTF-8 as well. You can check this using File.open('file').external_encoding. When you open your file and pass "r:UTF-8", you are only forcing the same external encoding that Ruby already uses by default.

Chances are, your source document isn't in UTF-8 and those non-ASCII characters aren't mapping cleanly to UTF-8 (if they mapped cleanly, you would get the correct characters and no error; if they mapped to valid but wrong characters, you would get incorrect characters and no error). What you should do is try to determine the encoding of the source document, then have Ruby transcode the document on read, like so:

File.open(file, "r:windows-1251:utf-8").each_line { |line| puts line.strip }
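To see the transcode-on-read behaviour in isolation, here is a minimal sketch. The helper name read_transcoded and the sample text are made up for illustration, and Windows-1251 is only an assumed source encoding; substitute whatever your document actually uses.

```ruby
require "tempfile"

# Hypothetical helper: read a file stored in source_encoding and hand back
# UTF-8 lines. The block form of File.open closes the file afterwards.
def read_transcoded(path, source_encoding)
  File.open(path, "r:#{source_encoding}:UTF-8") do |f|
    f.each_line.map(&:chomp)
  end
end

# Sample data: a Russian word, written to disk as Windows-1251 bytes.
sample = "\u041F\u0440\u0438\u0432\u0435\u0442"
tmp = Tempfile.new("cp1251")
tmp.binmode                                   # write raw bytes, no conversion
tmp.write(sample.encode("Windows-1251"))
tmp.close

lines = read_transcoded(tmp.path, "Windows-1251")
puts lines.first.encoding                     # encoding of the strings handed back
```

The "r:external:internal" mode string tells Ruby what the bytes on disk are and what your program wants to see, so no manual conversion is needed after the read.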

If you need help determining the encoding of the source, give the Python chardet library a whirl. It's based on the automatic charset detection fallback that was in Seamonkey/Mozilla (and is possibly still in Firefox).

Encoding error with Rails 2.3 on Ruby 1.9.3

I finally figured out what my issue was. While my databases were encoded with utf8, the app with the original mysql gem was injecting latin1 text into the utf8 tables.

What threw me off was that the output from the mysql command line client looked correct. It is important to verify that your terminal, the database fields and the MySQL client are all running in utf8.

MySQL's client runs in latin1 by default. You can discover what it is running in by issuing this query:

show variables like 'char%';

If set up properly for utf8 you should see:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

If these don't look correct, make sure the following is set in the [client] section of your my.cnf config file:

default-character-set = utf8

And add the following to the [mysqld] section:

# use utf8 by default
character-set-server=utf8
collation-server=utf8_general_ci

Make sure to restart the mysql daemon before relaunching the client and then verify.

NOTE: This doesn't change the charset or collation of existing databases, just ensures that any new databases created will default into utf8 and that the client will display in utf8.

After I did this I saw characters in the mysql client that matched what I was getting from the mysql2 gem. I was also able to verify that this content was latin1 by switching to "encoding: latin1" temporarily in my database.yml.

One extremely handy query to find issues is using char length to find the rows with multi-byte characters:

SELECT id, name FROM items WHERE LENGTH(name) != CHAR_LENGTH(name);
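The same byte-count versus character-count comparison can be made on the Ruby side, where String#bytesize plays the role of LENGTH and String#length the role of CHAR_LENGTH. A small sketch, with a made-up list of names standing in for the rows:

```ruby
# A string holds multi-byte characters when it has more bytes than characters.
def multibyte?(text)
  text.bytesize != text.length
end

# Hypothetical sample values standing in for the name column.
names = ["Cafe", "Caf\u00E9", "na\u00EFve"]
flagged = names.select { |name| multibyte?(name) }

flagged.each { |name| puts "#{name}: #{name.length} chars, #{name.bytesize} bytes" }
```

Only the strings containing accented characters are flagged, which is the set of rows worth inspecting for encoding damage.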

There are a lot of scripts out there to convert latin1 contents to utf8, but what worked best for me was dumping all of the databases as latin1 and stuffing the contents back in as utf8:

mysqldump -u root -p --opt --default-character-set=latin1 --skip-set-charset DBNAME > DBNAME.sql

mysql -u root -p --default-character-set=utf8 DBNAME < DBNAME.sql

I backed up my primary db first, then dumped into a test database and verified like crazy before rolling over to the corrected DB.

My understanding is that MySQL's translation can leave some things to be desired with certain more complex characters but since most of my multibyte chars are fairly common things (accent marks, quotes, etc), this worked great for me.

Some resources that proved invaluable in sorting all of this out:

  • Derek Sivers' guide on transforming MySQL data from latin1-in-utf8 to utf8
  • Blue Box article on MySQL character set hell
  • Simple table conversion instructions on Stack Overflow

encoding and utf-8 exceptions after upgrade to Ruby 1.9.3 and rails 3.2

I had the same problem occurring "sometimes". I now put the following at the very top of each .rb file:

# encoding: UTF-8

class Whatever < ActiveRecord::Base
...
end

The problem occurs when the file contains one or more accented characters (as a French guy, I sometimes use them in comments).

Is there a solution to the character encoding problem ( � ) for Rails 2 / Ruby 1.8.7?

You can use the rchardet gem to detect the encoding of incoming strings, and the built-in Iconv libs to convert strings that need conversion:

require 'rchardet'

[...]

cd = CharDet.detect(params[:my_upload_form][:uploaded_file])
encoding = cd['encoding']

converted_string = Iconv.conv('UTF-8', encoding, params[:my_upload_form][:uploaded_file])

The example is working on an uploaded file, but of course you can apply it to data coming in from textareas or wherever else you think users may be pasting data in encodings other than the one you want.
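Iconv was removed from Ruby's standard library in 2.0, so on Ruby 1.9 and later the conversion step can be done with String#encode instead. The following is only a sketch: to_utf8 is a hypothetical helper, and the encoding name is passed in by hand here where the example above would use cd['encoding'].

```ruby
# Hypothetical helper: label the raw bytes with the detected encoding,
# then transcode to UTF-8. Bytes that cannot be converted are replaced
# instead of raising an exception.
def to_utf8(raw, source_encoding)
  raw.dup.force_encoding(source_encoding)
     .encode("UTF-8", invalid: :replace, undef: :replace)
end

# Sample input: "François" as ISO-8859-1 bytes, as an upload might deliver it.
latin1_bytes = "Fran\xE7ois".b
converted = to_utf8(latin1_bytes, "ISO-8859-1")

puts converted.encoding
puts converted.valid_encoding?
```

The dup keeps the caller's string untouched, since force_encoding modifies its receiver in place.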

Borrowed shamelessly from the kind gentleman at http://www.meeho.net/blog/2010/03/ruby-how-to-detect-the-encoding-of-a-string/.

Rails 3 Ruby 1.9.2: UTF-8 characters show garbled in console and view

After a lot of research I found the solution. It seems the columns in question were double encoded. They used to have a Latin1 collation and were not converted correctly to UTF8.
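What "double encoded" means can be reproduced in a few lines of Ruby. This sketch assumes the damage came from UTF-8 bytes being read as Windows-1252 (which MySQL's latin1 closely resembles) and converted to UTF-8 a second time; the sample word is made up.

```ruby
original = "caf\u00E9"   # correct UTF-8 text

# Simulate the damage: treat the UTF-8 bytes as Windows-1252 characters
# and convert those characters to UTF-8 again.
double_encoded = original.dup.force_encoding("Windows-1252").encode("UTF-8")

# Undo it: convert back to Windows-1252 bytes, then relabel them as UTF-8.
repaired = double_encoded.encode("Windows-1252").force_encoding("UTF-8")

puts double_encoded.bytesize   # more bytes than the original
puts repaired == original
```

This round trip only works for bytes that Ruby's Windows-1252 table defines, so treat it as a diagnostic aid rather than a replacement for the dump-and-reload approach.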

A proposed solution to change the column to a BLOB and then back to TEXT with UTF8 did not work:

ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

What did eventually work was:

mysqldump -uuser -ppassword --opt --quote-names --skip-set-charset --default-character-set=latin1 dbname1 table1 > dump.sql
mysql -uuser -ppassword --default-character-set=utf8 dbname1 < dump.sql

UTF-8 encoding not work with gets method in Ruby

Setting the file encoding using the "magic" comment on top of the file only specifies the encoding of your source code in the file (that is: the encoding of string literals created directly from the parser in your code).

Ruby knows two other default encodings:

  • the external encoding - this specifies the default encoding of data read from external sources (such as the console, opened files, network sockets, ...)
  • the internal encoding - data read from external sources will be transformed into the default internal encoding after reading to ensure you can use compatible encodings everywhere (this is not used by default, the external encoding is thus preserved).
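The three encodings can be inspected from inside a running script. A minimal sketch; the values printed depend on your platform, Ruby version and any -E or RUBYOPT settings:

```ruby
# Encoding of this source file (what the magic comment controls).
source_encoding = __ENCODING__

# Defaults applied to data read from the console, files, sockets and so on.
external = Encoding.default_external
internal = Encoding.default_internal   # nil unless explicitly configured

puts "source:   #{source_encoding}"
puts "external: #{external}"
puts "internal: #{internal.inspect}"

# A plain string literal takes the source encoding, not the external one.
literal = "abc"
puts "literal:  #{literal.encoding}"
```

If the external value printed on your machine is not the encoding of the data you feed in, that mismatch is the likely source of the errors.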

In your case, you have not set the external encoding. On Windows and with Ruby before version 3.0, Ruby assumes the local console encoding of your Windows installation here (such as cp850 in Western Europe).

When Ruby reads your String, it assumes it to be in cp850 encoding (or whatever your default encoding is) while you likely provide UTF-8 encoded data. As soon as you start to operate on this incorrectly encoded data, you will get errors similar to the one you have seen there.

Thus, to be able to correctly read data you need to either provide it with an encoding matching your shell encoding, or you need to tell Ruby which encoding it should assume there.

If you are providing UTF-8 encoded data, you can set the expected encoding using the -E switch when invoking ruby, e.g.:

ruby -E utf-8 your_program.rb

You can also set this in an environment variable of your Windows shell using

set RUBYOPT=-Eutf-8
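If changing the command line or environment is not an option, the expected encoding can also be set on the stream itself from within the program. The sketch below uses a StringIO as a stand-in for the console so it runs anywhere; read_utf8_line is a hypothetical helper name.

```ruby
require "stringio"

# Hypothetical helper: declare that the stream delivers UTF-8,
# then read one line from it without the trailing newline.
def read_utf8_line(io)
  io.set_encoding(Encoding::UTF_8)
  line = io.gets
  line && line.chomp
end

# Stand-in for console input: the UTF-8 bytes of a German word, unlabeled.
fake_console = StringIO.new("gr\xC3\xBC\xC3\x9Fe\n".b)
line = read_utf8_line(fake_console)

puts line.encoding
puts line.valid_encoding?
```

With a real console the call would be read_utf8_line($stdin). This only relabels the incoming bytes, so it is appropriate only when the input really is UTF-8.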

In Ruby 3.0, the default external encoding on Windows was changed to UTF-8, matching other platforms. See https://bugs.ruby-lang.org/issues/16604 for details.

Ruby encoding issue showing weird characters on emails

Rails escapes the string returned by InquirySetting.confirmation_message(Globalize.locale).gsub("%name%", @inquiry.name). It does this to prevent HTML tags and the like in possibly user-provided strings from being rendered as markup. Otherwise your application would be open to XSS attacks.

If you know that the contents of confirmation_message cannot be changed by users, you can deactivate Rails' escaping for it by declaring the string as safe. Do so by changing the contents of the template to <%= raw InquirySetting.confirmation_message(Globalize.locale).gsub("%name%", @inquiry.name) %>.

Again, please only do so if you trust the source of the template's contents.
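Since the name substituted into the message usually does come from a user, one cautious variation is to escape just that value before it goes into the trusted template. The sketch below uses ERB::Util.html_escape from Ruby's standard library as a stand-in for Rails' own escaping; confirmation_html and the sample template are hypothetical.

```ruby
require "erb"

# Hypothetical helper: the template is trusted HTML, the name is not.
# The block form of gsub avoids backslash handling in the replacement.
def confirmation_html(template, name)
  template.gsub("%name%") { ERB::Util.html_escape(name) }
end

template = "<p>Thanks, %name%!</p>"
html = confirmation_html(template, "<script>alert(1)</script>")

puts html
```

The markup from the template survives, while anything in the name is rendered as plain text. In a Rails view, a string built this way is what you would then pass to raw.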

Ruby on Rails 3, incompatible character encodings: UTF-8 and ASCII-8BIT with i18n

Ok so problem solved after some hours of googling...

There were actually two bugs in my code. The first one was a file encoding error and the second was a problem with the MySQL database configuration.

First, to solve the error caused by MySQL I used these two articles:

http://www.dotkam.com/2008/09/14/configure-rails-and-mysql-to-support-utf-8/

http://www.rorra.com.ar/2010/07/30/rails-3-mysql-and-utf-8/

Second, to solve the file encoding problem I added these 2 lines in my config/environment.rb

Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
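For reference, the error in the title can be reproduced without Rails or MySQL. The sketch below assumes the situation is a UTF-8 string being joined with bytes that arrived labeled as binary (ASCII-8BIT), which is typically what a misconfigured database connection or file read produces.

```ruby
utf8   = "caf\u00E9"     # UTF-8 text, e.g. from a template
binary = "\xC3\xA9".b    # UTF-8 bytes labeled as ASCII-8BIT, e.g. from the database

error = nil
begin
  utf8 + binary
rescue Encoding::CompatibilityError => e
  error = e              # "incompatible character encodings: ..."
end

# Relabeling the bytes with their real encoding makes the strings compatible.
fixed = utf8 + binary.dup.force_encoding(Encoding::UTF_8)

puts error.class
puts fixed.valid_encoding?
```

Setting the defaults as above, together with the MySQL configuration, stops such binary-labeled strings from appearing in the first place, which is preferable to relabeling them one by one.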

Hopefully this will help someone :)

ruby on rails actionmailer receive character encoding issues

Are you sure the encoding of the incoming email is UTF-8? My solution was to grab the charset from part.header.raw_source:

/charset=(?<charset>[^\s]+)\s?/ =~ part.header.raw_source

If that matched (charset is not nil), strip any quotes and transcode the body from that charset:

charset.gsub!('"', '')
raw = part.body.to_s
raw.force_encoding(charset)
decoded = raw.encode("UTF-8")
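The steps above can be gathered into one function that also copes with a missing or unknown charset. This is only a sketch operating on plain strings: decode_part is a hypothetical helper, and the header and body values stand in for part.header.raw_source and part.body.to_s.

```ruby
# Hypothetical helper: pull the charset out of a raw header and use it
# to transcode the body to UTF-8. Falls back to the untouched body when
# no charset is declared or Ruby does not know the name.
def decode_part(raw_header, raw_body)
  match = raw_header.match(/charset="?(?<charset>[^\s";]+)"?/i)
  return raw_body unless match

  raw_body.dup.force_encoding(match[:charset])
          .encode("UTF-8", invalid: :replace, undef: :replace)
rescue ArgumentError, EncodingError
  raw_body
end

header = 'Content-Type: text/plain; charset="ISO-8859-1"'
body   = "caf\xE9".b     # ISO-8859-1 bytes as received

decoded = decode_part(header, body)
puts decoded.encoding
```

Matching the optional quotes inside the regular expression removes the need for the separate gsub! step.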

