Ruby: How to Add "# encoding: utf-8" Automatically

Ruby: how to add # encoding: UTF-8 automatically?

Try the magic_encoding gem; it can insert the UTF-8 magic comment into all Ruby files in your app.

[EDIT]
Having since switched to Sublime Text, I now use the auto-encoding-for-ruby plugin.

Set UTF-8 as default for Ruby 1.9.3

To change the source encoding (i.e. the encoding your source code is actually written in), you currently have to use the magic comment:

# encoding: utf-8

It is not enough to set either the internal encoding (the encoding of the internal string representation after conversion) or the external encoding (the assumed encoding of files that are read). You actually have to put the magic encoding comment at the top of each file to set its source encoding.

In ChiliProject we have a rake task which sets the correct encoding header in all files automatically before a release.
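This is not the ChiliProject task itself, just a minimal sketch of the idea: prepend the magic comment to any file that does not already have one, keeping a shebang line first if present. The names with_magic_comment and add_magic_comment! are made up for illustration, and the pattern only looks at the first (or second) line.

```ruby
MAGIC_COMMENT = "# encoding: utf-8\n"
# Matches "# encoding: utf-8", "# -*- coding: utf-8 -*-" and similar forms.
MAGIC_PATTERN = /\A#.*coding\s*[:=]\s*[\w.-]+/

# Returns the source text with a magic comment, adding one only if missing.
def with_magic_comment(source)
  lines = source.lines
  # A shebang must stay on the first line, so set it aside.
  shebang = lines.first.to_s.start_with?("#!") ? lines.shift : nil
  return source if MAGIC_PATTERN.match?(lines.first.to_s)

  [shebang, MAGIC_COMMENT, *lines].compact.join
end

# Rewrites the file in place; returns true if the file was changed.
def add_magic_comment!(path)
  source = File.binread(path)
  updated = with_magic_comment(source)
  return false if updated == source

  File.binwrite(path, updated)
  true
end
```

Inside a rake task you could then run something like Dir.glob("**/*.rb").each { |path| add_magic_comment!(path) }, ideally on a clean working copy so the diff is easy to review.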

As for encoding defaults:

  • Ruby 1.8 and below didn't know the concept of string encodings at all. Strings were more or less byte arrays.
  • Ruby 1.9: the default source encoding is US-ASCII.
  • Ruby 2.0 and above: the default source encoding is UTF-8.

Thus, if you use Ruby 2.0 or later, you can skip the encoding comment and assume that source files are read as UTF-8 by default.
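If you want to check this on your own Ruby, here is a small sketch that loads a throwaway script with and without a magic comment and reports __ENCODING__, which is the source encoding of the file being read. The helper name source_encoding_for and the global it uses are only for this demo.

```ruby
require "tempfile"

# Loads a throwaway script and returns the source encoding Ruby chose for it.
# `header` is the script's first line (a magic comment, or an empty string).
def source_encoding_for(header)
  Tempfile.create(["source_encoding_demo", ".rb"]) do |file|
    file.write("#{header}\n$demo_source_encoding = __ENCODING__\n")
    file.flush
    load file.path
  end
  $demo_source_encoding
end

puts source_encoding_for("")                        # UTF-8 on Ruby 2.0 and above
puts source_encoding_for("# encoding: iso-8859-1")  # ISO-8859-1
```

On Ruby 1.9 the first call would report US-ASCII instead, which is exactly why the comment was needed there.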

Effectively UTF-8 encode a string

"J\u00E9r\u00E9my".encoding
#=> #<Encoding:UTF-8>
"J\u00E9r\u00E9my".each_codepoint.to_a
#=> [74, 233, 114, 233, 109, 121]

The strings are perfectly fine. They contain the correct bytes and have the correct encoding.

They are printed this way because your external encoding is set to (or recognised as) US-ASCII:

Encoding.default_external
#=> #<Encoding:US-ASCII>

Ruby assumes that your terminal can only render ASCII characters and therefore prints non-ASCII characters as escape sequences when you use p / String#inspect.
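To see the effect in isolation, this sketch temporarily switches Encoding.default_external and compares the String#inspect output. It assumes Encoding.default_internal is nil (the usual default), since inspect prefers the internal encoding when one is set; inspect_with_external is a made-up helper name.

```ruby
# Returns string.inspect as it would look under the given external encoding.
def inspect_with_external(string, encoding)
  previous_external = Encoding.default_external
  previous_verbose = $VERBOSE
  $VERBOSE = nil # silence the warning Ruby emits in verbose mode
  Encoding.default_external = encoding
  string.inspect
ensure
  # Always restore the process-wide settings.
  Encoding.default_external = previous_external
  $VERBOSE = previous_verbose
end

name = "J\u00E9r\u00E9my"
puts inspect_with_external(name, Encoding::US_ASCII) # prints "J\u00E9r\u00E9my"
puts inspect_with_external(name, Encoding::UTF_8)    # prints "Jérémy"
```

The string itself is identical in both calls; only the way inspect renders it changes.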

The external encoding is usually determined automatically based on your locale:

$ LANG=C            ruby -e 'p Encoding.default_external'
#<Encoding:US-ASCII>

$ LANG=en_US.UTF-8 ruby -e 'p Encoding.default_external'
#<Encoding:UTF-8>

Setting your terminal's or system's encoding / locale to UTF-8 should fix the problem.

In Ruby on Rails, are '#encoding: utf-8' and 'config.encoding = "utf-8"' different?

The config.encoding = "utf-8" part in config/application.rb is related to how Rails should interpret content.

#encoding: utf-8 in a Ruby file tells Ruby to read that source file as UTF-8, which matters as soon as the file contains non-ASCII characters.

These two cases are different. The first one (in config/application.rb) is a Rails setting and has nothing at all to do with how Ruby itself interprets source files.

You can set the environment variable RUBYOPT=-Ku if you're lazy and want Ruby to treat .rb files as UTF-8 by default, but I'd rather recommend that you put your non-ASCII bits in a translation file and reference them with I18n.t.

How can I avoid putting the magic encoding comment on top of every UTF-8 file in Ruby 1.9?

Explicit is better than implicit. Writing out the name of the encoding is good for your text editor, your interpreter, and anyone else who wants to look at the file. Different platforms have different defaults -- UTF-8, Windows-1252, Windows-1251, etc. -- and you will either hamper portability or platform integration if you automatically pick one over the other. Requiring more explicit encodings is a Good Thing.

It might be a good idea to integrate your Rails app with GetText. Then all of your UTF-8 strings will be isolated to a small number of translation files, and your Ruby modules will be clean ASCII.

Rails and Utf-8 encoding

I couldn't fix this problem. I "solved" it by purchasing a Mac and continuing to develop my Rails apps on it instead...

Encoding - me: 1-0


