How Does the Magic Comment (# encoding: utf-8) in Ruby Work

How does the magic comment (# encoding: utf-8) in Ruby work?

This line at the top of the source file is an instruction to the Ruby interpreter, known as a magic comment. Before processing your source code, the interpreter reads this line and sets the source encoding accordingly. This is quite common for interpreted languages, I believe; at least Python uses the same approach.

You can specify encoding in a number of different ways (some of them are recognized by editors):

# encoding: UTF-8
# coding: UTF-8
# -*- coding: UTF-8 -*-
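One way to watch the comment take effect is to check __ENCODING__, which reports the source encoding of the current file, for each of the spellings above. The following is only a sketch: source_encoding_of is a hypothetical helper, and ISO-8859-1 stands in for UTF-8 so that the change is visible on Ruby 2.0 and later, where UTF-8 is already the default source encoding.

```ruby
require 'rbconfig'
require 'tempfile'

# Hypothetical helper: writes a two-line script (the given comment plus a
# line that prints __ENCODING__) to a temporary file, runs it with the
# current interpreter and returns the encoding name the child reports.
def source_encoding_of(first_line)
  Tempfile.create(['magic_comment', '.rb']) do |file|
    file.write("#{first_line}\nputs __ENCODING__\n")
    file.flush
    IO.popen([RbConfig.ruby, file.path], &:read).strip
  end
end

[
  '# encoding: ISO-8859-1',
  '# coding: ISO-8859-1',
  '# -*- coding: ISO-8859-1 -*-'
].each do |comment|
  puts "#{comment.ljust(30)} => #{source_encoding_of(comment)}"
end
```

Each of the three lines should report ISO-8859-1, since the three spellings are interchangeable.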

You can read some interesting stuff about source encoding in this article.

The only other thing I'm aware of that has a similar construction is the shebang line, but that is related to Unix shells in general and is not Ruby-specific.

magic_comments defined in ruby/ruby

How can I avoid putting the magic encoding comment on top of every UTF-8 file in Ruby 1.9?

Explicit is better than implicit. Writing out the name of the encoding is good for your text editor, your interpreter, and anyone else who wants to look at the file. Different platforms have different defaults -- UTF-8, Windows-1252, Windows-1251, etc. -- and you will either hamper portability or platform integration if you automatically pick one over the other. Requiring more explicit encodings is a Good Thing.

It might be a good idea to integrate your Rails app with GetText. Then all of your UTF-8 strings will be isolated to a small number of translation files, and your Ruby modules will be clean ASCII.

utf-8 encoding in gemspec, does it apply to the source files?

The file encoding header specifies the encoding for that file. It doesn't specify the encoding of other files. How could it?
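To illustrate the per-file scope, here is a small sketch under stated assumptions: encodings_per_file is a hypothetical helper, the file names main.rb and helper.rb are made up, and the expected result assumes Ruby 2.0 or later (UTF-8 as the default source encoding). main.rb carries a magic comment and loads helper.rb, which has none.

```ruby
require 'rbconfig'
require 'tmpdir'

# Hypothetical helper: creates two scripts in a temporary directory, runs
# main.rb with the current interpreter and returns the printed lines.
# Each script prints its own __ENCODING__.
def encodings_per_file
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'helper.rb'), 'puts "helper.rb: #{__ENCODING__}"' + "\n")
    File.write(File.join(dir, 'main.rb'), <<~'RUBY')
      # encoding: ISO-8859-1
      require_relative 'helper'
      puts "main.rb: #{__ENCODING__}"
    RUBY
    IO.popen([RbConfig.ruby, File.join(dir, 'main.rb')], &:read).lines.map(&:chomp)
  end
end

puts encodings_per_file
```

helper.rb keeps the default source encoding even though the file that loads it declares ISO-8859-1.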

Effectively UTF-8 encode a string

"J\u00E9r\u00E9my".encoding
#=> #<Encoding:UTF-8>
"J\u00E9r\u00E9my".each_codepoint.to_a
#=> [74, 233, 114, 233, 109, 121]

The strings are perfectly fine. They contain the correct bytes and have the correct encoding.

They are printed this way because your external encoding is set to (or recognised as) US-ASCII:

Encoding.default_external
#=> #<Encoding:US-ASCII>

Ruby assumes that your terminal can only render ASCII characters and therefore prints non-ASCII characters as escape sequences when you use p or String#inspect.
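The effect can be reproduced inside a single process by switching the default external encoding before calling inspect. This is a rough sketch rather than a recommended practice: inspect_under is a hypothetical helper, and it assumes Encoding.default_internal is unset (the default), since inspect would otherwise follow the internal encoding instead.

```ruby
# Hypothetical helper: returns String#inspect output as produced under the
# given default external encoding, then restores the previous setting.
def inspect_under(string, external)
  previous = Encoding.default_external
  Encoding.default_external = external
  string.inspect
ensure
  Encoding.default_external = previous
end

name = "J\u00E9r\u00E9my"

puts inspect_under(name, Encoding::US_ASCII)  # non-ASCII characters escaped
puts inspect_under(name, Encoding::UTF_8)     # characters shown as they are
puts name.encoding                            # the string itself never changed
```

Only the printed representation differs between the two calls. The string keeps the same bytes and the same encoding throughout.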

The external encoding is usually determined automatically based on your locale:

$ LANG=C            ruby -e 'p Encoding.default_external'
#<Encoding:US-ASCII>

$ LANG=en_US.UTF-8 ruby -e 'p Encoding.default_external'
#<Encoding:UTF-8>

Setting your terminal's or system's encoding / locale to UTF-8 should fix the problem.

Ruby - UTF-8 file encoding

No, there are not "exactly 3 ways" to specify the 'magic comment' -- there are an infinite number of them. Any comment on the first line that contains coding: will work, according to JEG2:

... the preferred way to set your source Encoding ... it's called a magic comment. If the first line of your code is a comment that includes the word coding, followed by a colon and space, and then an Encoding name, the source Encoding for that file is changed to the indicated Encoding.

So, any of these should work:

# coding: UTF-8
# encoding: UTF-8
# zencoding: UTF-8
# vocoding: UTF-8
# fun coding: UTF-8
# decoding: UTF-8
# 863280148705622662 coding: UTF-8 0072364213
# It was the night before Christmas and all through the house, not a creature was coding: UTF-8, not even with a mouse.
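The quoted rule can be written down as a regular expression to see why such odd-looking comments satisfy it. Note that this is only a literal transcription of the quote, not Ruby's actual parser: QUOTED_RULE and encoding_named_in are hypothetical names, and a given Ruby version may be stricter than the quote suggests, so it is worth checking unusual spellings against the interpreter you actually run.

```ruby
# A literal transcription of the quoted rule, NOT Ruby's real parser:
# the word "coding", a colon, whitespace, then an encoding name.
QUOTED_RULE = /coding:\s+([\w.-]+)/

# Hypothetical helper: returns the encoding name the quoted rule would pick
# out of a comment line, or nil when the rule does not match.
def encoding_named_in(comment)
  comment[QUOTED_RULE, 1]
end

[
  '# coding: UTF-8',
  '# zencoding: UTF-8',
  '# fun coding: UTF-8',
  '# not a creature was coding: UTF-8, not even with a mouse.',
  '# no encoding here'
].each do |comment|
  puts "#{comment.inspect} => #{encoding_named_in(comment).inspect}"
end
```

Under this transcription every line except the last yields "UTF-8", because the rule looks at nothing but the text from "coding:" onward.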

Ruby: how to add # encoding: UTF-8 automatically?

Try the magic_encoding gem; it can insert the utf-8 magic comment into all Ruby files in your app.

[EDIT]
Having switched to Sublime Text, I now use the auto-encoding-for-ruby plugin.

UTF-8 encoding does not work with the gets method in Ruby

Setting the file encoding using the "magic" comment on top of the file only specifies the encoding of your source code in the file (that is: the encoding of string literals created directly from the parser in your code).

Ruby knows two other default encodings:

  • the external encoding - this specifies the default encoding of data read from external sources (such as the console, opened files, network sockets, ...)
  • the internal encoding - data read from external sources will be transformed into the default internal encoding after reading to ensure you can use compatible encodings everywhere (this is not used by default, the external encoding is thus preserved).

In your case, you have not set the external encoding. On Windows and with Ruby before version 3.0, Ruby assumes the local console encoding of your Windows installation here (such as cp850 in Western Europe).

When Ruby reads your String, it assumes it to be in cp850 encoding (or whatever your default external encoding is), while you are likely providing UTF-8 encoded data. As soon as you start to operate on this incorrectly labelled data, you will get errors similar to the one you have seen.

Thus, to be able to correctly read data you need to either provide it with an encoding matching your shell encoding, or you need to tell Ruby which encoding it should assume there.
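The difference between a matching label, a wrong label, and a wrong label combined with an internal encoding can be sketched with a file instead of the console. In this illustration read_back is a hypothetical helper, and the "external:internal" form of the encoding option plays the role of the two default encodings described above.

```ruby
require 'tempfile'

# Hypothetical helper: stores the UTF-8 bytes of +text+ in a temporary file
# and reads them back under the given encoding specification.
def read_back(text, encoding_spec)
  Tempfile.create('encoding_demo') do |file|
    file.binmode
    file.write(text.b)
    file.flush
    File.read(file.path, encoding: encoding_spec)
  end
end

name = "J\u00E9r\u00E9my"

as_utf8    = read_back(name, 'UTF-8')        # label matches the data
as_cp850   = read_back(name, 'CP850')        # wrong label, bytes untouched
transcoded = read_back(name, 'CP850:UTF-8')  # wrong label, then converted

puts as_utf8 == name              # true: same bytes, same encoding
puts as_cp850.bytes == name.bytes # true: only the label differs
puts transcoded == name           # false: conversion started from the wrong label
```

The last case shows why an internal encoding does not help when the external one is wrong: Ruby faithfully converts misinterpreted characters.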

If you are providing UTF-8 encoded data, you can set the expected encoding using the -E switch when invoking ruby, e.g.:

ruby -E utf-8 your_program.rb

You can also set this in an environment variable of your Windows shell using

set RUBYOPT=-Eutf-8
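A per-stream alternative, not mentioned in the answer above, is IO#set_encoding, which labels a single stream without changing the process-wide default. The sketch below uses a pipe as a stand-in for the console so that it can run unattended; read_utf8_line is a hypothetical helper, and the same call would be made on $stdin before using gets.

```ruby
# Hypothetical helper: feeds raw bytes through a pipe (standing in for the
# console) and reads one line after labelling the stream as UTF-8.
def read_utf8_line(raw_bytes)
  reader, writer = IO.pipe
  writer.binmode
  writer.write(raw_bytes)
  writer.close
  reader.set_encoding(Encoding::UTF_8)  # for console input: $stdin.set_encoding
  reader.gets.chomp
ensure
  reader&.close
end

line = read_utf8_line("J\u00E9r\u00E9my\n".b)

puts line.encoding
puts line.valid_encoding?
puts line.length
```

This only helps when the incoming bytes really are UTF-8. If the console delivers cp850 bytes, labelling them as UTF-8 produces invalid strings instead.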

In Ruby 3.0, the default external encoding on Windows was changed to UTF-8, in line with other platforms. See https://bugs.ruby-lang.org/issues/16604 for details.

Ruby magic comment doesn't work?

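I'm not quite sure, but I usually use this for Cyrillic (this applies to Ruby 1.8 only; from Ruby 1.9 on, $KCODE has no effect and the jcode library is no longer available):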

$KCODE = 'u'
require 'jcode'

