How to Avoid Putting the Magic Encoding Comment on Top of Every UTF-8 File in Ruby 1.9

How can I avoid putting the magic encoding comment on top of every UTF-8 file in Ruby 1.9?

Explicit is better than implicit. Writing out the name of the encoding is good for your text editor, your interpreter, and anyone else who wants to look at the file. Different platforms have different defaults -- UTF-8, Windows-1252, Windows-1251, etc. -- and you will either hamper portability or platform integration if you automatically pick one over the other. Requiring more explicit encodings is a Good Thing.

It might be a good idea to integrate your Rails app with GetText. Then all of your UTF-8 strings will be isolated to a small number of translation files, and your Ruby modules will be clean ASCII.

How does the magic comment ( # Encoding: utf-8 ) in Ruby work?

An instruction to the Ruby interpreter placed at the top of a source file is called a magic comment. Before processing your source code, the interpreter reads this line and sets the source encoding for that file accordingly. This is quite common for interpreted languages, I believe. At least Python uses the same approach.

You can specify encoding in a number of different ways (some of them are recognized by editors):

# encoding: UTF-8
# coding: UTF-8
# -*- coding: UTF-8 -*-
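
As a rough check of the spellings above, the following sketch writes each one to the first line of a throwaway file, loads it, and reports what __ENCODING__ evaluates to inside that file. The helper name source_encoding_for and the global $reported_encoding are made up for illustration, and ISO-8859-1 is used only so that the result differs from the interpreter's default.

```ruby
require "tempfile"

# Hypothetical helper: writes a tiny script whose first line is the given
# magic comment, loads it, and returns the source encoding seen inside it.
def source_encoding_for(magic_comment)
  Tempfile.create(["magic_comment", ".rb"]) do |file|
    file.write("#{magic_comment}\n$reported_encoding = __ENCODING__\n")
    file.flush
    load file.path
  end
  $reported_encoding
end

[
  "# encoding: ISO-8859-1",
  "# coding: ISO-8859-1",
  "# -*- coding: ISO-8859-1 -*-"
].each do |comment|
  puts "#{comment.ljust(30)} => #{source_encoding_for(comment)}"
end
```

All three spellings should report the same encoding, since the interpreter only looks for the coding declaration inside the comment.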

You can read some interesting stuff about source encoding in this article.

The only other thing I'm aware of with a similar construction is the shebang line, but that is related to Unix shells in general and is not Ruby-specific.

magic_comments defined in ruby/ruby

Make Ruby 1.9 regard all source files to be UTF-8 encoded. (Even if recompiling the interpreter is necessary)

I found a workaround:
set the RUBYOPT environment variable, for example by executing

export RUBYOPT=-Ku

in your shell.

This sets -Ku as a default option whenever ruby is invoked. You can now run other tools that invoke ruby without worrying about parameters: rails server and rake work and treat all files as UTF-8. No BOM or magic comments necessary!

UTF-8 encoding in gemspec, does it apply to the source files?

The file encoding header specifies the encoding for that file. It doesn't specify the encoding of other files. How could it?
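
A small sketch of that per-file scope, assuming two throwaway files whose names (outer.rb, inner.rb) and globals are invented for the demo: outer.rb declares ISO-8859-1 and then loads inner.rb, which has no magic comment of its own.

```ruby
require "tmpdir"

# Hypothetical demo: outer.rb declares ISO-8859-1 and loads inner.rb, which
# has no magic comment. Each file records its own __ENCODING__.
def loader_and_loaded_encodings
  Dir.mktmpdir do |dir|
    inner = File.join(dir, "inner.rb")
    outer = File.join(dir, "outer.rb")
    File.write(inner, "$inner_encoding = __ENCODING__\n")
    File.write(outer, <<~RUBY)
      # encoding: ISO-8859-1
      $outer_encoding = __ENCODING__
      load #{inner.inspect}
    RUBY
    load outer
  end
  [$outer_encoding, $inner_encoding]
end

outer_encoding, inner_encoding = loader_and_loaded_encodings
puts "outer.rb: #{outer_encoding}"
puts "inner.rb: #{inner_encoding}"
```

The loaded file should report the interpreter's default source encoding (US-ASCII on Ruby 1.9, UTF-8 from Ruby 2.0 on) rather than the encoding declared by the file that loaded it.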

Reading ASCII-encoded files with Ruby 1.9 in a UTF-8 environment

What's your locale set to in the shell? In Linux-based systems you can check this by running the locale command and change it by e.g.

$ export LANG=en_US

My guess is that your locale settings specify UTF-8, and this causes Ruby to assume that text files are UTF-8 encoded. You can see this by trying

$ LANG=en_GB ruby -e 'warn "foo".encoding.name'
US-ASCII
$ LANG=en_GB.UTF-8 ruby -e 'warn "foo".encoding.name'
UTF-8

For a more general treatment of how string encoding has changed in Ruby 1.9 I thoroughly recommend
http://blog.grayproductions.net/articles/ruby_19s_string

(code examples assume bash or similar shell - C-shell derivatives are different)
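
If changing the locale is not an option, the encoding can also be named when the file is read. The sketch below is only an illustration (read_encodings is a made-up helper and the sample file is created on the fly); it compares a read that relies on Ruby's defaults with one that states US-ASCII explicitly.

```ruby
require "tempfile"

# Hypothetical helper: reads the same file twice, once relying on Ruby's
# defaults and once naming the encoding explicitly.
def read_encodings(path)
  {
    implicit: File.read(path).encoding,
    explicit: File.read(path, encoding: "US-ASCII").encoding
  }
end

Tempfile.create("ascii_sample") do |file|
  file.write("plain ascii text\n")
  file.flush

  result = read_encodings(file.path)
  puts "default_external: #{Encoding.default_external}"
  puts "implicit read:    #{result[:implicit]}"
  puts "explicit read:    #{result[:explicit]}"
end
```

The implicit read follows Encoding.default_external, which Ruby derives from the locale unless told otherwise, while the explicit read is independent of the environment.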

How do I prevent Emacs from adding coding information in the first line?

It looks like this is part of ruby-mode in Emacs.

I found a link to an article that shows how to edit the ruby-mode.el file. Not sure if it works, but there is also a comment on that article that may work better:

(setq ruby-insert-encoding-magic-comment nil)

If you are using enh-ruby-mode instead of ruby-mode, you should set this variable:

(setq enh-ruby-add-encoding-comment-on-save nil)

Links:

Fix: Emacs/Aquamacs keeps adding encoding comments to my files

Also, semi-related question but pertinent answer by Michael Kohl: How can I avoid putting the magic encoding comment on top of every UTF-8 file in Ruby 1.9?

Enh-ruby-mode comment encoding line

Set global default encoding for ruby 1.9

You can either:

  1. set your RUBYOPT environment variable to "-E utf-8"
  2. or use https://github.com/m-ryan/magic_encoding
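
To see what the first option does, one possible check is to start a child interpreter with RUBYOPT set and ask it for its default external encoding. This is a sketch only: default_external_with_rubyopt is a hypothetical helper, and ISO-8859-1 is chosen so the effect is visible even where UTF-8 is already the default.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: starts a child ruby with RUBYOPT set to the given
# value and returns the name of the default external encoding it reports.
def default_external_with_rubyopt(rubyopt)
  output, _status = Open3.capture2(
    { "RUBYOPT" => rubyopt },
    RbConfig.ruby, "-e", "print Encoding.default_external.name"
  )
  output
end

puts "this process:             #{Encoding.default_external.name}"
puts "child with -E ISO-8859-1: #{default_external_with_rubyopt('-E ISO-8859-1')}"
```

Note that -E sets the default external (and optionally internal) encoding used for I/O, so it is worth checking separately whether it also gives the source-file behaviour you are after.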

Batch convert to UTF8 using Ruby


Unfortunately that's not how it is done - the file is still in ANSI. At least that's what my Notepad++ says.

UTF-8 was designed to be a superset of ASCII, which means that every ASCII character is encoded identically in UTF-8. For this reason it's not possible to distinguish between ASCII and UTF-8 unless the file contains "special" (non-ASCII) characters. These special characters are represented using multiple bytes in UTF-8.

It's quite possible that your conversion is actually working, but you can double-check by trying your program with special characters.
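
The point about indistinguishable bytes can be illustrated with a short sketch. The helper ascii_only_bytes? is invented for this example; it simply checks whether every byte falls in the 7-bit range, which is the case in which an editor has nothing to tell ASCII and UTF-8 apart.

```ruby
# Hypothetical helper: true when every byte is in the 7-bit ASCII range,
# i.e. when the content alone cannot tell ASCII and UTF-8 apart.
def ascii_only_bytes?(bytes)
  bytes.all? { |byte| byte < 0x80 }
end

# Pure-ASCII text has the same bytes whether it is labelled US-ASCII or UTF-8.
ascii_text = "hello"
same_bytes = ascii_text.encode("US-ASCII").bytes == ascii_text.encode("UTF-8").bytes
puts "ASCII-only text identical in both encodings: #{same_bytes}"

# A non-ASCII character takes more than one byte in UTF-8.
e_acute = "\u00E9" # the letter e with an acute accent
puts "UTF-8 bytes:      #{e_acute.bytes.inspect}"
puts "ISO-8859-1 bytes: #{e_acute.encode('ISO-8859-1').bytes.inspect}"
puts "distinguishable:  #{!ascii_only_bytes?(e_acute.bytes)}"
```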

Also, one of the best utilities for converting between encodings is iconv, which also has ruby bindings.
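
Ruby 1.9 strings also provide String#encode, so a conversion can be sketched without iconv. The example below is an assumption-laden illustration rather than a finished tool: convert_file_to_utf8 is a made-up helper, the source encoding (Windows-1252 here) has to be known in advance, and the sample file is created on the fly.

```ruby
require "tempfile"

# Hypothetical helper: rewrites the file at path as UTF-8, assuming its
# current bytes are in source_encoding (which the caller must know).
def convert_file_to_utf8(path, source_encoding)
  text = File.binread(path).force_encoding(source_encoding)
  File.binwrite(path, text.encode("UTF-8"))
end

Tempfile.create("legacy") do |file|
  file.binmode
  file.write("caf\xE9".b) # "cafe" with an acute accent, as Windows-1252 bytes
  file.flush

  convert_file_to_utf8(file.path, "Windows-1252")
  puts File.binread(file.path).bytes.inspect
end
```

After the conversion the accented letter should occupy two bytes instead of one. For a batch, the same call could be applied to each path returned by Dir.glob.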



