Unicode characters in Ruby 1.9.3 IRB with RVM

RVM has issues with Readline installed via Homebrew. This gist worked perfectly for me:

$ rvm get latest
$ rvm pkg install readline
$ rvm install 1.9.3 --with-readline-dir=$rvm_path/usr

If Ruby 1.9.3 is already installed, use reinstall instead of install.

to_json not converting special characters to unicode style

I ❤ Rails (just kidding.)

Rails 3 had a hilarious habit of damaging UTF-8 in JSON: to_json escaped every non-ASCII character as a \uXXXX sequence. Rails 4, thanks to DHH, is free of this drawback.

So, if one wants a time machine back to the Rails 3 behavior, the simplest way is to monkeypatch ::ActiveSupport::JSON::Encoding.escape:

module ::ActiveSupport::JSON::Encoding
  def self.escape(string)
    if string.respond_to?(:force_encoding)
      string = string.encode(::Encoding::UTF_8, :undef => :replace)
                     .force_encoding(::Encoding::BINARY)
    end
    json = string.
      gsub(escape_regex) { |s| ESCAPED_CHARS[s] }.
      gsub(/([\xC0-\xDF][\x80-\xBF]|
             [\xE0-\xEF][\x80-\xBF]{2}|
             [\xF0-\xF7][\x80-\xBF]{3})+/nx) { |s|
        s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/n, '\\\\u\&')
      }
    json = %("#{json}")
    json.force_encoding(::Encoding::UTF_8) if json.respond_to?(:force_encoding)
    json
  end
end

A more robust solution would be to corrupt the result after to_json has produced it:

class String
  # Expects a string that is already JSON (e.g. the output of to_json),
  # so it is not wrapped in another pair of quotes here.
  def rails3_style
    string = encode(::Encoding::UTF_8, :undef => :replace).
      force_encoding(::Encoding::BINARY)
    # Replace every multi-byte UTF-8 sequence with \uXXXX escapes.
    json = string.
      gsub(/([\xC0-\xDF][\x80-\xBF]|
             [\xE0-\xEF][\x80-\xBF]{2}|
             [\xF0-\xF7][\x80-\xBF]{3})+/nx) { |s|
        s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/n, '\\\\u\&')
      }
    json.force_encoding(::Encoding::UTF_8) if json.respond_to?(:force_encoding)
    json
  end
end

puts "“".to_json.rails3_style
#⇒ "\u201c"

I can hardly understand why anybody would want to do this on purpose, but the solution is here.
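For what it's worth, the same \uXXXX escaping can be sketched without touching String or ActiveSupport. The helper name escape_non_ascii below is hypothetical, not part of Rails or the json library. It assumes its input is valid UTF-8 that is already JSON, and it goes through UTF-16 so that characters outside the Basic Multilingual Plane come out as surrogate pairs:

```ruby
# encoding: utf-8
require 'json'

# Hypothetical helper (not a Rails or json API): replaces every non-ASCII
# character in an already-generated JSON string with \uXXXX escapes.
def escape_non_ascii(json)
  json.gsub(/[^\x00-\x7F]/) do |ch|
    # Encode the character as UTF-16BE and emit one \uXXXX per 16-bit unit,
    # so characters above U+FFFF become a surrogate pair.
    ch.encode(Encoding::UTF_16BE).unpack("n*").map { |unit| format('\u%04x', unit) }.join
  end
end

puts escape_non_ascii("“".to_json)
```

Because it only rewrites non-ASCII characters, any escaping that to_json already did (quotes, backslashes, control characters) is left alone.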

Why doesn't this Unicode gsub substitution work in Ruby?

"ich bin doch nicht blöd, mann!".gsub("ä","ae").gsub("ö","oe").gsub("ü","ue")

Should do the trick
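If the list of replacements grows, chaining gsub calls gets unwieldy. A minimal sketch of an alternative, assuming Ruby 1.9 or newer and a UTF-8 source file: gsub also accepts a hash as its second argument and looks each match up in it. The UMLAUTS constant and the method name are made up for this example:

```ruby
# encoding: utf-8

# Hypothetical mapping for this example; extend it as needed.
UMLAUTS = { "ä" => "ae", "ö" => "oe", "ü" => "ue" }.freeze

# Replaces each matched umlaut with its entry from the hash in a single pass.
def transliterate_umlauts(text)
  text.gsub(/[äöü]/, UMLAUTS)
end

puts transliterate_umlauts("ich bin doch nicht blöd, mann!")
# prints: ich bin doch nicht bloed, mann!
```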

How can I input multibyte characters in rails console (or irb)?

The solution that worked for me was to recompile Readline. Now I can input non-ASCII characters!

Since I am using RVM, I followed this article on how to recompile Readline under RVM: http://rvm.beginrescueend.com/packages/readline/

If you are not using RVM, you can try following this post:
http://henrik.nyh.se/2008/03/irb-readline

By the way, irb in ruby-1.9.2 already supports non-ASCII input.

Is there a way in ruby 1.9 to remove invalid byte sequences from strings?

"€foo\xA0".chars.select(&:valid_encoding?).join
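To show what that one-liner does, here is a small sketch that runs it on the same input. For comparison it also uses String#scrub, which only exists on Ruby 2.1 and newer, so it is not an option on 1.9 itself. The variable names are just for illustration:

```ruby
# encoding: utf-8

dirty = "€foo\xA0"   # valid UTF-8 followed by one stray byte

# The 1.9-compatible approach: keep only the characters that are valid.
by_chars = dirty.chars.select(&:valid_encoding?).join

# Assumes Ruby >= 2.1: scrub replaces invalid sequences, here with "".
by_scrub = dirty.scrub("")

puts by_chars                  # prints: €foo
puts by_chars == by_scrub      # prints: true
puts by_chars.valid_encoding?  # prints: true
```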

How to ensure irb accepts emoji input instead of escaping it?

Make sure Ruby is compiled with GNU Readline. When RVM compiles Ruby, it checks whether Readline is installed and, if so, includes it automatically.

You can check your Readline version in irb. Example:

ruby 2.1.3p242 (2014-09-19 revision 47630) [x86_64-linux]
irb(main):001:0> Readline::VERSION
=> "6.3"

So, to copy your solution: install the latest Readline with Homebrew, then recompile Ruby:

brew update; brew uninstall readline; brew install readline; rbenv install

Change Rails Console (IRB) Ruby Version OSX

It seems your rbenv is actually configured correctly. To test it, run which ruby; you should see /Users/USERNAME/.rbenv/shims/ruby. The real problem arises when you run rails console: a rails command ships with OS X, and you probably don't have a shim for it in rbenv.

Try script/rails console from inside the project dir.

String with URL-ENCODING

You can use URI.unescape:

irb(main):003:0> require 'uri'
=> true
irb(main):006:0> URI.unescape("http%3A%2F%2Fmydomain.com%2Fimage%2Fflv%2F1%2F8%2Fa%2Fimage_18060.jpg%3Fe%3D13777194")
=> "http://mydomain.com/image/flv/1/8/a/image_18060.jpg?e=13777194"
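Note that URI.unescape was removed in Ruby 3.0. On newer Rubies, a sketch using URI.decode_www_form_component from the same uri library gives the same result for this input. One difference to keep in mind: it also turns + into a space, which URI.unescape did not do.

```ruby
require 'uri'

encoded = "http%3A%2F%2Fmydomain.com%2Fimage%2Fflv%2F1%2F8%2Fa%2Fimage_18060.jpg%3Fe%3D13777194"

# Decodes %XX sequences (and "+" as a space) into a UTF-8 string.
decoded = URI.decode_www_form_component(encoded)

puts decoded
# prints: http://mydomain.com/image/flv/1/8/a/image_18060.jpg?e=13777194
```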

