Ruby Output Unicode Character

Ruby Output Unicode Character

In Ruby 1.9.x+

Use String#encode:

checkmark = "\u2713"
puts checkmark.encode('utf-8')

prints


In Ruby 1.8.7

puts '\u2713'.gsub(/\\u[\da-f]{4}/i) { |m| [m[-4..-1].to_i(16)].pack('U') }

How to print unicode charaters in Command Prompt with Ruby

You need to enclose the unicode character in { and } if the number of hex digits isn't 4 (credit : /u/Stefan) e.g.:

heart = "\u2665"
package = "\u{1F4E6}"
fire_and_one_hundred = "\u{1F525 1F4AF}"
puts heart
puts package
puts fire_and_one_hundred

Alternatively you could also just put the unicode character directly in your source, which is quite easy at least on macOS with the Emoji & Symbols menu accessed by Ctrl + Command + Space by default (a similar menu can be accessed on Windows 10 by Win + ; ) in most applications including your text editor/Ruby IDE most likely:

macOS Emoji Menu

heart = "♥"
package = "br>fire_and_one_hundred = "br>puts heart
puts package
puts fire_and_one_hundred

Output:


br>br>

How it looks in the macOS terminal:

How it looks in the macOS terminal

Ruby display unicode character

I figured it out:

  • For cmd and Powershell, I have to change the font (I changed to Consolas)
  • In Eclipse, I have to change the encoding: go to Run/Run configurations, select the Common tab, change the Encoding to Other: UTF-8

How to print unicode character U-1F4A9 'pile of poo' emoji

Unicode code points with more than four hex digits must be enclosed in curly braces:

puts "\u{1f4a9}"
# => br>

This is pretty poorly documented, so don't feel bad about not figuring it out. A nice thing about the curly brace syntax is that you can embed multiple code points separated by spaces:

puts "\u{1f4a9 1f60e}"
# => br>

Of course, since Ruby 2.0, UTF-8 has been the default encoding, so you can always just put the emoji directly into your source:

puts "br># => br>

Unicode & Ruby - Expected behavior?

Ruby may show a string with various backslash encodings for various reasons, one of which is irregular characters. For example:

"
"
# => "\n"
'"'
# => "\""

This doesn't mean the string contains an actual backslash, but rather that the version shown by inspect contains one. This is a long tradition dating back at least to the era of C in the 1970s where \n and such have been understood to mean "newline character".

In the case of emoji you might find that some are displayed and others aren't. This may be an interaction between the version of Ruby you're using and the terminal settings. As emoji are constantly being introduced you might find older ones display properly but Ruby's not confident enough with new ones to render them as-is, perhaps concerned that's an invalid Unicode character. Rather than showing something blank or the infamous question mark character, it shows the literal code for the character.

ruby: unicode character decimal value to \uXXXX conversion? .ord method not working

mu is too short's answer is cool.

But, the simplest answer is:

'好'.ord.to_s(16)     # => '597d'

How do I escape a Unicode string with Ruby?

In Ruby 1.8.x, String#inspect may be what you are looking for, e.g.

>> multi_byte_str = "hello\330\271!"
=> "hello\330\271!"

>> multi_byte_str.inspect
=> "\"hello\\330\\271!\""

>> puts multi_byte_str.inspect
"hello\330\271!"
=> nil

In Ruby 1.9 if you want multi-byte characters to have their component bytes escaped, you might want to say something like:

>> multi_byte_str.bytes.to_a.map(&:chr).join.inspect
=> "\"hello\\xD8\\xB9!\""

In both Ruby 1.8 and 1.9 if you are instead interested in the (escaped) unicode code points, you could do this (though it escapes printable stuff too):

>> multi_byte_str.unpack('U*').map{ |i| "\\u" + i.to_s(16).rjust(4, '0') }.join
=> "\\u0068\\u0065\\u006c\\u006c\\u006f\\u0639\\u0021"


Related Topics



Leave a reply



Submit