Convert Unicode into Character with Ruby

How to convert a unicode string to its symbol characters in Ruby?

In Ruby 1.9:

"\u041D\u043E\u0432\u0438\u043D\u0438".encode("UTF-8")
=> "Новини"
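
Note that the escapes are only interpreted inside double-quoted literals. A single-quoted literal keeps the backslash and the hex digits as separate characters, which is why text read from a file or an API often still contains literal \uXXXX sequences. A minimal sketch to illustrate the difference (the helper name `literal_escape?` is made up for this example):

```ruby
# Hypothetical helper: true when the string still contains a literal
# backslash-u escape, i.e. the parser never interpreted it.
def literal_escape?(str)
  !!(str =~ /\\u[0-9a-fA-F]{4}/)
end

interpreted = "\u041D"  # double quotes: one character
literal     = '\u041D'  # single quotes: backslash, "u", and four hex digits

puts interpreted.length            # 1
puts literal.length                # 6
puts literal_escape?(interpreted)  # false
puts literal_escape?(literal)      # true
```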

convert unicode into character with ruby

[22269].pack('U*') #=> "国" or "\345\233\275"

Edit: Works in 1.8.6+ (verified in 1.8.6, 1.8.7, and 1.9.2). In 1.8.x you get a three-byte string representing the single Unicode character, but using puts on that causes the correct Chinese character to appear in the terminal.
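
To see the three bytes mentioned above, the packed string can be inspected byte by byte. This is a small sketch for Ruby 1.9 or later, where a pack template starting with U yields a UTF-8 string; the method name `utf8_bytes` is invented for the example.

```ruby
# Hypothetical helper: build a string from code points and return its raw bytes.
def utf8_bytes(*codepoints)
  codepoints.pack('U*').bytes
end

char = [22269].pack('U*')
puts char.encoding   # UTF-8
puts char.length     # 1 (one character)
p utf8_bytes(22269)  # [229, 155, 189], which is "\345\233\275" in octal
```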

Convert unicode to characters in a file using Ruby

I tried to reproduce your problem and got the same result you described when using your solution.

I noticed that \u003B, for example, is the Unicode escape for the semicolon character. So I scanned the string for each \uXXXX escape using the regex /\\u(.{4})/, treating the four characters after \u as the hexadecimal digits of a Unicode code point. I then used gsub! and Array#pack to convert and substitute each escape.

[$1.to_i(16)].pack('U') # => "\n", "\n", "<", "&", "\n", "=" ...etc.

And finally wrote the result to a file. So, my final approach looks like this:

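code = File.read('code.txt')

# Replace each \uXXXX escape with the character it encodes.
code.gsub!(/\\u(.{4})/) do |match|
  [$1.to_i(16)].pack('U')
end

# Use gsub here, not gsub!: gsub! returns nil when nothing was replaced.
File.open('solution.cpp', 'w') { |f| f.puts code.gsub(/\A"|"\Z/, '') }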

Also note that I used gsub again at the end to find a leading or trailing quote and replace it with an empty string when writing to the file.
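
The same idea can be packaged as a small method. This is only a sketch, not the exact code from above: it narrows the pattern to hexadecimal digits so that a stray \u followed by other characters is left alone, and it returns a new string instead of mutating its argument. The name `unescape_unicode` is made up, and the method does not combine UTF-16 surrogate pairs.

```ruby
# Hypothetical helper: replace each \uXXXX escape (four hex digits)
# with the character it encodes.
def unescape_unicode(text)
  text.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack('U') }
end

puts unescape_unicode('int x \u003D 1\u003B')  # int x = 1;
puts unescape_unicode('\uZZZZ stays as is')    # \uZZZZ stays as is
```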

Convert unicode codepoint to string character in Ruby

How about:

# Using pack
puts ["2B71F".hex].pack("U")

# Using chr
puts 0x2B71F.chr(Encoding::UTF_8)

In Ruby 1.9+ you can also do:

puts "\u{2B71F}"

That is, the \u{} escape sequence lets you write a Unicode code point directly in a double-quoted string literal.
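
As a quick check that the three approaches agree, here is a short sketch comparing their results. The helper `char_from_hex` is a hypothetical name introduced for the example.

```ruby
# Hypothetical helper: turn a hex code point string such as "2B71F"
# into the corresponding UTF-8 character.
def char_from_hex(hex)
  hex.hex.chr(Encoding::UTF_8)
end

via_pack   = ["2B71F".hex].pack("U")
via_chr    = char_from_hex("2B71F")
via_escape = "\u{2B71F}"

puts via_pack == via_chr && via_chr == via_escape  # true
puts via_escape.ord.to_s(16)                       # 2b71f
```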

In Ruby, how to convert special characters like ë,à,é,ä all to e,a,e,a?

Starting with Ruby 2.2, there is String#unicode_normalize to normalize Unicode strings. The NFKD form splits an accented character into its base character and a combining mark:

'ë'.unicode_normalize(:nfkd).chars
#=> ["e", "̈"]
#     ^    ^
#  base  combining mark

Since the base character is a valid ASCII code point and the combining mark is not, transcoding to ASCII can be used to remove the latter:

'ë,à,é,ä'.unicode_normalize(:nfkd).encode('ASCII', replace: '')
#=> "e,a,e,a"
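
A variation on the same idea, in case other non-ASCII text has to survive: instead of transcoding to ASCII, the combining marks can be deleted with the \p{Mn} (nonspacing mark) character property. This is a sketch of an alternative, not the method shown above, and `strip_diacritics` is a made-up name.

```ruby
# Hypothetical helper: decompose accented characters, then drop the
# combining marks while leaving all other characters untouched.
def strip_diacritics(text)
  text.unicode_normalize(:nfkd).gsub(/\p{Mn}/, '')
end

puts strip_diacritics('ë,à,é,ä')  # e,a,e,a
puts strip_diacritics('Crème 国')  # Creme 国
```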

ruby: unicode character decimal value to \uXXXX conversion? .ord method not working

mu is too short's answer is cool.

But, the simplest answer is:

'好'.ord.to_s(16)     # => '597d'
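
If the goal is the full \uXXXX text rather than just the hex digits, format can pad the value to four digits. A minimal sketch, assuming that code points above U+FFFF should be written in the \u{...} form; the method name `to_u_escape` is invented here.

```ruby
# Hypothetical helper: render every character of a string as a
# \uXXXX escape (or \u{...} for code points above U+FFFF).
def to_u_escape(text)
  text.each_char.map do |ch|
    cp = ch.ord
    cp > 0xFFFF ? format('\u{%x}', cp) : format('\u%04x', cp)
  end.join
end

puts to_u_escape('好')          # \u597d
puts to_u_escape("\u{2B71F}")  # \u{2b71f}
```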

Convert unicode chars in ruby string which already encoded in UTF-8

UTF-8 is an encoding for Unicode characters. You don't have to convert anything; your characters are already encoded in UTF-8. Whether they are displayed as \u0131 or ı depends on the program displaying them.
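
A short sketch to illustrate the point: the escaped and the literal spelling produce the same string with the same bytes, and only the way it is printed differs. The helper `escaped_form` is a hypothetical wrapper around String#dump, used here just to show the escaped view.

```ruby
# Hypothetical helper: the escaped, ASCII-only representation of a string.
def escaped_form(str)
  str.dump
end

a = "\u0131"
b = "ı"

puts a == b           # true
p a.bytes             # [196, 177]
puts escaped_form(a)  # "\u0131"
```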

Ruby Output Unicode Character

In Ruby 1.9.x+

Use String#encode:

checkmark = "\u2713"
puts checkmark.encode('utf-8')

prints

✓

In Ruby 1.8.7

puts '\u2713'.gsub(/\\u[\da-f]{4}/i) { |m| [m[-4..-1].to_i(16)].pack('U') }

convert unicode to text in ruby

After some research and with help from another forum, I managed to get it working with CSV instead. This was the code that worked for me:

require 'csv'
CSV.foreach(filename, row_sep: :auto, col_sep: "\t", encoding: 'UTF-16:UTF-8') { |row| p row }  # p stands in for the real per-row work

In the end, CSV suited me better, because this is a tab-delimited file.
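
For anyone wondering what the 'UTF-16:UTF-8' part does: it is an external:internal encoding pair, so the file is read as UTF-16 and handed to the program as UTF-8. Below is a rough sketch of the same transcoding without CSV, assuming the file is little-endian UTF-16 with no byte order mark; `read_utf16le_tsv` and the sample data are made up for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: read a UTF-16LE tab-delimited file and return
# its rows as arrays of UTF-8 strings.
def read_utf16le_tsv(path)
  # "rb:UTF-16LE:UTF-8" = binary mode, external UTF-16LE, internal UTF-8
  text = File.open(path, 'rb:UTF-16LE:UTF-8') { |f| f.read }
  text.lines.map { |line| line.chomp.split("\t") }
end

Tempfile.create(['sample', '.tsv']) do |tmp|
  tmp.close
  # Write raw UTF-16LE bytes so there is something to read back.
  File.binwrite(tmp.path, "name\tcity\nZoë\tKöln\n".encode('UTF-16LE'))
  rows = read_utf16le_tsv(tmp.path)
  puts rows.length             # 2
  puts rows.last.first.encoding  # UTF-8
end
```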

Thank you all for your comments anyway!


