How to convert a unicode string to its symbol characters in Ruby?
In Ruby 1.9:
"\u041D\u043E\u0432\u0438\u043D\u0438".encode("UTF-8")
=> "Новини"
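As a rough sketch of why this works: the \u escapes in a double-quoted literal are resolved when the source is parsed, so the string already holds the Cyrillic characters, and encode("UTF-8") returns an equal string. The variable name s is only for illustration.

```ruby
# A string literal that contains \u escapes is always UTF-8 encoded.
s = "\u041D\u043E\u0432\u0438\u043D\u0438"

puts s.encoding               # UTF-8
puts s.encode("UTF-8") == s   # true, the call changes nothing here
puts s.length                 # 6 characters
puts s.bytesize               # 12 bytes, two per Cyrillic letter
```

So the encode call matters only when the string starts out in a different encoding.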
convert unicode into character with ruby
[22269].pack('U*') #=> "国" or "\345\233\275"
Edit: Works in 1.8.6+ (verified in 1.8.6, 1.8.7, and 1.9.2). In 1.8.x you get a three-byte string representing the single Unicode character, but calling puts on it prints the correct Chinese character in the terminal.
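For completeness, here is a small sketch of the round trip with pack and unpack, assuming Ruby 1.9 or later. The helper names codepoints_to_string and string_to_codepoints are made up for this example.

```ruby
# Hypothetical helpers, for illustration only.
def codepoints_to_string(codepoints)
  codepoints.pack('U*')   # 'U*' packs every integer as one UTF-8 character
end

def string_to_codepoints(str)
  str.unpack('U*')        # the reverse: one integer per character
end

s = codepoints_to_string([22269, 23478])
puts s                       # 国家
p string_to_codepoints(s)    # [22269, 23478]
p s.bytesize                 # 6, three bytes per character
```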
Convert unicode to characters in a file using Ruby
I tried to reproduce your problem and got the same result you described when using your solution.
I noticed that \u003B, for example, is the Unicode escape for the semicolon character. So I scanned the string for each \u escape using the regex /\\u(.{4})/, which captures the four hexadecimal digits that make up the Unicode code point. I then used gsub! and Array#pack to convert each escape and substitute the resulting character.
[$1.to_i(16)].pack('U') # => "\n", "\n", "<", "&", "\n", "=" ...etc.
And finally wrote the result to a file. So, my final approach looks like this:
code = File.read('code.txt')
code.gsub!(/\\u(.{4})/) do |match|
[$1.to_i(16)].pack('U')
end
File.open('solution.cpp', 'w') { |f| f.puts code.gsub(/\A"|"\Z/, '') } # gsub, not gsub!, which returns nil when nothing matches
Also note that I used gsub again at the end to find a leading or trailing quote and replace it with an empty string when writing to the file.
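The same idea can be sketched without touching any files. This variant is my own assumption rather than the code above: it uses \h{4} so that only hexadecimal digits match, and the helper name decode_unicode_escapes is hypothetical. It does not combine UTF-16 surrogate pairs.

```ruby
# Hypothetical helper: replace each literal \uXXXX escape with its character.
def decode_unicode_escapes(text)
  text.gsub(/\\u(\h{4})/) { [$1.hex].pack('U') }
end

# Single quotes keep the backslashes literal, like text read from a file.
escaped = '"int main() {\u000A  return 0\u003B\u000A}"'

# Decode the escapes, then strip the surrounding quotes.
decoded = decode_unicode_escapes(escaped).gsub(/\A"|"\Z/, '')
puts decoded
```

Using the non-mutating gsub here means the result is always a string, even when there is nothing to replace.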
Convert unicode codepoint to string character in Ruby
How about:
# Using pack
puts ["2B71F".hex].pack("U")
# Using chr
puts 0x2B71F.chr(Encoding::UTF_8)
In Ruby 1.9+ you can also do:
puts "\u{2B71F}"
I.e. the \u{} escape sequence can be used to write a character by its Unicode code point.
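A quick sketch to check that the three approaches agree, assuming Ruby 1.9 or later. The variable names are only for illustration.

```ruby
cp = 0x2B71F

from_pack   = [cp].pack('U')
from_chr    = cp.chr(Encoding::UTF_8)
from_escape = "\u{2B71F}"

puts [from_pack, from_chr, from_escape].uniq.size   # 1, all three are equal
puts from_pack.encoding                              # UTF-8
puts from_pack.bytesize                              # 4, outside the Basic Multilingual Plane
puts from_pack.ord.to_s(16)                          # 2b71f
```

Note that chr needs the encoding argument here; without it, a code point above 255 normally raises a RangeError.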
In Ruby, how to convert special characters like ë,à,é,ä all to e,a,e,a?
Starting with Ruby 2.2, there is String#unicode_normalize to normalize Unicode strings. The NFKD form separates a base character from its combining diacritical mark:
'ë'.unicode_normalize(:nfkd).chars
#=> ["e", "̈"]
#     ^    ^
#   char  combining mark
Since the base character is a valid ASCII codepoint and the combining mark is not, this can be used to remove the latter:
'ë,à,é,ä'.unicode_normalize(:nfkd).encode('ASCII', replace: '')
#=> "e,a,e,a"
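If the string may also contain characters that should survive (for example CJK text), a variation is to delete only the combining marks instead of everything outside ASCII. This is a sketch under that assumption, and strip_diacritics is a made-up helper name.

```ruby
# Hypothetical helper: decompose, then drop nonspacing marks (Unicode category Mn).
def strip_diacritics(str)
  str.unicode_normalize(:nfkd).gsub(/\p{Mn}/, '')
end

puts strip_diacritics('ë,à,é,ä')           # e,a,e,a
puts strip_diacritics('Crème brûlée, 好')   # Creme brulee, 好
```

Letters that have no decomposition, such as ø, are left as they are with either approach.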
ruby: unicode character decimal value to \uXXXX conversion? .ord method not working
mu is too short's answer is cool.
But, the simplest answer is:
'好'.ord.to_s(16) # => '597d'
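To get the full \uXXXX form for a whole string, the hex value can be zero-padded with format. A minimal sketch, with to_unicode_escapes as a hypothetical helper name:

```ruby
# Hypothetical helper: render each character as a \uXXXX or \u{XXXXX} escape.
def to_unicode_escapes(str)
  str.each_char.map do |ch|
    cp = ch.ord
    # Code points above U+FFFF do not fit in four digits, so use the brace form.
    cp > 0xFFFF ? format('\u{%X}', cp) : format('\u%04X', cp)
  end.join
end

puts to_unicode_escapes('好')     # \u597D
puts to_unicode_escapes('a好')    # \u0061\u597D
```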
Convert unicode chars in ruby string which already encoded in UTF-8
UTF-8 is an encoding for Unicode characters. You don't have to convert anything; your characters are already encoded in UTF-8. Whether they are displayed as \u0131 or ı depends on the displaying program.
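A small sketch that shows the difference between the string's content and its escaped representation. The variable name dotless_i is only for illustration.

```ruby
dotless_i = "\u0131"      # exactly the same string as the literal "ı"

puts dotless_i.encoding   # UTF-8
p dotless_i.bytes         # [196, 177], the two UTF-8 bytes of U+0131
puts dotless_i.length     # 1 character

# dump returns an ASCII-only, escaped representation of the same string.
puts dotless_i.dump       # "\u0131"

# puts writes the raw bytes; a UTF-8 terminal renders them as the character.
puts dotless_i
```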
Ruby Output Unicode Character
In Ruby 1.9.x+
Use String#encode:
checkmark = "\u2713"
puts checkmark.encode('utf-8')
prints
✓
In Ruby 1.8.7
puts '\u2713'.gsub(/\\u[\da-f]{4}/i) { |m| [m[-4..-1].to_i(16)].pack('U') }
✓
convert unicode to text in ruby
After some research and with help from another forum, I managed to use CSV instead. This is the code that worked for me:
CSV.foreach(filename, row_sep: :auto, col_sep: "\t", encoding: 'UTF-16:UTF-8') do |row|
In the end, CSV suited me better, because this is a tab-delimited file.
Thank you all for your comments anyway!