Convert unicode codepoint to string character in Ruby
How about:
# Using pack
puts ["2B71F".hex].pack("U")
# Using chr
puts (0x2B71F).chr(Encoding::UTF_8)
In Ruby 1.9+ you can also do:
puts "\u{2B71F}"
I.e. the \u{} escape sequence can be used to decode Unicode codepoints.
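As a quick check, the three approaches above should all yield the same string. A minimal sketch (the variable names are only illustrative) that compares them and goes back from the character to its codepoint with String#ord:

```ruby
# U+2B71F is a CJK ideograph outside the Basic Multilingual Plane.
codepoint = 0x2B71F

via_pack   = [codepoint].pack("U")           # Array#pack with the "U" (UTF-8) directive
via_chr    = codepoint.chr(Encoding::UTF_8)  # Integer#chr with an explicit encoding
via_escape = "\u{2B71F}"                     # literal escape, resolved when the source is parsed

# Reverse direction: character back to its codepoint.
round_trip = via_chr.ord

puts via_pack == via_chr && via_chr == via_escape
puts format("U+%04X", round_trip)
```

Note that the \u{} form only works for codepoints written directly in the source; for a value known only at run time, pack or chr is needed.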
Use Ruby to generate hex codepoints for Unicode values
Use String#rjust:
[97, 127016].map { |i| "U+" << i.to_s(16).upcase.rjust(4, '0') }
#⇒ ["U+0061", "U+1F028"]
For other operations:
"U+0061"[/(?<=\AU\+).*/].to_i(16)
#⇒ 97
"U+0061"[/(?<=\AU\+).*/].prepend('0x')
#⇒ "0x0061"
NB: 0x61 can live as a string only, since 0x61 and 97 are the same value internally, both represented by 97.
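The two directions can be wrapped in a pair of small helpers. This is only a sketch: format_codepoint and parse_codepoint are hypothetical names, not part of any library, and the parser assumes the input is exactly "U+" followed by four to six hex digits.

```ruby
# Hypothetical helpers, not part of any library.

# Format an Integer codepoint as "U+XXXX" (at least four hex digits).
def format_codepoint(codepoint)
  format("U+%04X", codepoint)
end

# Parse "U+XXXX" back into an Integer; raises ArgumentError on malformed input.
def parse_codepoint(notation)
  match = /\AU\+(\h{4,6})\z/.match(notation)
  raise ArgumentError, "not a U+ notation: #{notation.inspect}" unless match
  match[1].to_i(16)
end

p [97, 127016].map { |i| format_codepoint(i) }
p parse_codepoint("U+0061")
```

Using format with %04X gives the same zero padding as rjust(4, '0') without mutating a string literal.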
convert unicode into character with ruby
[22269].pack('U*') #=> "国" or "\345\233\275"
Edit: Works in 1.8.6+ (verified in 1.8.6, 1.8.7, and 1.9.2). In 1.8.x you get a three-byte string representing the single Unicode character, but using puts on that causes the correct Chinese character to appear in the terminal.
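To see the relationship between the single character and its three bytes, here is a small sketch, assuming Ruby 1.9 or later (where strings carry an encoding):

```ruby
char = [22269].pack("U*")

p char.encoding      # the "U" directive yields a UTF-8 string
p char.bytes         # the three UTF-8 bytes, i.e. "\345\233\275" in octal
p char.length        # length is counted in characters, not bytes
p char.unpack("U*")  # back to the list of codepoints
```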
How to convert a unicode string to its symbol characters in Ruby?
In Ruby 1.9:
"\u041D\u043E\u0432\u0438\u043D\u0438".encode("UTF-8")
=> "Новини"
Convert unicode to characters in a file using Ruby
I tried to reproduce your problem and got the same result you describe with your solution.
I noticed that \u003B (for example) is the Unicode escape for the semicolon character. So I scanned the string for each \uXXXX escape with the regex /\\u(.{4})/, whose captured four characters are the hexadecimal digits of a Unicode code point. I then used gsub! and Array#pack to convert and substitute each of the escaped characters.
[$1.to_i(16)].pack('U') # => "\n", "\n", "<", "&", "\n", "=" ...etc.
And finally wrote the result to a file. So, my final approach looks like this:
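code = File.read('code.txt')
# Replace every \uXXXX escape with the character it names.
code.gsub!(/\\u(.{4})/) do
  [$1.to_i(16)].pack('U')
end
# Use gsub rather than gsub! here: gsub! returns nil when nothing is replaced,
# which would write an empty line instead of the code.
File.open('solution.cpp', 'w') { |f| f.puts code.gsub(/\A"|"\Z/, '') }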
Also note that I used gsub again at the end to find a leading or trailing quote and replace it with an empty string when writing to the file.
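One caveat: if the escapes follow the JSON/JavaScript convention (an assumption about the input, not something the question states), characters above U+FFFF arrive as two \uXXXX escapes forming a UTF-16 surrogate pair, and converting each half separately does not produce a valid character. A minimal sketch that combines such pairs, with decode_u_escapes as a hypothetical helper name:

```ruby
# Matches either a high+low surrogate pair or a single \uXXXX escape.
SURROGATE_PAIR_OR_SINGLE = /\\u(d[89ab]\h\h)\\u(d[c-f]\h\h)|\\u(\h{4})/i

# Hypothetical helper: decode \uXXXX escapes, combining surrogate pairs.
def decode_u_escapes(text)
  text.gsub(SURROGATE_PAIR_OR_SINGLE) do
    high, low, single = $1, $2, $3
    if single
      [single.hex].pack("U")
    else
      # Combine the two UTF-16 surrogate halves into one codepoint.
      codepoint = 0x10000 + ((high.hex - 0xD800) << 10) + (low.hex - 0xDC00)
      [codepoint].pack("U")
    end
  end
end

puts decode_u_escapes('int x = 1\u003B')
puts decode_u_escapes('\uD83D\uDE00')
```

The pair alternative is listed first in the regex so that a high surrogate followed by a low surrogate is consumed as one unit before the single-escape branch can match.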
Replacing %uXXXX to the corresponding Unicode codepoint in Ruby
Try this code:
string.gsub(/%u([0-9A-F]{4})/i){[$1.hex].pack("U")}
In the comments, cremno has a better, faster solution:
string.gsub(/%u([0-9A-F]{4})/i){$1.hex.chr(Encoding::UTF_8)}
In the comments, bobince adds important restrictions, worth reading in full.
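For illustration, a small sketch that runs both variants on a made-up sample input (the string below is only an example) and checks that they agree:

```ruby
string = "caf%u00E9 %u56FD"

with_pack = string.gsub(/%u([0-9A-F]{4})/i) { [$1.hex].pack("U") }
with_chr  = string.gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }

puts with_pack
puts with_pack == with_chr
```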
converting Unicode code point numbers to Unicode characters
What you may want to look at is the raw_unicode_escape encoding.
>>> len(b'\\uffff')
6
>>> b'\\uffff'.decode('raw_unicode_escape')
'\uffff'
>>> len(b'\\uffff'.decode('raw_unicode_escape'))
1
So, the function would be:
def ParseString2Unicode(sInString):
    try:
        decoded = sInString.encode('utf-8')
        return decoded.decode('raw_unicode_escape')
    except UnicodeError:
        return sInString
This, however, also matches other Unicode escape sequences, like \Uxxxxxxxx. If you just want to match \uxxxx, use a regex, like so:
import re

escape_sequence_re = re.compile(r'\\u[0-9a-fA-F]{4}')

def _escape_sequence_to_char(match):
    # match[0] is the whole escape; drop the leading "\u" and parse the hex digits.
    return chr(int(match[0][2:], 16))

def ParseString2Unicode(sInString):
    return re.sub(escape_sequence_re, _escape_sequence_to_char, sInString)