Creating Unicode character from its number
Just cast your int to a char. You can convert that to a String using Character.toString():
String s = Character.toString((char)c);
EDIT:
Just remember that the escape sequences in Java source code (the \u bits) are in hex, so if you're trying to reproduce an escape sequence, you'll need something like int c = 0x2202.
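The cast only covers code points up to U+FFFF, because a char is 16 bits wide. As a minimal sketch (the class and method names are illustrative, not part of the answer), the standard Character.toChars call handles any valid code point, including those that need two chars:

```java
class FromCodePoint {
    // Builds a String from any valid Unicode code point.
    // Character.toChars returns one char for code points up to U+FFFF
    // and a surrogate pair (two chars) for larger ones.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int c = 0x2202; // hex value, as it would appear in a source escape
        // For a code point up to U+FFFF both approaches give the same String.
        System.out.println(fromCodePoint(c).equals(Character.toString((char) c)));
        // A code point above U+FFFF occupies two chars in the String.
        System.out.println(fromCodePoint(0x1F604).length());
    }
}
```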
How can I get a Unicode character from a number?
Use wchar_t instead:
for (wchar_t i = 0; i < 500; i++) {
    wcout << i;
}
You can also use char16_t and char32_t if you're using C++11 or newer.
However, you still need a capable terminal and also need to set the correct codepage to get the expected output. On Linux it's quite straightforward, but if you're using (an older) Windows it's much trickier. See Output unicode strings in Windows console app.
How to put Unicode char in Java String?
The UTF-16 encoding of your character U+1F604 is 0xD83D 0xDE04, so it should be:
String s = "\uD83D\uDE04";
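If you would rather not work out the surrogate pair by hand, the following sketch derives it from the code point using Character.highSurrogate and Character.lowSurrogate (available since Java 7). The class and method names are made up for illustration:

```java
class SurrogateDemo {
    // Builds the String for a supplementary code point by computing
    // its UTF-16 surrogate pair explicitly.
    static String viaSurrogates(int codePoint) {
        char high = Character.highSurrogate(codePoint);
        char low = Character.lowSurrogate(codePoint);
        return "" + high + low;
    }

    public static void main(String[] args) {
        int smile = 0x1F604;
        // Print the two UTF-16 code units in hex.
        System.out.printf("%04X %04X%n",
                (int) Character.highSurrogate(smile),
                (int) Character.lowSurrogate(smile));
        // Compare against the hand-written escape sequence.
        System.out.println(viaSurrogates(smile).equals("\uD83D\uDE04"));
    }
}
```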
How to generate all possible unicode characters?
There may be easier ways to do this, but here goes. The Unicode package contains everything you need.
First we can get a list of unicode scripts and the block ranges:
library(Unicode)
uranges <- u_scripts()
Check what we've got:
head(uranges, 3)
$Adlam
[1] U+1E900..U+1E943 U+1E944..U+1E94A U+1E94B U+1E950..U+1E959 U+1E95E..U+1E95F
$Ahom
[1] U+11700..U+1171A U+1171D..U+1171F U+11720..U+11721 U+11722..U+11725 U+11726 U+11727..U+1172B U+11730..U+11739 U+1173A..U+1173B U+1173C..U+1173E U+1173F
[11] U+11740..U+11746
$Anatolian_Hieroglyphs
[1] U+14400..U+14646
Next we can convert the ranges into their sequences.
expand_uranges <- lapply(uranges, as.u_char_seq)
To get a single vector of all characters we can unlist it. This won't be easy to work with so really it would be better to keep them as a list:
all_unicode_chars <- unlist(expand_uranges)
# The Wikipedia page linked states there are 144,697 characters
length(all_unicode_chars)
[1] 144762
So that seems to be all of them, and the page needs updating. They are stored as integers, so to print them (assuming the glyph is supported) we can do the following, for example to print Japanese katakana:
intToUtf8(expand_uranges$Katakana[[1]])
[1] "ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヷヸヹヺ"
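For comparison, here is a rough Java sketch of the same idea: walk a range of integer code points and append each one to a string. The range U+30A1..U+30FA is read off the first and last characters of the output above, and the class and method names are hypothetical:

```java
class KatakanaDemo {
    // Concatenates every defined code point in [first, last] into one String.
    static String range(int first, int last) {
        StringBuilder sb = new StringBuilder();
        for (int cp = first; cp <= last; cp++) {
            // Skip code points that this JDK's Unicode tables leave unassigned.
            if (Character.isDefined(cp)) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Whether the glyphs render depends on the terminal and its encoding.
        System.out.println(range(0x30A1, 0x30FA));
    }
}
```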
How can I get a Unicode character's code?
Just convert it to int:
char registered = '®';
int code = (int) registered;
In fact there's an implicit conversion from char to int, so you don't have to specify it explicitly as I've done above, but I would do so in this case to make it obvious what you're trying to do.
This will give the UTF-16 code unit, which is the same as the Unicode code point for any character defined in the Basic Multilingual Plane. (And only BMP characters can be represented as char values in Java.) As Andrzej Doyle's answer says, if you want the Unicode code point from an arbitrary string, use Character.codePointAt().
Once you've got the UTF-16 code unit or Unicode code points, both of which are integers, it's up to you what you do with them. If you want a string representation, you need to decide exactly what kind of representation you want. (For example, if you know the value will always be in the BMP, you might want a fixed 4-digit hex representation prefixed with U+, e.g. "U+0020" for space.) That's beyond the scope of this question though, as we don't know what the requirements are.
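As one possible representation, the sketch below reads code points from a string and formats each in the U+XXXX style mentioned above. The format choice and the names CodePointFormat and format are assumptions for illustration, not a requirement of the question:

```java
class CodePointFormat {
    // Formats a code point as "U+" followed by at least four uppercase hex digits.
    static String format(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // A string holding one BMP character and one supplementary character.
        String s = "\u00AE" + new String(Character.toChars(0x1F604));
        // codePoints() yields whole code points, joining surrogate pairs.
        s.codePoints().forEach(cp -> System.out.println(format(cp)));
        // codePointAt reads the full code point starting at a char index.
        System.out.println(format(s.codePointAt(1)));
    }
}
```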
Integer Object (Unicode) to Character object
The problem is that a Character object represents a char; i.e. a number in the range 0 through 0xffff. Unicode code-points range up to U+10FFFF, and many cannot be represented as a single char value.
So this gives you a problem:
- If the code-points that you want to represent are all between U+0000 and U+FFFF, then you can represent them as Character values.
- If any are U+10000 or larger, then it won't work.
So, if you have an int that represents a Unicode code-point, you need to do something like this:
int value = ...
if (Character.isDefined(value)) {
    if (value <= 0xffff) {
        return Character.valueOf((char) value);
    } else {
        // code point not representable as a `Character`
    }
} else {
    // Not a valid code-point at all
}
Note:
- int values that are not valid code points include negative values, values greater than 0x10ffff, and lower and upper surrogate code-units.
- A number of commonly used Unicode code-points are greater than U+10000. For example, the code-points for Emojis! This means that using Character is a bad idea. It would be better to use either a String, a char[] or an Integer.
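Following the suggestion to prefer a String, a minimal sketch might look like this. The names CodePointText and toText are hypothetical, and note that Character.isValidCodePoint only checks the range 0 to 0x10FFFF, so it accepts unassigned and surrogate values:

```java
class CodePointText {
    // Converts an int code point to a String, which can hold any code point,
    // unlike a single Character.
    static String toText(int value) {
        // Range check only: rejects negatives and values above 0x10FFFF.
        if (!Character.isValidCodePoint(value)) {
            throw new IllegalArgumentException("Not a code point: " + value);
        }
        return new StringBuilder().appendCodePoint(value).toString();
    }

    public static void main(String[] args) {
        System.out.println(toText(0x41));               // one char
        System.out.println(toText(0x1F604).length());   // two chars
    }
}
```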
It seems to work so far.
I guess you haven't tried @Shawn's approach with an Emoji yet.
Is there a way around using a downcast?
No.
if (i == 0)
    throw new NullPointerException();
That is just wrong:
- Zero is a valid code-point.
- Even if it wasn't valid, it is NOT a null. So throwing NullPointerException is totally inappropriate.
If you are concerned about the case where i is null, don't worry. Any operation that unboxes i will automatically throw NullPointerException if it is null. Just let it happen ...
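The unboxing behaviour can be checked with a small sketch like the one below (the class and method names are invented for the demonstration):

```java
class UnboxingDemo {
    // Returns true if unboxing a null Integer throws NullPointerException.
    static boolean unboxingNullThrows() {
        Integer i = null;
        try {
            // Assigning to an int unboxes i, which calls i.intValue()
            // on a null reference.
            int unboxed = i;
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingNullThrows());
    }
}
```

So no explicit null check is needed before the conversion; a zero value simply passes through as the valid code point U+0000.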
Java: Convert String \uFFFF into char
char c = "\uFFFF".toCharArray()[0];
The value is directly interpreted as the desired string, and the whole sequence is realized as a single character.
Another way, if you are going to hard-code the value:
char c = '\uFFFF';
Note that \uFFFF is a Unicode noncharacter (it is permanently reserved and never assigned to a character), so try with \u041f, for example.
Read about unicode escapes here
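Both forms above rely on the compiler translating the escape in the source code. If the six characters arrive at run time instead (for example, read from a file), they have to be parsed by hand. A minimal sketch, assuming the input is exactly a backslash, a lowercase u, and four hex digits; EscapeParser and parse are hypothetical names:

```java
class EscapeParser {
    // Parses the six-character text form of an escape (backslash, 'u',
    // four hex digits) into the char it denotes.
    static char parse(String escape) {
        if (escape.length() != 6 || !escape.startsWith("\\u")) {
            throw new IllegalArgumentException("Expected a 6-character escape");
        }
        // The four characters after the prefix are the hex value.
        return (char) Integer.parseInt(escape.substring(2), 16);
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the compiler from translating the escape,
        // so this String really contains six characters.
        String raw = "\\u041f";
        System.out.println(raw.length());
        System.out.println((int) parse(raw) == 0x041F);
    }
}
```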
Get unicode value of a character
You can do it for any Java char using the one liner here:
System.out.println( "\\u" + Integer.toHexString('÷' | 0x10000).substring(1) );
But it's only going to work for the Unicode characters up to Unicode 3.0, which is why I specified that you can do it for any Java char.
Java was designed well before Unicode 3.1 arrived, so Java's char primitive is inadequate to represent Unicode 3.1 and up: there's no longer a "one Unicode character to one Java char" mapping (instead a monstrous hack, surrogate pairs, is used).
So you really have to check your requirements here: do you need to support Java char or any possible Unicode character?
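If the goal is Java-style escapes for a whole string, one option is to emit an escape per char, so that a character outside the 16-bit range comes out as its two surrogate escapes. This is a sketch under that assumption, with made-up names UnicodeEscaper and escape:

```java
class UnicodeEscaper {
    // Renders every UTF-16 code unit of the text as a backslash, 'u',
    // and four lowercase hex digits.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char unit : text.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A division sign: one char, one escape.
        System.out.println(escape("\u00f7"));
        // A code point above U+FFFF: two chars, two escapes.
        System.out.println(escape(new String(Character.toChars(0x1F604))));
    }
}
```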