How to convert a string with Unicode encoding to a string of letters
Technically, writing:
String myString = "\u0048\u0065\u006C\u006C\u006F World";
in Java source already gives you "Hello World", because the compiler translates the \uXXXX escapes. So I assume you are reading the string in from some file. To convert it to "Hello" you'll have to parse the text into the separate Unicode escapes: take each \uXXXX, keep just the XXXX, call Integer.parseInt(XXXX, 16) to get the numeric value of the hex digits, and then cast that to char to get the actual character.
Edit: Some code to accomplish this:
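// myString holds the raw text read from the file, backslashes included
String str = myString.split(" ")[0];   // first token, i.e. the run of \uXXXX escapes
str = str.replace("\\", "");           // drop the backslashes: u0048u0065...
String[] arr = str.split("u");         // arr[0] is empty, the rest are hex digits
StringBuilder text = new StringBuilder();
for (int i = 1; i < arr.length; i++) {
    int hexVal = Integer.parseInt(arr[i], 16);
    text.append((char) hexVal);
}
// text.toString() is now "Hello"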
python3: How to convert a string with Unicode encoding to a string of letters
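This should work for you, as long as everything in the string other than the escapes is plain ASCII (the unicode-escape codec reads any other byte as Latin-1).
string = '\\u00A9 PNG'
print(string.encode('utf8').decode('unicode-escape'))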
output:
© PNG
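If the string may already contain non-ASCII characters next to the escapes, a regex substitution avoids the byte round trip altogether. This is only a minimal sketch: the helper name unescape_u is made up for illustration, it handles four-digit \uXXXX escapes only, and it does not join surrogate pairs into a single character.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def unescape_u(text):
    """Replace each literal \\uXXXX escape with the character it names.

    Hypothetical helper for illustration; all other characters are left as-is.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

print(unescape_u('\\u00A9 PNG'))
print(unescape_u('caf\u00e9 \\u00A9'))
```

Because the text never leaves str form, characters such as é that are already decoded pass through unchanged.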
Convert a Unicode string to a string in Python (containing extra symbols)
See unicodedata.normalize
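import unicodedata
title = u"Klüft skräms inför på fédéral électoral große"
unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
'Kluft skrams infor pa federal electoral groe'
That is the Python 2 result; in Python 3 the same call returns the bytes object b'Kluft skrams infor pa federal electoral groe'. Note that ß has no decomposition, so it is dropped rather than replaced.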
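For Python 3, the same approach can be wrapped so that it returns a str instead of bytes. A small sketch, assuming you are happy to silently drop any character that has no ASCII decomposition; the function name to_ascii is hypothetical.

```python
import unicodedata

def to_ascii(text):
    """Strip accents via NFKD decomposition and drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    # encode(..., 'ignore') discards the combining marks and characters like ß
    return decomposed.encode('ascii', 'ignore').decode('ascii')

title = "Klüft skräms inför på fédéral électoral große"
print(to_ascii(title))
```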
How to convert string with Unicode literal characters in it to a Unicode string
There are a number of ways to do this; this one might work for you.
Disclaimer: this assumes the string is stored in your db with a literal backslash, like this: Universidad de M\u00e1laga
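using System;
using System.Text.RegularExpressions;

// test1 contains a literal backslash followed by u00e1
var test1 = "Universidad de M\\u00e1laga";
var test2 = Regex.Unescape(test1);
Console.WriteLine(test1);
Console.WriteLine(test2);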
Output
Universidad de M\u00e1laga
Universidad de Málaga
Note: This may be pointing to an overall structural or design problem with the entire situation. Then again, who knows what some APIs give you back.
How to convert a string with unicode characters in a string with utf-8 hex characters?
For URL-encoding, you want urllib.parse.quote:
import urllib.parse
s = "Marta Ga\u0142szewska"
q = urllib.parse.quote(s)
=> 'Marta%20Ga%C5%82szewska'
If you prefer + to %20, you can use quote_plus:
q = urllib.parse.quote_plus(s)
=> 'Marta+Ga%C5%82szewska'
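To check the result, or to go back the other way, urllib.parse also provides unquote and unquote_plus. A short round-trip sketch using the same sample string:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

s = "Marta Ga\u0142szewska"

q = quote(s)            # space becomes %20, non-ASCII becomes UTF-8 percent escapes
qp = quote_plus(s)      # same, but space becomes +

print(q)
print(qp)

# Decoding restores the original string in both cases.
print(unquote(q) == s)
print(unquote_plus(qp) == s)
```

Use unquote_plus for the + form; plain unquote leaves a + as it is.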
How to convert string with Unicode characters to normal String?
As far as I know this is not a standard encoding, at least not one of the UTF-* or ISO-*.
You need to decode it yourself, e.g.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String decode(String encoded) {
    // "%u" followed by 4 hex digits, capture the digits
    Pattern p = Pattern.compile("%u([0-9a-f]{4})", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(encoded);
    StringBuffer decoded = new StringBuffer(encoded.length());
    // replace every occurrence (and copy the parts between)
    while (m.find()) {
        char c = (char) Integer.parseInt(m.group(1), 16);
        // quoteReplacement keeps a decoded '$' or '\' from being read as a group reference
        m.appendReplacement(decoded, Matcher.quoteReplacement(String.valueOf(c)));
    }
    m.appendTail(decoded);
    return decoded.toString();
}
This gives:
System.out.println(decode("%u0419%u043E"));
Йо
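For comparison, the same idea in Python is a single regex substitution. This is just a sketch of the technique described above, not a standard decoder; the name decode_percent_u is made up, and it assumes every escape is exactly %u followed by four hex digits.

```python
import re

# "%u" followed by 4 hex digits, capture the digits
_PERCENT_U = re.compile(r'%u([0-9a-fA-F]{4})')

def decode_percent_u(encoded):
    """Replace each %uXXXX with its character and copy everything else unchanged."""
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), encoded)

print(decode_percent_u('%u0419%u043E'))
```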
Unicode as String without conversion Python
Here's how to do it the hard way.
ascii_printable = set(unichr(i) for i in range(0x20, 0x7f))

def convert(ch):
    if ch in ascii_printable:
        return ch
    ix = ord(ch)
    if ix < 0x100:
        return '\\x%02x' % ix
    elif ix < 0x10000:
        return '\\u%04x' % ix
    return '\\U%08x' % ix

output = ''.join(convert(ch) for ch in input)
For Python 3 use chr instead of unichr.
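Put together for Python 3, that might look like the sketch below. The wrapper name escape_non_ascii and the sample string are only for illustration.

```python
# Printable ASCII: space (0x20) through tilde (0x7e)
ASCII_PRINTABLE = {chr(i) for i in range(0x20, 0x7F)}

def convert(ch):
    """Return ch unchanged if printable ASCII, otherwise a backslash escape."""
    if ch in ASCII_PRINTABLE:
        return ch
    ix = ord(ch)
    if ix < 0x100:
        return '\\x%02x' % ix
    elif ix < 0x10000:
        return '\\u%04x' % ix
    return '\\U%08x' % ix

def escape_non_ascii(text):
    """Hypothetical wrapper: apply convert() to every character of text."""
    return ''.join(convert(ch) for ch in text)

print(escape_non_ascii('a\xe9\u9001\U0001F600'))
```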
Python: convert strings containing unicode code point back into normal characters
convert this string and others like it back to their original strings with unicode characters?
Yes, let the content of file.txt be
\u9001\u5206200000
then
with open("file.txt", "rb") as f:
    content = f.read()
text = content.decode("unicode_escape")
print(text)
output
送分200000
If you want to know more, read the Text Encodings section of the codecs built-in module docs.
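To try this without creating file.txt by hand, here is a self-contained sketch that writes the sample content to a temporary file first. The helper name read_escaped is an assumption for illustration.

```python
import os
import tempfile

def read_escaped(path):
    """Read a file as bytes and decode its \\uXXXX escapes into characters."""
    with open(path, "rb") as f:
        return f.read().decode("unicode_escape")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    # Write the escapes literally: backslash, u, four hex digits
    with open(path, "wb") as f:
        f.write(b"\\u9001\\u5206200000")
    text = read_escaped(path)

print(text)
```

Reading in binary mode matters here, since unicode_escape decodes bytes, not str.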