How to Convert a String with Unicode Encoding to a String of Letters

How to convert a string with Unicode encoding to a string of letters

Technically, writing:

String myString = "\u0048\u0065\u006C\u006C\u006F World";

automatically converts it to "Hello World" at compile time, so I assume you are reading the string from a file. To convert it to "Hello", you'll have to split the text into its separate escape sequences (take each \uXXXX and keep just XXXX), then call Integer.parseInt(XXXX, 16) to get the numeric value of the hex digits, and then cast that to char to get the actual character.

Edit: Some code to accomplish this:

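// Assumes myString holds the escapes as literal text (backslash, 'u', four hex
// digits), e.g. as read from a file, not as a compiled string literal.
String str = myString.split(" ")[0];   // keep only the escaped first word
str = str.replace("\\", "");           // drop the backslashes
String[] arr = str.split("u");         // arr[0] is empty; the rest are hex digits
StringBuilder text = new StringBuilder();
for (int i = 1; i < arr.length; i++) {
    int hexVal = Integer.parseInt(arr[i], 16);
    text.append((char) hexVal);
}
// text.toString() is now "Hello"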

Python 3: How to convert a string with Unicode encoding to a string of letters

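This should work for you, provided the string contains only ASCII characters apart from the escape sequences. The unicode-escape codec decodes bytes as Latin-1, so any non-ASCII character already in the string would come out garbled.

string = '\\u00A9 PNG'
print(string.encode('utf8').decode('unicode-escape'))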

Output:

© PNG
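
When the input may already contain non-ASCII characters next to the escapes, one alternative is to replace only the \uXXXX sequences with a regular expression. The following is a minimal sketch, not a definitive implementation; the helper name unescape_u is made up for illustration, and it assumes every escape has exactly four hex digits.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def unescape_u(text):
    """Replace literal backslash-u escapes, leaving other characters alone."""
    replaced = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Characters above U+FFFF are written as two surrogate escapes; a UTF-16
    # round trip joins each pair into a single code point.
    return replaced.encode('utf-16', 'surrogatepass').decode('utf-16')

print(unescape_u('\\u00A9 PNG'))
print(unescape_u('Málaga \\u00A9'))
```

The second call keeps the á intact, whereas the encode/decode round trip would turn it into two Latin-1 characters. An input containing an unpaired surrogate escape would make the final decode raise an error.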

Convert a Unicode string to a string in Python (containing extra symbols)

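See unicodedata.normalize. NFKD normalization decomposes each accented letter into a base letter plus combining marks; encoding to ASCII with 'ignore' then drops everything that is not ASCII, including characters such as ß that have no decomposition. The output below is from Python 2; Python 3 shows the same text as a bytes literal.

import unicodedata
title = u"Klüft skräms inför på fédéral électoral große"
unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
'Kluft skrams infor pa federal electoral groe'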
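
For Python 3, where a str result is usually wanted rather than bytes, the same idea can be sketched as follows. The function name strip_accents is hypothetical and chosen only for this example.

```python
import unicodedata

def strip_accents(text):
    """Return an ASCII-only approximation of text as a str."""
    decomposed = unicodedata.normalize('NFKD', text)
    # Combining marks and any other non-ASCII characters are dropped here.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

title = "Klüft skräms inför på fédéral électoral große"
print(strip_accents(title))
```

Note that this is lossy: ß disappears entirely, so "große" becomes "groe".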

How to convert string with Unicode literal characters in it to a Unicode string

There are a number of ways to do this; the following might work for you.

Disclaimer: this assumes the string is stored in your database as Universidad de M\u00e1laga

using System;
using System.Text.RegularExpressions;

var test1 = "Universidad de M\\u00e1laga";
var test2 = Regex.Unescape(test1);
Console.WriteLine(test1);
Console.WriteLine(test2);

Output

Universidad de M\u00e1laga
Universidad de Málaga

Note: This may point to an overall structural or design problem with the whole setup, though who knows what APIs give you back.


How to convert a string with Unicode characters into a string with UTF-8 hex characters?

For URL-encoding, you want urllib.parse.quote:

import urllib.parse
s = "Marta Ga\u0142szewska"
q = urllib.parse.quote(s)

=> 'Marta%20Ga%C5%82szewska'

If you prefer + to %20, you can use quote_plus:

q = urllib.parse.quote_plus(s)

=> 'Marta+Ga%C5%82szewska'
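
To see where the hex digits come from and how to reverse the encoding, here is a short sketch using only the standard library. The variable names are arbitrary.

```python
import urllib.parse

s = "Marta Ga\u0142szewska"

# For non-ASCII characters, the percent-escapes are their UTF-8 bytes in hex
# (the letter U+0142 is encoded as the two bytes C5 82).
utf8_hex = s.encode("utf-8").hex()

quoted = urllib.parse.quote(s)
plus_quoted = urllib.parse.quote_plus(s)

# unquote reverses quote; unquote_plus also turns '+' back into a space.
restored = urllib.parse.unquote(quoted)
restored_plus = urllib.parse.unquote_plus(plus_quoted)
```

Both restored and restored_plus should compare equal to the original s.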

How to convert string with Unicode characters to normal String?

As far as I know, this is not a standard encoding, at least not one of the UTF-* or ISO-* families.

You need to decode it yourself, e.g.:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String decode(String encoded) {
    // "%u" followed by 4 hex digits; capture the digits
    Pattern p = Pattern.compile("%u([0-9a-f]{4})", Pattern.CASE_INSENSITIVE);

    Matcher m = p.matcher(encoded);
    StringBuffer decoded = new StringBuffer(encoded.length());

    // replace every occurrence (and copy the parts between)
    while (m.find()) {
        String ch = Character.toString((char) Integer.parseInt(m.group(1), 16));
        // quoteReplacement keeps a decoded '$' or backslash from being read as a group reference
        m.appendReplacement(decoded, Matcher.quoteReplacement(ch));
    }

    m.appendTail(decoded);
    return decoded.toString();
}

This gives:

System.out.println(decode("%u0419%u043E"));
Йо
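
For comparison, the same decoding can be sketched in Python with re.sub. The function name decode_percent_u is invented for this example, and the sketch assumes the input uses only the four-digit %uXXXX form shown above.

```python
import re

# "%u" followed by exactly four hex digits; the digits are captured.
_PERCENT_U = re.compile(r'%u([0-9a-fA-F]{4})')

def decode_percent_u(encoded):
    """Replace each %uXXXX sequence with the character it names."""
    # A function replacement is inserted literally, so no escaping is needed.
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), encoded)

decoded = decode_percent_u("%u0419%u043E")
```

Here decoded holds the same two Cyrillic letters as the Java output above.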

Unicode as String without conversion Python

Here's how to do it the hard way.

ascii_printable = set(unichr(i) for i in range(0x20, 0x7f))

def convert(ch):
    if ch in ascii_printable:
        return ch
    ix = ord(ch)
    if ix < 0x100:
        return '\\x%02x' % ix
    elif ix < 0x10000:
        return '\\u%04x' % ix
    return '\\U%08x' % ix

output = ''.join(convert(ch) for ch in input)

Here input is the string to convert. For Python 3, use chr instead of unichr.
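
In Python 3 the built-in unicode_escape codec produces much the same result with less code. The sketch below is not an exact equivalent of convert: the codec also doubles backslashes and writes newlines, tabs and carriage returns in their short escape forms. The function name escape_non_ascii is hypothetical.

```python
def escape_non_ascii(text):
    """Return text with non-ASCII characters written as escape sequences."""
    # unicode_escape yields ASCII-only bytes; decode them back into a str.
    return text.encode('unicode_escape').decode('ascii')

escaped = escape_non_ascii('Ga\u0142szewska')
```

After this runs, escaped contains a literal backslash followed by u0142 in place of the original letter.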

Python: convert strings containing unicode code point back into normal characters

Can this string, and others like it, be converted back to their original strings with Unicode characters?

Yes. Let the content of file.txt be

\u9001\u5206200000

then

with open("file.txt", "rb") as f:
    content = f.read()
text = content.decode("unicode_escape")
print(text)

Output:

送分200000

To learn more, read the Text Encodings section of the documentation for the built-in codecs module.


