How to Decode Unicode Escape Sequences Like \u00ed to Proper UTF-8 Encoded Characters

How to decode Unicode escape sequences like \u00ed to proper UTF-8 encoded characters?

Try this:

$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    // Turn the four hex digits into two bytes and treat them as big-endian UCS-2
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}, $str);

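If the escapes are UTF-16 based (C/C++/Java/JSON style), use UTF-16BE as the source encoding. Each match is still converted on its own, so this only covers characters in the Basic Multilingual Plane; a surrogate pair such as \ud835\udcdf is not combined:

$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UTF-16BE');
}, $str);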
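To handle surrogate pairs as well, one option is to match a whole run of consecutive escapes and convert the run in a single call, so that a high and low surrogate reach mb_convert_encoding() together. This is a minimal sketch, not a definitive implementation; the function name decode_unicode_escapes is hypothetical and the mbstring extension is assumed to be loaded:

```php
<?php
// Hypothetical helper: decodes runs of \uXXXX escapes, including surrogate pairs.
function decode_unicode_escapes(string $str): string
{
    return preg_replace_callback(
        // One or more consecutive \uXXXX escapes, matched as a single run
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $match): string {
            // Drop the "\u" prefixes so only the hex digits remain
            $hex = str_replace('\\u', '', $match[0]);
            // The digits now form a big-endian UTF-16 byte sequence
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $str
    );
}

echo decode_unicode_escapes('caf\u00e9 \ud835\udcdf'), "\n";
```

Input strings are written in single quotes here so that PHP leaves the backslashes untouched.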

How to decode this \ud835\udcdf\ud835\udcea

Here's a way to do it:

<?php

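// Single quotes keep the backslashes literal, so json_decode() sees the raw escapes
$str = '\ud835\udcdf\ud835\udcea\ud835\udcfd\ud835\udcfb\ud835\udcf2\ud835\udcec\ud835\udcf2\ud835\udcea';
// Wrapping the value in double quotes makes it a JSON string; surrogate pairs are combined
echo json_decode('"' . $str . '"');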

?>
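Wrapping the input in quotes only works when it is a valid JSON string body; otherwise json_decode() quietly returns null. A minimal sketch of a wrapper that makes such failures visible, assuming PHP 7.3 or later for JSON_THROW_ON_ERROR (the helper name decode_json_escapes is hypothetical):

```php
<?php
// Hypothetical helper: decodes JSON-style escapes and fails loudly on bad input.
function decode_json_escapes(string $escaped): string
{
    // Throws JsonException if the wrapped value is not valid JSON,
    // for example when it contains an unescaped double quote
    return json_decode('"' . $escaped . '"', false, 512, JSON_THROW_ON_ERROR);
}

echo decode_json_escapes('\ud835\udcdf\ud835\udcea'), "\n";
```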

Decode unicode charmap (most likely non-standard) with PHP

For reference: your source data was UTF-8, and then someone ran something equivalent to utf8_encode() on it twice. That function translates ISO-8859-1 to UTF-8 without regard to what the input actually is, so each pass re-encoded bytes that were already UTF-8.

function unescape_unicode($input) {
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function ($match) {
            return mb_convert_encoding(
                pack('H*', $match[1]),
                'UTF-8',
                'UTF-16BE'
            );
        },
        $input
    );
}

$input = "\u00c3\u0083\u00c2\u00b6";

var_dump(
    bin2hex(
        utf8_decode( // un-mojibake #1
            utf8_decode( // un-mojibake #2
                unescape_unicode($input)
            )
        )
    )
);

Output:

string(4) "c3b6"

Where 0xc3 0xb6 is the UTF-8 representation of ö.

Do NOT put this code into production. Use it only to un-hose data that cannot otherwise be recovered or retrieved properly from the underlying storage. The primary intent of the code above is to illustrate how the data got broken.
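Since utf8_decode() is deprecated as of PHP 8.2, the same two recovery passes can be sketched with mb_convert_encoding() instead. The helper name undo_utf8_encode is hypothetical, the mbstring extension is assumed, and the caveat above still applies: this is for one-off data repair only.

```php
<?php
// Hypothetical helper: reverses one pass of utf8_encode()-style double encoding.
function undo_utf8_encode(string $s): string
{
    // Map each code point U+0000..U+00FF back to the single byte it came from
    return mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
}

// "ö" after two rounds of mojibake: U+00C3 U+0083 U+00C2 U+00B6, stored as UTF-8
$hosed = "\u{00C3}\u{0083}\u{00C2}\u{00B6}";

$fixed = undo_utf8_encode(undo_utf8_encode($hosed));
var_dump(bin2hex($fixed)); // string(4) "c3b6"
```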

This is your new bible: UTF-8 all the way through

How to decode '\u0040' to '@' by PHP

Use json_decode(); it converts \u0040 to @ on its own:

<?php
ini_set('display_errors', 1);
$string = '{
    "id": "674271626114503",
    "email": "duc2521997\u0040gmail.com"
}';
$array = json_decode($string, true); // json_decode() converts \u0040 to @
echo $array["email"];

Output: duc2521997@gmail.com

PHP and accent characters (Ba\u015f\u00e7\u0131l)

My educated guess is that you obtained such values from a JSON string. If that's the case, you should properly decode the full piece of data with json_decode():

<?php

header('Content-Type: text/plain; charset=utf-8');

$data = '"Ba\u015f\u00e7\u0131l"';
var_dump( json_decode($data) );

?>
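The guess that such values come from JSON is plausible because json_encode() escapes non-ASCII characters as \uXXXX by default. A short round-trip sketch (variable names are illustrative; the "\u{...}" syntax needs PHP 7.0 or later):

```php
<?php
// "Başçıl" as UTF-8, written with code point escapes so the file encoding does not matter
$name = "Ba\u{015F}\u{00E7}\u{0131}l";

$escaped   = json_encode($name);                         // non-ASCII becomes \uXXXX
$plain     = json_encode($name, JSON_UNESCAPED_UNICODE); // non-ASCII kept as UTF-8
$roundTrip = json_decode($escaped);                      // escapes decoded back to UTF-8

echo $escaped, "\n";
echo $plain, "\n";
var_dump($roundTrip === $name);
```

If you control the producer of the JSON, JSON_UNESCAPED_UNICODE avoids the escapes in the first place; either form decodes to the same string.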

Unicode character in PHP string

Because JSON directly supports the \uxxxx syntax, the first thing that comes to mind is:

$unicodeChar = '\u1000';
echo json_decode('"'.$unicodeChar.'"');

Another option is mb_convert_encoding() with a numeric HTML entity as input (the HTML-ENTITIES encoding is deprecated as of PHP 8.2):

echo mb_convert_encoding('&#x1000;', 'UTF-8', 'HTML-ENTITIES');

or make use of the direct mapping between UTF-16BE (big endian) and the Unicode code point, which holds for characters in the Basic Multilingual Plane:

echo mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');
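On newer PHP versions there are also more direct ways to produce the character. A brief sketch, assuming PHP 7.0 or later for the "\u{...}" escape and PHP 7.2 or later with mbstring for mb_chr():

```php
<?php
$viaEscape = "\u{1000}";              // code point escape, double-quoted strings only
$viaMbChr  = mb_chr(0x1000, 'UTF-8'); // builds the character from its code point

var_dump($viaEscape === $viaMbChr); // bool(true)
echo bin2hex($viaEscape), "\n";     // e18080
```

Unlike the UTF-16BE trick, both forms also accept code points above U+FFFF.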

