How to decode Unicode escape sequences like \u00ed to proper UTF-8 encoded characters?
Try this:
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}, $str);
If the escapes are UTF-16 based (C/C++/Java/JSON style), convert from UTF-16BE instead:
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UTF-16BE');
}, $str);
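Both patterns above convert one \uXXXX unit at a time, so a surrogate pair such as \ud835\udcdf (two escapes that together encode one character above U+FFFF) would not be reassembled. A minimal sketch that converts each run of consecutive escapes as a single UTF-16BE string instead; the function name decode_unicode_escapes is hypothetical, and it assumes the mbstring extension is loaded:

```php
<?php
// Hypothetical helper: decodes runs of \uXXXX escapes, keeping surrogate pairs together.
function decode_unicode_escapes(string $str): string
{
    return preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',            // one or more consecutive \uXXXX escapes
        function (array $match): string {
            // Strip the "\u" prefixes so only the hex digits remain.
            $hex = str_replace('\\u', '', $match[0]);
            // Treat the whole run as one UTF-16BE byte string.
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $str
    );
}

echo bin2hex(decode_unicode_escapes('\ud835\udcdf')), "\n";
```

Because the whole run is handed to mb_convert_encoding() at once, a high surrogate and the low surrogate that follows it are decoded as one code point.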
How to decode this \ud835\udcdf\ud835\udcea
Here's a way to do it:
<?php
$str = '\ud835\udcdf\ud835\udcea\ud835\udcfd\ud835\udcfb\ud835\udcf2\ud835\udcec\ud835\udcf2\ud835\udcea'; // single quotes keep the backslashes literal
echo json_decode('"'.$str.'"');
?>
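The quote-wrapping trick returns null if the input contains an unescaped double quote or is otherwise not a valid JSON string body. A hedged sketch of a wrapper that makes that case explicit; decode_json_escapes is a hypothetical name, not a built-in:

```php
<?php
// Hypothetical helper: decodes JSON-style escapes, or returns null when the
// input cannot be parsed as the body of a JSON string.
function decode_json_escapes(string $escaped): ?string
{
    $decoded = json_decode('"' . $escaped . '"');
    return is_string($decoded) ? $decoded : null;
}

var_dump(decode_json_escapes('\ud835\udcdf') !== null); // valid escapes decode to a string
var_dump(decode_json_escapes('a"b'));                   // stray quote: not a valid JSON string
```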
Decode unicode charmap (most likely non-standard) with PHP
So for reference, your source data was UTF-8, and then someone ran something equivalent to utf8_encode() (which translates ISO-8859-1 to UTF-8, without regard to what the input actually is) on it twice.
function unescape_unicode($input) {
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function ($match) {
            return mb_convert_encoding(
                pack('H*', $match[1]),
                'UTF-8',
                'UTF-16BE'
            );
        },
        $input
    );
}
$input = "\u00c3\u0083\u00c2\u00b6";
var_dump(
    bin2hex(
        utf8_decode( // un-mojibake #1
            utf8_decode( // un-mojibake #2
                unescape_unicode($input)
            )
        )
    )
);
Output:
string(4) "c3b6"
Where 0xc3 0xb6 is the UTF-8 representation of ö.
Do NOT put this code into production. You should only use it to un-hose data that cannot otherwise be recovered or retrieved properly from underlying storage. The primary intent of the above code is to illustrate how the data was broken.
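Note that utf8_decode() is deprecated as of PHP 8.2. Assuming the mbstring extension is available, the same two-pass repair can be sketched with mb_convert_encoding(); the helper name undo_double_utf8_encode is hypothetical, and the same warning applies: this is for one-off data recovery only.

```php
<?php
// Hypothetical helper: reverses two utf8_encode()-style passes (ISO-8859-1 -> UTF-8).
function undo_double_utf8_encode(string $bytes): string
{
    // Each conversion back to ISO-8859-1 undoes one mistaken encoding pass.
    $once = mb_convert_encoding($bytes, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($once, 'ISO-8859-1', 'UTF-8');
}

// The doubly encoded bytes that "\u00c3\u0083\u00c2\u00b6" unescapes to.
$mojibake = "\xc3\x83\xc2\x83\xc3\x82\xc2\xb6";
var_dump(bin2hex(undo_double_utf8_encode($mojibake)));
```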
This is your new bible: UTF-8 all the way through
How to decode '\u0040' to '@' by PHP
Try this: json_decode() will itself take care of converting \u0040 to @.
<?php
ini_set('display_errors', 1);
$string = '{
    "id": "674271626114503",
    "email": "duc2521997\u0040gmail.com"
}';
$array = json_decode($string, true); // this itself will take care of `\u0040`
echo $array["email"];
Output: duc2521997@gmail.com
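When the JSON comes from an external source, it may be worth failing loudly on malformed input rather than indexing into null. A small sketch, assuming PHP 7.3 or later for JSON_THROW_ON_ERROR; the helper name extract_email is hypothetical:

```php
<?php
// Hypothetical helper: returns the decoded "email" field or throws.
function extract_email(string $json): string
{
    // Throws JsonException on malformed JSON instead of returning null.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data) || !isset($data['email']) || !is_string($data['email'])) {
        throw new UnexpectedValueException('No "email" string in payload');
    }
    return $data['email'];
}

echo extract_email('{"id": "674271626114503", "email": "duc2521997\u0040gmail.com"}'), "\n";
```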
PHP and accent characters (Ba\u015f\u00e7\u0131l)
My educated guess is that you obtained such values from a JSON string. If that's the case, you should properly decode the full piece of data with json_decode():
<?php
header('Content-Type: text/plain; charset=utf-8');
$data = '"Ba\u015f\u00e7\u0131l"';
var_dump( json_decode($data) );
?>
Unicode character in PHP string
Because JSON directly supports the \uxxxx syntax, the first thing that comes to mind is:
$unicodeChar = '\u1000';
echo json_decode('"'.$unicodeChar.'"');
Another option would be to use mb_convert_encoding():
echo mb_convert_encoding('&#x1000;', 'UTF-8', 'HTML-ENTITIES');
or make use of the direct mapping between UTF-16BE (big endian) and the Unicode codepoint:
echo mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');
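On newer PHP versions there are more direct ways to produce a character from its code point, and the HTML-ENTITIES pseudo-encoding in mbstring is deprecated as of PHP 8.2. A short sketch, assuming PHP 7.2 or later with mbstring loaded; the variable names are illustrative only:

```php
<?php
$fromEscape = "\u{1000}";                // code point escape in double quotes (PHP 7.0+)
$fromMbChr  = mb_chr(0x1000, 'UTF-8');   // mbstring, PHP 7.2+
$fromEntity = html_entity_decode('&#x1000;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// All three should hold the same UTF-8 bytes for U+1000.
var_dump($fromEscape === $fromMbChr, $fromMbChr === $fromEntity);
echo bin2hex($fromEscape), "\n";
```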