json_encode() non utf-8 strings?
Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like "\u0082"?
If you have an ANSI-encoded string, utf8_encode() is the wrong function for it. You need to properly convert it from ANSI to UTF-8 first. That will certainly reduce the number of Unicode escape sequences like \u0082 in the JSON output, but technically these sequences are valid JSON, so you need not fear them.
Converting ANSI to UTF-8 with PHP
json_encode works with UTF-8 encoded strings only. If you need to create valid JSON from an ANSI-encoded string, you need to re-encode/convert it to UTF-8 first. Then json_encode will just work as documented.
To convert an encoding from ANSI (more correctly, I assume you have a Windows-1252 encoded string, which is popular but wrongly referred to as ANSI) to UTF-8, you can make use of the mb_convert_encoding() function:
$str = mb_convert_encoding($str, "UTF-8", "Windows-1252");
Another function in PHP that can convert the encoding / charset of a string is iconv, which is based on libiconv. You can use it as well:
$str = iconv("CP1252", "UTF-8", $str);
Note on utf8_encode()
utf8_encode() only works for Latin-1 (ISO-8859-1), not for ANSI (Windows-1252). So you will destroy some of the characters in that string when you run it through that function.
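To make the difference concrete, here is a minimal sketch, assuming the mbstring extension is loaded. The byte 0x82 is picked for illustration because it matches the "\u0082" from the question, and converting from ISO-8859-1 is used here as a stand-in for what utf8_encode() does, since that function always treats its input as Latin-1.

```php
<?php
// Byte 0x82 is "‚" (single low-9 quotation mark, U+201A) in Windows-1252,
// but an invisible C1 control character (U+0082) in ISO-8859-1 (Latin-1).
$raw = "\x82";

// Stand-in for utf8_encode(): interpret the byte as Latin-1.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Interpret the same byte as Windows-1252 instead.
$asCp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

echo json_encode($asLatin1), "\n"; // "\u0082"
echo json_encode($asCp1252), "\n"; // "\u201a"
```

Both results are valid JSON, but only the second one decodes back to the character the author of the Windows-1252 text actually typed.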
Related: What is ANSI format?
For more fine-grained control of what json_encode() returns, see the list of predefined constants (PHP version dependent, incl. PHP 5.4; some constants remain undocumented and are available in the source code only so far).
Changing the encoding of an array/iteratively (PDO comment)
As you wrote in a comment that you have problems applying the function to an array, here is a code example. You always need to change the encoding first, before using json_encode. That's just a standard array operation; for the simpler case of PDO::fetch(), a foreach iteration:
$items = array();
while ($row = $q->fetch(PDO::FETCH_ASSOC))
{
    foreach ($row as &$value)
    {
        // convert each column value from Windows-1252 to UTF-8
        $value = mb_convert_encoding($value, "UTF-8", "Windows-1252");
    }
    unset($value); # safety: remove reference
    // the row is already UTF-8 here; running utf8_encode() on it again would double-encode it
    $items[] = $row;
}
How to keep json_encode() from dropping strings with invalid characters
PHP does try to raise an error, but only if you turn display_errors off. This is odd because the display_errors setting is only meant to control whether or not errors are printed to standard output, not whether or not an error is triggered. I want to emphasize that when you have display_errors on, even though you may see all kinds of other PHP errors, PHP doesn't just hide this error, it does not even trigger it. That means it will not show up in any error logs, nor will any custom error handlers get called. The error just never occurs.
Here's some code that demonstrates this:
error_reporting(-1);//report all errors
$invalid_utf8_char = chr(193);
ini_set('display_errors', 1);//display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last());//nothing
ini_set('display_errors', 0);//do not display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last());// json_encode(): Invalid UTF-8 sequence in argument
That bizarre and unfortunate behavior is related to this bug https://bugs.php.net/bug.php?id=47494 and a few others, and doesn't look like it will ever be fixed.
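Independently of display_errors, the return value of json_encode() can be checked directly. The following is a minimal sketch, assuming PHP 5.5 or later (json_last_error_msg() does not exist before that), where json_encode() returns false for a string containing invalid UTF-8. The variable names are only illustrative.

```php
<?php
// chr(193) is a lone 0xC1 byte, which is never valid in UTF-8.
$invalid = chr(193);

$json = json_encode($invalid);

if ($json === false) {
    // json_last_error() returns one of the JSON_ERROR_* constants;
    // json_last_error_msg() returns a human-readable description.
    $code = json_last_error();
    $message = json_last_error_msg();
    echo "json_encode failed (code $code): $message\n";
}
```

Checking the return value this way does not depend on any ini setting, so it also works when a warning is never triggered.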
Workaround:
Cleaning the string before passing it to json_encode may be a workable solution.
$stripped_of_invalid_utf8_chars_string = iconv('UTF-8', 'UTF-8//IGNORE', $orig_string);
if ($stripped_of_invalid_utf8_chars_string !== $orig_string) {
// one or more chars were invalid, and so they were stripped out.
// if you need to know where in the string the first stripped character was,
// then see http://stackoverflow.com/questions/7475437/find-first-character-that-is-different-between-two-strings
}
$json = json_encode($stripped_of_invalid_utf8_chars_string);
http://php.net/manual/en/function.iconv.php
The manual says //IGNORE silently discards characters that are illegal in the target charset.
So by first removing the problematic characters, in theory json_encode() shouldn't get anything it will choke on and fail with. I haven't verified that the output of iconv with the //IGNORE flag is perfectly compatible with json_encode's notion of what valid UTF-8 characters are, so buyer beware, as there may be edge cases where it still fails. Ugh, I hate character set issues.
Edit
In PHP 7.2+, there seem to be some new flags for json_encode: JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE.
There's not much documentation yet, but for now, this test should help you understand expected behavior:
https://github.com/php/php-src/blob/master/ext/json/tests/json_encode_invalid_utf8.phpt
And in PHP 7.3+ there's the new flag JSON_THROW_ON_ERROR. See http://php.net/manual/en/class.jsonexception.php
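A small sketch of how these three flags behave, assuming PHP 7.3 or later so that all of them are available. The input reuses the invalid chr(193) byte from the demonstration above; the variable names are only illustrative.

```php
<?php
// "a", an invalid lone 0xC1 byte, then "b".
$input = "a" . chr(193) . "b";

// Drop the invalid byte from the output.
$ignored = json_encode($input, JSON_INVALID_UTF8_IGNORE);

// Replace the invalid byte with U+FFFD (the Unicode replacement character).
$substituted = json_encode($input, JSON_INVALID_UTF8_SUBSTITUTE);

// Throw a JsonException instead of returning false.
$thrownCode = null;
try {
    json_encode($input, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    $thrownCode = $e->getCode(); // one of the JSON_ERROR_* constants
}

var_dump($ignored, $substituted, $thrownCode);
```

Which flag fits depends on whether silently losing a character is acceptable: the first two always produce output, while JSON_THROW_ON_ERROR makes the failure impossible to overlook.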
Why does the PHP json_encode function convert UTF-8 strings to hexadecimal entities?
Since PHP 5.4.0, there is an option called JSON_UNESCAPED_UNICODE. Check it out:
https://php.net/function.json-encode
Therefore you should try:
json_encode( $text, JSON_UNESCAPED_UNICODE );
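As a quick illustration of the difference, here is a minimal sketch assuming PHP 5.4 or later. The sample text is "á", written as raw UTF-8 bytes so that the encoding of the script file itself does not matter.

```php
<?php
// "\xC3\xA1" is the UTF-8 byte sequence for "á" (U+00E1).
$text = "\xC3\xA1";

// Default behavior: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($text);                           // "\u00e1"

// With the flag, the UTF-8 bytes are passed through unchanged.
$unescaped = json_encode($text, JSON_UNESCAPED_UNICODE); // "á"

echo $escaped, "\n", $unescaped, "\n";
```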
How to use PHP json_encode without UTF8?
The only thing you need to do is to convert your data to UTF-8 before passing it to json_encode. That function requires UTF-8 data, and unless you want to reimplement json_encode yourself it's a lot easier to go along with its requirements:
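function recursivelyConvertToUTF8($data, $from = 'ISO-8859-1') {
    if (is_array($data)) {
        // walk nested arrays and convert every element
        return array_map(function ($value) use ($from) {
            return recursivelyConvertToUTF8($value, $from);
        }, $data);
    }
    // only strings carry an encoding; leave ints, floats, bools and null untouched
    return is_string($data) ? iconv($from, 'UTF-8', $data) : $data;
}
echo json_encode(recursivelyConvertToUTF8($myData));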
This is not necessarily a complete solution covering every possible use case, but it should illustrate the idea.
Any way to return PHP `json_encode` with encode UTF-8 and not Unicode?
{"a":"\u00e1"} and {"a":"á"} are different ways to write the same JSON document; the JSON decoder will decode the Unicode escape.
In PHP 5.4+, PHP's json_encode does have the JSON_UNESCAPED_UNICODE option for plain output. On older PHP versions, you can roll your own JSON encoder that does not encode non-ASCII characters, or use Pear's JSON encoder and remove lines 349 to 433.
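The equivalence of the two spellings can be checked with json_decode(). This is a minimal sketch; the variable names are only illustrative, and the literal "á" is written as raw UTF-8 bytes so the script file's own encoding does not matter.

```php
<?php
// Single-quoted PHP strings keep the backslash, so the decoder sees a real \u00e1 escape.
$escaped = '{"a":"\u00e1"}';

// The same document with the character written literally as UTF-8 bytes.
$literal = "{\"a\":\"\xC3\xA1\"}";

$fromEscaped = json_decode($escaped, true);
$fromLiteral = json_decode($literal, true);

var_dump($fromEscaped === $fromLiteral); // bool(true)
```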
json_encode() excluding UTF-8 encoded string from output
It looks like your database connection is not set to UTF-8, which is the most important part. So add 'encoding' => 'utf8' to the database configuration in your app/config/database.php, for example:
'default' => array(
    'driver' => 'mysql',
    'host' => 'YOURHOST',
    'login' => 'YOURLOGIN',
    'password' => 'YOURPASS',
    'database' => 'YOURDB',
    'encoding' => 'utf8'
),
If you don't set the encoding in the connection, a "default" encoding will be used. The default is likely not utf8.