JSON_Encode() Non Utf-8 Strings

json_encode() non utf-8 strings?

Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like "\u0082"?

If you have an ANSI encoded string, using utf8_encode() is the wrong function to deal with this. You need to properly convert it from ANSI to UTF-8 first. That will certainly reduce the number of Unicode escape sequences like \u0082 from the json output, but technically these sequences are valid for json, you must not fear them.

Converting ANSI to UTF-8 with PHP

json_encode works with UTF-8 encoded strings only. If you need to create valid json successfully from an ANSI encoded string, you need to re-encode/convert it to UTF-8 first. Then json_encode will just work as documented.

To convert an encoding from ANSI (more correctly I assume you have a Windows-1252 encoded string, which is popular but wrongly referred to as ANSI) to UTF-8 you can make use of the mb_convert_encoding() function:

$str = mb_convert_encoding($str, "UTF-8", "Windows-1252");

Another function in PHP that can convert the encoding / charset of a string is called iconv based on libiconv. You can use it as well:

$str = iconv("CP1252", "UTF-8", $str);

Note on utf8_encode()

utf8_encode() does only work for Latin-1, not for ANSI. So you will destroy part of your characters inside that string when you run it through that function.


Related: What is ANSI format?


For a more fine-grained control of what json_encode() returns, see the list of predifined constants (PHP version dependent, incl. PHP 5.4, some constants remain undocumented and are available in the source code only so far).

Changing the encoding of an array/iteratively (PDO comment)

As you wrote in a comment that you have problems to apply the function onto an array, here is some code example. It's always needed to first change the encoding before using json_encode. That's just a standard array operation, for the simpler case of pdo::fetch() a foreach iteration:

while($row = $q->fetch(PDO::FETCH_ASSOC))
{
foreach($row as &$value)
{
$value = mb_convert_encoding($value, "UTF-8", "Windows-1252");
}
unset($value); # safety: remove reference
$items[] = array_map('utf8_encode', $row );
}

How to keep json_encode() from dropping strings with invalid characters

php does try to spew an error, but only if you turn display_errors off. This is odd because the display_errors setting is only meant to control whether or not errors are printed to standard output, not whether or not an error is triggered. I want to emphasize that when you have display_errors on, even though you may see all kinds of other php errors, php doesn't just hide this error, it will not even trigger it. That means it will not show up in any error logs, nor will any custom error_handlers get called. The error just never occurs.

Here's some code that demonstrates this:

error_reporting(-1);//report all errors
$invalid_utf8_char = chr(193);

ini_set('display_errors', 1);//display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last());//nothing

ini_set('display_errors', 0);//do not display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last());// json_encode(): Invalid UTF-8 sequence in argument

That bizarre and unfortunate behavior is related to this bug https://bugs.php.net/bug.php?id=47494 and a few others, and doesn't look like it will ever be fixed.

workaround:

Cleaning the string before passing it to json_encode may be a workable solution.

$stripped_of_invalid_utf8_chars_string = iconv('UTF-8', 'UTF-8//IGNORE', $orig_string);
if ($stripped_of_invalid_utf8_chars_string !== $orig_string) {
// one or more chars were invalid, and so they were stripped out.
// if you need to know where in the string the first stripped character was,
// then see http://stackoverflow.com/questions/7475437/find-first-character-that-is-different-between-two-strings
}
$json = json_encode($stripped_of_invalid_utf8_chars_string);

http://php.net/manual/en/function.iconv.php

The manual says

//IGNORE silently discards characters that are illegal in the target
charset.

So by first removing the problematic characters, in theory json_encode() shouldnt get anything it will choke on and fail with. I haven't verified that the output of iconv with the //IGNORE flag is perfectly compatible with json_encodes notion of what valid utf8 characters are, so buyer beware...as there may be edge cases where it still fails. ugh, I hate character set issues.

Edit
in php 7.2+, there seems to be some new flags for json_encode:
JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE
There's not much documentation yet, but for now, this test should help you understand expected behavior:
https://github.com/php/php-src/blob/master/ext/json/tests/json_encode_invalid_utf8.phpt

And, in php 7.3+ there's the new flag JSON_THROW_ON_ERROR. See http://php.net/manual/en/class.jsonexception.php

Why does the PHP json_encode function convert UTF-8 strings to hexadecimal entities?

Since PHP/5.4.0, there is an option called JSON_UNESCAPED_UNICODE. Check it out:

https://php.net/function.json-encode

Therefore you should try:

json_encode( $text, JSON_UNESCAPED_UNICODE );

How to use PHP json_encode without UTF8?

The only thing you need to do is to convert your data to UTF-8 before passing it to json_encode. That function requires UTF-8 data, and unless you want to reimplement json_encode yourself it's a lot easier to go along with its requirements:

function recursivelyConvertToUTF8($data, $from = 'ISO-8859-1') {
if (!is_array($data)) {
return iconv($from, 'UTF-8', $data);
}
return array_map(function ($value) use ($from) {
return recursivelyConvertToUTF8($value, $from);
}, $data);
}

echo json_encode(recursivelyConvertToUTF8($myData));

This is not necessarily a complete solution covering every possible use case, but it should illustrate the idea.

Any way to return PHP `json_encode` with encode UTF-8 and not Unicode?

{"a":"\u00e1"} and {"a":"á"} are different ways to write the same JSON document; The JSON decoder will decode the unicode escape.

In php 5.4+, php's json_encode does have the JSON_UNESCAPED_UNICODE option for plain output. On older php versions, you can roll out your own JSON encoder that does not encode non-ASCII characters, or use Pear's JSON encoder and remove line 349 to 433.

json_encode() excluding UTF-8 encoded string from output

It looks like your database connection is not set to utf-8, which is the most important part. So add 'encoding' => 'utf8' to the database configuration in your app/config/database.php, for example:

    'default' => array(
'driver' => 'mysql',
'host' => 'YOURHOST',
'login' => 'YOURLOGIN',
'password' => 'YOURPASS',
'database' => 'YOURDB',
'encoding' => 'utf8'
),

If you don't set the encoding in the connection, a "default" encoding will be used. The default is likely not utf8.



Related Topics



Leave a reply



Submit