How to keep json_encode() from dropping strings with invalid characters
PHP does try to spew an error, but only if you turn display_errors off. This is odd because the display_errors setting is only meant to control whether or not errors are printed to standard output, not whether or not an error is triggered. I want to emphasize that when you have display_errors on, even though you may see all kinds of other PHP errors, PHP doesn't just hide this error; it doesn't trigger it at all. That means it will not show up in any error logs, nor will any custom error handlers (set_error_handler()) get called. The error just never occurs.
Here's some code that demonstrates this:
error_reporting(-1); // report all errors
$invalid_utf8_char = chr(193);
ini_set('display_errors', 1); // display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last()); // NULL: no error was triggered
ini_set('display_errors', 0); // do not display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last()); // array with 'message' => "json_encode(): Invalid UTF-8 sequence in argument"
That bizarre and unfortunate behavior is related to this bug https://bugs.php.net/bug.php?id=47494 and a few others, and doesn't look like it will ever be fixed.
Workaround:
Cleaning the string before passing it to json_encode() may be a workable solution.
$orig_string = 'abc' . chr(193) . 'def'; // example input containing one invalid byte
// note: iconv() returns false on failure, so you may want to check for that too
$stripped_of_invalid_utf8_chars_string = iconv('UTF-8', 'UTF-8//IGNORE', $orig_string);
if ($stripped_of_invalid_utf8_chars_string !== $orig_string) {
    // One or more chars were invalid, so they were stripped out.
    // If you need to know where in the string the first stripped character was,
    // see http://stackoverflow.com/questions/7475437/find-first-character-that-is-different-between-two-strings
}
$json = json_encode($stripped_of_invalid_utf8_chars_string);
http://php.net/manual/en/function.iconv.php
The manual says that //IGNORE "silently discards characters that are illegal in the target charset."
So by first removing the problematic characters, in theory json_encode() shouldn't get anything it will choke on and fail with. I haven't verified that the output of iconv() with the //IGNORE flag is perfectly compatible with json_encode()'s notion of what valid UTF-8 characters are, so buyer beware: there may be edge cases where it still fails. Ugh, I hate character set issues.
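If iconv() misbehaves on your system, a possible alternative is the mbstring extension. This is a swapped-in technique, not the approach above, and a minimal sketch assuming mbstring is loaded: mb_check_encoding() detects invalid UTF-8, and converting UTF-8 to UTF-8 with mb_convert_encoding() replaces each invalid sequence with the mbstring substitute character (so the bytes are replaced rather than dropped). The helper name to_valid_utf8() is hypothetical.

```php
<?php
// Sketch (assumes the mbstring extension): repair a string only if it is not valid UTF-8.
// to_valid_utf8() is a hypothetical helper name, not a built-in.
function to_valid_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid, leave it untouched
    }
    // UTF-8 -> UTF-8 conversion makes mbstring replace each invalid sequence
    // with the configured substitute character.
    return mb_convert_encoding($s, 'UTF-8', 'UTF-8');
}

$orig_string = 'abc' . chr(193) . 'def'; // contains one invalid byte
$clean = to_valid_utf8($orig_string);

var_dump(json_encode($orig_string)); // fails on the raw input
var_dump(json_encode($clean));       // succeeds on the repaired string
```

Unlike //IGNORE, this keeps a visible marker where the bad byte was, which can make the damage easier to spot.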
Edit
In PHP 7.2+, there are some new flags for json_encode(): JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE.
There's not much documentation yet, but for now, this test should help you understand expected behavior:
https://github.com/php/php-src/blob/master/ext/json/tests/json_encode_invalid_utf8.phpt
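A small comparison of the two PHP 7.2+ flags on a string holding one invalid byte (0xC1, the same byte as chr(193) in the demo above). Without a flag the call fails; IGNORE drops the byte; SUBSTITUTE emits U+FFFD, which appears as the escape \ufffd unless JSON_UNESCAPED_UNICODE is also set.

```php
<?php
// PHP 7.2+: compare the invalid-UTF-8 flags on "a", 0xC1, "b".
$input = "a\xC1b";

$plain      = json_encode($input);                               // false: encoding fails
$ignored    = json_encode($input, JSON_INVALID_UTF8_IGNORE);     // '"ab"': invalid byte dropped
$substitute = json_encode($input, JSON_INVALID_UTF8_SUBSTITUTE); // '"a\ufffdb"': replaced by U+FFFD

var_dump($plain, $ignored, $substitute);
```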
And in PHP 7.3+ there's the new flag JSON_THROW_ON_ERROR, which makes json_encode() throw a JsonException instead of quietly returning false. See http://php.net/manual/en/class.jsonexception.php
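A minimal sketch of JSON_THROW_ON_ERROR (PHP 7.3+). The exception's getCode() carries the same JSON_ERROR_* constant that json_last_error() would report. The wrapper encode_or_report() is a hypothetical name used only for illustration.

```php
<?php
// PHP 7.3+: make json_encode() throw instead of returning false.
// encode_or_report() is a hypothetical helper for this example.
function encode_or_report($value): string
{
    try {
        return json_encode($value, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        // getCode() is a JSON_ERROR_* constant, e.g. JSON_ERROR_UTF8
        return 'error ' . $e->getCode() . ': ' . $e->getMessage();
    }
}

echo encode_or_report(['name' => 'ok']), "\n";
echo encode_or_report(['name' => "\xC1"]), "\n"; // invalid UTF-8 triggers the exception
```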
json_encode strings with special characters and spaces
Thanks to @deceze in the comments: the attribute had to be in quotes. Also, @WPhil is right: the JSON.parse() part was missing too.
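The original code isn't shown, so this is only a guess at the setup: JSON with spaces and special characters placed inside an HTML attribute. A sketch assuming that case: the attribute value is quoted, and htmlspecialchars() with ENT_QUOTES escapes the JSON so its double quotes can't end the attribute early. The data and the data-names attribute are hypothetical; on the JavaScript side you would then call JSON.parse() on the attribute's value.

```php
<?php
// Hypothetical data containing spaces, an apostrophe and an ampersand.
$names = ['Anna Maria', "O'Brien", 'Smith & Sons'];

// Escape the JSON for use inside a quoted HTML attribute.
$attr = htmlspecialchars(json_encode($names), ENT_QUOTES, 'UTF-8');

// data-names is an illustrative attribute name; note the quotes around its value.
echo '<div id="people" data-names="' . $attr . '"></div>', "\n";
// JS side (not run here): JSON.parse(document.getElementById('people').dataset.names)
```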
json_encode(): Invalid UTF-8 sequence in argument
It turned out the symbol was Å, but since the data consists of surnames that shouldn't be public, only the first letter was shown, and that was done with just $lastname[0]. That is wrong for multibyte strings, because it returns only the first byte, and it caused the whole hassle. Changed it to mb_substr($lastname, 0, 1) and it works like a charm.
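Why $lastname[0] breaks json_encode(): in UTF-8, Å is the two bytes 0xC3 0x85, and string offsets work on bytes, so $lastname[0] yields a lone 0xC3, which is not valid UTF-8 by itself. A small demonstration, assuming the mbstring extension is available; the surname is made up.

```php
<?php
// Hypothetical surname starting with a multibyte character (Å = 0xC3 0x85 in UTF-8).
$lastname = "\u{00C5}berg";

$byte   = $lastname[0];                        // first BYTE only: "\xC3"
$letter = mb_substr($lastname, 0, 1, 'UTF-8'); // first CHARACTER: "Å"

var_dump(json_encode($byte));   // bool(false): a lone 0xC3 is invalid UTF-8
var_dump(json_encode($letter)); // the JSON string "\u00c5"
```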
php json_encode() shows null instead of text
json_encode() expects strings in the data to be encoded as UTF-8.
Convert them to UTF-8 if they aren't already:
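// utf8_encode() converts ISO-8859-1 to UTF-8 (deprecated as of PHP 8.2);
// don't run it on text that is already UTF-8, or it gets double-encoded.
$results = array_map(function ($r) {
    $r['text'] = utf8_encode($r['text']);
    return $r;
}, $results);
echo json_encode($results);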
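A variant that only converts strings which are not already UTF-8, sketched with mbstring in place of utf8_encode(). It assumes the non-UTF-8 strings are ISO-8859-1; that is an assumption, so check what encoding your database or source actually uses. The check is a heuristic: a Latin-1 string can occasionally happen to be valid UTF-8. The helper ensure_utf8() and the sample rows are hypothetical.

```php
<?php
// Sketch: convert only invalid-UTF-8 strings, ASSUMING they are ISO-8859-1.
// ensure_utf8() is a hypothetical helper; requires the mbstring extension.
function ensure_utf8(string $s): string
{
    return mb_check_encoding($s, 'UTF-8')
        ? $s                                              // already UTF-8: leave alone
        : mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1'); // Latin-1 -> UTF-8
}

// Hypothetical rows shaped like $results above.
$results = [
    ['text' => "caf\xE9"],     // ISO-8859-1 "café"
    ['text' => "caf\xC3\xA9"], // already UTF-8 "café"
];

$results = array_map(function ($r) {
    $r['text'] = ensure_utf8($r['text']);
    return $r;
}, $results);

echo json_encode($results), "\n";
```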
json_encode php result is NULL?
json_encode() has the (undocumented) habit of silently nulling properties that contain invalid (= non-UTF-8) characters.
Make sure your input data is UTF-8 encoded, which is a documented requirement of that function.
In the event of a failure to encode, json_last_error() can be used to determine the exact nature of the error (available since PHP 5.3.0).
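In PHP 5.5 and later, json_encode() no longer nulls the bad value by default; it returns false for the whole call, and json_last_error() / json_last_error_msg() tell you why. The JSON_PARTIAL_OUTPUT_ON_ERROR flag (also PHP 5.5+) restores the "substitute null" behavior described above. A small sketch with made-up data:

```php
<?php
// PHP 5.5+: invalid UTF-8 makes json_encode() return false.
$data = ['name' => "caf\xE9"]; // ISO-8859-1 bytes, not valid UTF-8

$json  = json_encode($data);
$error = json_last_error(); // read it right after the call: JSON_ERROR_UTF8 here
if ($json === false) {
    echo 'Encoding failed: ', json_last_error_msg(), "\n"; // json_last_error_msg() is PHP 5.5+
}

// Partial output: the unencodable string is replaced by null.
$partial = json_encode($data, JSON_PARTIAL_OUTPUT_ON_ERROR);
echo $partial, "\n"; // {"name":null}
```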
Related: How to keep json_encode() from dropping strings with invalid characters
PHP's json_encode does not escape all JSON control characters
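D'oh, you need to double-encode: $s already holds JSON text, and JSON.parse() is expecting a string, of course, so json_encode() it once more to get a valid JavaScript string literal: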
<script type="text/javascript">
JSON.parse(<?php echo json_encode($s) ?>);
</script>
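What the double encoding does, sketched in PHP with hypothetical data: the first json_encode() produces the JSON text in $s, and the second turns that text into a JavaScript string literal, with control characters and quotes escaped, which JSON.parse() can then parse back into the original structure.

```php
<?php
// Hypothetical data with control characters (newline, tab).
$data = ['msg' => "line1\nline2\ttab"];

$s       = json_encode($data); // JSON text
$literal = json_encode($s);    // that JSON text as a JS string literal

echo '<script type="text/javascript">', "\n";
echo 'var data = JSON.parse(', $literal, ');', "\n";
echo '</script>', "\n";
```

By default json_encode() also escapes "/" as "\/", so a "</script>" inside the data cannot close the script tag early.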