Reference: Why are my "special" Unicode characters encoded weird using json_encode?

First of all: there's nothing wrong here. This is how characters can be encoded in JSON, and it is part of the official standard. It is based on how string literals can be formed in JavaScript/ECMAScript (section 7.8.4, "String Literals") and is described as follows:

Any code point may be represented as a hexadecimal number. The meaning of such a number is determined by ISO/IEC 10646. If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point. [...] So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".

In short: Any character can be encoded as \u...., where .... is the Unicode code point of the character (or the code point of half of a UTF-16 surrogate pair, for characters outside the BMP).

"馬"
"\u99ac"

These two string literals represent exactly the same character; they are absolutely equivalent. When parsed by a compliant JSON parser, both result in the string "馬". They don't look the same, but they mean the same thing in the JSON data encoding format.
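A quick way to check this equivalence is to decode both literals in PHP and compare the results. This is only a minimal sketch: it assumes the standard json extension and a source file saved as UTF-8, and the variable names are made up for illustration.

```php
<?php
// Single quotes keep the backslash literal, so json_decode sees \u99ac.
$fromLiteral = json_decode('"馬"');
$fromEscape  = json_decode('"\u99ac"');

// Both JSON string literals decode to the same PHP string.
var_dump($fromLiteral === $fromEscape); // bool(true)

// A character outside the BMP is written as a UTF-16 surrogate pair.
$emoji = json_decode('"\ud83d\ude00"');

// "\xF0\x9F\x98\x80" is the UTF-8 byte sequence for U+1F600.
var_dump($emoji === "\xF0\x9F\x98\x80"); // bool(true)
```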

By default, PHP's json_encode encodes non-ASCII characters using \u.... escape sequences. Technically it doesn't have to, but it does, and the result is perfectly valid. If you prefer literal characters in your JSON instead of escape sequences, you can set the JSON_UNESCAPED_UNICODE flag in PHP 5.4 or higher:

php > echo json_encode(['foo' => '馬'], JSON_UNESCAPED_UNICODE);
{"foo":"馬"}

To emphasise: this is just a preference, it is not necessary in any way to transport "Unicode characters" in JSON.
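To see that the flag changes only the spelling and not the data, both outputs can be decoded and compared. The snippet below is a sketch assuming PHP 5.4 or later; $data, $escaped and $unescaped are illustrative names.

```php
<?php
$data = ['foo' => '馬'];

$escaped   = json_encode($data);                         // {"foo":"\u99ac"}
$unescaped = json_encode($data, JSON_UNESCAPED_UNICODE); // {"foo":"馬"}

echo $escaped, PHP_EOL, $unescaped, PHP_EOL;

// The two documents differ byte for byte, yet decode to the same array.
var_dump($escaped === $unescaped); // bool(false)
var_dump(json_decode($escaped, true) === json_decode($unescaped, true)); // bool(true)
```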

Unknown characters displaying while encoding UTF-8 words into JSON format using json_encode in PHP

JSON fully supports Unicode (or rather, the standard that parsers follow does). The problem is that PHP does not fully support Unicode.

To quote an answer to a related Stack Overflow question:

Some frameworks, including PHP's implementation of JSON, always do the safe numeric encodings on the encoder side. This is intended for maximum compatibility with buggy/limited transport mechanisms and the like. However, this should not be interpreted as an indication that JSON decoders have problems with UTF-8.

Those "unknown characters" you are referring to are known as Unicode escape sequences. They are there for parsers written in programming languages that do not fully support Unicode. Similar escape sequences are also used in CSS files for displaying Unicode characters (see the CSS content property).

If you want to display this in your client-side app (I'm going to assume you're using Java), I'll refer you to this question
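The "maximum compatibility" point can be illustrated by checking that default json_encode output contains only ASCII bytes. This is a rough sketch with made-up sample data, assuming a UTF-8 source file.

```php
<?php
$json = json_encode(['name' => 'José', 'city' => '東京']);

// With default flags every non-ASCII character is escaped, so the
// document consists of ASCII bytes only.
$isAscii = preg_match('/^[\x00-\x7F]*$/', $json) === 1;
var_dump($isAscii); // bool(true)

// Decoding restores the original UTF-8 strings.
$decoded = json_decode($json, true);
var_dump($decoded['city'] === '東京'); // bool(true)
```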

tl;dr: There is nothing wrong with your JSON file. Those encodings are there to help the parser.

Any way to return PHP `json_encode` with encode UTF-8 and not Unicode?

{"a":"\u00e1"} and {"a":"á"} are different ways to write the same JSON document; a JSON decoder will decode the Unicode escape.

In PHP 5.4+, json_encode has the JSON_UNESCAPED_UNICODE option for plain output. On older PHP versions, you can roll your own JSON encoder that does not escape non-ASCII characters, or use PEAR's JSON encoder and remove lines 349 to 433.

json encode php showing strange characters

By default, json_encode escapes Arabic (and other non-ASCII) characters as \u.... sequences, so the JSON output is not human-readable.

You can change this with the JSON_UNESCAPED_UNICODE flag, available in PHP 5.4 or later:

json_encode('yourarabiccharacters', JSON_UNESCAPED_UNICODE);

php json_encode different output than js JSON.stringify

The escaping in PHP is optional and not technically required for valid JSON (which can contain arbitrary Unicode apart from a few characters that must always be escaped, such as the quotation mark, the backslash, and control characters). The feature can be turned off with json_encode($data, JSON_UNESCAPED_UNICODE).
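When comparing against JSON.stringify output, note that PHP also escapes forward slashes by default, which JSON.stringify does not. Combining JSON_UNESCAPED_UNICODE with JSON_UNESCAPED_SLASHES (both PHP 5.4+) should bring the two closer. The sample URL and variable names below are illustrative only.

```php
<?php
$data = ['url' => 'http://example.com/café'];

// Default flags: slashes and non-ASCII characters are both escaped.
$default = json_encode($data);
echo $default, PHP_EOL;
// {"url":"http:\/\/example.com\/caf\u00e9"}

// Both kinds of escaping turned off.
$plain = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
echo $plain, PHP_EOL;
// {"url":"http://example.com/café"}
```

Either form decodes to the same data, so this matters only if the raw strings are compared or hashed.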

Unfortunately, the JS version doesn't have this feature at all. If you want to escape non-ASCII characters as \u...., you have to do it explicitly; see "JSON.stringify and unicode characters".


