How to Properly Encode UTF-8 for JavaScript and JSON

How to properly encode UTF-8 for JavaScript and JSON?

It does not seem to be possible directly. The only working solution I have found is to encode all data with encodeURIComponent() on the browser side and with rawurlencode() on the PHP side, and then build the JSON from arrays of these encoded values.
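The browser half of that workaround might look like the sketch below. The sample object and the helper names encodeValues and decodeValues are illustrative assumptions, not part of the original answer; it also assumes every value is a string.

```javascript
// Percent-encode every string value so the resulting JSON text is pure ASCII.
function encodeValues(obj) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [key, encodeURIComponent(value)])
  );
}

// Reverse the step above after the JSON has been parsed.
function decodeValues(obj) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [key, decodeURIComponent(value)])
  );
}

const original = { name: 'Zoë', city: '東京' };
const json = JSON.stringify(encodeValues(original)); // contains only ASCII characters
const restored = decodeValues(JSON.parse(json));

console.log(json);
console.log(restored);
```

The cost of this approach is that every consumer has to know the values are percent-encoded and decode them again.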

What does Content-type: application/json; charset=utf-8 really mean?

The header states what kind of content is being sent and how it is encoded. It is not necessarily possible to deduce the type of content from the content itself; you can't always just look at it and know what to do with it. That's what HTTP headers are for: they tell the recipient what kind of content they're (supposedly) dealing with.

Content-type: application/json; charset=utf-8 designates the content to be in JSON format, encoded in the UTF-8 character encoding. Designating the encoding is somewhat redundant for JSON: RFC 8259 requires UTF-8 for JSON exchanged between systems and does not define a charset parameter for application/json. So in this case the receiving server is apparently happy knowing that it's dealing with JSON and assumes that the encoding is UTF-8 by default, which is why it works with or without the header.
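To make the two parts of the header value concrete, here is a minimal sketch that splits it into a media type and a charset. The function name parseContentType is hypothetical, and the parsing is deliberately simplified; it does not implement the full HTTP grammar for parameters.

```javascript
// Split a Content-Type value into its media type and optional charset parameter.
function parseContentType(header) {
  const [type, ...params] = header.split(';').map(part => part.trim());
  const result = { mediaType: type.toLowerCase(), charset: null };
  for (const param of params) {
    const [name, value] = param.split('=');
    if (name && value && name.trim().toLowerCase() === 'charset') {
      // Strip optional surrounding quotes and normalise the case.
      result.charset = value.trim().replace(/^"|"$/g, '').toLowerCase();
    }
  }
  return result;
}

const withCharset = parseContentType('application/json; charset=utf-8');
const withoutCharset = parseContentType('application/json');

console.log(withCharset);    // { mediaType: 'application/json', charset: 'utf-8' }
console.log(withoutCharset); // { mediaType: 'application/json', charset: null }
```

A receiver that sees application/json with no charset can still safely assume UTF-8.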

Does this encoding limit the characters that can be in the message body?

No. You can send anything you want in the header and the body. But if the two don't match, you may get wrong results. If you specify in the header that the content is UTF-8 encoded but you're actually sending Latin-1 encoded content, the receiver may produce garbage data by trying to interpret Latin-1 encoded data as UTF-8. Of course, if you specify that you're sending Latin-1 encoded data and you're actually doing so, then yes, you're limited to the 256 characters you can encode in Latin-1.
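The mismatch can be reproduced in a few lines. This is a sketch with an assumed sample string; it builds the Latin-1 bytes by hand (one byte per character, which is valid only because every code point in the sample is at most 0xFF) and then decodes them as if they were UTF-8.

```javascript
const text = 'café';

// Latin-1: one byte per character.
const latin1Bytes = Uint8Array.from(text, ch => ch.charCodeAt(0));
// UTF-8: 'é' becomes the two bytes 0xC3 0xA9.
const utf8Bytes = new TextEncoder().encode(text);

const utf8Decoder = new TextDecoder('utf-8');
const wrong = utf8Decoder.decode(latin1Bytes); // 'caf' + U+FFFD (replacement character)
const right = utf8Decoder.decode(utf8Bytes);   // 'café'

console.log(latin1Bytes.length, utf8Bytes.length); // 4 5
console.log(wrong, right);
```

The lone byte 0xE9 is not a valid UTF-8 sequence, so the decoder substitutes U+FFFD, which is the "garbage" the receiver ends up with.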

Converting JSON response into correct encoding in JavaScript

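The problem is in the source data: the JSON sequence "\u00e2\u0080\u0099" does not represent a right single quotation mark (U+2019). There are three Unicode code points here: the first represents "â", while the other two are control characters. They happen to match the three UTF-8 bytes of U+2019 (E2 80 99), which suggests the text was decoded with the wrong encoding before it was turned into JSON.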

You can verify this in a dev console, or by running the snippet below:

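// The backslashes are doubled so that JSON.parse, not the JavaScript string literal, decodes the escapes.
console.log(JSON.parse('"\\u00e2\\u0080\\u0099"')); // "â" followed by two invisible control characters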
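If the source cannot be fixed, one possible client-side repair is to treat each code point as a byte and decode those bytes as UTF-8. The helper name repairMojibake is hypothetical, and the sketch assumes the damage was a Latin-1 style mis-decoding (each byte became the code point with the same value); text mangled through Windows-1252 would need a different mapping.

```javascript
// Reinterpret a string whose code points are really UTF-8 byte values.
function repairMojibake(mangled) {
  const bytes = Uint8Array.from(mangled, ch => {
    const code = ch.codePointAt(0);
    if (code > 0xff) {
      throw new RangeError('Not a byte-valued string; refusing to repair');
    }
    return code;
  });
  // fatal: true makes the decoder throw instead of silently inserting U+FFFD.
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

const mangled = JSON.parse('"\\u00e2\\u0080\\u0099"');
const repaired = repairMojibake(mangled);

console.log(repaired === '\u2019'); // true
```

Applying this to text that was never mangled would corrupt it or throw, so it should only be used on data known to be affected.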

JSON character encoding - is UTF-8 well-supported by browsers or should I use numeric escape sequences?

The JSON spec requires UTF-8 support by decoders. As a result, all JSON decoders can handle UTF-8 just as well as they can handle the numeric escape sequences. This is also the case for JavaScript interpreters, which means JSONP will handle UTF-8 encoded JSON as well.

The ability for JSON encoders to use the numeric escape sequences instead just offers you more choice. One reason you may choose the numeric escape sequences would be if a transport mechanism in between your encoder and the intended decoder is not binary-safe.

Another reason you may want to use numeric escape sequences is to prevent certain characters appearing in the stream, such as <, & and ", which may be interpreted as HTML sequences if the JSON code is placed without escaping into HTML or a browser wrongly interprets it as HTML. This can be a defence against HTML injection or cross-site scripting (note: some characters MUST be escaped in JSON, including " and \).
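As an illustration of that defence, the sketch below escapes the HTML-sensitive characters after serialising. The function name jsonForInlineScript is hypothetical; it also escapes U+2028 and U+2029, an extra precaution that the text above does not mention.

```javascript
// Serialise a value to JSON that is safer to embed inside an HTML <script> block.
function jsonForInlineScript(value) {
  // In JSON.stringify output these characters can only occur inside string
  // literals, so replacing them with \uXXXX escapes keeps the JSON valid.
  return JSON.stringify(value).replace(/[<>&\u2028\u2029]/g, ch =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const payload = { html: '</script>' };
const safeJson = jsonForInlineScript(payload);

console.log(safeJson); // {"html":"\u003c/script\u003e"}
console.log(JSON.parse(safeJson).html === payload.html); // true
```

The decoded value is unchanged; only the serialised form avoids sequences a browser could interpret as markup.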

Some frameworks, including PHP's json_encode() by default, always emit numeric escape sequences for any character outside of ASCII (in PHP, the JSON_UNESCAPED_UNICODE flag turns this off). This is a mostly unnecessary extra step intended for maximum compatibility with limited transport mechanisms and the like. However, it should not be interpreted as an indication that any JSON decoders have a problem with UTF-8.

So you could decide which to use like this:

  • Just use UTF-8, unless any software you are using for storage or transport between encoder and decoder isn't binary-safe.

  • Otherwise, use the numeric escape sequences.
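Both options from the list above can be produced from the same data. In this sketch, JSON.stringify gives the plain UTF-8 form, and a hypothetical helper toAsciiJson rewrites every non-ASCII UTF-16 code unit as a \uXXXX escape; the sample object is an assumption for illustration.

```javascript
// Produce ASCII-only JSON by escaping every code unit above 0x7F.
// Characters outside the BMP come out as a surrogate pair of escapes,
// which is valid JSON.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, unit =>
    '\\u' + unit.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const sample = { word: 'café', emoji: '😀' };

const utf8Json = JSON.stringify(sample); // contains 'é' and the emoji as-is
const asciiJson = toAsciiJson(sample);   // {"word":"caf\u00e9","emoji":"\ud83d\ude00"}

console.log(utf8Json);
console.log(asciiJson);
// Both forms decode to the same data.
console.log(JSON.parse(utf8Json).word === JSON.parse(asciiJson).word); // true
```

Since any conforming decoder yields identical values from either form, the choice only matters for what sits between the encoder and the decoder.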

Failed to declare and display UTF-8 character properly in JSON

I think the problem is that the font doesn't support this symbol, hence the square character being drawn. If there is no specific reason why you are using this character, you could draw it with an icon instead, or use a character from an icon font.

Decode encoded UTF-8 JSON text in Swift

I think the problem here is you're converting the String with UTF8:

let responseData: Data =  self.facilitiesService.UpdateJsonString.data(using: String.Encoding.utf8)!

I created a playground file to test the encoding/decoding of Japanese and French text and it works correctly.

import Foundation

// the text you want to encode/decode
let myChars = "星空観察に最適なロケーション. Also other chars. Même du Français écrit par un Brésilien."

// your json node represented as a Struct
struct MyContent: Codable {
    let id: Int
    let content: String
}

// your data
let myContent = MyContent(id: 1, content: myChars)

do {
    // we encode the content; the encoded variable here would be the data you would get from your webservice.
    let encoded = try JSONEncoder().encode(myContent)

    // we decode your data without specifying any encoding type
    let val = try JSONDecoder().decode(MyContent.self, from: encoded)

    // the content printed is correctly presented.
    print(val.content)
} catch {
    // should not get any error, but just in case.
    print("Error: \(error.localizedDescription)")
}

// version using a String as a starting point

let jsonString = """
{
"id": 1,
"content": \"\(myChars)\"
}
"""

print(jsonString)

if let data = jsonString.data(using: .utf8) {
    do {
        // we decode your data without specifying any encoding type
        let val = try JSONDecoder().decode(MyContent.self, from: data)

        // the content printed is correctly presented.
        print(val.content)
    } catch {
        // should not get any error, but just in case.
        print("Error: \(error)")
    }
}

I updated the playground so you can better visualise all these conversions.

Reviewing the bit of code you posted, I see that self.facilitiesService.UpdateJsonString is a String. When we make a web service request we usually get the result as Data, so I would guess that the data has already been decoded somewhere, perhaps with something like String(data: data, encoding: .utf8). I would advise you to check how that is done. You may want to change the UpdateJsonString variable to be Data rather than a String, which avoids one conversion.

I hope this helps! :)


