Convert JSON into a UTF-8 String

Converting JSON String to UTF-8

It's unfortunate that my post was downvoted by people who didn't know the answer, but I found the solution myself.

I simply converted the text to HTML content in code and displayed it using:

String contentN = json.getJSONArray("get_data").getJSONObject(i).getString("c_Alert_Msg");
Html.fromHtml(contentN);

Full code:

contenTs = json.getJSONArray("get_data");
// itemList is assumed to be declared as ArrayList<HashMap<String, Object>>,
// because each map holds both Spanned and String values
itemList = new ArrayList<HashMap<String, Object>>();

for (int i = 0; i < contenTs.length(); i++) {
    JSONObject c = contenTs.getJSONObject(i);

    // Storing each JSON item in a variable
    String id = c.getString("id");
    String headN = c.getString("Header");
    String contentN = c.getString("Content");
    String time_s = c.getString("Begin");
    String time_e = c.getString("End");
    String linkIn = c.getString("link");

    HashMap<String, Object> map = new HashMap<String, Object>();
    // Html.fromHtml returns a Spanned with the HTML markup applied
    String txtHeadN = "<font color=#cc0029><strong>" + (i + 1) + " - " + headN + "</strong></font>";
    map.put("head", Html.fromHtml(txtHeadN));
    map.put("content", Html.fromHtml(contentN));
    map.put("Link", time_s + " to " + time_e);
    map.put("links", linkIn);
    // adding the HashMap to the ArrayList
    itemList.add(map);
}

And it worked perfectly.

How to decode json string as UTF-8?

Just an aside first: UTF-8 is typically an external format, and typically represented by an array of bytes. It's what you might send over the network as part of an HTTP response. Internally, Dart stores strings as sequences of UTF-16 code units. The utf8 encoder/decoder converts between internal format strings and external format arrays of bytes.

This is why you are using utf8.decode(response.bodyBytes): it takes the raw body bytes and converts them to an internal string. (response.body basically does this too, but it chooses the bytes->string decoder based on the charset in the response's content-type header. When that charset is missing (as it often is), the http package picks Latin-1, which doesn't work if the response is actually in a different charset.) By using utf8.decode yourself, you are overriding the (potentially wrong) choice being made by http because you know that this particular server always sends UTF-8. (It may not, of course!)
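To make the Latin-1 versus UTF-8 difference concrete, here is a minimal sketch in Python rather than Dart (the byte-level behaviour is the same in both). The payload and the variable names are made up for illustration; body_bytes stands in for response.bodyBytes.

```python
import json

# Hypothetical response body: JSON text that a server encoded as UTF-8.
body_bytes = json.dumps({"city": "Wrocław"}, ensure_ascii=False).encode("utf-8")

# What a client does when it falls back to Latin-1: every byte becomes
# one character, so the two-byte UTF-8 sequence for "ł" turns into two
# unrelated characters (mojibake).
as_latin1 = body_bytes.decode("latin-1")

# What you get when you decode the same bytes as UTF-8 yourself.
as_utf8 = body_bytes.decode("utf-8")

wrong_city = json.loads(as_latin1)["city"]
right_city = json.loads(as_utf8)["city"]
```

Both decodes succeed without raising, which is why the wrong choice tends to go unnoticed until the text is displayed.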

Another aside: setting a content type header on a request is rarely useful. You typically aren't sending any content - so it doesn't have a type! And that doesn't influence the content type or content type charset that the server will send back to you. The accept header might be what you are looking for. That's a hint to the server of what type of content you'd like back - but not all servers respect it.

So why are your special characters still incorrect? Try printing the result of utf8.decode(response.bodyBytes) before parsing it as JSON. Does it look right in the console? (It is very useful to create a simple Dart command line application for this type of issue; I find it easier to set breakpoints and inspect variables in a simple ten line Dart app.) Try using something like Wireshark to capture the bytes on the wire (again, useful to have the simple Dart app for this). Or try using Postman to send the same request and inspect the response.

How are you trying to show the characters? It may simply be that the font you are using doesn't have them.

How to ensure that the JSON string is UTF-8 encoded in Java

You need to set the character encoding for OutputStreamWriter when you create it:

httpConn.connect();
wr = new OutputStreamWriter(httpConn.getOutputStream(), StandardCharsets.UTF_8);
wr.write(jsonObject.toString());
wr.flush();

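Otherwise it defaults to the "platform default encoding," which is whatever encoding has historically been used for text files on the system you are running on, and is not necessarily UTF-8. (Since Java 18 the default charset is UTF-8, but passing it explicitly is still the safer choice.)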
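The same idea can be sketched in Python (used here instead of Java): wrap the byte stream in a text writer and name the encoding explicitly instead of relying on a default. The BytesIO object below is only a stand-in for the connection's output stream, and the payload is invented.

```python
import io
import json

payload = {"name": "Zoë"}  # hypothetical data containing a non-ASCII character

# Stand-in for the connection's output stream (assumption for illustration).
raw_stream = io.BytesIO()

# Name the encoding explicitly, as with OutputStreamWriter(..., UTF_8).
writer = io.TextIOWrapper(raw_stream, encoding="utf-8")
writer.write(json.dumps(payload, ensure_ascii=False))
writer.flush()

# The bytes that would go over the wire.
sent_bytes = raw_stream.getvalue()
```

Here "ë" is written as the two bytes C3 AB, which is its UTF-8 encoding.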

Reading json files with utf-8 characters with python

If your JSON data contains mojibake like this, you can convert it to proper Unicode by encoding the string as Latin-1, then decoding the resulting bytes as UTF-8. This reverses the process that produced the mojibake. (The fact that the strings come from JSON is inconsequential; this works for any mojibake strings of this type.)

>>> s = "Wroc\u00c5\u0082aw"
>>> s.encode('latin-1').decode('utf-8')
'Wrocław'

In the general case, you have to reverse-engineer what produced the mojibake, but this particular case is easy to identify and troubleshoot, because the Latin-1 encoding in particular is obvious and transparent (every byte is encoded exactly as itself).
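If only some of the strings are affected, a guarded version of the same round trip may be handy. This is a sketch, not a general repair tool: fix_mojibake is a hypothetical helper name, and it only handles the UTF-8-read-as-Latin-1 case described above.

```python
def fix_mojibake(text):
    """Undo UTF-8-decoded-as-Latin-1 mojibake; return text unchanged otherwise."""
    try:
        # Latin-1 maps each character back to the original byte,
        # and those bytes are then decoded as the UTF-8 they really were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (so it was not
        # produced this way), or the bytes are not valid UTF-8.
        return text


repaired = fix_mojibake("Wroc\u00c5\u0082aw")
untouched = fix_mojibake("Wrocław")
```

Strings that were already correct fail one of the two steps and are returned as they were. Pure ASCII text passes both steps and comes back identical.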

String to JSONObject and back to String without losing UTF-8 encoding

You are seeing Unicode escape sequences because of how toString() is implemented. It is probably implemented this way to make it clear which characters are in the string, which makes your code easier to debug, because different code points can sometimes look very similar.

The actual strings are still unescaped. Printing individual strings in the array will not show escape sequences:

System.out.println(resultJSON.getJSONObject("result").getJSONObject("opening_hours").getJSONArray("weekday_text").getString(0));
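The same effect can be shown with Python's json module (an analogy, not the org.json implementation): the serialized text may contain \u escapes, yet the string you get back after parsing is the original one. The sample data is invented.

```python
import json

original = {"weekday_text": ["miércoles: Cerrado"]}  # hypothetical data

# By default json.dumps escapes every non-ASCII character as \uXXXX.
escaped_text = json.dumps(original)

# Parsing the escaped text yields the unescaped string again.
round_tripped = json.loads(escaped_text)
first_entry = round_tripped["weekday_text"][0]
```

The escapes are only a property of the serialized form; no information is lost.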

JSON.stringify() to UTF-8

JavaScript engines are allowed to use either UCS-2 or UTF-16.

So, yes, JSON.stringify() will return a string in whatever encoding your implementation uses for strings. If you were to find a way to change that encoding within the context of your script, it would no longer be a valid JavaScript string.

For serialising it over a network, though, I would expect it to automatically be transcoded into the character set of the HTTP request (assuming you're talking about HTTP). So if you send it via HTTP POST with a character set of UTF-8, your browser should transparently handle the transcoding of that data before it is sent.

Otherwise browsers would really struggle with character set handling.
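The distinction between the string a serializer returns and the bytes that travel over the network can be illustrated in Python (chosen here instead of JavaScript; the payload is made up). The string has no wire encoding until one is picked at send time.

```python
import json

# Serializing produces a text string, not bytes.
text = json.dumps({"price": "5 €"}, ensure_ascii=False)

# An encoding is only chosen when the text is turned into bytes to send.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

char_count = len(text)
utf8_size = len(utf8_bytes)
utf16_size = len(utf16_bytes)
```

The same 16-character string occupies 18 bytes as UTF-8 and 32 bytes as UTF-16, so the in-memory representation and the transmitted representation are separate concerns.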


