How to decode json string as UTF-8?
Just an aside first: UTF-8 is typically an external format, usually represented as an array of bytes. It's what you might send over the network as part of an HTTP response. Internally, Dart stores strings as sequences of UTF-16 code units. The utf8 encoder/decoder converts between internal-format strings and external-format arrays of bytes.
This is why you are using utf8.decode(response.bodyBytes): it takes the raw body bytes and converts them to an internal string. (response.body basically does this too, but it chooses the bytes->string decoder based on the charset in the response's Content-Type header. When that charset is missing (as it often is), the http package falls back to Latin-1, which doesn't work if the response is actually in a different charset.) By using utf8.decode yourself, you are overriding the (potentially wrong) choice made by http, because you know that this particular server always sends UTF-8. (It may not, of course!)
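The effect of picking the wrong decoder is easy to reproduce outside Dart. The following is a minimal Python sketch, not the http package's actual logic; the sample body and the name body_bytes are made up for illustration. It decodes the same UTF-8 bytes once as Latin-1 (the fallback described above) and once as UTF-8.

```python
import json

# Hypothetical response body: JSON text encoded as UTF-8 on the wire.
body_bytes = '{"sender_name": "Horníková"}'.encode("utf-8")

# Decoding with the wrong charset succeeds but yields mojibake,
# because every byte is a valid Latin-1 character.
wrong = json.loads(body_bytes.decode("latin-1"))["sender_name"]

# Decoding with the charset the server actually used gives the real text.
right = json.loads(body_bytes.decode("utf-8"))["sender_name"]

print(repr(wrong))
print(repr(right))
print(len(wrong), len(right))
```

Note that the wrong decode does not raise an error; it just produces a longer, garbled string, which is why this kind of bug tends to go unnoticed until the text is displayed.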
Another aside: setting a content type header on a request is rarely useful. You typically aren't sending any content - so it doesn't have a type! It also doesn't influence the content type or charset that the server sends back to you. The Accept header might be what you are looking for. That's a hint to the server about what type of content you'd like back - but not all servers respect it.
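As a rough illustration of the difference, here is a Python sketch that builds (but does not send) a GET request carrying an Accept header and no Content-Type. The URL is a placeholder, not a real endpoint.

```python
import urllib.request

# Hypothetical endpoint; nothing is sent until urlopen() is called.
req = urllib.request.Request(
    "https://example.invalid/api/items",
    headers={"Accept": "application/json"},
)

print(req.get_method())
print(req.get_header("Accept"))
```

The request has no body, so there is nothing for a Content-Type header to describe; Accept only expresses a preference that the server is free to ignore.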
So why are your special characters still incorrect? Try printing utf8.decode(response.bodyBytes) before JSON-decoding it. Does it look right in the console? (It is very useful to create a simple Dart command-line application for this type of issue; I find it easier to set breakpoints and inspect variables in a simple ten-line Dart app.) Try using something like Wireshark to capture the bytes on the wire (again, useful to have the simple Dart app for this). Or try using Postman to send the same request and inspect the response.
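The same kind of inspection can be sketched in Python. The helper below is hypothetical (describe_body is not part of any library); it assumes you already have the raw body bytes and simply reports their hex form and whether they are valid UTF-8, before any JSON parsing happens.

```python
def describe_body(body: bytes) -> dict:
    """Summarise raw bytes so encoding problems are visible before JSON parsing."""
    try:
        text = body.decode("utf-8")  # strict: raises on invalid UTF-8
        valid = True
    except UnicodeDecodeError:
        text = None
        valid = False
    return {"hex": body.hex(" "), "valid_utf8": valid, "text": text}


good = "é".encode("utf-8")    # two bytes: c3 a9
bad = "é".encode("latin-1")   # one byte: e9, not valid UTF-8 on its own

print(describe_body(good))
print(describe_body(bad))
```

If the bytes are valid UTF-8 and the decoded text looks right at this stage, the problem is more likely in how the text is displayed than in how it is decoded.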
How are you trying to show the characters? It may simply be that the font you are using doesn't have them.
Decode UTF-8 encoding in JSON string
Your text is already encoded, and you would normally tell Python this by using a b prefix on the string literal. But since you're using json and the input has to be a string, you have to decode the encoded text manually. Since your input is not bytes, you can use the 'raw_unicode_escape' encoding to convert the string to bytes without re-encoding it, and to prevent open() from applying its own default encoding. Then you can simply use the aforementioned approach to get the desired result.
Note that since you need to do the encoding and decoding yourself, you have to read the file content and perform the encoding on the loaded string, so you should use json.loads() instead of json.load().
In [168]: with open('test.json', encoding='raw_unicode_escape') as f:
     ...:     d = json.loads(f.read().encode('raw_unicode_escape').decode())
     ...:
In [169]: d
Out[169]: {'sender_name': 'Horníková'}
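To see what each step does without needing a file, here is a self-contained sketch of the same round trip. It assumes the file stores the UTF-8 bytes of each character as \u00XX escapes; the raw string below is a made-up stand-in for such a file's content.

```python
import json

# Assumed file content: UTF-8 bytes written out as \u00XX escapes.
raw = r'{"sender_name": "Horn\u00c3\u00adkov\u00c3\u00a1"}'

# Step 1: what reading with encoding='raw_unicode_escape' produces.
# Each \u00XX escape becomes the single character U+00XX.
text = raw.encode("ascii").decode("raw_unicode_escape")

# Step 2: map each of those characters back to one byte,
# then decode the resulting byte string as UTF-8.
fixed = text.encode("raw_unicode_escape").decode("utf-8")

d = json.loads(fixed)
print(d)  # {'sender_name': 'Horníková'}
```

One caveat of this approach: because the escapes are expanded before the JSON parser sees them, an escaped quote or backslash inside a string value could produce invalid JSON, so it is best treated as a repair for this specific kind of file.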
How to ensure that the JSON string is UTF-8 encoded in Java
You need to set the character encoding for OutputStreamWriter when you create it:
httpConn.connect();
wr = new OutputStreamWriter(httpConn.getOutputStream(), StandardCharsets.UTF_8);
wr.write(jsonObject.toString());
wr.flush();
Otherwise it defaults to the "platform default encoding," which is some encoding that has been used historically for text files on whatever system you are running.
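The same principle applies in other languages. As a hedged illustration in Python (not a translation of the Java code above), the sketch below wraps a byte stream in a text writer with an explicitly named encoding, and then shows how the bytes would differ under a legacy Windows code page. The payload is made-up sample data.

```python
import io
import json

payload = {"city": "Zürich"}  # made-up sample data

# ensure_ascii=False keeps "ü" as a real character instead of a \u00fc escape,
# so the chosen encoding actually matters.
text = json.dumps(payload, ensure_ascii=False)

buffer = io.BytesIO()
# Name the encoding explicitly rather than relying on a default.
writer = io.TextIOWrapper(buffer, encoding="utf-8")
writer.write(text)
writer.flush()

utf8_bytes = buffer.getvalue()
print(utf8_bytes)
print(text.encode("cp1252"))
```

The two printed byte strings differ only at the "ü", which is exactly the kind of difference a receiver expecting UTF-8 would trip over.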
encode string from json utf-8
Assuming you are running this code on Windows, the problem is two-fold:
- You are not telling TStringList.LoadFromFile() what the encoding of the file is. So, unless the file begins with a UTF-8 BOM (which is unlikely with a JSON file), it will be decoded as ANSI, not as UTF-8, thus corrupting any non-ASCII characters.
- You are converting the decoded text back into bytes without specifying an encoding. The overload of ParseJSONValue() you are using expects UTF-8 encoded bytes, but BytesOf() will encode to ANSI, not to UTF-8, thus corrupting non-ASCII characters even further.
That is why you are getting garbage text from the JSON.
There are other problems with your code, too. Namely, a memory leak and a double-free, due to you mismanaging the initial TJSONObject.
Try this instead.
procedure TForm1.jsonTest;
var
  JSONData, JSON: TJSONObject;
  jArr: TJSONArray;
  s: TStringList;
  i, j: Integer;
  jValue: TJSONValue;
  data: string;
begin
  // Decode the file explicitly as UTF-8 instead of the ANSI default.
  s := TStringList.Create;
  try
    s.LoadFromFile('clientOrders.json', TEncoding.UTF8);
    data := s.Text;
  finally
    s.Free;
  end;
  { Alternatively:
  data := IOUtils.TFile.ReadAllText('clientOrders.json', TEncoding.UTF8);
  }
  // This overload expects UTF-8 bytes, so encode explicitly as UTF-8.
  jValue := TJSONObject.ParseJSONValue(TEncoding.UTF8.GetBytes(data), 0);
  if jValue = nil then
    raise Exception.Create('This is not a JSON');
  try
    JSON := jValue as TJSONObject;
    jArr := JSON.Get(0).JsonValue as TJSONArray;
    for i := 0 to jArr.Size - 1 do
    begin
      JSONData := jArr.Get(i) as TJSONObject;
      for j := 0 to JSONData.Size - 1 do
      begin
        ShowMessage(JSONData.Get(j).JsonValue.ToString);
      end;
    end;
  finally
    // jValue owns the whole parsed tree; free it exactly once.
    jValue.Free;
  end;
end;
Alternatively, don't decode the file bytes into a string just to convert them back into bytes; just load them as-is into ParseJSONValue(), e.g.:
procedure TForm1.jsonTest;
var
  ...
  jValue: TJSONValue;
  data: TBytesStream;
begin
  data := TBytesStream.Create;
  try
    data.LoadFromFile('clientOrders.json');
    jValue := TJSONObject.ParseJSONValue(data.Bytes, 0);
    ...
  finally
    data.Free;
  end;
end;
Or:
procedure TForm1.jsonTest;
var
  ...
  jValue: TJSONValue;
  data: TBytes;
begin
  data := IOUtils.TFile.ReadAllBytes('clientOrders.json');
  jValue := TJSONObject.ParseJSONValue(data, 0);
  ...
end;
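The "parse the bytes directly" idea is not specific to Delphi. As a comparison, here is a small Python sketch: json.loads() accepts bytes as well as str, so the file can be read in binary mode and handed straight to the parser. The file name is reused from above, but the file content is made-up sample data written to a temporary directory.

```python
import json
import os
import tempfile

# Write a small UTF-8 JSON file to a temporary location (sample data is made up).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clientOrders.json")
    with open(path, "wb") as f:
        f.write('{"orders": [{"name": "Horníková"}]}'.encode("utf-8"))

    # Read raw bytes and parse them directly; no intermediate str is created,
    # so no default text encoding gets a chance to corrupt the data.
    with open(path, "rb") as f:
        data = json.loads(f.read())

print(data["orders"][0]["name"])
```

Skipping the intermediate string removes one decode/encode round trip, and with it one place where the wrong encoding could be applied.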
PHP json_encode json_decode UTF-8
This is an encoding issue. It looks like at some point, the data gets represented as ISO-8859-1.
Every part of your process needs to be UTF-8 encoded:
- The database connection
- The database tables
- Your PHP file (if you are using special characters inside that file as shown in your example above)
- The content-type headers that you output
Not able to decode UTF-8 encoded json text with Cpanel::JSON::XS
decode_json expects UTF-8, but you are providing decoded text (a string of Unicode Code Points).
Use
use utf8;
use Encode qw( encode_utf8 );
my $json_utf8 = encode_utf8( '{ "title": "Outlining — How to outline" }' );
my $data = decode_json( $json_utf8 );
or
use utf8;
my $json_utf8 = do { no utf8; '{ "title": "Outlining — How to outline" }' };
my $data = decode_json( $json_utf8 );
or
use utf8;
my $json_ucp = '{ "title": "Outlining — How to outline" }';
my $data = Cpanel::JSON::XS->new->decode( $json_ucp ); # Implied: ->utf8(0)
(The middle one seems hackish to me. The first one might be used if you get data from multiple sources, and the others provide it encoded.)