How to get string objects instead of Unicode from JSON?
A solution with object_hook
[edit]: Updated for Python 2.7 and 3.x compatibility.
import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts=False):
    # if this is a byte string (Python 2) or str (Python 3), it's already what we want
    if isinstance(data, str):
        return data
    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [_byteify(item, ignore_dicts=True) for item in data]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.items()  # .items() works on both Python 2.7 and 3
        }
    # Python 3 compatible duck-typing:
    # if this is a unicode string (Python 2 only), return its UTF-8 encoding
    if str(type(data)) == "<type 'unicode'>":
        return data.encode('utf-8')
    # if it's anything else, return it in its original form
    return data
Example usage:
>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}
How does this work and why would I use it?
Mark Amery's function is shorter and clearer than these ones, so what's the point of them? Why would you want to use them?
Purely for performance. Mark's answer decodes the JSON text fully first with unicode strings, then recurses through the entire decoded value to convert all strings to byte strings. This has a couple of undesirable effects:
- A copy of the entire decoded structure gets created in memory
- If your JSON object is really deeply nested (500 levels or more) then you'll hit Python's maximum recursion depth
This answer mitigates both of those performance issues by using the object_hook parameter of json.load and json.loads. From the docs:

    object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders.

Since dictionaries nested many levels deep in other dictionaries get passed to object_hook as they're decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.

Mark's answer isn't suitable for use as an object_hook as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the ignore_dicts parameter to _byteify, which gets passed to it at all times except when object_hook passes it a new dict to byteify. The ignore_dicts flag tells _byteify to ignore dicts, since they have already been byteified.

Finally, our implementations of json_load_byteified and json_loads_byteified call _byteify (with ignore_dicts=True) on the result returned from json.load or json.loads, to handle the case where the JSON text being decoded doesn't have a dict at the top level.
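The innermost-first ordering that makes ignore_dicts safe is easy to observe with a small spy hook (a minimal sketch; spy_hook and calls are illustrative names, not part of the answer above):

```python
import json

calls = []  # records each dict in the order the decoder hands it to the hook

def spy_hook(obj):
    # object_hook receives every decoded JSON object, innermost first
    calls.append(dict(obj))
    return obj

json.loads('{"outer": {"inner": {"x": 1}}}', object_hook=spy_hook)

# By the time the outer dict reaches the hook, its nested dicts have
# already passed through it - which is why _byteify can skip recursing
# into dict values when used as an object_hook.
print(calls)
```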
Remove unicode string and spaces from json response using python
It is double-JSON-encoded. You can json.loads the response twice to fix it, though it's better to fix the upstream problem if you can:
# From OP's example
>>> response_dict = '"{\\u000d\\u000a \\"SOURCE\\": \\"APPDEV\\",\\u000d\\u000a \\"TIMESTAMP\\": \\"2022-04-19 12:29:27\\",\\u000d\\u000a \\"TAGERRORS\\": []\\u000d\\u000a}"'
>>> print(response_dict) # This is valid JSON
"{\u000d\u000a \"SOURCE\": \"APPDEV\",\u000d\u000a \"TIMESTAMP\": \"2022-04-19 12:29:27\",\u000d\u000a \"TAGERRORS\": []\u000d\u000a}"
>>> json.loads(response_dict)
'{\r\n "SOURCE": "APPDEV",\r\n "TIMESTAMP": "2022-04-19 12:29:27",\r\n "TAGERRORS": []\r\n}'
>>> json.loads(json.loads(response_dict))
{'SOURCE': 'APPDEV', 'TIMESTAMP': '2022-04-19 12:29:27', 'TAGERRORS': []}
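If you can't be sure how many times the payload was encoded, one defensive approach (a sketch, not a library API; loads_nested is a made-up name) is to keep decoding while the result is still a string:

```python
import json

def loads_nested(text, max_depth=5):
    """Decode JSON that may have been JSON-encoded more than once.

    Keeps calling json.loads while the result is still a str that
    parses as JSON, up to max_depth rounds (a safety limit).
    """
    result = text
    for _ in range(max_depth):
        if not isinstance(result, str):
            break
        try:
            result = json.loads(result)
        except ValueError:
            break  # a plain string that isn't JSON: stop here
    return result

double = json.dumps(json.dumps({"SOURCE": "APPDEV"}))
print(loads_nested(double))  # {'SOURCE': 'APPDEV'}
```

Note the inherent ambiguity: a legitimate string value that itself looks like JSON (e.g. "7") would be decoded one round too many, so prefer fixing the producer when possible.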
Python2 json: load using strings instead of unicode
You could supply an object_hook or object_pairs_hook parameter to json.loads().
from pprint import pprint
import json

def str_hook(obj):
    return {k.encode('utf-8') if isinstance(k, unicode) else k:
            v.encode('utf-8') if isinstance(v, unicode) else v
            for k, v in obj}

j = '''{
    "first": 1,
    "second": "two",
    "third": {
        "first": "one",
        "second": null
    }
}'''

d = json.loads(j, object_pairs_hook=str_hook)
pprint(d)
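On Python 3 the unicode type is gone, so that code won't run as-is. A comparable object_pairs_hook that yields byte-string keys and values might look like this (a sketch; whether you actually want bytes on Python 3 is another question):

```python
import json
from pprint import pprint

def bytes_hook(pairs):
    # object_pairs_hook receives a list of (key, value) tuples,
    # so we can rebuild the dict however we like
    return {k.encode('utf-8'): v.encode('utf-8') if isinstance(v, str) else v
            for k, v in pairs}

j = '{"first": 1, "second": "two", "third": null}'
d = json.loads(j, object_pairs_hook=bytes_hook)
pprint(d)  # {b'first': 1, b'second': b'two', b'third': None}
```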
How to display Unicode Smiley from json response dynamically in flutter
The issue can be resolved with the code snippet below.

Client client = Client();
final response = await client.get(Uri.parse('YOUR_API_URL'));
if (response.statusCode == 200) {
  // If the server did return a 200 OK response,
  // then parse the JSON.
  final extractedData = json.decode(response.body.replaceAll("\\\\", "\\"));
}

Here we replace each double backslash with a single backslash and then decode the JSON response before setting it into a Text widget; that way multiple Unicode smileys display correctly. Hope this answer helps others.
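The same idea, shown in Python for clarity: once the doubled backslashes are collapsed, the JSON parser turns \uXXXX escapes (including the surrogate pairs that encode emoji) into real characters. The sample payload here is made up for illustration:

```python
import json

# The server double-escaped the payload: the JSON text literally
# contains \\ud83d\\ude00, which decodes to backslash characters,
# not to the emoji.
raw = '"\\\\ud83d\\\\ude00 Hello"'
print(json.loads(raw))    # \ud83d\ude00 Hello  (literal backslashes)

# Collapse the doubled backslashes first, then decode:
fixed = raw.replace('\\\\', '\\')
print(json.loads(fixed))  # 😀 Hello
```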
How to convert json file contains Unicode to string and save as a json file in python?
It needs the ensure_ascii=False flag in json.dumps():
import json

filename = 'quran.json'  # file we want to compress
newname = filename.replace('.json', '.min.json')  # output file name

with open(filename, encoding="utf8") as fp:
    print("Compressing file: " + filename)
    print('Compressing...')
    jload = json.load(fp)
    newfile = json.dumps(jload, indent=None, separators=(',', ':'), ensure_ascii=False)

# newfile = str.encode(newfile)  # remove this
with open(newname, 'w', encoding="utf8") as f:  # add encoding="utf8"
    f.write(newfile)

print('Compression complete!')
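A minimal before/after, to show exactly what the flag changes:

```python
import json

data = {"name": "القرآن"}

# Default: every non-ASCII character is written as a \uXXXX escape
print(json.dumps(data))
# {"name": "\u0627\u0644\u0642\u0631\u0622\u0646"}

# With ensure_ascii=False the characters are written as-is,
# which is why the output file must be opened with encoding="utf8"
print(json.dumps(data, ensure_ascii=False))
# {"name": "القرآن"}
```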
Can't parse Json text with unicode codes with Python 3.7
I think you need to use 'raw_unicode_escape'.
import json

with open("j.json", encoding='raw_unicode_escape') as f:
    data = json.loads(f.read().encode('raw_unicode_escape').decode())

print(data[0])
OUT: {'timestamp': 1575826804, 'attachments': [], 'data': [{'post': 'This is a test line with character í and ó'}, {'update_timestamp': 1575826804}], 'title': 'My Name'}
Does this help?
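To see what the codec does in isolation (a sketch with made-up data): decoding bytes with raw_unicode_escape turns literal \uXXXX byte sequences into the characters they name, after which the text is ordinary JSON:

```python
import json

# raw_unicode_escape resolves literal backslash-u sequences in bytes:
assert b'caf\\u00e9'.decode('raw_unicode_escape') == 'café'

# Applied to a whole payload:
raw = b'[{"post": "test line with \\u00ed and \\u00f3"}]'
data = json.loads(raw.decode('raw_unicode_escape'))
print(data[0]['post'])  # test line with í and ó
```

In this toy case json.loads alone would also resolve the escapes, since they sit inside a JSON string literal; the codec earns its keep when such sequences appear where the JSON parser would not decode them, or are mixed with bytes that aren't valid UTF-8.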