How to Parse Somewhat Wrong JSON with Python

How to parse somewhat wrong JSON with Python?

Since YAML (>=1.2) is a superset of JSON, you can parse the string with a YAML parser, which also accepts unquoted keys:

>>> import yaml
>>> s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'
>>> yaml.safe_load(s)
{'value': '82363549923gnyh49c9djl239pjm01223', 'id': 17893}

How to decode an invalid json string in python

Use the demjson module, which can decode in non-strict mode.

In [1]: import demjson
In [2]: demjson.decode('{ hotel: { id: "123", name: "hotel_name"} }')
Out[2]: {u'hotel': {u'id': u'123', u'name': u'hotel_name'}}

What's the most Pythonic way to parse out this value from a JSON-like blob?

The returned JSON string is invalid because the last item of the dictionary ends with a trailing comma, which the json module cannot parse.

": [\n  "RS256"\n ],\n}'
^^^

But ast.literal_eval can parse it, because Python syntax accepts lists and dicts that end with a trailing comma. This works, and is Pythonic, as long as the data contains no booleans or null values: JSON's true, false and null are not Python literals.

>>> ast.literal_eval(response.text)["jwks_uri"]
'https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt@system.gserviceaccount.com'
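A minimal sketch of the difference, using a made-up payload. The blob string and its keys are illustrative assumptions, not the real response. It shows json.loads rejecting the trailing comma, ast.literal_eval accepting it, and literal_eval failing once a JSON-only literal such as true appears:

```python
import ast
import json

# Hypothetical JSON-like text with a trailing comma, standing in for response.text.
blob = '{"issuer": "example", "algs": [\n  "RS256"\n ],\n}'

# The strict JSON parser rejects the trailing comma.
try:
    json.loads(blob)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# Python literal syntax allows trailing commas, so this succeeds.
parsed = ast.literal_eval(blob)

# JSON's true/false/null are not Python literals, so this fails.
try:
    ast.literal_eval('{"enabled": true,}')
    literal_ok = True
except ValueError:
    literal_ok = False

print(json_ok, parsed["algs"], literal_ok)
```

So literal_eval is only a workaround for payloads you know are free of true, false and null.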

HTTP requests and JSON parsing in Python

I recommend using the awesome requests library:

import requests

url = 'http://maps.googleapis.com/maps/api/directions/json'

params = dict(
    origin='Chicago,IL',
    destination='Los+Angeles,CA',
    waypoints='Joplin,MO|Oklahoma+City,OK',
    sensor='false'
)

resp = requests.get(url=url, params=params)
data = resp.json() # Check the JSON Response Content documentation below

JSON Response Content: https://requests.readthedocs.io/en/master/user/quickstart/#json-response-content

Skipping broken jsons python

Update

You're getting an "expected string or buffer" error: you need to use row[0], because each result row is a 1-tuple and you want its first and only column.

If you did want to check for bad json

You can put a try/except around it:

for row in dataJSON:
    try:
        jsonparse = json.loads(row)
    except Exception as e:
        pass

Now, instead of catching Exception as above, catch the specific exception type that is being raised, so that you don't swallow errors unrelated to JSON loading. (It's probably ValueError.)
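As a sketch of that narrower handler: in Python 3.5+ the json module raises json.JSONDecodeError, which is a subclass of ValueError. The parse_rows helper and the sample rows below are hypothetical names used only for illustration:

```python
import json

def parse_rows(rows):
    """Parse each row as JSON, collecting rows that are not valid JSON."""
    parsed, skipped = [], []
    for row in rows:
        try:
            parsed.append(json.loads(row))
        except json.JSONDecodeError:  # subclass of ValueError
            skipped.append(row)
    return parsed, skipped

# Hypothetical sample data: the middle row is broken.
rows = ['{"a": 1}', '{broken', '[1, 2]']
good, bad = parse_rows(rows)
print(good)
print(bad)
```

Because only JSONDecodeError is caught, a TypeError from passing a non-string row (such as a whole tuple) still surfaces instead of being silently skipped.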

Parsing incomplete json array

If your data will always look somewhat similar, you could do something like this:

import json

json_string = """[{
"first": "bob",
"address": {
"street": 13301,
"zip": 1920
}
}, {
"first": "sarah",
"address": {
"street": 13301,
"zip": 1920
}
}, {"first" : "tom"
"""

while True:
    if not json_string:
        raise ValueError("Couldn't fix JSON")
    try:
        data = json.loads(json_string + "]")
    except json.decoder.JSONDecodeError:
        json_string = json_string[:-1]
        continue
    break

print(data)

This assumes that the data is a list of dicts. Step by step, the last character is removed and a missing ] appended. If the new string can be interpreted as JSON, the infinite loop breaks. Otherwise the next character is removed and so on. If there are no characters left ValueError("Couldn't fix JSON") is raised.

For the above example, it prints:

[{'first': 'bob', 'address': {'zip': 1920, 'street': 13301}}, {'first': 'sarah', 'address': {'zip': 1920, 'street': 13301}}]
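The same trimming loop can be wrapped in a reusable function. This is only a sketch under the same assumption (the input is a truncated JSON array of dicts); the name load_truncated_array and the sample input are made up for illustration:

```python
import json

def load_truncated_array(text):
    """Drop trailing characters until text + ']' parses as JSON."""
    while text:
        try:
            return json.loads(text + "]")
        except json.JSONDecodeError:
            text = text[:-1]  # remove one character and retry
    raise ValueError("Couldn't fix JSON")

# Hypothetical truncated input: the third record is cut off mid-string.
truncated = '[{"first": "bob"}, {"first": "sarah"}, {"first": "to'
records = load_truncated_array(truncated)
print(records)
```

Each attempt re-parses the whole string, so the cost grows with both the input size and the number of characters that have to be removed. For large inputs a different approach may be needed.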

How to convert JSON data into a Python object?

You could try this:

class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

import json
j = json.loads(your_json)
u = User(**j)

Just create a new object and pass the parsed dict as keyword arguments.
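The snippet above leaves your_json undefined. A self-contained sketch follows, with a made-up input string (the values are assumptions for illustration). It also shows the main limitation of this approach: the JSON keys have to match the __init__ parameter names exactly.

```python
import json

class User:
    def __init__(self, name, username):
        self.name = name
        self.username = username

# Hypothetical input; its keys must match the __init__ parameter names.
your_json = '{"name": "Cristian", "username": "cris"}'
u = User(**json.loads(your_json))
print(u.name, u.username)

# An extra key has no matching parameter, so the call raises TypeError.
try:
    User(**json.loads('{"name": "Cristian", "username": "cris", "age": 30}'))
    extra_key_ok = True
except TypeError:
    extra_key_ok = False
print(extra_key_ok)
```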


You can handle JSON with nested objects too:

import json

class Address(object):
    def __init__(self, street, number):
        self.street = street
        self.number = number

    def __str__(self):
        return "{0} {1}".format(self.street, self.number)

class User(object):
    def __init__(self, name, address):
        self.name = name
        # Build the nested Address from the inner dict
        self.address = Address(**address)

    def __str__(self):
        return "{0}, {1}".format(self.name, self.address)

if __name__ == '__main__':
    js = '''{"name":"Cristian", "address":{"street":"Sesame","number":122}}'''
    j = json.loads(js)
    print(j)
    u = User(**j)
    print(u)

