Python JSON Parser Allow Duplicate Keys

Python json parser allow duplicate keys

You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.

However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:

from json import JSONDecoder

def parse_object_pairs(pairs):
return pairs

data = """
{"foo": {"baz": 42}, "foo": 7}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print obj

Output:

[(u'foo', [(u'baz', 42)]), (u'foo', 7)]

How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.

So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.


Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:

from collections import OrderedDict
from json import JSONDecoder

def make_unique(key, dct):
counter = 0
unique_key = key

while unique_key in dct:
counter += 1
unique_key = '{}_{}'.format(key, counter)
return unique_key

def parse_object_pairs(pairs):
dct = OrderedDict()
for key, value in pairs:
if key in dct:
key = make_unique(key, dct)
dct[key] = value

return dct

data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print obj

Output:

OrderedDict([(u'foo', OrderedDict([(u'baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])

The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n where n is an incremental counter - just adapt it to your needs.

Because the object_pairs_hook receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict, I included that as well.

Dealing with JSON with duplicate keys

You can't have duplicate keys. You can change the object to array instead.

[
{
'content': 'stuff',
'timestamp': '123456789'
},
{
'content': 'weird stuff',
'timestamp': '93828492'
}
]

json.loads allows duplicate keys in a dictionary, overwriting the first value

The rfc 4627 for application/json media type recommends unique keys but it doesn't forbid them explicitly:

The names within an object SHOULD be unique.

From rfc 2119:

SHOULD This word, or the adjective "RECOMMENDED", mean that there

may exist valid reasons in particular circumstances to ignore a

particular item, but the full implications must be understood and

carefully weighed before choosing a different course.

import json

def dict_raise_on_duplicates(ordered_pairs):
"""Reject duplicate keys."""
d = {}
for k, v in ordered_pairs:
if k in d:
raise ValueError("duplicate key: %r" % (k,))
else:
d[k] = v
return d

json.loads(raw_post_data, object_pairs_hook=dict_raise_on_duplicates)
# -> ValueError: duplicate key: u'1'

Python3 JSON parse with duplicate keys

clean_dict = {x['name_space']: x['value'] for x in jsonObj['data']} would give you the following dict:

{'name': 'Angelina', 'surname': 'Jolie', 'year': '1975'}

That code uses the x['name_space'] value as your new keys in the dict.

Then you should be able to print it however you'd like, such as print(clean_dict.values()).

Does JSON syntax allow duplicate keys in an object?

From the standard (p. ii):

It is expected that other standards will refer to this one, strictly adhering to the JSON text format, while
imposing restrictions on various encoding details. Such standards may require specific behaviours. JSON
itself specifies no behaviour.

Further down in the standard (p. 2), the specification for a JSON object:

An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs.
A name is a string. A single colon token follows each name, separating the name from the value. A single
comma token separates a value from a following name.

Diagram for JSON Object

It does not make any mention of duplicate keys being invalid or valid, so according to the specification I would safely assume that means they are allowed.

That most implementations of JSON libraries do not accept duplicate keys does not conflict with the standard, because of the first quote.

Here are two examples related to the C++ standard library. When deserializing some JSON object into a std::map it would make sense to refuse duplicate keys. But when deserializing some JSON object into a std::multimap it would make sense to accept duplicate keys as normal.

load duplicate keys from nested json file as dictionary of list

You can use a custom duplicate key resolver function that turns the values of the value keys into a list:

def value_resolver(pairs):
if all(k == 'value' for k, _ in pairs):
return [v for _, v in pairs]
return dict(pairs)

so that:

json.load(f, object_pairs_hook=value_resolver)

returns:

{'details': {'hawk_branch': {'tandem': ['4210bnd72']}, 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}

And to dump the new data structure back to the original JSON format by converting lists to dicts with duplicate value keys, you can use a custom json.JSONEncoder subclass:

class restore_value(json.JSONEncoder):
def encode(self, o):
if isinstance(o, dict):
return '{%s}' % ', '.join(': '.join((json.encoder.py_encode_basestring(k), self.encode(v))) for k, v in o.items())
if isinstance(o, list):
return '{%s}' % ', '.join('"value": %s' % self.encode(v) for v in o)
return super().encode(o)

so that:

d = {'details': {'hawk_branch': {'tandem': ['4210bnd72']}, 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}
print(json.dumps(d, cls=restore_value))

would output:

{"details": {"hawk_branch": {"tandem": {"value": "4210bnd72"}}, "uclif_branch": {"tandem": {"value": "e2nc712nma89", "value": "23s24212", "value": "12338cm82"}}}}



Related Topics



Leave a reply



Submit