json.loads allows duplicate keys in a dictionary, overwriting the first value
RFC 4627, which defines the application/json media type, recommends unique keys but does not explicitly forbid duplicates:
The names within an object SHOULD be unique.
From RFC 2119:
SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
import json

def dict_raise_on_duplicates(ordered_pairs):
    """Reject duplicate keys."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            raise ValueError("duplicate key: %r" % (k,))
        else:
            d[k] = v
    return d

json.loads(raw_post_data, object_pairs_hook=dict_raise_on_duplicates)
# -> ValueError: duplicate key: u'1'
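To make the difference between the default behaviour and the hook concrete, here is a small self-contained sketch. The sample string is a made-up input (it is not the raw_post_data from the question), and the output format of the error message assumes Python 3.

```python
import json

def dict_raise_on_duplicates(ordered_pairs):
    """Reject duplicate keys."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            raise ValueError("duplicate key: %r" % (k,))
        d[k] = v
    return d

# Hypothetical input with the key "1" repeated.
sample = '{"1": "first", "1": "second"}'

# Default behaviour: the last value silently wins.
default_result = json.loads(sample)
print(default_result)

# With the hook: the duplicate is reported instead of being overwritten.
try:
    json.loads(sample, object_pairs_hook=dict_raise_on_duplicates)
    rejected = False
    message = ""
except ValueError as exc:
    rejected = True
    message = str(exc)
    print(message)
```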
Dealing with JSON with duplicate keys
You can't rely on duplicate keys: a parsed object keeps only one value per key. You can change the object to an array of objects instead.
[
    {
        "content": "stuff",
        "timestamp": "123456789"
    },
    {
        "content": "weird stuff",
        "timestamp": "93828492"
    }
]
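As a quick check that the array form loses nothing, the following sketch parses the same two entries with Python's json module and reads both back. The variable names are illustrative only.

```python
import json

# The array form from above, written as valid JSON (double quotes).
entries = json.loads('''
[
    {"content": "stuff", "timestamp": "123456789"},
    {"content": "weird stuff", "timestamp": "93828492"}
]
''')

# Both entries survive parsing, in document order.
contents = [entry["content"] for entry in entries]
print(contents)
```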
Make a dict/json from string with duplicate keys Python
Something like the following can be done:
import json

def join_duplicate_keys(ordered_pairs):
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if type(d[k]) == list:
                d[k].append(v)
            else:
                newlist = []
                newlist.append(d[k])
                newlist.append(v)
                d[k] = newlist
        else:
            d[k] = v
    return d

raw_post_data = '{"a":1, "b":{"b1":1,"b2":2}, "b": { "b1":3, "b2":2,"b4":8} }'
newdict = json.loads(raw_post_data, object_pairs_hook=join_duplicate_keys)
print(newdict)
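Please note that the code above depends on the value type check, if type(d[k]) == list. If the original string itself gives a list as the value of a duplicated key, later duplicates are appended to that list rather than grouped alongside it, so some extra handling is required to make the code robust.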
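One possible way to handle that case, sketched here as an assumption rather than as part of the original answer, is to remember which keys have already been merged instead of inspecting the value type. The name join_duplicate_keys_safe and the sample document are hypothetical.

```python
import json

def join_duplicate_keys_safe(ordered_pairs):
    """Group values of repeated keys into a list, tracking which keys repeated."""
    d = {}
    repeated = set()  # keys whose current value is a list built by this hook
    for k, v in ordered_pairs:
        if k not in d:
            d[k] = v
        elif k in repeated:
            d[k].append(v)
        else:
            # First repetition: wrap the earlier value, even if it is a list.
            d[k] = [d[k], v]
            repeated.add(k)
    return d

# Hypothetical input where the first "a" is itself a JSON array.
doc = '{"a": [1, 2], "a": 3, "b": 4}'
result = json.loads(doc, object_pairs_hook=join_duplicate_keys_safe)
print(result)
```

The original list [1, 2] is kept as a single element, so it stays distinguishable from the values that were merged in. A consumer still cannot tell a merged key from a key whose only value was a list without extra bookkeeping.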
Python json parser allow duplicate keys
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
from json import JSONDecoder

def parse_object_pairs(pairs):
    return pairs

data = """
{"foo": {"baz": 42}, "foo": 7}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output:
[(u'foo', [(u'baz', 42)]), (u'foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
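As an illustration of the first option, custom lookup logic over the nested list could look like the sketch below. The helper get_all is a hypothetical name, and returning every match in document order is just one possible policy.

```python
from json import JSONDecoder

def parse_object_pairs(pairs):
    # Return the (key, value) pairs unchanged so duplicates are kept.
    return pairs

def get_all(pairs, key):
    """Return every value stored under key, in document order."""
    return [value for k, value in pairs if k == key]

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode('{"foo": {"baz": 42}, "foo": 7}')

values = get_all(obj, "foo")
print(values)
```

Returning a list makes the ambiguity explicit: the caller decides whether to take the first match, the last one, or all of them.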
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder

def make_unique(key, dct):
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key

def parse_object_pairs(pairs):
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct

data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output:
OrderedDict([(u'foo', OrderedDict([(u'baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n, where n is an incremental counter; adapt it to your needs.
Because the object_pairs_hook receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict, so I included that as well.
JSON convert duplicate entries into array, but recover original order
One workaround I thought of was to generate a unique key for each duplicated entry (anna_1, anna_2, etc.), as suggested here: https://stackoverflow.com/a/29323197/7471760, so that every entry is unique, and then hook the pairs into an OrderedDict.
Another option would be to return the key-value tuples directly from the hook and process them later: https://stackoverflow.com/a/29322077/7471760.
However, it was quite useful for me to keep the array structure, and what suited me most was to use this workaround that keeps the order explicitly in an extra key:
def array_on_duplicates_keep_order(ordered_pairs):
    """Convert duplicate keys to arrays and store order on an extra key."""
    # https://www.semicolonworld.com/question/56998/python-json-parser-allow-duplicate-keys
    # https://stackoverflow.com/questions/14902299/json-loads-allows-duplicate-keys-in-a-dictionary-overwriting-the-first-value
    d = {}
    order = 0
    for k, v in ordered_pairs:
        if type(v) is dict:
            v['o'] = order
        if k in d:
            if type(d[k]) is list:
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
        order += 1
    return d
which produces:
jobj = json.loads(jstring, object_pairs_hook=array_on_duplicates_keep_order)
{'anna': [{'age': 23, 'color': 'green', 'o': 0},
{'age': 41, 'color': 'pink', 'o': 4}],
'john': [{'age': 35, 'color': 'blue', 'o': 1},
{'age': 31, 'color': 'black', 'o': 3}],
'laura': {'age': 32, 'color': 'red', 'o': 2}}
Finally, I can recover the original order of students by using a named tuple and sorting by the order key:
from typing import NamedTuple

class Student(NamedTuple):
    name: str
    age: int
    color: str
    o: int

studentList = []
for k, v in jobj.items():
    if type(v) is not list:
        studentList.append(Student(k, v['age'], v['color'], v['o']))
    else:
        for s in v:
            studentList.append(Student(k, s['age'], s['color'], s['o']))

orderedList = sorted(studentList, key=lambda s: s.o)
This gives me what I wanted, without changing the input and while still using JSON as the intermediate storage format:
studentList
[Student(name='anna', age=23, color='green', o=0),
Student(name='anna', age=41, color='pink', o=4),
Student(name='john', age=35, color='blue', o=1),
Student(name='john', age=31, color='black', o=3),
Student(name='laura', age=32, color='red', o=2)]
orderedList
[Student(name='anna', age=23, color='green', o=0),
Student(name='john', age=35, color='blue', o=1),
Student(name='laura', age=32, color='red', o=2),
Student(name='john', age=31, color='black', o=3),
Student(name='anna', age=41, color='pink', o=4)]
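For anyone who wants to run the whole thing end to end, here is a self-contained sketch. The jstring below is an assumption: the original input was not shown, so it is reconstructed from the output above. The hook uses enumerate instead of a manual counter, and the flattening loop is condensed, but the behaviour is intended to match.

```python
import json
from typing import NamedTuple

def array_on_duplicates_keep_order(ordered_pairs):
    """Convert duplicate keys to arrays and store order on an extra key."""
    d = {}
    for order, (k, v) in enumerate(ordered_pairs):
        if type(v) is dict:
            v['o'] = order  # position of this entry within its parent object
        if k in d:
            if type(d[k]) is list:
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d

class Student(NamedTuple):
    name: str
    age: int
    color: str
    o: int

# Hypothetical input, reconstructed from the output shown above.
jstring = """
{"anna": {"age": 23, "color": "green"},
 "john": {"age": 35, "color": "blue"},
 "laura": {"age": 32, "color": "red"},
 "john": {"age": 31, "color": "black"},
 "anna": {"age": 41, "color": "pink"}}
"""

jobj = json.loads(jstring, object_pairs_hook=array_on_duplicates_keep_order)

# Flatten single entries and arrays of entries into one list of students.
students = []
for name, value in jobj.items():
    records = value if isinstance(value, list) else [value]
    students.extend(Student(name, r['age'], r['color'], r['o']) for r in records)

# Sorting by the extra key restores the order of the JSON document.
ordered = sorted(students, key=lambda s: s.o)
names_in_order = [s.name for s in ordered]
print(names_in_order)
```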