How to Use the 'JSON' Module to Read in One JSON Object at a Time

How do I use the 'json' module to read in one JSON object at a time?

Generally speaking, putting more than one JSON object into a file makes that file invalid, broken JSON. That said, you can still parse data in chunks using the JSONDecoder.raw_decode() method.

The following will yield complete objects as the parser finds them:

from json import JSONDecoder
from functools import partial

def json_parse(fileobj, decoder=JSONDecoder(), buffersize=2048):
buffer = ''
for chunk in iter(partial(fileobj.read, buffersize), ''):
buffer += chunk
while buffer:
try:
result, index = decoder.raw_decode(buffer)
yield result
buffer = buffer[index:].lstrip()
except ValueError:
# Not enough data to decode, read more
break

This function will read chunks from the given file object in buffersize chunks, and have the decoder object parse whole JSON objects from the buffer. Each parsed object is yielded to the caller.

Use it like this:

with open('yourfilename', 'r') as infh:
for data in json_parse(infh):
# process object

Use this only if your JSON objects are written to a file back-to-back, with no newlines in between. If you do have newlines, and each JSON object is limited to a single line, you have a JSON Lines document, in which case you can use Loading and parsing a JSON file with multiple JSON objects in Python instead.

Loading and parsing a JSON file with multiple JSON objects

You have a JSON Lines format text file. You need to parse your file line by line:

import json

data = []
with open('file') as f:
for line in f:
data.append(json.loads(line))

Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.

Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.

If you have a file containing individual JSON objects with delimiters in-between, use How do I use the 'json' module to read in one JSON object at a time? to parse out individual objects using a buffered method.

How do I dump and load multiple python objects into and from a json file?

You'd generally write one JSON object to a file; that object can contain your other objects:

json_data = {
'p_id': p_id,
'word_list': word_list,
# ...
}
with open('data.json', 'w') as fp:
json.dump(json_data, fp, sort_keys=True, indent=4)

Now all you have to do is read that one object and address the values by the same keys.

If you must write multiple JSON documents, avoid using newlines so you can read the file line by line, as parsing the file one JSON object at a time is a lot more involved.

How to extract multiple JSON objects from one file?

Use a json array, in the format:

[
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
"Code":[{"event1":"A","result":"1"},…]},
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
"Code":[{"event1":"B","result":"1"},…]},
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
"Code":[{"event1":"B","result":"0"},…]},
...
]

Then import it into your python code

import json

with open('file.json') as json_file:

data = json.load(json_file)

Now the content of data is an array with dictionaries representing each of the elements.

You can access it easily, i.e:

data[0]["ID"]

how to read array of json objects from json file in python

Open the file with any text editor. Add a [ at the very beginning of the file, and a ] at the very end. This will transform the data you have into an actual valid JSON array.

Then use the json module to work with it.

import json
arr = json.loads("example.json")
# Do nifty stuff with resulting array.

How I can I lazily read multiple JSON values from a file/stream in Python?

Here's a much, much simpler solution. The secret is to try, fail, and use the information in the exception to parse correctly. The only limitation is the file must be seekable.

def stream_read_json(fn):
import json
start_pos = 0
with open(fn, 'r') as f:
while True:
try:
obj = json.load(f)
yield obj
return
except json.JSONDecodeError as e:
f.seek(start_pos)
json_str = f.read(e.pos)
obj = json.loads(json_str)
start_pos += e.pos
yield obj

Edit: just noticed that this will only work for Python >=3.5. For earlier, failures return a ValueError, and you have to parse out the position from the string, e.g.

def stream_read_json(fn):
import json
import re
start_pos = 0
with open(fn, 'r') as f:
while True:
try:
obj = json.load(f)
yield obj
return
except ValueError as e:
f.seek(start_pos)
end_pos = int(re.match('Extra data: line \d+ column \d+ .*\(char (\d+).*\)',
e.args[0]).groups()[0])
json_str = f.read(end_pos)
obj = json.loads(json_str)
start_pos += end_pos
yield obj


Related Topics



Leave a reply



Submit