How do I use the 'json' module to read in one JSON object at a time?
Generally speaking, putting more than one JSON object into a file makes that file invalid, broken JSON. That said, you can still parse data in chunks using the JSONDecoder.raw_decode() method.
The following will yield complete objects as the parser finds them:
from json import JSONDecoder
from functools import partial

def json_parse(fileobj, decoder=JSONDecoder(), buffersize=2048):
    buffer = ''
    for chunk in iter(partial(fileobj.read, buffersize), ''):
        buffer += chunk
        while buffer:
            try:
                result, index = decoder.raw_decode(buffer)
                yield result
                buffer = buffer[index:].lstrip()
            except ValueError:
                # Not enough data to decode, read more
                break
This function reads from the given file object in buffersize chunks and has the decoder object parse whole JSON objects out of the growing buffer. Each parsed object is yielded to the caller.
Use it like this:
with open('yourfilename', 'r') as infh:
    for data in json_parse(infh):
        # process object
        pass
Use this only if your JSON objects are written to a file back-to-back, with no newlines in between. If you do have newlines, and each JSON object is limited to a single line, you have a JSON Lines document, in which case you can use Loading and parsing a JSON file with multiple JSON objects in Python instead.
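To get a feel for what raw_decode() is doing under the hood, it can also be driven directly over an in-memory string; a minimal sketch with made-up sample objects:

```python
import json

# Three back-to-back JSON objects in one string (hypothetical sample data)
blob = '{"a": 1}{"b": 2}  {"c": 3}'

decoder = json.JSONDecoder()
pos, objs = 0, []
while pos < len(blob):
    # raw_decode returns the parsed object and the index where it stopped
    obj, pos = decoder.raw_decode(blob, pos)
    objs.append(obj)
    # raw_decode does not skip leading whitespace, so consume it ourselves
    while pos < len(blob) and blob[pos].isspace():
        pos += 1

print(objs)  # [{'a': 1}, {'b': 2}, {'c': 3}]
```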
Loading and parsing a JSON file with multiple JSON objects
You have a JSON Lines format text file. You need to parse your file line by line:
import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))
Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.
Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.
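A minimal sketch of the per-line approach, using io.StringIO to stand in for the file and made-up objects:

```python
import json
import io

# Hypothetical JSON Lines content, one object per line
jl = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')

ids = []
for line in jl:
    line = line.strip()
    if line:                         # skip blank lines, if any
        ids.append(json.loads(line)["id"])

print(ids)  # [1, 2, 3]
```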
If you have a file containing individual JSON objects with delimiters in-between, use How do I use the 'json' module to read in one JSON object at a time? to parse out individual objects using a buffered method.
How do I dump and load multiple Python objects into and from a JSON file?
You'd generally write one JSON object to a file; that object can contain your other objects:
json_data = {
    'p_id': p_id,
    'word_list': word_list,
    # ...
}
with open('data.json', 'w') as fp:
    json.dump(json_data, fp, sort_keys=True, indent=4)
Now all you have to do is read that one object and address the values by the same keys.
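A round-trip sketch of the single-object approach, with made-up values standing in for p_id and word_list and io.StringIO standing in for the file on disk:

```python
import json
import io

# Hypothetical values for the keys shown above
json_data = {'p_id': 42, 'word_list': ['foo', 'bar']}

buf = io.StringIO()                  # stands in for open('data.json', 'w')
json.dump(json_data, buf, sort_keys=True, indent=4)

buf.seek(0)                          # rewind, as if reopening the file
loaded = json.load(buf)
print(loaded['word_list'])  # ['foo', 'bar']
```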
If you must write multiple JSON documents to one file, write each document on its own line and avoid newlines within each document (so don't use indent). That way you can read the file back line by line; parsing the file one JSON object at a time is a lot more involved.
How to extract multiple JSON objects from one file?
Use a JSON array, in the format:
[
    {"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes",
     "Code": [{"event1": "A", "result": "1"},…]},
    {"ID": "1A35B", "Timestamp": "20140102", "Usefulness": "No",
     "Code": [{"event1": "B", "result": "1"},…]},
    {"ID": "AA356", "Timestamp": "20140103", "Usefulness": "No",
     "Code": [{"event1": "B", "result": "0"},…]},
    ...
]
Then load it in your Python code:
import json
with open('file.json') as json_file:
    data = json.load(json_file)
Now data is a list of dictionaries, one per element. You can access them easily, e.g.:
data[0]["ID"]
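For illustration, a self-contained sketch with a made-up two-element array in the shape shown above:

```python
import json

# Hypothetical array matching the format above (Code lists omitted)
data = json.loads('[{"ID": "12345", "Usefulness": "Yes"},'
                  ' {"ID": "1A35B", "Usefulness": "No"}]')

print(data[0]["ID"])  # 12345

# Filter across all elements just like any list of dicts
useful = [d["ID"] for d in data if d["Usefulness"] == "Yes"]
print(useful)  # ['12345']
```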
How to read an array of JSON objects from a JSON file in Python
Open the file with any text editor. Add a [ at the very beginning of the file and a ] at the very end. Assuming the objects are already separated by commas, this transforms the data you have into an actual valid JSON array.
Then use the json module to work with it.
import json

with open("example.json") as f:
    arr = json.load(f)
# Do nifty stuff with resulting array.
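If you'd rather not edit the file by hand, the same wrapping can be done in code; a sketch assuming the objects in the file are already comma-separated:

```python
import json

# Hypothetical file contents read as text: objects separated by commas,
# but with no enclosing brackets
raw = '{"a": 1},\n{"b": 2}'

# Wrap in brackets to form a valid JSON array, then parse
arr = json.loads('[' + raw + ']')
print(arr)  # [{'a': 1}, {'b': 2}]
```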
How can I lazily read multiple JSON values from a file/stream in Python?
Here's a much, much simpler solution. The secret is to try, fail, and use the information in the exception to parse correctly. The only limitation is the file must be seekable.
def stream_read_json(fn):
    import json
    start_pos = 0
    with open(fn, 'r') as f:
        while True:
            try:
                obj = json.load(f)
                yield obj
                return
            except json.JSONDecodeError as e:
                f.seek(start_pos)
                json_str = f.read(e.pos)
                obj = json.loads(json_str)
                start_pos += e.pos
                yield obj
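The trick works because JSONDecodeError.pos points at the first character the parser could not consume; a quick check:

```python
import json

# Two back-to-back objects: loads() parses the first, then complains
try:
    json.loads('{"a": 1} {"b": 2}')
except json.JSONDecodeError as e:
    leftover_at = e.pos   # index where the extra data begins

print(leftover_at)  # 9
```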
Edit: just noticed that this will only work for Python >= 3.5. On earlier versions, failures raise a plain ValueError, and you have to parse the position out of the exception message, e.g.
def stream_read_json(fn):
    import json
    import re
    start_pos = 0
    with open(fn, 'r') as f:
        while True:
            try:
                obj = json.load(f)
                yield obj
                return
            except ValueError as e:
                f.seek(start_pos)
                end_pos = int(re.match(
                    r'Extra data: line \d+ column \d+ .*\(char (\d+).*\)',
                    e.args[0]).groups()[0])
                json_str = f.read(end_pos)
                obj = json.loads(json_str)
                start_pos += end_pos
                yield obj