Reading rather large JSON files
The issue here is that JSON, as a format, is generally parsed in full and then handled in-memory, which for such a large amount of data is clearly problematic.
The solution to this is to work with the data as a stream - reading part of the file, working with it, and then repeating.
The best option appears to be using something like ijson - a module that will work with JSON as a stream, rather than as a block file.
Edit: Also worth a look are kashif's comment about json-streamer and Henrik Heino's comment about bigjson.
How to handle huge JSON files?
A JSON file of this size requires too many resources to load at once. As mentioned in @cizario's link, you should use some streaming logic that accesses the JSON objects without storing the whole content of the file in memory.
One library that works in streaming mode is stream-json (a Node.js package): https://www.npmjs.com/package/stream-json
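If you control whatever produces the file, one simple way to get this streaming logic with only the standard library is to store one JSON object per line (the JSON Lines convention) and parse line by line. A sketch, using an in-memory stream to stand in for a large file:

```python
import io
import json

# Hypothetical stream standing in for a large file of newline-delimited
# JSON (JSON Lines); each line is one complete object.
stream = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')

total = 0
for line in stream:  # reads one line at a time, never the whole file
    record = json.loads(line)
    total += record["id"]
print(total)  # 6
```

Memory use is then bounded by the largest single record, not the file size.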
Is there a memory efficient and fast way to load big JSON files?
Update
See the other answers for advice.
Original answer from 2010, now outdated
Short answer: no.
Properly dividing a json file would take intimate knowledge of the json object graph to get right.
However, if you have this knowledge, then you could implement a file-like object that wraps the json file and spits out proper chunks.
For instance, if you know that your json file is a single array of objects, you could create a generator that wraps the json file and returns chunks of the array.
You would have to do some string content parsing to get the chunking of the json file right.
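As a rough stdlib sketch of that chunking idea (operating on a string for brevity; a real file-like wrapper would read the file in buffered chunks), `json.JSONDecoder.raw_decode` can pull one array element at a time:

```python
import json

def iter_array(text):
    """Yield the elements of a JSON string holding a single top-level
    array, decoding one element at a time (assumes well-formed input)."""
    decoder = json.JSONDecoder()
    idx = text.index("[") + 1  # position just past the opening bracket
    while True:
        # Skip whitespace and the commas separating array elements.
        while idx < len(text) and text[idx] in " \t\r\n,":
            idx += 1
        if idx >= len(text) or text[idx] == "]":
            return
        # raw_decode returns the parsed value and the index where it ended.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

chunks = list(iter_array('[{"a": 1}, {"a": 2}, {"a": 3}]'))
print(chunks)  # [{'a': 1}, {'a': 2}, {'a': 3}]
```

Each call to `raw_decode` parses exactly one value and reports where it stopped, which is what makes incremental consumption of a single big array possible without a third-party parser.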
I don't know what generates your json content. If possible, I would consider generating a number of manageable files instead of one huge file.
Parse very large JSON files with dynamic data
If you are going to collect all items into a list anyway (instead of processing them immediately, one by one), using the streaming API does not make much sense. It can be done much more simply:
val response = Klaxon().parseJsonObject(StringReader(testJson))
val result = response["result"]
val items = response.array<JsonObject>("items") ?: JsonArray()
...
Streaming processing is a bit more involved. First of all, you want to make sure that the server response is not read entirely into memory before processing starts (i.e. the parser input should not be a string but an input stream; the details depend on the HTTP client library of your choice). Secondly, you need to provide some kind of callback to process the items as they arrive, e.g.:
import com.beust.klaxon.JsonObject
import com.beust.klaxon.JsonReader
import com.beust.klaxon.Parser
import java.io.ByteArrayInputStream
import java.io.InputStreamReader
import java.io.Reader

fun parse(input: Reader, onResult: (String) -> Unit, onItem: (JsonObject) -> Unit) {
    JsonReader(input).use { reader ->
        reader.beginObject {
            while (reader.hasNext()) {
                when (reader.nextName()) {
                    "result" -> onResult(reader.nextString())
                    "items" -> reader.beginArray {
                        while (reader.hasNext()) {
                            // Parse one array element at a time off the shared lexer
                            val item = Parser(passedLexer = reader.lexer, streaming = true).parse(reader) as JsonObject
                            onItem(item)
                        }
                    }
                }
            }
        }
    }
}

fun main(args: Array<String>) {
    // "input" simulates the server response
    val input = ByteArrayInputStream(testJson.encodeToByteArray())
    InputStreamReader(input).use {
        parse(it,
            onResult = { println("""Result: $it""") },
            onItem = { println(it.asIterable().joinToString(", ")) }
        )
    }
}
Better still would be to integrate Klaxon with Kotlin's Flow or Sequence, but I found that difficult due to the beginObject and beginArray wrappers, which do not play well with suspend functions.
Reading big arrays from big json file in php
JSON is a great format and a far better alternative to XML; in the end, JSON is almost one-to-one convertible to XML and back.
Big files only get bigger, so we don't want to read all of the content into memory and we don't want to parse the whole file. I had the same issue with XXL-size JSON files.
I think the issue lies not in a specific programming language but in the implementation details and specifics of the formats.
I have 3 solutions for you:
- Native PHP implementation (preferred)
Almost as fast as a streamed XMLReader, there is the library https://github.com/pcrov/JsonReader. Example:
use pcrov\JsonReader\JsonReader;
$reader = new JsonReader();
$reader->open("data.json");
while ($reader->read("type")) {
    echo $reader->value(), "\n";
}
$reader->close();
This library will not read the whole file into memory or parse all the lines; it traverses the tree of the JSON object step by step, on demand.
- Convert to a different format (cons: multiple conversions)
Preprocess the file into a different format like XML or CSV.
There are very lightweight Node.js libraries, such as https://www.npmjs.com/package/json2csv, that convert JSON to CSV.
- Use some NoSQL DB (cons: additional complex software to install and maintain)
For example Redis, or CouchDB (import the JSON file into CouchDB).