Processing Large JSON Files in PHP

Parse large JSON file

This really depends on what the JSON files contain.

If loading the file into memory in one shot is not an option, your only other option, as you alluded to, is fopen/fgets.

Reading line by line is possible, and if the JSON objects have a consistent structure, you can easily detect where a JSON object in the file starts and ends.

Once you've collected a whole object, insert it into the database, then move on to the next one.

There isn't much more to it. The algorithm to detect the beginning and end of a JSON object may get complicated depending on your data source, but I have done something like this before with a far more complex structure (XML) and it worked fine.
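
Here is a minimal sketch of that idea, assuming the file is a pretty-printed array of objects with at most one top-level object ending per line; the brace counting is string-aware so braces inside values don't throw off the depth, and insertIntoDb() is a hypothetical stand-in for your own insert logic:

$handle = fopen('big.json', 'r');
$buffer = '';
$depth = 0;
$inString = false;
$escaped = false;

while (($line = fgets($handle)) !== false) {
    // Track brace depth, skipping braces that appear inside strings.
    for ($i = 0, $len = strlen($line); $i < $len; $i++) {
        $char = $line[$i];
        if ($inString) {
            if ($escaped) { $escaped = false; }
            elseif ($char === '\\') { $escaped = true; }
            elseif ($char === '"') { $inString = false; }
            continue;
        }
        if ($char === '"') { $inString = true; }
        elseif ($char === '{') { $depth++; }
        elseif ($char === '}') { $depth--; }
    }
    $buffer .= $line;

    // Depth back at zero means a complete top-level object was collected.
    if ($depth === 0 && trim($buffer) !== '') {
        $object = json_decode(trim($buffer, "[], \t\n\r"), true);
        if ($object !== null) {
            insertIntoDb($object); // hypothetical helper: your DB insert
        }
        $buffer = '';
    }
}
fclose($handle);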

Reading big arrays from a big JSON file in PHP

JSON is a great format and a far better alternative to XML.
In the end, JSON is almost one-to-one convertible to XML and back.

Big files can get bigger, so we don't want to read everything into memory and we don't want to parse the whole file at once. I had the same issue with XXL-size JSON files.

I think the issue lies not in any specific programming language, but in the implementation and the specifics of the formats.

I have 3 solutions for you:

  1. Native PHP implementation (preferred)

There is a library, https://github.com/pcrov/JsonReader, that is almost as fast as a streaming XMLReader. Example:

use pcrov\JsonReader\JsonReader;

$reader = new JsonReader();
$reader->open("data.json");

// Advance to each node named "type" and print its value.
while ($reader->read("type")) {
    echo $reader->value(), "\n";
}
$reader->close();

This library will not read the whole file into memory or parse all the lines at once. It traverses the JSON object tree step by step, on command.


  2. Switch to a different format (cons: multiple conversions)

Preprocess the file into a different format like XML or CSV.
There are very lightweight Node.js libraries like https://www.npmjs.com/package/json2csv that convert JSON to CSV; a plain-PHP sketch of the same idea is shown below.
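
For illustration only, here is a rough PHP sketch of that JSON-to-CSV conversion, assuming flat objects and a file small enough to decode in one pass (for truly huge files, feed the rows from a streaming reader such as the one in option 1 instead):

$rows = json_decode(file_get_contents('data.json'), true);
$out = fopen('data.csv', 'w');

// Header row taken from the keys of the first object.
fputcsv($out, array_keys($rows[0]));

foreach ($rows as $row) {
    fputcsv($out, $row);
}
fclose($out);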


  3. Use a NoSQL database (cons: additional complex software to install and maintain)

For example, Redis or CouchDB (you can import a JSON file into CouchDB).

Processing large JSON files in PHP

I decided to work on an event-based parser. It's not quite done yet, and I will edit the question with a link to my work when I roll out a satisfying version.

EDIT:

I finally worked out a version of the parser that I am satisfied with. It's available on GitHub:

https://github.com/kuma-giyomu/JSONParser

There's probably room for some improvement, and I welcome feedback.

PHP | json_decode huge JSON file

Another alternative is to use halaxa/json-machine.

Usage for iterating over JSON is the same as with json_decode, but it will not hit the memory limit no matter how big your file is. There is no need to implement anything; just use your foreach.

Example:

$users = \JsonMachine\JsonMachine::fromFile('500MB-users.json');

foreach ($users as $id => $user) {
    // process $user as usual
}

See the GitHub README for more details.

Large JSON data is not decoding in PHP

The (possible) smart quotes around "powder coated" are messing up the JSON string. To fix it, use

json_decode(utf8_decode($cartDecodeData));

which will convert those quotes.
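
If you want to confirm that encoding is really the culprit before applying the fix, a small check with json_last_error() (json_last_error_msg() requires PHP 5.5+) looks like this:

$result = json_decode($cartDecodeData);
if ($result === null && json_last_error() !== JSON_ERROR_NONE) {
    // Report why the decode failed, e.g. malformed UTF-8 characters.
    echo json_last_error_msg(), "\n";
    // Retry after converting the problematic characters.
    $result = json_decode(utf8_decode($cartDecodeData));
}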

Best way to manipulate large json objects

I think the approach you're using already is probably the most practical, but I'm intrigued by your idea of searching the JSON file directly.

Here's how I'd take a stab at implementing this, having worked on a Web application that used a similar approach of searching an XML file on disk rather than a database (and, remarkably, was still fast enough for production use):

  • Sort the JSON data first. Creating a new master file with the objects reordered to match how they're indexed in the database will maximize the efficiency of a linear search through the data.

  • Use a streaming JSON parser for searches. This will allow the file to be parsed object-by-object without needing to load the entire document in memory first. If the file is sorted, only half the document on average will need to be parsed for each lookup.

    Streaming JSON parsers are rare, but they exist. Salsify has created one for PHP.

  • Benchmark searching the file directly using the above two strategies. You may discover this is enough to make the application usable, especially if it supports only a small number of users. If not:

  • Build separate indices on disk. Instead of having the application search the entire JSON file directly, parse it once when it's received and create one or more index files that associate key values with byte offsets into the original file. The application can then search a (much smaller) index file for the object it needs; once it retrieves the matching offset, it can seek immediately to the corresponding JSON object in the master file and parse it directly (see the sketch after this list).

  • Consider using a more efficient data format. JSON is lightweight, but there may be better options. You might experiment with generating a new master file using serialize to output a "frozen" representation of each parsed JSON object in PHP's native serialization format. The application can then use unserialize to obtain an array or object it can use immediately.

    Combining this with the use of index files, especially if they're generated as trees rather than lists, will probably give you about the best performance you can hope for from a simple, purely filesystem-based solution.
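
To make the index-file idea concrete, here is a hypothetical sketch, assuming the sorted master file holds one JSON object per line (NDJSON) keyed by an "id" field; it also stores the index with serialize/unserialize, per the last point:

// Build: map each id to the byte offset where its line starts.
$index = [];
$fh = fopen('master.ndjson', 'r');
while (true) {
    $offset = ftell($fh);           // remember where this record begins
    $line = fgets($fh);
    if ($line === false) {
        break;
    }
    $record = json_decode($line, true);
    if ($record !== null) {
        $index[$record['id']] = $offset;
    }
}
fclose($fh);

// Persist the (much smaller) index in PHP's native serialization format.
file_put_contents('master.idx', serialize($index));

// Lookup: load the index, then seek straight to the matching record.
function findRecord(string $id): ?array
{
    $index = unserialize(file_get_contents('master.idx'));
    if (!isset($index[$id])) {
        return null;
    }
    $fh = fopen('master.ndjson', 'r');
    fseek($fh, $index[$id]);        // jump directly to the object
    $record = json_decode(fgets($fh), true);
    fclose($fh);
    return $record;
}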


