Parse Large JSON File in Nodejs

Parse large JSON file in Nodejs

To process a file line-by-line, you simply need to decouple the reading of the file and the code that acts upon that input. You can accomplish this by buffering your input until you hit a newline. Assuming we have one JSON object per line (basically, format B):

var fs = require('fs');

var stream = fs.createReadStream(filePath, {flags: 'r', encoding: 'utf-8'}); // filePath is the path of the file to read
var buf = '';

stream.on('data', function(d) {
  buf += d.toString(); // when data is read, stash it in a string buffer
  pump(); // then process the buffer
});

function pump() {
  var pos;

  while ((pos = buf.indexOf('\n')) >= 0) { // keep going while there's a newline somewhere in the buffer
    if (pos == 0) { // if there's more than one newline in a row, the buffer will now start with a newline
      buf = buf.slice(1); // discard it
      continue; // so that the next iteration will start with data
    }
    processLine(buf.slice(0, pos)); // hand off the line
    buf = buf.slice(pos + 1); // and slice the processed data off the buffer
  }
}

function processLine(line) { // here's where we do something with a line

  if (line[line.length - 1] == '\r') line = line.substr(0, line.length - 1); // discard CR (0x0D)

  if (line.length > 0) { // ignore empty lines
    var obj = JSON.parse(line); // parse the JSON
    console.log(obj); // do something with the data here!
  }
}

Each time the file stream receives data from the file system, it's stashed in a buffer, and then pump is called.

If there's no newline in the buffer, pump simply returns without doing anything. More data (and potentially a newline) will be added to the buffer the next time the stream gets data, and then we'll have a complete object.

If there is a newline, pump slices off the buffer from the beginning to the newline and hands it off to processLine. It then checks again whether there's another newline in the buffer (the while loop). In this way, we can process all of the lines that were read in the current chunk.

Finally, processLine is called once per input line. If present, it strips off the carriage return character (to avoid issues with line endings – LF vs CRLF), and then calls JSON.parse on the line. At this point, you can do whatever you need to with your object.

Note that JSON.parse is strict about what it accepts as input; you must quote your identifiers and string values with double quotes. In other words, {name:'thing1'} will throw an error; you must use {"name":"thing1"}.
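For example:

JSON.parse('{"name":"thing1"}'); // OK => { name: 'thing1' }
JSON.parse("{name:'thing1'}");   // throws SyntaxError: unquoted keys and single quotes are not valid JSON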

Because only the current chunk (plus any incomplete trailing line) is ever held in memory at a time, this is extremely memory efficient. It is also extremely fast: a quick test processed 10,000 rows in under 15 ms.
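If you would rather not manage the buffer yourself, Node's built-in readline module does the same newline splitting for you. Here is a minimal sketch of the same line-delimited approach (the data.ndjson filename is just a placeholder):

const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: fs.createReadStream('data.ndjson', { encoding: 'utf-8' }),
  crlfDelay: Infinity // treat \r\n as a single line break
});

rl.on('line', (line) => { // fired once per complete line
  if (line.length > 0) {
    const obj = JSON.parse(line); // same one-JSON-object-per-line assumption as above
    console.log(obj);
  }
});

rl.on('close', () => console.log('Done'));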

How do I read a Huge Json file into a single object using NodeJS?

Using big-json solves this problem.

npm install big-json

const fs = require('fs');
const path = require('path');
const json = require('big-json');

const readStream = fs.createReadStream('file.json');
const parseStream = json.createParseStream();

parseStream.on('data', function(pojo) {
  // => receive reconstructed POJO
});

readStream.pipe(parseStream);
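Note that the reconstructed object must still fit in memory once assembled; big-json streams the parsing work, not the result. In practice you will also want error handlers on both streams, for example:

readStream.on('error', (err) => {
  console.error('Failed to read file.json:', err); // missing file, permissions, etc.
});

parseStream.on('error', (err) => {
  console.error('Failed to parse file.json:', err); // malformed JSON, etc.
});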

Best way to read a large JSON file

You should probably use a SAX strategy and read the file piece by piece. With a DOM strategy, you decode the entire JSON file into a tree structure in memory. With a SAX strategy, you instead get an event for each value (and its key) as it is parsed, and can act on it immediately.
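As a sketch of what SAX-style parsing looks like in Node, the stream-json package (discussed in the next answer) exposes a low-level token parser; the big.json filename below is just a placeholder:

const fs = require('fs');
const { parser } = require('stream-json');

const tokens = fs.createReadStream('big.json').pipe(parser());

tokens.on('data', (token) => {
  // each token is a small object, e.g. {name: 'startObject'},
  // {name: 'keyValue', value: 'someKey'} or {name: 'stringValue', value: 'someValue'}
  console.log(token.name, token.value);
});

tokens.on('end', () => console.log('Done'));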

Parse large JSON file in Nodejs and handle each object independently

There is a nice module named 'stream-json' that does exactly what you want.

It can parse JSON files far exceeding available memory.

and

StreamArray handles a frequent use case: a huge array of relatively small objects similar to Django-produced database dumps. It streams array components individually taking care of assembling them automatically.

Here is a very basic example:

const StreamArray = require('stream-json/streamers/StreamArray');
const path = require('path');
const fs = require('fs');

const jsonStream = StreamArray.withParser();

//You'll get json objects here
//Key is an array-index here
jsonStream.on('data', ({key, value}) => {
  console.log(key, value);
});

jsonStream.on('end', () => {
  console.log('All done');
});

const filename = path.join(__dirname, 'sample.json');
fs.createReadStream(filename).pipe(jsonStream.input);
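If handling each object involves async work (for example a database write per record), the same pipeline can be consumed with for await...of so that only one object is processed at a time. This is a sketch, assuming a Node version where streams are async-iterable and a hypothetical handleObject function:

const StreamArray = require('stream-json/streamers/StreamArray');
const path = require('path');
const fs = require('fs');

async function run() {
  const jsonStream = StreamArray.withParser();
  fs.createReadStream(path.join(__dirname, 'sample.json')).pipe(jsonStream.input);

  for await (const { key, value } of jsonStream) {
    await handleObject(value); // hypothetical async handler; the next object is read only after this resolves
  }
  console.log('All done');
}

run().catch(console.error);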

Extract and Parse Huge incomplete JSON in NodeJS

I think you need to fix the JSON structure first.
Just try this approach:

import untruncateJson from "untruncate-json";

const str = `[{ "name": { "first": "foo", "last": "bar" } }, { "name": {
"first": "ind", "last": "go`;

const fixJson = untruncateJson.default;

const json = fixJson(str);

console.log(json);
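Assuming untruncate-json repairs the cut-off tail as intended, the result is still a JSON string, so you then pass it to JSON.parse to get a usable value:

const data = JSON.parse(json); // parse the repaired JSON string
console.log(data[0].name.first); // => "foo"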

