Parse large JSON file in Nodejs
To process a file line-by-line, you need to decouple reading the file from the code that acts on that input. You can accomplish this by buffering your input until you hit a newline. Assuming we have one JSON object per line (basically, format B):
var fs = require('fs');

var stream = fs.createReadStream(filePath, { flags: 'r', encoding: 'utf-8' });
var buf = '';

stream.on('data', function (d) {
  buf += d.toString(); // when data is read, stash it in a string buffer
  pump();              // then process the buffer
});

function pump() {
  var pos;
  while ((pos = buf.indexOf('\n')) >= 0) { // keep going while there's a newline somewhere in the buffer
    if (pos == 0) {         // if there's more than one newline in a row, the buffer will now start with a newline
      buf = buf.slice(1);   // discard it
      continue;             // so that the next iteration will start with data
    }
    processLine(buf.slice(0, pos)); // hand off the line
    buf = buf.slice(pos + 1);       // and slice the processed data off the buffer
  }
}

function processLine(line) { // here's where we do something with a line
  if (line[line.length - 1] == '\r') line = line.substr(0, line.length - 1); // discard CR (0x0D)
  if (line.length > 0) {          // ignore empty lines
    var obj = JSON.parse(line);   // parse the JSON
    console.log(obj);             // do something with the data here!
  }
}
Each time the file stream receives data from the file system, it's stashed in the buffer, and then pump is called.
If there's no newline in the buffer, pump simply returns without doing anything. More data (and potentially a newline) will be added to the buffer the next time the stream gets data, and then we'll have a complete object.
If there is a newline, pump slices the buffer from the beginning to the newline and hands that line off to processLine. It then checks whether there's another newline in the buffer (the while loop). In this way, we can process all of the lines that were read in the current chunk.
Finally, processLine is called once per input line. It strips off the carriage return character if present (to avoid issues with line endings, LF vs. CRLF), and then calls JSON.parse on the line. At this point, you can do whatever you need to with your object.
Note that JSON.parse is strict about what it accepts as input; you must quote your identifiers and string values with double quotes. In other words, {name:'thing1'} will throw an error; you must use {"name":"thing1"}.
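A quick sanity check of that strictness, using nothing beyond plain Node:

```javascript
// JSON.parse rejects unquoted keys and single-quoted strings
let failed = false;
try {
  JSON.parse("{name:'thing1'}"); // not valid JSON
} catch (e) {
  failed = true;                 // SyntaxError lands here
}
console.log(failed); // → true

// Double-quoted keys and values parse fine
console.log(JSON.parse('{"name":"thing1"}').name); // → thing1
```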
Because no more than a chunk of data will ever be in memory at a time, this will be extremely memory efficient. It will also be extremely fast. A quick test showed I processed 10,000 rows in under 15ms.
How do I read a Huge Json file into a single object using NodeJS?
Using big-json solves this problem.
npm install big-json
const fs = require('fs');
const json = require('big-json');

const readStream = fs.createReadStream('file.json');
const parseStream = json.createParseStream();

parseStream.on('data', function (pojo) {
  // => receive reconstructed POJO
});

readStream.pipe(parseStream);
Best way to read a large JSON file
You should probably use a SAX strategy and read the file piece by piece. The DOM strategy decodes the entire JSON file into an in-memory tree structure. With a SAX strategy, you instead receive an event for each separate value and its key as it is parsed, and you can do whatever you want with it.
Parse large JSON file in Nodejs and handle each object independently
There is a nice module named 'stream-json' that does exactly what you want.
It can parse JSON files far exceeding available memory. And:
StreamArray handles a frequent use case: a huge array of relatively small objects similar to Django-produced database dumps. It streams array components individually, taking care of assembling them automatically.
Here is a very basic example:
const StreamArray = require('stream-json/streamers/StreamArray');
const path = require('path');
const fs = require('fs');

const jsonStream = StreamArray.withParser();

// You'll get JSON objects here
// Key is an array index here
jsonStream.on('data', ({ key, value }) => {
  console.log(key, value);
});

jsonStream.on('end', () => {
  console.log('All done');
});

const filename = path.join(__dirname, 'sample.json');
fs.createReadStream(filename).pipe(jsonStream.input);
Extract and Parse Huge incomplete JSON in NodeJS
I think you need to fix the JSON structure first.
Just try this approach:
import untruncateJson from "untruncate-json";
const str = `[{ "name": { "first": "foo", "last": "bar" } }, { "name": {
"first": "ind", "last": "go`;
const fixJson = untruncateJson.default;
const json = fixJson(str);
console.log(json);