How to Parse a Huge JSON File as a Stream in Json.NET

How to parse huge JSON file as stream in Json.NET?

This should resolve your problem. It works just like your initial code, except that it deserializes an object only when the reader hits a { character in the stream; otherwise it skips ahead until it finds the next start-object token.

    using System.IO;
    using Newtonsoft.Json;

    JsonSerializer serializer = new JsonSerializer();
    MyObject o;
    using (FileStream s = File.Open("bigfile.json", FileMode.Open))
    using (StreamReader sr = new StreamReader(s))
    using (JsonReader reader = new JsonTextReader(sr))
    {
        while (reader.Read())
        {
            // deserialize only when the reader is positioned on a "{" (StartObject) token
            if (reader.TokenType == JsonToken.StartObject)
            {
                // reads one complete object; the rest of the file stays unread
                o = serializer.Deserialize<MyObject>(reader);
            }
        }
    }
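
The loop above assigns each object to o and then overwrites it on the next iteration. One way to make the items usable is to wrap the same loop in an iterator, so the caller can process them one at a time. The following is a minimal sketch, not part of Json.NET: the JsonStreaming class and the ReadObjects method are hypothetical names, and it assumes the top level of the file is an array of objects.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class JsonStreaming
{
    // Hypothetical helper: lazily yields every object the reader encounters,
    // so only one deserialized item has to be in memory at a time.
    public static IEnumerable<T> ReadObjects<T>(TextReader text)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    // Deserialize consumes the whole object (nested objects included)
                    // and leaves the reader on its closing token.
                    yield return serializer.Deserialize<T>(reader);
                }
            }
        }
    }
}
```

It could be used as foreach (MyObject item in JsonStreaming.ReadObjects<MyObject>(new StreamReader("bigfile.json"))) { ... }. If the top level of the file is a single object rather than an array, the first StartObject token is the root, and the whole document is deserialized in one call.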

Parsing large JSON file in .NET

As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.NET throws an error.

Fortunately, this problem seems to come up often enough that Json.NET actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually.

This should allow you to process the non-standard JSON successfully and in a memory-efficient manner, regardless of how many arrays there are or how many items each array contains.

    using System;
    using System.IO;
    using System.Net;
    using Newtonsoft.Json;

    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        // keep reading after the first top-level value ends
        reader.SupportMultipleContent = true;

        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }

Full demo here: https://dotnetfiddle.net/2TQa8p
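
For a version that can be tried without a network call, here is a small sketch that applies the same SupportMultipleContent technique to an in-memory string. The Contact class shape is assumed from the two properties used above, and MultiArrayDemo and ReadAll are hypothetical names. It collects the results in a list purely to keep the example short; for large input you would process each item inside the loop instead.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Assumed shape: only the two properties used in the answer above.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class MultiArrayDemo
{
    // Hypothetical helper: reads contacts from back-to-back arrays such as "[...][...]".
    public static List<Contact> ReadAll(string json)
    {
        var contacts = new List<Contact>();
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            // Without this flag, the reader throws when it reaches the second "[".
            reader.SupportMultipleContent = true;
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    contacts.Add(serializer.Deserialize<Contact>(reader));
                }
            }
        }
        return contacts;
    }
}
```

Calling MultiArrayDemo.ReadAll with an input like [{"FirstName":"Ann","LastName":"Lee"}][{"FirstName":"Bob","LastName":"Ray"}] should return both contacts, even though the input as a whole is not valid JSON.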

Way to read (or edit) big JSON from / to stream

Gason translated to C# is probably the quickest JSON parser for C# right now. Its speed is similar to the C++ version (Debug build; 2x slower in Release), and its memory consumption is 2x bigger:
https://github.com/eltomjan/gason

(Disclaimer: I am affiliated with this C# fork of Gason.)

The parser has an experimental feature: it can exit after parsing a predefined number of rows in the last array and, on the next call, continue after the last item with the next batch:

    using System;
    using System.IO;
    using System.Text;
    using Gason;

    int endPos = -1;
    JsonValue jsn;
    Byte[] raw;

    String json = @"{""id"":""0001"",""type"":""donut"",""name"":""Cake"",""ppu"":0.55,
      ""batters"": [ { ""id"": ""1001"", ""type"": ""Regular"" },
                     { ""id"": ""1002"", ""type"": ""Chocolate"" },
                     { ""id"": ""1003"", ""type"": ""Blueberry"" },
                     { ""id"": ""1004"", ""type"": ""Devil's Food"" } ]
    }";
    raw = Encoding.UTF8.GetBytes(json);
    ByteString[] keys = new ByteString[]
    {
        new ByteString("batters"),
        null
    };
    Parser jsonParser = new Parser(true); // FloatAsDecimal (, JSON stack array size = 32)
    // batters / null path; read only the first 2 items
    jsonParser.Parse(raw, ref endPos, out jsn, keys, 2, 0, 2);
    ValueWriter wr = new ValueWriter();
    using (StreamWriter sw = new StreamWriter(Console.OpenStandardOutput()))
    {
        sw.AutoFlush = true;
        wr.DumpValueIterative(sw, jsn, raw);
    }
    // and now the following 2, continuing from where the first call stopped
    jsonParser.Parse(raw, ref endPos, out jsn, keys, 2, endPos, 2);
    using (StreamWriter sw = new StreamWriter(Console.OpenStandardOutput()))
    {
        sw.AutoFlush = true;
        wr.DumpValueIterative(sw, jsn, raw);
    }

This is now a quick and simple way to split long JSON files. A whole 1/4 GB file with fewer than 18 million rows in the main array took under 5.3 s on a fast machine (Debug build) using under 950 MB of RAM, whereas Newtonsoft.Json took over 30 s and 5.36 GB. Parsing only the first 100 rows took under 330 ms and over 250 MB of RAM.

In a Release build the result is even better: under 3.2 s, where Newtonsoft.Json spent over 29.3 s (more than 10.8x better performance).

1st Parse:

    {
      "id": "0001",
      "type": "donut",
      "name": "Cake",
      "ppu": 0.55,
      "batters": [
        {
          "id": "1001",
          "type": "Regular"
        },
        {
          "id": "1002",
          "type": "Chocolate"
        }
      ]
    }

2nd Parse:

    {
      "id": "0001",
      "type": "donut",
      "name": "Cake",
      "ppu": 0.55,
      "batters": [
        {
          "id": "1003",
          "type": "Blueberry"
        },
        {
          "id": "1004",
          "type": "Devil's Food"
        }
      ]
    }

Read as much JSON data from the stream as possible

I assume you're going to use the Json.NET library. In that case, it handles your particular use case out of the box.

Consider the following:

    using System.IO;
    using Newtonsoft.Json;

    // one valid JSON object followed by trailing garbage
    var s = "{ \"SomeData\": \"blahblah\", \"SubObject\": {\"SomeData\": \"blahblah}{{\"    } } sdfsdfsdf... and some more data";
    var obj1 = JsonConvert.DeserializeObject(s, new JsonSerializerSettings() {
        CheckAdditionalContent = false // this is the key here, otherwise you will get an exception
    });

    JsonSerializer serializer = new JsonSerializer();
    var obj2 = serializer.Deserialize(new JsonTextReader(new StringReader(s))); // no issues here either

