Parsing Large JSON File in .NET

Parsing large JSON file in .NET

As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.NET throws an error.

Fortunately, this problem seems to come up often enough that Json.NET actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually.

This should let you process the non-standard JSON successfully and in a memory-efficient manner, regardless of how many arrays there are or how many items each array contains.

    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        // Allow more than one top-level JSON value in the same stream.
        reader.SupportMultipleContent = true;

        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            // Deserialize one object at a time; array tokens are skipped.
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }

Full demo here: https://dotnetfiddle.net/2TQa8p
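If you want to try the technique without a web request, here is a minimal self-contained sketch. The Contact class, the MultipleContentDemo helper and the sample names are assumptions made up for illustration, not the code from the fiddle; the input is an in-memory string containing two arrays back to back.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical model; the real Contact type is not shown in the answer.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class MultipleContentDemo
{
    // Reads every object from back-to-back JSON arrays such as [..][..].
    public static List<Contact> ReadContacts(TextReader input)
    {
        var contacts = new List<Contact>();
        using (var reader = new JsonTextReader(input))
        {
            // Without this flag the reader throws at the second '['.
            reader.SupportMultipleContent = true;

            var serializer = new JsonSerializer();
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    contacts.Add(serializer.Deserialize<Contact>(reader));
                }
            }
        }
        return contacts;
    }

    public static void Main()
    {
        // Made-up sample data: two arrays with no separator between them.
        string json =
            @"[{""FirstName"":""Ada"",""LastName"":""Lovelace""}]" +
            @"[{""FirstName"":""Alan"",""LastName"":""Turing""}]";

        foreach (Contact c in ReadContacts(new StringReader(json)))
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```

The sketch collects the contacts in a list only to keep it short. For a genuinely large input you would process each object inside the loop, as the code above does, so that nothing accumulates in memory.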

How to parse huge JSON file as stream in Json.NET?

This should resolve your problem. It works just like your initial code, except that it deserializes an object only when the reader reaches a { character in the stream; otherwise it skips ahead until it finds the next start-object token.

    JsonSerializer serializer = new JsonSerializer();
    MyObject o;
    using (FileStream s = File.Open("bigfile.json", FileMode.Open))
    using (StreamReader sr = new StreamReader(s))
    using (JsonReader reader = new JsonTextReader(sr))
    {
        while (reader.Read())
        {
            // deserialize only when there's "{" character in the stream
            if (reader.TokenType == JsonToken.StartObject)
            {
                o = serializer.Deserialize<MyObject>(reader);
                // Process o here; the next iteration overwrites it.
            }
        }
    }

Reading Large JSON file into variable in C#.net

Run the application as a 64-bit process; see RredCat's answer to a similar question:

Newtonsoft.Json - Out of memory exception while deserializing big object
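To confirm that the application really is running as 64-bit, you can check at startup. This is a small sketch rather than part of the linked answer, and the BitnessCheck class name is made up for illustration. A project built as AnyCPU with the "Prefer 32-bit" option enabled still runs as a 32-bit process on a 64-bit machine.

```csharp
using System;

public static class BitnessCheck
{
    // Describes the bitness of the current process.
    public static string Describe()
    {
        return Environment.Is64BitProcess ? "64-bit" : "32-bit";
    }

    public static void Main()
    {
        Console.WriteLine("Process: " + Describe());
        Console.WriteLine("64-bit OS: " + Environment.Is64BitOperatingSystem);
    }
}
```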

Newtonsoft Json Performance Tips

Read the article by David Cox about tokenizing:

"The basic approach is to use a JsonTextReader object, which is part of the Json.NET library. A JsonTextReader reads a JSON file one token at a time. It, therefore, avoids the overhead of reading the entire file into a string. As tokens are read from the file, objects are created and pushed onto and off of a stack. When the end of the file is reached, the top of the stack contains one object — the top of a very big tree of objects corresponding to the objects in the original JSON file"

Parsing Big Records with Json.NET
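To see what "one token at a time" means in practice, here is a minimal sketch that only lists the tokens a JsonTextReader produces for a small document. It does not build the stack of objects the article describes, and the TokenDemo class and sample JSON are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class TokenDemo
{
    // Returns a description of each token in the order the reader produces it.
    public static List<string> ReadTokens(TextReader input)
    {
        var tokens = new List<string>();
        using (var reader = new JsonTextReader(input))
        {
            while (reader.Read())
            {
                // Structural tokens (StartObject, EndArray, ...) carry no value.
                tokens.Add(reader.Value == null
                    ? reader.TokenType.ToString()
                    : reader.TokenType + ": " + reader.Value);
            }
        }
        return tokens;
    }

    public static void Main()
    {
        // Made-up sample document.
        string json = @"{""name"":""Ada"",""tags"":[1,2]}";
        foreach (string token in ReadTokens(new StringReader(json)))
        {
            Console.WriteLine(token);
        }
    }
}
```

Because the reader pulls from a TextReader, the same loop works over a StreamReader on a file of any size without loading the whole file into a string.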

Deserializing large json from WebService using Json.NET

What you can do is adopt the basic approach of "Issues parsing a 1GB json file using JSON.NET" and "Deserialize json array stream one item at a time": stream through the JSON, deserializing and yield returning each object. In addition, apply a stateful filter so that only StartObject tokens matching the path d.results[*] are deserialized.

To do this, first define the following interface and static helper methods:

    public interface IJsonReaderFilter
    {
        // Interface members are public implicitly; an explicit modifier
        // is a compile error before C# 8.
        bool ShouldDeserializeToken(JsonReader reader);
    }

    public static class JsonExtensions
    {
        public static IEnumerable<T> DeserializeSelectedTokens<T>(Stream stream, IJsonReaderFilter filter, JsonSerializerSettings settings = null, bool leaveOpen = false)
        {
            // The named leaveOpen argument needs a StreamReader constructor
            // with optional parameters (available in recent .NET versions).
            using (var sr = new StreamReader(stream, leaveOpen : leaveOpen))
            using (var reader = new JsonTextReader(sr))
                foreach (var item in DeserializeSelectedTokens<T>(reader, filter, settings))
                    yield return item;
        }

        public static IEnumerable<T> DeserializeSelectedTokens<T>(JsonReader reader, IJsonReaderFilter filter, JsonSerializerSettings settings = null)
        {
            var serializer = JsonSerializer.CreateDefault(settings);
            // Deserialize only the tokens the filter selects; skip the rest.
            while (reader.Read())
                if (filter.ShouldDeserializeToken(reader))
                    yield return serializer.Deserialize<T>(reader);
        }
    }

Now, to filter only those items matching the path d.results[*], define the following filter:

    class ResultsFilter : IJsonReaderFilter
    {
        const string path = "d.results";
        const int pathDepth = 2;
        bool inArray = false;

        public bool ShouldDeserializeToken(JsonReader reader)
        {
            if (!inArray && reader.Depth == pathDepth && reader.TokenType == JsonToken.StartArray && string.Equals(reader.Path, path, StringComparison.OrdinalIgnoreCase))
            {
                // Entered the d.results array; its items come next.
                inArray = true;
                return false;
            }
            else if (inArray && reader.Depth == pathDepth + 1 && reader.TokenType == JsonToken.StartObject)
            {
                // An item of d.results: deserialize it.
                return true;
            }
            else if (inArray && reader.Depth == pathDepth && reader.TokenType == JsonToken.EndArray)
            {
                // Left the d.results array.
                inArray = false;
                return false;
            }
            else
            {
                return false;
            }
        }
    }

Next, create the following data model for each result:

    public class Metadata
    {
        public string id { get; set; }
        public string uri { get; set; }
        public string type { get; set; }
    }

    public class Result
    {
        public Metadata metadata { get; set; }
        public string ID { get; set; }
        public string Value1 { get; set; }
        public string Value2 { get; set; }
        public string Value3 { get; set; }
    }

And now you can deserialize your JSON stream incrementally as follows:

    foreach (var result in JsonExtensions.DeserializeSelectedTokens<Result>(dataStream, new ResultsFilter()))
    {
        // Process each result in some manner.
        // Dump() is a LINQPad-style helper; substitute your own processing.
        result.Dump();
    }

Demo fiddle here.

How to Save Large Json Data?

Instead of serializing to a string, and then writing the string to a stream, stream it directly:

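    // JsonSerializer here is System.Text.Json.JsonSerializer, not Json.NET.
    // The Stream overload of Serialize requires .NET 6 or later.
    using var stream = File.Create("file.json");
    JsonSerializer.Serialize(stream, content, new JsonSerializerOptions
    {
        WriteIndented = true
    });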
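If the surrounding code is asynchronous, the same idea can be sketched with SerializeAsync, which also writes straight to the stream without building an intermediate string. This is a variation on the answer above rather than the answer itself; the SaveDemo class, the sample list and the temp-file path are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

public static class SaveDemo
{
    // Serializes content directly to the file stream, with no intermediate string.
    public static async Task SaveAsync<T>(string path, T content)
    {
        await using FileStream stream = File.Create(path);
        await JsonSerializer.SerializeAsync(stream, content, new JsonSerializerOptions
        {
            WriteIndented = true
        });
    }

    public static async Task Main()
    {
        // Made-up sample data and output location.
        var content = new List<int> { 1, 2, 3 };
        string path = Path.Combine(Path.GetTempPath(), "file.json");

        await SaveAsync(path, content);
        Console.WriteLine("Bytes written: " + new FileInfo(path).Length);
    }
}
```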

