Deserialize JSON Array Stream One Item at a Time

To read the JSON incrementally, you'll need to use a JsonTextReader in combination with a StreamReader. But you don't necessarily have to read all the JSON manually from the reader: you can use the LINQ-to-JSON API to load each large object from the reader as a JObject so that you can work with it more easily.

For a simple example, say I had a JSON file that looked like this:

[
  {
    "name": "foo",
    "id": 1
  },
  {
    "name": "bar",
    "id": 2
  },
  {
    "name": "baz",
    "id": 3
  }
]

Code to read it incrementally from the file might look something like the following. (In your case you would replace the FileStream with your response stream.)

// Requires: using System; using System.IO; using Newtonsoft.Json; using Newtonsoft.Json.Linq;
using (FileStream fs = new FileStream(@"C:\temp\data.json", FileMode.Open, FileAccess.Read))
using (StreamReader sr = new StreamReader(fs))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Load each object from the stream and do something with it.
            // JObject.Load consumes the whole object, so the next Read() moves past it.
            JObject obj = JObject.Load(reader);
            Console.WriteLine(obj["id"] + " - " + obj["name"]);
        }
    }
}

Output of the above would look like this:

1 - foo
2 - bar
3 - baz
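
For the response-stream case mentioned above, the same loop can sit behind a small iterator. The following is only a sketch under assumptions: the ResponseStreaming class, the ProcessResponseAsync helper, and its url and handle parameters are hypothetical names, not part of the original answer. It uses HttpCompletionOption.ResponseHeadersRead so that HttpClient does not buffer the whole body before the JSON reader starts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class ResponseStreaming
{
    // Lazily yields each object found while scanning the stream.
    // Disposing the readers also closes the underlying stream.
    public static IEnumerable<JObject> ReadObjects(Stream stream)
    {
        using (var sr = new StreamReader(stream))
        using (var reader = new JsonTextReader(sr))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                    yield return JObject.Load(reader);
            }
        }
    }

    // Hypothetical helper: 'url' and 'handle' stand in for your endpoint and your processing.
    public static async Task ProcessResponseAsync(HttpClient client, string url, Action<JObject> handle)
    {
        // ResponseHeadersRead completes once the headers arrive, before the body is buffered.
        using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (var stream = await response.Content.ReadAsStreamAsync())
            {
                // The JSON itself is still read synchronously from the network stream.
                foreach (var obj in ReadObjects(stream))
                    handle(obj);
            }
        }
    }
}
```

Because ReadObjects is an iterator, only one JObject needs to be alive at a time as long as the caller does not collect the results into a list.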

Deserializing large json from WebService using Json.NET

You can adopt the basic approach from "Issues parsing a 1GB json file using JSON.NET" and "Deserialize json array stream one item at a time", which is to stream through the JSON, deserializing and yield-returning each object, and additionally apply a stateful filter so that only StartObject tokens matching the path d.results[*] are deserialized.

To do this, first define the following interface and extension method:

// Requires: using System.Collections.Generic; using System.IO; using Newtonsoft.Json;
public interface IJsonReaderFilter
{
    // Interface members are implicitly public; an explicit modifier only compiles from C# 8.0 onward.
    bool ShouldDeserializeToken(JsonReader reader);
}

public static class JsonExtensions
{
    public static IEnumerable<T> DeserializeSelectedTokens<T>(Stream stream, IJsonReaderFilter filter, JsonSerializerSettings settings = null, bool leaveOpen = false)
    {
        // Naming only leaveOpen relies on the StreamReader constructor with optional parameters,
        // available in newer runtimes such as .NET Core 3.0+; on .NET Framework, pass all five arguments.
        using (var sr = new StreamReader(stream, leaveOpen : leaveOpen))
        using (var reader = new JsonTextReader(sr))
            foreach (var item in DeserializeSelectedTokens<T>(reader, filter, settings))
                yield return item;
    }

    public static IEnumerable<T> DeserializeSelectedTokens<T>(JsonReader reader, IJsonReaderFilter filter, JsonSerializerSettings settings = null)
    {
        var serializer = JsonSerializer.CreateDefault(settings);
        while (reader.Read())
            if (filter.ShouldDeserializeToken(reader))
                yield return serializer.Deserialize<T>(reader);
    }
}

Now, to filter only those items matching the path d.results[*], define the following filter:

class ResultsFilter : IJsonReaderFilter
{
    const string path = "d.results";
    const int pathDepth = 2; // Depth of the array at "d.results"; its items sit one level deeper.
    bool inArray = false;

    public bool ShouldDeserializeToken(JsonReader reader)
    {
        if (!inArray && reader.Depth == pathDepth && reader.TokenType == JsonToken.StartArray && string.Equals(reader.Path, path, StringComparison.OrdinalIgnoreCase))
        {
            // Entered the target array: start accepting its items.
            inArray = true;
            return false;
        }
        else if (inArray && reader.Depth == pathDepth + 1 && reader.TokenType == JsonToken.StartObject)
            return true;
        else if (inArray && reader.Depth == pathDepth && reader.TokenType == JsonToken.EndArray)
        {
            // Left the target array: stop accepting objects.
            inArray = false;
            return false;
        }
        else
        {
            return false;
        }
    }
}
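
To see why the filter compares against pathDepth for the array and pathDepth + 1 for its items, it can help to print what the reader reports for each token. This is a small diagnostic sketch, not part of the original answer: the TokenTrace class and the sample JSON in Demo are made up for illustration, shaped like the d.results[*] payload discussed above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class TokenTrace
{
    // Returns one line per token: "<TokenType> depth=<Depth> path=<Path>".
    public static List<string> Describe(string json)
    {
        var lines = new List<string>();
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            while (reader.Read())
                lines.Add(string.Format("{0} depth={1} path={2}", reader.TokenType, reader.Depth, reader.Path));
        }
        return lines;
    }

    // Hypothetical sample input; replace it with a fragment of your own response.
    public static void Demo()
    {
        var json = "{\"d\":{\"results\":[{\"ID\":\"1\"},{\"ID\":\"2\"}]}}";
        foreach (var line in Describe(json))
            Console.WriteLine(line);
    }
}
```

Running Demo() against a fragment of the real response is a quick way to pick the right values for path and pathDepth before writing a filter for a different document shape.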

Next, create the following data model for each result:

public class Metadata
{
    public string id { get; set; }
    public string uri { get; set; }
    public string type { get; set; }
}

public class Result
{
    public Metadata metadata { get; set; }
    public string ID { get; set; }
    public string Value1 { get; set; }
    public string Value2 { get; set; }
    public string Value3 { get; set; }
}

And now you can deserialize your JSON stream incrementally as follows:

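foreach (var result in JsonExtensions.DeserializeSelectedTokens<Result>(dataStream, new ResultsFilter()))
{
    // Process each result in some manner.
    // (Dump() is presumably the output helper from LINQPad / .NET Fiddle; substitute your own processing.)
    result.Dump();
}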

How to filter objects in a JSON array as they are read from a large file

You can adopt the basic approach from "Issues parsing a 1GB json file using JSON.NET" and "Deserialize json array stream one item at a time", which is to stream through the array and yield return each item, and additionally apply a where clause to filter out incomplete items, or a select clause to transform an intermediate deserialized object (such as a JObject or a DTO) into your final data model. Because the where clause is applied during streaming, unwanted objects never get added to the list being deserialized and so become eligible for garbage collection while streaming continues. Filtering array contents while streaming can be done at the root level, when the root JSON container is an array, or as part of a custom JsonConverter for List<T>, when the array to be deserialized is nested within some outer JSON.

As a concrete example, consider your first JSON example. You would like to deserialize it to a data model that looks like:

public class PurpleAirData
{
    public PurpleAirData(DateTime createdAt, double airQuality)
    {
        this.CreatedAt = createdAt;
        this.AirQuality = airQuality;
    }
    // Required properties
    public DateTime CreatedAt { get; set; }
    public double AirQuality { get; set; }

    // Optional properties, thus nullable
    public double? Temperature { get; set; }
    public double? Humidity { get; set; }
}

public class RootObject
{
    public Channel channel { get; set; } // Define this using http://json2csharp.com/
    public List<PurpleAirData> feeds { get; set; }
}

To do this, first introduce the following extension methods:

public static partial class JsonExtensions
{
    public static IEnumerable<T> DeserializeArrayItems<T>(this JsonSerializer serializer, JsonReader reader)
    {
        if (reader.MoveToContent().TokenType == JsonToken.Null)
            yield break;
        if (reader.TokenType != JsonToken.StartArray)
            throw new JsonSerializationException(string.Format("Current token {0} is not an array at path {1}", reader.TokenType, reader.Path));
        // Process the collection items
        while (reader.Read())
        {
            switch (reader.TokenType)
            {
                case JsonToken.EndArray:
                    yield break;

                case JsonToken.Comment:
                    break;

                default:
                    yield return serializer.Deserialize<T>(reader);
                    break;
            }
        }
        // Should not come here.
        throw new JsonReaderException(string.Format("Unclosed array at path {0}", reader.Path));
    }

    public static JsonReader MoveToContent(this JsonReader reader)
    {
        if (reader.TokenType == JsonToken.None)
            reader.Read();
        while (reader.TokenType == JsonToken.Comment && reader.Read())
            ;
        return reader;
    }
}

Next, introduce the following JsonConverter for List<PurpleAirData>:

// Requires: using System; using System.Collections.Generic; using System.Linq; using Newtonsoft.Json;
class PurpleAirListConverter : JsonConverter
{
    class PurpleAirDataDTO
    {
        // Required properties
        [JsonProperty("created_at")]
        public DateTime? CreatedAt { get; set; }
        [JsonProperty("Field8")]
        public double? AirQuality { get; set; }

        // Optional properties
        [JsonProperty("Field6")]
        public double? Temperature { get; set; }
        [JsonProperty("Field7")]
        public double? Humidity { get; set; }
    }

    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(List<PurpleAirData>);
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.MoveToContent().TokenType == JsonToken.Null)
            return null;
        var list = existingValue as List<PurpleAirData> ?? new List<PurpleAirData>();

        // Items missing a required member are filtered out as they stream past.
        var query = from dto in serializer.DeserializeArrayItems<PurpleAirDataDTO>(reader)
                    where dto != null && dto.CreatedAt != null && dto.AirQuality != null
                    select new PurpleAirData(dto.CreatedAt.Value, dto.AirQuality.Value) { Humidity = dto.Humidity, Temperature = dto.Temperature };

        list.AddRange(query);

        return list;
    }

    // Read-only converter: let the serializer fall back to default writing instead of calling WriteJson.
    public override bool CanWrite { get { return false; } }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}

The purpose of this converter is to stream through the "feeds" array, deserialize each JSON item to an intermediate PurpleAirDataDTO, check for the presence of required members, then convert the DTO to the final model.

Finally, deserialize the entire file as follows:

static RootObject DeserializePurpleAirDataFile(TextReader textReader)
{
    var settings = new JsonSerializerSettings
    {
        Converters = { new PurpleAirListConverter() },
        NullValueHandling = NullValueHandling.Ignore,
    };
    var serializer = JsonSerializer.CreateDefault(settings);
    using (var reader = new JsonTextReader(textReader) { CloseInput = false })
    {
        return serializer.Deserialize<RootObject>(reader);
    }
}

When the array to be filtered is the root container in the JSON file, the extension method JsonExtensions.DeserializeArrayItems() can be used directly, e.g. as follows:

static bool IsValid(WeatherData data)
{
    // Return false if certain fields are missing

    // Otherwise return true;
    return true;
}

static List<WeatherData> DeserializeFilteredWeatherData(TextReader textReader)
{
    var serializer = JsonSerializer.CreateDefault();
    using (var reader = new JsonTextReader(textReader) { CloseInput = false })
    {
        var query = from data in serializer.DeserializeArrayItems<WeatherData>(reader)
                    where IsValid(data)
                    select data;

        return query.ToList();
    }
}

Notes:

  • Nullable types can be used to track whether or not value-type members were actually encountered during deserialization.

  • Here the conversion from DTO to final data model is done manually, but for more complicated models something like AutoMapper could be used instead.
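
To illustrate the first note, here is a minimal sketch of how a nullable member distinguishes "absent from the JSON" from "present with a default value". The SensorReading type and the NullableTrackingDemo class are hypothetical and exist only for this example.

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical DTO used only for this illustration.
public class SensorReading
{
    // Nullable: stays null when the property is missing from the JSON.
    public double? Temperature { get; set; }

    // Non-nullable: falls back to 0 when missing, which cannot be told apart from a real 0.
    public double Humidity { get; set; }
}

public static class NullableTrackingDemo
{
    public static SensorReading Parse(string json)
    {
        return JsonConvert.DeserializeObject<SensorReading>(json);
    }

    // True only when the nullable member was actually populated.
    public static bool HasTemperature(string json)
    {
        return Parse(json).Temperature.HasValue;
    }
}
```

For example, HasTemperature("{\"Humidity\": 40}") is false while HasTemperature("{\"Temperature\": 21.5}") is true, which is the same check the converter above performs on CreatedAt and AirQuality.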

How to control deserialization of large array of heterogeneous objects in JSON.net

This can be done by putting together the answers to:

  • Deserialize json array stream one item at a time
  • Deserializing polymorphic json classes without type information using json.net

First, assume you have a custom SerializationBinder (or something similar) that will map type names to types.

Next, you can enumerate through the top-level objects in streaming JSON data (walking into top-level arrays) with the following extension method:

public static class JsonExtensions
{
    public static IEnumerable<JObject> WalkObjects(TextReader textReader)
    {
        using (JsonTextReader reader = new JsonTextReader(textReader))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    JObject obj = JObject.Load(reader);
                    if (obj != null)
                    {
                        yield return obj;
                    }
                }
            }
        }
    }
}

Then, assuming you have some stream for reading your JSON data, you can stream the JSON in and convert top-level array elements one by one for processing as follows:

SerializationBinder binder = new MyBinder(); // Your custom binder.
using (var stream = GetStream(json))
using (var reader = new StreamReader(stream, Encoding.Unicode))
{
    var assemblyName = System.Reflection.Assembly.GetExecutingAssembly().GetName().Name;
    var items = from obj in JsonExtensions.WalkObjects(reader)
                let jType = obj["Type"]
                let jInstance = obj["Instance"]
                where jType != null && jType.Type == JTokenType.String
                where jInstance != null && jInstance.Type == JTokenType.Object
                let type = binder.BindToType(assemblyName, (string)jType)
                where type != null
                select jInstance.ToObject(type); // Deserialize to bound type!

    foreach (var item in items)
    {
        // Handle each item.
        Debug.WriteLine(JsonConvert.SerializeObject(item));
    }
}
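
The answer leaves MyBinder undefined. One possible shape for it, offered purely as a sketch, is a dictionary lookup from the "Type" string to a CLR type. The Circle and Square classes and the set of registered names are invented for this example; returning null for unknown names fits the "where type != null" clause in the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Hypothetical payload types; substitute the types your JSON actually contains.
public class Circle { public double Radius { get; set; } }
public class Square { public double Side { get; set; } }

public class MyBinder : SerializationBinder
{
    // Explicit allow-list: only names registered here can ever be instantiated.
    readonly Dictionary<string, Type> knownTypes = new Dictionary<string, Type>(StringComparer.Ordinal)
    {
        { "Circle", typeof(Circle) },
        { "Square", typeof(Square) },
    };

    public override Type BindToType(string assemblyName, string typeName)
    {
        // assemblyName is ignored in this sketch; unknown names map to null so the caller can skip them.
        Type type;
        return typeName != null && knownTypes.TryGetValue(typeName, out type) ? type : null;
    }
}
```

Using an explicit allow-list rather than Type.GetType(typeName) keeps untrusted JSON from selecting arbitrary types to construct.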

How do I Stream a Json array that is within a json object?

I solved it using the Jackson API.

// Requires Jackson (jackson-core and jackson-databind) plus java.io.File.
JsonFactory f = new MappingJsonFactory();

// Create the parser from the factory declared above instead of a second, separate JsonFactory.
JsonParser jp = f.createParser(new File("C:\\Users\\Downloads\\rootpart.json"));

ObjectMapper mapper = new ObjectMapper();
ReportData reportData = null;

JsonToken current;
current = jp.nextToken();
if (current != JsonToken.START_OBJECT) {
    System.out.println("Error: root should be object: quitting.");
    return;
}
while (jp.nextToken() != JsonToken.END_OBJECT) {
    String fieldName = jp.getCurrentName();
    // Move from the field name to its value.
    current = jp.nextToken();
    if (fieldName.equals("Report_Entry")) {
        if (current == JsonToken.START_ARRAY) {
            // Bind one array element at a time instead of the whole array.
            while (current != JsonToken.END_ARRAY) {
                if (current == JsonToken.START_OBJECT) {
                    reportData = mapper.readValue(jp, ReportData.class);
                    System.out.println(reportData.getKey());
                }
                current = jp.nextToken();
            }
        } else {
            jp.skipChildren();
        }
    } else {
        // Not the field we want: skip its value, including any nested content.
        jp.skipChildren();
    }
}
jp.close();

Thanks!

How to parse huge JSON file as stream in Json.NET?

This should resolve your problem. It works just like your initial code, except that it deserializes an object only when the reader reaches a StartObject token (the { character in the stream); otherwise it keeps reading until it finds the next StartObject token.

JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        // deserialize only when there's "{" character in the stream
        if (reader.TokenType == JsonToken.StartObject)
        {
            o = serializer.Deserialize<MyObject>(reader);
        }
    }
}

