Serialize as NDJSON using Json.NET

As Json.NET does not currently have a built-in method to serialize a collection to NDJSON, the simplest answer would be to write to a single TextWriter using a separate JsonTextWriter for each line, setting CloseOutput = false for each:

public static partial class JsonExtensions
{
    public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
    {
        // Let caller dispose the underlying stream
        using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
        {
            ToNewlineDelimitedJson(textWriter, items);
        }
    }

    public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
    {
        var serializer = JsonSerializer.CreateDefault();

        foreach (var item in items)
        {
            // Formatting.None is the default; I set it here for clarity.
            using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
            {
                serializer.Serialize(writer, item);
            }
            // https://web.archive.org/web/20180513150745/http://specs.okfnlabs.org/ndjson/
            // Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A).
            // The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
            textWriter.Write("\n");
        }
    }
}

Sample fiddle.

Since the individual NDJSON lines are likely to be short but the number of lines may be large, this answer suggests a streaming solution to avoid allocating a single string larger than 85 KB. As explained in Newtonsoft Json.NET Performance Tips, such large strings end up on the large object heap and may subsequently degrade application performance.
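
For example, here is a minimal usage sketch that streams a large sequence of items straight to a file; the Item type, the item count and the output path are placeholders rather than part of the answer above:

public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical usage: write one NDJSON line per item without ever
// materializing the whole document (or any single large string) in memory.
public static void WriteItems(string path)
{
    IEnumerable<Item> items = Enumerable.Range(0, 1000000)
        .Select(i => new Item { Id = i, Name = "item " + i });

    using (var stream = File.Create(path))
    {
        // Items are enumerated lazily, so memory use stays flat regardless of count.
        JsonExtensions.ToNewlineDelimitedJson(stream, items);
    }
}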

Line delimited json serializing and de-serializing

You can do so by manually parsing your JSON using JsonTextReader and setting the SupportMultipleContent flag to true.

If we look at your first example, and create a POCO called Foo:

public class Foo
{
    [JsonProperty("some")]
    public string Some { get; set; }
}

This is how we parse it:

var json = "{\"some\":\"thing1\"}\r\n{\"some\":\"thing2\"}\r\n{\"some\":\"thing3\"}";
var jsonReader = new JsonTextReader(new StringReader(json))
{
    SupportMultipleContent = true // This is important!
};

var jsonSerializer = new JsonSerializer();
while (jsonReader.Read())
{
    Foo foo = jsonSerializer.Deserialize<Foo>(jsonReader);
}

If you want a list of items as the result, simply add each deserialized item to a list inside the while loop:

listOfFoo.Add(jsonSerializer.Deserialize<Foo>(jsonReader));
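
Putting it together, a minimal sketch that collects every object in the stream into a list (using the same Foo and json from above):

var listOfFoo = new List<Foo>();
var jsonSerializer = new JsonSerializer();

using (var jsonReader = new JsonTextReader(new StringReader(json)) { SupportMultipleContent = true })
{
    while (jsonReader.Read())
    {
        // Each Read() positions the reader at the start of the next JSON object.
        listOfFoo.Add(jsonSerializer.Deserialize<Foo>(jsonReader));
    }
}
// listOfFoo now contains thing1, thing2 and thing3.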

Note: with Json.NET 10.0.4 and later, the same code also supports comma-separated JSON entries; see How to deserialize dodgy JSON (with improperly quoted strings, and missing brackets)?

JSON Deserialize using .Net reads only one object on my json file

You are not serializing a list of 3 DeskObjects - your JSON suggests that you are serializing a list of 1 DeskObject three times.

I used something close to your code in the example below. Without seeing your full serialization/object creation code, I can't tell exactly what's going on, but from your comment above it looks like you are serializing one DeskObject and not a list of 3 DeskObjects.

In my example, I instantiate 3 DeskObjects inline in a list like so:

List<DeskObject> list = new List<DeskObject>() { new DeskObject(){...}, new DeskObject(){...}, new DeskObject(){...} } ;
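
For concreteness, here is a hedged sketch of what the DeskObject class and one of those inline initializers might look like; the property names and values are inferred from the JSON output shown further down and may not match your actual class:

// Hypothetical DeskObject shape, inferred from the serialized output below.
public class DeskObject
{
    public string CurrentDate { get; set; }
    public string CustomerName { get; set; }
    public int Width { get; set; }
    public int Depth { get; set; }
    public int numOfDrawers { get; set; }
    public string surfMaterial { get; set; }
    public int RushOrderDays { get; set; }
    public int TotalQuote { get; set; }
}

// One of the three inline initializers, filled in with the first record's values.
var list = new List<DeskObject>
{
    new DeskObject
    {
        CurrentDate = "09 Feb 2018", CustomerName = "Jonathan Smith",
        Width = 43, Depth = 43, numOfDrawers = 1,
        surfMaterial = "Oak", RushOrderDays = 3, TotalQuote = 1339
    },
    // ... two more initializers in the same style ...
};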

You could also create your DeskObjects and get them into the list provided they were already created previously:

list.Add(myDeskObject1);
list.Add(myDeskObject2);
list.Add(myDeskObject3);

Then, after the list is completely full:

var serializer = new JsonSerializer();
using (StreamWriter file = new StreamWriter(@"quotes.json", false))
{
    serializer.Serialize(file, list);
}

This results in the following JSON:

[
  {
    "CurrentDate": "09 Feb 2018",
    "CustomerName": "Jonathan Smith",
    "Width": 43,
    "Depth": 43,
    "numOfDrawers": 1,
    "surfMaterial": "Oak",
    "RushOrderDays": 3,
    "TotalQuote": 1339
  },
  {
    "CurrentDate": "09 Feb 2018",
    "CustomerName": "Tim Taylor",
    "Width": 24,
    "Depth": 44,
    "numOfDrawers": 3,
    "surfMaterial": "Laminate",
    "RushOrderDays": 5,
    "TotalQuote": 556
  },
  {
    "CurrentDate": "09 Feb 2018",
    "CustomerName": "Cindy Crawford",
    "Width": 24,
    "Depth": 24,
    "numOfDrawers": 5,
    "surfMaterial": "Pine",
    "RushOrderDays": 5,
    "TotalQuote": 570
  }
]

Which correctly parses into a list of 3 DeskObjects. The only way I could manage to get JSON like yours into the file was to serialize each item separately and append it to the file (see the edit below).

tl;dr make sure you fill your list before serializing it.

Edit:

Reviewing your code, this is your issue.

var deskObjects = new List<Desk.DeskObject>();
deskObjects.Add(deskObject);
var result = JsonConvert.SerializeObject(deskObjects, Formatting.Indented);
File.AppendAllText(@"quotes.json", result);

You are creating a new List every time you have a new quote, and serializing it, then appending that to file. This means you are appending '[{item}]' to a file that contains [{item1}][{item2}], etc.

Instead you need to read the List of DeskObjects out of the JSON first, so that you actually have a C# collection. Then add the current DeskObject to it. Finally, rewrite the entire file again (do not append) by serializing the new list.

I am trying to direct you there without writing the code for you - it's important that you understand the concept that when you call AppendAllText, you are not inserting something into a JSON tree. You are just appending raw text to the end of a file containing raw text.

C# - OutOfMemoryException saving a List on a JSON file

Your basic problem is that you are holding all of your pressure map samples in memory rather than writing each one individually and then allowing it to be garbage collected. What's worse, you are doing this in two different places:

  1. You serialize your entire list of samples to a single JSON string (json) before writing the string to a file.

    Instead, as explained in Performance Tips: Optimize Memory Usage, you should serialize and deserialize directly to and from your file in such situations. For instructions on how to do this see this answer to Can Json.NET serialize / deserialize to / from a stream? and also Serialize JSON to a file; a minimal sketch follows this list.

  2. The list created by recordedData.pressureData = new List<PressureMap>(); accumulates all pressure map samples, and all of them are written out again every time a new sample is taken.

    A better solution would be to write each sample once and then forget it, but the requirement that each sample be nested inside some container objects in the JSON makes it non-obvious how to do that.
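
Issue #1 has a straightforward fix. A minimal sketch, assuming filePath and recordedData are your existing variables, that serializes straight to the file so no intermediate string is ever created:

// Stream the object graph directly into the file; Json.NET writes through the
// StreamWriter, so no single large string ends up on the large object heap.
using (var streamWriter = new StreamWriter(filePath))
using (var jsonWriter = new JsonTextWriter(streamWriter))
{
    JsonSerializer.CreateDefault().Serialize(jsonWriter, recordedData);
}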

So, how to attack issue #2?

First, let's modify your data model as follows, partitioning the header data into a separate class:

public class PressureMap
{
    public double[,] PressureMatrix { get; set; }
}

public class CalibrationConfiguration
{
    // Data model not included in question
}

public class RepresentationConfiguration
{
    // Data model not included in question
}

public class RecordedDataHeader
{
    public string SoftwareVersion { get; set; }
    public CalibrationConfiguration CalibrationConfiguration { get; set; }
    public RepresentationConfiguration RepresentationConfiguration { get; set; }
}

public class RecordedData
{
    // Ensure the header is serialized first.
    [JsonProperty(Order = 1)]
    public RecordedDataHeader RecordedDataHeader { get; set; }
    // Ensure the pressure data is serialized last.
    [JsonProperty(Order = 2)]
    public IEnumerable<PressureMap> PressureData { get; set; }
}

Option #1 is a version of the producer-consumer pattern. It involves spinning up two threads: one to generate PressureMap samples, and one to serialize the RecordedData. The first thread will generate samples and add them to a BlockingCollection<PressureMap> that is passed to the second thread. The second thread will then serialize BlockingCollection<PressureMap>.GetConsumingEnumerable() as the value of RecordedData.PressureData.

The following code gives a skeleton for how to do this:

var sampleCount = 400;   // Or whatever stopping criterion you prefer
var sampleInterval = 10; // in ms

using (var pressureData = new BlockingCollection<PressureMap>())
{
    // Adapted from
    // https://learn.microsoft.com/en-us/dotnet/standard/collections/thread-safe/blockingcollection-overview
    // https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.blockingcollection-1?view=netframework-4.7.2

    // Spin up a Task to sample the pressure maps
    using (Task t1 = Task.Factory.StartNew(() =>
    {
        for (int i = 0; i < sampleCount; i++)
        {
            var data = GetPressureMap(i);
            Console.WriteLine("Generated sample {0}", i);
            pressureData.Add(data);
            System.Threading.Thread.Sleep(sampleInterval);
        }
        pressureData.CompleteAdding();
    }))
    {
        // Spin up a Task to consume the BlockingCollection
        using (Task t2 = Task.Factory.StartNew(() =>
        {
            var recordedDataHeader = new RecordedDataHeader
            {
                SoftwareVersion = softwareVersion,
                CalibrationConfiguration = calibrationConfiguration,
                RepresentationConfiguration = representationConfiguration,
            };

            var settings = new JsonSerializerSettings
            {
                ContractResolver = new CamelCasePropertyNamesContractResolver(),
            };

            using (var stream = new FileStream(this.filePath, FileMode.Create))
            using (var textWriter = new StreamWriter(stream))
            using (var jsonWriter = new JsonTextWriter(textWriter))
            {
                int j = 0;

                var query = pressureData
                    .GetConsumingEnumerable()
                    .Select(p =>
                    {
                        // Flush the writer periodically in case the process terminates abnormally
                        jsonWriter.Flush();
                        Console.WriteLine("Serializing item {0}", j++);
                        return p;
                    });

                var recordedData = new RecordedData
                {
                    RecordedDataHeader = recordedDataHeader,
                    // Since PressureData is declared as IEnumerable<PressureMap>, evaluation will be lazy.
                    PressureData = query,
                };

                Console.WriteLine("Beginning serialization of {0} to {1}:", recordedData, this.filePath);
                JsonSerializer.CreateDefault(settings).Serialize(textWriter, recordedData);
                Console.WriteLine("Finished serialization of {0} to {1}.", recordedData, this.filePath);
            }
        }))
        {
            Task.WaitAll(t1, t2);
        }
    }
}

Notes:

  • This solution uses the fact that, when serializing an IEnumerable<T>, Json.NET will not materialize the enumerable as a list. Instead it will take full advantage of lazy evaluation and simply enumerate through it, writing then forgetting each individual item encountered.

  • The first thread samples PressureData and adds them to the blocking collection.

  • The second thread wraps the blocking collection in an IEnumerable<PressureMap> and then serializes that as RecordedData.PressureData.

    During serialization, the serializer will enumerate through the IEnumerable<PressureMap>, streaming each sample to the JSON file and then proceeding to the next, effectively blocking until the next sample becomes available.

  • You will need to do some experimentation to make sure that the serialization thread can "keep up" with the sampling thread, possibly by setting a BoundedCapacity during construction (a sketch follows these notes). If not, you may need to adopt a different strategy.

  • PressureMap GetPressureMap(int count) should be some method of yours (not shown in the question) that returns the current pressure map sample.

  • In this technique the JSON file remains open for the duration of the sampling session. If sampling terminates abnormally the file may be truncated. I make some attempt to ameliorate the problem by flushing the writer periodically.

  • While data serialization will no longer require unbounded amounts of memory, deserializing a RecordedData later will deserialize the PressureData array into a concrete List<PressureMap>. This may possibly cause memory issues during downstream processing.
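
As a concrete illustration of the BoundedCapacity point above, the blocking collection can be constructed with a maximum size, so the sampling thread blocks instead of queueing samples without limit whenever the serializer falls behind; the capacity of 10 below is only an example value:

// If the consumer lags, Add() blocks once 10 samples are queued, keeping
// memory use bounded instead of letting the backlog grow indefinitely.
using (var pressureData = new BlockingCollection<PressureMap>(boundedCapacity: 10))
{
    // ... start the sampling and serialization tasks exactly as shown above ...
}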

Demo fiddle #1 here.

Option #2 would be to switch from a JSON file to a Newline Delimited JSON file. Such a file consists of sequences of JSON objects separated by newline characters. In your case, you would make the first object contain the RecordedDataHeader information, and the subsequent objects be of type PressureMap:

var sampleCount = 100; // Or whatever
var sampleInterval = 10;

var recordedDataHeader = new RecordedDataHeader
{
    SoftwareVersion = softwareVersion,
    CalibrationConfiguration = calibrationConfiguration,
    RepresentationConfiguration = representationConfiguration,
};

var settings = new JsonSerializerSettings
{
    ContractResolver = new CamelCasePropertyNamesContractResolver(),
};

// Write the header
Console.WriteLine("Beginning serialization of sample data to {0}.", this.filePath);

using (var stream = new FileStream(this.filePath, FileMode.Create))
{
    JsonExtensions.ToNewlineDelimitedJson(stream, new[] { recordedDataHeader });
}

// Write each sample incrementally

for (int i = 0; i < sampleCount; i++)
{
    Thread.Sleep(sampleInterval);
    Console.WriteLine("Performing sample {0} of {1}", i, sampleCount);
    var map = GetPressureMap(i);

    using (var stream = new FileStream(this.filePath, FileMode.Append))
    {
        JsonExtensions.ToNewlineDelimitedJson(stream, new[] { map });
    }
}

Console.WriteLine("Finished serialization of sample data to {0}.", this.filePath);

Using the extension methods:

public static partial class JsonExtensions
{
    // Adapted from the answer to
    // https://stackoverflow.com/questions/44787652/serialize-as-ndjson-using-json-net
    // by dbc https://stackoverflow.com/users/3744182/dbc
    public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
    {
        // Let caller dispose the underlying stream
        using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
        {
            ToNewlineDelimitedJson(textWriter, items);
        }
    }

    public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
    {
        var serializer = JsonSerializer.CreateDefault();

        foreach (var item in items)
        {
            // Formatting.None is the default; I set it here for clarity.
            using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
            {
                serializer.Serialize(writer, item);
            }
            // http://specs.okfnlabs.org/ndjson/
            // Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A).
            // The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
            textWriter.Write("\n");
        }
    }

    // Adapted from the answer to
    // https://stackoverflow.com/questions/29729063/line-delimited-json-serializing-and-de-serializing
    // by Yuval Itzchakov https://stackoverflow.com/users/1870803/yuval-itzchakov
    public static IEnumerable<TBase> FromNewlineDelimitedJson<TBase, THeader, TRow>(TextReader reader)
        where THeader : TBase
        where TRow : TBase
    {
        bool first = true;

        using (var jsonReader = new JsonTextReader(reader) { CloseInput = false, SupportMultipleContent = true })
        {
            var serializer = JsonSerializer.CreateDefault();

            while (jsonReader.Read())
            {
                if (jsonReader.TokenType == JsonToken.Comment)
                    continue;
                if (first)
                {
                    yield return serializer.Deserialize<THeader>(jsonReader);
                    first = false;
                }
                else
                {
                    yield return serializer.Deserialize<TRow>(jsonReader);
                }
            }
        }
    }
}

Later, you can process the newline delimited JSON file as follows:

using (var stream = File.OpenRead(filePath))
using (var textReader = new StreamReader(stream))
{
    foreach (var obj in JsonExtensions.FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader))
    {
        if (obj is RecordedDataHeader)
        {
            var header = (RecordedDataHeader)obj;
            // Process the header
            Console.WriteLine(JsonConvert.SerializeObject(header));
        }
        else
        {
            var row = (PressureMap)obj;
            // Process the row.
            Console.WriteLine(JsonConvert.SerializeObject(row));
        }
    }
}

Notes:

  • This approach looks simpler because the samples are added incrementally to the end of the file, rather than inserted inside some overall JSON container.

  • With this approach both serialization and downstream processing can be done with bounded memory use (see the sketch after these notes).

  • The sample file does not remain open for the duration of sampling, so is less likely to be truncated.

  • Downstream applications may not have built-in tools for processing newline delimited JSON.

  • This strategy may integrate more simply with your current threading code.
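
To illustrate the bounded-memory point above, downstream code can aggregate over the samples as they stream off disk without ever materializing a List<PressureMap>; the peak-pressure calculation here is just a made-up example:

// Stream through the NDJSON file, skip the header, and find the peak pressure
// while holding only one PressureMap in memory at a time.
using (var textReader = File.OpenText(filePath))
{
    var maxPressure = JsonExtensions
        .FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader)
        .OfType<PressureMap>()
        .SelectMany(map => map.PressureMatrix.Cast<double>())
        .Max();

    Console.WriteLine("Peak pressure: {0}", maxPressure);
}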

Demo fiddle #2 here.


