High Memory Consumption with Enumerable.Range

High memory consumption with Enumerable.Range?

This probably relates to the doubling algorithm used to resize the backing buffer when adding to a list. When the source is an array, its length is known and can be queried by casting to IList<T> and/or ICollection<T>; thus the list can allocate a single, right-sized array the first time and just block-copy the contents.

With a sequence this is not possible (the sequence does not expose its length in any accessible way); thus it must instead fall back to "keep filling up the buffer; if it's full, double it and copy".

Obviously this can need roughly double the memory, since at the moment of each resize both the old and the new buffer exist while the contents are copied across.

An interesting test would be:

var list = new List<int>(10000000);
list.AddRange(Enumerable.Range(1, 10000000));

This will allocate the right size initially, while still using the sequence.
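
To see the effect yourself, a rough comparison along these lines can help. This is a sketch, not a proper benchmark: GC.GetTotalMemory only gives approximate figures, and newer runtimes special-case Enumerable.Range so ToList may already allocate the exact size, meaning the gap mostly shows up on older frameworks.

using System;
using System.Collections.Generic;
using System.Linq;

class AllocationComparison
{
    static void Main()
    {
        const int count = 10000000;

        long before = GC.GetTotalMemory(forceFullCollection: true);

        // Length unknown up front: the backing array may grow by doubling.
        var viaSequence = Enumerable.Range(1, count).ToList();
        long afterSequence = GC.GetTotalMemory(true);

        // Length supplied up front: a single right-sized backing array.
        var preSized = new List<int>(count);
        preSized.AddRange(Enumerable.Range(1, count));
        long afterPreSized = GC.GetTotalMemory(true);

        Console.WriteLine("ToList over sequence: ~{0} MB", (afterSequence - before) / (1024 * 1024));
        Console.WriteLine("Pre-sized AddRange:   ~{0} MB", (afterPreSized - afterSequence) / (1024 * 1024));

        // Keep both lists alive so neither is collected before the measurements finish.
        GC.KeepAlive(viaSequence);
        GC.KeepAlive(preSized);
    }
}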

tl;dr: the constructor, when passed a sequence, first checks whether it can obtain the length by casting to a well-known interface.
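
That check follows roughly this pattern (a simplified sketch of the idea, not the actual List<T> source):

using System.Collections.Generic;

static class Materializer
{
    static T[] Materialize<T>(IEnumerable<T> source)
    {
        // If the source exposes its count, allocate once and copy.
        if (source is ICollection<T> collection)
        {
            var buffer = new T[collection.Count];
            collection.CopyTo(buffer, 0);
            return buffer;
        }

        // Otherwise fall back to grow-and-double.
        var list = new List<T>();
        foreach (var item in source)
        {
            list.Add(item);   // internally doubles the backing array as needed
        }
        return list.ToArray();
    }
}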

Leaner way of initializing an int array in C#

int[] array = Enumerable.Range(0, nums).ToArray();

High memory consumption while running a console application with a large database in C#

It doesn't matter that you are using a SqlReader; you're creating a List of objects from your DB query, and that's most likely where your memory is going.

Best practice is to bring back the least amount of data possible.

I suspect that your batching logic is causing memory issues, due to the looping you are using to fetch each batch.

Add some tracing and see how many times each of your DB functions is being called.

You may find that one or more functions are being called numerous times when you expected them to be called only once. That will help you narrow down the trouble area.
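
Coming back to the first point, a sketch of reading rows without materializing a List first is shown below. The connection string, query, and handler are illustrative assumptions, not taken from the original code; the point is only that each row is processed as it streams in and nothing is retained.

using System;
using System.Data.SqlClient;

static class BatchReader
{
    // Streams rows one at a time instead of building a List<T> of everything.
    public static void Stream(string connectionString, Action<int, string> handle)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT Id, Payload FROM dbo.Items", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    handle(reader.GetInt32(0), reader.GetString(1));
                }
            }
        }
    }
}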

High memory usage when retrieving values from database

There are several reasons why LiteDB allocates much more memory than a plain List<double>.

To understand this, you need to know that your typed class is converted into a BsonDocument structure (made of BsonValues). This structure has an overhead (+1 or +5 bytes per BsonValue).

Also, to serialize this class (when you insert), LiteDB must create one single byte[] holding the whole BsonDocument (in BSON format). Afterwards, this very large byte[] is copied across many extend pages (each page holds a byte[4070]).

On top of that, LiteDB must keep the original data to store in the journal area, so this size can be doubled.

To deserialize, LiteDB must do the inverse: read all pages from disk into memory, join them into a single byte[], deserialize that into a BsonDocument, and finally map it back to your class.

These operations are fine for small objects. The memory is reused for each new document read/write, so memory stays under control.
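
If you want to see how large that single byte[] gets for your own type, something along these lines works, assuming the v4-style API where BsonMapper.Global.ToDocument and BsonSerializer.Serialize are available; the class here is an illustrative stand-in for the poster's type:

using System;
using LiteDB;

// Illustrative class standing in for a document holding ~185,000 doubles.
public class Series
{
    public int Id { get; set; }
    public double[] Values { get; set; }
}

class BsonSizeCheck
{
    static void Main()
    {
        var entity = new Series { Id = 1, Values = new double[185000] };

        // Step 1: typed class -> BsonDocument (the structure with per-value overhead).
        BsonDocument doc = BsonMapper.Global.ToDocument(entity);

        // Step 2: BsonDocument -> one contiguous byte[] in BSON format.
        byte[] bson = BsonSerializer.Serialize(doc);

        Console.WriteLine("Serialized size: {0} bytes", bson.Length);
    }
}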

In the upcoming v5 this process gets some optimizations, such as:

  • Deserialization no longer needs to allocate all data into a single byte[] to read a document; this is done using the new ChunkStream(IEnumerable<byte[]>). Serialization still needs the single byte[].
  • The journal file was changed to a WAL (write-ahead log), so the original data no longer needs to be kept.
  • ExtendPages are not stored in the cache anymore.

For future versions I'm thinking about using the new Span<T> class to reuse previous memory allocations, but I need to study this more.


But storing a single document with 185,000 values is not the best solution in any NoSQL database. MongoDB limits BSON document size to 16MB (and early versions had a ~368kb limit)... I limited LiteDB to 1MB in v2, but I removed this size check and just added a recommendation to avoid large single documents.

Try splitting your class into 2 collections: one for your data and another for the values. You can also split this large array into chunks, as LiteDB FileStorage or MongoDB GridFS do.
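
A rough sketch of that chunked layout is below. The collection name, chunk size, and the ValueChunk/SeriesStore types are assumptions for illustration, shown against the v4-style API:

using System.Collections.Generic;
using System.Linq;
using LiteDB;

// Hypothetical chunk document: instead of one document with 185,000 doubles,
// each document holds a fixed-size slice of the series.
public class ValueChunk
{
    public int Id { get; set; }
    public int SeriesId { get; set; }
    public int Sequence { get; set; }
    public double[] Values { get; set; }
}

public static class SeriesStore
{
    const int ChunkSize = 1000;   // illustrative; tune so each document stays small

    public static void Save(LiteDatabase db, int seriesId, IReadOnlyList<double> values)
    {
        var chunks = db.GetCollection<ValueChunk>("chunks");
        chunks.EnsureIndex(x => x.SeriesId);

        for (int i = 0; i < values.Count; i += ChunkSize)
        {
            chunks.Insert(new ValueChunk
            {
                SeriesId = seriesId,
                Sequence = i / ChunkSize,
                Values = values.Skip(i).Take(ChunkSize).ToArray()
            });
        }
    }

    public static IEnumerable<double> Load(LiteDatabase db, int seriesId)
    {
        var chunks = db.GetCollection<ValueChunk>("chunks");
        return chunks.Find(x => x.SeriesId == seriesId)
                     .OrderBy(c => c.Sequence)
                     .SelectMany(c => c.Values);
    }
}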

Multithreaded program NOT using 100% CPU for expensive IEnumerables

The fact that your threads are blocked on clr.dll!WKS::gc_heap::wait_for_gc_done shows that the garbage collector is the bottleneck of your application. As much as possible, you should try to limit the number of heap allocations in your program to put less stress on the GC.

That said, there is another way to speed things up. By default, on desktop, the GC is configured to use limited resources on the computer (to avoid slowing down other applications). If you want to fully use the resources available, you can activate server GC. This mode assumes that your application is the most important thing running on the computer. It will provide a significant performance boost, but use a lot more CPU and memory.
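
Server GC is switched on through configuration rather than code; a quick way to confirm which mode you actually got is something like the sketch below. GCSettings lives in System.Runtime, and the config keys in the comments are the usual ones for .NET Framework and .NET (Core) respectively.

using System;
using System.Runtime;

class GcMode
{
    static void Main()
    {
        // .NET Framework: add <gcServer enabled="true"/> under <runtime> in app.config.
        // .NET (Core): add <ServerGarbageCollection>true</ServerGarbageCollection> to the .csproj.
        Console.WriteLine("Server GC:    {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
    }
}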


