Large Object Heap Fragmentation

The CLR uses the LOH to preallocate a few objects (such as the array used for interned strings). Some of these are smaller than 85,000 bytes and thus would not normally be allocated on the LOH.

It is an implementation detail, but I assume the reason for this is to avoid unnecessary garbage collection of instances that are supposed to survive as long as the process itself.

Also, due to a somewhat esoteric optimization, any double[] of 1000 or more elements is allocated on the LOH as well (this applies to the 32-bit CLR).
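A quick way to observe this (my own snippet, not from the original answers, and it assumes a 32-bit runtime) is to check which generation a freshly allocated array reports, since LOH objects are logically part of generation 2:

    using System;

    class DoubleArrayLohDemo
    {
        static void Main()
        {
            // On the 32-bit CLR a double[] of 1000+ elements goes straight
            // to the LOH, even though 1000 * 8 = 8,000 bytes is far below
            // the usual 85,000-byte threshold. LOH objects report as gen 2.
            var small = new double[999];
            var large = new double[1000];

            Console.WriteLine(GC.GetGeneration(small)); // 0 (small object heap)
            Console.WriteLine(GC.GetGeneration(large)); // 2 on 32-bit (LOH)
        }
    }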

Large Object Heap fragmentation: does the CLR have any solution to it?

Unfortunately, all the info I've ever seen only suggests managing risk factors yourself: reuse large objects, allocate them at the beginning, make sure they're of sizes that are multiples of each other, use alternative data structures (lists, trees) instead of arrays. That just gave me another idea: creating a non-fragmenting List that, instead of one large array, splits itself into several smaller ones (a sketch follows below). Arrays / Lists seem to be the most frequent culprits IME.
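A minimal sketch of that idea; the ChunkedList<T> name and the chunk size are mine, chosen so that no single backing array crosses the 85,000-byte threshold for common element sizes:

    using System;
    using System.Collections.Generic;

    // Stores items in fixed-size sub-arrays so that no backing array is
    // ever large enough to land on the LOH.
    public class ChunkedList<T>
    {
        // 4096 elements of double = 32 KB, well under 85,000 bytes; pick a
        // smaller chunk for larger element types.
        private const int ChunkSize = 4096;
        private readonly List<T[]> _chunks = new List<T[]>();
        private int _count;

        public int Count => _count;

        public void Add(T item)
        {
            int chunkIndex = _count / ChunkSize;
            if (chunkIndex == _chunks.Count)
                _chunks.Add(new T[ChunkSize]); // each chunk stays on the SOH
            _chunks[chunkIndex][_count % ChunkSize] = item;
            _count++;
        }

        public T this[int index]
        {
            get
            {
                if ((uint)index >= (uint)_count)
                    throw new ArgumentOutOfRangeException(nameof(index));
                return _chunks[index / ChunkSize][index % ChunkSize];
            }
        }
    }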

Here's an MSDN magazine article about it:
http://msdn.microsoft.com/en-us/magazine/cc534993.aspx, but there isn't that much useful in it.

Large Arrays, and LOH Fragmentation. What is the accepted convention?

Firstly, the garbage collector does collect the LOH, so do not be immediately scared by its presence. The LOH gets collected when generation 2 gets collected.

The difference is that the LOH does not get compacted, which means that if you have an object in there with a long lifetime then you will effectively be splitting the LOH into two sections: the area before and the area after this object. If this behaviour keeps happening then you could end up in a situation where the space between long-lived objects is not sufficiently large for subsequent allocations, and .NET has to allocate more and more memory in order to place your large objects, i.e. the LOH gets fragmented.

Now, having said that, the LOH can shrink in size if the area at its end is completely free of live objects, so the only problem is if you leave objects in there for a long time (e.g. the duration of the application).
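To make the mechanism concrete, here is a small hypothetical repro (my own, not from the original answer): interleave short-lived and long-lived large allocations, and the dead ones leave holes that the non-compacting LOH can only reuse for allocations of the same size or smaller:

    using System;
    using System.Collections.Generic;

    class LohFragmentationDemo
    {
        static readonly List<byte[]> survivors = new List<byte[]>();

        static void Main()
        {
            for (int i = 0; i < 100; i++)
            {
                var shortLived = new byte[1024 * 1024]; // dies after this iteration
                survivors.Add(new byte[1024 * 1024]);   // lives for the whole run
                GC.KeepAlive(shortLived);
            }
            // After a gen 2 collection the short-lived arrays are gone, but
            // the LOH is not compacted: each survivor is separated by a
            // ~1 MB hole that only an allocation of <= 1 MB can reuse.
            GC.Collect();
        }
    }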

Starting with .NET 4.5.1, the LOH can be compacted on demand; see the GCSettings.LargeObjectHeapCompactionMode property.
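Using it looks like this; note that the compaction only actually happens on the next blocking full collection:

    using System;
    using System.Runtime;

    class LohCompactionExample
    {
        static void Main()
        {
            // Request a one-off LOH compaction; the setting automatically
            // resets to Default once the compacting collection has run.
            GCSettings.LargeObjectHeapCompactionMode =
                GCLargeObjectHeapCompactionMode.CompactOnce;
            GC.Collect(); // blocking gen 2 collection; the LOH is compacted here
        }
    }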

Strategies to avoid LOH fragmentation are:

  • Avoid creating large objects that hang around. Basically this just means large arrays, or objects which wrap large arrays (such as the MemoryStream which wraps a byte array), as nothing else is that big (components of complex objects are stored separately on the heap so are rarely very big). Also watch out for large dictionaries and lists as these use an array internally.
  • Watch out for double arrays; the threshold for these going into the LOH is much, much smaller. I can't remember the exact figure, but it's only a few thousand. (See the edit below.)
  • If you need a MemoryStream, consider making a chunked version that backs onto a number of smaller arrays rather than one huge array (similar in spirit to the ChunkedList sketched earlier). You could also make custom versions of IList and IDictionary which use chunking to avoid things ending up in the LOH in the first place.
  • Avoid very long Remoting calls, as Remoting makes heavy use of MemoryStreams which can fragment the LOH during the length of the call.
  • Watch out for string interning — for some reason these are stored as pages on the LOH and can cause serious fragmentation if your application continues to encounter new strings to intern, i.e. avoid using string.Intern unless the set of strings is known to be finite and the full set is encountered early on in the application's life. (See my earlier question.)
  • Use Son of Strike to see what exactly is using the LOH memory. Again see this question for details on how to do this.
  • Consider pooling large arrays (a sketch follows below).

Edit: the LOH threshold for double arrays appears to be 8K, i.e. arrays of 1,000 or more elements (1,000 × 8 bytes).
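On the pooling point, here is a sketch using ArrayPool<T> (from the System.Buffers package; on older frameworks you would hand-roll an equivalent pool). The pool hands the same LOH-sized arrays back out repeatedly, so steady-state processing causes no new LOH allocations:

    using System;
    using System.Buffers;
    using System.IO;

    class PooledBufferExample
    {
        static void Process(Stream input)
        {
            // Rent a 1 MB buffer instead of allocating a fresh one per call;
            // Rent may return a larger array than requested.
            byte[] buffer = ArrayPool<byte>.Shared.Rent(1024 * 1024);
            try
            {
                int read;
                while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // ... process buffer[0..read) ...
                }
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(buffer); // hand it back for reuse
            }
        }
    }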

Is Large Object Heap fragmentation on 64-bit platforms an issue at all compared to 32-bits, on .NET?

There is no bad design on the part of the garbage collector in the CLR. The issue is that in order to defragment the LOH you need to make space and then reorder and compact the objects. With large objects, a reorder could mean moving several large objects for very little gain in memory (e.g. if you had, say, 100 objects each 0.5MB in size, you could potentially have to copy and reorder 50MB of memory in order to compact this memory space). There is a good explanation of this phenomenon at this link.

The 64-bit CLR has the same size threshold for the LOH (as this was chosen based on real-world applications), and it won't be any more of a problem than it was in the 32-bit CLR. What will help is moving to .NET 4.0+, which includes improvements to the LOH algorithms to prevent out-of-memory conditions and to improve the reuse of holes in the heap. .NET 4.5.1 even has compaction of the LOH (see the LOH compaction documentation on MSDN), which would negate most issues for custom applications that deal with large arrays.

You will have an advantage using 64-bit simply because of the size of the address space. However, none of this removes the need for a quality design in your software. The garbage collector is only one aspect of your application that may be responsible for it running slowly. You should have a look at your algorithms and data structures to ensure that you are getting the efficiency gains you want. If you are approaching the limits of your app and are seeing issues with fragmentation, perhaps you should investigate using collections other than arrays and/or limiting the size of your arrays so that they aren't allocated on the LOH.

Measure then optimise :)

Processing big strings: is this Large Object Heap fragmentation?

  1. Yes. That sounds correct. The LOH is getting fragmented, which leads to the runtime being unable to allocate enough contiguous space for the large strings.

  2. You have a few options; I suppose whichever is easiest and still effective is the one you should choose. It all depends entirely on how your application is written.

    1. Break your strings into small enough chunks that they are not in the LOH (less than 85K; note that the logic for when an object is put on the LOH isn't quite that cut-and-dried). This will allow the GC to reclaim the space. This is by no means guaranteed to fix fragmentation; it can definitely still happen. If you make the strings smaller but they still end up on the LOH, you'll only be putting off the problem. It also depends on how many more than 1 million strings you need to handle. The other downside is that you still have to load each string into memory to split it, so it ends up on the LOH anyway; you'd have to shrink the strings before your application even loads them. Kind of a Catch-22 (but see the sketch after this list). EDIT: Gabe in the comments makes the point that if you can load your string into a StringBuilder first, it makes a good effort under the covers to keep things out of the LOH (until you call ToString on it).

    2. Break the processing of the strings out into a separate process. Use a process instead of a thread: have each process handle, say, 10K strings, then kill it and start another. This way each process starts with a clean slate. The advantage is that it doesn't change your string-processing logic (in case you can't make your strings smaller for processing) and it avoids the catch-22 in #1. The downside is that it probably requires a bigger change to your application, plus coordinating the work between the master process and the worker processes. The trick is that the master can only tell a worker where the large string is; it can't hand the string over directly, otherwise you are back to the catch-22.
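For option 1, one way around the catch-22 (assuming the big strings come from a stream such as a file, rather than arriving pre-built; the names below are my own) is to never materialise the whole string at all, reading into a reusable sub-85K char buffer instead:

    using System;
    using System.IO;

    class ChunkedStringProcessing
    {
        // 32K chars = 64 KB of character data, comfortably under the
        // 85,000-byte LOH threshold (a .NET char is 2 bytes).
        const int ChunkChars = 32 * 1024;

        static void Process(string path)
        {
            char[] buffer = new char[ChunkChars]; // allocated once, reused
            using (var reader = new StreamReader(path))
            {
                int read;
                while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // ... process buffer[0..read) without ever building
                    // one huge string on the LOH ...
                }
            }
        }
    }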


