How to Initialize a List<T> to a Given Size (As Opposed to Capacity)

How to initialize a ListT to a given size (as opposed to capacity)?

I can't say I need this very often - could you give more details as to why you want this? I'd probably put it as a static method in a helper class:

public static class Lists
{
public static List<T> RepeatedDefault<T>(int count)
{
return Repeated(default(T), count);
}

public static List<T> Repeated<T>(T value, int count)
{
List<T> ret = new List<T>(count);
ret.AddRange(Enumerable.Repeat(value, count));
return ret;
}
}

You could use Enumerable.Repeat(default(T), count).ToList() but that would be inefficient due to buffer resizing.

Note that if T is a reference type, it will store count copies of the reference passed for the value parameter - so they will all refer to the same object. That may or may not be what you want, depending on your use case.

EDIT: As noted in comments, you could make Repeated use a loop to populate the list if you wanted to. That would be slightly faster too. Personally I find the code using Repeat more descriptive, and suspect that in the real world the performance difference would be irrelevant, but your mileage may vary.

Set List Initial Size

This can be easily done with an array:

string[] sa = new string[99];
sa[71] = "g";

Which also happens to implement the IList interface.

Is it worthwhile to initialize the collection size of a ListT if it's size reasonably known?

Yes, it gets to be important when your List<T> gets large. The exact numbers depend on the element type and the machine architecture, let's pick a List of reference types on a 32-bit machine. Each element will then take 4 bytes inside an internal array. The list will start out with a Capacity of 0 and an empty array. The first Add() call grows the Capacity to 4, reallocating the internal array to 16 bytes. Four Add() calls later, the array is full and needs to be reallocated again. It doubles the size, Capacity grows to 8, array size to 32 bytes. The previous array is garbage.

This repeats as necessary, several copies of the internal array will become garbage.

Something special happens when the array has grown to 65,536 bytes (16,384 elements). The next Add() doubles the size again to 131,072 bytes. That's a memory allocation that exceeds the threshold for "large objects" (85,000 bytes). The allocation is now no longer made on the generation 0 heap, it is taken from the Large Object Heap.

Objects on the LOH are treated specially. They are only garbage collected during a generation 2 collection. And the heap doesn't get compacted, it takes too much time to move such large chunks.

This repeats as necessary, several LOH objects will become garbage. They can take up memory for quite a while, generation 2 collections do not happen very often. Another problem is that these large blocks tend to fragment the virtual memory address space.

This doesn't repeat endlessly, sooner or later the List class needs to re-allocate the array and it has grown so large that there isn't a hole left in the virtual memory address space to fit the array. Your program will bomb with an OutOfMemoryException. Usually well before all available virtual memory has been consumed.

Long story short, by setting the Capacity early, before you start filling the List, you can reserve that large internal array up front. You won't get all those awkward released blocks in the Large Object Heap and avoid fragmentation. In effect, you'll be able to store many more objects in the list and your program runs leaner since there's so little garbage. Do this only if you have a good idea how large the list will be, using a large Capacity that you'll never fill is wasteful.

List Better to init with a maximum capacity and only use a fraction of that, or init with no capacity

If it can truly vary that widely, then you would want to not set the capacity. For most collections, the capacity doubles as it is met (with a default capacity of 16 I believe), so your capacity will very closely approach your maximum as you fill it up.

Initialize List with some count of elements

Using Enumerable.Repeat

var myList = Enumerable.Repeat(false, 100).ToList();

which

Generates a sequence that contains one repeated value.

ListT memory allocation with initial capacity

The compiler doesn't allocate anything. Allocation happens at run-time. The way List<T> works is that it will double the size of its internal T[] as needed. If you specify an initial size it will allocate that and double from there as needed.

Keep in mind since T is class in your example, the list only holds references. I.e. the list will only be allocated on LOH when you have more than 85000 bytes worth of references (plus the overhead of the list itself).

Also, since the list doesn't hold instances of foo the excess space is just for the references.

C# - What is ListT initial capacity used for?

A list has an array under the hood. This decreases the amount of memory reallocation as you .add up to your 50 elements. This takes time and memory and gives the garbage collector stuff to do.

That's why List(T).Capacity is a thing.

Here are some benchmarks for 100 .Adds:

 Method A: Dictionary, no capacity
Time: 1350 ms

Method B: Dictionary, has capacity
Time: 700 ms

Method C: Dictionary, const capacity
Time: 760 ms

Method D: Dictionary, over-large capacity
Time: 1005 ms

Method E: List, no capacity
Time: 1010 ms

Method F: List, accurate capacity
Time: 575 ms

So when you're adding 100 elements, pre allocating seems to to half the time it takes. While I'm not a fan of premature optimisation, giving the CLR a hint if you know how big your list will be really sounds worth it.

ToList with Capacity?

In cases when the capacity is part of IEnumerable<T> that is also an ICollection<T>, the library will allocate at the correct capacity.

Here is a reference implementation of List<T>(IEnumerable<T> source), which is invoked when you call ToList():

public List(IEnumerable<T> collection) {
if (collection==null)
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
Contract.EndContractBlock();

ICollection<T> c = collection as ICollection<T>;
if( c != null) {
int count = c.Count;
if (count == 0) {
_items = _emptyArray;
} else {
_items = new T[count];
c.CopyTo(_items, 0);
_size = count;
}
} else {
_size = 0;
_items = _emptyArray;
// This enumerable could be empty. Let Add allocate a new array, if needed.
// Note it will also go to _defaultCapacity first, not 1, then 2, etc.

using(IEnumerator<T> en = collection.GetEnumerator()) {
while(en.MoveNext()) {
Add(en.Current);
}
}
}
}

Note how the constructor behaves when collection implements ICollection<T>: rather than iterating the content and calling Add for each item, it allocates the internal _items array, and copies the content into it without reallocations.

In situations when the capacity is not embedded in class implementing IEnumerable<T>, you can easily define one yourself, using a combination of standard methods:

public static class ToListExtension {

public static List<T> ToList<T>(this IEnumerable<T> source, int capacity)
{
var res = new List<T>(capacity);
res.AddRange(source);
return res;
}

}


Related Topics



Leave a reply



Submit