Is It Worthwhile to Initialize the Collection Size of a List<T> If Its Size Is Reasonably Known

Is it worthwhile to initialize the collection size of a List<T> if its size is reasonably known?

Yes, it becomes important when your List<T> gets large. The exact numbers depend on the element type and the machine architecture, so let's pick a List of reference types on a 32-bit machine. Each element then takes 4 bytes inside an internal array. The list starts out with a Capacity of 0 and an empty array. The first Add() call grows the Capacity to 4, reallocating the internal array to 16 bytes. The fifth Add() call finds the array full, so it has to be reallocated again: the size doubles, Capacity grows to 8, and the array size to 32 bytes. The previous array is garbage.

This repeats as necessary, and several copies of the internal array become garbage.

Something special happens when the array has grown to 65,536 bytes (16,384 elements). The next Add() that finds it full doubles the size again to 131,072 bytes. That's a memory allocation that exceeds the threshold for "large objects" (85,000 bytes). The allocation is no longer made on the generation 0 heap; it is taken from the Large Object Heap (LOH).
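The arithmetic above can be checked with a small simulation. This is only a sketch in Java (the language of the benchmark elsewhere on this page), not the .NET implementation: it models the growth rule as described here (capacity 0, then 4, then doubling), assumes 4-byte references, and ignores the array's own header. The class and method names are made up for illustration.

```java
final class LohThresholdSketch {
    static final int BYTES_PER_REFERENCE = 4;         // assumption: 32-bit process
    static final int LARGE_OBJECT_THRESHOLD = 85_000; // bytes, as quoted in the text

    // Returns the first capacity whose backing array reaches the threshold,
    // growing 0 -> 4 -> 8 -> ... and printing each step along the way.
    static int firstLargeCapacity() {
        int capacity = 0;
        while (capacity * BYTES_PER_REFERENCE < LARGE_OBJECT_THRESHOLD) {
            capacity = (capacity == 0) ? 4 : capacity * 2;
            System.out.println(capacity + " elements = "
                    + capacity * BYTES_PER_REFERENCE + " bytes");
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println("first large array: " + firstLargeCapacity() + " elements");
    }
}
```

Under these assumptions the last array below the threshold holds 16,384 elements (65,536 bytes), and the next one, at 32,768 elements (131,072 bytes), is the first to land on the LOH.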

Objects on the LOH are treated specially. They are only garbage collected during a generation 2 collection. And the heap doesn't get compacted by default, because it takes too much time to move such large chunks.

This repeats as necessary, and several LOH objects become garbage. They can take up memory for quite a while, since generation 2 collections do not happen very often. Another problem is that these large blocks tend to fragment the virtual memory address space.

This doesn't repeat endlessly. Sooner or later the List class needs to reallocate the array, and it has grown so large that there isn't a hole left in the virtual memory address space to fit it. Your program will bomb with an OutOfMemoryException, usually well before all available virtual memory has been consumed.

Long story short, by setting the Capacity early, before you start filling the List, you can reserve that large internal array up front. You won't get all those awkward released blocks in the Large Object Heap, and you avoid fragmentation. In effect, you'll be able to store many more objects in the list, and your program runs leaner since there's so little garbage. Do this only if you have a good idea how large the list will be; a large Capacity that you'll never fill is wasteful.
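To put a rough number on "so little garbage", the sketch below tallies the bytes of backing arrays that get thrown away while a list grows one element at a time. It is a Java model of the doubling rule described above, not a measurement of the .NET runtime; the names and the 4-byte reference size are assumptions for illustration.

```java
final class GarbageTally {
    // Bytes of discarded backing arrays when `count` references are added one at a
    // time to a list that starts empty and grows 0 -> 4 -> 8 -> ... (modelled rule).
    static long discardedBytes(int count, int bytesPerReference) {
        long discarded = 0;
        long capacity = 0;
        while (capacity < count) {
            discarded += capacity * bytesPerReference; // the old array becomes garbage
            capacity = (capacity == 0) ? 4 : capacity * 2;
        }
        return discarded;
    }

    public static void main(String[] args) {
        // One million references on a 32-bit machine.
        System.out.println(discardedBytes(1_000_000, 4)); // prints 4194288
    }
}
```

In this model, growing to a million references discards about 4 MB of intermediate arrays, whereas a list whose capacity is set to a million up front discards none.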

Is there any advantage of defining the size of ArrayList while instantiating?

ArrayLists use arrays internally, so when an ArrayList needs additional capacity, it has to create a new array internally and copy the elements over to the new array.

You can overcome the overhead of resizing the ArrayList by estimating or finding the exact size of the ArrayList beforehand. Alternatively, you can make sure the ArrayList never grows beyond a specified size by handling your business logic and then removing elements whenever your ArrayList is at its maximum desired size. Finally, you can use a different data structure that does not use arrays internally to avoid your growth problem altogether.
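As a minimal sketch of the first option, java.util.ArrayList accepts an initial capacity in its constructor and also offers ensureCapacity() for reserving room later. The class and helper names below are hypothetical and only illustrate the two calls.

```java
import java.util.ArrayList;
import java.util.List;

final class PresizeSketch {
    // Size known up front: pass it to the constructor.
    static List<Integer> fillKnownSize(int size) {
        List<Integer> numbers = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            numbers.add(i);
        }
        return numbers;
    }

    // Size learned later: reserve room once, before a bulk add.
    static ArrayList<Integer> appendBatch(ArrayList<Integer> target, int[] batch) {
        target.ensureCapacity(target.size() + batch.length);
        for (int value : batch) {
            target.add(value);
        }
        return target;
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<>(fillKnownSize(3));
        appendBatch(list, new int[] {10, 20});
        System.out.println(list); // prints [0, 1, 2, 10, 20]
    }
}
```

Neither call changes what the list contains; they only affect how often the internal array has to be reallocated and copied.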

Is it better to set size of a Java Collection in constructor?

Different collections have different performance consequences for this. For ArrayList, the saving can be very noticeable.

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        int max = 1000000;

        // Warm-up: exercise add() before timing anything
        List<Integer> numbers = new ArrayList<Integer>(5);
        for (int i = 0; i < max; i++) {
            numbers.add(i);
        }

        // Preallocated: the internal array is sized once, up front
        long start = System.currentTimeMillis();
        numbers = new ArrayList<Integer>(max);
        for (int i = 0; i < max; i++) {
            numbers.add(i);
        }
        System.out.println("Preall: " + (System.currentTimeMillis() - start));

        // Resizing: start small so the internal array is reallocated as the list grows
        start = System.currentTimeMillis();
        numbers = new ArrayList<Integer>(5);
        for (int i = 0; i < max; i++) {
            numbers.add(i);
        }
        System.out.println("Resizing: " + (System.currentTimeMillis() - start));
    }
}

Result:

Preall: 26
Resizing: 58

Running with max raised tenfold to 10000000 gives:

Preall: 510
Resizing: 935

So even at different sizes, the ratio stays about the same.

This is pretty much a worst-case test, but filling a list one element at a time is very common, and you can see that there was a roughly 2x speed difference.

In C# what's the resulting capacity of `new List<int>() {3,5};`?

The code new List<int>() {3,5}; is equivalent to

var list = new List<int>();
list.Add(3);
list.Add(5);

So the answer is 4.

The C# compiler is smart, but not that smart yet.

List.Capacity does not work

Capacity

Gets or sets the total number of elements the internal data structure can hold without resizing.

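Capacity is not a fixed limit. When the list is full, Add() grows the capacity by doubling it (or sets it to 4 if it was 0), so starting from a capacity of 1 the sequence is 1, 2, 4, 8, and 5 is never reached.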

var lst = new List<string>(1);            

lst.Add("T1");
lst.Add("T2");
lst.Add("T3");
lst.Add("T4");
lst.Add("T5");

and this is what the Immediate window reported after the constructor and after each Add() call:

lst.Count
0
lst.Capacity
1
lst.Count
1
lst.Capacity
1
lst.Count
2
lst.Capacity
2
lst.Count
3
lst.Capacity
4
lst.Count
4
lst.Capacity
4
lst.Count
5
lst.Capacity
8
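The trace above can be reproduced with a few lines that model the doubling rule. This is a sketch in Java (the language of the benchmark elsewhere on this page) of the rule only, not the List<T> source; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CapacityTrace {
    // Doubling rule as described in the text; a capacity of 0 jumps to 4.
    static int grow(int capacity) {
        return capacity == 0 ? 4 : capacity * 2;
    }

    // Modelled capacity after each of `adds` one-at-a-time additions.
    static List<Integer> capacityAfterEachAdd(int initialCapacity, int adds) {
        List<Integer> trace = new ArrayList<>();
        int capacity = initialCapacity;
        for (int count = 1; count <= adds; count++) {
            if (count > capacity) {
                capacity = grow(capacity); // the array is full, so it is replaced
            }
            trace.add(capacity);
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfterEachAdd(1, 5)); // prints [1, 2, 4, 4, 8]
    }
}
```

The printed capacities match the Capacity values shown after each Add() in the Immediate window output.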

What are the internal differences of a T[] and a List<T> in terms of memory?

An array is a fixed-size structure: once initialized, it occupies exactly the memory you asked for.

int[] arr = new int[5];

For example, here five int values are created in memory. But when you use lists, depending on the implementation, you first get an array with a predefined capacity. As you add elements, the list scales up whenever you exceed that capacity. In some implementations it just doubles its size; in others it enlarges itself when the granted capacity is half full.

List which should contain 12 elements suddenly contains 16 elements, why?

The list reserves memory in chunks every time it needs to grow in capacity. Hence Capacity reports 16, but Count only reports 12. Null items contribute towards the count.

The list class provides the TrimExcess method to remove unutilised space.

Also, specifying the capacity upfront in the constructor results in only one memory grab (assuming you don't exceed that capacity).

Your screenshot shows a count of 12 with a capacity of 16. If memory serves, the list attempts to double its size (or at the very least definitely defaults to 4, then goes to 8, then 16). As you have 12 items, you triggered the jump from 8 to 16 capacity.

C# Large object in medium size collection

The size of an array of objects is the number of objects times the pointer size. This is because only value types are stored in the array itself; reference types (objects) are stored somewhere else and do not count towards the size of the array. So 85000/4=21250 objects and 85000/8=10625 objects can be stored in an array on the SOH in 32-bit and 64-bit mode, respectively.
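The division above, written out as a tiny Java sketch (Java being the language of the benchmark elsewhere on this page). The class and method names are made up, and the calculation ignores the array's own header, so the real limits are slightly lower than the figures it prints.

```java
final class SohLimit {
    static final int LARGE_OBJECT_THRESHOLD = 85_000; // bytes, from the text

    // Approximate number of references that fit before the array reaches the threshold.
    static int maxReferences(int pointerSizeBytes) {
        return LARGE_OBJECT_THRESHOLD / pointerSizeBytes;
    }

    public static void main(String[] args) {
        System.out.println("32-bit: " + maxReferences(4)); // prints 32-bit: 21250
        System.out.println("64-bit: " + maxReferences(8)); // prints 64-bit: 10625
    }
}
```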

Edit:
Thanks to Hans Passant for pointing out that this assumes that the collection type used is an array and not a list. Lists resize themselves to be bigger than the content to avoid too many allocations. See this link for details


