Is There a Performance Impact When Calling ToList()

Is there a performance impact when calling ToList()?


IEnumerable<T>.ToList()

Yes, IEnumerable<T>.ToList() does have a performance impact: it is an O(n) operation, though it is only likely to need attention in performance-critical code.

ToList() uses the List<T>(IEnumerable<T> collection) constructor. This constructor must copy the elements of the source (an array or, more generally, any IEnumerable<T>); otherwise later modifications to the source would also show up in the new list, which is generally not desirable.
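As a minimal sketch of that copy behaviour (the variable names are made up for illustration), the following shows that the source and the list produced by ToList() can be changed independently. Note that the copy is shallow: for reference types only the references are copied, not the objects they point to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] source = { 1, 2, 3 };

// ToList() copies the elements into a new List<int> with its own backing array.
List<int> copy = source.ToList();

// Changing the source afterwards does not affect the copy...
source[0] = 99;

// ...and changing the copy does not affect the source.
copy.Add(4);

Console.WriteLine(string.Join(", ", source)); // 99, 2, 3
Console.WriteLine(string.Join(", ", copy));   // 1, 2, 3, 4
```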

I would like to reiterate that this will only make a difference with a huge list; copying chunks of memory is quite a fast operation.

Handy tip, As vs To

You'll notice in LINQ there are several methods that start with As (such as AsEnumerable()) and To (such as ToList()). The methods that start with To require a conversion like the one above (i.e. they may impact performance), while the methods that start with As do not and only require a cast or some other simple operation.
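The difference can be observed directly with a reference comparison. This is only an illustrative sketch; the variable names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> numbers = new List<int> { 1, 2, 3 };

// AsEnumerable() only changes the compile-time type; it returns the same object.
IEnumerable<int> view = numbers.AsEnumerable();
Console.WriteLine(object.ReferenceEquals(view, numbers)); // True

// ToList() builds a new list and copies the elements into it.
List<int> converted = numbers.ToList();
Console.WriteLine(object.ReferenceEquals(converted, numbers)); // False
```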

Additional details on List<T>

Here is a little more detail on how List<T> works in case you're interested :)

A List<T> is backed by a construct called a dynamic array, which is resized on demand; each resize copies the contents of the old array to a new, larger array. So it starts off small and grows as required.

This is the difference between the Capacity and Count properties on List<T>. Capacity refers to the size of the array behind the scenes; Count is the number of items in the List<T>, which is always <= Capacity. So when an item is added to a list that is already at Capacity, the backing array is replaced by one of double the size and the existing items are copied into it.
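To watch the resizing happen, a small sketch like the one below logs every change in Capacity while items are added. The exact capacities printed are an implementation detail of the runtime, so they are not listed here; the last lines show that passing the expected size to the constructor avoids the intermediate resizes.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();
int lastCapacity = list.Capacity;
int resizeCount = 0;

for (int i = 0; i < 100; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        // The backing array was replaced by a larger one and the items were copied over.
        Console.WriteLine($"Count = {list.Count}, Capacity: {lastCapacity} -> {list.Capacity}");
        lastCapacity = list.Capacity;
        resizeCount++;
    }
}

// Requesting the capacity up front means no resize is needed for the first 100 items.
var presized = new List<int>(100);
Console.WriteLine(presized.Capacity); // 100
```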

Does calling ToList multiple times affect performance?

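Calling ToList (or ToArray, ToDictionary, etc.) on an IQueryable will execute the query. After that, any LINQ expression you use will be operating on the in-memory list. (AsEnumerable does not execute the query by itself; it only switches the operators that follow to LINQ to Objects, and the query runs once the sequence is enumerated.)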

For instance, your code:

List<Customer> Customers = GetCustomersFromDataBase();

This will result in a List of Customer objects which represent the database records. The list is now independent of the query used to fetch the results from the database, although linked records in the Customer objects will still need to be fetched from the database.

Then when you do this:

List<Customer> FilteredCustomers = (from c in Customers where c.id == 1 select c).ToList();

This is not operating on the database but on the results in memory that were returned from the previous statement. In this case you will not be querying the database multiple times.

Where it does impact performance is when you have an IQueryable object that you call ToList on directly multiple times. Each call to ToList on an IQueryable will result in a call to the database, with all of the attendant object tracking and such in the background. Generally this is something you want to avoid, although additional filtering of the query prior to enumerating the results may result in better performance if the total number of records selected by the query is large and the filters pull small subsets of the data.
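The effect can be imitated without a database. In the sketch below, QueryCustomerIds is a hypothetical stand-in for a deferred query (it is an ordinary iterator, not a real IQueryable), and a counter records how often the "query" actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int executions = 0;

// Hypothetical stand-in for a database query: every enumeration runs the body again.
IEnumerable<int> QueryCustomerIds()
{
    executions++;
    for (int id = 1; id <= 5; id++)
        yield return id;
}

IEnumerable<int> query = QueryCustomerIds();

// Each ToList() on the deferred query enumerates it again.
List<int> first = query.ToList();
List<int> second = query.ToList();
Console.WriteLine(executions); // 2

// Filtering the materialized list does not go back to the source.
List<int> filtered = first.Where(id => id == 1).ToList();
Console.WriteLine(executions); // 2
```

With a real IQueryable, each of those extra executions would be a round trip to the database, which is why materializing once and reusing the list is usually preferable.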

Is it better to call ToList() or ToArray() in LINQ queries?

Unless you simply need an array to meet other constraints you should use ToList. In the majority of scenarios ToArray will allocate more memory than ToList.

Both use arrays for storage, but ToList has a more flexible constraint. It needs the array to be at least as large as the number of elements in the collection. If the array is larger, that is not a problem. However ToArray needs the array to be sized exactly to the number of elements.

To meet this constraint ToArray often does one more allocation than ToList. Once it has an array that is big enough it allocates an array which is exactly the correct size and copies the elements back into that array. The only time it can avoid this is when the grow algorithm for the array just happens to coincide with the number of elements needing to be stored (definitely in the minority).
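The difference in constraints is visible from the outside, even though the allocations themselves are not. The following sketch uses a filtered sequence whose length is unknown until it is enumerated; the Capacity that gets printed depends on the runtime version, so no value is claimed for it here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A filtered sequence: its length is not known before it is enumerated.
IEnumerable<int> evens = Enumerable.Range(0, 1000).Where(n => n % 2 == 0);

List<int> asList = evens.ToList();
int[] asArray = evens.ToArray();

// The list is allowed to keep unused slots in its backing array...
Console.WriteLine($"List:  Count = {asList.Count}, Capacity = {asList.Capacity}");

// ...while the array is always exactly as long as the number of elements.
Console.WriteLine($"Array: Length = {asArray.Length}");
```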

EDIT

A couple of people have asked me about the consequence of having the extra unused memory in the List<T> value.

This is a valid concern. If the created collection is long lived, is never modified after being created and has a high chance of landing in the Gen2 heap then you may be better off taking the extra allocation of ToArray up front.

In general though I find this to be the rarer case. It's much more common to see a lot of ToArray calls which are immediately passed to other short lived uses of memory in which case ToList is demonstrably better.

The key here is to profile, profile and then profile some more.

Performance Difference between using ToList() vs. new List<T>(IEnumerable<T>)

As others have already said, most of the time the best practice is to do what is more readable (which I think here means using ToList()). Both ways do exactly the same thing, so you should expect their performance to be very similar.

If profiling shows that this is the code you need to optimize, then you should try it both ways and measure which one is faster.

Looking at the source code might help too (in the case of the .NET base libraries, you can look at Reference Source or use a decompiler). ToList looks like this:

public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    return new List<TSource>(source);
}

So, its performance should be the same as new List, plus one null check and one method call. But both of the additions are likely to be optimized away.


