Difference Between IEnumerable Count() and Length

Difference between IEnumerable Count() and Length

By calling Count on IEnumerable<T>, I'm assuming you're referring to the extension method Count on System.Linq.Enumerable. Length is not a member of IEnumerable<T> but rather a property on array types in .NET, such as int[].

The difference is performance. The Length property is guaranteed to be an O(1) operation. The complexity of the Count extension method depends on the runtime type of the object. It attempts to cast the object to types that support an O(1) count, such as ICollection<T>, and reads their Count property. If none of those casts succeeds, it enumerates all items and counts them, which has a complexity of O(N).

For example:

int[] list = CreateSomeList();
Console.WriteLine(list.Length); // O(1)
IEnumerable<int> e1 = list;
Console.WriteLine(e1.Count()); // O(1)
IEnumerable<int> e2 = list.Where(x => x != 42);
Console.WriteLine(e2.Count()); // O(N)

The value e2 is implemented as a C# iterator which does not support O(1) counting and hence the method Count must enumerate the entire collection to determine how long it is.

How should I get the length of an IEnumerable?

If you need to read the number of items in an IEnumerable<T>, you have to call the extension method Count. In general (that is, unless the sequence turns out to be a collection that already knows its size) it iterates through the elements of the sequence and returns the number of items it found. There isn't any more immediate way.

If you know that your sequence is an array, you could cast it and read the number of items using the Length property.
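
The casting idea can be sketched as a small helper. This is only an illustration: the names SequenceLength and GetLength are made up for this example, and the final fallback simply defers to the LINQ Count extension method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceLength
{
    // Hypothetical helper: returns the number of items, taking the cheapest route available.
    public static int GetLength<T>(IEnumerable<T> source)
    {
        if (source is T[] array)
            return array.Length;      // array: read the Length property directly
        if (source is ICollection<T> collection)
            return collection.Count;  // List<T>, HashSet<T>, ...: read the Count property
        return source.Count();        // anything else: let LINQ count the sequence
    }

    public static void Main()
    {
        IEnumerable<int> fromArray = new[] { 1, 2, 3 };
        IEnumerable<int> filtered = fromArray.Where(x => x != 2);

        Console.WriteLine(GetLength(fromArray)); // 3
        Console.WriteLine(GetLength(filtered));  // 2
    }
}
```

In practice the explicit checks mostly duplicate what Count() already does internally, so the helper is more useful as a picture of the decision than as production code.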

No, in later versions there isn't any such method.

For implementation details of the Count method, please have a look here.

.NET array - difference between Length, Count() and Rank

Length is a property of an array object, and using it is the most efficient way to determine the number of elements in the array (Array.Length in MSDN documentation).

Count() is a LINQ extension method that does effectively the same. It applies to arrays because arrays are enumerable objects. It's preferred to use Length, because Count() is likely to be more expensive (see this question for further discussion and MSDN documentation on Count for reference).

Rank is the property that returns the number of dimensions (a different thing entirely). When you declare an array int[,] myArray = new int[5,10];, the Rank of it will be 2, but it will hold a total of 50 elements (MSDN on Rank property).
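
A short sketch of the difference, using the same int[5,10] array. The class and method names (ArrayShape, Describe) are invented for this example; Rank, Length and GetLength are the standard System.Array members.

```csharp
using System;

public static class ArrayShape
{
    // Hypothetical helper that reports the two properties discussed above.
    public static (int Rank, int Length) Describe(Array array)
    {
        return (array.Rank, array.Length);
    }

    public static void Main()
    {
        int[,] myArray = new int[5, 10];

        var shape = Describe(myArray);
        Console.WriteLine(shape.Rank);           // 2  (number of dimensions)
        Console.WriteLine(shape.Length);         // 50 (total number of elements)
        Console.WriteLine(myArray.GetLength(0)); // 5  (size of the first dimension)
        Console.WriteLine(myArray.GetLength(1)); // 10 (size of the second dimension)
    }
}
```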

What's the difference between String.Count and String.Length?

On the surface they would seem functionally identical, but the main difference is:

  • Length is a property that is defined on strings and is the usual way to find the length of a string

  • .Count() is implemented as an extension method. That is, what string.Count() really does is call Enumerable.Count(this IEnumerable<char>), a System.Linq extension method, given that string is really a sequence of chars.

Performance concerns of LINQ enumerable methods notwithstanding, use Length instead, as it's built right into strings.
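
As a rough illustration, both calls return the same number for a given string. The class and method names below are invented for the example; the predicate overload of Count is shown because it is the one case where the LINQ method does something Length cannot.

```csharp
using System;
using System.Linq;

public static class StringLengthDemo
{
    // Reads the property defined on string.
    public static int ViaProperty(string s) => s.Length;

    // Treats the string as an IEnumerable<char> and calls Enumerable.Count.
    public static int ViaLinq(string s) => s.Count();

    // Count with a predicate: how many times a given character occurs.
    public static int Occurrences(string s, char c) => s.Count(ch => ch == c);

    public static void Main()
    {
        const string text = "hello";
        Console.WriteLine(ViaProperty(text));      // 5
        Console.WriteLine(ViaLinq(text));          // 5
        Console.WriteLine(Occurrences(text, 'l')); // 2
    }
}
```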

IEnumerable.Count() or ToList().Count

You asked:

I wonder, what would be faster.

Whenever you ask that you should actually time it and find out.

I set out to test all of these variants of obtaining a count:

var enumerable = Enumerable.Range(0, 1000000);
var list = enumerable.ToList();

var methods = new Func<int>[]
{
    () => list.Count,
    () => enumerable.Count(),
    () => list.Count(),
    () => enumerable.ToList().Count(),
    () => list.ToList().Count(),
    () => enumerable.Select(x => x).Count(),
    () => list.Select(x => x).Count(),
    () => enumerable.Select(x => x).ToList().Count(),
    () => list.Select(x => x).ToList().Count(),
    () => enumerable.Where(x => x % 2 == 0).Count(),
    () => list.Where(x => x % 2 == 0).Count(),
    () => enumerable.Where(x => x % 2 == 0).ToList().Count(),
    () => list.Where(x => x % 2 == 0).ToList().Count(),
};

My testing code explicitly runs each method 1,000 times, measures each execution time with a Stopwatch, and ignores all results where garbage collection occurred. It then gets an average execution time per method.

var measurements =
    methods
        .Select((m, i) => i)
        .ToDictionary(i => i, i => new List<double>());

for (var run = 0; run < 1000; run++)
{
    for (var i = 0; i < methods.Length; i++)
    {
        var sw = Stopwatch.StartNew();
        var gccc0 = GC.CollectionCount(0);
        var r = methods[i]();
        var gccc1 = GC.CollectionCount(0);
        sw.Stop();
        // Keep the timing only if no generation 0 collection happened during the call.
        if (gccc1 == gccc0)
        {
            measurements[i].Add(sw.Elapsed.TotalMilliseconds);
        }
    }
}

var results =
    measurements
        .Select(x => new
        {
            index = x.Key,
            count = x.Value.Count(),
            average = x.Value.Average().ToString("0.000")
        });

Here are the results (ordered from slowest to fastest):

+--------------+----------------------------------------------------------+
| average (ms) | method                                                   |
+--------------+----------------------------------------------------------+
|       14.879 | () => enumerable.Select(x => x).ToList().Count()         |
|       14.188 | () => list.Select(x => x).ToList().Count()               |
|       10.849 | () => enumerable.Where(x => x % 2 == 0).ToList().Count() |
|       10.080 | () => enumerable.ToList().Count()                        |
|        9.562 | () => enumerable.Select(x => x).Count()                  |
|        8.799 | () => list.Where(x => x % 2 == 0).ToList().Count()       |
|        8.350 | () => enumerable.Where(x => x % 2 == 0).Count()          |
|        8.046 | () => list.Select(x => x).Count()                        |
|        5.910 | () => list.Where(x => x % 2 == 0).Count()                |
|        4.085 | () => enumerable.Count()                                 |
|        1.133 | () => list.ToList().Count()                              |
|        0.000 | () => list.Count                                         |
|        0.000 | () => list.Count()                                       |
+--------------+----------------------------------------------------------+

Two things come out that are significant here.

One, any method with a .ToList() inline is significantly slower than the equivalent without it.

Two, LINQ operators take advantage of the underlying type of the enumerable, where possible, to short-cut computations. The enumerable.Count() and list.Count() methods show this.

There is no difference between the list.Count and list.Count() calls. So the key comparison is between the enumerable.Where(x => x % 2 == 0).Count() and enumerable.Where(x => x % 2 == 0).ToList().Count() calls. Since the latter contains an extra operation we would expect it to take longer. It's almost 2.5 milliseconds longer.

I don't know why you say that you're going to call the counting code twice, but if you do, it is better to build the list once and read its Count. If not, just do the plain .Count() call after your query.
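
To see why the list helps when the count is needed twice, here is a minimal sketch that counts how often the Where predicate runs. The class, field and method names are invented for this example, and the sequence size is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountTwice
{
    // Incremented every time the Where predicate is evaluated.
    public static int PredicateCalls;

    // A lazy query: nothing runs until the result is enumerated.
    public static IEnumerable<int> EvenNumbers(int n) =>
        Enumerable.Range(0, n).Where(x => { PredicateCalls++; return x % 2 == 0; });

    // Calls Count() twice on the lazy query and reports the predicate calls.
    public static int CallsWhenCountedTwice(int n)
    {
        PredicateCalls = 0;
        var query = EvenNumbers(n);
        int first = query.Count();
        int second = query.Count();
        return PredicateCalls;
    }

    // Builds the list once, then reads the Count property twice.
    public static int CallsWhenMaterializedOnce(int n)
    {
        PredicateCalls = 0;
        var list = EvenNumbers(n).ToList();
        int first = list.Count;
        int second = list.Count;
        return PredicateCalls;
    }

    public static void Main()
    {
        Console.WriteLine(CallsWhenCountedTwice(1000));     // 2000
        Console.WriteLine(CallsWhenMaterializedOnce(1000)); // 1000
    }
}
```

Each Count() call on the lazy query re-runs the whole filter, whereas the list pays for the filter once and then answers from its stored count.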

Differences between Array.Length and Array.Count()

array.Count() is actually a call to the Enumerable.Count<T>(IEnumerable<T>) extension method.

Since this method takes an IEnumerable<T> (as opposed to ICollection<T>, which has a Count property), in the general case it needs to loop through the entire sequence to figure out how big it is.

However, it actually checks whether the parameter implements ICollection<T> (which arrays do), and, if so, returns Count directly.

Therefore, calling .Count() on an array isn't much slower than .Length, although it will involve an extra typecast.

When should I use .Count() and .Count in the context of an IEnumerable<T>

The extension method works on any IEnumerable<T>, but it is costly because it counts the sequence by iterating it. There is an optimization if the sequence is an ICollection<T>, meaning that the length of the collection is known. In that case the Count property is used, but that is an implementation detail.

The best advice is to use the Count property if available for performance reasons.

Is .Count() predominately better saved for queryable collections that are yet to be executed, and therefore don't have an enumeration yet?

If your collection is IQueryable<T> and not IEnumerable<T>, then the query provider may be able to return the count in some efficient manner. In that case you will not suffer a performance penalty, but it depends on the query provider.

An IQueryable<T> will not have a Count property, so there is no choice between using the extension method and the property. However, if your query provider does not provide an efficient way of computing Count(), you might consider using .ToList() to pull the collection to the client side. It really depends on how you intend to use it.

count vs length vs size in a collection

Length() tends to refer to contiguous elements - a string has a length for example.

Count() tends to refer to the number of elements in a looser collection.

Size() tends to refer to the size of the collection. This can differ from the length in cases like vectors (or strings): there may be 10 characters in a string while storage is reserved for 20. It may also simply refer to the number of elements, so check the source or documentation.

Capacity() - used to specifically refer to allocated space in collection and not number of valid elements in it. If type has both "capacity" and "size" defined then "size" usually refers to number of actual elements.
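
In .NET, List<T> is a concrete case of this distinction: Count is the number of elements actually stored and Capacity is the allocated space. A small sketch, with the class and method names invented for the example:

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Hypothetical helper: reserves room for `capacity` items, then adds `items` of them.
    public static List<int> Build(int capacity, int items)
    {
        var list = new List<int>(capacity);
        for (var i = 0; i < items; i++)
        {
            list.Add(i);
        }
        return list;
    }

    public static void Main()
    {
        var list = Build(20, 10);
        Console.WriteLine(list.Count);    // 10 (elements actually stored)
        Console.WriteLine(list.Capacity); // 20 (space reserved)
    }
}
```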

I think the main point is down to human language and idioms, the size of a string doesn't seem very obvious, whilst the length of a set is equally confusing even though they might be used to refer to the same thing (number of elements) in a collection of data.

Count property vs Count() method?

Decompiling the source for the Count() extension method reveals that it tests whether the object is an ICollection (generic or otherwise) and, if so, simply returns the underlying Count property.

So, if your code accesses Count instead of calling Count(), you can bypass the type checking. That is a theoretical performance benefit, but I doubt it would be a noticeable one! The decompiled source:

// System.Linq.Enumerable
public static int Count<TSource>(this IEnumerable<TSource> source)
{
    checked
    {
        if (source == null)
        {
            throw Error.ArgumentNull("source");
        }
        ICollection<TSource> collection = source as ICollection<TSource>;
        if (collection != null)
        {
            return collection.Count;
        }
        ICollection collection2 = source as ICollection;
        if (collection2 != null)
        {
            return collection2.Count;
        }
        int num = 0;
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                num++;
            }
        }
        return num;
    }
}
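
One way to observe the ICollection<T> shortcut from the outside is a collection that knows its Count but throws if anyone tries to enumerate it. The type below is purely hypothetical and exists only for this sketch; if Count() enumerated the source, the call in Main would throw instead of printing a number.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection for illustration: it reports a Count
// but refuses to be enumerated.
public sealed class NoEnumerationCollection : ICollection<int>
{
    private readonly int _count;

    public NoEnumerationCollection(int count) { _count = count; }

    public int Count => _count;
    public bool IsReadOnly => true;

    public IEnumerator<int> GetEnumerator() =>
        throw new InvalidOperationException("Count() should not enumerate.");

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // The remaining ICollection<int> members are not needed for this sketch.
    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Contains(int item) => throw new NotSupportedException();
    public void CopyTo(int[] array, int arrayIndex) => throw new NotSupportedException();
    public bool Remove(int item) => throw new NotSupportedException();
}

public static class ShortcutDemo
{
    public static void Main()
    {
        IEnumerable<int> source = new NoEnumerationCollection(42);

        // Enumerable.Count finds ICollection<int> and returns its Count property.
        Console.WriteLine(source.Count()); // 42
    }
}
```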

