Handling Warning for Possible Multiple Enumeration of IEnumerable

The problem with taking IEnumerable as a parameter is that it tells callers "I wish to enumerate this". It doesn't tell them how many times you wish to enumerate.

I can change the objects parameter to be a List and then avoid the possible multiple enumeration, but then I don't get the highest object that I can handle.

The goal of taking the highest object is noble, but it leaves room for too many assumptions. Do you really want someone to pass a LINQ to SQL query to this method, only for you to enumerate it twice (getting potentially different results each time)?

What's missing here is this: a caller who perhaps doesn't take the time to read the details of the method may assume you only iterate once, and so pass you an expensive object. Your method signature doesn't indicate either way.

By changing the method signature to IList/ICollection, you will at least make it clearer to the caller what your expectations are, and they can avoid costly mistakes.

Otherwise, most developers looking at the method might assume you only iterate once. If taking an IEnumerable is so important, you should consider calling .ToList() at the start of the method.
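Here is a minimal sketch of the "materialize at the start" approach. It assumes a hypothetical method `Summarize` that needs two passes over its input (a count and a maximum). The parameter stays `IEnumerable<int>`, but the sequence is copied once on entry. If the caller already passed a list or array, that copy is skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeFirst
{
    // Hypothetical example: needs two passes over the data (count, then max).
    public static (int Count, int Max) Summarize(IEnumerable<int> values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));

        // Enumerate the caller's sequence at most once; reuse it if it is
        // already an in-memory list or array.
        IReadOnlyList<int> items = values as IReadOnlyList<int> ?? values.ToList();

        if (items.Count == 0) throw new InvalidOperationException("Sequence contains no elements.");

        // Everything below reads the in-memory copy, never the original source.
        return (items.Count, items.Max());
    }
}
```

The caller can still pass a lazy query. It is simply enumerated once, no matter how many passes the method body makes afterwards.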

It's a shame older versions of .NET don't have an interface that is IEnumerable + Count + Indexer, without Add/Remove etc. methods, which is what I suspect would solve this problem. Since .NET Framework 4.5, IReadOnlyList<T> fills exactly that role.
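`IReadOnlyList<T>` (in System.Collections.Generic) exposes `Count` and an indexer without Add/Remove. Below is a minimal sketch of a method that uses it as the parameter type. `FirstAndLast` is a hypothetical name chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ReadOnlyListParameter
{
    // The signature says "I need a known count and random access", so callers
    // are expected to pass an already-materialized collection.
    public static (T First, T Last) FirstAndLast<T>(IReadOnlyList<T> items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (items.Count == 0) throw new ArgumentException("The list is empty.", nameof(items));

        // Two indexed reads; no enumeration of a possibly lazy source.
        return (items[0], items[items.Count - 1]);
    }
}
```

Arrays and `List<T>` both implement `IReadOnlyList<T>`, so they can be passed directly. A lazy LINQ query has to be materialized by the caller (for example with `.ToList()`), which makes the cost visible at the call site.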

ReSharper's example code for explaining Possible multiple enumeration of IEnumerable

GetNames() returns an IEnumerable<string>. So if you store that result:

IEnumerable<string> foo = GetNames();

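Then every time you enumerate foo, the work done by GetNames() is repeated. It is not literally a second call to GetNames(). Each enumeration calls IEnumerable<T>.GetEnumerator() on the returned object, and for a deferred sequence such as an iterator method or a LINQ query, that restarts the underlying iteration (and any database query behind it) from the beginning.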
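To see this behaviour without a database, here is a small sketch in which a hypothetical iterator method `GetNames()` counts how often its body starts running. Calling the method does nothing by itself. Each `foreach` obtains a fresh enumerator, and that enumerator runs the body from the top.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredExecutionDemo
{
    // How many times the body of GetNames() has actually started running.
    public static int BodyRuns;

    // Hypothetical stand-in for GetNames(): because it is an iterator method,
    // calling it only creates an object; the body runs once per enumeration.
    public static IEnumerable<string> GetNames()
    {
        BodyRuns++; // imagine an expensive database query here
        yield return "Alice";
        yield return "Bob";
    }
}
```

After `var foo = DeferredExecutionDemo.GetNames();`, `BodyRuns` is still 0. Two `foreach` loops over `foo` bring it to 2.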

ReSharper sees this and suggests that you store the result of enumerating GetNames() in a local variable, for example by materializing it in a list:

IEnumerable<string> fooEnumerated = GetNames().ToList();

This will make sure that the GetNames() result is only enumerated once, as long as you refer to fooEnumerated.

This does matter because you usually want to enumerate only once, for example when GetNames() performs a (slow) database call.

Because you materialized the results in a list, it doesn't matter anymore that you enumerate fooEnumerated twice; you'll be iterating over an in-memory list twice.
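A minimal sketch of the fix, again using a hypothetical counting `GetNames()` as a stand-in for a slow query. The source is enumerated once by `ToList()`. Both later reads go to the in-memory list.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MaterializeOnceDemo
{
    public static int QueryRuns; // how many times the "expensive" source was enumerated

    // Hypothetical slow source: each enumeration stands in for one database round trip.
    public static IEnumerable<string> GetNames()
    {
        QueryRuns++;
        yield return "Alice";
        yield return "Bob";
    }

    public static (int Count, string Longest) Describe()
    {
        // Enumerates GetNames() exactly once.
        IEnumerable<string> fooEnumerated = GetNames().ToList();

        // Both of these read the in-memory list, not the source.
        int count = fooEnumerated.Count();
        string longest = fooEnumerated.OrderByDescending(n => n.Length).First();
        return (count, longest);
    }
}
```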

Why is Possible Multiple Enumeration of IEnumerable warning *not* shown

TL;DR: I agree with @canton7: it would result in too many false positives. Just don't put expensive enumerables in properties; it's a bad practice.

Long version:

Cannot tell if enumeration is expensive or not

Basically, the inspection about possible multiple enumeration tries to warn you about potential performance problems, because an IEnumerable very often comes from an expensive computation such as a database query. But ReSharper cannot tell for sure whether enumeration is really expensive. Tracing the origin of every enumerable would be very complex and very slow, and in some cases impossible: for example, an enumerable returned from an interface or a virtual method in a class library, whose implementations or overrides could live in external code.
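Here is a sketch of why the origin matters. It uses a hypothetical interface `INameSource` with two implementations. Code that receives the interface sees only `IEnumerable<string>` and cannot tell whether it got the cheap in-memory list or the lazy iterator that redoes its work on every enumeration.

```csharp
using System.Collections.Generic;

// Hypothetical interface: the return type is all a caller (or ReSharper) sees.
public interface INameSource
{
    IEnumerable<string> GetNames();
}

// Cheap: returns an in-memory list, so enumerating it twice costs almost nothing.
public sealed class CachedNames : INameSource
{
    private readonly List<string> _names = new List<string> { "Alice", "Bob" };
    public IEnumerable<string> GetNames() => _names;
}

// Expensive: a lazy iterator; every enumeration redoes the work (think: a query).
public sealed class QueriedNames : INameSource
{
    public int Executions { get; private set; }

    public IEnumerable<string> GetNames()
    {
        Executions++;
        yield return "Alice";
        yield return "Bob";
    }
}
```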

Enumerable properties are often used to encapsulate simple collections

This also applies to enumerable properties: ReSharper cannot be sure whether enumerating such a property is expensive. If it still went ahead and warned about multiple enumerations of the same enumerable property, it would produce too many false positives, because many programmers don't put expensive enumerables in properties. Most often, enumerable properties return basic collections like List<T> or HashSet<T> under the hood. The IEnumerable<T> return type is chosen to encapsulate implementation details and let the developer swap the implementing collection for something else later. IReadOnlyCollection<T> is now a better fit for this kind of encapsulation, but there is still tons of older code that uses IEnumerable<T>.
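Here is a minimal sketch of the `IReadOnlyCollection<T>` style of encapsulation mentioned above. The class `Order` and its members are hypothetical. The property hides the concrete `List<T>` and its Add/Remove methods. At the same time, it tells callers that the data is already materialized and has a cheap `Count`.

```csharp
using System.Collections.Generic;

// Hypothetical class: exposes its backing list as a read-only collection.
public sealed class Order
{
    private readonly List<string> _lines = new List<string>();

    // Cheap to enumerate any number of times: it is just the in-memory list.
    public IReadOnlyCollection<string> Lines => _lines;

    public void AddLine(string line) => _lines.Add(line);
}
```

A caller could still cast `Lines` back to `List<string>`. Returning `_lines.AsReadOnly()` (a `ReadOnlyCollection<T>` wrapper) prevents that, at the cost of a small wrapper object.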

Properties are meant to be lightweight, don't put expensive calculations there

I would go further and argue that even if ReSharper could warn you about expensive multiple enumeration of properties, it would still be bad practice for a property to return an expensive enumerable. Even if no single method enumerates such a property twice, you could still have a complex method that calls several different enumerating methods on it in a row. And your teammates won't even think about caching the enumeration results in such cases, because properties are meant to be lightweight and there is almost never a reason to cache them.
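The sketch below shows the difference using a hypothetical `Repository` class, with an in-memory list standing in for a database table. The property looks like a cheap field access, yet every enumeration re-runs the query. The method `LoadActiveUsers()` signals the cost through its name and returns a materialized list.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Repository
{
    // Stand-in for a database table (hypothetical data).
    private readonly List<(string Name, bool Active)> _rows = new List<(string Name, bool Active)>
    {
        ("alice", true), ("bob", false), ("carol", true)
    };

    public int Scans { get; private set; } // how many times the "table" was scanned

    private IEnumerable<string> Query()
    {
        Scans++; // stands in for an expensive database round trip
        foreach (var row in _rows)
            if (row.Active) yield return row.Name;
    }

    // Discouraged: looks lightweight, but every enumeration re-runs the query.
    public IEnumerable<string> ActiveUsers => Query();

    // Preferred: the method name signals work, and the result is materialized once.
    public IReadOnlyList<string> LoadActiveUsers() => Query().ToList();
}
```

With the property, `repo.ActiveUsers.Any()` followed by `repo.ActiveUsers.Count()` scans twice. With the method, a caller naturally stores the returned list in a variable and reuses it.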

ReSharper's 'Possible multiple enumeration of IEnumerable' warning

There is a very good chance that going to the database twice in this case would be better than the client-side search over IEnumerable that the current code performs.

If you can't push the search to the database (e.g. by keeping the source as IQueryable<Item> so that the filtering can be chained and translated into the query), you can still optimize the lookup somewhat by checking both conditions on each item in a single pass:

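Item item1 = null; // first item with Condition == "Excellent"
Item item2 = null; // first item with Condition == "Good"

foreach (var x in millionItems)
{
    if (item1 == null && x.Condition == "Excellent") item1 = x;
    if (item2 == null && x.Condition == "Good") item2 = x;

    // Stop pulling items as soon as both have been found.
    if (item1 != null && item2 != null)
    {
        break;
    }
}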

This may still go through a lot of items on the client side, but at least it enumerates the source only once and doesn't keep all the items in memory at the same time.

Converting to a list with ToList() is unlikely to be better if these are just the two queries you need to build.
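The single-pass loop above can be generalized into a small helper. The helper takes several predicates, finds the first match for each in one enumeration, and stops pulling items once all of them have been found. `FirstMatches` is a hypothetical name, not a LINQ operator.

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassSearch
{
    // For each predicate, returns the first matching item (or default if none),
    // enumerating the source at most once and stopping early when all are found.
    public static T[] FirstMatches<T>(IEnumerable<T> source, params Func<T, bool>[] predicates)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicates == null) throw new ArgumentNullException(nameof(predicates));

        var results = new T[predicates.Length];
        var found = new bool[predicates.Length];
        int remaining = predicates.Length;
        if (remaining == 0) return results; // nothing to look for

        foreach (var item in source)
        {
            for (int i = 0; i < predicates.Length; i++)
            {
                if (!found[i] && predicates[i](item))
                {
                    results[i] = item;
                    found[i] = true;
                    remaining--;
                }
            }

            if (remaining == 0) break; // all found: stop pulling items from the source
        }

        return results;
    }
}
```

The original example would then be written as `FirstMatches(millionItems, x => x.Condition == "Excellent", x => x.Condition == "Good")`.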

Expensive IEnumerable: Any way to prevent multiple enumerations without forcing an immediate enumeration?

You certainly could write your own IEnumerable<T> implementation that wraps another one, remembering all the elements it's already seen (and whether it's exhausted or not). If you need it to be thread-safe that becomes trickier, and you'd need to remember that at any time there may be multiple iterators working against the same IEnumerable<T>.

Fundamentally I think it would come down to working out what to do when asked for the next element (which is somewhat annoyingly split into MoveNext() and Current, but that can probably be handled...):

  • If you've already read the next element within another iterator, you can yield it from your buffer
  • If you've already discovered that there is no next element, you can return that immediately
  • Otherwise, you need to ask the original iterator for the next element, and remember it for all the other wrapped iterators.

The other aspect that's tricky is knowing when to dispose of the underlying IEnumerator<T> - if you don't need to do that, it makes things simpler.

As a very sketchy attempt that I haven't even attempted to compile, and which is definitely not thread-safe, you could try something like this:

using System.Collections;
using System.Collections.Generic;

public class LazyEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> iterator;
    private readonly List<T> buffer = new List<T>();
    private bool completed = false;

    public LazyEnumerable(IEnumerable<T> original)
    {
        // TODO: You could be even lazier, only calling
        // GetEnumerator when you first need an element
        iterator = original.GetEnumerator();
    }

    // Non-generic interface method, forwarded to the generic one.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            // If we already have the element, yield it
            if (index < buffer.Count)
            {
                yield return buffer[index];
            }
            // If we've yielded everything in the buffer and some
            // other iterator has come to the end of the original,
            // we're done.
            else if (completed)
            {
                yield break;
            }
            // Otherwise, see if there's anything left in the original
            // iterator.
            else
            {
                bool hasNext = iterator.MoveNext();
                if (hasNext)
                {
                    var current = iterator.Current;
                    buffer.Add(current);
                    yield return current;
                }
                else
                {
                    completed = true;
                    // The source is exhausted and never read again, so release it.
                    iterator.Dispose();
                    yield break;
                }
            }
            index++;
        }
    }
}

Possible multiple enumeration of IEnumerable vs Parameter can be declared with base type

Generally speaking, what you need is some state object into which you can PUSH the items (within a foreach loop), and out of which you then get your final result.

The downside of the enumerable LINQ operators is that they actively enumerate the source instead of accepting items being pushed to them, so they don't meet your requirements.

If, for example, you just need the minimum and maximum values of a sequence of 1,000,000 integers that cost $1,000 worth of processor time to retrieve, you end up writing something like this:

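using System;
using System.Collections.Generic;

// Result type (a C# 9 record; any type holding a min and a max would do).
public record MinMax(int Min, int Max);

public class MinMaxAggregator
{
    private bool _any;
    private int _min;
    private int _max;

    // Push one item into the aggregator.
    public void OnNext(int value)
    {
        if (!_any)
        {
            _min = _max = value;
            _any = true;
        }
        else
        {
            if (value < _min) _min = value;
            if (value > _max) _max = value;
        }
    }

    // Read the final result after all items have been pushed.
    public MinMax GetResult()
    {
        if (!_any) throw new InvalidOperationException("Sequence contains no elements.");
        return new MinMax(_min, _max);
    }
}

public static class Example // hypothetical container class
{
    public static MinMax DoSomething(IEnumerable<int> source)
    {
        var aggr = new MinMaxAggregator();
        foreach (var item in source) aggr.OnNext(item); // single enumeration
        return aggr.GetResult();
    }
}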

In effect, you have just reimplemented the logic of the Min() and Max() operators. That's easy here, but they only stand in for arbitrarily complex logic that you could otherwise express easily in a LINQ style.
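For comparison, the same single pass can be written without a dedicated class by using `Enumerable.Aggregate` with a value-tuple accumulator. This is a sketch, and the name `AggregateMinMax` is hypothetical. It keeps the one-enumeration guarantee, but it is still hand-rolled min/max logic, which is exactly the limitation described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateMinMax
{
    // One enumeration of source; the tuple carries "seen anything yet" plus min and max.
    public static (int Min, int Max) MinMax(IEnumerable<int> source)
    {
        var acc = source.Aggregate(
            (Any: false, Min: 0, Max: 0),
            (a, x) => a.Any
                ? (true, Math.Min(a.Min, x), Math.Max(a.Max, x))
                : (true, x, x));

        if (!acc.Any) throw new InvalidOperationException("Sequence contains no elements.");
        return (acc.Min, acc.Max);
    }
}
```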

The solution came to me on last night's walk: we need to PUSH... that's REACTIVE! All the beloved LINQ operators also exist in a reactive version built for the push paradigm (Reactive Extensions, i.e. Rx.NET / System.Reactive). They can be chained together to whatever complexity you need, just like their enumerable counterparts.

So the min/max example boils down to:

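using System.Collections.Generic;
using System.Reactive.Concurrency; // Scheduler (System.Reactive package)
using System.Reactive.Linq;        // ToObservable, Publish, Min, Max, CombineLatest, GetAwaiter

public static class ReactiveExample // hypothetical container class
{
    // MinMax is the result type from the previous example.
    public static MinMax DoSomething(IEnumerable<int> source)
    {
        // bridge over to the observable world
        var connectable = source.ToObservable(Scheduler.Immediate).Publish();
        // express the desired result there (note: connectable is observed by multiple observers)
        var combined = connectable.Min().CombineLatest(connectable.Max(), (min, max) => new MinMax(min, max));
        // subscribe now: GetAwaiter returns an AsyncSubject that keeps the last value
        var resultAsync = combined.GetAwaiter();
        // unload the enumerable into connectable (synchronously, on the immediate scheduler)
        connectable.Connect();
        // pick up the result
        return resultAsync.GetResult();
    }
}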

