Handling warning for possible multiple enumeration of IEnumerable
The problem with taking IEnumerable as a parameter is that it tells callers "I wish to enumerate this". It doesn't tell them how many times you wish to enumerate.
I can change the objects parameter to be List and then avoid the possible multiple enumeration but then I don't get the highest object that I can handle.
The goal of taking the highest object is noble, but it leaves room for too many assumptions. Do you really want someone to pass a LINQ to SQL query to this method, only for you to enumerate it twice (potentially getting different results each time)?
The missing semantic here is that a caller, who perhaps doesn't take time to read the details of the method, may assume you only iterate once, so they pass you an expensive object. Your method signature doesn't indicate either way.
By changing the method signature to IList/ICollection, you will at least make it clearer to the caller what your expectations are, and they can avoid costly mistakes.
Otherwise, most developers looking at the method might assume you only iterate once. If taking an IEnumerable is so important, you should consider doing the .ToList() at the start of the method.
It's a shame .NET doesn't have an interface that is IEnumerable + Count + Indexer, without Add/Remove etc. methods, which is what I suspect would solve this problem. (Since .NET 4.5, IReadOnlyList<T> and IReadOnlyCollection<T> fill much of this gap.)
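As a minimal sketch of the "materialize at the start of the method" advice: the method below still accepts IEnumerable<int>, but touches the source only once. The Report class, the Summarize method and its output format are hypothetical names chosen for illustration, not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Report
{
    // Hypothetical method: accepts any sequence but enumerates the source at most once.
    public static string Summarize(IEnumerable<int> objects)
    {
        // Reuse the caller's collection if it already is one; otherwise copy it into a list.
        IReadOnlyCollection<int> items = objects as IReadOnlyCollection<int> ?? objects.ToList();

        // Count and Max() both run against the in-memory collection, not the original source.
        return items.Count == 0
            ? "empty"
            : $"{items.Count} items, max {items.Max()}";
    }

    public static void Main()
    {
        // A lazy LINQ query is passed in; it is evaluated once inside Summarize.
        Console.WriteLine(Summarize(Enumerable.Range(1, 5).Where(n => n % 2 == 1)));
        // prints "3 items, max 5"
    }
}
```

If the method only ever needs a materialized collection, declaring the parameter as IReadOnlyCollection<int> would state that expectation in the signature instead.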
Resharper's example code for explaining Possible multiple enumeration of IEnumerable
GetNames() returns an IEnumerable. So if you store that result:
IEnumerable foo = GetNames();
Then every time you enumerate foo, the GetNames() method is called again (not literally, I can't find a link that properly explains the details, but see IEnumerable.GetEnumerator()).
Resharper sees this, and suggests that you store the result of enumerating GetNames() in a local variable, for example by materializing it in a list:
IEnumerable fooEnumerated = GetNames().ToList();
This will make sure that the GetNames() result is only enumerated once, as long as you refer to fooEnumerated.
This does matter because you usually want to enumerate only once, for example when GetNames() performs a (slow) database call.
Because you materialized the results in a list, it doesn't matter anymore that you enumerate fooEnumerated twice; you'll be iterating over an in-memory list twice.
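The "called again" behavior can be made visible with a counter. This is only a sketch: the GetNames() body here is a made-up iterator standing in for the real (possibly slow) method, and BodyRuns is a hypothetical counter added for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the iterator body actually starts executing.
    public static int BodyRuns;

    // Stand-in for GetNames(); the names are invented.
    public static IEnumerable<string> GetNames()
    {
        BodyRuns++;
        yield return "Ann";
        yield return "Bob";
    }

    public static void Main()
    {
        IEnumerable<string> foo = GetNames();   // nothing has run yet
        foo.Count();                            // first enumeration runs the body
        foo.First();                            // second enumeration runs it again
        Console.WriteLine(BodyRuns);            // prints 2

        BodyRuns = 0;
        List<string> fooEnumerated = GetNames().ToList(); // body runs once here
        fooEnumerated.Count();                  // reads the in-memory list
        fooEnumerated.First();                  // reads the in-memory list
        Console.WriteLine(BodyRuns);            // prints 1
    }
}
```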
Why is Possible Multiple Enumeration of IEnumerable warning *not* shown
TL;DR: I agree with @canton7, it would result in too many false positives. Just don't put expensive enumerables in properties; it's bad practice.
Long version:
Cannot tell if enumeration is expensive or not
Basically, the inspection about possible multiple enumeration tries to warn you about potential performance problems, because very often IEnumerable comes from expensive calculations like database queries. But ReSharper cannot tell for sure if enumeration is really expensive or not, because tracing all enumerables' origin would be very complex and very slow, and in some cases impossible (enumerable coming from interface or virtual method in a class library, and overrides could be in external code).
Enumerable properties are often used to encapsulate simple collections
This also applies to enumerable properties: ReSharper cannot be sure whether such an enumerable is expensive to enumerate or not. If it went ahead and warned about multiple enumerations of the same enumerable property anyway, it would produce too many false positives, because many programmers don't put expensive enumerables in properties. Most often, enumerable properties return basic collections like List or HashSet under the hood, and the return type IEnumerable is chosen to encapsulate implementation details and allow the developer to change the implementing collection to something else later. Although we now have IReadOnlyCollection, which is better for such encapsulation, there is still a ton of old code using IEnumerable.
Properties are meant to be lightweight, don't put expensive calculations there
I would go further and argue that even if ReSharper could warn you about expensive multiple enumeration on properties, it would still be bad practice for a property to return an expensive enumerable. Even if no single method enumerates such a property twice, you could still have a complex method that calls several different enumerating methods in a row. And your teammates won't even think about caching the enumeration results in such cases, because properties are meant to be lightweight and in almost every case there is no point in caching them.
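One way to follow this guideline, sketched under assumptions: cheap in-memory data is exposed as a property, while anything that may be expensive is exposed as a method so callers are prompted to cache the result. The Customer class, its members and the yielded values are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    private readonly List<string> _tags = new List<string>();

    // Cheap: a read-only view over an in-memory list, safe to enumerate repeatedly.
    public IReadOnlyCollection<string> Tags => _tags;

    public void AddTag(string tag) => _tags.Add(tag);

    // Potentially expensive: a method rather than a property, so the cost is visible.
    // The yielded values stand in for rows from a slow query.
    public IEnumerable<string> LoadOrderHistory()
    {
        yield return "order-1";
        yield return "order-2";
    }
}

public static class PropertyDemo
{
    public static void Main()
    {
        var customer = new Customer();
        customer.AddTag("vip");
        customer.AddTag("newsletter");
        Console.WriteLine(customer.Tags.Count);   // prints 2

        // Callers materialize the expensive sequence once and reuse the list.
        var history = new List<string>(customer.LoadOrderHistory());
        Console.WriteLine(history.Count);         // prints 2
    }
}
```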
Resharper's 'Possible Multiple enumeration of IEnumerable warning'
There is a very good chance that going to the DB twice in this case would be better than the client-side search via IEnumerable which is performed in the current code.
If you can't push the search to the DB (i.e. by keeping IQueryable<Item> to allow chaining) you can still somewhat optimize the lookup by checking for both conditions on each item:
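Item item1 = null, item2 = null;
foreach (var x in millionItems)
{
    // Remember the first match for each condition; keep it once found.
    item1 = item1 == null && x.Condition == "Excellent" ? x : item1;
    item2 = item2 == null && x.Condition == "Good" ? x : item2;
    // Stop as soon as both items have been found.
    if (item1 != null && item2 != null)
    {
        break;
    }
}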
This has a good chance of going through a lot of items client side, but at least it will not keep them all in memory at the same time.
Converting to a list with ToList is unlikely to be better if these are just the 2 queries you need to build.
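For the "push the search to the DB" option, a rough sketch of what two small queries against an IQueryable<Item> might look like. The Item shape (Id, Condition), the FindBoth helper and the sample data are assumptions; an in-memory list wrapped with AsQueryable() stands in for a real query provider, which would translate each call into its own database query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; only Condition appears in the original snippet.
public class Item
{
    public int Id { get; set; }
    public string Condition { get; set; }
}

public static class QueryDemo
{
    // Issues two independent lookups; with a database-backed IQueryable,
    // each FirstOrDefault would be executed by the provider, not client side.
    public static (Item excellent, Item good) FindBoth(IQueryable<Item> items)
    {
        Item excellent = items.FirstOrDefault(x => x.Condition == "Excellent");
        Item good = items.FirstOrDefault(x => x.Condition == "Good");
        return (excellent, good);
    }

    public static void Main()
    {
        // In-memory stand-in for a real data source.
        IQueryable<Item> data = new List<Item>
        {
            new Item { Id = 1, Condition = "Fair" },
            new Item { Id = 2, Condition = "Good" },
            new Item { Id = 3, Condition = "Excellent" },
        }.AsQueryable();

        var (excellent, good) = FindBoth(data);
        Console.WriteLine($"{excellent?.Id} {good?.Id}");   // prints "3 2"
    }
}
```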
Expensive IEnumerable: Any way to prevent multiple enumerations without forcing an immediate enumeration?
You certainly could write your own IEnumerable<T> implementation that wraps another one, remembering all the elements it's already seen (and whether it's exhausted or not). If you need it to be thread-safe that becomes trickier, and you'd need to remember that at any time there may be multiple iterators working against the same IEnumerable<T>.
Fundamentally I think it would come down to working out what to do when asked for the next element (which is somewhat-annoyingly split into MoveNext() and Current, but that can probably be handled...):
- If you've already read the next element within another iterator, you can yield it from your buffer
- If you've already discovered that there is no next element, you can return that immediately
- Otherwise, you need to ask the original iterator for the next element, and remember it for all the other wrapped iterators.
The other aspect that's tricky is knowing when to dispose of the underlying IEnumerator<T> - if you don't need to do that, it makes things simpler.
As a very sketchy attempt that I haven't even attempted to compile, and which is definitely not thread-safe, you could try something like this:
using System.Collections;
using System.Collections.Generic;

public class LazyEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> iterator;
    // Elements already pulled from the original sequence, shared by all iterators.
    private readonly List<T> buffer = new List<T>();
    private bool completed = false;

    public LazyEnumerable(IEnumerable<T> original)
    {
        // TODO: You could be even lazier, only calling
        // GetEnumerator when you first need an element
        iterator = original.GetEnumerator();
    }

    // Explicit implementation of the non-generic interface member.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            // If we already have the element, yield it
            if (index < buffer.Count)
            {
                yield return buffer[index];
            }
            // If we've yielded everything in the buffer and some
            // other iterator has come to the end of the original,
            // we're done.
            else if (completed)
            {
                yield break;
            }
            // Otherwise, see if there's anything left in the original
            // iterator.
            else
            {
                bool hasNext = iterator.MoveNext();
                if (hasNext)
                {
                    var current = iterator.Current;
                    buffer.Add(current);
                    yield return current;
                }
                else
                {
                    // Note: the underlying iterator is never disposed here.
                    completed = true;
                    yield break;
                }
            }
            index++;
        }
    }
}
Possible multiple enumeration of IEnumerable vs Parameter can be declared with base type
Generally speaking, what you need is some state object into which you can PUSH the items (within a foreach loop), and out of which you then get your final result.
The downside of the enumerable LINQ operators is that they actively enumerate the source instead of accepting items being pushed to them, so they don't meet your requirements.
If you e.g. just need the minimum and maximum values of a sequence of 1'000'000 integers which cost $1'000 worth of processor time to retrieve, you end up writing something like this:
public class MinMaxAggregator
{
    private bool _any;
    private int _min;
    private int _max;

    public void OnNext(int value)
    {
        if (!_any)
        {
            _min = _max = value;
            _any = true;
        }
        else
        {
            if (value < _min) _min = value;
            if (value > _max) _max = value;
        }
    }

    public MinMax GetResult()
    {
        if (!_any) throw new InvalidOperationException("Sequence contains no elements.");
        return new MinMax(_min, _max);
    }
}

public static MinMax DoSomething(IEnumerable<int> source)
{
    var aggr = new MinMaxAggregator();
    foreach (var item in source) aggr.OnNext(item);
    return aggr.GetResult();
}
In fact, you just re-implemented the logic of the Min() and Max() operators. Of course that's easy, but they are only examples of arbitrarily complex logic you might otherwise easily express in a LINQish way.
The solution came to me on yesterday's night walk: we need to PUSH... that's REACTIVE! All the beloved operators also exist in a reactive version built for the push paradigm. They can be chained together at will to whatever complexity you need, just as their enumerable counterparts.
So the min/max example boils down to:
// Requires Rx (System.Reactive):
// using System.Reactive.Concurrency;
// using System.Reactive.Linq;
public static MinMax DoSomething(IEnumerable<int> source)
{
    // bridge over to the observable world
    var connectable = source.ToObservable(Scheduler.Immediate).Publish();
    // express the desired result there (note: connectable is observed by multiple observers)
    var combined = connectable.Min().CombineLatest(connectable.Max(), (min, max) => new MinMax(min, max));
    // subscribe
    var resultAsync = combined.GetAwaiter();
    // unload the enumerable into connectable
    connectable.Connect();
    // pick up the result
    return resultAsync.GetResult();
}