Enumerating Collections That Are Not Inherently IEnumerable


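This code should do the trick:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class Extensions
{
    // Yields every item of type T in the collection, followed by its
    // descendants, by recursing into the sequence returned by selector.
    public static IEnumerable<T> GetRecursively<T>(this IEnumerable collection,
        Func<T, IEnumerable> selector)
    {
        foreach (var item in collection.OfType<T>())
        {
            yield return item;

            IEnumerable<T> children = selector(item).GetRecursively(selector);
            foreach (var child in children)
            {
                yield return child;
            }
        }
    }
}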

Here's an example of how to use it:

TreeView view = new TreeView();

// ...

IEnumerable<TreeNode> nodes = view.Nodes
    .GetRecursively<TreeNode>(item => item.Nodes);
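To try the recursive version without Windows Forms, here is a minimal sketch. The Node class is a hypothetical stand-in for TreeNode (it is not part of the original answer), and its non-generic ArrayList of children plays the role of TreeNodeCollection.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for TreeNode so the sketch runs without Windows Forms.
public sealed class Node
{
    public Node(string name, params Node[] children)
    {
        Name = name;
        Children = new ArrayList(children); // non-generic, like TreeNodeCollection
    }

    public string Name { get; }
    public ArrayList Children { get; }
}

public static class RecursiveExtensions
{
    // Same recursive approach as GetRecursively above.
    public static IEnumerable<T> GetRecursively<T>(this IEnumerable collection,
        Func<T, IEnumerable> selector)
    {
        foreach (var item in collection.OfType<T>())
        {
            yield return item;
            foreach (var child in selector(item).GetRecursively(selector))
            {
                yield return child;
            }
        }
    }
}

public static class RecursiveDemo
{
    // Flattens a small tree and joins the visited names in visiting order.
    public static string Run()
    {
        var roots = new ArrayList
        {
            new Node("root",
                new Node("a", new Node("a1")),
                new Node("b", new Node("b1")))
        };

        return string.Join(",",
            roots.GetRecursively<Node>(n => n.Children).Select(n => n.Name));
    }
}
```

RecursiveDemo.Run() returns "root,a,a1,b,b1": each parent is yielded before its children, depth-first.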

Update: In response to Eric Lippert's post.

Here's a much improved version using the technique discussed in All About Iterators.

public static class Extensions
{
    // Iterative version: an explicit stack replaces the nested iterators.
    // Each popped sequence is yielded in full before any of its items'
    // children, so the order differs from the recursive version.
    public static IEnumerable<T> GetItems<T>(this IEnumerable collection,
        Func<T, IEnumerable> selector)
    {
        Stack<IEnumerable<T>> stack = new Stack<IEnumerable<T>>();
        stack.Push(collection.OfType<T>());

        while (stack.Count > 0)
        {
            IEnumerable<T> items = stack.Pop();
            foreach (var item in items)
            {
                yield return item;

                IEnumerable<T> children = selector(item).OfType<T>();
                stack.Push(children);
            }
        }
    }
}

I did a simple performance test using the following benchmarking technique. The results speak for themselves. The depth of the tree has only a marginal impact on the performance of the second solution, whereas the first solution slows down rapidly as the depth grows, eventually leading to a StackOverflowException when the tree becomes too deep.
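The depth behaviour can be explored with a degenerate tree in which every node has exactly one child. The sketch below is not the benchmark mentioned above; ChainNode, BuildChain and CountNodes are hypothetical names introduced only for illustration, and the depth passed in is up to you.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type used only for this sketch.
public sealed class ChainNode
{
    public List<ChainNode> Children { get; } = new List<ChainNode>();
}

public static class IterativeExtensions
{
    // Same explicit-stack approach as GetItems above.
    public static IEnumerable<T> GetItems<T>(this IEnumerable collection,
        Func<T, IEnumerable> selector)
    {
        var stack = new Stack<IEnumerable<T>>();
        stack.Push(collection.OfType<T>());
        while (stack.Count > 0)
        {
            foreach (var item in stack.Pop())
            {
                yield return item;
                stack.Push(selector(item).OfType<T>());
            }
        }
    }
}

public static class DeepTreeDemo
{
    // Builds a chain of `depth` nodes, each the only child of the previous one.
    public static ChainNode BuildChain(int depth)
    {
        var root = new ChainNode();
        var current = root;
        for (int i = 1; i < depth; i++)
        {
            var next = new ChainNode();
            current.Children.Add(next);
            current = next;
        }
        return root;
    }

    // Walks the whole chain with the iterative version and counts the nodes.
    public static int CountNodes(int depth)
    {
        var roots = new[] { BuildChain(depth) };
        return roots.GetItems<ChainNode>(n => n.Children).Count();
    }
}
```

Because GetItems never nests one iterator inside another, DeepTreeDemo.CountNodes(100000) simply returns 100000; the call stack does not grow with the depth of the tree.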

benchmarking

Is IEnumerable enumerated on call of the method or when enumerating the response?

The list (enumerable) is only enumerated when the result of ProcessList (enum2) is enumerated:

static void Main(string[] args)
{
    var enumerable = Enum1();
    Console.WriteLine("Enum1 retrieved");
    var enum2 = Enum2(enumerable);
    Console.WriteLine("Enum2 called");
    foreach (var e in enum2)
    {
        Console.WriteLine(e);
    }
}

private static IEnumerable<string> Enum1()
{
    Console.WriteLine("Enum1");
    yield return "foo";
}

private static IEnumerable<string> Enum2(IEnumerable<string> enumerable)
{
    Console.WriteLine("Enum2");
    foreach (var s in enumerable)
    {
        yield return s;
    }
}

Gives:

Enum1 retrieved
Enum2 called
Enum2
Enum1
foo

The last three lines are only printed when entering the foreach loop.

Should parameters/returns of collections be IEnumerable<T> or T[]?

I went through a phase of passing around T[], and to cut a long story short, it's a pain in the backside. IEnumerable<T> is much better.

However, given the late evaluation of IEnumerable<T>, I wonder whether that is a good idea. Does it make more sense to use T[]? IList<T>? Or something else?

Late evaluation is precisely why IEnumerable is so good. Here's an example workflow:

IEnumerable<string> files = FindFileNames();
IEnumerable<string> matched = files.Where( f => f.EndsWith(".txt") );
IEnumerable<string> contents = matched.Select( f => File.ReadAllText(f) );
bool foundContents = contents.Any( s => s.Contains("orion") );

For the impatient: this gets a sequence of filenames, keeps only the .txt files, then sets foundContents to true if any of those text files contains the word orion.

If you write the code using IEnumerable as above, each file is loaded only when it is needed. Memory usage stays low, and if the first file matches, you never look at any of the remaining files. It's great.

If you wrote this exact same code using arrays, you'd end up loading all the file contents up front, and only then (if you have any RAM left) would any of them be scanned. Hopefully this gets the point across about why lazy lists are so good.
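The difference can be made visible without touching the file system. In this sketch, Load is a hypothetical stand-in for File.ReadAllText that records every "file" it is asked to read; the file names and contents are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyVersusEager
{
    // Stand-in for File.ReadAllText: records each "file" that gets loaded.
    private static string Load(string name, List<string> loaded)
    {
        loaded.Add(name);
        return name == "a.txt" ? "the orion nebula" : "nothing here";
    }

    // Lazy pipeline: Any() stops pulling as soon as one match is found.
    public static (bool Found, int Loads) Lazy()
    {
        var loaded = new List<string>();
        IEnumerable<string> files = new[] { "a.txt", "b.txt", "c.txt" };
        bool found = files.Select(f => Load(f, loaded))
                          .Any(s => s.Contains("orion"));
        return (found, loaded.Count);
    }

    // Array pipeline: ToArray() loads every file before anything is scanned.
    public static (bool Found, int Loads) Eager()
    {
        var loaded = new List<string>();
        string[] files = { "a.txt", "b.txt", "c.txt" };
        string[] contents = files.Select(f => Load(f, loaded)).ToArray();
        bool found = contents.Any(s => s.Contains("orion"));
        return (found, loaded.Count);
    }
}
```

Both methods find the match, but Lazy() loads one file while Eager() loads all three.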

One thing that has not been addressed, though, is thread safety. If, for example, a method takes an IEnumerable<T> argument and enumerates it on a different thread, then by the time that thread accesses it the results might differ from those that were meant to be passed in. Worse still, attempting to enumerate an IEnumerable<T> twice, I believe, throws an exception. Shouldn't we be striving to make our methods thread safe?

Thread safety is a giant red herring here.

If you used an array rather than an enumerable, it looks like it should be safer, but it's not. Most of the time when people return arrays of objects, they create a new array, and then put the old objects in it. If you return that array, then those original objects can then be modified, and you end up with precisely the kind of threading problems you're trying to avoid.

A partial solution is to return not an array of the original objects but an array of new or cloned objects, so other threads can't access the original ones. This is useful; however, there's no reason an IEnumerable solution can't also do this. One is no more thread-safe than the other.

Stop IEnumerable from enumerating through all elements

You can use the Take method to do something like this:

profile = profile.Take(1);

Now, when you enumerate profile, you will go through one item only.
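A quick way to convince yourself that Take stops the enumeration early is to put it in front of a source that never ends. Naturals and FirstOnly below are hypothetical names used only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    // An endless source: enumerating it without Take would never finish.
    public static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++)
        {
            yield return i;
        }
    }

    // Take(1) ends the enumeration after the first item, so ToList() returns.
    public static List<int> FirstOnly()
    {
        return Naturals().Take(1).ToList();
    }
}
```

TakeDemo.FirstOnly() returns a list containing only 0, even though the underlying sequence has no end.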

LINQ to select all controls where ID contains some text from ControlCollection

Use an extension method to get all child controls:

public static class ControlExt {
    // Yields aControl and all of its descendants, breadth-first.
    // Each control is yielded exactly once, when it is dequeued.
    public static IEnumerable<Control> AndSubControls(this Control aControl) {
        var work = new Queue<Control>();
        work.Enqueue(aControl);
        while (work.Count > 0) {
            var c = work.Dequeue();
            yield return c;
            foreach (var sc in c.Controls.Cast<Control>())
                work.Enqueue(sc);
        }
    }
}

Now you can test all the subcontrols in your ControlCollection:

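// Cast<Control>() is needed because ControlCollection only implements the non-generic IEnumerable.
IEnumerable<Control> matchedControls = controls.Cast<Control>()
    .SelectMany(c => c.AndSubControls())
    .Where(a => a != null && a.ID != null && a.ID.Contains(controlID));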

Best practice for parameter: IEnumerable vs. IList vs. IReadOnlyCollection

You can take an IEnumerable<T> in the method, and use a CachedEnumerable similar to the one here to wrap it.

This class wraps an IEnumerable<T> and makes sure that it is only enumerated once. If you try to enumerate it again, it yields items from the cache.

Please note that such a wrapper does not read all items from the wrapped enumerable immediately. It only enumerates individual items from the wrapped enumerable as you enumerate individual items from the wrapper, and it caches the individual items along the way.

This means that if you call Any on the wrapper, only a single item will be enumerated from the wrapped enumerable, and that item will then be cached.

If you then use the enumerable again, it will first yield the first item from the cache, and then continue enumerating the original enumerator from where it left off.

You can do something like this to use it:

public IEnumerable<Data> RemoveHandledForDate(IEnumerable<Data> data, DateTime dateTime)
{
    var dataWrapper = new CachedEnumerable(data);
    ...
}

Notice here that the method itself is wrapping the parameter data. This way, you don't force consumers of your method to do anything.
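For readers who want to see the caching behaviour described above, here is a minimal sketch of such a wrapper. It is not the implementation the answer links to: it is written as a generic CachedEnumerable<T>, it is not thread-safe, and CachedDemo exists only to count how many items are pulled from the source.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// A minimal, non-thread-safe sketch; not the implementation linked above.
public sealed class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly List<T> _cache = new List<T>();
    private readonly IEnumerable<T> _source;
    private IEnumerator<T> _enumerator; // created on first use
    private bool _exhausted;

    public CachedEnumerable(IEnumerable<T> source)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
    }

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            if (index < _cache.Count)
            {
                yield return _cache[index++]; // replay from the cache
                continue;
            }
            if (_exhausted) yield break;
            if (_enumerator == null) _enumerator = _source.GetEnumerator();
            if (_enumerator.MoveNext())
            {
                _cache.Add(_enumerator.Current); // pull one item and remember it
            }
            else
            {
                _exhausted = true;
                _enumerator.Dispose();
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public void Dispose() => _enumerator?.Dispose();
}

public static class CachedDemo
{
    // Reports how many source items were pulled after Any() and after two full passes.
    public static (int AfterAny, int AfterFullPasses) Run()
    {
        int pulls = 0;
        IEnumerable<int> Source()
        {
            for (int i = 1; i <= 3; i++) { pulls++; yield return i; }
        }

        var cached = new CachedEnumerable<int>(Source());
        cached.Any();
        int afterAny = pulls;
        cached.ToList();
        cached.ToList();
        return (afterAny, pulls);
    }
}
```

CachedDemo.Run() returns (1, 3): Any pulls a single item, and the two later full passes pull only the remaining two items between them, because everything already seen is served from the cache.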

Error while sending a dropdown collection to a method accepting collection of Controls

Use

IEnumerable<Control> dropDownControlsInCurrentRow;

instead of

IEnumerable<DropDownList> dropDownControlsInCurrentRow;

What's the difference between IEnumerable and Array, IList and List?

IEnumerable provides only minimal "iterable" functionality. You can traverse the sequence, but that's about it.

This has disadvantages; for example, it is very inefficient to count elements using IEnumerable, or to get the nth element.

But it has advantages too; for example, an IEnumerable could be an endless sequence, like the sequence of primes.

Array is a fixed-size collection with random access (i.e. you can index into it).

List is a variable-size collection (i.e. you can add and remove elements) with random access.

IList is an interface which abstracts list functionality (count, add, remove, indexer access) away from the various concrete classes such as List, BindingList, ObservableCollection, etc.
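The distinctions above can be sketched in a few lines. Primes and AppendAndCount are hypothetical helpers written for this illustration; the prime generator uses plain trial division and ignores int overflow, so it is endless only in the conceptual sense.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public static class CollectionKinds
{
    // A conceptually endless IEnumerable<int>: primes by trial division.
    // It can be traversed, but it has no Count and no indexer.
    public static IEnumerable<int> Primes()
    {
        for (int n = 2; ; n++)
        {
            bool isPrime = true;
            for (long d = 2; d * d <= n; d++)
            {
                if (n % d == 0) { isPrime = false; break; }
            }
            if (isPrime) yield return n;
        }
    }

    // Written against the IList<int> interface, so it accepts List<int>,
    // ObservableCollection<int>, and other concrete list classes.
    public static int AppendAndCount(IList<int> list, int value)
    {
        list.Add(value);
        return list.Count;
    }

    // An array has a fixed size but supports direct indexing.
    public static int Middle(int[] values)
    {
        return values[values.Length / 2];
    }
}
```

For example, CollectionKinds.Primes().Take(5) yields 2, 3, 5, 7, 11, while AppendAndCount works equally well with a new List<int>() or a new ObservableCollection<int>().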

Is T[] not better than IEnumerable<T> as parameter type? (Considering threading challenges)

The real "problem" here is that while T[] is probably a more specific parameter type than would be ideal, and allows the recipient free access to write any element (which may or may not be desirable), IEnumerable<T> is too general. From a type-safety standpoint, the caller can supply a reference to any object which implements IEnumerable<T>, but in reality only certain such objects will actually work and there's no clean way for the caller to know which ones those are.

If T is a reference type, a T[] will be inherently thread-safe for reading, writing, and enumeration, subject to certain conditions (e.g. threads accessing different elements will not interfere at all; if one thread writes an item around the time another thread reads or enumerates it, the latter thread will see either old or new data). None of Microsoft's collection interfaces offer any such guarantees, nor do they provide any means by which a collection can indicate what guarantees it can or cannot make.

My inclination would be to either use T[], or else define an IThreadSafeList<T> where T:class which would include members like CompareExchangeItem that would allow Interlocked.CompareExchange on an item, but would not include things like Insert and Remove which cannot be done in thread-safe fashion.
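As a rough sketch of what such an interface might look like: IThreadSafeList<T> is not a framework type, and both its member list and the array-backed class below are assumptions made for illustration, based only on the description above.

```csharp
using System;
using System.Threading;

// Hypothetical interface along the lines described above; not a framework type.
public interface IThreadSafeList<T> where T : class
{
    int Count { get; }
    T this[int index] { get; set; }

    // Stores value at index only if the current item is comparand (by reference);
    // returns the item that was there before the call.
    T CompareExchangeItem(int index, T value, T comparand);
}

// Fixed-size implementation over a T[]; deliberately has no Insert or Remove.
public sealed class ArrayBackedThreadSafeList<T> : IThreadSafeList<T> where T : class
{
    private readonly T[] _items;

    public ArrayBackedThreadSafeList(int count)
    {
        _items = new T[count];
    }

    public int Count => _items.Length;

    public T this[int index]
    {
        get => Volatile.Read(ref _items[index]);
        set => Volatile.Write(ref _items[index], value);
    }

    public T CompareExchangeItem(int index, T value, T comparand) =>
        Interlocked.CompareExchange(ref _items[index], value, comparand);
}
```

A caller could then claim an empty slot with list.CompareExchangeItem(0, item, null) and check whether the returned value was null to learn whether it won the race.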


