Split an Ienumerable<T> into Fixed-Sized Chunks (Return an Ienumerable<Ienumerable<T>> Where the Inner Sequences Are of Fixed Length)

Split an IEnumerableT into fixed-sized chunks (return an IEnumerableIEnumerableT where the inner sequences are of fixed length)

You could try to implement Batch method mentioned above on your own like this:

    static class MyLinqExtensions 
{
public static IEnumerable<IEnumerable<T>> Batch<T>(
this IEnumerable<T> source, int batchSize)
{
using (var enumerator = source.GetEnumerator())
while (enumerator.MoveNext())
yield return YieldBatchElements(enumerator, batchSize - 1);
}

private static IEnumerable<T> YieldBatchElements<T>(
IEnumerator<T> source, int batchSize)
{
yield return source.Current;
for (int i = 0; i < batchSize && source.MoveNext(); i++)
yield return source.Current;
}
}

I've grabbed this code from http://blogs.msdn.com/b/pfxteam/archive/2012/11/16/plinq-and-int32-maxvalue.aspx.

UPDATE: Please note, that this implementation not only lazily evaluates batches but also items inside batches, which means it will only produce correct results when batch is enumerated only after all previous batches were enumerated. For example:

public static void Main(string[] args)
{
var xs = Enumerable.Range(1, 20);
Print(xs.Batch(5).Skip(1)); // should skip first batch with 5 elements
}

public static void Print<T>(IEnumerable<IEnumerable<T>> batches)
{
foreach (var batch in batches)
{
Console.WriteLine($"[{string.Join(", ", batch)}]");
}
}

will output:

[2, 3, 4, 5, 6] //only first element is skipped.
[7, 8, 9, 10, 11]
[12, 13, 14, 15, 16]
[17, 18, 19, 20]

So, if you use case assumes batching when batches are sequentially evaluated, then lazy solution above will work, otherwise if you can't guarantee strictly sequential batch processing (e.g. when you want to process batches in parallel), you will probably need a solution which eagerly enumerates batch content, similar to one mentioned in the question above or in the MoreLINQ

Split a collection into `n` parts with LINQ?

A pure linq and the simplest solution is as shown below.

static class LinqExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
int i = 0;
var splits = from item in list
group item by i++ % parts into part
select part.AsEnumerable();
return splits;
}
}

Split List into Sublists with LINQ

Try the following code.

public static List<List<T>> Split<T>(IList<T> source)
{
return source
.Select((x, i) => new { Index = i, Value = x })
.GroupBy(x => x.Index / 3)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}

The idea is to first group the elements by indexes. Dividing by three has the effect of grouping them into groups of 3. Then convert each group to a list and the IEnumerable of List to a List of Lists

Create batches in LINQ

An Enumerable.Chunk() extension method was added to .NET 6.0.

Example:

var list = new List<int> { 1, 2, 3, 4, 5, 6, 7 };

var chunks = list.Chunk(3);
// returns { { 1, 2, 3 }, { 4, 5, 6 }, { 7 } }

For those who cannot upgrade, the source is available on GitHub.

Split a List into smaller lists of N size

public static List<List<float[]>> SplitList(List<float[]> locations, int nSize=30)  
{
var list = new List<List<float[]>>();

for (int i = 0; i < locations.Count; i += nSize)
{
list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
}

return list;
}

Generic version:

public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize=30)  
{
for (int i = 0; i < locations.Count; i += nSize)
{
yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i));
}
}

How to loop through IEnumerable in batches

Sounds like you need to use Skip and Take methods of your object. Example:

users.Skip(1000).Take(1000)

this would skip the first 1000 and take the next 1000. You'd just need to increase the amount skipped with each call

You could use an integer variable with the parameter for Skip and you can adjust how much is skipped. You can then call it in a method.

public IEnumerable<user> GetBatch(int pageNumber)
{
return users.Skip(pageNumber * 1000).Take(1000);
}

Partitioning IEnumerableT Results

I've needed to do this regularly. As Alexei alludes to, an enumerable of enumerable is the thing I've wanted when dealing with this shape of problem.

    public static IEnumerable<IEnumerable<T>> ToInMemoryBatches<T>(this IEnumerable<T> source, int batchSize)
{
List<T> batch = null;
foreach (var item in source)
{
if (batch == null)
batch = new List<T>();

batch.Add(item);

if (batch.Count != batchSize)
continue;

yield return batch;
batch = null;
}

if (batch != null)
yield return batch;
}

IEnumerable vs List - What to Use? How do they work?

IEnumerable describes behavior, while List is an implementation of that behavior. When you use IEnumerable, you give the compiler a chance to defer work until later, possibly optimizing along the way. If you use ToList() you force the compiler to reify the results right away.

Whenever I'm "stacking" LINQ expressions, I use IEnumerable, because by only specifying the behavior I give LINQ a chance to defer evaluation and possibly optimize the program. Remember how LINQ doesn't generate the SQL to query the database until you enumerate it? Consider this:

public IEnumerable<Animals> AllSpotted()
{
return from a in Zoo.Animals
where a.coat.HasSpots == true
select a;
}

public IEnumerable<Animals> Feline(IEnumerable<Animals> sample)
{
return from a in sample
where a.race.Family == "Felidae"
select a;
}

public IEnumerable<Animals> Canine(IEnumerable<Animals> sample)
{
return from a in sample
where a.race.Family == "Canidae"
select a;
}

Now you have a method that selects an initial sample ("AllSpotted"), plus some filters. So now you can do this:

var Leopards = Feline(AllSpotted());
var Hyenas = Canine(AllSpotted());

So is it faster to use List over IEnumerable? Only if you want to prevent a query from being executed more than once. But is it better overall? Well in the above, Leopards and Hyenas get converted into single SQL queries each, and the database only returns the rows that are relevant. But if we had returned a List from AllSpotted(), then it may run slower because the database could return far more data than is actually needed, and we waste cycles doing the filtering in the client.

In a program, it may be better to defer converting your query to a list until the very end, so if I'm going to enumerate through Leopards and Hyenas more than once, I'd do this:

List<Animals> Leopards = Feline(AllSpotted()).ToList();
List<Animals> Hyenas = Canine(AllSpotted()).ToList();


Related Topics



Leave a reply



Submit