Create batches in LINQ
The Enumerable.Chunk() extension method was added in .NET 6.
Example:
var list = new List<int> { 1, 2, 3, 4, 5, 6, 7 };
var chunks = list.Chunk(3);
// returns { { 1, 2, 3 }, { 4, 5, 6 }, { 7 } }
For those who cannot upgrade, the source is available on GitHub.
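For projects that cannot move to .NET 6, a fallback can be sketched in a few lines. The version below is only a minimal illustration, not the framework's implementation: the names ChunkFallback and BatchingExtensions are hypothetical, and each chunk is buffered in a list before being yielded as an array.

```csharp
using System;
using System.Collections.Generic;

public static class BatchingExtensions
{
    // Hypothetical stand-in for Enumerable.Chunk on older frameworks.
    public static IEnumerable<T[]> ChunkFallback<T>(this IEnumerable<T> source, int size)
    {
        // Validate eagerly so bad arguments fail at the call site, not on first iteration.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterator(source, size);
    }

    private static IEnumerable<T[]> Iterator<T>(IEnumerable<T> source, int size)
    {
        var buffer = new List<T>(size);
        foreach (var item in source)
        {
            buffer.Add(item);
            if (buffer.Count == size)
            {
                yield return buffer.ToArray(); // emit a full chunk
                buffer.Clear();
            }
        }
        if (buffer.Count > 0)
            yield return buffer.ToArray(); // emit the final, shorter chunk
    }
}

public static class Program
{
    public static void Main()
    {
        // Split seven numbers into chunks of three: 3 + 3 + 1 items.
        foreach (var chunk in new[] { 1, 2, 3, 4, 5, 6, 7 }.ChunkFallback(3))
            Console.WriteLine(string.Join(", ", chunk));
    }
}
```

Because every chunk is copied into its own array, the chunks can be stored or enumerated in any order, at the cost of one allocation per chunk.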
How to build batches/buckets with linq
Originally posted by @Nick_Whaley in Create batches in linq, but not the best response as the question was formulated differently:
Try this:
public static IEnumerable<IEnumerable<T>> Bucketize<T>(this IEnumerable<T> items, int bucketSize)
{
    // Dispose the shared enumerator once the outer enumeration finishes.
    using (var enumerator = items.GetEnumerator())
    {
        while (enumerator.MoveNext())
            yield return GetNextBucket(enumerator, bucketSize);
    }
}

private static IEnumerable<T> GetNextBucket<T>(IEnumerator<T> enumerator, int maxItems)
{
    int count = 0;
    do
    {
        yield return enumerator.Current;
        count++;
        if (count == maxItems)
            yield break;
    } while (enumerator.MoveNext());
}
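The trick is to pass the old-fashioned enumerator between the inner and outer enumerations, so that each batch continues where the previous one stopped. Because the buckets share that enumerator, each bucket must be fully enumerated before the next one is requested.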
How to loop through IEnumerable in batches
Sounds like you need to use the Skip and Take methods of your object. Example:
users.Skip(1000).Take(1000)
This would skip the first 1000 items and take the next 1000. You'd just need to increase the amount skipped with each call.
You could pass an integer variable to Skip to control how much is skipped, and wrap the call in a method:
public IEnumerable<user> GetBatch(int pageNumber)
{
    return users.Skip(pageNumber * 1000).Take(1000);
}
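To walk the whole sequence, the page number can be increased until a page comes back empty. The following is a minimal sketch, assuming an in-memory list of integers in place of users and a hypothetical page size of 3; the PagingDemo class and the extra parameters on GetBatch are illustrative only. On a lazy sequence each Skip call walks the skipped items again, so this pattern suits lists and database queries better than long streamed sources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Hypothetical helper: returns one page of the source (zero-based page number).
    public static IEnumerable<T> GetBatch<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        return source.Skip(pageNumber * pageSize).Take(pageSize);
    }

    public static void Main()
    {
        var users = Enumerable.Range(1, 8).ToList(); // stand-in for the real user list
        const int pageSize = 3;

        for (int page = 0; ; page++)
        {
            // Materialize the page so the query is evaluated only once.
            var batch = GetBatch(users, page, pageSize).ToList();
            if (batch.Count == 0)
                break; // an empty page means the source is exhausted

            Console.WriteLine($"Page {page}: {string.Join(", ", batch)}");
        }
    }
}
```

With eight items and a page size of three, the loop processes two full pages and one final page of two items before stopping.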
Batch Update using LINQ
You need to use skip/take in LINQ to get around this. The code below should work. Although I'm not always a fan of doing a while(true) with a break statement, it's the easiest way to implement this.
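int takeCount = 50;
int skipAmount = 0;
// Collect the ids once so the query only has to match against a simple list.
var userIds = users.Select(y => y.Id).ToList();
while (true)
{
    // Load one page of matching rows; ToList() runs the query once per batch.
    var dbUserList = db.Users
        .Where(x => userIds.Contains(x.Id))
        .OrderBy(x => x.Id)
        .Skip(skipAmount)
        .Take(takeCount)
        .ToList();
    if (!dbUserList.Any())
    {
        break;
    }
    // Update only the rows in this page, looking up the matching in-memory user.
    foreach (var dbUser in dbUserList)
    {
        var user = users.First(x => x.Id == dbUser.Id);
        dbUser.name = user.name;
        dbUser.cat = user.cat;
    }
    db.SaveChanges();
    skipAmount += takeCount; // advance to the next page
}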
Looping through a List<T> in batches while ensuring items per batch are unique
If I completely understand the problem, then there are many ways to do this and the best solution would depend on your actual needs.
The assumptions are:
- What you have described is an in-memory approach.
- It doesn't need to hit a database.
- It doesn't need to be producer/consumer.
Then a very simple (yet efficient) batch and queue pattern can be used with minimal allocations.
Given
public class Payment
{
    public int AccountId { get; set; }
    public Payment(int accountId) => AccountId = accountId;
}
And
public static IEnumerable<Payment[]> GetBatches(IEnumerable<Payment> source, int count)
{
    var hashset = new HashSet<int>(count);
    var batch = new List<Payment>(count);
    var leftOvers = new Queue<Payment>();
    while (true)
    {
        foreach (var item in source)
        {
            // check if batched
            if (hashset.Add(item.AccountId))
                batch.Add(item); // add to batch
            else
                leftOvers.Enqueue(item); // add to left overs
            // if we are at the batch size start a loop
            while (batch.Count == count)
            {
                yield return batch.ToArray(); // return the batch
                batch.Clear();
                hashset.Clear();
                // check the left overs
                while (leftOvers.Any() && batch.Count != count)
                    if (hashset.Add(leftOvers.Peek().AccountId)) // check the batch
                        batch.Add(leftOvers.Dequeue());
                    else break; // we still have a duplicate bail
            }
        }
        if (batch.Any()) yield return batch.ToArray();
        if (!leftOvers.Any()) break;
        source = leftOvers.ToList(); // allocation :(
        hashset.Clear();
        batch.Clear();
        leftOvers.Clear();
    }
}
Note: This is fairly resource efficient, though it probably has one small unnecessary allocation when dealing with pure leftovers. I am sure this could be removed, though I'll leave that up to you. There are also further efficiencies you could add, and with the use of a channel this could easily be turned into a consumer.
Test
var list = new List<Payment>() {new(1), new(2), new(3), new(4), new(4), new(5), new(6), new(4), new(4), new(6), new(4)};
var batches = GetBatches(list, 3);
foreach (var batch in batches)
    Console.WriteLine(string.Join(", ", batch.Select(x => x.AccountId)));
Output
1, 2, 3
4, 5, 6
4, 6
4
4
4
How to process IEnumerable in batches?
Using the batch solution from this thread it seems trivial:
const int batchSize = 100;
foreach (var batch in contacts.Batch(batchSize))
{
    DoSomething(batch);
}
If you want to also wrap it up:
public static void ProcessInBatches<TSource>(
    this IEnumerable<TSource> source,
    int batchSize,
    Action<IEnumerable<TSource>> action)
{
    foreach (var batch in source.Batch(batchSize))
    {
        action(batch);
    }
}
So, your code can be transformed into:
const int batchSize = 100;
contacts.ProcessInBatches(batchSize, DoSomething);
Create Buckets with Linq
Well, for an arbitrary list you have to compute the range [min..max]
and then the step for n buckets:
step = (max - min) / n;
Code:
// Given
List<double> list = new List<double>() {
    0, 0.1, 1.1, 2.2, 3.3, 4.1, 5.6, 6.3, 7.1, 8.9, 9.8, 9.9, 10
};
int n = 5;
// We compute step (the width of one bucket)
double min = list.Min();
double max = list.Max();
double step = (max - min) / n;
// And, finally, group by:
double[][] result = list
    .GroupBy(item => (int)Math.Clamp((item - min) / step, 0, n - 1))
    .OrderBy(group => group.Key)
    .Select(group => group.ToArray())
    .ToArray();
// Let's have a look:
string report = string.Join(Environment.NewLine, result
    .Select((array, i) => $"[{min + i * step} .. {min + i * step + step,2}) : {{{string.Join("; ", array)}}}"));
Console.WriteLine(report);
Outcome:
[0 .. 2) : {0; 0.1; 1.1}
[2 .. 4) : {2.2; 3.3}
[4 .. 6) : {4.1; 5.6}
[6 .. 8) : {6.3; 7.1}
[8 .. 10) : {8.9; 9.8; 9.9; 10}
Please note the Math.Clamp method, which ensures the [0..n-1] range for group keys. If you want a Dictionary<int, double[]> where Key is the index of the bucket:
Dictionary<int, double[]> buckets = list
    .GroupBy(item => (int)Math.Clamp((item - min) / step, 0, n - 1))
    .ToDictionary(group => group.Key, group => group.ToArray());
Linq Select 5 items per Iteration
for (int i = 0; i < 20; i++)
{
    var fiveitems = theList.Skip(i * 5).Take(5);
}
Batch with Multiple GroupBy
orderby Location, RepName, AccountID
There needs to be a select clause after the above, as demonstrated in StriplingWarrior's answer. Linq Comprehension Queries must end with select or group by.
Unfortunately, there is a logical defect... Suppose I have 50 accounts in the first group and 100 accounts in the second group with a batch size of 100. The original code will produce 3 batches of size 50, not 2 batches of 50, 100.
Here's one way to fix it.
IEnumerable<IGrouping<int, EbrRecord>> query = ...
orderby Location, RepName, AccountID
select new EbrRecord {
    AccountID = EbrData[0],
    AccountName = EbrData[1],
    MBSegment = EbrData[2],
    RepName = EbrData[4],
    Location = EbrData[7],
    TsrLocation = EbrData[8] } into x
group x by new {Location = x.Location, RepName = x.RepName} into g
from g2 in g.Select((data, index) => new { Record = data, Index = index })
.GroupBy(y => y.Index/100, y => y.Record)
select g2;
List<List<EbrRecord>> result = query.Select(g => g.ToList()).ToList();
Also note that using GroupBy to batch is very slow due to redundant iterations. You can write a for loop that will do it in one pass over the ordered set, and that loop will run much faster than the LINQ to Objects version.
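A single-pass loop of that kind might look like the following sketch. It assumes the records are already ordered by the grouping key; the Record type, the GroupBatcher class, and the BatchWithinGroups name are hypothetical stand-ins for the original EbrRecord data, and it uses foreach rather than an indexed for loop.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type with just the fields needed for grouping.
public sealed class Record
{
    public string Location { get; set; }
    public string RepName { get; set; }
    public string AccountID { get; set; }
}

public static class GroupBatcher
{
    // Splits an already-ordered sequence into batches that never span two
    // (Location, RepName) groups and never exceed batchSize items.
    public static List<List<Record>> BatchWithinGroups(IEnumerable<Record> ordered, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var result = new List<List<Record>>();
        List<Record> current = null;
        Record previous = null;

        foreach (var record in ordered)
        {
            bool newGroup = previous == null
                || previous.Location != record.Location
                || previous.RepName != record.RepName;

            // Start a new batch when the group changes or the current batch is full.
            if (newGroup || current.Count == batchSize)
            {
                current = new List<Record>();
                result.Add(current);
            }
            current.Add(record);
            previous = record;
        }
        return result;
    }

    public static void Main()
    {
        var records = new List<Record>();
        for (int i = 0; i < 50; i++)
            records.Add(new Record { Location = "A", RepName = "X", AccountID = i.ToString() });
        for (int i = 0; i < 100; i++)
            records.Add(new Record { Location = "B", RepName = "Y", AccountID = i.ToString() });

        foreach (var batch in BatchWithinGroups(records, 100))
            Console.WriteLine(batch.Count); // one batch size per line
    }
}
```

For the example above (50 accounts in the first group, 100 in the second, batch size 100), this produces two batches of 50 and 100 items.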