Why Are ToLookup and GroupBy Different

Why are ToLookup and GroupBy different?

Why would I ever bother with GroupBy? Why should it exist?

What happens when you call ToLookup on an object representing a remote database table with a billion rows in it?

The billion rows are sent over the wire, and you build the lookup table locally.

What happens when you call GroupBy on such an object?

A query object is built; end of story.

When that query object is enumerated, the analysis of the table is done on the database server, and the grouped results are sent back on demand, a few at a time.

Logically they are the same thing, but the performance implications of each are completely different. Calling ToLookup means "I want a cache of the entire thing right now, organized by group." Calling GroupBy means "I am building an object to represent the question 'what would these things look like if I organized them by group?'"
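A minimal sketch of the "query object" idea, assuming an in-memory sequence wrapped with AsQueryable() as a stand-in for the remote table (no real database or provider is involved, and the class and method names are hypothetical). Calling GroupBy on an IQueryable only records the operator in an expression tree; a real provider would translate that tree into a server-side query when it is enumerated.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryObjectDemo
{
    // Builds a grouped query over an in-memory stand-in for a remote table.
    // Nothing is grouped here; the call only extends the expression tree.
    public static IQueryable<IGrouping<int, int>> BuildGroupedQuery()
    {
        IQueryable<int> table = Enumerable.Range(0, 1000).AsQueryable();
        return table.GroupBy(i => i / 100);
    }

    // Returns the name of the outermost operator recorded in the expression tree.
    public static string OutermostOperator(IQueryable query)
    {
        return ((MethodCallExpression)query.Expression).Method.Name;
    }

    public static void Main()
    {
        var query = BuildGroupedQuery();
        Console.WriteLine(OutermostOperator(query)); // GroupBy
        Console.WriteLine(query.Count());            // 10 (the query runs only here)
    }
}
```

With the in-memory provider the work still happens locally when the query runs; the point is only that the GroupBy call itself produces a description of the question rather than an answer.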

Does ToLookup force immediate execution of a sequence?

Easy enough to test...

void Main()
{
    var lookup = Inf().ToLookup(i => i / 100);
    Console.WriteLine("if you see this, ToLookup is deferred"); // never happens
}

// An endless sequence of integers; unchecked lets i wrap around instead of throwing.
IEnumerable<int> Inf()
{
    unchecked
    {
        for (var i = 0; ; i++)
        {
            yield return i;
        }
    }
}

To recap, ToLookup greedily consumes the source sequence without deferring.

In contrast, the GroupBy operator is deferred, so you can write the following to no ill-effect:

var groups = Inf().GroupBy(i => i / 100); // returns immediately; nothing is enumerated yet

However, GroupBy is greedy once enumeration begins: asking for even the first group consumes the entire source sequence.

This means that

groups.SelectMany(g=>g).First();

also fails to complete.

When you think about the problem of grouping, it quickly becomes apparent that when separating a sequence into a sequence of groups, it would be impossible to know if even just one of the groups were complete without completely consuming the entire sequence.
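The same behaviour can be observed on a finite sequence. The sketch below assumes LINQ to Objects and uses a hypothetical Counted source that records how many items have been pulled from it; the names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferralDemo
{
    // Number of items pulled from the source so far.
    public static int Pulled;

    // A finite source that records every item requested from it.
    public static IEnumerable<int> Counted(int count)
    {
        for (var i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static void Main()
    {
        Pulled = 0;
        var groups = Counted(1000).GroupBy(i => i / 100);
        Console.WriteLine(Pulled); // 0: GroupBy is deferred

        var first = groups.First();
        Console.WriteLine(Pulled); // 1000: the first group needed the whole source

        Pulled = 0;
        var lookup = Counted(1000).ToLookup(i => i / 100);
        Console.WriteLine(Pulled); // 1000: consumed during the ToLookup call itself
    }
}
```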

LINQ grouping to more than one group

//Populating data
Dictionary<int, List<string>> GroupedA = new Dictionary<int, List<string>>();

GroupedA.Add(1, new List<string>{"1","2","3"});
GroupedA.Add(2, new List<string>{"1","32","3","4"});
GroupedA.Add(3, new List<string>{"1","52","43","4"});

//Inverting data
ILookup<string, int> GroupedB =
    GroupedA.SelectMany(pair => pair.Value.Select(val => new { pair.Key, val }))
            .ToLookup(pair => pair.val, pair => pair.Key);

//Printing data
var pairs = GroupedB.Select(pair => string.Format("{0} : {1}", pair.Key, string.Join(",", pair)));

Console.WriteLine(string.Join(Environment.NewLine, pairs));

prints:

1 : 1,2,3 
2 : 1
3 : 1,2
32 : 2
4 : 2,3
52 : 3
43 : 3

Is there a better GroupBy to Dictionary (or solution) for bucketing?

Whenever you find yourself with a Dictionary<TKey, List<TSomething>>, you may find you can happily use a Lookup<TKey, TSomething>. If this proves to be the case, you can use ToLookup to make one.

Unfortunately, query expression syntax is available neither for ToLookup nor for your code, so both have to be written with method calls.
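As a rough illustration of that replacement, here is a minimal sketch that buckets words by length with ToLookup instead of filling a Dictionary<int, List<string>> by hand. The sample data and the BucketByLength helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BucketDemo
{
    // Buckets words by length; no manual "create the list if the key is missing" step.
    public static ILookup<int, string> BucketByLength(IEnumerable<string> words)
    {
        return words.ToLookup(w => w.Length);
    }

    public static void Main()
    {
        var words = new[] { "a", "bb", "cc", "ddd" }; // hypothetical sample data
        var byLength = BucketByLength(words);

        Console.WriteLine(string.Join(",", byLength[2])); // bb,cc
        Console.WriteLine(byLength[5].Count());           // 0: a missing key yields an empty sequence
        Console.WriteLine(byLength.Contains(5));          // False
    }
}
```

One difference worth keeping in mind: indexing a lookup with a key it does not contain returns an empty sequence, whereas a Dictionary indexer throws KeyNotFoundException. A lookup is also immutable once built, so it fits the case where the buckets are filled in one pass and only read afterwards.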

ILookup versus IGrouping

You should call ToLookup if you need to look up values by key, but you don't need ordering.

You should call GroupBy if you just need to loop through the groups.
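The two cases might look like this in practice. This is a sketch over hypothetical order data held in memory; the tuple shape and method names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupVersusGroupingDemo
{
    // Hypothetical sample data: (customer, amount) pairs.
    private static readonly (string Customer, int Amount)[] Orders =
    {
        ("ann", 10), ("bob", 5), ("ann", 7),
    };

    // One pass over the groups: GroupBy is enough.
    public static List<string> TotalsPerCustomer()
    {
        return Orders
            .GroupBy(o => o.Customer)
            .Select(g => $"{g.Key}: {g.Sum(o => o.Amount)}")
            .ToList();
    }

    // Repeated access by key: build the ILookup once and index into it.
    public static ILookup<string, int> AmountsByCustomer()
    {
        return Orders.ToLookup(o => o.Customer, o => o.Amount);
    }

    public static void Main()
    {
        foreach (var line in TotalsPerCustomer())
            Console.WriteLine(line); // ann: 17, then bob: 5

        var lookup = AmountsByCustomer();
        Console.WriteLine(string.Join(",", lookup["ann"])); // 10,7
        Console.WriteLine(lookup["bob"].Sum());             // 5
    }
}
```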

C# - LINQ Lambda expression using GroupBy - Why nested validations are so inefficient?

I recently read that whenever EF Core 2 ran into anything that it couldn't translate into a SQL query, it would switch to in-memory evaluation. So the first query would basically be pulling all of your ProjectDetails out of the database, then doing all the grouping and such in your application's memory. That's probably the biggest issue you had.

Using .Include had a big impact in that case, because you were including a bunch of other data when you pulled out all those ProjectDetails. It probably has little to no impact now that you've avoided doing all that work in-memory.

The EF Core team realized the error of their ways, and starting with EF Core 3 changed the behavior to throw an exception in cases like that.

To avoid problems like this in the future, you can upgrade to EF Core 3, or just be really careful to ensure Entity Framework can translate everything in your query to SQL.

Linq to Select First Group

There is no need for the ToLookup. The lookup groups by different DateDetails, but matches is already filtered to a single date, so there is already only one group to select.

You could skip the filter and just go with:

var match = recordList.ToLookup(a => a.DateDetails).First();

lookaheadList = match.ToList();

However, this is redundant for a couple of reasons:

  • If you're not storing the result of ToLookup and using it to look up other groups by date, there was no point creating the lookup object -- you could have just used GroupBy.

  • If you only need the first group, there is no need for any grouping at all (either by ToLookup or GroupBy).

To directly grab the items that match the first date, use:

var firstDate = recordList.First().DateDetails;

var matches = recordList.Where(d => d.DateDetails == firstDate);

lookaheadList = matches.ToList();

Why does LINQ GroupBy produce different results when preceded by ToArray()?

I found the answer. user995219's answer was useful, but not the full explanation.

Apparently, LINQ methods check the contents of what they are operating on. In my case, I'm using classes generated by Entity Framework. These have "Entity Keys", which allow the .NET Framework to distinguish between two rows that have the same contents and two instances of the same row.

In my case, I was using a complicated view, and the .NET Framework inferred the entity keys incorrectly and then discarded rows because it thought they were the same.

The solution for me was to modify my view so that there is a GUID that uniquely identifies each row and use the GUIDs as an entity key.


