Why are ToLookup and GroupBy different?
Why would I ever bother with GroupBy? Why should it exist?
What happens when you call ToLookup on an object representing a remote database table with a billion rows in it?
The billion rows are sent over the wire, and you build the lookup table locally.
What happens when you call GroupBy on such an object?
A query object is built; end of story.
When that query object is enumerated, the analysis of the table is done on the database server, and the grouped results are sent back on demand, a few at a time.
Logically they are the same thing, but the performance implications of each are completely different. Calling ToLookup means "I want a cache of the entire thing right now, organized by group." Calling GroupBy means "I am building an object to represent the question 'what would these things look like if I organized them by group?'"
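The timing difference can also be observed locally with an in-memory source. The sketch below is only an illustration, not a database scenario: the class, the Source method, and the Pulled counter are hypothetical names introduced here, and the counter simply records how many items have been read from the source so far.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredVsImmediate
{
    // Hypothetical counter: number of items read from the source so far.
    public static int Pulled;

    // Hypothetical stand-in for a large source; counts every item it yields.
    public static IEnumerable<int> Source(int count)
    {
        for (var i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static void Main()
    {
        Pulled = 0;

        // GroupBy only builds a query object; nothing is read yet.
        var groups = Source(1000).GroupBy(i => i / 100);
        Console.WriteLine(Pulled); // 0

        // ToLookup reads the whole source immediately and caches it by key.
        var lookup = Source(1000).ToLookup(i => i / 100);
        Console.WriteLine(Pulled); // 1000
        Console.WriteLine(lookup[3].Count()); // 100
    }
}
```

With a remote provider the same distinction decides whether the rows cross the wire at the call site or only when the query is eventually enumerated.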
Does ToLookup force immediate execution of a sequence?
Easy enough to test...
void Main()
{
    var lookup = Inf().ToLookup(i => i / 100);
    Console.WriteLine("if you see this, ToLookup is deferred"); //never happens
}

IEnumerable<int> Inf()
{
    unchecked
    {
        for (var i = 0; ; i++)
        {
            yield return i;
        }
    }
}
To recap, ToLookup greedily consumes the source sequence without deferring.
In contrast, the GroupBy operator is deferred, so you can write the following to no ill effect:
var groups = Inf().GroupBy(i => i / 100); //returns immediately; nothing is enumerated yet
However, GroupBy is greedy, so when you enumerate, the entire source sequence is consumed.
This means that
groups.SelectMany(g=>g).First();
also fails to complete.
When you think about the problem of grouping, it quickly becomes apparent that when separating a sequence into a sequence of groups, it would be impossible to know if even just one of the groups were complete without completely consuming the entire sequence.
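The same behaviour can be checked on a finite sequence. In this sketch, the class, the Counted method, and the Pulled counter are hypothetical names introduced for illustration. It asks for only the first element of the first group and then reports how much of the source was read, assuming the LINQ to Objects implementation of GroupBy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GreedyGroupBy
{
    // Hypothetical counter: number of items read from the source so far.
    public static int Pulled;

    // Finite source that counts every item it yields.
    public static IEnumerable<int> Counted(int count)
    {
        for (var i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static void Main()
    {
        Pulled = 0;
        var groups = Counted(500).GroupBy(i => i / 100);

        // Only one element is requested...
        var first = groups.SelectMany(g => g).First();
        Console.WriteLine(first);  // 0

        // ...but the whole source was read to build the groups.
        Console.WriteLine(Pulled); // 500
    }
}
```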
LINQ grouping to more than one group
//Populating data
Dictionary<int, List<string>> GroupedA = new Dictionary<int, List<string>>();
GroupedA.Add(1, new List<string>{"1","2","3"});
GroupedA.Add(2, new List<string>{"1","32","3","4"});
GroupedA.Add(3, new List<string>{"1","52","43","4"});
//Inverting data
ILookup<string, int> GroupedB =
GroupedA.SelectMany(pair => pair.Value.Select(val => new{pair.Key, val}))
.ToLookup(pair => pair.val, pair => pair.Key);
//Printing data
var pairs = GroupedB.Select(pair => string.Format("{0} : {1}", pair.Key, string.Join(",", pair)));
Console.WriteLine (string.Join(Environment.NewLine, pairs));
prints:
1 : 1,2,3
2 : 1
3 : 1,2
32 : 2
4 : 2,3
52 : 3
43 : 3
Is there a better GroupBy to Dictionary (or solution) to bucketing?
Whenever you find yourself with a Dictionary<TKey, List<TSomething>>, you may find you can happily use a Lookup<TKey, TSomething>. If this proves to be the case, you can use ToLookup to make one.
Unfortunately, query expression syntax is available neither for ToLookup nor for your code.
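As a rough sketch of that replacement, the example below buckets strings by length with a single ToLookup call instead of filling a Dictionary<int, List<string>> by hand. The class name, method name, and sample data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BucketingExample
{
    // Hypothetical helper: one call replaces the usual
    // "check for key, create list, add item" dictionary loop.
    public static ILookup<int, string> BucketByLength(IEnumerable<string> words)
    {
        return words.ToLookup(w => w.Length);
    }

    public static void Main()
    {
        var words = new[] { "a", "bb", "cc", "ddd" };
        var byLength = BucketByLength(words);

        foreach (var bucket in byLength)
        {
            Console.WriteLine($"{bucket.Key}: {string.Join(",", bucket)}");
        }
        // 1: a
        // 2: bb,cc
        // 3: ddd
    }
}
```

Note that a lookup is read-only once built, so this fits best when the buckets do not need to change afterwards.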
ILookup versus IGrouping
You should call ToLookup if you need to look up values by key, but you don't need ordering.
You should call GroupBy if you just need to loop through the groups.
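The two usage patterns might look like this in practice. This is a small sketch over made-up sample data; the class name and the names array are hypothetical.

```csharp
using System;
using System.Linq;

public static class LookupVersusGrouping
{
    public static void Main()
    {
        var names = new[] { "Ann", "Bob", "Alice", "Carl", "Bea" };

        // ToLookup: built once, then queried repeatedly by key.
        ILookup<char, string> byInitial = names.ToLookup(n => n[0]);
        Console.WriteLine(string.Join(",", byInitial['A'])); // Ann,Alice
        // Indexing a missing key yields an empty sequence instead of throwing.
        Console.WriteLine(byInitial['Z'].Any());             // False

        // GroupBy: enough when the groups are only iterated.
        foreach (IGrouping<char, string> g in names.GroupBy(n => n[0]))
        {
            Console.WriteLine($"{g.Key}: {g.Count()}");
        }
        // A: 2
        // B: 2
        // C: 1
    }
}
```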
C# - LINQ Lambda expression using GroupBy - Why nested validations are so inefficient?
I recently read that whenever EF Core 2 ran into anything that it couldn't produce a SQL Query for, it would switch to in-memory evaluation. So the first query would basically be pulling all of your ProjectDetails out of the database, then doing all the grouping and such in your application's memory. That's probably the biggest issue you had.
Using .Include had a big impact in that case, because you were including a bunch of other data when you pulled out all those ProjectDetails. It probably has little to no impact now that you've avoided doing all that work in-memory.
The EF Core team realized the error of their ways, and changed the behavior to throw an exception in cases like that starting with EF Core 3.
To avoid problems like this in the future, you can upgrade to EF Core 3, or just be really careful to ensure Entity Framework can translate everything in your query to SQL.
Linq to Select First Group
There is no need for the ToLookup. The lookup groups by different DateDetails, but matches is already filtered to a single date, so there is already only one group to select.
You could skip the filter and just go with:
var match = recordList.ToLookup(a => a.DateDetails).First();
lookaheadList = match.ToList();
However, this is redundant for a couple of reasons:
If you're not storing the result of ToLookup and using it to look up other groups by date, there was no point creating the lookup object; you could have just used GroupBy.
If you only need the first group, there is no need for any grouping at all (either by ToLookup or GroupBy).
To directly grab the items that match the first date, use:
var firstDate = recordList.First().DateDetails;
var matches = recordList.Where(d => d.DateDetails == firstDate);
lookaheadList = matches.ToList();
Why does LINQ GroupBy produce different results when preceded by ToArray()?
I found the answer. user995219's answer was useful, but not the full explanation.
Apparently, LINQ methods check the contents of what they are operating on. In my case, I'm using classes generated by Entity Framework. These have "Entity Keys", which allow the .NET Framework to distinguish between two rows that have the same contents and two instances of the same row.
In my case, I was using a complicated view, and the .NET Framework inferred the entity keys incorrectly and then discarded rows because it thought they were the same.
The solution for me was to modify my view so that there is a GUID that uniquely identifies each row and use the GUIDs as an entity key.