Linq-to-Entities Join vs GroupJoin
Behaviour
Suppose you have two lists:
Id Value
1 A
2 B
3 C
Id ChildValue
1 a1
1 a2
1 a3
2 b1
2 b2
When you Join
the two lists on the Id
field the result will be:
Value ChildValue
A a1
A a2
A a3
B b1
B b2
When you GroupJoin
the two lists on the Id
field the result will be:
Value ChildValues
A [a1, a2, a3]
B [b1, b2]
C []
So Join
produces a flat (tabular) result of parent and child values.GroupJoin
produces a list of entries in the first list, each with a group of joined entries in the second list.
That's why Join
is the equivalent of INNER JOIN
in SQL: there are no entries for C
. While GroupJoin
is the equivalent of OUTER JOIN
: C
is in the result set, but with an empty list of related entries (in an SQL result set there would be a row C - null
).
Syntax
So let the two lists be IEnumerable<Parent>
and IEnumerable<Child>
respectively. (In case of Linq to Entities: IQueryable<T>
).
Join
syntax would be
from p in Parent
join c in Child on p.Id equals c.Id
select new { p.Value, c.ChildValue }
returning an IEnumerable<X>
where X is an anonymous type with two properties, Value
and ChildValue
. This query syntax uses the Join
method under the hood.
GroupJoin
syntax would be
from p in Parent
join c in Child on p.Id equals c.Id into g
select new { Parent = p, Children = g }
returning an IEnumerable<Y>
where Y is an anonymous type consisting of one property of type Parent
and a property of type IEnumerable<Child>
. This query syntax uses the GroupJoin
method under the hood.
We could just do select g
in the latter query, which would select an IEnumerable<IEnumerable<Child>>
, say a list of lists. In many cases the select with the parent included is more useful.
Some use cases
1. Producing a flat outer join.
As said, the statement ...
from p in Parent
join c in Child on p.Id equals c.Id into g
select new { Parent = p, Children = g }
... produces a list of parents with child groups. This can be turned into a flat list of parent-child pairs by two small additions:
from p in parents
join c in children on p.Id equals c.Id into g // <= into
from c in g.DefaultIfEmpty() // <= flattens the groups
select new { Parent = p.Value, Child = c?.ChildValue }
The result is similar to
Value Child
A a1
A a2
A a3
B b1
B b2
C (null)
Note that the range variable c
is reused in the above statement. Doing this, any join
statement can simply be converted to an outer join
by adding the equivalent of into g from c in g.DefaultIfEmpty()
to an existing join
statement.
This is where query (or comprehensive) syntax shines. Method (or fluent) syntax shows what really happens, but it's hard to write:
parents.GroupJoin(children, p => p.Id, c => c.Id, (p, c) => new { p, c })
.SelectMany(x => x.c.DefaultIfEmpty(), (x,c) => new { x.p.Value, c?.ChildValue } )
So a flat outer join
in LINQ is a GroupJoin
, flattened by SelectMany
.
2. Preserving order
Suppose the list of parents is a bit longer. Some UI produces a list of selected parents as Id
values in a fixed order. Let's use:
var ids = new[] { 3,7,2,4 };
Now the selected parents must be filtered from the parents list in this exact order.
If we do ...
var result = parents.Where(p => ids.Contains(p.Id));
... the order of parents
will determine the result. If the parents are ordered by Id
, the result will be parents 2, 3, 4, 7. Not good. However, we can also use join
to filter the list. And by using ids
as first list, the order will be preserved:
from id in ids
join p in parents on id equals p.Id
select p
The result is parents 3, 7, 2, 4.
Entity Framework GroupJoin different result vs join with group
Syntax doesn't matter - you can do one or the another using either syntax.
The difference is coming from the way the queries are built (mentioned in the post title) - join
(Join
) + group by
(GroupBy
) vs join into
(GroupJoin
).
The former is basically an equivalent of SQL way of doing it. While GroupJoin
is a LINQ specific construct. The important thing to note is that it has left outer join semantics (in fact it is used to simulate left outer joins in LINQ). In many regards it's better than SQL way because it avoids the redundant grouping of the result. From the other side it always returns outer records due to left outer join semantics. If you want to filter out the outer records with no matching inner records, you have to do that yourself. For instance, by adding at the end of your second query
.Where(x => x.Count != 0)
Or in general, using the following pattern:
from a in A
join b in B on a.Key equals b.Key into g
where g.Any()
...
Understanding the GroupJoin and Join in Linq chaining syntax (Homework)
The fourth argument - which maps to the fifth parameter in the documentation (because the first parameter is the target of the extension method call) is just the result selector. It's a function accepting two parameters: the first is an element of the "outer" sequence (the people
array in your case) and the second is a sequence of elements from the "inner" sequence (the items
array in your case) which have the same key as the outer element. The function should return a "result" element, and the overall result of the method call is a sequence of those results.
The function is called once for each of the "outer" elements, so you'd have:
- First call: person ID 1, and products with IDs 1, 4, 5, 7
- Second call: person ID 2, and the product with ID 2
- Third call: person ID 3, and an empty sequence of products
Your query is complex because you're using an anonymous type for your result, and constructing an instance of the anonymous type using another query. Here's a simpler query that might help to clarify:
var productOwnerList = people
.GroupJoin(
items,
o => o.Id,
i => i.PersonId,
(person, items) => $"{person.Id}: {string.Join(",", items.Select(item => item.ProductId))}"
.ToArray();
Linq GroupJoin (join...into) results in INNER JOIN?
In general the query should generate left outer join
.
But remember, this is EF, and it has additional information coming from the model. In this case looks like the StaffID
property of Person
is an enforced FK constraint to Stuff
, so EF knows that there is always a corresponding record in Staff
table, hence ignoring your left outer join
construct and generates inner join
instead.
Again, the model (properties, whether they are required or not, the relationships - required or not etc.) allows EF to perform similar smart decisons and optimizations.
Linq to Entities From vs Join
Internally, LINQ to Entities will create two different queries, where your first example will use a WHERE and the second one will JOIN the tables. However this will have no effect on performance, as the SQL optimizer will create the same execution plans for both these queries. See Inner join vs Where
best practices wise, the first option is almost always preferable. Using navigation properties makes your code easier to read and understand and removes the hassle of having to join the tables yourself.
SQL to LINQ Involving Multiple GroupJoin
The equivalent of your SQL query is the following link phrase.
The GroupJoin
is for when you want to match a value with several values.
For more information, you can refer to this link: Linq to Entities join vs groupjoin
var list = _ApplicationDbContext.UserExtend.Join(_ApplicationDbContext.OrderTransaction,
usex => usex.Id,
ord => ord.MerchantUserId,
(usex, ord) => new { usex, ord })
.Join(_ApplicationDbContext.StoreDomain,
usexOrd => usexOrd.usex.StoreId,
sd => sd.StoreId,
(usexOrd, sd) => new { usexOrd, sd })
.Join(_ApplicationDbContext.Store,
usexOrdStoreDom => usexOrdStoreDom.sd.StoreId,
st => st.Id,
(usexOrdStoreDom, st) => new { usexOrdStoreDom, st })
.GroupBy(a => new { a.usexOrdStoreDom.usexOrd.ord.MerchantUserId, a.usexOrdStoreDom.usexOrd.usex.Nickname, a.usexOrdStoreDom.sd.Url, a.st.Name, a.usexOrdStoreDom.usexOrd.ord.CreatedDate })
.OrderBy(a => a.Key.CreatedDate)
.Select(a => new
{
MerchantUserId = a.Key.MerchantUserId,
Nickname = a.Key.Nickname,
Url = a.Key.Url,
storename = a.Key.Name,
LastOrderDate = a.Max(x => x.usexOrdStoreDom.usexOrd.ord.CreatedDate)
})
.ToList();
LINQ to Entities does not recognize the method group join
Try to replace this line:
select new Item { DishID = d.DishID, components = items.ToList() }
with this one:
select new Item { DishID = d.DishID, components = items }
The error is quite clear: ToList()
function cannot be translated in that context.
Related Topics
Insert 2 Million Rows into SQL Server Quickly
How to Export Datatable to Excel
Ado.Net |Datadirectory| Where Is This Documented
Convert Json String to C# Object
Why Does the Ef 6 Tutorial Use Asynchronous Calls
Getting Servicestack to Retain Type Information
When Should I Use a List VS a Linkedlist
Validating an Xml Against Referenced Xsd in C#
Compression/Decompression String With C#
Getting Multiple Keys of Specified Value of a Generic Dictionary
Decimal Precision and Scale in Ef Code First
Still Confused About Covariance and Contravariance & In/Out
Best Practice For Forcing Garbage Collection in C#