Does the Order of Linq Functions Matter

Does the order of LINQ functions matter?

It will depend on the LINQ provider in use. For LINQ to Objects, that could certainly make a huge difference. Assume we've actually got:

var query = myCollection.OrderBy(item => item.CreatedDate)
.Where(item => item.Code > 3);

var result = query.Last();

That requires the whole collection to be sorted and then filtered. If we had a million items, only one of which had a code greater than 3, we'd be wasting a lot of time ordering results which would be thrown away.

Compare that with the reversed operation, filtering first:

var query = myCollection.Where(item => item.Code > 3)
.OrderBy(item => item.CreatedDate);

var result = query.Last();

This time we're only ordering the filtered results, which in the sample case of "just a single item matching the filter" will be a lot more efficient - both in time and space.

It also could make a difference in whether the query executes correctly or not. Consider:

var query = myCollection.Where(item => item.Code != 0)
.OrderBy(item => 10 / item.Code);

var result = query.Last();

That's fine - we know we'll never be dividing by 0. But if we perform the ordering before the filtering, the query will throw an exception.

Why the order of LINQ to objects methods counts

Why it doesn't work this way with LINQ to objects?

LINQ to Objects doesn't use expression trees. The statement is directly turned into a series of method calls, each of which runs as a normal C# method.

As such, the following in LINQ to Objects:

   var results = collection.OrderBy(x => x.Id)
.Where(x => x.Name == "gdoron")
.ToList();

Gets turned into direct method calls:

   var results = Enumerable.ToList(
Enumerable.Where(
Enumerable.OrderBy(collection, x => x.Id),
x => x.Name = "gdoron"
)
);

By looking at the method calls, you can see why ordering matters. In this case, by placing OrderBy first, you're effectively nesting it into the inner-most method call. This means the entire collection will get ordered when the resutls are enumerated. If you were to switch the order:

   var results = collection
.Where(x => x.Name == "gdoron")
.OrderBy(x => x.Id)
.ToList();

Then the resulting method chain switches to:

   var results = Enumerable.ToList(
Enumerable.OrderBy(
Enumerable.Where(collection, x => x.Name = "gdoron"),
x => x.Id
)
);

This, in turn, means that only the filtered results will need to be sorted as OrderBy executes.

What is the correct order to use LinQ statements?

Does a specified order or any good practice guide exist for this?

No, because the order determines what the result is. In SQL (a declarative language), SELECT always comes before WHERE, which comes before GROUP BY, etc., and the parsing engine turns that into an execution plan which will execute in whatever order the optimizer thinks is best.

So selecting, then ordering, then grouping all happens on the data specified by the FROM clause(s), so order does not matter.

C# (within methods) is a procedural language, meaning that statements will be executed in the exact order that you provide them.

When you select, then order, the ordering applies to the selection, meaning that if you select a subset of fields (or project to different fields), the ordering applies to the projection. If you order, then select, the ordering applies to the original data, then the projection applies to the ordered data data.

In your second edited example, the query seems to be broken because you are specifying properties that would be lost from the projection:

var result = Data.OrderBy(x => x.Id).Select(x => x.Text).Where(x => x.Text.StartsWith("ABC")).Take(5).ToList();
^

at this (^) point, you are projecting just the Text property, which I'm assuming sia string, and thus the subsequent Select is working on a collection of strings, which would not have a Text property to filter off of.

Certainly you could change the Where to filter the strings directly, but it illustrates that shifting the order of commands can have a catastrophic impact on the query. It might not make a difference, as you are trying to illustrate, for example, ordering then filtering should be logically equivalent to filtering then ordering (assuming that one doesn't impact the other), and there's no "best practice" to say which should go first, so the right answer (if there is one) would be determined on a case-by-case basis.

Does the order of OrderBy, Select and Where clauses in Linq-to-entities matter

The order in which you create your query before it's executed against the server is not relevant in most cases.

Actually one of the advantages is to be able of gradually create the query by concatenating where, order by, and other clauses.

But there are sometimes where the order can affect the generated sql.

Take the samples you provided. They both compile correctly, but the second does not actually get executed. If you try to run this query against an EF database you will get an NotSupportedException:

System.NotSupportedException: The specified type member 'CourseName' is not supported in LINQ to Entities.

The key here is that you are trying to filter the query by the CourseName property in the view model (StudentView) and not the property of the entity.
So you get this error.

In the case of the first query, it correctly generates this sql:

SELECT
[Extent1].[ID] AS [ID],
[Extent1].[LastName] AS [LastName],
[Extent1].[FirstName] AS [FirstName]
FROM [dbo].[Students] AS [Extent1]
WHERE N'Eurasian Nomads' = [Extent1].[CourseName]
ORDER BY [Extent1].[LastName] ASC, [Extent1].[FirstName] ASC

So, as you can see the order is critical sometimes.

Does order of conditions under where clause in a LINQ query matter

If it's LINQ-To-Objects it matters.

Enumerable.Where is comparable to an ifclause. So this ...

if(expensiveMethodThatReturnsBool() && num1 < num2)
{
// ...
}

... is probably less efficient than:

if(num1 < num2 && expensiveMethodThatReturnsBool())
{
// ...
}

because && is a short-circuit operator. The second expression is evaluated only if the first returned true. The same applies to ||, the second is evaluated only if the first returned false.

7.11 Conditional logical operators


It's similar with chained Wheres. Similar because predicates are applied to every remaining item which passed through the previous Where.

So this...

.Where(x => expensiveMethodThatReturnsBool(x))
.Where(x => x.num1 < x.num2)

can also be less efficient than:

.Where(x => x.num1 < x.num2)
.Where(x => expensiveMethodThatReturnsBool(x))

The first is logically equivalent to:

.Where(x => expensiveMethodThatReturnsBool(x) && x.num1 < x.num2)

Is order of the predicate important when using LINQ?

The answer is going to be different for different LINQ providers. In particular, the story is very different for LINQ to Objects and say LINQ to Entities.

In LINQ to Objects, the Where operator accepts the filter as Func<TSource, bool>. Func<,> is a delegate, so for the purposes of this discussion, you can think of it as a function pointer. In LINQ to Objects, your query is equivalent to this:

static void Main() {
List<TestItem> results = items.Where(MyFilter).ToList();

static boolean MyFilter(TestItem item) {
return item.Item1 == 12 &&
item.Item2 != null &&
item.Item2.SubItem == 65 &&
item.Item3.Equals(anotherThingy)
}

The main thing to notice is that MyFilter is an ordinary C# method and so ordinary C# rules apply, including the short-circuiting behavior of &&. Consequently, the conditions will be evaluated in the order you wrote them. LINQ to Objects can invoke MyFilter on different input elements, but it cannot change what MyFilter does.

In LINQ to Entities and LINQ to SQL, the Where operator accepts the filter as Expression<Func<TSource, bool>>. Now, the filter is passed into the Where operator as a data structure that describes the expression. In that case, the LINQ provider will look at the data structure (the "expression tree") and it is up to the LINQ provider to decide how to interpret it.

In LINQ to Entities and LINQ to SQL cases, the expression tree will be translated to SQL. And then it is up to the database server to decide how to execute the query. The server is definitely allowed to reorder the conditions, and it may do even more substantial optimizations. For example, if the SQL table contains an index on one of the columns referenced in the condition, the server can choose to use the index and avoid even looking at rows that don't match that particular condition part.

Should the order of LINQ query clauses affect Entity Framework performance?

The core of the question is not "why does the order matter with LINQ?". LINQ just translates literally without reordering. The real question is "why do the two SQL queries have different performance?".

I was able to reproduce the problem by only inserting 100k rows. In that case a weakness in the optimizer is being triggered: it does not recognize that it can do a seek on Colour due to the complex condition. In the first query the optimizer does recognize the pattern and creates an index seek.

There is no semantic reason why this should be. A seek on an index is possible even when seeking on NULL. This is a weakness/bug in the optimizer. Here are the two plans:

Sample Image

EF tries to be helpful here because it assumes that both the column and the filter variable can be null. In that case it tries to give you a match (which according to C# semantics is the right thing).

I tried undoing that by adding the following filter:

Colour IS NOT NULL AND @p__linq__0 IS NOT NULL
AND Size IS NOT NULL AND @p__linq__1 IS NOT NULL

Hoping that the optimizer now uses that knowledge to simplify the complex EF filter expression. It did not manage to do so. If this had worked the same filter could have been added to the EF query providing an easy fix.

Here are the fixes the I recommend in the order that you should try them:

  1. Make the database columns not-null in the database
  2. Make the columns not-null in the EF data model hoping that this will prevent EF from creating the complex filter condition
  3. Create indexes: Colour, Size and/or Size, Colour. They also remove them problem.
  4. Ensure that the filtering is done in the right order and leave a code comment
  5. Try to use INTERSECT/Queryable.Intersect to combine the filters. This often results in different plan shapes.
  6. Create an inline table-valued function that does the filtering. EF can use such a function as part of a bigger query
  7. Drop down to raw SQL
  8. Use a plan guide to change the plan

All of these are workarounds, not root cause fixes.

In the end I am not happy with both SQL Server and EF here. Both products should be fixed. Alas, they likely won't be and you can't wait for that either.

Here are the index scripts:

CREATE NONCLUSTERED INDEX IX_Widget_Colour_Size ON dbo.Widget
(
Colour, Size
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX IX_Widget_Size_Colour ON dbo.Widget
(
Size, Colour
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Does the order of EF linq query clauses influence performance?

No, changing of these two where clauses will not affect performance.
Generated SQL will look like this anyway:

WHERE [condition1] AND [condition2]

Besides, you can write conditions, combined with logical operators:

where images.Type.Equals("jpg") && images.ImageSource.Id == 5


Related Topics



Leave a reply



Submit