What Guarantees Are There on the Run-Time Complexity (Big-O) of Linq Methods

What guarantees are there on the run-time complexity (Big-O) of LINQ methods?

There are very, very few guarantees, but there are a few optimizations:

  • Extension methods that use indexed access, such as ElementAt, Skip, Last or LastOrDefault, will check to see whether or not the underlying type implements IList<T>, so that you get O(1) access instead of O(N).

  • The Count method checks for an ICollection implementation, so that this operation is O(1) instead of O(N).

  • Distinct, GroupBy Join, and I believe also the set-aggregation methods (Union, Intersect and Except) use hashing, so they should be close to O(N) instead of O(N²).

  • Contains checks for an ICollection implementation, so it may be O(1) if the underlying collection is also O(1), such as a HashSet<T>, but this is depends on the actual data structure and is not guaranteed. Hash sets override the Contains method, that's why they are O(1).

  • OrderBy methods use a stable quicksort, so they're O(N log N) average case.

I think that covers most if not all of the built-in extension methods. There really are very few performance guarantees; Linq itself will try to take advantage of efficient data structures but it isn't a free pass to write potentially inefficient code.

Improve the time complexity of current Linq queries

You can use an inner join to do this:

var dumpingRakeSnapshots       = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");

var matches =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
select r;

int count = matches.Count(); // Here's the answer.

Note that (as Ivan Stoev points out) this only works if RakeCode is the primary key of RakeSnapshots.

If it is not, you will have to use a grouped join.

Here's the Linq query syntax version that you should use in that case, but note that this is exactly the same as Ivan's answer (only in Linq query form):

var matches =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
select gj;

For completeness, here's a compilable console app that demonstrates the different results you'll get if RakeCode and ProductCode are not primary keys:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp1
{
class RakeSnapshot
{
public string StatusCode;
public string RakeCode;
}

class ProductMovement
{
public string Status;
public string ProductCode;
}

sealed class Program
{
void run()
{
var rakeSnapshots = new List<RakeSnapshot>
{
new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
new RakeSnapshot {StatusCode = "Dumping", RakeCode = "2"}
};

var productMovements = new List<ProductMovement>
{
new ProductMovement {Status = "InProgress", ProductCode = "1"},
new ProductMovement {Status = "InProgress", ProductCode = "2"},
new ProductMovement {Status = "InProgress", ProductCode = "2"}
};

var dumpingRakeSnapshots = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");

// Inner join.

var matches1 =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
select r;

Console.WriteLine(matches1.Count());

// Grouped join.

var matches2 =
from r in dumpingRakeSnapshots
join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
select gj;

Console.WriteLine(matches2.Count());

// OP's code.

var resultCount =
productMovements
.Count(x => rakeSnapshots
.Where(r => r.StatusCode == "Dumping")
.Any(y => y.RakeCode == x.ProductCode && x.Status == "InProgress"));

Console.WriteLine(resultCount);
}

static void Main(string[] args)
{
new Program().run();
}
}
}

What is the time complexity of Linq OrderBy().ThenBy() method sequence?

The LINQ OrderBy(...).ThenBy(...)...ThenBy(...) method chain form a single sort operation by multiple sort keys (using multi key comparer).

Thus, regardless of how many ThenBy(Descending) methods do you include in the chain, the time complexity of the sort operation stays the typical for QuickSort O(N*logN) average / O(N2) worst case. Of course more keys you add, comparing two objects will take more time, but the complexity of the sort operation would not change.

The Big O of Distinct() method with a Custom IEqualityComparer

There's an equal question here on SO about "What guarantees are there on the run-time complexity (Big-O) of LINQ methods?"

See this section in the answer about distinct:

Distinct, GroupBy Join, and I believe
also the set-aggregation methods
(Union, Intersect and Except) use
hashing, so they should be close to
O(N) instead of O(N²).

What exactly does big Ө notation represent?

It means that the algorithm is both big-O and big-Omega in the given function.

For example, if it is Ө(n), then there is some constant k, such that your function (run-time, whatever), is larger than n*k for sufficiently large n, and some other constant K such that your function is smaller than n*K for sufficiently large n.

In other words, for sufficiently large n, it is sandwiched between two linear functions :

For k < K and n sufficiently large, n*k < f(n) < n*K



Related Topics



Leave a reply



Submit