Entity Framework Hitting 2100 Parameter Limit
The 2100 parameter limit problem does not exist in EF.
I've run a test on the AdventureWorks database (in SQL Express 2008 R2): I'm trying to get all products where ProductCategoryId
is in the range of values (1, 2, 3).
Using LINQ to SQL, the generated WHERE clause looks like this:
WHERE [t0].[ProductCategoryID] IN (@p0, @p1, @p2)
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [1]
-- @p1: Input Int (Size = -1; Prec = 0; Scale = 0) [2]
-- @p2: Input Int (Size = -1; Prec = 0; Scale = 0) [3]
(which leads to the max parameter number issue), whereas with EF 4.0 it looks like this:
WHERE [Extent1].[ProductCategoryID] IN (1,2,3)
Next, I've tested this with EF for a list of 3000 values:
var categoryList = Enumerable.Range(1, 3000).Select(i => (int?)i).ToArray();
using (var aw = new AdventureWorksEntities())
{
    var products = aw.Products
        .Where(p => categoryList.Contains(p.ProductCategoryID))
        .ToList();
}
While this is extremely inefficient, it works and yields the expected result.
However, it is also possible to use the InRange extension provided by Marc Gravell with EF, by also using the LINQKit library, like so:
using (var aw = new AdventureWorksEntities())
{
    var products = aw.Products
        .AsExpandable()
        .InRange(p => p.ProductCategoryID, 1000, categoryList)
        .ToList();
}
(the AsExpandable extension is defined in LINQKit)
This produces the expected result (it executes the query in chunks) and, depending on the number of items in the list and the chunk size, can be a lot more efficient than the non-chunked solution.
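The batching loop behind such a chunked query is simple to write by hand as well. A minimal standalone sketch (it runs against in-memory data instead of EF, so the names and the batch size of 2000 are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // In-memory stand-ins for the database rows and the 3000-value key list.
        var rows = Enumerable.Range(1, 5000).ToList();
        var keys = Enumerable.Range(1, 3000).ToList();

        const int batchSize = 2000;
        var result = new List<int>();

        // One Contains query per batch of at most 2000 keys,
        // so each translated query stays under the 2100-parameter ceiling.
        for (int i = 0; i < keys.Count; i += batchSize)
        {
            var batch = keys.Skip(i).Take(batchSize).ToList();
            result.AddRange(rows.Where(r => batch.Contains(r)));
        }

        Console.WriteLine(result.Count); // 3000 (2000 from batch 1, 1000 from batch 2)
    }
}
```

Against a real DbContext the same loop works unchanged; only the `rows.Where(...)` line becomes a query against the set.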
Avoiding the 2100 parameter limit in LINQ to SQL
As LINQ to SQL can call stored procs, you could:
- have a stored proc that takes an array as input, then puts the values in a temp table to join on
- likewise have it take a delimited string that the stored proc splits
Or upload all the values to a temp table yourself and join on that table.
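A sketch of the string-splitting variant (table, column, and proc names are hypothetical; STRING_SPLIT requires SQL Server 2016+, older versions need a hand-rolled split function):

```sql
-- Hypothetical proc: receives the IDs as one comma-separated string,
-- splits them into a row set, and joins on it instead of
-- parameterizing each value individually.
CREATE PROCEDURE dbo.GetTripsByNumbers
    @TripNumbers nvarchar(max)
AS
BEGIN
    SELECT t.*
    FROM dbo.Trips AS t
    JOIN STRING_SPLIT(@TripNumbers, ',') AS s
        ON t.TripNumber = CAST(s.value AS int);
END
```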
However maybe you should rethink the problem:
- SQL Server can be configured to allow queries against tables in other databases (including Oracle); if you are allowed this, it may be an option for you.
- Could you use some replication system to keep a table of trip numbers updated in DB2?
Working around the LINQ to SQL 2100 max parameter constraint in Where Contains
The unsupported construct is the memberSelector(x) invocation inside the LINQ query expression tree.
To make the LINQ query translatable (and this applies to basically any IQueryable provider), you need to change the parameter type
Func<TSource, TUniverse> memberSelector
to
Expression<Func<TSource, TUniverse>> memberSelector
and then build the
x => universeSlice.Contains(memberSelector(x))
dynamically using the System.Linq.Expressions.Expression
class methods:
public static List<TSource> WhereMemberInUniverse<TSource, TUniverse>(this IQueryable<TSource> source, Expression<Func<TSource, TUniverse>> memberSelector, IEnumerable<TUniverse> universe)
{
    var distinctUniverse = universe.Distinct().ToList();
    int batchSize = 2000;
    var result = new List<TSource>();
    for (int i = 0; i < distinctUniverse.Count; i += batchSize)
    {
        var universeSlice = distinctUniverse.Skip(i).Take(batchSize);
        // x => universeSlice.Contains(memberSelector(x))
        var predicate = Expression.Lambda<Func<TSource, bool>>(
            Expression.Call(
                typeof(Enumerable), "Contains", new Type[] { typeof(TUniverse) },
                Expression.Constant(universeSlice), memberSelector.Body
            ),
            memberSelector.Parameters
        );
        var partialRes = source.Where(predicate);
        result.AddRange(partialRes);
    }
    return result;
}
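The Expression.Call construction can be verified without a database: against LINQ to Objects the same splicing of the selector body into an Enumerable.Contains call compiles and filters as expected (a standalone sketch with an identity selector):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

class Program
{
    static void Main()
    {
        Expression<Func<int, int>> memberSelector = x => x;
        var universeSlice = new List<int> { 2, 4, 6 };

        // Build x => universeSlice.Contains(memberSelector(x)) by splicing
        // the selector body into an Enumerable.Contains call.
        var predicate = Expression.Lambda<Func<int, bool>>(
            Expression.Call(
                typeof(Enumerable), "Contains", new[] { typeof(int) },
                Expression.Constant(universeSlice), memberSelector.Body),
            memberSelector.Parameters);

        var result = Enumerable.Range(1, 10).AsQueryable().Where(predicate).ToList();
        Console.WriteLine(string.Join(",", result)); // 2,4,6
    }
}
```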
The incoming request has too many parameters. The server supports a maximum of 2100 parameters
Simple: once TheTableIDs contains 2100 or more IDs, the query is simply not legal.
Cut the table into blocks of 2000 and then query each block separately, possibly in multiple threads.
Entity Framework SELECT IN not using parameters
I can't say why the EF (Core) designers decided to use constants instead of variables when translating Enumerable.Contains. As @Gert Arnold pointed out in the comments, it could be related to the SQL query parameter count limit.
Interestingly, both EF (6.2) and EF Core (2.1.2) generate IN with parameters when you use the equivalent || expression like:
var values = new int[] { 1, 2, 3 };
var value0 = values[0];
var value1 = values[1];
var value2 = values[2];
var query = context.Things.Where(x =>
    x.Id == value0 ||
    x.Id == value1 ||
    x.Id == value2);
EF6.2 generated query is
SELECT
[Extent1].[Id] AS [Id]
FROM [dbo].[Things] AS [Extent1]
WHERE [Extent1].[Id] IN (@p__linq__0,@p__linq__1,@p__linq__2)
EF Core 2.1 does something similar.
So the solution is to convert the Contains expression to an ||-based expression. It has to be done dynamically, using the Expression class methods. To make it easier to use, this can be encapsulated in a custom extension method which internally uses an ExpressionVisitor to perform the conversion. Something like this:
using System;
using System.Collections;
using System.Linq;
using System.Linq.Expressions;

public static partial class EfQueryableExtensions
{
    public static IQueryable<T> Parameterize<T>(this IQueryable<T> source)
    {
        var expression = new ContainsConverter().Visit(source.Expression);
        if (expression == source.Expression) return source;
        return source.Provider.CreateQuery<T>(expression);
    }

    class ContainsConverter : ExpressionVisitor
    {
        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            if (node.Method.DeclaringType == typeof(Enumerable) &&
                node.Method.Name == nameof(Enumerable.Contains) &&
                node.Arguments.Count == 2 &&
                CanEvaluate(node.Arguments[0]))
            {
                var values = Expression.Lambda<Func<IEnumerable>>(node.Arguments[0]).Compile().Invoke();
                var left = Visit(node.Arguments[1]);
                Expression result = null;
                foreach (var value in values)
                {
                    // var variable = new Tuple<TValue>(value);
                    var variable = Activator.CreateInstance(typeof(Tuple<>).MakeGenericType(left.Type), value);
                    // var right = variable.Item1;
                    var right = Expression.Property(Expression.Constant(variable), nameof(Tuple<int>.Item1));
                    var match = Expression.Equal(left, right);
                    result = result != null ? Expression.OrElse(result, match) : match;
                }
                return result ?? Expression.Constant(false);
            }
            return base.VisitMethodCall(node);
        }

        static bool CanEvaluate(Expression e)
        {
            if (e == null) return true;
            if (e.NodeType == ExpressionType.Convert)
                return CanEvaluate(((UnaryExpression)e).Operand);
            if (e.NodeType == ExpressionType.MemberAccess)
                return CanEvaluate(((MemberExpression)e).Expression);
            return e.NodeType == ExpressionType.Constant;
        }
    }
}
Applying it to the sample query
var values = new int[] { 1, 2, 3 };
var query = context.Things
    .Where(x => values.Contains(x.Id))
    .Parameterize();
produces the desired translation.
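The Tuple wrapping is the key trick here: a ConstantExpression of the value itself translates to a literal, while a member access on a captured object is what EF providers recognize as a parameter, just like a closure variable. The mechanics can be checked standalone:

```csharp
using System;
using System.Linq.Expressions;

class Program
{
    static void Main()
    {
        // Wrapping the value in a Tuple mimics a compiler-generated closure:
        // the provider sees property access on a constant object, not a bare constant.
        object variable = Activator.CreateInstance(typeof(Tuple<>).MakeGenericType(typeof(int)), 42);
        Expression right = Expression.Property(Expression.Constant(variable), nameof(Tuple<int>.Item1));

        Console.WriteLine(right.NodeType);                                         // MemberAccess
        Console.WriteLine(Expression.Lambda<Func<int>>(right).Compile().Invoke()); // 42
    }
}
```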
Why is .Contains slow? Most efficient way to get multiple entities by primary key?
UPDATE: With the addition of InExpression in EF6, the performance of processing Enumerable.Contains improved dramatically. The analysis in this answer is great but largely obsolete since 2013.
Using Contains in Entity Framework is actually very slow. It's true that it translates into an IN clause in SQL and that the SQL query itself executes fast. But the problem and the performance bottleneck is in the translation from your LINQ query into SQL. The expression tree which is created is expanded into a long chain of OR concatenations because there is no native expression which represents an IN. When the SQL is created, this expression of many ORs is recognized and collapsed back into the SQL IN clause.
This does not mean that using Contains is worse than issuing one query per element in your ids collection (your first option). It's probably still better, at least for not too large collections. But for large collections it is really bad. I remember testing a Contains query some time ago with about 12,000 elements: it worked, but took around a minute even though the query executed in SQL in less than a second.
It might be worth testing the performance of a combination of multiple roundtrips to the database with a smaller number of elements in the Contains expression per roundtrip.
This approach, and also the limitations of using Contains with Entity Framework, is shown and explained here:
Why does the Contains() operator degrade Entity Framework's performance so dramatically?
It's possible that a raw SQL command will perform best in this situation, which would mean that you call dbContext.Database.SqlQuery<Image>(sqlString) or dbContext.Images.SqlQuery(sqlString), where sqlString is the SQL shown in @Rune's answer.
Edit
Here are some measurements:
I have done this on a table with 550000 records and 11 columns (IDs start from 1 without gaps) and picked randomly 20000 ids:
using (var context = new MyDbContext())
{
    Random rand = new Random();
    var ids = new List<int>();
    for (int i = 0; i < 20000; i++)
        ids.Add(rand.Next(550000));

    Stopwatch watch = new Stopwatch();
    watch.Start();

    // here are the code snippets from below

    watch.Stop();
    var msec = watch.ElapsedMilliseconds;
}
Test 1
var result = context.Set<MyEntity>()
    .Where(e => ids.Contains(e.ID))
    .ToList();
Result -> msec = 85.5 sec
Test 2
var result = context.Set<MyEntity>().AsNoTracking()
    .Where(e => ids.Contains(e.ID))
    .ToList();
Result -> msec = 84.5 sec
This tiny effect of AsNoTracking is very unusual. It indicates that the bottleneck is not object materialization (and not SQL, as shown below).
For both tests it can be seen in SQL Profiler that the SQL query arrives at the database very late. (I didn't measure exactly but it was later than 70 seconds.) Obviously the translation of this LINQ query into SQL is very expensive.
Test 3
var values = new StringBuilder();
values.AppendFormat("{0}", ids[0]);
for (int i = 1; i < ids.Count; i++)
values.AppendFormat(", {0}", ids[i]);
var sql = string.Format(
"SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
values);
var result = context.Set<MyEntity>().SqlQuery(sql).ToList();
Result -> msec = 5.1 sec
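As an aside, the StringBuilder concatenation in Test 3 can be replaced with string.Join, producing the same SQL string with less code (standalone sketch, table name as in the test above):

```csharp
using System;

class Program
{
    static void Main()
    {
        var ids = new[] { 1, 2, 3 };

        // string.Join renders the comma-separated ID list in one call.
        var sql = string.Format(
            "SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
            string.Join(", ", ids));

        Console.WriteLine(sql);
        // SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN (1, 2, 3)
    }
}
```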
Test 4
// same as Test 3 but this time including AsNoTracking
var result = context.Set<MyEntity>().SqlQuery(sql).AsNoTracking().ToList();
Result -> msec = 3.8 sec
This time the effect of disabling tracking is more noticeable.
Test 5
// same as Test 3 but this time using Database.SqlQuery
var result = context.Database.SqlQuery<MyEntity>(sql).ToList();
Result -> msec = 3.7 sec
My understanding is that context.Database.SqlQuery<MyEntity>(sql) is the same as context.Set<MyEntity>().SqlQuery(sql).AsNoTracking(), so no difference was expected between Test 4 and Test 5.
(The length of the result sets was not always the same due to possible duplicates after the random id selection but it was always between 19600 and 19640 elements.)
Edit 2
Test 6
Even 20000 roundtrips to the database are faster than using Contains:
var result = new List<MyEntity>();
foreach (var id in ids)
    result.Add(context.Set<MyEntity>().SingleOrDefault(e => e.ID == id));
Result -> msec = 73.6 sec
Note that I have used SingleOrDefault instead of Find. Using the same code with Find is very slow (I cancelled the test after several minutes) because Find calls DetectChanges internally. Disabling auto change detection (context.Configuration.AutoDetectChangesEnabled = false) leads to roughly the same performance as SingleOrDefault. Using AsNoTracking reduces the time by one or two seconds.
Tests were done with database client (console app) and database server on the same machine. The last result might get significantly worse with a "remote" database due to the many roundtrips.