Entity Framework Hitting 2100 Parameter Limit
The 2100 parameter limit problem does not exist in EF.
I've run a test on the AdventureWorks database (in SQL Express 2008 R2): I'm trying to get all products where ProductCategoryId
is in the range of values (1, 2, 3).
Using LINQ to SQL, the generated WHERE clause looks like this:
WHERE [t0].[ProductCategoryID] IN (@p0, @p1, @p2)
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [1]
-- @p1: Input Int (Size = -1; Prec = 0; Scale = 0) [2]
-- @p2: Input Int (Size = -1; Prec = 0; Scale = 0) [3]
(which leads to the max parameter number issue), whereas with EF 4.0 it looks like this:
WHERE [Extent1].[ProductCategoryID] IN (1,2,3)
Next, I've tested this with EF for a list of 3000 values:
var categoryList = Enumerable.Range(1, 3000).Select(i => (int?)i).ToArray();
using (var aw = new AdventureWorksEntities())
{
    var products = aw.Products
        .Where(p => categoryList.Contains(p.ProductCategoryID))
        .ToList();
}
While this is extremely inefficient, it works and yields the expected result.
However, it is also possible to use the InRange extension provided by Marc Gravell with EF, by also using the LINQKit library, like so:
using (var aw = new AdventureWorksEntities())
{
    var products = aw.Products
        .AsExpandable()
        .InRange(p => p.ProductCategoryID, 1000, categoryList)
        .ToList();
}
(the AsExpandable extension is defined in LINQKit)
This produces the expected result (it executes the query in chunks) and, depending on the number of items in the list and the chunk size, can be a lot more efficient than the non-chunked solution.
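The batching loop behind such a chunked query is simple to write by hand as well. A minimal standalone sketch (it runs against in-memory data instead of EF, so the names and the batch size of 2000 are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // In-memory stand-ins for the database rows and the 3000-value key list.
        var rows = Enumerable.Range(1, 5000).ToList();
        var keys = Enumerable.Range(1, 3000).ToList();

        const int batchSize = 2000;
        var result = new List<int>();

        // One Contains query per batch of at most 2000 keys,
        // so each translated query stays under the 2100-parameter ceiling.
        for (int i = 0; i < keys.Count; i += batchSize)
        {
            var batch = keys.Skip(i).Take(batchSize).ToList();
            result.AddRange(rows.Where(r => batch.Contains(r)));
        }

        Console.WriteLine(result.Count); // 3000 (2000 from batch 1, 1000 from batch 2)
    }
}
```

Against a real DbContext the same loop works unchanged; only the `rows.Where(...)` line becomes a query against the set.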
Avoiding the 2100 parameter limit in LINQ to SQL
As LINQ to SQL can call stored procs, you could:
- have a stored proc that takes an array as input, then puts the values in a temp table to join on
- likewise have it take a delimited string that the stored proc splits
Or upload all the values to a temp table yourself and join on that table.
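A sketch of the string-splitting variant (table, column, and proc names are hypothetical; STRING_SPLIT requires SQL Server 2016+, older versions need a hand-rolled split function):

```sql
-- Hypothetical proc: receives the IDs as one comma-separated string,
-- splits them into a row set, and joins on it instead of
-- parameterizing each value individually.
CREATE PROCEDURE dbo.GetTripsByNumbers
    @TripNumbers nvarchar(max)
AS
BEGIN
    SELECT t.*
    FROM dbo.Trips AS t
    JOIN STRING_SPLIT(@TripNumbers, ',') AS s
        ON t.TripNumber = CAST(s.value AS int);
END
```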
However maybe you should rethink the problem:
- SQL Server can be configured to allow queries against tables in other databases (including Oracle); if you are allowed this, it may be an option for you.
- Could you use some replication system to keep a table of trip numbers updated in DB2?
Working around the LINQ to SQL 2100 max parameter constraint in Where Contains
The unsupported construct is the memberSelector(x) invocation inside the LINQ query expression tree.
To make the LINQ query translatable (and this applies to basically any IQueryable provider), you need to change the parameter type
Func<TSource, TUniverse> memberSelector
to
Expression<Func<TSource, TUniverse>> memberSelector
and then build the
x => universeSlice.Contains(memberSelector(x))
dynamically using the System.Linq.Expressions.Expression
class methods:
public static List<TSource> WhereMemberInUniverse<TSource, TUniverse>(this IQueryable<TSource> source, Expression<Func<TSource, TUniverse>> memberSelector, IEnumerable<TUniverse> universe)
{
    var distinctUniverse = universe.Distinct().ToList();
    int batchSize = 2000;
    var result = new List<TSource>();
    for (int i = 0; i < distinctUniverse.Count; i += batchSize)
    {
        var universeSlice = distinctUniverse.Skip(i).Take(batchSize);
        // x => universeSlice.Contains(memberSelector(x))
        var predicate = Expression.Lambda<Func<TSource, bool>>(
            Expression.Call(
                typeof(Enumerable), "Contains", new Type[] { typeof(TUniverse) },
                Expression.Constant(universeSlice), memberSelector.Body
            ),
            memberSelector.Parameters
        );
        var partialRes = source.Where(predicate);
        result.AddRange(partialRes);
    }
    return result;
}
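The Expression.Call construction can be verified without a database: against LINQ to Objects the same splicing of the selector body into an Enumerable.Contains call compiles and filters as expected (a standalone sketch with an identity selector):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

class Program
{
    static void Main()
    {
        Expression<Func<int, int>> memberSelector = x => x;
        var universeSlice = new List<int> { 2, 4, 6 };

        // Build x => universeSlice.Contains(memberSelector(x)) by splicing
        // the selector body into an Enumerable.Contains call.
        var predicate = Expression.Lambda<Func<int, bool>>(
            Expression.Call(
                typeof(Enumerable), "Contains", new[] { typeof(int) },
                Expression.Constant(universeSlice), memberSelector.Body),
            memberSelector.Parameters);

        var result = Enumerable.Range(1, 10).AsQueryable().Where(predicate).ToList();
        Console.WriteLine(string.Join(",", result)); // 2,4,6
    }
}
```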
The incoming request has too many parameters. The server supports a maximum of 2100 parameters
Simple: once TheTableIDs contains 2100 or more IDs, the query is simply not legal.
Cut the table into blocks of 2000 and then query each block separately, possibly in multiple threads.
Entity Framework SELECT IN not using parameters
I can't say why the EF (Core) designers decided to use constants instead of variables when translating Enumerable.Contains. As @Gert Arnold pointed out in the comments, it could be related to the SQL query parameter count limit.
Interestingly, both EF (6.2) and EF Core (2.1.2) generate IN with parameters when you use the equivalent || expression like:
var values = new int[] { 1, 2, 3 };
var value0 = values[0];
var value1 = values[1];
var value2 = values[2];
var query = context.Things.Where(x =>
    x.Id == value0 ||
    x.Id == value1 ||
    x.Id == value2);
EF6.2 generated query is
SELECT
[Extent1].[Id] AS [Id]
FROM [dbo].[Things] AS [Extent1]
WHERE [Extent1].[Id] IN (@p__linq__0,@p__linq__1,@p__linq__2)
EF Core 2.1 does something similar.
So the solution is to convert the Contains expression to an ||-based expression. It has to be done dynamically, using the Expression class methods. To make it easier to use, this can be encapsulated in a custom extension method which internally uses an ExpressionVisitor to perform the conversion. Something like this:
using System;
using System.Collections;
using System.Linq;
using System.Linq.Expressions;

public static partial class EfQueryableExtensions
{
    public static IQueryable<T> Parameterize<T>(this IQueryable<T> source)
    {
        var expression = new ContainsConverter().Visit(source.Expression);
        if (expression == source.Expression) return source;
        return source.Provider.CreateQuery<T>(expression);
    }

    class ContainsConverter : ExpressionVisitor
    {
        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            if (node.Method.DeclaringType == typeof(Enumerable) &&
                node.Method.Name == nameof(Enumerable.Contains) &&
                node.Arguments.Count == 2 &&
                CanEvaluate(node.Arguments[0]))
            {
                var values = Expression.Lambda<Func<IEnumerable>>(node.Arguments[0]).Compile().Invoke();
                var left = Visit(node.Arguments[1]);
                Expression result = null;
                foreach (var value in values)
                {
                    // var variable = new Tuple<TValue>(value);
                    var variable = Activator.CreateInstance(typeof(Tuple<>).MakeGenericType(left.Type), value);
                    // var right = variable.Item1;
                    var right = Expression.Property(Expression.Constant(variable), nameof(Tuple<int>.Item1));
                    var match = Expression.Equal(left, right);
                    result = result != null ? Expression.OrElse(result, match) : match;
                }
                return result ?? Expression.Constant(false);
            }
            return base.VisitMethodCall(node);
        }

        static bool CanEvaluate(Expression e)
        {
            if (e == null) return true;
            if (e.NodeType == ExpressionType.Convert)
                return CanEvaluate(((UnaryExpression)e).Operand);
            if (e.NodeType == ExpressionType.MemberAccess)
                return CanEvaluate(((MemberExpression)e).Expression);
            return e.NodeType == ExpressionType.Constant;
        }
    }
}
Applying it to the sample query
var values = new int[] { 1, 2, 3 };
var query = context.Things
    .Where(x => values.Contains(x.Id))
    .Parameterize();
produces the desired translation.
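The Tuple wrapping is the key trick here: a ConstantExpression of the value itself translates to a literal, while a member access on a captured object is what EF providers recognize as a parameter, just like a closure variable. The mechanics can be checked standalone:

```csharp
using System;
using System.Linq.Expressions;

class Program
{
    static void Main()
    {
        // Wrapping the value in a Tuple mimics a compiler-generated closure:
        // the provider sees property access on a constant object, not a bare constant.
        object variable = Activator.CreateInstance(typeof(Tuple<>).MakeGenericType(typeof(int)), 42);
        Expression right = Expression.Property(Expression.Constant(variable), nameof(Tuple<int>.Item1));

        Console.WriteLine(right.NodeType);                                         // MemberAccess
        Console.WriteLine(Expression.Lambda<Func<int>>(right).Compile().Invoke()); // 42
    }
}
```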
Why is .Contains slow? Most efficient way to get multiple entities by primary key?
UPDATE: With the addition of InExpression in EF6, the performance of processing Enumerable.Contains improved dramatically. The analysis in this answer is great but largely obsolete since 2013.
Using Contains in Entity Framework is actually very slow. It's true that it translates into an IN clause in SQL and that the SQL query itself executes fast. But the problem and the performance bottleneck is in the translation from your LINQ query into SQL. The expression tree which is created is expanded into a long chain of OR concatenations because there is no native expression which represents an IN. When the SQL is created, this expression of many ORs is recognized and collapsed back into the SQL IN clause.
This does not mean that using Contains is worse than issuing one query per element in your ids collection (your first option). It's probably still better, at least for not too large collections. But for large collections it is really bad. I remember testing a Contains query some time ago with about 12,000 elements: it worked, but took around a minute even though the query executed in SQL in less than a second.
It might be worth testing the performance of a combination of multiple roundtrips to the database with a smaller number of elements in the Contains expression per roundtrip.
This approach, and also the limitations of using Contains with Entity Framework, is shown and explained here:
Why does the Contains() operator degrade Entity Framework's performance so dramatically?
It's possible that a raw SQL command will perform best in this situation, which would mean that you call dbContext.Database.SqlQuery<Image>(sqlString) or dbContext.Images.SqlQuery(sqlString), where sqlString is the SQL shown in @Rune's answer.
Edit
Here are some measurements:
I have done this on a table with 550000 records and 11 columns (IDs start from 1 without gaps) and picked randomly 20000 ids:
using (var context = new MyDbContext())
{
    Random rand = new Random();
    var ids = new List<int>();
    for (int i = 0; i < 20000; i++)
        ids.Add(rand.Next(550000));

    Stopwatch watch = new Stopwatch();
    watch.Start();

    // here are the code snippets from below

    watch.Stop();
    var msec = watch.ElapsedMilliseconds;
}
Test 1
var result = context.Set<MyEntity>()
    .Where(e => ids.Contains(e.ID))
    .ToList();
Result -> msec = 85.5 sec
Test 2
var result = context.Set<MyEntity>().AsNoTracking()
    .Where(e => ids.Contains(e.ID))
    .ToList();
Result -> msec = 84.5 sec
This tiny effect of AsNoTracking is very unusual. It indicates that the bottleneck is not object materialization (and not SQL, as shown below).
For both tests it can be seen in SQL Profiler that the SQL query arrives at the database very late. (I didn't measure exactly but it was later than 70 seconds.) Obviously the translation of this LINQ query into SQL is very expensive.
Test 3
var values = new StringBuilder();
values.AppendFormat("{0}", ids[0]);
for (int i = 1; i < ids.Count; i++)
values.AppendFormat(", {0}", ids[i]);
var sql = string.Format(
"SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
values);
var result = context.Set<MyEntity>().SqlQuery(sql).ToList();
Result -> msec = 5.1 sec
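As an aside, the StringBuilder concatenation in Test 3 can be replaced with string.Join, producing the same SQL string with less code (standalone sketch, table name as in the test above):

```csharp
using System;

class Program
{
    static void Main()
    {
        var ids = new[] { 1, 2, 3 };

        // string.Join renders the comma-separated ID list in one call.
        var sql = string.Format(
            "SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
            string.Join(", ", ids));

        Console.WriteLine(sql);
        // SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN (1, 2, 3)
    }
}
```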
Test 4
// same as Test 3 but this time including AsNoTracking
var result = context.Set<MyEntity>().SqlQuery(sql).AsNoTracking().ToList();
Result -> msec = 3.8 sec
This time the effect of disabling tracking is more noticeable.
Test 5
// same as Test 3 but this time using Database.SqlQuery
var result = context.Database.SqlQuery<MyEntity>(sql).ToList();
Result -> msec = 3.7 sec
My understanding is that context.Database.SqlQuery<MyEntity>(sql) is the same as context.Set<MyEntity>().SqlQuery(sql).AsNoTracking(), so no difference was expected between Test 4 and Test 5.
(The length of the result sets was not always the same due to possible duplicates after the random id selection but it was always between 19600 and 19640 elements.)
Edit 2
Test 6
Even 20000 roundtrips to the database are faster than using Contains:
var result = new List<MyEntity>();
foreach (var id in ids)
    result.Add(context.Set<MyEntity>().SingleOrDefault(e => e.ID == id));
Result -> msec = 73.6 sec
Note that I have used SingleOrDefault instead of Find. Using the same code with Find is very slow (I cancelled the test after several minutes) because Find calls DetectChanges internally. Disabling auto change detection (context.Configuration.AutoDetectChangesEnabled = false) leads to roughly the same performance as SingleOrDefault. Using AsNoTracking reduces the time by one or two seconds.
Tests were done with database client (console app) and database server on the same machine. The last result might get significantly worse with a "remote" database due to the many roundtrips.