Limit on the Where Col in (...) Condition

Depending on the database engine you are using, there can be limits on the length of a statement.

SQL Server has a very large limit:

http://msdn.microsoft.com/en-us/library/ms143432.aspx

Oracle, on the other hand, has a limit that is very easy to reach: an IN list of literal values may contain at most 1,000 items (error ORA-01795).

So, for large IN lists, it's better to create a temp table, insert the values, and JOIN against it. It usually works faster as well.
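For example, with SQL Server this could look roughly like the following sketch; the table and column names here are made up for illustration:

CREATE TABLE #id_list (id INT PRIMARY KEY);

INSERT INTO #id_list (id)
VALUES (1), (2), (3);   -- bulk-insert the full list of values here

SELECT t.*
FROM my_table AS t
JOIN #id_list AS f
  ON f.id = t.id;

DROP TABLE #id_list;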

IN clause limitation in Sql Server

Yes, there is a limit, but Microsoft only specifies that it lies "in the thousands":

Explicitly including an extremely large number of values (many thousands of values separated by commas) within the parentheses, in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table, and use a SELECT subquery within an IN clause.
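In other words, the workaround Microsoft describes is to load the list into a table once and then reference that table through a subquery, roughly like this (the table and column names are hypothetical):

INSERT INTO dbo.SearchValues (val)
VALUES ('a'), ('b'), ('c');   -- load the full list here

SELECT *
FROM dbo.MyTable
WHERE MyColumn IN (SELECT val FROM dbo.SearchValues);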

Looking at those errors in detail, we see that this limit is not specific to IN but applies to query complexity in general:

Error 8623:

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

Error 8632:

Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them.

LIMIT SQL within WHERE clause?

No, it's not possible to add a LIMIT clause within a WHERE clause.

It is possible to achieve the resultset you want, but the SQL to do that isn't pretty. It's going to require either a JOIN, a correlated subquery, or an inline view.


If there's an "order" to the rows in _mct_dot, you could use a correlated subquery to count the rows "before" the row you pulled, and keep only the rows that have fewer than five rows before them.

SELECT d.*
  FROM _mct_dot d
  JOIN ( SELECT n.number
              , q.qty
           FROM (SELECT 4 AS `number` UNION ALL SELECT 7 UNION ALL SELECT 13) n
          CROSS
           JOIN (SELECT 3 AS `qty` UNION ALL SELECT 5 UNION ALL SELECT 7) q
       ) p
    ON p.number = d.number
   AND p.qty = d.qty
   AND 5 > ( SELECT SUM(1)
               FROM _mct_dot c
              WHERE c.number = d.number
                AND c.qty = d.qty
                AND c.a_ID < d.a_ID
           )
 ORDER BY ...

The correlated subquery could wind up being executed a LOT of times, so for best performance, you are going to want an index with leading columns of number and qty, and including the a_ID column.

Either:

... ON `_mct_dot` (`number`, `qty`, `a_ID`)

or

... ON `_mct_dot` (`qty`, `number`, `a_ID`)
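As actual DDL, the first option might look something like this; the index name is made up:

CREATE INDEX ix_mct_dot_number_qty_aid
    ON `_mct_dot` (`number`, `qty`, `a_ID`);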

Another option is to use MySQL user variables to emulate a ROW_NUMBER() analytic function, something like this:

SELECT t.*
  FROM ( SELECT d.a_ID
              , IF(d.number = @prev_number AND d.qty = @prev_qty
                  , @rn := @rn + 1
                  , @rn := 1
                ) AS rn
              , @prev_number := d.number
              , @prev_qty := d.qty
           FROM (SELECT @prev_number := NULL, @prev_qty := NULL, @rn := 0) i
          CROSS
           JOIN ( SELECT n.number
                       , q.qty
                    FROM (SELECT 4 AS `number` UNION ALL SELECT 7 UNION ALL SELECT 13) n
                   CROSS
                    JOIN (SELECT 3 AS `qty` UNION ALL SELECT 5 UNION ALL SELECT 7) q
                ) p
           JOIN _mct_dot d
             ON d.number = p.number
            AND d.qty = p.qty
          ORDER BY d.number, d.qty
       ) s
  JOIN _mct_dot t
    ON t.a_ID = s.a_ID
 WHERE s.rn <= 5
 ORDER BY t.number ASC, t.qty ASC

(These queries are desk checked only; I haven't set up a SQL Fiddle demo.)

FOLLOWUP

For the first query, I've just used an inline view (aliased as p) that generates the set of all pairs of number and qty values being requested.

And we can use a JOIN operation to locate all the rows in the _mct_dot table that match each pair.

The tricky part is the correlated subquery. (There are a couple of approaches we could use.) The approach in the query above is to get a "count" of the rows with a matching number and qty, but with an id value less than the id value of the current row, basically finding out how many rows come "before" the current row. And we compare that count to a literal 5, because we want to return only the first 5 rows in each group.


For the second query,

the inline view aliased as i is initializing some MySQL user variables. We don't really care what's returned by the query, except that it returns exactly one row (because we're referencing it in a JOIN operation)... what we're really interested in is getting the variables initialized at the start of the execution. And that happens because MySQL materializes the inline view (derived table), before the outer query that references the view is executed.

The inline view aliased as p gets us the pairs of number,qty that we want to retrieve, and we use a JOIN operation against _mct_dot to get the matching rows.

The "trick" in the inline view aliased as s is the use of the MySQL user variables. We're doing a check of the current values against the values from the previous row... if the number and qty match, then we're in the same "group", so we can increment the row number counter by 1. If either of the values change, then it's a new group, so we reset the row number counter to 1, since the current row is the "first" row in the new group.

We can run the query for the inline view s, and see that we're getting row numbers (rn col) 1, 2, 3, etc. for each group.

Then the outermost query just filters out all the rows that have an rn row number greater than five. Actually, from s, we're returning just the unique identifier for the row; that outermost query is also doing a JOIN operation to retrieve the entire row, based on the unique id.


As I mentioned at the top of my answer, the SQL to do this is not pretty. (It does take a bit of work to unwind what those queries are doing.)

Query to select limit in specific condition

A simple UNION ALL should do it. However, to make sure that you're getting exactly 1000 rows (in case there are more than 1000 rows in total but fewer than 100 of them are @gmail addresses), you can do this:

with u as 
(SELECT email from my_table where email like '%@gmail.%' limit 100)
select * from u
union all
(SELECT email from my_table
where email not like '%@gmail.%'
limit 1000 - (select count(*) from u));

MySQL IN condition limit

No, there isn't; check the manual on the IN() operator:

The number of values in the IN list is only limited by the max_allowed_packet value.
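So in practice, the limit is governed by the packet size. A rough sketch of checking and raising it follows; the 64 MB value is just an example, and SET GLOBAL requires the SUPER or SYSTEM_VARIABLES_ADMIN privilege:

-- Check the current packet limit
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it for new connections (example value only)
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;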

Is there a limit on the number of WHERE conditions in a SELECT statement?

Consider using an IN clause for a query like that - it's more compact and signals your intent better.

SELECT * FROM table WHERE column NOT IN('asd', 'bsd', 'csd', ...);

Another alternative would be to create a table of the values you don't want, and filter them out with a LEFT JOIN.
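A minimal sketch of that pattern; the table and column names are hypothetical:

SELECT t.*
FROM my_table t
LEFT JOIN excluded_values e
       ON e.val = t.col
WHERE e.val IS NULL;   -- keep only the rows with no match in the exclusion table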

How to limit to just one result per condition when looking through multiple OR/IN conditions in the WHERE clause (Postgresql)

Normally, a simple GROUP BY would suffice for this type of problem; however, as you have specified that you want to include ALL of the columns in the result, we can use the ROW_NUMBER() window function to provide a value to filter on.

As a general rule it is important to specify the column to sort on (ORDER BY) for all windowing or paged queries to make the result repeatable.

As no schema has been supplied, I have used Name as the field to sort on for the window; please update that (or the question) with any other field you would like. The PK is a good candidate if you have nothing else to go on.

SELECT * FROM
(
    SELECT *
         , ROW_NUMBER() OVER (PARTITION BY Country ORDER BY Name) AS _rn
    FROM Customers
    WHERE Country IN ('Germany', 'France', 'UK')
) AS ranked
WHERE _rn = 1

The PARTITION BY makes ROW_NUMBER() restart its numbering at 1 for each distinct Country value, so in this case we only select the rows that get a row number (aliased as _rn) of 1.

The WHERE clause could have been in the outer query if you really wanted it there, but ROW_NUMBER() can only be referenced in the SELECT or ORDER BY clauses of a query, so to use it as a filter criterion we are forced to wrap the results in some way.

Conditionally LIMIT in BigQuery

The LIMIT clause works differently in BigQuery. It specifies the maximum number of rows to return in the result, and the count in LIMIT n must be a constant INT64.

You can overcome the limitation on cached result size by:

  • Using filters to limit the result set.
  • Using a LIMIT clause to reduce the result set, especially if you are
    using an ORDER BY clause.

You can see this example:

SELECT
  title
FROM
  `my-project.mydataset.mytable`
ORDER BY
  title DESC
LIMIT
  100

This will only return 100 rows.

The best practice is to use it if you are sorting a very large number of values. You can see this document with examples.

If you want to return all rows from a table, you need to omit the LIMIT clause.

SELECT
  title
FROM
  `my-project.mydataset.mytable`
ORDER BY
  title DESC

This example will return all the rows from the table. It is not recommended to omit LIMIT when your tables are very large, as the query will consume a lot of resources.

One way to optimize resource usage is to use clustered tables. This saves cost and query time. You can see this document for a detailed explanation of how it works.
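As a rough sketch, a clustered copy of the earlier example table could be created like this; clustering on title is only an assumption for illustration:

CREATE TABLE `my-project.mydataset.mytable_clustered`
CLUSTER BY title
AS
SELECT *
FROM `my-project.mydataset.mytable`;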

Limit to number of Items in list for WHERE clause SQL query

Explicitly including an extremely large number of values (many thousands of values separated by commas) within the parentheses, in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table, and use a SELECT subquery within an IN clause.

Error 8623:

The query processor ran out of internal resources and could not
produce a query plan. This is a rare event and only expected for
extremely complex queries or queries that reference a very large
number of tables or partitions. Please simplify the query. If you
believe you have received this message in error, contact Customer
Support Services for more information.

Error 8632:

Internal error: An expression services limit has been reached. Please
look for potentially complex expressions in your query, and try to
simplify them.

Microsoft docs


