SQL - Filtering Large Tables with Joins - Best Practices

Because you are using INNER JOINs, the WHERE-versus-ON debate comes down to taste and style. Personally, I like to keep the links between the two tables (e.g. the foreign key relationship) in the ON clause, and actual filters against data in the WHERE clause.

SQL Server will parse both forms into the same internal tree, and will therefore build identical query execution plans.

If you were using [LEFT/RIGHT] OUTER JOINs instead, it would make a world of difference: not only would the performance probably differ, but very likely the results would too.
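To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are hypothetical, and SQLite stands in for SQL Server only to show result semantics (it says nothing about SQL Server's plans). The same predicate is placed either in the ON clause or in the WHERE clause, first with an INNER JOIN and then with a LEFT JOIN.

```python
import sqlite3

# Hypothetical sample data: three customers, only Ann has an open order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 'open'), (11, 1, 'closed'), (12, 2, 'closed');
""")

def run(join_type, filter_in_on):
    """Run the same query with the status filter in ON (AND) or in WHERE."""
    glue = "AND" if filter_in_on else "WHERE"
    sql = f"""
        SELECT c.name, o.id
        FROM customers c
        {join_type} JOIN orders o ON o.customer_id = c.id
        {glue} o.status = 'open'
        ORDER BY c.id, o.id
    """
    return conn.execute(sql).fetchall()

inner_on = run("INNER", True)      # filter in ON
inner_where = run("INNER", False)  # filter in WHERE
left_on = run("LEFT", True)        # filter in ON: every customer is kept
left_where = run("LEFT", False)    # filter in WHERE: unmatched customers are dropped

print(inner_on == inner_where)  # the two INNER JOIN forms agree
print(left_on)
print(left_where)
```

With the INNER JOIN, both placements return the same single row. With the LEFT JOIN, the ON version keeps every customer and fills the order columns with NULL where nothing matched, while the WHERE version discards those rows after the join.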


To answer your other questions:

When is it best to filter my data?

  1. In the WHERE clause of the SQL.
  2. Create a temp table with the specific data and only then join it.
  3. Add the predicate to the first inner join ON clause.
  4. Some other idea.

Options 1 and 3 are equivalent: for an INNER JOIN, a predicate in the WHERE clause and the same predicate in the ON clause are treated identically. For 3, the "first inner join" has no relevance. In a multi-table INNER JOIN scenario, it really doesn't matter which table goes first in the query, as the query optimizer will reorder the joins as it sees fit.

Using a temp table is completely unnecessary and won't help, because you have to extract the relevant portion anyway, which is exactly what a JOIN does. Moreover, if you have a good index on the JOIN conditions or WHERE filter, the index will be used to visit only the relevant data without looking at the rest of the table(s).
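As a rough illustration of the index point, the sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The table `big_table`, the index `idx_big_table_region`, and the data are all hypothetical, and SQLite's planner is not SQL Server's, so treat this only as a demonstration of the general idea that an indexed filter lets the engine seek to the matching rows instead of scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    CREATE INDEX idx_big_table_region ON big_table (region);
""")

# Hypothetical data: 10,000 rows, of which 1 in 100 is in the 'north' region.
conn.executemany(
    "INSERT INTO big_table (region, amount) VALUES (?, ?)",
    [("north" if i % 100 == 0 else "south", float(i)) for i in range(10_000)],
)

# Ask SQLite how it would execute the filtered query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, amount FROM big_table WHERE region = 'north'"
).fetchall()

# The last column of each plan row is the human-readable detail text.
plan_text = " | ".join(row[-1] for row in plan_rows)
print(plan_text)
```

The exact wording of the plan varies between SQLite versions, but it should name the index rather than report a full table scan. In SQL Server, the equivalent check would be to look at the actual execution plan for an index seek.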

Querying large table with filter vs small table in database - any performance gain?

For almost any database, querying the smaller sample table will be noticeably faster, because reading the records requires loading fewer data pages.

In addition, if the base table is being updated, then a "snapshot" is isolated from the page, table, and row locks that occur on the main table. This is good from a performance perspective, but it means that the two versions can get out of sync, which may be bad.

And, from a querying perspective, the statistics on the sample would be more accurate. This helps the optimizer choose the best query plans.

I can think of two cases where performance might not improve significantly. The first is if your database supports clustered indexes and the rows that you want are defined by a range of index keys (or a single key). These will be "adjacent", so the clustered index would scan about the same number of pages. There is a slight overhead for the actual index structure.

Similarly, if your records were so large that there was one record per data page, then the advantage of a second table would be less. It would eliminate the index access overhead, but not reduce the number of reads.

None of these considerations say whether or not you should use a separate table. You should test in your environment. The overhead of managing a separate table (and there is a cost to creating and deleting it both in terms of performance and application complexity) may outweigh small performance gains.

Which SQL query is faster? Filter on Join criteria or Where clause?

Performance-wise, they are the same (and produce the same plans).

Logically, you should write the query so that it still makes sense if you replace INNER JOIN with a LEFT JOIN.

In your case, this will look like this:

SELECT  *
FROM    TableA a
LEFT JOIN
        TableXRef x
ON      x.TableAID = a.ID
        AND a.ID = 1
LEFT JOIN
        TableB b
ON      x.TableBID = b.ID

or this:

SELECT  *
FROM    TableA a
LEFT JOIN
        TableXRef x
ON      x.TableAID = a.ID
LEFT JOIN
        TableB b
ON      b.ID = x.TableBID
WHERE   a.ID = 1

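The former query still returns every row of TableA; it just finds no actual matches for any a.ID other than 1, so those rows come back with NULLs in the x and b columns. The latter query returns only the rows for a.ID = 1, so the latter syntax (with WHERE) is logically more consistent.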
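The two LEFT JOIN queries above can be compared on a small data set. This is a sketch using Python's sqlite3 module. The table and column names come from the queries, while the sample rows are made up for illustration, and an explicit column list replaces `SELECT *` to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableXRef (TableAID INTEGER, TableBID INTEGER);
    -- Hypothetical rows: A1 links to B100, A2 links to B200.
    INSERT INTO TableA VALUES (1), (2);
    INSERT INTO TableB VALUES (100), (200);
    INSERT INTO TableXRef VALUES (1, 100), (2, 200);
""")

# Filter inside the first ON clause: it only limits which xref rows may match.
filter_in_on = conn.execute("""
    SELECT a.ID, x.TableBID, b.ID
    FROM TableA a
    LEFT JOIN TableXRef x ON x.TableAID = a.ID AND a.ID = 1
    LEFT JOIN TableB b ON x.TableBID = b.ID
    ORDER BY a.ID
""").fetchall()

# Filter in WHERE: it restricts the final result to a.ID = 1.
filter_in_where = conn.execute("""
    SELECT a.ID, x.TableBID, b.ID
    FROM TableA a
    LEFT JOIN TableXRef x ON x.TableAID = a.ID
    LEFT JOIN TableB b ON b.ID = x.TableBID
    WHERE a.ID = 1
    ORDER BY a.ID
""").fetchall()

print(filter_in_on)     # includes a row for a.ID = 2 with NULLs
print(filter_in_where)  # only the a.ID = 1 row
```

The ON version keeps the row for a.ID = 2 even though it has a cross-reference entry, because the extra predicate prevents that entry from matching. This is usually not what someone filtering on a.ID = 1 intends.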

SQL Filter criteria in join criteria or where clause which is more efficient

I wouldn't use performance as the deciding factor here. Quite honestly, I don't think there's any measurable performance difference between those two cases.

I would always use case #2 - why? Because in my opinion, you should only put the actual criteria that establish the JOIN between the two tables into the JOIN clause - everything else belongs in the WHERE clause.

It's just a matter of keeping things clean and putting things where they belong, IMO.

Obviously, there are cases with LEFT OUTER JOINs where the placement of the criteria does make a difference in terms of what results get returned - those cases would be excluded from my recommendation, of course.

Marc

Which is better - multiple joins to the same table or filtering in a case?

The two are not exactly equivalent.

Presumably, you have more than one row in ufd for each master key. So, the equivalent for the first query would be:

select mt.keyid,
       max(case when ufd.table_code = 'case' then ufd.user_field_data_01 end),
       max(case when ufd.table_code = 'appe' then ufd.user_field_data_01 end)
from MainTable mt left join
     ufd
     on ufd.keyid = mt.keyid
where ufd.table_code in ('case', 'appe')
group by mt.keyid;

(Well, I added the key.)

I assume that this is the actual result set that you want.

Which is better? I typically go the conditional aggregation route, because it is easier to extend for more columns and more complex logic. Additional columns add very little additional overhead.

From a performance perspective, you need to test. But with the right indexes, I wouldn't be surprised if the second method were better -- for two columns. With additional columns or more complex logic, the conditional aggregation method is more consistent performance-wise.
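For comparison, here is a small sketch of both approaches using Python's sqlite3 module. The conditional aggregation query is the one shown above, with column aliases added. The multiple-join query is only an assumption about what the alternative looked like, since the question's original queries are not reproduced here, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MainTable (keyid INTEGER PRIMARY KEY);
    CREATE TABLE ufd (keyid INTEGER, table_code TEXT, user_field_data_01 TEXT);
    INSERT INTO MainTable VALUES (1), (2), (3);
    -- Hypothetical data: key 3 has no 'case' or 'appe' row at all.
    INSERT INTO ufd VALUES
        (1, 'case', 'c1'), (1, 'appe', 'a1'),
        (2, 'case', 'c2'),
        (3, 'othr', 'x3');
""")

# Conditional aggregation: one join, one pivoted column per table_code.
conditional = conn.execute("""
    SELECT mt.keyid,
           MAX(CASE WHEN ufd.table_code = 'case' THEN ufd.user_field_data_01 END) AS case_data,
           MAX(CASE WHEN ufd.table_code = 'appe' THEN ufd.user_field_data_01 END) AS appe_data
    FROM MainTable mt
    LEFT JOIN ufd ON ufd.keyid = mt.keyid
    WHERE ufd.table_code IN ('case', 'appe')
    GROUP BY mt.keyid
    ORDER BY mt.keyid
""").fetchall()

# Assumed alternative: join ufd once per table_code, filtering in each ON clause.
two_joins = conn.execute("""
    SELECT mt.keyid, c.user_field_data_01, a.user_field_data_01
    FROM MainTable mt
    LEFT JOIN ufd c ON c.keyid = mt.keyid AND c.table_code = 'case'
    LEFT JOIN ufd a ON a.keyid = mt.keyid AND a.table_code = 'appe'
    ORDER BY mt.keyid
""").fetchall()

print(conditional)
print(two_joins)
```

On this data the two forms agree for keys 1 and 2. Key 3 appears only in the multiple-join result, because the WHERE filter on ufd.table_code removes keys that have no 'case' or 'appe' row. Moving that filter into the ON clause of the conditional aggregation query would keep such keys with NULL values.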


