Why Does the Following Join Increase the Query Time Significantly

Why does the following join increase the query time significantly?

Rewritten with (recommended) explicit ANSI JOIN syntax:

SELECT COUNT(impression_id), imp.os_id, os.os_desc 
FROM bi.impressions imp
JOIN bi.os_desc os ON os.os_id = imp.os_id
GROUP BY imp.os_id, os.os_desc;

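First of all, your second query might be wrong if more or fewer than exactly one match is found in os_desc for any row in impressions. Rows without a match are dropped by the inner join, and rows with several matches are counted several times.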
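A small reproduction may make the failure mode concrete. This sketch uses an in-memory SQLite database as a stand-in for PostgreSQL (an assumption). The tables are simplified, hypothetical versions of `bi.impressions` and `bi.os_desc` with no uniqueness on `os_desc.os_id`, and the data is made up to include one duplicated and one missing `os_id`.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL schema (assumption).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE impressions (impression_id INTEGER, os_id INTEGER);
    -- No PRIMARY KEY / UNIQUE on os_id here, so duplicate matches are possible.
    CREATE TABLE os_desc (os_id INTEGER, os_desc TEXT);

    INSERT INTO impressions VALUES (1, 1), (2, 1), (3, 2), (4, 3);
    -- os_id 1 appears twice, os_id 3 is missing entirely.
    INSERT INTO os_desc VALUES (1, 'Linux'), (1, 'Linux (dup)'), (2, 'Windows');
""")

# Number of impressions that actually exist.
total = con.execute("SELECT COUNT(*) FROM impressions").fetchone()[0]

# Number of rows the inner join produces.
joined = con.execute("""
    SELECT COUNT(*)
    FROM impressions imp
    JOIN os_desc os ON os.os_id = imp.os_id
""").fetchone()[0]

print(total, joined)  # 4 5
```

The two rows with `os_id` 1 are each counted twice and the row with `os_id` 3 disappears, so the joined count (5) does not match the real number of impressions (4).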

This can be ruled out if a foreign key constraint on os_id guarantees referential integrity and bi.impressions.os_id is also defined NOT NULL. If so, as a first step, simplify to:

SELECT COUNT(*) AS ct, imp.os_id, os.os_desc 
FROM bi.impressions imp
JOIN bi.os_desc os USING (os_id)
GROUP BY imp.os_id, os.os_desc;

count(*) is faster than count(column). The two are equivalent here if impression_id is NOT NULL, because count(column) only counts non-null values. Also, add a column alias for the count.
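The NULL caveat is easy to check directly. In this sketch, SQLite again stands in for PostgreSQL (an assumption), and `impressions` / `impressions_nn` are hypothetical tables. It shows `COUNT(impression_id)` skipping a NULL row, and a NOT NULL constraint rejecting such a row so that both counts must agree.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- impression_id is deliberately nullable in this hypothetical table.
    CREATE TABLE impressions (impression_id INTEGER, os_id INTEGER);
    INSERT INTO impressions VALUES (1, 1), (NULL, 1), (3, 2);
""")

star, col = con.execute(
    "SELECT COUNT(*), COUNT(impression_id) FROM impressions"
).fetchone()
print(star, col)  # 3 2: COUNT(column) skips the NULL row

# With a NOT NULL constraint the NULL row cannot exist, so both counts agree.
con.execute("CREATE TABLE impressions_nn (impression_id INTEGER NOT NULL, os_id INTEGER)")
rejected = False
try:
    con.execute("INSERT INTO impressions_nn VALUES (NULL, 1)")
except sqlite3.IntegrityError:
    rejected = True
print("NULL rejected:", rejected)  # NULL rejected: True
```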

Faster still:

SELECT os_id, os.os_desc, sub.ct
FROM (
   SELECT os_id, COUNT(*) AS ct
   FROM bi.impressions
   GROUP BY 1
   ) sub
JOIN bi.os_desc os USING (os_id);

Aggregate first, join later: the join then processes one row per distinct os_id instead of one row per impression. More here:

  • Aggregate a single column in query with many columns
  • PostgreSQL - order by an array
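The claim that both forms return the same result (given the FK and NOT NULL constraints) can be sanity-checked with a sketch. SQLite stands in for PostgreSQL (an assumption), the `bi` schema is dropped, and the 10,000 impressions are hypothetical data. It runs the join-first and aggregate-first queries and compares their output. It does not measure speed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE os_desc (os_id INTEGER PRIMARY KEY, os_desc TEXT NOT NULL);
    CREATE TABLE impressions (
        impression_id INTEGER PRIMARY KEY,
        os_id INTEGER NOT NULL REFERENCES os_desc (os_id)
    );
    INSERT INTO os_desc VALUES (1, 'Linux'), (2, 'Windows'), (3, 'macOS');
""")
# 10,000 hypothetical impressions spread over the three OS ids.
con.executemany(
    "INSERT INTO impressions (os_id) VALUES (?)",
    [(i % 3 + 1,) for i in range(10_000)],
)

# Join first, then aggregate: the join sees every impression row.
join_first = con.execute("""
    SELECT imp.os_id, os.os_desc, COUNT(*) AS ct
    FROM impressions imp
    JOIN os_desc os USING (os_id)
    GROUP BY imp.os_id, os.os_desc
    ORDER BY imp.os_id
""").fetchall()

# Aggregate first, then join: the join sees one row per os_id.
agg_first = con.execute("""
    SELECT os_id, os.os_desc, sub.ct
    FROM (
        SELECT os_id, COUNT(*) AS ct
        FROM impressions
        GROUP BY 1
    ) sub
    JOIN os_desc os USING (os_id)
    ORDER BY os_id
""").fetchall()

print(join_first == agg_first)  # True
```

Only the second query's join input shrinks from 10,000 rows to 3. That reduction is where the speedup on a large table is expected to come from.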

Why does LEFT JOIN increase query time so much?

The 'small' left join is actually doing a lot of extra work. SQL Server has to go back to TABLE_Additional for each row produced by your inner join between TABLE_Accounts_History and TABLE_For_Filtering. You can help SQL Server speed this up with indexing in a few ways. You could:

1) Ensure TABLE_Accounts_History has an index on the Foreign Key H.[ACCOUNTSYS]

2) If TABLE_Additional will always be accessed by AccountSys, i.e. you will be requesting AccountSys in ordered groups, you could create a clustered index on TABLE_Additional.AccountSys (in other words, physically order the table on disk by AccountSys).

3) You could also ensure there is a foreign key index on TABLE_Accounts_History.
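One way to confirm that an index is actually used for the per-row lookup into TABLE_Additional is to inspect the query plan. In SQL Server that would be the execution plan. This sketch uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in (an assumption). The column names `HistoryId` and `Extra` and the index name `idx_additional_accountsys` are hypothetical. SQLite has no `CREATE CLUSTERED INDEX`, so only an ordinary index is shown.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical minimal versions of the tables from the question.
    CREATE TABLE TABLE_Accounts_History (HistoryId INTEGER PRIMARY KEY, ACCOUNTSYS INTEGER);
    CREATE TABLE TABLE_Additional (AccountSys INTEGER, Extra TEXT);
    -- Index on the column used to look up TABLE_Additional from the join.
    CREATE INDEX idx_additional_accountsys ON TABLE_Additional (AccountSys);
""")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT H.HistoryId, A.Extra
    FROM TABLE_Accounts_History H
    LEFT JOIN TABLE_Additional A ON A.AccountSys = H.ACCOUNTSYS
""").fetchall()

# Each plan row is (id, parent, notused, detail); keep just the detail text.
details = [row[3] for row in plan]
for d in details:
    print(d)
```

The plan should show the outer table being scanned and TABLE_Additional being searched via `idx_additional_accountsys`, rather than scanned once per outer row. The exact wording of the plan text varies between SQLite versions.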

Why does SELECT * INTO x FROM a JOIN b take significantly more time than the combined time of SELECT COUNT(*) FROM a JOIN b and SELECT * INTO y FROM x?

It is the I/O. SELECT * INTO has to carry every column of every joined row through the JOIN and write it out, whereas SELECT COUNT(*) only has to produce the row count. You are not taking this processing time into account.

Given the work the JOIN has to do, an extra cost of roughly one additional read/write of the data seems about right.
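To see the three timings from the question side by side, here is a rough sketch. It uses SQLite, where `CREATE TABLE ... AS SELECT` plays the role of `SELECT * INTO` (an assumption; this is not SQL Server). The tables `a` and `b`, their `payload` columns, and the row count are hypothetical. The sketch times the join count, the join materialization, and the copy of the materialized result. Timings depend on the machine and engine, so no particular ratio is claimed.

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, payload TEXT);
""")
n = 50_000
con.executemany("INSERT INTO a VALUES (?, ?)", [(i, "x" * 100) for i in range(n)])
con.executemany("INSERT INTO b VALUES (?, ?, ?)", [(i, i, "y" * 100) for i in range(n)])

join_sql = "FROM a JOIN b ON b.a_id = a.id"

# 1) COUNT(*) over the join: only a single number leaves the join.
t0 = time.perf_counter()
count = con.execute(f"SELECT COUNT(*) {join_sql}").fetchone()[0]
t_count = time.perf_counter() - t0

# 2) Materialize the join (stand-in for SELECT * INTO x): every row and payload is written.
t0 = time.perf_counter()
con.execute(f"CREATE TABLE x AS SELECT a.payload AS a_payload, b.payload AS b_payload {join_sql}")
t_into = time.perf_counter() - t0

# 3) Copy the already-materialized result (stand-in for SELECT * INTO y FROM x).
t0 = time.perf_counter()
con.execute("CREATE TABLE y AS SELECT * FROM x")
t_copy = time.perf_counter() - t0

copied = con.execute("SELECT COUNT(*) FROM y").fetchone()[0]
print(f"count: {t_count:.4f}s  join into x: {t_into:.4f}s  copy x->y: {t_copy:.4f}s  rows: {copied}")
```

Step 1 never moves the payload columns through the join, so its time alone says little about what step 2 costs.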