Performance Tuning SQL - How?

I really like the book "Professional SQL Server 2005 Performance Tuning" to answer this. It's Wiley/Wrox, and no, I'm not an author, heh. But it explains a lot of the things you ask for here, plus hardware issues.

But yes, this question is way, way beyond the scope of something that can be answered in a comment box like this one.

Performance tuning of a query processing millions of rows

SELECT
a.customernumber,
a.car_month,
b.car_month AS match_month_6,
CASE
WHEN b.customernumber IS NULL
THEN 0
ELSE 1
END AS fl_match_6
FROM WB_YH_BCUPDATE_MATCH_MONTH a
LEFT JOIN WB_YH_BCUPDATE_MATCH_MONTH b
ON (a.customernumber = b.Customernumber AND a.match_month_6 = b.car_month);

Since you say that WB_YH_BCUPDATE_MATCH_MONTH contains the same data as WB_YH_BCUPDATE_FULL_BASE, but with one extra column, we can use the former and ignore the latter.

We now left join the table with itself: on the customer number, of course, but also on the date plus 6 months (a.match_month_6) matching the other row's date (b.car_month). If the customer was active 6 months later, we will find an entry; if not, we won't.

To completely duplicate the results of your query, we take the value for match_month_6 from the left-joined table, since in your original query it was also NULL when there was no match.

You should put indexes on both month fields as well, since we join on those too.


Note that this doesn't guarantee that the customer was active in the months in between. If a customer was active in January and in July, but in none of the months between them, they will still be returned by this query.
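To make the self-join and its caveat concrete, here is a minimal sketch that runs the same query shape against SQLite through Python's sqlite3 module. The table name match_month, the ISO date strings, and the sample rows are assumptions made up for illustration; they are not the asker's data or database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for WB_YH_BCUPDATE_MATCH_MONTH.
    CREATE TABLE match_month (
        customernumber INTEGER,
        car_month      TEXT,   -- first day of the active month
        match_month_6  TEXT    -- car_month plus six months
    );
    INSERT INTO match_month VALUES
        (1, '2020-01-01', '2020-07-01'),  -- customer 1: active in January ...
        (1, '2020-07-01', '2021-01-01'),  -- ... and July, nothing in between
        (2, '2020-01-01', '2020-07-01');  -- customer 2: active in January only
""")

rows = conn.execute("""
    SELECT a.customernumber,
           a.car_month,
           b.car_month AS match_month_6,
           CASE WHEN b.customernumber IS NULL THEN 0 ELSE 1 END AS fl_match_6
    FROM match_month a
    LEFT JOIN match_month b
           ON a.customernumber = b.customernumber
          AND a.match_month_6 = b.car_month
    ORDER BY a.customernumber, a.car_month
""").fetchall()

for row in rows:
    print(row)
```

Customer 1's January row is flagged as a match even though the customer has no rows for February through June, which is exactly the limitation described above.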

Performance tuning of Oracle SQL query

DISTINCT is very often an indicator for a badly written query. A normalized database doesn't contain duplicate data, so where do the duplicates suddenly come from that you must remove with DISTINCT? Very often it is your own query producing these. Avoid producing duplicates in the first place, so you don't need DISTINCT later.

In your case you are joining with the table notification in your subquery a, but you are not using its rows in that subquery; you only select from notification_master_id.

After all, you want to get the notification masters and, for each of them, the latest related notification (by getting its ID first and then selecting the row). You don't need hundreds of subqueries to achieve this.
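The "get the ID first, then select the row" idea can be sketched in isolation. The example below uses SQLite through Python's sqlite3 module with a cut-down, assumed version of the two tables (only the columns needed here, with invented sample rows), so it illustrates the pattern rather than the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical versions of the tables from the question.
    CREATE TABLE notification_master (id INTEGER PRIMARY KEY);
    CREATE TABLE notification (
        id                     INTEGER PRIMARY KEY,
        notification_master_id INTEGER,
        subject                TEXT
    );
    INSERT INTO notification_master VALUES (1), (2);
    INSERT INTO notification VALUES
        (10, 1, 'first'), (11, 1, 'second'), (20, 2, 'only');
""")

latest = conn.execute("""
    SELECT nm.id, nf.id, nf.subject
    FROM notification_master nm
    JOIN (SELECT notification_master_id, MAX(id) AS maxid
          FROM notification
          GROUP BY notification_master_id) nfagg
      ON nfagg.notification_master_id = nm.id
    JOIN notification nf ON nf.id = nfagg.maxid  -- fetch the row for that ID
    ORDER BY nm.id
""").fetchall()

print(latest)
```

Because the aggregation yields exactly one maxid per master, the join returns one row per master and no DISTINCT is needed.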

Some side notes:

  • To get the description from template_classification you are joining again with the notification table, which is not necessary.
  • ORDER BY in a subquery (ORDER BY nm.id DESC) is superfluous, because according to standard SQL, subquery results are unsorted. (Oracle sometimes violates this standard in order to apply ROWNUM to the result, but you are not using ROWNUM in your query.)
  • It's a pity that you store created_at not as a DATE or TIMESTAMP, but as a number. This forces you to calculate. I don't think this has a great impact on your query, though, because you are using it in an OR condition.
  • CURRENT_DATE gives you the date in the session's (that is, the client's) time zone. This is rarely wanted: you select data from the database, which should of course not relate to some client's date, but to the database's own date, SYSDATE.

If I am not mistaken, your query can be shortened to:

SELECT
nm.id AS masterid,
nf.id AS notification_id,
nfagg.notification_list AS notification_list,
nm.notification_type_id AS typeid,
nf.subject AS subject,
nf.approver AS approver,
nf.created_at AS created_at,
nf.created_by AS created_by,
nf.sequence_no AS sequence_no,
nm.product_id AS productid,
nm.notification_status_id AS statusid,
nf.updated_by AS updated_by,
nf.updated_at AS updated_at,
(
SELECT LISTAGG(p.name, ',') WITHIN GROUP (ORDER BY p.id)
FROM product p
INNER JOIN notification_product np ON np.product_id = p.id
WHERE np.notification_id = nf.id
) AS product_list,
(
SELECT description
FROM notification_status
WHERE id = nm.notification_status_id
) AS notification_status,
(
SELECT name
FROM template
WHERE id = nm.template_id
) AS template,
(
SELECT description
FROM notification_type
WHERE id = nm.notification_type_id
) AS notification_type,
(
SELECT description
FROM template_classification
WHERE id = nf.classification_id
) AS classification
FROM notification_master nm
INNER JOIN
(
SELECT
notification_master_id,
MAX(id) AS maxid,
LISTAGG(id,',') WITHIN GROUP (ORDER BY id) AS notification_list
FROM notification
GROUP BY notification_master_id
) nfagg ON nfagg.notification_master_id = nm.id
INNER JOIN notification nf
ON nf.id = nfagg.maxid
AND
(
(
DATE '1970-01-01' + NUMTODSINTERVAL(nf.created_at / 1000, 'SECOND')
< CURRENT_DATE + INTERVAL '-21' DAY
)
OR (nm.notification_type_id IN (2,4) AND nm.notification_status_id = 4)
)
WHERE nm.disable = 'N'
ORDER BY nm.id DESC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;

As mentioned, you may want to replace CURRENT_DATE with SYSDATE.

I recommend the following indexes for the query:

CREATE INDEX idx1 ON notification_master (disable, id, notification_status_id, notification_type_id);
CREATE INDEX idx2 ON notification (notification_master_id, id, created_at);

A last remark on paging: to skip n rows and return the next n, the whole query must be executed for all data and all result rows sorted, only to pick n of them at the end. It is usually better to remember the last fetched ID and, in the next execution, select only the rows beyond it (a higher ID when sorting ascending, a lower ID for the descending sort used here).
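A minimal sketch of this "remember the last ID" paging (often called keyset pagination), using SQLite through Python's sqlite3 module. The function name fetch_page, the reduced notification_master table, and the 25 sample rows are assumptions for illustration; LIMIT stands in for Oracle's FETCH NEXT n ROWS ONLY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notification_master (id INTEGER PRIMARY KEY, disable TEXT)")
conn.executemany(
    "INSERT INTO notification_master VALUES (?, 'N')",
    [(i,) for i in range(1, 26)],
)

def fetch_page(conn, last_id=None, page_size=10):
    """Return the next page of IDs in descending order, starting below last_id."""
    if last_id is None:
        # First page: no lower bound to continue from yet.
        sql = ("SELECT id FROM notification_master WHERE disable = 'N' "
               "ORDER BY id DESC LIMIT ?")
        params = (page_size,)
    else:
        # Later pages: seek directly past the last ID instead of skipping rows.
        sql = ("SELECT id FROM notification_master WHERE disable = 'N' AND id < ? "
               "ORDER BY id DESC LIMIT ?")
        params = (last_id, page_size)
    return [row[0] for row in conn.execute(sql, params)]

page1 = fetch_page(conn)
page2 = fetch_page(conn, last_id=page1[-1])
print(page1)
print(page2)
```

The second call never touches the rows of the first page, so its cost does not grow with the page number the way OFFSET does.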

SQL WHERE IN () Performance Optimization

You can get everything you need in a single query:

SELECT  TOP (5) a.ID
FROM article AS a
WHERE a.publish_flag = 1
AND a.publish_date < DATEADD(mi, DATEDIFF(mi, SYSDATETIME(), GETUTCDATE()), SYSDATETIME())
AND a.Id <> @ID
AND EXISTS
( SELECT 1
FROM article_tags AS at
WHERE at.ArticleID = a.ID
AND EXISTS
( SELECT 1
FROM article_tags AS at2
WHERE at2.ArticleID = @ID
AND at2.TagID = at.TagID
)
)
ORDER BY a.publish_date DESC;

I have assumed that you were originally using TOP 4 for tags as an arbitrary limit for performance reasons, as there was no sort, so I have omitted it. I have also changed your predicate from:

SYSDATETIME() > DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), SYSDATETIME()), a.publish_date)

to

a.publish_date < DATEADD(mi, DATEDIFF(mi, SYSDATETIME(), GETUTCDATE()), SYSDATETIME())

The meaning is the same: moving the offset to the other side of the comparison flips its sign, which is why the DATEDIFF arguments are swapped. However, because DATEADD/DATEDIFF are now applied only to the runtime constants SYSDATETIME() and GETUTCDATE(), the calculation is done once rather than once for every a.publish_date, and any index on publish_date becomes usable.
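The effect on index usage can be observed with a small experiment. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, so the date functions and the plan text differ from what SQL Server shows; the fixed 60-minute offset, the reduced article table, and the index name are assumptions. Only the principle carries over: a function wrapped around the column prevents an index seek, while the same adjustment applied to the constant side allows one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, publish_date TEXT);
    CREATE INDEX idx_article_publish_date ON article (publish_date);
""")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function wrapped around the column: it must be evaluated for every row,
# so the optimizer cannot seek into the index on publish_date.
wrapped = plan("""
    SELECT id FROM article
    WHERE datetime(publish_date, '+60 minutes') < datetime('now')
""")

# The same adjustment moved to the constant side (note the flipped sign):
# it is computed once and the bare column can be matched against the index.
bare = plan("""
    SELECT id FROM article
    WHERE publish_date < datetime('now', '-60 minutes')
""")

print(wrapped)  # expected to describe a scan
print(bare)     # expected to describe a search using the index
```

The exact wording of the plan varies between SQLite versions, so treat the printed text as indicative only.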

The other change I have made is to use EXISTS rather than JOIN to link articles to tags. This avoids duplicates; however, it would be equally trivial to remove the duplicates using GROUP BY, e.g.

SELECT  TOP (5) a.ID
FROM article AS a
INNER JOIN article_tags AS at
ON at.ArticleID = a.ID
WHERE a.publish_flag = 1
AND a.publish_date < DATEADD(mi, DATEDIFF(mi, SYSDATETIME(), GETUTCDATE()), SYSDATETIME())
AND a.Id <> @ID
AND EXISTS
( SELECT 1
FROM article_tags AS at2
WHERE at2.ArticleID = @ID
AND at2.TagID = at.TagID
)
GROUP BY a.ID, a.publish_date
ORDER BY a.publish_date DESC;
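Where the duplicates come from is easiest to see on a tiny data set. The following sketch uses SQLite through Python's sqlite3 module with invented sample rows: article 1 plays the role of @ID, article 2 shares two tags with it, and article 3 shares none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, publish_date TEXT);
    CREATE TABLE article_tags (ArticleID INTEGER, TagID INTEGER);
    INSERT INTO article VALUES
        (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-03');
    INSERT INTO article_tags VALUES
        (1, 100), (1, 200), (2, 100), (2, 200), (3, 300);
""")
ref_id = 1  # plays the role of @ID

# Plain joins: one output row per shared tag, so article 2 appears twice.
join_ids = [r[0] for r in conn.execute("""
    SELECT a.id
    FROM article a
    JOIN article_tags t  ON t.ArticleID = a.id
    JOIN article_tags t2 ON t2.TagID = t.TagID AND t2.ArticleID = ?
    WHERE a.id <> ?
    ORDER BY a.publish_date DESC
""", (ref_id, ref_id))]

# EXISTS: each article is tested once, however many tags it shares.
exists_ids = [r[0] for r in conn.execute("""
    SELECT a.id
    FROM article a
    WHERE a.id <> ?
      AND EXISTS (SELECT 1
                  FROM article_tags t
                  JOIN article_tags t2 ON t2.TagID = t.TagID
                  WHERE t.ArticleID = a.id AND t2.ArticleID = ?)
    ORDER BY a.publish_date DESC
""", (ref_id, ref_id))]

print(join_ids)    # article 2 listed once per shared tag
print(exists_ids)  # article 2 listed once
```

With the join version, a TOP (5) could be filled up by repeats of the same article, which is why the duplicates have to be avoided or grouped away before limiting.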

A few side notes as well that don't directly relate to the above answer, but are still worth mentioning.

  1. The Implicit join syntax you are using was replaced 28 years ago by ANSI 92 explicit join syntax. There are plenty of good reasons to switch to the "new" syntax, so I would advise you do.
  2. Parameterised queries are about more than just preventing SQL injection attacks; their benefits include, but are not limited to, type safety and query plan caching. So just because your input isn't coming from a user doesn't mean you shouldn't use parameterised queries.
  3. I would strongly advise against re-using your SqlClient objects (SqlConnection, SqlCommand). Create a new object for each use, and dispose of it correctly when done.
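Points 2 and 3 translate to most database client libraries. As a rough analogue (not SqlClient itself), the sketch below uses Python's sqlite3 module: the value is bound through a ? placeholder instead of being concatenated into the SQL text, and every call opens its own connection that is closed deterministically. The function name published_ids, the temporary database file, and the sample rows are assumptions for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical throwaway database file so that separate connections share data.
db_path = os.path.join(tempfile.mkdtemp(), "articles.db")

with closing(sqlite3.connect(db_path)) as conn:
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, publish_flag INTEGER)")
    conn.executemany("INSERT INTO article VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
    conn.commit()

def published_ids(publish_flag):
    """Open a fresh connection, run one parameterised query, then close it."""
    with closing(sqlite3.connect(db_path)) as conn:
        # The value travels as a bound parameter, never as part of the SQL text.
        cur = conn.execute(
            "SELECT id FROM article WHERE publish_flag = ? ORDER BY id",
            (publish_flag,),
        )
        return [row[0] for row in cur]

ids = published_ids(1)
# A string that would change the query if concatenated is just a value here.
hostile = published_ids("1 OR 1=1")
print(ids, hostile)
```

In .NET the equivalent shape is a using block around a new SqlConnection and SqlCommand, with values passed through the command's parameter collection.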

