How to Optimize This SQL Query (Using Indexes)

How can I optimize this SQL query (Using Indexes)?

Intro: There is a lot to talk about here, and because SQL is complex, nobody can fully optimize your query without more detail: it depends on what the query is, how large the tables are, and which database system is being used. If you don't know what indexes are, or how to use them, see: How does database indexing work?

Precaution: If you have a DBA for your system, check with them before indexing anything, especially on a live system. They can even help, if you're nice to them. If the system is used by many others, be careful before changing anything like indexes. If the data is used for multiple query types, make sure you aren't creating tons of indexes on it that conflict or overlap.

Syntax. CREATE INDEX is not part of the SQL standard (SQL-92 does not define indexes), but almost every system accepts this form: CREATE INDEX [index name] ON [table name] ( [column name] ). If you need only one index on the table, and there is not already a clustered index, systems that support this syntax (such as SQL Server) let you use: CREATE [UNIQUE] CLUSTERED INDEX [index name] ON [table name] ( [column name] ). Make it unique if there cannot be multiple rows with the same value. If you can't get this to work, see this post for more details: How do I index a database column.
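As a minimal sketch of the basic syntax, the snippet below runs CREATE INDEX against an in-memory SQLite database through Python's sqlite3 module. SQLite, the employee table, and the index names are assumptions chosen for illustration, not part of the original question. SQLite has no CLUSTERED keyword, so only the plain and UNIQUE forms are shown.

```python
import sqlite3

# In-memory SQLite database; the table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, email TEXT, dept TEXT)")

# Plain index: duplicate values in the column are allowed.
conn.execute("CREATE INDEX ix_employee_dept ON employee (dept)")

# Unique index: the database rejects a second row with the same email.
conn.execute("CREATE UNIQUE INDEX ux_employee_email ON employee (email)")

conn.execute("INSERT INTO employee VALUES (1, 'a@example.com', 'ops')")
try:
    conn.execute("INSERT INTO employee VALUES (2, 'a@example.com', 'dev')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# List the indexes that now exist on the table.
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'employee'"
    )
)
print(index_names, duplicate_rejected)
```

The unique index doubles as a constraint, which is why the second insert fails. Other systems may need slightly different syntax or naming rules.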

Which tables should be indexed? Any table that is being used for querying, especially if the data is static or only gets new values, is a great candidate. If the table is in your query, and has a join statement, you probably want to have an index on the column(s) being joined.

What columns should be indexed? There are full books written on choosing the best indexes, and how to properly index a database. A basic rule of thumb for indexing, if you don't want to dive deep into the problem, is: index by the following, in this order:

  1. Join predicates (on Table1.columnA=Table2.ColumnA and Table1.columnB=Table2.ColumnQ)
  2. Filtered columns (where Table1.columnN='Bob' and Table1.columnS<20)
  3. Order by / Group By / etc. (any column which is used for the order/grouping should be in the index, if possible.)
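The first two rules can be sketched as follows. This is an illustration only: it assumes SQLite (via Python's sqlite3), hypothetical tables table1 and table2, and made-up index names. It indexes the join column on one table and the filtered columns on the other, then asks the planner which indexes it would use. Planner output differs between database systems and versions.

```python
import sqlite3

# Hypothetical schema mirroring the Table1/Table2 examples in the list above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (column_a INTEGER, column_n TEXT, column_s INTEGER);
    CREATE TABLE table2 (column_a INTEGER, payload TEXT);
    -- Rule 1: index the join predicate.
    CREATE INDEX ix_table2_a ON table2 (column_a);
    -- Rule 2: index the filtered columns (equality column first, then range).
    CREATE INDEX ix_table1_n_s ON table1 (column_n, column_s);
    """
)

query = """
    SELECT t2.payload
    FROM table1 AS t1
    JOIN table2 AS t2 ON t1.column_a = t2.column_a
    WHERE t1.column_n = 'Bob' AND t1.column_s < 20
"""

# The last column of each EXPLAIN QUERY PLAN row is a readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

Checking the plan before and after adding an index is the quickest way to confirm that the index is actually used. Most systems offer an equivalent of EXPLAIN.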

Also:

  • Use data types that make sense - store nothing as varchar if it's an integer or date. (Column width matters. Use the smallest data type you can, if possible.)
  • Make sure your joins are the same data type - int to int, varchar to varchar, and so on.
  • If possible, use unique, non-null indexes on each join predicate in each table.
  • Make as many columns as possible non-null. If a column cannot contain null values, you can declare that with the following syntax:

    ALTER TABLE Table1
    ALTER COLUMN columnN int NOT NULL

Do all of this, and you'll be well on your way. But if you need this stuff regularly, learn it! Buy a book, read online, find the information. There is a lot of information out there, and it is a deep topic, but you can make queries MUCH better if you know what you are doing.

What indexes optimize this query with four joins?

This does not always work, but try to:

  1. Reorder tables in joins from the smallest one to the biggest one.
  2. Use a subquery in place of the ProjectTransaction table:

    JOIN
    (SELECT RefEmployeeID, RefProjectID FROM ProjectTransaction AS PTran WHERE @from <= PTran.Date AND PTran.Date <= @to AND PTran.Type = 0) AS trans

Optimize SELECT MySql query using INDEXING

I started to write this as a comment because these are hints rather than a clear answer, but it got far too long.

First of all, it is common sense (though not a hard rule) to index the columns appearing in a WHERE clause:

    playing_date BETWEEN '' AND ''
    AND country_code LIKE ''
    AND device_report_tag LIKE ''
    AND channel_report_tag LIKE ''

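High cardinality on its own usually makes a column a good index candidate, but if your tag columns hold long, mostly distinct strings that you only search with LIKE, it's probably not a good idea to index them this way. country_code and playing_date should be indexed.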

The issue here is that there are so many LIKEs in your query. This operator is a performance killer and you are using it on 3 columns. That's awful for the database. So the question is: is it really needed?

For instance, I see no obvious reason to use LIKE on a country code. Will you really query like this:

AND country_code LIKE 'U%'

to retrieve UK and US?
You probably won't. Chances are high that you will know the countries you are searching for, so you should do this instead:

AND country_code IN ('UK','US')

This will be a lot faster if the country column is indexed.
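A small sketch of the IN form, assuming SQLite through Python's sqlite3 rather than MySQL. The plays table, the index name, and the sample rows are all hypothetical. It runs the IN query and prints the plan so you can see the index on country_code being used.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (playing_date TEXT, country_code TEXT)")
conn.execute("CREATE INDEX ix_plays_country ON plays (country_code)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?)",
    [("2024-01-01", "UK"), ("2024-01-01", "US"), ("2024-01-02", "FR")],
)

query = (
    "SELECT country_code FROM plays "
    "WHERE country_code IN ('UK', 'US') ORDER BY country_code"
)
rows = [row[0] for row in conn.execute(query)]

# Join the plan steps into one string to inspect which index was chosen.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
print(rows)
print(plan)
```

On MySQL the equivalent check is EXPLAIN in front of the SELECT. The exact plan text differs, but the idea is the same.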

Next, if you really want to search your 2 tag columns by pattern, instead of doing a LIKE you can try this:

AND MATCH(device_report_tag) AGAINST ('anything*' IN BOOLEAN MODE)

MATCH ... AGAINST requires the tag columns to be indexed as FULLTEXT, which is worth considering if you search for prefixes such as LIKE 'anything%'. If you search with LIKE '%anything%', an index probably won't help much.
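The difference between a prefix pattern and a pattern with a leading wildcard can be seen in a query plan. The sketch below is an assumption-laden illustration: it uses SQLite through Python's sqlite3 (not MySQL FULLTEXT), a hypothetical plays table, and a NOCASE collation so that SQLite's default case-insensitive LIKE is allowed to use an ordinary index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE lets SQLite's default case-insensitive LIKE use the index.
conn.execute(
    "CREATE TABLE plays (id INTEGER, device_report_tag TEXT COLLATE NOCASE)"
)
conn.execute("CREATE INDEX ix_plays_device_tag ON plays (device_report_tag)")


def plan_for(pattern_sql):
    """Return the EXPLAIN QUERY PLAN text for a LIKE filter on the tag column."""
    sql = (
        "EXPLAIN QUERY PLAN SELECT id FROM plays "
        "WHERE device_report_tag LIKE " + pattern_sql
    )
    return " ".join(row[-1] for row in conn.execute(sql))


prefix_plan = plan_for("'anything%'")   # pattern anchored at the start
infix_plan = plan_for("'%anything%'")   # leading wildcard
print(prefix_plan)
print(infix_plan)
```

In this setup the prefix pattern should be reported as an index search and the leading-wildcard pattern as a full scan, which is the behaviour described above. Other systems have their own rules for when LIKE can use an index.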

I could also add that with millions of rows a day, you might have to PARTITION your tables (on the date, for instance). Depending on your data, a composite index on the date and something else might also help.

Really, there's no simple and straightforward answer to your complex question, especially with what you have shown (not a lot).


