What Are Your Most Common SQL Optimizations

What are your most common SQL optimizations?

Reducing the amount of data returned, by selecting only the fields required and only the rows required. This is the most common optimization, as it applies to every query that returns data.

With ENGINE=InnoDB, the above advice especially applies to avoiding unnecessary fetching of TEXT and BLOB columns since they may be stored in a different location (thereby costing extra I/O).
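As a minimal sketch (table and column names are hypothetical), the difference is simply asking for less:

    -- Fetch only the columns and rows actually needed, instead of
    -- SELECT *, which drags every column (including any off-page
    -- TEXT/BLOB columns) across the wire.
    SELECT OrderId, CustomerId, OrderDate
    FROM   Orders
    WHERE  Status = 'OPEN'                -- restrict the rows...
      AND  OrderDate >= '2023-01-01';     -- ...as well as the columns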

Adding indexes. This is not done as frequently, as some tables don't need any index other than the one created for the primary key.
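For example (hypothetical names again), a non-clustered index supporting a frequent lookup that the primary key alone does not help:

    -- Without this index, a query filtering on CustomerId would
    -- scan the whole table; with it, a seek suffices.
    CREATE INDEX IX_Orders_CustomerId
        ON Orders (CustomerId);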

What steps are necessary to optimize a poorly performing query?

  1. Look at the execution plan in the query analyzer (see the sketch after this list)
  2. See what step costs the most
  3. Optimize the step!
  4. Return to step 1 [thx to Vinko]
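To make step 1 concrete in SQL Server (a sketch; the Orders table is hypothetical), you can turn on the I/O and timing statistics and capture the plan in SSMS:

    SET STATISTICS IO ON;    -- logical/physical reads per table
    SET STATISTICS TIME ON;  -- parse/compile and execution times

    -- In SSMS, enable "Include Actual Execution Plan" (Ctrl+M) before
    -- running, then look for the operator with the highest cost.
    SELECT OrderId, OrderDate
    FROM   Orders
    WHERE  CustomerId = 42;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;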

What's your approach for optimizing large tables (+1M rows) on SQL Server?

  1. At 1 million records, I wouldn't consider this a particularly large table needing unusual optimization techniques such as splitting the table up, denormalizing, etc. But those decisions will come when you've tried all the normal means that don't affect your ability to use standard query techniques.

Now, my second approach for optimization was to make a clustered index. Actually, the primary index is automatically clustered, and I made it a compound index with the Stock and Date fields. This is unique: I can't have two quotes for the same stock on the same day.

The clustered index makes sure that quotes from the same stock stay together, and are probably ordered by date. Is this second point true?

It's logically true - the clustered index defines the logical ordering of the records on the disk, which is all you should be concerned about. SQL Server may forego the overhead of sorting within a physical block, but it will still behave as if it did, so it's not significant. Querying for one stock will probably be 1 or 2 page reads in any case; and the optimizer doesn't benefit much from unordered data within a page read.
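As a sketch of the compound clustered key being discussed (table and column names are hypothetical):

    -- Rows for the same stock are stored together, ordered by date
    -- within each stock, because the clustered key is (Stock, QuoteDate).
    CREATE TABLE Quotes (
        Stock     varchar(12)    NOT NULL,
        QuoteDate date           NOT NULL,
        [Open]    decimal(18, 4) NOT NULL,
        [Close]   decimal(18, 4) NOT NULL,
        CONSTRAINT PK_Quotes PRIMARY KEY CLUSTERED (Stock, QuoteDate)
    );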

Right now with a half million records it's taking around 200ms to select 700 quotes from a specific asset. I believe this number will get higher as the table grows.

Not necessarily significantly. There isn't a linear relationship between table size and query speed. There are usually a lot more considerations that are more important. I wouldn't worry about it in the range you describe. Is that the reason you're concerned? 200 ms would seem to me to be great, enough to get you to the point where your tables are loaded and you can start doing realistic testing, and get a much better idea of real-life performance.

Now, for a third approach, I'm thinking of maybe splitting the table into three tables, each for a specific market (stocks, options and forwards). This would probably cut each table's size to about a third of the original. Now, will this approach help, or does it not matter much? Right now the table is 50 MB in size, so it can fit entirely in RAM without much trouble.

No! This kind of optimization is so premature it's probably stillborn.

Another approach would be using the partition feature of SQL Server.

Same comment. You will be able to stick to a strictly logical, fully normalized schema design for a long time.

What would be other good approaches to make this as fast as possible?

The best first step is clustering on stock. Insertion speed is of no consequence at all until you are looking at multiple records inserted per second - I don't see anything anywhere near that activity here. This should get you close to maximum efficiency because it will efficiently read every record associated with a stock, and that seems to be your most common access pattern. Any further optimization needs to be accomplished based on testing.
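Using the hypothetical Quotes table from above, that common access pattern resolves to a clustered index seek plus a short range scan, typically only a few page reads:

    -- All quotes for one asset, in date order, straight off the
    -- clustered index; no extra sort is needed.
    SELECT QuoteDate, [Open], [Close]
    FROM   Quotes
    WHERE  Stock = 'PETR4'     -- hypothetical ticker
    ORDER  BY QuoteDate;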

What generic techniques can be applied to optimize SQL queries?

  • Use primary keys
  • Avoid select *
  • Be as specific as you can when building your conditional statements
  • De-normalisation can often be more efficient
  • Table variables and temporary tables (where available) will often be better than using a large source table (see the sketch after this list)
  • Partitioned views
  • Employ indices and constraints
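As a small illustration of two of these points, being specific in the predicate and staging a working set in a temporary table (names are hypothetical):

    -- Stage only the rows of interest instead of repeatedly
    -- querying the large source table.
    SELECT OrderId, CustomerId, Total
    INTO   #RecentOrders
    FROM   Orders
    WHERE  OrderDate >= DATEADD(day, -7, GETDATE());

    -- Subsequent work runs against the small temp table.
    SELECT CustomerId, SUM(Total) AS WeekTotal
    FROM   #RecentOrders
    GROUP  BY CustomerId;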

What are the best SQL Server performance optimization techniques?

My approach is to gather commands against the server or database into a table using SQL Server Profiler. Once you have that, you can query based on the max and avg execution times, max and avg cpu times, and (also very important) the number of times that the query was run.

Since I try to put all database access code in stored procedures it's easy for me to break out queries. If you use inline SQL it might be harder, since a change to a value in the query would make it look like a different query. You can try to work around this using the LIKE operator to put the same types of queries into the same buckets for calculating the aggregates (max, avg, count).

Once you have a "top 10" list of potential problems you can start looking at them individually to see if either the query can be reworked, an index might help, or making a minor architecture change is in order. To come up with the top 10, try looking at the data in different ways: avg * count for total cost during the period, max for worst offender, just plain avg, etc.

Finally, be sure to monitor over different time periods if necessary. The database usage might be different in the morning when everyone is getting in and running their daily reports than it is at midday when users are entering new data. You may also decide that even though some nightly process takes longer than any other query it doesn't matter since it's run during off hours.

Good luck!

SQL Query Optimization

Articles discussing Query Optimization issues are often very factual and useful, but as you found out they can be hard to follow. It is a bit like when someone is trying to learn the basic rules of baseball, and all the sports commentary he/she finds on the subject is rife with acronyms and strategic details about the benefits of sacrificing someone at bat, and other "inside baseball" trivia...

So you need to learn the basics first:

  • the structure(s) of the database storage
  • index structures: the clustered and non-clustered kinds, and multi-column indexes
  • the concept of covering a query (see the index sketch after this list)
  • the selectivity of a particular column
  • the disadvantage of indexes when it comes to CRUD operations
  • the basic subtasks/strategies of a query: table or index scan, index seek, sorting, nested loop and merge joins, etc.
  • the log file and the database recovery model.
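To make the "covering" idea from this list concrete (a sketch with hypothetical names): an index covers a query when the index alone supplies every column the query touches, so the base table is never read.

    CREATE INDEX IX_Orders_Customer_Covering
        ON Orders (CustomerId, OrderDate)
        INCLUDE (Total);

    -- Fully covered: seek on CustomerId, range on OrderDate,
    -- and Total is read from the index leaf level.
    SELECT OrderDate, Total
    FROM   Orders
    WHERE  CustomerId = 42
      AND  OrderDate >= '2023-01-01';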

The following links apply to MS SQL Server. If that is not the DBMS you are using, you can try to find similar material for the system of your choice. In fact, so long as you realize that the implementation may vary, it may be useful to peruse the MS documentation.

MS SQL storage structures

MS SQL pages and extents

Then, as you have started doing, learn to read query plans (even if you do not fully understand them at first), and all this should bring you to a level where you can start to make sense of the more advanced books or articles on the topic. I do not know of tutorials for query plans on the Internet (though I'm quite sure they exist...), but the following methodology may be of use:

  • Start with simple queries and review the query plan (in graphic form if possible).
  • Start recognizing the most common elements: Table Scan, Index Seek, Sort, Nested Loops...
  • Read the detailed properties of these operators: estimated number of rows, cost percentage, etc.
  • When you find a new element you do not know or understand, use that keyword to find details on the Internet.
  • Also: experiment a lot.
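For a text-mode look at a plan (sometimes easier to paste and search than the graphical view), SQL Server's SHOWPLAN options can be used; the Quotes table here is the hypothetical one from earlier:

    -- SET SHOWPLAN_TEXT must be the only statement in its batch, hence
    -- the GO separators. The query itself is not executed; only its
    -- estimated plan is returned.
    SET SHOWPLAN_TEXT ON;
    GO
    SELECT Stock, QuoteDate
    FROM   Quotes
    WHERE  Stock = 'PETR4';
    GO
    SET SHOWPLAN_TEXT OFF;
    GO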

Finally, you should remember that while the way the query is written and the set of indexes provided cover a great part of optimization needs, there are other sources of optimization, for example the way hardware is put to use (a basic example is how, by having the data file and the log file on separate physical disks, we can greatly improve CRUD performance).
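As an illustration of that hardware point (the database name and file paths are hypothetical, with D: and E: assumed to be separate physical disks), the data and log files can be placed apart when the database is created:

    -- Sequential log writes on one disk, random data-file I/O on
    -- another, so the two workloads do not compete.
    CREATE DATABASE MarketData
    ON PRIMARY
        ( NAME = MarketData_dat, FILENAME = 'D:\SQLData\MarketData.mdf' )
    LOG ON
        ( NAME = MarketData_log, FILENAME = 'E:\SQLLogs\MarketData.ldf' );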


