Improve SQL Server Query Performance on Large Tables

Slow running SQL query on a large table

You want an index that covers both fields you are searching on. Try this:

CREATE INDEX IX_Composite ON [table1] ([uid], [added_on]);

Improve SQL Server query performance on large tables: MSSQL vs MySQL

If you are selecting 10M+ rows, the data size per row for your SQL Server table is 42 bytes:

(bigint = 8 bytes) + (varchar of 12 chars = 14 bytes) + (datetimeoffset = 10 bytes) * 2
So 10 million rows should be about 420,000,000 bytes, which is approximately 400 MB.

So reading 400 MB of data in 2 seconds is 400 MB / 2 s = 200 MB/s, which is a reasonable speed for a hard drive.
But 400 MB / 0.0011 s is about 363,636 MB/s, which is far beyond any hard drive speed and much closer to RAM access speed.

So the table in MySQL must be fully cached in memory, and that is why your query can finish in 0.0011 seconds.

You need to find a way to cache your SQL Server table fully in memory too in order to achieve similar speed.
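
One option for that (my suggestion, not something stated above; it requires SQL Server 2016+ and enough RAM) is In-Memory OLTP, which keeps the whole table memory-resident. A minimal sketch, with illustrative database, table, and column names based on the row layout estimated above:

-- Memory-optimized filegroup is required once per database (names are illustrative).
ALTER DATABASE CurrentTradeDb
    ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE CurrentTradeDb
    ADD FILE (NAME = 'imoltp_data', FILENAME = 'C:\Data\imoltp_data')
    TO FILEGROUP imoltp_fg;

-- Memory-optimized copy of the table, with a hash index on the lookup column.
CREATE TABLE dbo.TEST_TRADE_InMem
(
    [ID]         bigint         NOT NULL PRIMARY KEY NONCLUSTERED,
    [TradeId]    varchar(12)    NOT NULL
                 INDEX IX_TradeId_Hash HASH WITH (BUCKET_COUNT = 16777216),
    [added_on]   datetimeoffset NOT NULL,
    [updated_on] datetimeoffset NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);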

Edited:
If your query is

SELECT * FROM [dbo].[TEST_TRADE]
WHERE [TradeId] = 'ID99999999'

Create a clustered index on column [TradeId] first, then create the PK on [ID], and if possible, use fixed-length char(12) for [TradeId] instead of varchar.
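
A sketch of that layout, using the column types assumed in the row-size estimate earlier (the two datetimeoffset column names are made up here):

CREATE TABLE [dbo].[TEST_TRADE]
(
    [ID]         bigint         NOT NULL,
    [TradeId]    char(12)       NOT NULL,
    [added_on]   datetimeoffset NOT NULL,
    [updated_on] datetimeoffset NOT NULL
);

-- Clustered index on the column you search by...
CREATE CLUSTERED INDEX [CIX_TEST_TRADE_TradeId]
    ON [dbo].[TEST_TRADE] ([TradeId]);

-- ...then the primary key on [ID] as a nonclustered constraint.
ALTER TABLE [dbo].[TEST_TRADE]
    ADD CONSTRAINT [PK_TEST_TRADE] PRIMARY KEY NONCLUSTERED ([ID]);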

Edited:

Tested: creating a clustered index on column [TradeId] speeds the query up by up to 50%.

I would also suggest checking the index fragmentation for [IX_TradeID] first and rebuilding the index regularly.
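
A sketch of that check, assuming the table is [dbo].[TEST_TRADE]; the 5%/30% thresholds are the commonly cited guidelines, not something from the answer above:

SELECT i.name, ips.avg_fragmentation_in_percent, ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.TEST_TRADE'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id  = ips.index_id
WHERE i.name = N'IX_TradeID';

-- Reorganize for moderate fragmentation, rebuild when it is heavy.
ALTER INDEX [IX_TradeID] ON [dbo].[TEST_TRADE] REORGANIZE;                  -- roughly 5-30%
ALTER INDEX [IX_TradeID] ON [dbo].[TEST_TRADE] REBUILD WITH (ONLINE = ON);  -- above ~30%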

Improve performance on a large SQL table

Investigate horizontal partitioning. This will really only help query performance if you can force users to put the partitioning key into the predicates.
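
A minimal sketch of what that could look like; the table, partitioning column, and boundary values here are hypothetical, and queries only benefit when the predicate includes the partitioning key (added_on in this sketch):

-- Range partition function and scheme.
CREATE PARTITION FUNCTION pf_AddedOn (datetimeoffset)
    AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

CREATE PARTITION SCHEME ps_AddedOn
    AS PARTITION pf_AddedOn ALL TO ([PRIMARY]);

-- Build the clustered index on the partition scheme to spread the rows out.
CREATE CLUSTERED INDEX CIX_BigTable_AddedOn
    ON dbo.BigTable (added_on)
    ON ps_AddedOn (added_on);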

Try vertical partitioning, where you split one 260-column table into several tables with fewer columns. Put all the values which are commonly required together into one table. The queries will only reference the table(s) which contain columns required. This will give you more rows per page i.e. fewer pages per query.
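
A sketch of that split with hypothetical tables: the commonly required columns go in a narrow "hot" table, the rest in a companion table sharing the same key.

CREATE TABLE dbo.Orders_Core
(
    OrderId    bigint NOT NULL PRIMARY KEY,
    CustomerId int    NOT NULL,
    OrderDate  date   NOT NULL,
    Amount     money  NOT NULL
);

CREATE TABLE dbo.Orders_Extended
(
    OrderId    bigint        NOT NULL PRIMARY KEY
               REFERENCES dbo.Orders_Core (OrderId),
    Notes      nvarchar(max) NULL,
    LegacyCode varchar(50)   NULL
    -- ...the remaining, rarely used columns of the original 260
);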

You have a high fraction of NULLs. Sparse columns may help, but calculate your percentages as they can hurt if inappropriate. There's an SO question on this.
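
A sketch of sparse columns on a hypothetical table: NULLs in SPARSE columns take no space, but non-NULL values cost a few extra bytes, which is why the NULL percentage per column matters.

CREATE TABLE dbo.Readings_Sketch
(
    ReadingId  bigint        NOT NULL PRIMARY KEY,
    SensorId   int           NOT NULL,
    RareValueA decimal(18,4) SPARSE NULL,  -- mostly NULL: good sparse candidate
    RareNoteB  varchar(200)  SPARSE NULL
);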

Filtered indexes and filtered statistics may be useful if the DB often runs similar queries.
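
A sketch of a filtered index and matching filtered statistics; the table, columns, and predicate here are hypothetical:

-- Index only the slice of rows the frequent queries actually touch.
CREATE NONCLUSTERED INDEX IX_Orders_Open_CustomerId
    ON dbo.Orders (CustomerId, OrderDate)
    WHERE Status = 'Open';

-- Statistics scoped to the same slice, for better estimates on it.
CREATE STATISTICS ST_Orders_Open_Amount
    ON dbo.Orders (Amount)
    WHERE Status = 'Open';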

Improve performance on simple query with table of 10 million rows

Scanning all the values of any column in a clustered index requires a complete table scan. If you want to optimize retrieving all the ids or counting the rows, try a non-clustered index on id or a columnstore index.
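
A sketch of both options on a hypothetical 10-million-row table dbo.Big:

-- Narrow nonclustered index: scanning or counting ids touches far fewer
-- pages than scanning the wide clustered index.
CREATE NONCLUSTERED INDEX IX_Big_id ON dbo.Big (id);

-- Or a nonclustered columnstore index, which compresses well and is designed
-- for whole-column scans and aggregates such as COUNT(*).
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Big_id ON dbo.Big (id);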

Queries on large table extremely slow, how can I optimize?

The problem is that your indexes do not cover your query. In other words: the server cannot service your query by using just one index, so either it will have to do a key lookup for every row, or more likely it will choose to just scan the whole table.

Generally, single-column indexes are not very useful for precisely this reason. You can change one of your existing indexes.

  • You want the equality = predicates from your WHERE to be the first columns in the index key.
  • Then you add in join columns and grouping columns. It is normally only worth it to add one of these at this stage, unless a join is on a unique value.
  • Finally, add in all other columns. These do not have to be part of the key, they can be INCLUDE columns.

For example:

CREATE NONCLUSTERED INDEX [IX_VendorContracts_VendorId] ON [dbo].[VendorContracts]
(VendorId, StateId)
INCLUDE
(ContractAmount)
WITH (DROP_EXISTING = ON, ONLINE = ON);

What's your approach for optimizing large tables (+1M rows) on SQL Server?


At 1 million records, I wouldn't consider this a particularly large table needing unusual optimization techniques such as splitting the table up, denormalizing, etc. But those decisions will come when you've tried all the normal means that don't affect your ability to use standard query techniques.

Now, the second approach for optimization was to make a clustered index. Actually the primary key index is automatically clustered, and I made it a compound index with Stock and Date fields. This is unique; I can't have two quotes for the same stock on the same day.

The clustered index makes sure that quotes from the same stock stay together, and probably ordered by date. Is this second point true?

It's logically true - the clustered index defines the logical ordering of the records on the disk, which is all you should be concerned about. SQL Server may forego the overhead of sorting within a physical block, but it will still behave as if it did, so it's not significant. Querying for one stock will probably be 1 or 2 page reads in any case; and the optimizer doesn't benefit much from unordered data within a page read.

Right now with a half million records it's taking around 200ms to select 700 quotes from a specific asset. I believe this number will get higher as the table grows.

Not necessarily significantly. There isn't a linear relationship between table size and query speed. There are usually a lot more considerations that are more important. I wouldn't worry about it in the range you describe. Is that the reason you're concerned? 200 ms would seem to me to be great, enough to get you to the point where your tables are loaded and you can start doing realistic testing, and get a much better idea of real-life performance.

Now for a third approach, I'm thinking of maybe splitting the table into three tables, each for a specific market (stocks, options and forwards). This will probably cut the table size by 1/3. Will this approach help, or does it not matter too much? Right now the table is about 50 MB in size, so it can fit entirely in RAM without much trouble.

No! This kind of optimization is so premature it's probably stillborn.

Another approach would be using the partition feature of SQL Server.

Same comment. You will be able to stick to a strictly logical, fully normalized schema design for a long time.

What would be other good approaches to make this as fast as possible?

The best first step is clustering on stock. Insertion speed is of no consequence at all until you are looking at multiple records inserted per second - I don't see anything anywhere near that activity here. This should get you close to maximum efficiency because it will efficiently read every record associated with a stock, and that seems to be your most common index. Any further optimization needs to be accomplished based on testing.
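
A sketch of that first step, with table and column names assumed from the question above: cluster on (Stock, Date) so all quotes for one stock are stored together, in date order.

CREATE UNIQUE CLUSTERED INDEX CIX_Quotes_Stock_Date
    ON dbo.Quotes (Stock, [Date]);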


