Index Maintenance

DB Index Maintenance

Index maintenance is an I/O-intensive workload. Consider scaling up to a premium tier before running index maintenance tasks, and scaling back down to the previous tier when the workload finishes. You can automate this with T-SQL, and it should reduce maintenance time.
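For example, on Azure SQL Database the service objective can be changed from T-SQL. A minimal sketch, assuming a database named YourDb and example tier names (substitute your own targets):

-- Typically run from the master database; the scale operation is asynchronous.
ALTER DATABASE [YourDb] MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P2');

-- ... run the index maintenance tasks here ...

-- Scale back down to the previous tier when the workload finishes.
ALTER DATABASE [YourDb] MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');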

Additionally, compacting large object (LOB) data (image, text, ntext, varchar(max), nvarchar(max), varbinary(max), and xml) can save disk space, but it also increases the time it takes to maintain indexes. If you have a restricted time window for reindexing, you can set LOB_COMPACTION to OFF.
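In plain T-SQL that option is set on the REORGANIZE command; a minimal sketch with placeholder index and table names:

ALTER INDEX IX_YourIndex ON dbo.YourTable
    REORGANIZE WITH (LOB_COMPACTION = OFF);   -- default is ON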

Hope this helps.

Ola Hallengren index maintenance - long periods of time between commands?

The root cause is the DMF sys.dm_db_index_physical_stats in conjunction with the Scanning Modes and heaps.

(My emphasis on the LIMITED portions of the original description)

The mode in which the function is executed determines the level of scanning performed to obtain the statistical data that is used by the function. mode is specified as LIMITED, SAMPLED, or DETAILED. The function traverses the page chains for the allocation units that make up the specified partitions of the table or index. sys.dm_db_index_physical_stats requires only an Intent-Shared (IS) table lock, regardless of the mode that it runs in.

The LIMITED mode is the fastest mode and scans the smallest number of pages. For an index, only the parent-level pages of the B-tree (that is, the pages above the leaf level) are scanned. For a heap, the associated PFS and IAM pages are examined and the data pages of a heap are scanned in LIMITED mode.

With LIMITED mode, compressed_page_count is NULL because the Database Engine only scans non-leaf pages of the B-tree and the IAM and PFS pages of the heap. Use SAMPLED mode to get an estimated value for compressed_page_count, and use DETAILED mode to get the actual value for compressed_page_count. The SAMPLED mode returns statistics based on a 1 percent sample of all the pages in the index or heap. Results in SAMPLED mode should be regarded as approximate. If the index or heap has fewer than 10,000 pages, DETAILED mode is used instead of SAMPLED.

The DETAILED mode scans all pages and returns all statistics.

The modes are progressively slower from LIMITED to DETAILED, because more work is performed in each mode. To quickly gauge the size or fragmentation level of a table or index, use the LIMITED mode. It is the fastest and will not return a row for each nonleaf level in the IN_ROW_DATA allocation unit of the index.

Reference: sys.dm_db_index_physical_stats (Transact-SQL) | Scanning Modes (Microsoft Docs)
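As a quick illustration, a LIMITED-mode check against a single table (the table name is a placeholder) looks like this:

-- Fast, coarse fragmentation check; only non-leaf pages (or heap IAM/PFS
-- pages) are examined.
SELECT index_id, index_type_desc, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.YourTable'), NULL, NULL, 'LIMITED');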

Even though Ola's script executes sys.dm_db_index_physical_stats in LIMITED mode, it can take a long time to scan a very large heap, depending on the amount of data. And because you are using @UpdateStatistics = 'ALL', you are telling the script to update all statistics (INDEX and COLUMN), which includes statistics on heap columns.

Possible Solution

You might want to consider not updating the statistics on all objects, but instead limiting the scope to INDEX, or consider changing the following parameter:

@OnlyModifiedStatistics = 'Y'

The default is 'N'.

Update statistics only if any rows have been modified since the most recent statistics update.

Reference: SQL Server Index and Statistics Maintenance (ola.hallengren.com)
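Put together, a call along these lines narrows the statistics work (a sketch only; the @Databases value is an example, and the parameters are those documented for Ola's IndexOptimize procedure):

EXECUTE dbo.IndexOptimize
    @Databases = 'USER_DATABASES',
    @UpdateStatistics = 'INDEX',       -- index statistics only, skip column statistics
    @OnlyModifiedStatistics = 'Y';     -- skip statistics with no modified rows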

Index Maintenance

I second everything that Jonathan said - except for the frequency of index maintenance.

Well, if you happen to have a poorly designed index (such as a clustered index on a GUID key), you might actually need to do it at least every night - or even during the day, too.

As a general rule of thumb: if your index fragmentation is below 5%, all is fine. If you have fragmentation between 5% and approx. 30%, you should do an index reorganization:

ALTER INDEX (your index name) ON (your table name) REORGANIZE

If your index has index fragmentation of more than 30%, you need to rebuild it completely:

ALTER INDEX (your index name) ON (your table name) REBUILD

Rebuilding an index can be disruptive - try to do it at off-hours, e.g. during the night.
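Where your edition supports it (Enterprise Edition and Azure SQL Database do), an online rebuild reduces that disruption; the index and table names below are placeholders:

-- ONLINE = ON keeps the underlying table available during the rebuild.
ALTER INDEX IX_YourIndex ON dbo.YourTable
    REBUILD WITH (ONLINE = ON);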

In order to determine index fragmentation, you can use this DMV query:

-- Note: DETAILED mode scans every page, so this query can take a long time
-- on large databases; use 'LIMITED' for a quicker, coarser check.
SELECT
    t.name AS 'Table name',
    i.name AS 'Index name',
    ips.index_type_desc,
    ips.alloc_unit_type_desc,
    ips.index_depth,
    ips.index_level,
    ips.avg_fragmentation_in_percent,
    ips.fragment_count,
    ips.avg_fragment_size_in_pages,
    ips.page_count,
    ips.avg_page_space_used_in_percent,
    ips.record_count,
    ips.ghost_record_count,
    ips.version_ghost_record_count,
    ips.min_record_size_in_bytes,
    ips.max_record_size_in_bytes,
    ips.avg_record_size_in_bytes,
    ips.forwarded_record_count
FROM
    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') ips
INNER JOIN
    sys.tables t ON ips.object_id = t.object_id
INNER JOIN
    sys.indexes i ON ips.index_id = i.index_id AND ips.object_id = i.object_id
WHERE
    ips.avg_fragmentation_in_percent > 0.0
ORDER BY
    ips.avg_fragmentation_in_percent, ips.fragment_count

Michelle Ufford has a great automatic index defrag script - highly recommended! Alternatively, you can set up SQL Server maintenance plans that run e.g. every night and clean up your indexes.

Marc

Why index REBUILD does not reduce index fragmentation?

If an index is very small (I believe fewer than 8 pages), it will use mixed extents. Therefore, it will appear as if there is still fragmentation remaining, as the housing extent will contain pages from multiple indexes.

Because of this, and because fragmentation in such a small index is typically negligible, you really should only rebuild indexes above a certain page threshold. It is best practice to rebuild fragmented indexes only once they are a minimum of 1,000 pages.
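A minimal sketch of such a filter, assuming the 1,000-page threshold and the common 30% rebuild cutoff:

-- Only surface indexes big and fragmented enough to be worth rebuilding.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name                     AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
INNER JOIN sys.indexes i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.page_count >= 1000
  AND ips.avg_fragmentation_in_percent > 30;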

How to set Azure SQL to rebuild indexes automatically?

Update: Note that the engineering team has published updated guidance to codify some of the suggestions in this answer in a more "official" place from Microsoft, as some customers asked for that. SQL Server/DB Index Guidance. Thanks, Conor

original answer:

I'll point out that most people don't need to consider rebuilding indexes in SQL Azure at all. Yes, B+ Tree indexes can become fragmented, and yes this can cause some space overhead and some CPU overhead compared to having perfectly tuned indexes. So, there are some scenarios where we do work with customers to rebuild indexes. (The primary scenario is when the customer may run out of space, currently, as disk space is somewhat limited in SQL Azure due to the current architecture). So, I will encourage you to step back and consider that using the SQL Server model for managing databases is not "wrong" but it may or may not be worth your effort.

(If you do end up needing to rebuild an index, you are welcome to use the models posted here by the other posters - they are generally fine models to script tasks. Note that SQL Azure Managed Instance also supports SQL Agent which you can also use to create jobs to script maintenance operations if you so choose).

Here are some details that may help you decide if you may be a candidate for index rebuilds:

  • The link you referenced is from a post in 2013. The architecture for SQL Azure was completely redone after that post. Specifically, the hardware architecture moved from a model that was based on local spinning disks to one based on local SSDs (in most cases). So, the guidance in the original post is out of date.
  • You can have cases in the current architecture where you can run out of space with a fragmented index. You have options to rebuild the index or to move to a larger reservation size for awhile (which will cost more money) that supports a larger disk space allocation. [Since the local SSD space on the machines is limited, reservation sizes are roughly linked to proportions of the machine. As we get newer hardware with larger/more drives, you have more scale-up options].
  • SSD fragmentation impact is relatively low compared to rotating disks since the cost of a random IO is not really any higher than a sequential one. The CPU overhead of walking a few more B+ Tree intermediate pages is modest. I've usually seen an overhead of perhaps 5-20% max in the average case (which may or may not justify regular rebuilds which have a much bigger workload impact when rebuilding)
  • If you are using query store (which is on by default in SQL Azure), you can evaluate whether a specific index rebuild visibly helps your performance or not. You can do this as a test to see if your workload improves before bothering to take the time to build and manage index rebuild operations yourself (see the sketch after this list).
  • Please note that there is currently no intra-database resource governance within SQL Azure for user workloads. So, if you start an index rebuild, you may end up consuming lots of resources and impacting your main workload. You can try to time things to be done off-hours, of course, but for applications with lots of customers around the world this may not be possible.
  • Additionally, I will note that many customers have index rebuild jobs "because they want stats to be updated". It is not necessary to rebuild an index just to rebuild the stats. In recent SQL Server and SQL Azure, the algorithm for stats update was made more aggressive on larger tables, and the model for how we estimate cardinality in cases where customers are querying recently inserted data (since the last stats update) has been changed in later compatibility levels. So, it is often the case that the customer doesn't even need to do any manual stats update at all.
  • Finally, I will note that the impact of stats being out of date was historically that you'd get plan choice regressions. For repeated queries, a lot of the impact of this was mitigated by the introduction of the automatic tuning feature over query store (which forces prior plans if it notices a large regression in query performance compared to the prior plan).
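
A hedged sketch of that Query Store check (the LIKE filter and query text are placeholders; run it before and after a trial rebuild and compare the per-interval averages):

-- Per-interval execution-weighted average duration (microseconds) for a
-- query of interest, from the built-in Query Store catalog views.
SELECT
    qt.query_sql_text,
    rsi.start_time,
    SUM(rs.count_executions) AS executions,
    SUM(rs.avg_duration * rs.count_executions)
        / NULLIF(SUM(rs.count_executions), 0) AS weighted_avg_duration_us
FROM sys.query_store_query_text AS qt
INNER JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
INNER JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
INNER JOIN sys.query_store_runtime_stats_interval AS rsi
    ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE qt.query_sql_text LIKE N'%YourHotQuery%'   -- hypothetical filter
GROUP BY qt.query_sql_text, rsi.start_time
ORDER BY rsi.start_time;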

The official recommendation that I give customers is not to bother with index rebuilds unless they have a tier-1 app where they've demonstrated real need (benefits outweigh the costs), or they are a SaaS ISV trying to tune a workload over many databases/customers in elastic pools or in a multi-tenant database design so they can reduce their COGS or avoid running out of disk space (as mentioned earlier) on a very big database. Among the largest customers we have on the platform, we sometimes see value in doing index operations manually with the customer, but we often do not need a regular job that does this kind of operation "just in case". The intent from the SQL team is that you don't need to bother with this at all and can just focus on your app instead. There are always things that we can add or improve in our automatic mechanisms, of course, so I completely allow for the possibility that an individual customer database may have a need for such actions. I've not seen any myself beyond the cases I mentioned, and even those are rarely an issue.

I hope this gives you some context to understand why this isn't being done in the platform yet - it just hasn't been an issue for the vast majority of customer databases we have today in our service compared to other pressing needs. We revisit the list of things we need to build each planning cycle, of course, and we do look at opportunities like this regularly.

Good luck - whatever your outcome here, I hope this helps you make the right choice.

Sincerely,
Conor Cunningham
Architect, SQL

Azure SQL managed instances

Azure does not automatically rebuild your indexes. You need to maintain your SQL indexes yourself.

There is a blog article on how you can automate these tasks here: https://geeks.ms/davidjrh/2015/10/08/rebuilding-sql-database-indexes-using-azure-automation/

The Best Recovery Model to Rebuild Indexes

Shrinking is a wasteful operation. Index maintenance such as index rebuilds is also wasteful. Rebuilding your indexes will generate a lot of unused allocated space in the data file. Index fragmentation is pretty meaningless with regard to performance. As you've seen, shrinking your database causes high levels of index fragmentation. It's a vicious cycle, as Brent mentions in the first article I linked. My advice: don't do either (except in very rare circumstances).

But if you are still hung up on rebuilding your indexes, then doing so under the SIMPLE recovery model will at least minimize the logging in your transaction log during the process. It won't save you any space in the data file, though. The catch is that by switching to the SIMPLE recovery model, you lose point-in-time recovery after the last full backup, and so are at risk of data loss until your next full backup. You won't be able to use point-in-time recovery again until you switch back to the FULL recovery model and then take a full backup. If your recovery point objectives can tolerate that, then that's fine.
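
A minimal sketch of that sequence, assuming a database named YourDb, a placeholder table and backup path, and that the recovery-chain trade-off above is acceptable:

-- Point-in-time recovery is lost while in SIMPLE; accept that risk first.
ALTER DATABASE [YourDb] SET RECOVERY SIMPLE;

ALTER INDEX ALL ON dbo.YourBigTable REBUILD;   -- placeholder table name

ALTER DATABASE [YourDb] SET RECOVERY FULL;

-- A full backup restarts the log chain so log backups (and point-in-time
-- recovery) work again.
BACKUP DATABASE [YourDb] TO DISK = N'X:\Backups\YourDb.bak';   -- placeholder path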


