Why & When Should I Use Sparse Column? (SQL Server 2008)

Why & When should I use SPARSE COLUMN? (SQL SERVER 2008)

A sparse column doesn't use 4x the amount of space to store a value, it uses a (fixed) 4 extra bytes per non-null value. (As you've already stated, a NULL takes 0 space.)

  • So a non-null value stored in a bit column would be 1 bit + 4 bytes = 4.125 bytes. But if 99% of these are NULL, it is still a net savings.

  • A non-null value stored in a GUID (UniqueIdentifier) column is 16 bytes + 4 bytes = 20 bytes. So if only 50% of these are NULL, that's still a net savings.

So the "expected savings" depends strongly on what kind of column we're talking about, and your estimate of what ratio will be null vs non-null. Variable width columns (varchars) are probably a little more difficult to predict accurately.

This Books Online Page has a table showing what percentage of different data types would need to be null for you to end up with a benefit.

So when should you use a Sparse Column? When you expect a significant percentage of the rows to have a NULL value. Some examples that come to mind:

  • A "Order Return Date" column in an order table. You would hope that a very small percent of sales would result in returned products.
  • A "4th Address" line in an Address table. Most mailing addresses, even if you need a Department name and a "Care Of" probably don't need 4 separate lines.
  • A "Suffix" column in a customer table. A fairly low percent of people have a "Jr." or "III" or "Esquire" after their name.

Use column set for ALL_SPARSE_COLUMNS in the Sql View

This behaviour is documented for SPARSE columns when there is a COLUMN_SET present.

Warning:

Adding a column set changes the behavior of SELECT * queries. The query will return the column set as an XML column and not return the individual sparse columns. Schema designers and software developers must be careful not to break existing applications. Individual sparse columns can still be queried by name in a SELECT statement.

So the view will never contain that column, unless you specifically select it, not just using select *.


There is another issue that you would get even if it wasn't SPARSE.

You are adding the column after creating the view.

You need to then run the following statement:

EXEC sp_refreshview N'dbo.v_my_view';

When creating a view, the view is parsed into a compiled expression tree (without any optimizations). Then, when you use the view, the compiler does not simply dump the view text into the outer query. Instead, it parses the outer query into an expression tree, and uses the expression tree from the view in the correct place.

So when you add a column, the expression tree is not updated. So you need to refresh the view definition.


You also need to rebuild any stored procedures which access this table or the view, for the same reason.

EXEC sp_refreshsqlmodule N'dbo.YourProc';

db<>fiddle

Is SPARSE the best way to implement an SQL column union?

According to the BOL section Estimated Space Savings by Data Type you should consider SPARSE for INT columns if at least 64% are NULL.

You have 94.7% NULL so seems like you could benefit from this to me. This does seem an unusual design though.

Remove SPARSE from column definition

According to Microsoft:

Changing a column from sparse to nonsparse or nonsparse to sparse requires changing the storage format of the column. The SQL Server Database Engine uses the following procedure to accomplish this change:

1) Adds a new column to the table in the new storage size and format.

2) For each row in the table, updates and copies the value stored in the old column to the new column.

3) Removes the old column from the table schema.

4) Rebuilds the table (if there is no clustered index) or rebuilds the clustered index to reclaim space used by the old column.

Note

Step 2 can fail when the size of the data in the row exceeds the maximum allowable row size. This size includes the size of the data stored in the old column and the updated data stored in the new column. This limit is 8060 bytes for tables that do not contain any sparse columns or 8018 bytes for tables that contain sparse columns. This error can occur even if all eligible columns have been pushed off-row.

https://learn.microsoft.com/en-us/sql/relational-databases/tables/use-sparse-columns



Related Topics



Leave a reply



Submit