Create an Index on SQL View with Union Operators? Will It Really Improve Performance

Create an index on SQL view with UNION operators? Will it really improve performance?

You cannot create an index on a view that makes use of a union operator. Really no way around that, sorry!

I would imagine you've seen this, but check out this MSDN page. It gives the requirements for indexed views and explains what they are and how they work.

As to whether or not you'd see a performance benefit if you COULD index the view, that would depend entirely on the size of your tables. I would not expect any impact on creating separate indexed views, as I would assume that your tables are already indexed and you aren't doing any joining or logic in the view.

Performance of a view compared to a query (with union statements)

You don't mention which version or edition of MSSQL you have, but you may be looking for "partitioned views" or even "partitioned tables" (see Books Online for full details). Whether they have any benefit for you or not depends on how much data you have, and testing would be the best way to find out.

And to answer your general question, when you have a view in a query the view name is replaced with the view definition when you query it, so MSSQL never 'sees' the view anyway. The exception is indexed views, where the query planner can take the data from the view, not the table. But indexed views have disadvantages too. Books Online has a lot of relevant information: look for "View Resolution", "Resolving Indexes on Views", "Creating Indexed Views" etc.

Creating index on schema binding view in SQL Server

The problem is not with SCHEMABINDING but because UNION ALL is not allowed in an index view. I would expect the error message to be clear about that. See the documentation for index view requirements.

Also, why index this view? I wouldn't expect the indexed view will improve performance significantly, assuming the underlying tables have a unique nonclustered index on prodid and prodname.

SQL Performance UNION vs OR

Either the article you read used a bad example, or you misinterpreted their point.

select username from users where company = 'bbc' or company = 'itv';

This is equivalent to:

select username from users where company IN ('bbc', 'itv');

MySQL can use an index on company for this query just fine. There's no need to do any UNION.

The more tricky case is where you have an OR condition that involves two different columns.

select username from users where company = 'bbc' or city = 'London';

Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.

The UNION solution is for this type of case.

select username from users where company = 'bbc' 
union
select username from users where city = 'London';

Now each sub-query can use the index for its search, and the results of the subquery are combined by the UNION.


An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.

My response is that that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.

The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.

Use A Union Or A Join - What Is Faster

Union will be faster, as it simply passes the first SELECT statement, and then parses the second SELECT statement and adds the results to the end of the output table.

The Join will go through each row of both tables, finding matches in the other table therefore needing a lot more processing due to searching for matching rows for each and every row.

EDIT

By Union, I mean Union All as it seemed adequate for what you were trying to achieve. Although a normal Union is generally faster then Join.

EDIT 2 (Reply to @seebiscuit 's comment)

I don't agree with him. Technically speaking no matter how good your join is, a "JOIN" is still more expensive than a pure concatenation. I made a blog post to prove it at my blog codePERF[dot]net. Practically speaking they serve 2 completely different purposes and it is more important to ensure your indexing is right and using the right tool for the job.

Technically, I think it can be summed using the following 2 execution plans taken from my blog post:

UNION ALL Execution Plan

UNION ALL Execution Plan

JOIN Execution Plan

JOIN Execution Plan

Practical Results

Practically speaking the difference on a clustered index lookup is negligible:

Benchmark Results

How to create indexed view of children count

I don't think you can achieve that using a CTE neither a LEFT JOIN because there are many restriction using the indexed views.

Workaround

I suggest splitting the query into two part:

  1. Create an indexed view instead of a common table expression (CTE)
  2. Create a non indexed view that perform the LEFT JOIN

Beside that, create a Non-Clustered index on Entity column in Table Example .

Then when you query the non-indexed view, it will use indexes

--CREATE TABLE
CREATE TABLE Example (
Id INT primary key,
Entity varchar(50),
Parent varchar(50)
)

--INSERT VALUES
INSERT INTO Example
VALUES
(1, 'A', NULL)
,(2, 'AA', 'A')
,(3, 'AB','A')
,(4, 'ABA', 'AB')
,(5, 'ABB', 'AB')
,(6, 'AAA', 'AA')
,(7, 'AAB', 'AA')
,(8, 'AAC', 'AA')

--CREATE NON CLUSTERED INDEX
CREATE NONCLUSTERED INDEX idx1 ON dbo.Example(Entity);

--CREATE Indexed View

CREATE VIEW dbo.ExampleView_1
WITH SCHEMABINDING
AS
SELECT Parent, COUNT_BIG(*) as Count
FROM dbo.Example
GROUP BY Parent

CREATE UNIQUE CLUSTERED INDEX idx ON dbo.ExampleView_1(Parent);

--Create non-indexed view
CREATE VIEW dbo.ExampleView_2
WITH SCHEMABINDING
AS
SELECT e.Entity, COALESCE(Count,0) Count
FROM dbo.Example e
LEFT JOIN dbo.ExampleView_1 g
ON e.Entity = g.Parent

So when you perform the following query:

SELECT * FROM dbo.ExampleView_2 WHERE Entity = 'A'

You can see that the view Clustered index and the Table Non-Clustered index are used in the execution plan:

Sample Image

Additional Information

I didn't find additional workarounds to replace the use of LEFT JOIN or UNION or CTE in indexed views, you can check many similar Stackoverflow questions:

  • Indexing views with a CTE
  • What to replace left join in a view so i can have an indexed view?
  • Create an index on SQL view with UNION operators? Will it really improve performance?

Update 1 - Splitting view vs. Cartesian join

To identify the better approach, i tried to compare both suggested approaches.

--The other approach (cartesian join)
CREATE TABLE TwoRows (
N INT primary key
)

INSERT INTO TwoRows
VALUES (1),(2)

CREATE VIEW dbo.indexedView WITH SCHEMABINDING AS
SELECT
IIF(T.N = 2, Entity, Parent) as Entity
, COUNT_BIG(*) as CountPlusOne
, COUNT_BIG(ALL IIF(T.N = 2, NULL, 1)) as Count
FROM dbo.Example E1
INNER JOIN dbo.TwoRows T
ON 1=1
WHERE IIF(T.N = 2, Entity, Parent) IS NOT NULL
GROUP BY IIF(T.N = 2, Entity, Parent)
GO

CREATE UNIQUE CLUSTERED INDEX testIndex ON indexedView(Entity)

I created each indexed view on seperate databases and performed the following query:

SELECT * FROM View WHERE Entity = 'AA'

Splitting view

Sample Image

Cartesian Join

Sample Image

Time Statistics

The time statistics shows that the Cartesian join approach execution time is higher than the Splitting view approach, as shown in the image below (cartesian join to the right):

Sample Image

Adding WITH(NOEXPAND)

Also i tried to add WITH(NOEXPAND) option the the cartesian join approach, to force the database engine to use the indexed view clustered index and the result was as following:

Sample Image

I cleared all caches and perform a comparison, the Time statistics comparison shows that the Splitting view approach is still faster than the cartesian join approach (WITH(NOEXPAND) approach to the right):

Sample Image

Enhancing the performance of insert query from multiple tables using UNION

I think you should work take into considerations two factors:

  1. Hardware and server specifications
  2. Data size

If the machine specifications you are working with are not very performant and you have a huge size of data, then distributing the INSERT operation (multiple INSERT) will enhance the performance because it will not consume the memory like a UNION operation.
If the data size is acceptable and can be handled by SQL Server allocated memory, then you should use common table expression with a SELECT INTO query:

WITH CTE_1 as (SELECT * FROM TABLE_1 
UNION ALL
SELECT * FROM TABLE_2
UNION ALL
SELECT * FROM TABLE_3)
SELECT *
INTo New_Table
FROM CTE_1

Also note the difference between UNION and UNION ALL operations:

A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results. Also try to avoid INSERT INTO and use SELECT INTO instead because it is minimally logged assuming proper trace flags are set.

Another thing to mention, (i didn't tested this approach but maybe it should gives better performance - and it may caused a huge index size) you should also try to create an indexed view over all tables (UNION query you have mentioned), then execute the query, as example:

SELECT * INTO ...  FROM vw_Unified

Update 1

If you are familiar with SSIS, performing data import process may gives better performance when using SSIS:

  • Data Import Performance Comparison T-SQL vs SSIS for large import
  • SSIS for table-to-table inserts vs. (SQL only) INSERT INTO () SELECT FROM approach
  • Implementing Foreach Looping Logic in SSIS


Related Topics



Leave a reply



Submit