How to Create Unique Index Where Column Order Is Not Taken into Account (Set)

You can create an index on an expression, in this case least() and greatest():

create unique index idx_obj1_obj2 on your_table (least(Object1, Object2), greatest(Object1, Object2));

Note: there is one subtlety if the columns allow NULL values. In that case, the same value would only be allowed once, regardless of which column it is in. This can be fixed with a more complicated expression, if it is actually a problem.
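
As a runnable sketch of the idea (using SQLite purely for demonstration, where the two-argument min()/max() scalar functions play the role of least()/greatest(); the table and index names are made up, and NULL handling differs between engines):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (Object1 INT, Object2 INT)")

# SQLite's two-argument min()/max() behave like least()/greatest(),
# so the index always stores the pair in a canonical order
conn.execute(
    "CREATE UNIQUE INDEX idx_obj1_obj2 "
    "ON pairs (min(Object1, Object2), max(Object1, Object2))"
)

conn.execute("INSERT INTO pairs VALUES (1, 2)")
try:
    conn.execute("INSERT INTO pairs VALUES (2, 1)")  # same set {1, 2}
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The second insert fails because (2, 1) normalizes to the same (least, greatest) key as (1, 2), while a genuinely different pair such as (1, 3) is still accepted.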

Unique constraint for 2 columns that works both ways

You can create a unique index that always indexes the same order of values:

create unique index 
on friends (least(requestor, requestee), greatest(requestor, requestee));

How do I enforce set-like uniqueness between multiple columns?

create unique index idx_unique_ab 
on x (least(a,b), greatest(a,b));

Does column order matter when defining unique constraints

The order matters if you expect to ever use the index as a partial index. For example, suppose you had a unique index on (col1, col2), and you wanted to optimize the following query:

SELECT col1, col2 FROM foo WHERE col1 = 'stack';

The index on (col1, col2) could still be used here, because col1, which appears in the WHERE clause, is the leftmost portion of the index. Had you defined the unique constraint on (col2, col1), the index could not be used for this query.
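
The leftmost-prefix rule can be demonstrated with SQLite's EXPLAIN QUERY PLAN (SQLite is used here only as a convenient stand-in; the exact plan wording varies between versions, but the SEARCH-vs-SCAN distinction holds):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 TEXT, col2 TEXT)")
conn.execute("CREATE UNIQUE INDEX ux_foo ON foo (col1, col2)")

def access_path(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the access path
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

# col1 is the leftmost column of the index, so the engine can SEARCH (seek) it
left = access_path("SELECT col1, col2 FROM foo WHERE col1 = 'stack'")

# col2 alone is not a leftmost prefix of (col1, col2), so no seek is possible
right = access_path("SELECT col1, col2 FROM foo WHERE col2 = 'stack'")
```

The first query's plan reports an index SEARCH on ux_foo; the second falls back to a scan.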

Declaring an Index as unique in SQL Server

Long story short: if your data are intrinsically UNIQUE, you will benefit from creating a UNIQUE index on them.

See the article in my blog for a detailed explanation:

  • Making an index UNIQUE

Now, the gory details.

As @Mehrdad said, UNIQUENESS affects the estimated row count in the plan builder.

A UNIQUE index has the maximum possible selectivity, which is why:

SELECT  *
FROM table1 t1, table2 t2
WHERE t1.id = :myid
AND t2.unique_indexed_field = t1.value

will almost surely use NESTED LOOPS, while

SELECT  *
FROM table1 t1, table2 t2
WHERE t1.id = :myid
AND t2.non_unique_indexed_field = t1.value

may benefit from a HASH JOIN if the optimizer thinks that non_unique_indexed_field is not selective.

If your index is CLUSTERED (i.e. the rows themselves are contained in the index leaves) and non-UNIQUE, then a special hidden column called a uniquifier is added to each index key, making the key larger and the index slower.

That's why a UNIQUE CLUSTERED index is in fact a little more efficient than a non-UNIQUE CLUSTERED one.

In Oracle, a join on a UNIQUE INDEX is required for so-called key preservation, which ensures that each row from a table will be selected at most once and makes a view updatable.

This query:

UPDATE  (
SELECT *
FROM mytable t1, mytable t2
WHERE t2.reference = t1.unique_indexed_field
)
SET value = other_value

will work in Oracle, while this one:

UPDATE  (
SELECT *
FROM mytable t1, mytable t2
WHERE t2.reference = t1.non_unique_indexed_field
)
SET value = other_value

will fail.

This is not an issue with SQL Server, though.

One more thing: for a table like this,

CREATE TABLE t_indexer (id INT NOT NULL PRIMARY KEY, uval INT NOT NULL, ival INT NOT NULL)
CREATE UNIQUE INDEX ux_indexer_ux ON t_indexer (uval)
CREATE INDEX ix_indexer_ux ON t_indexer (ival)

this query:

/* Sorts on the non-unique index first */
SELECT TOP 1 *
FROM t_indexer
ORDER BY
ival, uval

will use a TOP N SORT, while this one:

/* Sorts on the unique index first */
SELECT TOP 1 *
FROM t_indexer
ORDER BY
uval, ival

will use just an index scan.

For the latter query there is no point in additionally sorting on ival, since uval is unique anyway, and the optimizer takes this into account.

On sample data of 200,000 rows (id == uval == ival), the former query runs for 15 seconds, while the latter one is instant.

Create unique constraint with null columns

Postgres 15 or newer

Postgres 15 adds the clause NULLS NOT DISTINCT. The release notes:

  • Allow unique constraints and indexes to treat NULL values as not distinct (Peter Eisentraut)

    Previously NULL values were always indexed as distinct values, but
    this can now be changed by creating constraints and indexes using
    UNIQUE NULLS NOT DISTINCT.

With this clause NULL is treated like just another value, and a UNIQUE constraint does not allow more than one row with the same NULL value. The task is simple now:

ALTER TABLE favorites
ADD CONSTRAINT favo_uni UNIQUE NULLS NOT DISTINCT (user_id, menu_id, recipe_id);

There are examples in the manual chapter "Unique Constraints".

The clause switches behavior for all keys of the same index. You can't treat NULL as equal for one key, but not for another.

NULLS DISTINCT remains the default (in line with standard SQL) and does not have to be spelled out.

The same clause works for a UNIQUE index, too:

CREATE UNIQUE INDEX favo_uni_idx
ON favorites (user_id, menu_id, recipe_id) NULLS NOT DISTINCT;

Note the position of the new clause after the key fields.

Postgres 14 or older

Create two partial indexes:

CREATE UNIQUE INDEX favo_3col_uni_idx ON favorites (user_id, menu_id, recipe_id)
WHERE menu_id IS NOT NULL;

CREATE UNIQUE INDEX favo_2col_uni_idx ON favorites (user_id, recipe_id)
WHERE menu_id IS NULL;

This way, there can only be one combination of (user_id, recipe_id) where menu_id IS NULL, effectively implementing the desired constraint.
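
A runnable sketch of this two-partial-index trick, using SQLite (which also supports partial indexes) in place of Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE favorites ("
    " user_id INT NOT NULL, menu_id INT, recipe_id INT NOT NULL)"
)
# Full three-column uniqueness when menu_id has a value...
conn.execute(
    "CREATE UNIQUE INDEX favo_3col_uni_idx"
    " ON favorites (user_id, menu_id, recipe_id) WHERE menu_id IS NOT NULL"
)
# ...and two-column uniqueness for the rows where menu_id is NULL
conn.execute(
    "CREATE UNIQUE INDEX favo_2col_uni_idx"
    " ON favorites (user_id, recipe_id) WHERE menu_id IS NULL"
)

conn.execute("INSERT INTO favorites VALUES (1, NULL, 7)")
try:
    # A second (1, NULL, 7) is caught by the partial index on (user_id, recipe_id)
    conn.execute("INSERT INTO favorites VALUES (1, NULL, 7)")
    null_duplicate_rejected = False
except sqlite3.IntegrityError:
    null_duplicate_rejected = True
```

A plain unique index on all three columns would have let both (1, NULL, 7) rows in, since NULLs compare as distinct by default.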

Possible drawbacks:

  • You cannot have a foreign key referencing (user_id, menu_id, recipe_id). (It seems unlikely you'd want an FK referencing three columns - use the PK column instead!)
  • You cannot base CLUSTER on a partial index.
  • Queries without a matching WHERE condition cannot use the partial index.

If you need a complete index, you can alternatively drop the WHERE condition from favo_3col_uni_idx and your requirements are still enforced.

The index, now comprising the whole table, overlaps with the other one and gets bigger. Depending on typical queries and the percentage of NULL values, this may or may not be useful. In extreme situations it may even help to maintain all three indexes (the two partial ones and a total on top).

This is a good solution for a single nullable column, maybe for two. But it gets out of hand quickly for more, as you need a separate partial index for every combination of nullable columns, so the number of indexes grows exponentially. For multiple nullable columns, see instead:

  • Why doesn't my UNIQUE constraint trigger?

Aside: I advise not to use mixed case identifiers in PostgreSQL.

Enforcing mutual uniqueness across multiple columns

You could create an "external" constraint in the form of an indexed view:

CREATE VIEW dbo.OccupiedRooms
WITH SCHEMABINDING
AS
SELECT r.Id
FROM dbo.Occupants AS o
INNER JOIN dbo.Rooms AS r ON r.Id IN (o.LivingRoomId, o.DiningRoomId)
;
GO

CREATE UNIQUE CLUSTERED INDEX UQ_1 ON dbo.OccupiedRooms (Id);

The view is essentially unpivoting the occupied rooms' IDs, putting them all in one column. The unique index on that column makes sure it does not have duplicates.


UPDATE

As hvd has correctly remarked, the above solution does not catch attempts to insert identical LivingRoomId and DiningRoomId values on the same row. This is because the dbo.Rooms table is matched only once in that case and, therefore, the join produces just one row for the pair of references.

One way to fix that is suggested in the same comment: additionally to the indexed view, use a CHECK constraint on the dbo.OccupiedRooms table to prohibit rows with identical room IDs. The suggested LivingRoomId <> DiningRoomId condition, however, will not work for cases where both columns are NULL. To account for that case, the condition could be expanded to this one:

LivingRoomId <> DiningRoomId AND (LivingRoomId IS NOT NULL OR DiningRoomId IS NOT NULL)
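
A sketch of how this expanded condition behaves under three-valued logic, using SQLite in place of SQL Server (in both engines a CHECK constraint passes when its condition is TRUE or unknown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Occupants ("
    " LivingRoomId INT, DiningRoomId INT,"
    " CHECK (LivingRoomId <> DiningRoomId"
    "        AND (LivingRoomId IS NOT NULL OR DiningRoomId IS NOT NULL)))"
)

def allowed(living, dining):
    # Returns True if the CHECK constraint lets the row in
    try:
        conn.execute("INSERT INTO Occupants VALUES (?, ?)", (living, dining))
        return True
    except sqlite3.IntegrityError:
        return False
```

With one NULL the comparison is unknown, so the row passes; with both NULL the second conjunct is FALSE, so the whole condition is FALSE and the row is rejected.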

Alternatively, you could change the view's SELECT statement to catch all situations. If LivingRoomId and DiningRoomId were NOT NULL columns, you could avoid a join to dbo.Rooms and unpivot the columns using a cross-join to a virtual 2-row table:

SELECT  Id = CASE x.r WHEN 1 THEN o.LivingRoomId ELSE o.DiningRoomId END
FROM dbo.Occupants AS o
CROSS
JOIN (SELECT 1 UNION ALL SELECT 2) AS x (r)

However, as those columns allow NULLs, this method would not allow you to insert more than one single-reference row. To make it work in your case, you would need to filter out NULL entries, but only if they come from rows where the other reference is not NULL. I believe adding the following WHERE clause to the above query would suffice:

WHERE o.LivingRoomId IS NULL AND o.DiningRoomId IS NULL
OR x.r = 1 AND o.LivingRoomId IS NOT NULL
OR x.r = 2 AND o.DiningRoomId IS NOT NULL

How can I create a unique constraint on my column (SQL Server 2008 R2)?

To create these constraints through the GUI, you need the "Indexes and Keys" dialog, not the check constraints one.

But in your case you just need to run the piece of code you already have. It doesn't need to be entered into the expression dialogue at all.

How do I create a unique constraint that also allows nulls?

SQL Server 2008+

You can create a unique index that accepts multiple NULLs by adding a WHERE clause (a filtered index), e.g. WHERE col IS NOT NULL.
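
A minimal sketch of such a filtered unique index, using SQLite in place of SQL Server (SQLite calls these partial indexes; the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code TEXT)")

# Filtered (partial) unique index: uniqueness is enforced only where the
# WHERE clause holds, so NULLs are simply left out of the index
conn.execute("CREATE UNIQUE INDEX ux_t_code ON t (code) WHERE code IS NOT NULL")

conn.execute("INSERT INTO t VALUES (NULL)")
conn.execute("INSERT INTO t VALUES (NULL)")    # multiple NULLs are allowed
conn.execute("INSERT INTO t VALUES ('a')")
try:
    conn.execute("INSERT INTO t VALUES ('a')")  # duplicate non-NULL value
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Any number of NULL rows coexist, while duplicate non-NULL values are still rejected.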

Prior to SQL Server 2008

You cannot create a UNIQUE constraint and allow NULLs. One workaround is to set a default value of NEWID().

Update the existing NULL values to NEWID() before creating the UNIQUE constraint.


