Oracle: Single Multicolumn Index or Two Single Column Indexes

Oracle: Single multicolumn index or two single column indexes

It depends...

It is quite unlikely that an index on just column1 will be beneficial if you already have a composite index on (column1, column2). Since column1 is the leading column of that index, queries against the table that have only column1 as a predicate will be able to use the composite index. If you are frequently running queries that need a full scan of the index, and the presence of column2 substantially increases the size of the index, it is possible that an index on just column1 would be more efficient, since the full index scan would need to do less I/O. But that is a pretty unusual situation.
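To make the "leading column" point concrete, here is a minimal Python sketch that models a composite B-tree index on (column1, column2) as a sorted list of (column1, column2, rowid) tuples. The table contents, the rowids and the function name range_scan_leading are hypothetical, and a plain sorted list is only a stand-in for Oracle's real index structure.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table rows: (rowid, column1, column2).
ROWS = [(1, "a", 10), (2, "b", 20), (3, "a", 30), (4, "c", 10), (5, "b", 5)]

# Toy composite index on (column1, column2): entries sorted by column1 first,
# then column2, like the leaf level of a B-tree. A model, not Oracle's format.
COMPOSITE = sorted((c1, c2, rowid) for rowid, c1, c2 in ROWS)


def range_scan_leading(index, value):
    """Return the rowids whose leading column (column1) equals value.

    Because entries are ordered by column1 first, all matches sit next to
    each other, so two binary searches find where they start and end."""
    # A real B-tree navigates its branch blocks; here we just extract the
    # leading column so that bisect can search it.
    leading = [entry[0] for entry in index]
    lo = bisect_left(leading, value)
    hi = bisect_right(leading, value)
    return [entry[2] for entry in index[lo:hi]]


print(range_scan_leading(COMPOSITE, "b"))  # [5, 2]
```

The same sorted list does not help a predicate on column2 alone: the two entries with column2 = 10 sit at the first and last positions, so finding them means reading every entry.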

An index on just column2 may be beneficial if some of your queries against the table specify predicates on just column2. If there are relatively few distinct values of column1, Oracle could do an index skip scan using the composite index to satisfy queries that only specify column2 as a predicate. But a skip scan is likely to be much less efficient than a range scan, so it is reasonably likely that an index on just column2 would benefit those queries. If there are a large number of distinct values for column1, the skip scan would be even less efficient and an index on just column2 would be more beneficial. Of course, if you never query the table on column2 without also specifying a predicate on column1, you wouldn't need an index on just column2.
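Why the number of distinct column1 values matters can be sketched by modeling the composite index as a sorted list of (column1, column2, rowid) tuples. The hypothetical skip_scan function below does one binary-search probe per distinct column1 value; it illustrates the idea behind an index skip scan and is not Oracle's actual algorithm. The data sets (100 rows with either 2 or 50 distinct column1 values) are made up.

```python
from bisect import bisect_left, bisect_right


def build_index(rows):
    """Toy composite index on (column1, column2): sorted (column1, column2, rowid) tuples."""
    return sorted((c1, c2, rowid) for rowid, c1, c2 in rows)


def skip_scan(index, col2_value):
    """Find rows with column2 == col2_value without a predicate on column1.

    For each distinct column1 value, do one binary-search probe for
    (column1, col2_value), then skip to the next column1 value.
    Returns (matching rowids, number of probes)."""
    leading = [entry[0] for entry in index]
    pairs = [(entry[0], entry[1]) for entry in index]
    matches, probes, start = [], 0, 0
    while start < len(index):
        c1 = index[start][0]
        end = bisect_right(leading, c1)  # end of this column1 group
        probes += 1
        pos = bisect_left(pairs, (c1, col2_value), start, end)
        while pos < end and index[pos][1] == col2_value:
            matches.append(index[pos][2])
            pos += 1
        start = end  # skip to the next distinct column1 value
    return matches, probes


# Made-up data: 100 rows, column2 = rowid % 10, with 2 vs 50 distinct column1 values.
few = build_index([(rowid, rowid % 2, rowid % 10) for rowid in range(100)])
many = build_index([(rowid, rowid % 50, rowid % 10) for rowid in range(100)])

print(skip_scan(few, 3)[1])   # 2 probes
print(skip_scan(many, 3)[1])  # 50 probes
```

A dedicated index on column2 would answer the same query with a single range scan, which is why the separate index pays off more as the number of distinct column1 values grows.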

Oracle multiple vs single column index

Short answer: always measure real performance rather than relying on theory. This also means my answer should be verified against your real database.

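In SQL databases (Oracle, PostgreSQL, MS SQL Server, etc.), and especially in those that store a table clustered by its primary key, the primary key serves at least two purposes:

  • Ordering of rows (e.g. if the PK only ever increases, new rows are always appended at the end).
  • Locating a row. Any additional (secondary) index contains the whole PK, so the engine can jump from the secondary index entry to the row itself. (In a heap-organized table, such as an ordinary Oracle table, secondary indexes store a physical row address, the ROWID, instead.)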
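The "locating a row" role can be sketched as follows, assuming an engine that clusters the table by primary key (as MySQL's InnoDB does): the secondary index on b stores (b, pk) pairs, and each hit costs a second lookup by PK. TABLE, SECONDARY_B and find_by_b are hypothetical names, and a dict stands in for the clustered index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table clustered by primary key: pk -> full row (a dict stands in
# for the clustered index).
TABLE = {
    1: {"pk": 1, "b": 30, "c": "x"},
    2: {"pk": 2, "b": 10, "c": "y"},
    3: {"pk": 3, "b": 30, "c": "z"},
}

# Secondary index on b: sorted (b, pk) entries. It stores the PK, not the row.
SECONDARY_B = sorted((row["b"], pk) for pk, row in TABLE.items())


def find_by_b(value):
    """Range-scan the secondary index, then jump to each row through its PK."""
    keys = [b for b, _ in SECONDARY_B]
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    return [TABLE[pk] for _, pk in SECONDARY_B[lo:hi]]


print([row["c"] for row in find_by_b(30)])  # ['x', 'z']
```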

If I create an index on column B only, will that already improve my query?
Would the strategy behind this query benefit from the index on column B?

It depends. If your table is small, Oracle may simply do a full scan of it. For a large table, Oracle can (and in the common case will) use the index on column B and do a range scan. In that case Oracle checks every entry with B=30. Therefore, if only one row has B=30, you get good performance; if millions of rows have B=30, Oracle will need millions of reads. Oracle gets this information from optimizer statistics.

Q1 - If so, why should I create an index with those two columns?

It gives direct access to the row: Oracle needs just a few jumps to find it. Moreover, you can declare the index UNIQUE to help Oracle; it will then know that at most one row can be returned.

However, if the query needs other columns of your table, the real execution plan will also include a table access (via the PK, or via the ROWID in an ordinary Oracle table) to retrieve them.
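A rough sketch of why the two-column index helps, using made-up data where 1,000 rows share B = 30: an index on B alone must visit every B = 30 entry and check C on each row, while an index on (B, C) narrows straight to the match. Everything here (TABLE, IDX_B, IDX_BC, the lookup functions) is hypothetical, and sorted lists stand in for real B-tree indexes.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: pk -> (b, c). 1,000 rows share b = 30, each with its own c.
TABLE = {pk: (30, pk) for pk in range(1000)}
TABLE.update({1000 + i: (40, i) for i in range(10)})

IDX_B = sorted((b, pk) for pk, (b, c) in TABLE.items())      # index on (B)
IDX_BC = sorted((b, c, pk) for pk, (b, c) in TABLE.items())  # index on (B, C)


def lookup_with_b_index(b, c):
    """Range-scan the B index, then visit each row to test C.

    Returns (matching pks, index entries visited)."""
    keys = [entry[0] for entry in IDX_B]
    lo, hi = bisect_left(keys, b), bisect_right(keys, b)
    pks = [pk for _, pk in IDX_B[lo:hi] if TABLE[pk][1] == c]
    return pks, hi - lo


def lookup_with_bc_index(b, c):
    """Range-scan the (B, C) index on both columns at once."""
    keys = [(entry[0], entry[1]) for entry in IDX_BC]
    lo, hi = bisect_left(keys, (b, c)), bisect_right(keys, (b, c))
    return [entry[2] for entry in IDX_BC[lo:hi]], hi - lo


print(lookup_with_b_index(30, 7))   # ([7], 1000)
print(lookup_with_bc_index(30, 7))  # ([7], 1)
```

Declaring the (B, C) index UNIQUE would additionally tell the optimizer that at most one entry can match; the sketch does not model that.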

If I decide to create an index on B and C, and I query filtering only on B, would that query benefit from the index?

Yes. If an index has several columns, Oracle sorts its entries by those columns in order. E.g. if you create an index on columns (B, C), Oracle is able to use it to look up values such as "B=30", i.e. when you restrict only B.

Oracle SQL: Single Index with two Columns vs index on one Column

For this particular query, you want the two column index version:

create index ind_tableb on tableb (data, id);

The above index, if used, would let Oracle rapidly look up tabled.data values for a potential match with a tableb.data value. If a match is found, the same index also contains the tableb.ID value needed for the next join to tablec. If you used the single-column index on tableb.data alone, Oracle would have to go back to the tableb table to find the ID values. This could hurt performance and might even cause the index not to be used.
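The effect of including ID in the index can be sketched like this. Assumptions: a heap table addressed by hypothetical row addresses (100 to 103, standing in for Oracle ROWIDs), and sorted lists standing in for the two candidate indexes; the function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical heap table: row address -> (id, data).
TABLEB = {100: (1, "red"), 101: (2, "blue"), 102: (3, "red"), 103: (4, "green")}

IDX_DATA = sorted((data, addr) for addr, (_id, data) in TABLEB.items())          # (data)
IDX_DATA_ID = sorted((data, _id, addr) for addr, (_id, data) in TABLEB.items())  # (data, id)


def _bounds(index, value):
    """Binary-search the first column of a sorted index for value."""
    keys = [entry[0] for entry in index]
    return bisect_left(keys, value), bisect_right(keys, value)


def ids_via_data_index(value):
    """Index on (data) only: every hit needs a table visit to read id."""
    lo, hi = _bounds(IDX_DATA, value)
    ids, table_visits = [], 0
    for _, addr in IDX_DATA[lo:hi]:
        table_visits += 1
        ids.append(TABLEB[addr][0])
    return ids, table_visits


def ids_via_covering_index(value):
    """Index on (data, id): id is read from the index entries themselves."""
    lo, hi = _bounds(IDX_DATA_ID, value)
    return [entry[1] for entry in IDX_DATA_ID[lo:hi]], 0


print(ids_via_data_index("red"))      # ([1, 3], 2)
print(ids_via_covering_index("red"))  # ([1, 3], 0)
```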

How does a multi-column index work in oracle?

You can think of the index key as conceptually being the 'concatenation' of all of the columns, and generally you need to specify a leading element of that key in order to get benefit from the index. So, for an index on (company, store, sku):

WHERE s.company = 1 AND s.store = 1 AND s.sku = 123;

can potentially benefit from the index

WHERE s.store = 1 AND s.sku = 123;

is unlikely to benefit (but see footnote below)

WHERE s.company = 1 AND s.store = 1;

can potentially benefit from the index.

In all cases I say "potentially", because it is a costing decision by the optimizer. For example, if I only have (say) 2 companies and 2 stores, then a query on company and store, whilst it could use the index, is perhaps better off not doing so, because the volume of data to be queried is still a large percentage of the table.

In your example, it might be the case that an index on (store, sku, company) would be "good enough" to satisfy all three, but that depends on the distribution of data. But you're thinking the right way, i.e., get as much value from as few indexes as possible.

Footnote: There is a thing called a "skip scan" where we can get value from an index even if you do not specify the leading column(s), but you will typically only see that if the number of distinct values in those leading columns is low.
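The "leading element" rule can be expressed as a small helper: given the index columns in order and the columns that have equality predicates, it returns the prefix of the index that can be used to navigate it. This is a simplified sketch (usable_prefix is a hypothetical name); it ignores range predicates, skip scans and the optimizer's costing.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that are covered by equality predicates.

    Matching stops at the first index column without a predicate, because
    entries are only ordered by a column within equal values of the columns
    before it."""
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break
        prefix.append(column)
    return prefix


INDEX = ("company", "store", "sku")
print(usable_prefix(INDEX, {"company", "store", "sku"}))  # ['company', 'store', 'sku']
print(usable_prefix(INDEX, {"store", "sku"}))             # []
print(usable_prefix(INDEX, {"company", "store"}))         # ['company', 'store']
```

With the alternative order (store, sku, company), the store-and-sku query gets ['store', 'sku'] and the company-and-store query still gets ['store'], which is the sense in which that order may be "good enough" for all three queries.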

Single-column vs. multi-column index for separate but always together joins

It depends on the join type chosen:

  • With a nested loop join, an index on the join condition of the lookup tables would help.

  • For a hash join, no index helps.

  • For a merge join, an index on the join condition of the lookup table may help.

It all depends on the cardinalities.

A multi-column index is definitely the wrong thing.
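To illustrate why an index matters for a nested loop join but not for a hash join, here is a Python sketch. A dict built ahead of time plays the role of an index on the lookup table's join column; the hash join builds its own hash table while the query runs, so a pre-existing index gives it nothing to use. Row layouts and function names are hypothetical, and merge joins are not modeled.

```python
from collections import defaultdict


def build_lookup(rows, key):
    """Group rows by their join-column value (a stand-in for an index)."""
    lookup = defaultdict(list)
    for row in rows:
        lookup[row[key]].append(row)
    return lookup


def nested_loop_join(outer, inner_index, key):
    """For every outer row, probe an index that already exists on the inner table."""
    return [(o, i) for o in outer for i in inner_index.get(o[key], [])]


def hash_join(outer, inner, key):
    """Read the whole inner input and build a hash table during the query, then probe it."""
    table = build_lookup(inner, key)
    return [(o, i) for o in outer for i in table.get(o[key], [])]


orders = [{"cust": 1, "item": "pen"}, {"cust": 2, "item": "ink"}]
customers = [{"cust": 1, "name": "Ann"}, {"cust": 3, "name": "Bob"}]

customer_index = build_lookup(customers, "cust")  # built once, before any query
print(nested_loop_join(orders, customer_index, "cust"))
print(hash_join(orders, customers, "cust"))
```

Both calls return the same single matching pair; the difference is only in when the lookup structure is built and how much of the inner table each approach has to read.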

Two single-column indexes vs one two-column index in MySQL?

If you have two single-column indexes, typically only one of them will be used in your example (MySQL's index merge optimization can sometimes combine them).

If you have an index with two columns, the query might be faster (you should measure). A two column index can also be used as a single column index, but only for the column listed first.

Sometimes it can be useful to have an index on (A,B) and another index on (B). This makes queries using either or both of the columns fast, but of course also uses more disk space.

When choosing the indexes, you also need to consider the effect on inserting, deleting and updating. More indexes = slower updates.

Multiple indexes vs single index on multiple columns in postgresql

Regardless of how many indexes you have created on a relation, only one of them will be used in a given query (which one depends on the query, statistics, etc.). So in your case you wouldn't get a cumulative advantage from creating two single-column indexes. To get the most performance from an index, I would suggest a composite index on (location, timestamp).

Note that queries like ... WHERE timestamp BETWEEN smth AND smth will generally not use the index above efficiently, while queries like ... WHERE location = 'smth' or ... WHERE location = 'smth' AND timestamp BETWEEN smth AND smth will. This is because the first column of the index determines how its entries are searched and sorted.
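A sketch of why (location, timestamp) suits the second and third query shapes but not the first, using sorted tuples as a stand-in for a PostgreSQL B-tree. The readings, the integer timestamps and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical readings: (location, timestamp, reading_id); timestamps are ints for simplicity.
READINGS = [("berlin", 5, 1), ("oslo", 3, 2), ("berlin", 1, 3), ("berlin", 9, 4), ("oslo", 7, 5)]
INDEX = sorted(READINGS)  # ordered by (location, timestamp), like the composite index


def location_time_range(location, t_from, t_to):
    """Equality on the leading column plus an inclusive range on the second
    column maps to one contiguous slice of the index."""
    keys = [(loc, ts) for loc, ts, _ in INDEX]
    lo = bisect_left(keys, (location, t_from))
    hi = bisect_right(keys, (location, t_to))
    return [rid for _, _, rid in INDEX[lo:hi]]


def time_range_only(t_from, t_to):
    """Without a location, matches are scattered across the index,
    so every entry has to be checked."""
    return sorted(rid for _, ts, rid in INDEX if t_from <= ts <= t_to)


print(location_time_range("berlin", 2, 9))  # [1, 4]
print(time_range_only(3, 7))                # [1, 2, 5]
```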

Don't forget to run

ANALYZE;

after index creation in order to collect statistics.

Update:
As @MondKin mentioned in the comments, certain queries can actually use several indexes on the same relation. For example, a query with OR clauses like a = 123 OR b = 456 (assuming there are indexes on both columns). In this case Postgres would perform bitmap index scans on both indexes, build a union of the resulting bitmaps and use it for a bitmap heap scan. Under certain conditions the same scheme may be used for AND queries, but with an intersection instead of a union.
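The bitmap combination step can be sketched with Python integers used as bitmaps, where bit i stands for row position i. This is a simplification (PostgreSQL's bitmaps track heap pages and tuple offsets, and can become lossy), and the row positions are made up.

```python
def bitmap_from_positions(positions):
    """Build an integer bitmap with bit i set for each row position i."""
    bits = 0
    for p in positions:
        bits |= 1 << p
    return bits


def positions_from_bitmap(bits):
    """List the set bit positions in ascending order (the rows to fetch)."""
    out, i = [], 0
    while bits:
        if bits & 1:
            out.append(i)
        bits >>= 1
        i += 1
    return out


# Hypothetical results of two bitmap index scans.
a_matches = bitmap_from_positions([1, 4, 7])  # rows where a = 123
b_matches = bitmap_from_positions([4, 9])     # rows where b = 456

print(positions_from_bitmap(a_matches | b_matches))  # OR  -> [1, 4, 7, 9]
print(positions_from_bitmap(a_matches & b_matches))  # AND -> [4]
```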

Multiple Indexes vs Multi-Column Indexes

I agree with Cade Roux.

This article should get you on the right track:

  • Indexes in SQL Server 2005/2008 – Best Practices, Part 1
  • Indexes in SQL Server 2005/2008 – Part 2 – Internals

One thing to note: clustered indexes should have a unique key (I would recommend an identity column) as the first column.
Basically, this makes your data insert at the end of the index instead of causing lots of disk I/O and page splits.
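The insert-at-the-end argument can be illustrated with a toy model of a clustered index made of fixed-capacity pages. This is a simplified sketch, not SQL Server's actual page-split algorithm: count_mid_page_splits is a hypothetical name, the capacity of 4 keys per page is arbitrary, and an ever-increasing key (like an identity column) is modeled as ascending integers.

```python
from bisect import insort


def count_mid_page_splits(keys, capacity=4):
    """Insert keys into a toy clustered index of fixed-capacity pages.

    Returns how many times a full page had to be split in the middle.
    Inserting past the end of the rightmost page just starts a new page,
    mimicking the append-only pattern of an ever-increasing key."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Target page: the last page whose first key is <= key (page 0 otherwise).
        idx = 0
        for i, candidate in enumerate(pages):
            if candidate and candidate[0] <= key:
                idx = i
        page = pages[idx]
        insort(page, key)
        if len(page) > capacity:
            if idx == len(pages) - 1 and page[-1] == key:
                pages.append([page.pop()])  # start a fresh page, no split
            else:
                mid = len(page) // 2
                pages.insert(idx + 1, page[mid:])  # move the upper half to a new page
                del page[mid:]
                splits += 1
    return splits


print(count_mid_page_splits(range(1, 101)))  # 0: ascending keys only append
print(count_mid_page_splits(range(100, 0, -1)))
```

Descending (or random) keys keep landing inside pages that are already full, so the second call reports a positive number of splits.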

Secondly, if you are creating other indexes on your data and they are constructed cleverly, they will be reused.

E.g. imagine you search a table on three columns: state, county, zip.

  • you sometimes search by state only.
  • you sometimes search by state and county.
  • you frequently search by state, county, zip.

Then an index on (state, county, zip) will be used in all three of these searches.

If you search by zip alone quite a lot then the above index will not be used (by SQL Server anyway) as zip is the third part of that index and the query optimiser will not see that index as helpful.

You could then create an index on Zip alone that would be used in this instance.

By the way, we can take advantage of the fact that with a multi-column index the first index column is always usable for searching: when you search only by 'state' it is efficient, though not quite as efficient as a single-column index on 'state' (the composite index is larger).

I guess the answer you are looking for is that it depends on the WHERE clauses of your frequently used queries, and also on your GROUP BYs.

The article will help a lot. :-)


