Oracle: Single multicolumn index or two single column indexes
It depends...
It is quite unlikely that an index on just column1 will be beneficial if you already have a composite index on column1, column2. Since column1 is the leading column of that index, queries against the table that have only column1 as a predicate will be able to use the composite index. If you are frequently running queries that need to do a full scan of the index and the presence of column2 substantially increases the size of the index, it is possible that an index on just column1 would be more efficient since the full index scan would need to do less I/O. But that is a pretty unusual situation.
An index on just column2 may be beneficial if some of your queries against the table specify predicates on just column2. If there are relatively few distinct values of column1, it is possible that Oracle could do an index skip scan using the composite index to satisfy queries that only specify column2 as a predicate. But a skip scan is likely to be much less efficient than a range scan so it is reasonably likely that an index on just column2 would benefit those queries. If there are a large number of distinct values for column1, the skip scan would be even less efficient and an index on just column2 would be more beneficial. Of course, if you never query the table using column2 without also specifying a predicate on column1, you wouldn't need an index on just column2.
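A rough way to see the leading-column behaviour for yourself is sketched below. It uses Python's built-in sqlite3 module rather than Oracle, purely so that it runs without a server. SQLite's planner is not Oracle's (with no statistics gathered it is not expected to attempt a skip scan), so treat this as an illustration of the principle only. The table t, the payload column and the index names are made up for the example.

```python
import sqlite3


def plan(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail text for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table; payload exists so SELECT * cannot be answered
# from the index alone.
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_c1_c2 ON t (column1, column2)")

# Predicate on the leading column: the composite index is usable.
by_c1 = plan(conn, "SELECT * FROM t WHERE column1 = 1")

# Predicate on the second column only: the composite index does not help here.
by_c2_before = plan(conn, "SELECT * FROM t WHERE column2 = 1")

# Add the single-column index discussed above and look again.
conn.execute("CREATE INDEX idx_c2 ON t (column2)")
by_c2_after = plan(conn, "SELECT * FROM t WHERE column2 = 1")

print("column1 only:          ", by_c1)
print("column2 only (before): ", by_c2_before)
print("column2 only (after):  ", by_c2_after)
```

On a real Oracle database you would compare the execution plans of the same queries instead, and the optimizer may still choose a full scan or a skip scan depending on the statistics.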
Oracle multiple vs single column index
Short answer: always check real performance, not theoretical performance. In other words, my answer needs to be verified against a real database.
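Inside a SQL database (Oracle, PostgreSQL, SQL Server, etc.) the primary key is used for at least two purposes:
- Ordering of rows (e.g. if the PK only ever increases, new values are always appended at the end)
- Link to the row. Any extra index has to store a row locator so that the database can jump from an index entry to the rest of the row. In engines that cluster the table on the primary key this locator is the whole PK; an ordinary Oracle heap table stores a ROWID instead.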
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
It depends. If the table is small, Oracle may simply do a full scan of it. For a large table Oracle can (and in the common case will) use the index on column B and do a range scan. In that case Oracle visits every entry with B=30. So if only one row has B=30 you get good performance; if millions of rows have it, Oracle needs millions of reads. Oracle estimates which case applies from the statistics it keeps.
Q1 - If so, why should I create an index with those two columns?
It gives direct access to the row: Oracle needs only a few jumps to find it. You can also declare the index unique to help Oracle; it then knows that at most one row will be returned.
However, if the query needs columns that are not in the index, the real execution plan will include an extra lookup (via the PK) to retrieve those other columns.
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
Yes. If an index has several columns, Oracle sorts its entries by those columns in the order they are listed. E.g. if you create an index on columns B, C then Oracle will be able to use it for conditions like "B=30", i.e. when you restrict only B.
Oracle SQL: Single Index with two Columns vs index on one Column
For this particular query, you want the two column index version:
create index ind_tableb on tableb (data, id);
The above index, if used, would let Oracle rapidly look up tabled.data values for a potential match with a tableb.data value. If a match is found, then the same index would also contain the tableb.ID value for the next join to tablec. If you just used the single column version on tableb.data alone, then Oracle would have to seek back to the tableb table to find the ID values. This could hurt performance and might even cause the index to not be used.
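The "index already contains the ID" point can be sketched with Python's built-in sqlite3 module. This is SQLite, not Oracle, and it is reduced to a single-table lookup instead of the three-table join, so it only illustrates the idea of a covering index. The column other and the index name ind_tableb_data are invented for the example.

```python
import sqlite3


def plan(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail text for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# id is deliberately not declared INTEGER PRIMARY KEY: in SQLite that would
# make it the rowid, which every index carries anyway.
conn.execute("CREATE TABLE tableb (id INTEGER, data TEXT, other TEXT)")

query = "SELECT id FROM tableb WHERE data = 'x'"

# Single-column index: the lookup finds the entry, then must visit the table
# to fetch id.
conn.execute("CREATE INDEX ind_tableb_data ON tableb (data)")
single_plan = plan(conn, query)

# Two-column index from the answer: id is stored in the index itself.
conn.execute("DROP INDEX ind_tableb_data")
conn.execute("CREATE INDEX ind_tableb ON tableb (data, id)")
composite_plan = plan(conn, query)

print("index on (data):    ", single_plan)
print("index on (data, id):", composite_plan)
```

SQLite reports the second case as a covering index, meaning the query is answered from the index without touching the table.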
How does a multi-column index work in oracle?
You can think of the index key as conceptually being the 'concatenation' of all of the columns, and generally you need to have a leading element of that key in order to get benefit from the index. So for an index on (company,store,sku) then
WHERE s.company = 1 AND s.store = 1 AND s.sku = 123;
can potentially benefit from the index
WHERE s.store = 1 AND s.sku = 123;
is unlikely to benefit (but see footnote below)
WHERE s.company = 1 AND s.store = 1;
can potentially benefit from the index.
In all cases, I say "potentially" etc, because it is a costing decision by the optimizer. For example, if I only have (say) 2 companies and 2 stores then a query on company and store, whilst it could use the index, is perhaps better off not doing so, because the volume of information to be queried is still a large percentage of the size of the table.
In your example, it might be the case that an index on (store,sku,company) would be "good enough" to satisfy all three, but that depends on the distribution of data. But you're thinking the right way, ie, get as much value from as few indexes as possible.
Footnote: There is a thing called a "skip scan" where we can get value from an index even if you do not specify the leading column(s), but you will typically only see that if the number of distinct values in those leading columns is low.
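The three WHERE clauses above can be tried against a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in, so the plans come from SQLite's planner rather than Oracle's optimizer, and on an empty table with no statistics no skip scan is expected. The table name sales, the qty column and the index name idx_sales are assumptions made for the example.

```python
import sqlite3


def plan(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail text for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# qty is not part of the index, so SELECT * always needs the table row.
conn.execute(
    "CREATE TABLE sales (company INTEGER, store INTEGER, sku INTEGER, qty INTEGER)"
)
conn.execute("CREATE INDEX idx_sales ON sales (company, store, sku)")

base = "SELECT * FROM sales s WHERE "
plans = {
    # All three columns, starting with the leading one.
    "company+store+sku": plan(
        conn, base + "s.company = 1 AND s.store = 1 AND s.sku = 123"
    ),
    # Leading column missing.
    "store+sku": plan(conn, base + "s.store = 1 AND s.sku = 123"),
    # Leading two columns only.
    "company+store": plan(conn, base + "s.company = 1 AND s.store = 1"),
}

for label, detail in plans.items():
    print(f"{label:18} {detail}")
```

Whether a real optimizer actually uses the index in the first and third cases still depends on costing, as described above.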
Single-column vs. multi-column index for separate but always together joins
It depends on the join type chosen:
With a nested loop join, an index on the join condition of the lookup tables would help.
For a hash join, no index helps.
For a merge join, an index on the join condition of the lookup table may help.
It all depends on the cardinalities.
A multi-column index is definitely the wrong thing.
Two single-column indexes vs one two-column index in MySQL?
If you have two single column indexes, only one of them will be used in your example.
If you have an index with two columns, the query might be faster (you should measure). A two column index can also be used as a single column index, but only for the column listed first.
Sometimes it can be useful to have an index on (A,B) and another index on (B). This makes queries using either or both of the columns fast, but of course it also uses more disk space.
When choosing the indexes, you also need to consider the effect on inserting, deleting and updating. More indexes = slower updates.
Multiple indexes vs single index on multiple columns in postgresql
Regardless of how many indexes you have created on a relation, only one of them will be used in a given query (which one depends on the query, statistics etc.). So in your case you wouldn't get a cumulative advantage from creating two single-column indexes. To get the most performance from an index I would suggest a composite index on (location, timestamp).
Note that queries like ... WHERE timestamp BETWEEN smth AND smth will not use the index above, while queries like ... WHERE location = 'smth' or ... WHERE location = 'smth' AND timestamp BETWEEN smth AND smth will. That is because the first attribute in the index is crucial for searching and sorting.
Don't forget to perform
ANALYZE;
after index creation in order to collect statistics.
Update:
As @MondKin mentioned in the comments, certain queries can actually use several indexes on the same relation. For example, a query with OR clauses like a = 123 OR b = 456 (assuming that there are indexes on both columns). In this case PostgreSQL would perform bitmap index scans on both indexes, build a union of the resulting bitmaps and use it for a bitmap heap scan. Under certain conditions the same scheme may be used for AND queries, but with an intersection instead of a union.
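The general idea that one query can draw on two indexes of the same table can be tried without a PostgreSQL server, using Python's built-in sqlite3 module. Note the swap: SQLite does not do bitmap scans; it has its own OR optimization that searches each index separately and combines the results. The table t, column c and the index names idx_a and idx_b are made up for this sketch.

```python
import sqlite3


def plan_lines(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail strings, one per plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_a ON t (a)")
conn.execute("CREATE INDEX idx_b ON t (b)")

# Each side of the OR can be answered by a different single-column index.
or_plan = plan_lines(conn, "SELECT * FROM t WHERE a = 123 OR b = 456")

for line in or_plan:
    print(line)
```

In PostgreSQL itself you would run EXPLAIN on the query and look for a BitmapOr node above the two bitmap index scans.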
Multiple Indexes vs Multi-Column Indexes
I agree with Cade Roux.
This article should get you on the right track:
- Indexes in SQL Server 2005/2008 – Best Practices, Part 1
- Indexes in SQL Server 2005/2008 – Part 2 – Internals
One thing to note, clustered indexes should have a unique key (an identity column I would recommend) as the first column.
Basically it helps your data insert at the end of the index and not cause lots of disk IO and Page splits.
Secondly, if you are creating other indexes on your data and they are constructed cleverly they will be reused.
e.g. imagine you search a table on three columns
state, county, zip.
- you sometimes search by state only.
- you sometimes search by state and county.
- you frequently search by state, county, zip.
Then an index on state, county, zip will be used in all three of these searches.
If you search by zip alone quite a lot then the above index will not be used (by SQL Server anyway) as zip is the third part of that index and the query optimiser will not see that index as helpful.
You could then create an index on Zip alone that would be used in this instance.
By the way, with a multi-column index the first index column is always usable for searching, so a search by 'state' alone is efficient, though not quite as efficient as with a single-column index on 'state'.
I guess the answer you are looking for is that it depends on your where clauses of your frequently used queries and also your group by's.
The article will help a lot. :-)