What Are The Disadvantages of Having Many Indices

What are the disadvantages to Indexes in database tables?

One index on a table is not a big deal. You automatically have an index on columns (or combinations of columns) that are primary keys or declared as unique.

There is some overhead to an index. The index itself occupies space on disk and memory (when used). So, if space or memory are issues then too many indexes could be a problem. When data is inserted/updated/deleted, then the index needs to be maintained as well as the original data. This slows down updates and locks the tables (or parts of the tables), which can affect query processing.

A small number of indexes on each table is reasonable. These should be designed with the typical query load in mind. If you index every column in every table, then data modifications will slow down. If your data is static, then this is not an issue. However, eating up all the memory with indexes could still be one.

What are the disadvantages of having many indices?

Indexes slow down inserts and updates (which can become a really serious issue with locking) and cost disk space. That's pretty much it.

Disadvantages of having many Indexes in a SQL database?

  • insert/update performance when indexed columns are modified will be worse
  • more indexes will use more disk space
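To see the disk-space point in practice, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in (an assumption on my part: the question is about SQL databases in general, and the exact numbers will differ in your engine). The table `t`, its columns, and the helper `page_counts` are made up for illustration. It counts database pages before and after adding three indexes.

```python
import sqlite3


def page_counts(num_rows=10_000):
    """Return (pages_before, pages_after) around creating three indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (c1 INTEGER, c2 INTEGER, c3 INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        ((n, n, n) for n in range(num_rows)),
    )
    conn.commit()
    # page_count is the total number of pages the database currently uses
    before = conn.execute("PRAGMA page_count").fetchone()[0]

    # Each index is a separate B-tree that needs pages of its own
    for col in ("c1", "c2", "c3"):
        conn.execute(f"CREATE INDEX idx_{col} ON t ({col})")
    conn.commit()
    after = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return before, after


before, after = page_counts()
print(f"pages without indexes: {before}, with three indexes: {after}")
```

The absolute page counts depend on the SQLite version and page size, so treat them as a rough indication only. The point is just that every extra index adds storage on top of the table data.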

Disadvantages of creating multiple indexes in a PostgreSQL table

Indexes have several disadvantages.

First, they consume space. This may be inconsequential, but if your table is particularly large, it may have an impact.

Second, and more importantly, indexes carry a performance penalty when INSERTing new rows, DELETEing old ones, or UPDATEing existing values of the indexed column, because the DML statement now has to modify not only the table's data but the index's data too. Once again, this depends largely on your application's use case. If DML statements are so rare that their performance is a non-issue, this may not be a consideration.

Third (although this ties in strongly to my first point), remember that each time you create another database object, you are creating additional maintenance overhead: it's another index you'd have to occasionally rebuild and collect statistics for (depending on the RDBMS you're using, of course), another object to clutter the data dictionary, etc.

The bottom line all comes down to your use case. If you have important queries that you run often and that can be improved by this index - go for it. If you're running this query once in a blue moon, you probably wouldn't want to slow down all your INSERT statements.

Disadvantages of many indexes in MySQL

And is there any disadvantage of having this amount (or more than this) of indexes in DB?

I don't think this number of indexes will affect your performance.

However, note that indexes speed up SELECT statements, not INSERT statements.

The disadvantages of indexes, quoted from [here][1]:

When an index is created on the column(s), MySQL also creates a
separate file that is sorted, and contains only the field(s) you're
interested in sorting on.

Firstly, indexes take up disk space. Usually the space usage isn’t
significant, but if you create an index on every column in every
possible combination, the index file will grow much more quickly than
the data file. For a large table, the index file could reach the
operating system’s maximum file size.

Secondly, indexes slow down write queries such as INSERT, UPDATE and
DELETE. MySQL has to internally maintain the “pointers” to the inserted
rows in the actual data file, so there is a performance price to pay
for these write queries: every time a record is changed, the indexes
must be updated. However, you may be able to write your queries in a
way that does not cause very noticeable performance degradation.

[1]: spam link removed

Why and where to use INDEXes - pros and cons

Well, you could probably fill books about indices, but in short, here are a few things to think about when creating an index:

  • While it (mostly) speeds up a select, it slows down inserts, updates and deletes, because the database engine has to write not only the data but the index too.
  • An index needs space on hard disk and (much more important) in RAM. An index that cannot be held in RAM is pretty useless.
  • An index on a column with only a few different values doesn't speed up selects, because it cannot filter out many rows (for example a column "gender", which usually has only two different values: male, female).
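One rough way to judge whether a column is worth indexing is its selectivity: the number of distinct values divided by the number of rows. Below is a small sketch with Python's sqlite3 module. The `users` table, its contents and the `selectivity` helper are hypothetical, invented only to contrast a "username"-like column with a "gender"-like one.

```python
import sqlite3


def selectivity(conn, table, column):
    """Distinct values / total rows: near 1.0 is selective, near 0.0 is not."""
    # Identifiers are interpolated directly, so only pass trusted names here
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    ((f"user{n}", "male" if n % 2 else "female") for n in range(1000)),
)

print("username:", selectivity(conn, "users", "username"))
print("gender:  ", selectivity(conn, "users", "gender"))
```

A value close to 1 means an equality lookup narrows the table down to very few rows. A value close to 0, as for the two-valued column, means the index still leaves about half the table to read. This is only a heuristic: a rare value in a skewed column can still benefit from an index.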

If you use MySQL, for example, you can check whether the engine uses an index by adding "explain" before the select. For your example above: EXPLAIN SELECT TestField FROM Example WHERE username=XXXX
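The same kind of check can be tried without a MySQL server. The sketch below uses SQLite through Python's sqlite3 module, where the equivalent statement is EXPLAIN QUERY PLAN. That is SQLite syntax, not MySQL's, and its output format is different. The column types and the index name `idx_username` are assumptions added for illustration.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail text
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Example (username TEXT, TestField TEXT)")

query = "SELECT TestField FROM Example WHERE username = ?"

plan_without = plan(conn, query, ("XXXX",))   # no index yet: full scan
conn.execute("CREATE INDEX idx_username ON Example (username)")
plan_with = plan(conn, query, ("XXXX",))      # planner can now use the index

print("without index:", plan_without)
print("with index:   ", plan_with)
```

The exact wording of the plan text varies between SQLite versions, but the second plan should name `idx_username` while the first one describes a scan of the whole table.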

Using more than one index per table is dangerous?

You need to create exactly as many indexes as you need to create. No more, no less. It is as simple as that.

Everybody "knows" that an index will slow down DML statements on a table. But for some reason very few people actually bother to test just how "slow" it becomes in their context. Sometimes I get the impression that people think that adding another index will add several seconds to each inserted row, making it a game changing business tradeoff that some fictive hotshot user should decide in a board room.

I'd like to share an example that I just created on my 2 year old pc, using a standard MySQL installation. I know you tagged the question SQL Server, but the example should be easily converted. I insert 1,000,000 rows into three tables. One table without indexes, one table with one index and one table with nine indexes.

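/*
|| "if exists" lets the script run on a fresh database too
*/
drop table if exists numbers;
drop table if exists one_million_rows;
drop table if exists one_million_one_index;
drop table if exists one_million_nine_index;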

/*
|| Create a dummy table to assist in generating rows
*/
create table numbers(n int);

insert into numbers(n) values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

/*
|| Create a table consisting of 1,000,000 consecutive integers
*/
create table one_million_rows as
select d1.n + (d2.n * 10)
+ (d3.n * 100)
+ (d4.n * 1000)
+ (d5.n * 10000)
+ (d6.n * 100000) as n
from numbers d1
,numbers d2
,numbers d3
,numbers d4
,numbers d5
,numbers d6;

/*
|| Create an empty table with 9 integer columns.
|| One column will be indexed
*/
create table one_million_one_index(
c1 int, c2 int, c3 int
,c4 int, c5 int, c6 int
,c7 int, c8 int, c9 int
,index(c1)
);

/*
|| Create an empty table with 9 integer columns.
|| All nine columns will be indexed
*/
create table one_million_nine_index(
c1 int, c2 int, c3 int
,c4 int, c5 int, c6 int
,c7 int, c8 int, c9 int
,index(c1), index(c2), index(c3)
,index(c4), index(c5), index(c6)
,index(c7), index(c8), index(c9)
);

/*
|| Insert 1,000,000 rows in the table with one index
*/
insert into one_million_one_index(c1,c2,c3,c4,c5,c6,c7,c8,c9)
select n, n, n, n, n, n, n, n, n
from one_million_rows;

/*
|| Insert 1,000,000 rows in the table with nine indexes
*/
insert into one_million_nine_index(c1,c2,c3,c4,c5,c6,c7,c8,c9)
select n, n, n, n, n, n, n, n, n
from one_million_rows;

My timings are:

  • 1m rows into table without indexes: 0,45 seconds
  • 1m rows into table with 1 index: 1,5 seconds
  • 1m rows into table with 9 indexes: 6,98 seconds

I'm better with SQL than statistics and math, but I'd like to think that:
Adding 8 indexes to my table, added (6,98-1,5) 5,48 seconds in total. Each index would then have contributed 0,685 seconds (5,48 / 8) for all 1,000,000 rows. That would mean that the added overhead per row per index would have been 0,000000685 seconds. SOMEBODY CALL THE BOARD OF DIRECTORS!
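For anyone who wants to redo that arithmetic with their own timings, here it is as a few lines of Python. The numbers are the ones measured above. The variable names are just for illustration.

```python
ROWS = 1_000_000

# Timings from the test above, in seconds
time_one_index = 1.50
time_nine_indexes = 6.98

extra_indexes = 9 - 1
extra_seconds = time_nine_indexes - time_one_index   # cost of the 8 extra indexes
per_index = extra_seconds / extra_indexes            # per index, for all rows
per_row_per_index = per_index / ROWS                 # per index, per single row

print(f"extra time for {extra_indexes} indexes: {extra_seconds:.2f} s")
print(f"per index (all rows): {per_index:.3f} s")
print(f"per row per index: {per_row_per_index * 1e6:.3f} microseconds")
```

This assumes every index costs the same and that the cost scales linearly with the number of rows, which is a simplification.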

In conclusion, I'd like to say that the above test case doesn't prove a thing. It just shows that tonight, I was able to insert 1,000,000 consecutive integers into a table in a single-user environment. Your results will be different.

What is pro/con of having big ES index or several small ES indexes on same data?

There are two concepts here that you should understand -

  1. Sharding - Sharding is where we divide our data into partitions and assign each partition to a separate shard. Each shard can run on a different machine, so the work can be delegated across machines. Say we have 10 million documents and 10 machines. We create an index with 10 shards. Once we finish writing the 10 million documents to this index, each shard will hold roughly one million documents. The advantage of this architecture is that when you search the documents, the search runs in parallel on each shard. Since each shard here has its own machine, we use all 10 machines at once for searching and get the maximum performance out of them.
  2. "One index having 10 shards is the same as 10 indices having one shard" - What actually counts is the number of shards. An index name is just an abstraction built over shards. Whether you run a search on multiple indices or on a single index, the performance is determined by the number of shards the search actually runs on.
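The two points above can be modelled with a toy sketch in Python. This is not how Elasticsearch is implemented: the CRC32 hash, the in-memory lists and the thread pool are stand-ins chosen for illustration, and all function names are made up. It only shows the idea of routing each document to one shard and fanning a search out to every shard.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(doc_id, num_shards):
    """Pick a shard from a stable hash of the document id (illustrative hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_shards(docs, num_shards):
    """Distribute documents over num_shards in-memory 'shards'."""
    shards = [[] for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)].append((doc_id, text))
    return shards


def search_shard(shard, term):
    """Search a single shard: in a real cluster this runs on its own machine."""
    return [doc_id for doc_id, text in shard if term in text]


def search(shards, term):
    """Fan the query out to every shard and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(search_shard, shards, [term] * len(shards)))
    return sorted(doc_id for part in partials for doc_id in part)


# 1,000 fake log documents, every seventh one contains the word "error"
docs = {
    f"doc-{n}": ("error" if n % 7 == 0 else "ok") + f" event {n}"
    for n in range(1000)
}
shards = build_shards(docs, num_shards=10)
hits = search(shards, "error")

print("documents per shard:", [len(s) for s in shards])
print("matching documents:", len(hits))
```

Note that the search function never looks at an index name. It only sees a list of shards, which is the sense in which one index with 10 shards and 10 indices with one shard each do the same amount of work.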

By sharding, you distribute heavy operations like search and aggregation across machines. If you are sure there won't be new documents at a later point in time, and you have 1T of documents and, say, 100 machines, the best approach would be to create a single index with 100 shards and then index the data there.

Ideally one shard per machine is the best approach.

Answer to the comment

A single shard itself uses concurrency to the maximum, so it doesn't make sense to use multiple shards on the same machine. An index is a collection of similar documents. In a different scenario, data is split into indices based on different logic to capture the distributed behavior better. For example, let's say I store the log data of free subscribers in an index with only 2 shards, but I store the same kind of data for paid users in a separate index with 10 shards, so that performance for paid users is much better. So indices can be thought of as different sets of documents that have different semantic meaning.

So, to answer the comment: different indices are different sets of information with different semantic meanings. An index can be seen as something similar to a database in SQL. Hence I might store my bank transaction information in one index and my grocery purchase information in another.


