Insertion of Data After Creating Index on Empty Table or Creating Unique Index After Inserting Data on Oracle


Insert your data first, then create your index.

Every time you do an UPDATE, INSERT or DELETE operation, any indexes on the table have to be updated as well. So if you create the index first, and then insert 10M rows, the index will have to be updated 10M times as well (unless you're doing bulk operations).

Is it better to create an index before filling a table with data, or after the data is in place?

Creating the index after the data insert is the more efficient approach (it is even often recommended to drop indexes before a batch import and recreate them afterwards).

Synthetic example (PostgreSQL 9.1, slow development machine, one million rows):

CREATE TABLE test1(id serial, x integer);
INSERT INTO test1(id, x) SELECT x.id, x.id*100 FROM generate_series(1,1000000) AS x(id);
-- Time: 7816.561 ms
CREATE INDEX test1_x ON test1 (x);
-- Time: 4183.614 ms

Insert and then create index - about 12 sec

CREATE TABLE test2(id serial, x integer);
CREATE INDEX test2_x ON test2 (x);
-- Time: 2.315 ms
INSERT INTO test2(id, x) SELECT x.id, x.id*100 FROM generate_series(1,1000000) AS x(id);
-- Time: 25399.460 ms

Create index and then insert - about 25.5 sec (more than two times slower)
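The drop-before-import, recreate-after pattern mentioned above can be sketched in the same PostgreSQL setup (reusing the `test2` table and index names from the example):

```sql
-- Drop the index before the bulk load...
DROP INDEX test2_x;

-- ...load the data without per-row index maintenance overhead...
INSERT INTO test2(id, x)
SELECT x.id, x.id * 100 FROM generate_series(1, 1000000) AS x(id);

-- ...then rebuild the index in a single pass over the table.
CREATE INDEX test2_x ON test2 (x);
```

The single-pass rebuild at the end is what recovers most of the time lost to row-by-row index updates.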

Creating indexes after data load Vs Before data load in a large table

Given that your table is very wide, and the indexes very narrow, creating non-clustered indexes on the table following the load should be preferred.

In this instance I would:

  1. Create the new table with the clustered index in place - converting a heap into a clustered index after the fact is computationally expensive.
  2. Load the data into the table using BULK INSERT (ensuring the operation is minimally logged), in the order of the clustered index key SwapData_ID.
  3. Create the non-clustered indexes.

The above approach should be optimal given your scenario.
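A minimal sketch of that sequence in SQL Server; the table, column, and file names here are assumptions for illustration, not from the original question:

```sql
-- Create the table with the clustered index already in place.
CREATE TABLE dbo.SwapData
(
    SwapData_ID int           NOT NULL,
    TradeDate   date          NOT NULL,
    Payload     nvarchar(400) NULL,
    CONSTRAINT PK_SwapData PRIMARY KEY CLUSTERED (SwapData_ID)
);

-- Bulk load in clustered-key order. TABLOCK plus a bulk-logged or simple
-- recovery model is what allows the load to be minimally logged.
BULK INSERT dbo.SwapData
FROM 'C:\load\swapdata.dat'
WITH (TABLOCK, ORDER (SwapData_ID ASC));

-- Non-clustered indexes only after the load completes.
CREATE NONCLUSTERED INDEX IX_SwapData_TradeDate
    ON dbo.SwapData (TradeDate);
```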

There are of course other questions around:

  - Data drift (will the source data change during your load process? Do those changes need to be carried across?)
  - DR (is log shipping enabled? If so, the recovery model may need to be changed to bulk-logged)
  - Log file sizing (you'll need to ensure your log file is big enough to accommodate the non-clustered index creations)
  - Pre-sizing the database (ensuring it doesn't auto-grow during the load)

but these all seem to be slightly outside the context of what you're asking.

create index on creation of table in Oracle

Other than indexes defined as part of a primary or unique constraint there does not appear to be a way to define an index as part of a CREATE TABLE statement in Oracle. Although the USING INDEX clause is part of the constraint-state element of the CREATE TABLE statement, a missing right parenthesis error is issued if you try to include a USING INDEX clause in any constraint definition except a PRIMARY or UNIQUE constraint - see this db<>fiddle for examples.
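To illustrate the one case where it does work, here is a sketch of a CREATE TABLE statement with a USING INDEX clause on a primary key constraint (table and index names are invented for the example):

```sql
-- Oracle accepts USING INDEX only on PRIMARY KEY or UNIQUE constraints.
CREATE TABLE orders
(
    order_id NUMBER,
    status   VARCHAR2(10),
    CONSTRAINT orders_pk PRIMARY KEY (order_id)
        USING INDEX (CREATE UNIQUE INDEX orders_pk_ix ON orders (order_id))
);

-- An ordinary non-unique index still needs its own statement afterwards.
CREATE INDEX orders_status_ix ON orders (status);
```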

As to "why" - that's a question only someone on the architecture team at Oracle could answer. From my personal user-oriented point of view, I see no particular value to being able to create an index as part of the CREATE TABLE statement, but then I'm accustomed to how Oracle works and have my thought patterns oriented in that particular direction. YMMV.

Oracle 11G - Performance effect of indexing at insert

It's true that it is faster to modify a table if you do not also have to modify one or more indexes and possibly perform constraint checking as well, but it is also largely irrelevant if you then have to add those indexes. You have to consider the complete change to the system that you wish to effect, not just a single part of it.

Obviously if you are adding a single row into a table that already contains millions of rows then it would be foolish to drop and rebuild indexes.

However, even if you have a completely empty table into which you are going to add several million rows it can still be slower to defer the indexing until afterwards.

The reason for this is that such an insert is best performed with the direct path mechanism, and when you use direct path inserts into a table with indexes on it, temporary segments are built that contain the data required to build the indexes (data plus rowids). If those temporary segments are much smaller than the table you have just loaded then they will also be faster to scan and to build the indexes from.

The alternative, if you have five indexes on the table, is to incur five full table scans after you have loaded it in order to build the indexes.
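A minimal sketch of a direct-path insert in Oracle (table names are hypothetical):

```sql
-- The APPEND hint requests a direct-path insert: data is written above the
-- high-water mark, and index maintenance is deferred to merge-style builds
-- from temporary segments rather than row-by-row updates.
INSERT /*+ APPEND */ INTO big_table
SELECT * FROM staging_table;

-- The directly loaded segment cannot be read until the transaction commits.
COMMIT;
```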

Obviously there are huge grey areas involved here, but well done for:

  1. Questioning authority and general rules of thumb, and
  2. Running actual tests to determine the facts in your own case.

Edit:

Further considerations -- suppose you run a backup while the indexes are dropped. Then, following an emergency restore, you need a script that verifies all indexes are in place, while the business is breathing down your neck to get the system back up.

Also, if you are absolutely determined not to maintain indexes during a bulk load, do not drop the indexes -- disable them instead. This preserves the metadata for the index's existence and definition, and allows a much simpler rebuild process. Just be careful that you do not accidentally re-enable indexes by truncating the table, as this will render disabled indexes enabled again.
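In Oracle terms, "disabling" a normal index means marking it UNUSABLE; a hedged sketch (index name is hypothetical):

```sql
-- Mark the index unusable: its definition survives, but it is not maintained.
ALTER INDEX big_table_ix UNUSABLE;

-- Allow DML to proceed against the table despite the unusable index.
ALTER SESSION SET skip_unusable_indexes = TRUE;

-- ... perform the bulk load here ...

-- Rebuild from the preserved metadata instead of re-issuing CREATE INDEX.
ALTER INDEX big_table_ix REBUILD;
```

Note that TRUNCATE TABLE resets unusable indexes to a usable state, which is exactly the accidental re-enabling warned about above.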

Column with Unique Index and Primary Key gives Unique Constraint Violation

Please execute a simple statement:

select * from admin.message_list where id = 1;

If it doesn't return any rows, check whether constraint PK_ID refers to the ID column and, if so, ask Oracle Support to fix a bug.

Insert data into one table from another table avoiding duplicates

OK - from your description, I understand table t2 is currently empty, and you want to copy the rows where id is in (1, 2, 4) from table t1 to table t2.

Why your code fails:

You seem to believe that the condition is applied to the first row in t1, it passes so it is inserted into t2, then the condition is applied to the second row in t1 (using what is already inserted in t2), etc. - and you don't understand why there is any attempt to insert ALL the rows from t1 into t2. Why doesn't the third row fail the WHERE clause?

Good question! The reason is that operations are done on a SET basis. The WHERE condition uses table t2 AS IT WAS before the INSERT operation began. So for ALL rows, the WHERE clause compares to an empty table t2.

How to fix this... Decide which id you want to add when there are duplicate names. For example, one way to get the result you said you wanted is to select MIN(id) for each name. Moreover, you still want to check if the name exists in t2 already (since you may do this again in the future, when t2 is already partially populated).

insert into t2 ( id, name )
select min(id), name
from t1
where name not in (select name from t2)
group by name
;
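One caveat worth adding: if t2.name can be NULL, the NOT IN subquery above will filter out every row. A NOT EXISTS version of the same insert avoids that pitfall:

```sql
insert into t2 ( id, name )
select min(t1.id), t1.name
from t1
where not exists (select 1 from t2 where t2.name = t1.name)
group by t1.name
;
```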

