Difference between truncation, transaction and deletion database strategies
The database cleaning strategies are named after database terminology. That is, the terms come from the (SQL) database world, so people familiar with database terminology will generally know what they mean.
The examples below refer to the SQL definitions. DatabaseCleaner also supports non-SQL databases, where the definitions are generally the same or similar.
Deletion
This means the database tables are cleaned using the SQL DELETE FROM statement. This is usually slower than truncation, but may have other advantages.
Truncation
This means the database tables are cleaned using the TRUNCATE TABLE statement. This simply empties the table immediately, without deleting the table structure itself or deleting records individually.
Transaction
This means using BEGIN TRANSACTION statements coupled with ROLLBACK to roll back a sequence of previous database operations. Think of it as an "undo button" for databases. I would think this is the most frequently used cleaning method, and probably the fastest, since changes need not be directly committed to the DB.
Example discussion: Rspec, Cucumber: best speed database clean strategy
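As a rough illustration of the three definitions above, the following Ruby sketch maps each strategy name to the SQL it conceptually boils down to. The helper `cleanup_statements` and the table names are hypothetical and only mirror the definitions given here; this is not how DatabaseCleaner is implemented internally.

```ruby
# Hypothetical helper: returns the SQL a strategy conceptually issues
# to clean the given tables. Illustration only.
def cleanup_statements(strategy, tables)
  case strategy
  when :deletion
    # One DELETE per table; rows are removed individually.
    tables.map { |table| "DELETE FROM #{table};" }
  when :truncation
    # One TRUNCATE per table; the table is emptied, its structure is kept.
    tables.map { |table| "TRUNCATE TABLE #{table};" }
  when :transaction
    # Nothing per table: the transaction opened before the test is rolled back.
    ["ROLLBACK;"]
  else
    raise ArgumentError, "unknown strategy: #{strategy.inspect}"
  end
end

puts cleanup_statements(:deletion, %w[users posts])
puts cleanup_statements(:transaction, %w[users posts])
```

Note that the transaction strategy does not depend on the number of tables at all, which is one reason it tends to be the cheapest of the three.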
Reason for truncation strategy with Capybara
The best explanation was found in the Capybara docs themselves:
# Transactional fixtures do not work with Selenium tests, because Capybara
# uses a separate server thread, which the transactions would be hidden
# from. We hence use DatabaseCleaner to truncate our test database.
Cleaning requirements
You do not necessarily have to clean your database after each test case. However, you need to be aware of the side effects this could have: if you create, modify, or delete some records in one step, will the other steps be affected by this?
Normally RSpec runs with transactional fixtures turned on, so you will never notice this when running RSpec - it will simply keep the database automatically clean for you:
https://www.relishapp.com/rspec/rspec-rails/v/2-10/docs/transactions
What's the difference between TRUNCATE and DELETE in SQL
Here's a list of differences. I've highlighted Oracle-specific features, and hopefully the community can add other vendors' specific differences too. Differences that are common to most vendors can go directly below the headings, with vendor-specific differences highlighted below.
General Overview
If you want to quickly delete all of the rows from a table, and you're really sure that you want to do it, and you do not have foreign keys against the tables, then a TRUNCATE is probably going to be faster than a DELETE.
Various system-specific issues have to be considered, as detailed below.
Statement type
Delete is DML, Truncate is DDL (What is DDL and DML?)
Commit and Rollback
Variable by vendor
SQL Server
Truncate can be rolled back.
PostgreSQL
Truncate can be rolled back.
Oracle
Because a TRUNCATE is DDL it involves two commits, one before and one after the statement execution. Truncate can therefore not be rolled back, and a failure in the truncate process will have issued a commit anyway.
However, see Flashback below.
Space reclamation
Delete does not recover space, Truncate recovers space
Oracle
If you use the REUSE STORAGE clause then the data segments are not de-allocated, which can be marginally more efficient if the table is to be reloaded with data. The high water mark is reset.
Row scope
Delete can be used to remove all rows or only a subset of rows. Truncate removes all rows.
Oracle
When a table is partitioned, the individual partitions can be truncated in isolation, thus a partial removal of all the table's data is possible.
Object types
Delete can be applied to tables and tables inside a cluster. Truncate applies only to tables or the entire cluster. (May be Oracle specific)
Data Object Identity
Oracle
Delete does not affect the data object id, but truncate assigns a new data object id unless there has never been an insert against the table since its creation. Even a single insert that is rolled back will cause a new data object id to be assigned upon truncation.
Flashback (Oracle)
Flashback works across deletes, but a truncate prevents flashback to states prior to the operation.
However, from 11gR2 the FLASHBACK ARCHIVE feature allows this, except in Express Edition
Use of FLASHBACK in Oracle
http://docs.oracle.com/cd/E11882_01/appdev.112/e41502/adfns_flashback.htm#ADFNS638
Privileges
Variable
Oracle
Delete can be granted on a table to another user or role, but truncate cannot be without using a DROP ANY TABLE grant.
Redo/Undo
Delete generates a small amount of redo and a large amount of undo. Truncate generates a negligible amount of each.
Indexes
Oracle
A truncate operation renders unusable indexes usable again. Delete does not.
Foreign Keys
A truncate cannot be applied when an enabled foreign key references the table. Treatment with delete depends on the configuration of the foreign keys.
Table Locking
Oracle
Truncate requires an exclusive table lock, delete requires a shared table lock. Hence disabling table locks is a way of preventing truncate operations on a table.
Triggers
DML triggers do not fire on a truncate.
Oracle
DDL triggers are available.
Remote Execution
Oracle
Truncate cannot be issued over a database link.
Identity Columns
SQL Server
Truncate resets the sequence for IDENTITY column types, delete does not.
Result set
In most implementations, a DELETE statement can return to the client the rows that were deleted.
e.g. in an Oracle PL/SQL subprogram you could:
DELETE FROM employees_temp
WHERE employee_id = 299
RETURNING first_name,
          last_name
INTO emp_first_name,
     emp_last_name;
Transaction vs Truncation Database Cleaner
Putting it very simply: truncation removes all data from the database, while transaction rolls back all changes that have been made by the running scenario.
Database Cleaner: Clean vs truncation
What is the difference between the following?
- DatabaseCleaner.clean_with(:truncation)
- DatabaseCleaner.clean
The difference is pretty straightforward: in the first case you're telling DatabaseCleaner to clean your db now with the truncation strategy, and in the second case DatabaseCleaner will clean your db using the currently configured strategy.
I think your setup is pretty good already. Since creating a ton of factories (as you said) in a before(:all) hook is quite rare, you just need to add an after(:all) hook to that specific test to put the db back into a stable state.
Cleaning with a transaction won't work, since before(:all) is not wrapped in a transaction.
You're left with 2 options here:
after(:all) { DatabaseCleaner.clean_with(:truncation) }
after(:all) { DatabaseCleaner.clean_with(:deletion) }
In order to choose between these two, as the documentation clearly states, you have to measure and choose what's fastest for you, or just pick one if it doesn't matter.
How to truncate all data in a schema other than public (database_cleaner)
I created my own class to clear the test database:
class CleanTestDatabase
  TABLE_TO_EXCLUDE = ['spatial_ref_sys', 'schema_migrations']
  CONNECTION = ActiveRecord::Base.connection

  # Empties every non-excluded table in each of the given schemas (tenants).
  def self.clean(*tenants)
    tenants.each { |tenant| delete_all_in_tenant(tenant) }
  end

  # Drops every tenant schema, keeping the system, public and postgis schemas.
  def self.drop_all_schemas
    schemas = ActiveRecord::Base.connection.select_values <<-SQL
      SELECT
        schema_name
      FROM
        information_schema.schemata
      WHERE
        schema_name NOT IN ('information_schema','public', 'postgis') AND
        schema_name NOT LIKE 'pg%'
    SQL
    schemas.each { |schema| Apartment::Tenant.drop(schema) }
  end

  def self.delete_all_in_tenant(tenant)
    CONNECTION.disable_referential_integrity do
      tables_to_clean(tenant).each do |table|
        # Skip tables that are already empty.
        delete_from(table) if table_has_new_rows?(table)
      end
    end
  end

  # Schema-qualified table names, e.g. "app.users".
  def self.tables_to_clean(tenant)
    tables = CONNECTION.tables - TABLE_TO_EXCLUDE
    tables.map { |table| "#{tenant}.#{table}" }
  end

  def self.table_has_new_rows?(table_name)
    CONNECTION.select_value("SELECT count(*) FROM #{table_name}").to_i > 0
  end

  def self.delete_from(table_name)
    CONNECTION.execute("DELETE FROM #{table_name}")
  end

  # A bare `private` has no effect on `def self.` methods, so hide the helpers explicitly.
  private_class_method :delete_all_in_tenant, :tables_to_clean,
                       :table_has_new_rows?, :delete_from
end
spec/rails_helper.rb
config.before(:each) do
  CleanTestDatabase.clean('public', 'app')
  Apartment::Tenant.switch!('app')
end
Will the database truncation strategy force every thread to use the same database?
There is only one database independent of what database cleaner strategy you are using and RSpec and Capybara always run in the same thread.
The SO question you are referring to is discussing the fact that the Selenium server is run in a separate thread, which (normally) implies a separate database connection which implies a separate transaction from the transaction used by Capybara/RSpec.
Postgresql Truncation speed
This has come up a few times recently, both on SO and on the PostgreSQL mailing lists.
The TL;DR for your last two points:
(a) The bigger shared_buffers may be why TRUNCATE is slower on the CI server. Different fsync configuration or the use of rotational media instead of SSDs could also be at fault.
(b) TRUNCATE has a fixed cost, but is not necessarily slower than DELETE, plus it does more work. See the detailed explanation that follows.
UPDATE: A significant discussion on pgsql-performance arose from this post. See this thread.
UPDATE 2: Improvements have been added to 9.2beta3 that should help with this, see this post.
Detailed explanation of TRUNCATE vs DELETE FROM:
While not an expert on the topic, my understanding is that TRUNCATE has a nearly fixed cost per table, while DELETE is at least O(n) for n rows; worse if there are any foreign keys referencing the table being deleted.
I always assumed that the fixed cost of a TRUNCATE was lower than the cost of a DELETE on a near-empty table, but this isn't true at all.
TRUNCATE table; does more than DELETE FROM table;
The state of the database after a TRUNCATE table is much the same as if you'd instead run:
DELETE FROM table;
VACUUM (FULL, ANALYZE) table;
(9.0+ only, see footnote)
... though of course TRUNCATE doesn't actually achieve its effects with a DELETE and a VACUUM.
The point is that DELETE and TRUNCATE do different things, so you're not just comparing two commands with identical outcomes.
A DELETE FROM table; allows dead rows and bloat to remain, allows the indexes to carry dead entries, doesn't update the table statistics used by the query planner, etc.
A TRUNCATE gives you a completely new table and indexes as if they were just CREATEed. It's like you deleted all the records, reindexed the table and did a VACUUM FULL.
If you don't care if there's crud left in the table because you're about to go and fill it up again, you may be better off using DELETE FROM table;.
Because you aren't running VACUUM you will find that dead rows and index entries accumulate as bloat that must be scanned then ignored; this slows all your queries down. If your tests don't actually create and delete all that much data you may not notice or care, and you can always do a VACUUM or two part-way through your test run if you do. Better, let aggressive autovacuum settings ensure that autovacuum does it for you in the background.
You can still TRUNCATE all your tables after the whole test suite runs to make sure no effects build up across many runs. On 9.0 and newer, VACUUM (FULL, ANALYZE); globally on the table is at least as good if not better, and it's a whole lot easier.
IIRC Pg has a few optimisations that mean it might notice when your transaction is the only one that can see the table and immediately mark the blocks as free anyway. In testing, when I've wanted to create bloat I've had to have more than one concurrent connection to do it. I wouldn't rely on this, though.
DELETE FROM table; is very cheap for small tables with no f/k refs
To DELETE all records from a table with no foreign key references to it, all Pg has to do is a sequential table scan, setting the xmax of the tuples encountered. This is a very cheap operation - basically a linear read and a semi-linear write. AFAIK it doesn't have to touch the indexes; they continue to point to the dead tuples until they're cleaned up by a later VACUUM that also marks blocks in the table containing only dead tuples as free.
DELETE only gets expensive if there are lots of records, if there are lots of foreign key references that must be checked, or if you count the subsequent VACUUM (FULL, ANALYZE) table; needed to match TRUNCATE's effects within the cost of your DELETE.
In my tests here, a DELETE FROM table; was typically 4x faster than TRUNCATE at 0.5ms vs 2ms. That's a test DB on an SSD, running with fsync=off because I don't care if I lose all this data. Of course, DELETE FROM table; isn't doing all the same work, and if I follow up with a VACUUM (FULL, ANALYZE) table; it's a much more expensive 21ms, so the DELETE is only a win if I don't actually need the table pristine.
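To reproduce this kind of measurement on your own setup, something like the following could be used. It is a sketch under assumptions: `conn` is any object with an `exec(sql)` method (for example a `PG::Connection` from the pg gem), the optional block re-creates the rows before each timing, and the helper name `time_cleanup` is made up here.

```ruby
require "benchmark"

# Hypothetical helper: times DELETE and TRUNCATE against the same table.
# Returns a hash of elapsed seconds keyed by :delete and :truncate.
def time_cleanup(conn, table)
  statements = {
    delete:   "DELETE FROM #{table}",
    truncate: "TRUNCATE TABLE #{table}"
  }
  statements.each_with_object({}) do |(name, sql), timings|
    # Re-seed first (if a block was given) so both statements see the same rows.
    yield if block_given?
    timings[name] = Benchmark.realtime { conn.exec(sql) }
  end
end
```

A single run is noisy, and the numbers depend heavily on fsync settings, storage and shared_buffers, so it is worth repeating the measurement on the machine that actually runs the tests.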
TRUNCATE table; does a lot more fixed-cost work and housekeeping than DELETE
By contrast, a TRUNCATE has to do a lot of work. It must allocate new files for the table, its TOAST table if any, and every index the table has. Headers must be written into those files and the system catalogs may need updating too (not sure on that point, haven't checked). It then has to replace the old files with the new ones or remove the old ones, and has to ensure the file system has caught up with the changes with a synchronization operation - fsync() or similar - that usually flushes all buffers to the disk. I'm not sure whether the sync is skipped if you're running with the (data-eating) option fsync=off.
I learned recently that TRUNCATE must also flush all PostgreSQL's buffers related to the old table. This can take a non-trivial amount of time with huge shared_buffers. I suspect this is why it's slower on your CI server.
The balance
Anyway, you can see that a TRUNCATE of a table that has an associated TOAST table (most do) and several indexes could take a few moments. Not long, but longer than a DELETE from a near-empty table.
Consequently, you might be better off doing a DELETE FROM table;.
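In DatabaseCleaner terms, that conclusion could translate into using the :deletion strategy between examples, with a single truncation at the end of the suite to reset any accumulated bloat. This is only a sketch, assuming RSpec and the database_cleaner gem; `install_deletion_hooks` is a hypothetical name, and whether deletion is actually faster should be measured on your own tables.

```ruby
# Hypothetical helper; meant to be called from RSpec.configure.
def install_deletion_hooks(config)
  # Cheap per-example cleanup: DELETE FROM each table.
  config.before(:suite) { DatabaseCleaner.strategy = :deletion }
  config.after(:each) { DatabaseCleaner.clean }

  # One TRUNCATE pass at the very end, so no effects build up across runs.
  config.after(:suite) { DatabaseCleaner.clean_with(:truncation) }
end
```

Usage would look like `RSpec.configure { |config| install_deletion_hooks(config) }`.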
--
Note: on DBs before 9.0, CLUSTER table_id_seq ON table; ANALYZE table; or VACUUM FULL ANALYZE table; REINDEX table; would be a closer equivalent to TRUNCATE. The VACUUM FULL impl changed to a much better one in 9.0.