Is There a Difference Between a Select Statement Inside a Transaction and One That Is Outside of It

Is there a difference between a SELECT statement inside a transaction and one that is outside of it?

Yes, the one inside the transaction can see changes made by other previous Insert/Update/delete statements in that transaction; a Select statement outside the transaction cannot.

If all you are asking about is what the Isolation Level does, then understand that all Select statements (hey, all statements of any kind) - are in a transaction. The only difference between one that is explicitly in a transaction and one that is standing on its own is that the one that is standing alone starts its transaction immediately before it executes it, and commits or roll back immediately after it executes;

whereas the one that is explicitly in a transaction can (because it has a Begin Transaction statement) can have other statements (inserts/updates/deletes, whatever) occurring within that same transaction, either before or after that Select statement.

So whatever the isolation level is set to, both selects (inside or outside an explicit transaction) will nevertheless be in a transaction which is operating at that isolation level.

Addition:
The following is for SQL Server, but all databases MUST work in the same way. In SQL Server the Query Processor is always in one of 3 Transaction Modes, AutoCommit, Implicit, or Explicit.

  • AutoCommit is the default transaction management mode of the SQL Server Database Engine. .. Every Transact-SQL statement is committed or rolled back when it completes. ... If a statement completes successfully, it is committed; if it encounters any error, it is rolled back. This is the default, and is the answer to @Alex's question in the comments.

  • In Implicit Transaction mode, "... the SQL Server Database Engine automatically starts a new transaction after the current transaction is committed or rolled back. You do nothing to delineate the start of a transaction; you only commit or roll back each transaction. Implicit transaction mode generates a continuous chain of transactions. ..." Note that the italicized snippet is for each transaction, whether it be a single or multiple statement transaction.

  • The engine is placed in Explicit Transaction mode when you explicitly initiate a transaction with BEGIN TRANSACTION Statement. Then, every statement is executed within that transaction until you explicitly terminate the transaction (with COMMIT or ROLLBACK) or if a failure occurs that causes the engine to terminate and Rollback.

What's the point to enclose select statements in a transaction?

You're right: at the standard isolation level, read committed, you do not need to wrap select statements in transactions. Select statements will be protected from dirty reads whether you wrap them in a transaction or not.

connection 1:                          connection 2:

begin transaction
update user set name = 'Bill' where id = 1
select name from users where id = 1
rollback transaction

The select statement will not read the rolled-back update: it doesn't matter that they are not wrapped in a transaction.

If you need repeatable reads, then wrapping selects in a default transaction doesn't help:

connection 1:                          connection 2:

begin transaction
select name from users where id = 1
update user set name = 'Bill' where id = 1
select name from users where id = 1
commit transaction

The begin and commit statements won't help here: the second select may read the old name, or it may read the new name.

However, if you run at a higher isolation level, like serializable or repeatable read, the group will be protected from non-repeatable reads:

connection 1:                          connection 2:

set transaction isolation level
repeatable read
begin transaction
select name from users where id = 1
update user set name = 'Bill' where id = 1
select name from users where id = 1 |
commit transaction |
|--> executed here

In this scenario, the update will block until the first transaction is complete.

Higher isolation levels are rarely used because they lower the number of people that can work in the database at the same time. At the highest level, serializable, a reporting query halts any update activity.

SQL Server transaction and SELECT statement

try doing this and you will understand:

Open a two new queries on SSMS (lets call it A and B from now one) and on A, create a simple table like this:

create table transTest(id int)
insert into transTest values(1)

now, do the following:

do select * from transTest in both of them. You will see the value 1

On A run:

set transaction isolation level read committed

On B run:

begin transaction
insert into transTest values(2)

On A run:

select * from transTest

you will see that the query wont finish because it is locked by the transaction on A

On B run:

commit transaction

Go back to A and you will see that the query finished

Repeat the test with
set transaction isolation level read uncommitted on A
you will see that the query wont be locked by the transaction

What is the difference between a query and transaction in SQL?

A query usually refers to a SELECT statement, but may also refer to data modifications as well -- UPDATE, INSERT, DELETE, and MERGE are common DML ("data modification language") statements. Personally, I would call these four operations DML statements and reserve query for SELECT statements; I find this a useful distinction.

Databases implement a set of properties called ACID properties. These basically say that any SQL statement sees consistent data, regardless of what else is going on in the database. A simple way to think of about these is that all operations are serialized -- one statement completes before another begins, even in a multi-user environment. Serialization guarantees that operations are isolated from each other. In practice, serialization is very expensive and databases have other mechanisms to ensure integrity, but it is a useful abstraction when learning about the concepts.

What are transactions? Transactions are the mechanism that databases use to ensure the integrity of the data when data is being modified. Transactions often consist of one statement that modifies the data. But that is not necessary. In fact, you can have a complex set of data transformations within a single transaction.

The three key operations on a transactions are:

  • BEGIN TRANSACTION: Tell the database that a transaction is beginning. All changes within the transaction are invisible to other users while the transaction is "active".
  • COMMIT TRANSACTION: Make all the changes visible in the database. Conceptually, this happens instantaneously. The database guarantees that other users will not see partial changes to the data (i.e. enforces data integrity).
  • ROLLBACK TRANSACTION: Undo all the work. No other users ever see the changes.

Note: Many databases have options to weaken the data integrity. These can be useful for performance reasons for users who know what they are doing.

I should note that if all operations on a database are SELECT statements, then transactions are not necessary. The data is not changing, so the view of the data is consistent. So transactions are generally associated with DML statements.

Use transactions for select statements?

In a highly concurrent application it could (theoretically) happen that data you've read in the first select is modified before the other selects are executed.

If that is a situation that could occur in your application you should use a transaction to wrap your selects. Make sure you pick the correct isolation level though, not all transaction types guarantee consistent reads.

Update :
You may also find this article on concurrent update/insert solutions (aka upsert) interesting. It puts several common methods of upsert to the test to see what method actually guarantees data is not modified between a select and the next statement. The results are, well, shocking I'd say.

What does a transaction around a single statement do?

It does nothing. All individual SQL Statements, (with rare exceptions like Bulk Inserts with No Log, or Truncate Table) are automaticaly "In a Transaction" whether you explicitly say so or not.. (even if they insert, update, or delete millions of rows).

EDIT: based on @Phillip's comment below... In current versions of SQL Server, Even Bulk Inserts and Truncate Table do write some data to the transaction log, although not as much as other operations do. The critical distinction from a transactional perspective, is that in these other types of operations, the data in your database tables being modified is not in the log in a state that allows it to be rolled back.

All this means is that the changes the statement makes to data in the database are logged to the transaction log so that they can be undone if the operation fails.

The only function that the "Begin Transaction", "Commit Transaction" and "RollBack Transaction" commands provide is to allow you to put two or more individual SQL statements into the same transaction.

EDIT: (to reinforce marks comment...) YES, this could be attributed to "superstitious" programming, or it could be an indication of a fundamental misunderstanding of the nature of database transactions. A more charitable interpretation is that it is simply the result of an over-application of consistency which is inappropriate and yet another example of Emersons euphemism that:

A foolish consistency is the hobgoblin of little minds,

adored by little statesmen and philosophers and divines

Does Postgresql implicitly wrap select statements in transaction?

Does this mean that even select statement will be implicitly wrapped in transaction and there is no way to work with postgres out of transaction scope?

Yes.

Transactions aren't just about writes, but also isolation level and what SELECTs can read. For example, at the usual isolation level, a SELECT can't read uncommitted writes. It also can't read the updates done from transactions that commit while the SELECT is running. Otherwise it wouldn't have a coherent view of the database and basically everything would break. For example it would look up in an index, then go get the corresponding row, and ithe row could have changed and no longer correspond to the index.

That requires a way to take a "snapshot" of the database right before the SELECT begins to execute, and that is one important function of the transaction mechanism.

What kind of lock is placed for SELECT statement within a transaction in SQL Server

A SELECT will always place a shared lock - unless you use the WITH (NOLOCK) hint (then no lock will be placed), use a READ UNCOMMITTED transaction isolation level (same thing), or unless you specifically override it with query hints like WITH (XLOCK) or WITH (UPDLOCK).

A shared lock allows other reading processes to also acquire a shared lock and read the data - but they prevent exclusive locks (for insert, delete, update operations) from being acquired.

In this case, with just three rows selected, there will be no lock escalation (that only happens when more than 5000 locks are being acquired by a single transaction).

Depending on the transaction isolation level, those shared locks will be held for different amounts of times. With READ COMMITTED, the default level, the locks is released immediately after the data has been read, while with REPEATABLE READ or SERIALIZABLE levels, the locks will be held until the transaction is committed or rolled back.

Simplifying SELECT statement

Here is an attempt to help simplify. I stripped out all the [brackets] that really are not required unless you are running into something like a reserved keyword, or columns with spaces in their name (bad to begin with).

Anyhow, your main query had 3 instances of a select per ID. To eliminate that, I did a LEFT JOIN to a subquery that pulls all transfers of type < 2 AND JOINS to the price attempts ONCE. This way, the result will have already pre-aggregated the count(*) and Max(Created) done ONCE for the same basis of transfers in question with your WITH CTE declaration. So you dont have to keep running the 3 queries each time, and you dont have to query the entire table of ALL transfers, just those with same underlying type < 2 condition. The result subquery alias "PQ" (preQuery)

This now simplifies the readability of the outer WHERE clause from the redundant counts per Id.

WITH transferDateDiff AS 
(
SELECT
t1.Id,
t1.TransactionId,
t1.From,
t1.To,
t1.Value,
t1.Type,
t1.ContractAddress,
t1.TokenId,
t2.Hash,
t2.Timestamp,
ABS( DATEDIFF( SECOND, tp.Timestamp, t2.Timestamp )) AS diff
FROM
dbo.Transfers t1
LEFT JOIN dbo.Transactions t2
ON t1.TransactionId = t2.Id
LEFT JOIN dbo.TokenPrices tp
ON t1.ContractAddress = tp.ContractAddress
AND tp.Timestamp >= DATEADD(HOUR, - 3, t2.Timestamp)
AND tp.Timestamp <= DATEADD(HOUR, 3, t2.Timestamp)
WHERE
t1.Type < 2
)

SELECT
tdd.Id,
tdd.TransactionId,
tdd.From,
tdd.To,
tdd.Value,
tdd.Type,
tdd.ContractAddress,
tdd.TokenId,
tdd.Hash,
tdd.Timestamp
FROM
transferDateDiff tdd
LEFT JOIN
( SELECT
t1.Id,
COUNT(*) Attempts,
MAX(tpa.Created) MaxCreated
FROM
dbo.Transfers t1
JOIN dbo.TokenPriceAttempts tpa
on t1.Id = tpa.TransferId
WHERE
t1.Type < 2
GROUP BY
t1.Id ) PQ
on tdd.Id = PQ.Id
WHERE
tdd.diff IS NULL
AND ( PQ.Attempts IS NULL
OR PQ.Attempts = 0
OR ( PQ.Attempts < 5
AND DATEDIFF(DAY, PQ.MaxCreated, CURRENT_TIMESTAMP ) >= 7
)
)

REVISED to remove the WITH CTE into a single query

SELECT
t1.Id,
t1.TransactionId,
t1.From,
t1.To,
t1.Value,
t1.Type,
t1.ContractAddress,
t1.TokenId,
t2.Hash,
t2.Timestamp
FROM
-- Now, this pre-query is left-joined to token price attempts
-- so ALL Transfers of type < 2 are considered
( SELECT
t1.Id,
coalesce( COUNT(*), 0 ) Attempts,
MAX(tpa.Created) MaxCreated
FROM
dbo.Transfers t1
LEFT JOIN dbo.TokenPriceAttempts tpa
on t1.Id = tpa.TransferId
WHERE
t1.Type < 2
GROUP BY
t1.Id ) PQ
-- Now, we can just directly join to transfers for the rest
JOIN dbo.Transfers t1
on PQ.Id = t1.Id
-- and the rest from the WITH CTE construct
LEFT JOIN dbo.Transactions t2
ON t1.TransactionId = t2.Id
LEFT JOIN dbo.TokenPrices tp
ON t1.ContractAddress = tp.ContractAddress
AND tp.Timestamp >= DATEADD(HOUR, - 3, t2.Timestamp)
AND tp.Timestamp <= DATEADD(HOUR, 3, t2.Timestamp)
WHERE
ABS( DATEDIFF( SECOND, tp.Timestamp, t2.Timestamp )) IS NULL
AND ( PQ.Attempts = 0
OR ( PQ.Attempts < 5
AND DATEDIFF(DAY, PQ.MaxCreated, CURRENT_TIMESTAMP ) >= 7 )
)


Related Topics



Leave a reply



Submit