Are SQL Queries Guaranteed to Execute Atomically When Using Union

Are SQL queries guaranteed to execute atomically when using UNION?

Yes, the statement is atomic, but the data can indeed change between the two reads of the table.

READ COMMITTED only guarantees that you don't read dirty data; it promises nothing else about the consistency of reads. For that you would need a higher isolation level.

Since you said you would accept a SQL Server example...

Connection 1

(Assumes the default pessimistic READ COMMITTED isolation level)

CREATE TABLE employee
(
    name VARCHAR(50),
    DepartmentID INT
)

CREATE TABLE department
(
    DepartmentID INT
)

INSERT INTO department VALUES (1)
INSERT INTO employee VALUES ('bob', 1)

DECLARE @employee TABLE
(
    name VARCHAR(50),
    DepartmentID INT
)

WHILE ((SELECT COUNT(*) FROM @employee) < 2)
BEGIN
    DELETE FROM @employee

    INSERT INTO @employee
    SELECT employee.*
    FROM employee
    LEFT JOIN department
        ON employee.DepartmentID = department.DepartmentID
    UNION
    SELECT employee.*
    FROM employee
    RIGHT JOIN department
        ON employee.DepartmentID = department.DepartmentID
END;

SELECT * FROM @employee

Connection 2

WHILE (1=1)
    UPDATE employee SET name = CASE WHEN name = 'bob' THEN 'bill' ELSE 'bob' END

Now go back to connection 1

name                                               DepartmentID
-------------------------------------------------- ------------
bill                                               1
bob                                                1

(Remember to switch back to Connection 2 to kill it!)
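To make a single statement see one consistent version of the data, you can use row versioning instead of locking. A minimal sketch, assuming you are free to change the database options (READ_COMMITTED_SNAPSHOT gives statement-level consistency; SNAPSHOT gives transaction-level):

```sql
-- Assumption: you are allowed to enable snapshot isolation on this database.
ALTER DATABASE CURRENT SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

-- Both branches of the UNION now read the same point-in-time version
-- of employee, so the same row can no longer appear under both names.
SELECT employee.*
FROM employee
LEFT JOIN department ON employee.DepartmentID = department.DepartmentID
UNION
SELECT employee.*
FROM employee
RIGHT JOIN department ON employee.DepartmentID = department.DepartmentID;
```

With either versioning option, re-running the demo above should always return exactly one row for 'bob'/'bill', never both.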

The specific documentation covering this READ COMMITTED behaviour is here:

The shared lock type determines when it will be released. Row locks are released before the next row is processed. Page locks are released when the next page is read, and table locks are released when the statement finishes.

Is update with nested select atomic operation?

SELECT places shared locks on the rows it reads, which can then be lifted under READ COMMITTED isolation.

UPDATE places update locks, later promoted to exclusive locks. These are not lifted until the end of the transaction.

You need the locks to be retained as soon as they are placed.

You can do this by raising the transaction isolation level to REPEATABLE READ, which will retain the shared locks until the end of the transaction and will prevent a concurrent UPDATE from locking these rows.
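A minimal sketch of that approach (mytable, batch_id, and @myid are taken from the query below; two concurrent runners of this pattern can still deadlock, which is why the READPAST alternative exists):

```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;

-- The shared locks taken by this SELECT are now held until COMMIT,
-- so a concurrent UPDATE cannot modify these rows in between.
SELECT TOP 10000 *
FROM mytable
WHERE batch_id IS NULL
ORDER BY date;

UPDATE mytable
SET batch_id = @myid
WHERE batch_id IS NULL;

COMMIT TRANSACTION;
```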

Alternatively, you can rewrite your query as this:

WITH q AS
(
    SELECT TOP 10000 *
    FROM mytable WITH (ROWLOCK, READPAST)
    WHERE batch_id IS NULL
    ORDER BY date
)
UPDATE q
SET batch_id = @myid

This will simply skip the locked rows.

Are queries inside a MySQL transaction guaranteed to be executed together?

You still have a possibility of a race condition, even though you execute the SELECT followed by the UPDATE in a single transaction. SELECT by itself does not lock anything, so you could have two concurrent sessions both SELECT and get the same id. Then both would attempt to UPDATE, but only one would "win" - the other would have to wait.

To get around this, use the SELECT...FOR UPDATE clause, which creates a lock on the rows it returns.

Prepare Transaction...

$id = SELECT id
FROM companies
WHERE processing = 0
ORDER BY last_crawled ASC
LIMIT 1
FOR UPDATE;

This means that the lock is created as the row is selected. This is atomic, which means no other session can sneak in and get a lock on the same row. If they try, their transaction will block on the SELECT.

UPDATE companies
SET processing = 1
WHERE id = $id;

Commit Transaction

I changed your "execute transaction" pseudocode to "commit transaction." Statements within a transaction execute immediately, which means they create locks and so on. Then when you COMMIT, the locks are released and any changes are committed. Committed means they can't be rolled back, and they are visible to other transactions.


Here's a quick example of using mysqli to accomplish this:

$mysqli = new mysqli(...);
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); /* throw exceptions on error */

$mysqli->begin_transaction();
$sql = "SELECT id
        FROM companies
        WHERE processing = 0
        ORDER BY last_crawled ASC
        LIMIT 1
        FOR UPDATE";
$result = $mysqli->query($sql);
while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
    $id = $row["id"];
}

$sql = "UPDATE companies
        SET processing = 1
        WHERE id = ?";
$stmt = $mysqli->prepare($sql);
$stmt->bind_param("i", $id);
$stmt->execute();

$mysqli->commit();

Re your comment:

I tried an experiment: I created a table companies, filled it with 512 rows, then started a transaction and issued the SELECT...FOR UPDATE statement above. I did this in the mysql client; no need to write PHP code.

Then, before committing my transaction, I examined the locks reported:

mysql> show engine innodb status\G
=====================================
2013-12-04 16:01:28 7f6a00117700 INNODB MONITOR OUTPUT
=====================================
...
---TRANSACTION 30012, ACTIVE 2 sec
2 lock struct(s), heap size 376, 513 row lock(s)
...

Despite using LIMIT 1, this report shows the transaction appears to lock every row in the table (plus one, for some reason).

So you're right, if you have hundreds of requests per second, it's likely that the transactions are queuing up. You should be able to verify this by watching SHOW PROCESSLIST and seeing many processes stuck in a state of Locked (i.e. waiting for access to rows that another thread has locked).
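To see exactly which transaction is blocking which, rather than inferring it from SHOW PROCESSLIST, you can query InnoDB's lock-wait tables. A sketch for MySQL 5.5-5.7; in MySQL 8.0 the equivalent data moved to performance_schema.data_lock_waits:

```sql
-- Which session is waiting on which (MySQL 5.5-5.7, InnoDB):
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id;
```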

If you have hundreds of requests per second, you may have outgrown the ability for an RDBMS to function as a fake message queue. This isn't what an RDBMS is good at.

There are a variety of scalable message-queue tools with good PHP integration, such as RabbitMQ, Gearman, and Beanstalkd, or brokers speaking STOMP or AMQP.

Check out http://www.slideshare.net/mwillbanks/message-queues-a-primer-international-php-conference-fall-2012

T-SQL Is a sub query for an Update restriction Atomic with the update?

If I understand the motivation for the question, you want to avoid the possibility that two concurrent transactions could both execute the sub query, get the same top N rows to process, and then proceed to update the same rows?

In that case I'd use this approach.

;WITH cte AS
(
    SELECT TOP(@numberToProcess) *
    FROM ToBeProcessed WITH (UPDLOCK, ROWLOCK, READPAST)
    WHERE ToBeProcessed.IsBeingProcessed = 0
    ORDER BY ToBeProcessed.Id,
             ToBeProcessed.Priority DESC
)
UPDATE cte
SET IsBeingProcessed = 1
OUTPUT INSERTED.Id
INTO #IdsToProcess

I was a bit uncertain earlier whether SQL Server would take U locks when processing your version with the sub query, thus blocking two concurrent transactions from reading the same TOP N rows. This does not appear to be the case.

Test Table

CREATE TABLE JobsToProcess
(
    priority INT IDENTITY(1,1),
    isprocessed BIT,
    number INT
)

INSERT INTO JobsToProcess
SELECT TOP (1000000) 0, 0
FROM master..spt_values v1, master..spt_values v2

Test Script (Run in 2 concurrent SSMS sessions)

BEGIN TRY
    DECLARE @FinishedMessage VARBINARY(128) = CAST('TestFinished' AS VARBINARY(128))
    DECLARE @SynchMessage VARBINARY(128) = CAST('TestSynchronising' AS VARBINARY(128))
    SET CONTEXT_INFO @SynchMessage

    DECLARE @OtherSpid INT

    WHILE (@OtherSpid IS NULL)
        SELECT @OtherSpid = spid
        FROM sys.sysprocesses
        WHERE context_info = @SynchMessage AND spid <> @@SPID

    SELECT @OtherSpid

    DECLARE @increment INT = @@SPID
    DECLARE @number INT = @increment

    WHILE (@number = @increment AND NOT EXISTS (SELECT * FROM sys.sysprocesses WHERE context_info = @FinishedMessage))
        UPDATE JobsToProcess
        SET @number = number += @increment, isprocessed = 1
        WHERE priority = (SELECT TOP 1 priority
                          FROM JobsToProcess
                          WHERE isprocessed = 0
                          ORDER BY priority DESC)

    SELECT *
    FROM JobsToProcess
    WHERE number NOT IN (0, @OtherSpid, @@SPID)
    SET CONTEXT_INFO @FinishedMessage
END TRY
BEGIN CATCH
    SET CONTEXT_INFO @FinishedMessage
    SELECT ERROR_MESSAGE(), ERROR_NUMBER()
END CATCH

Almost immediately, execution stops as both concurrent transactions update the same row. The S locks taken whilst identifying the TOP 1 priority must be released before a U lock is acquired; the two transactions then proceed to take the row's U and X locks in sequence.

Heap

If a clustered index is added (ALTER TABLE JobsToProcess ADD PRIMARY KEY CLUSTERED (priority)), then deadlock occurs almost immediately instead: in this case the row S lock doesn't get released, one transaction acquires a U lock on the row and waits to convert it to an X lock, and the other transaction is still waiting to convert its S lock to a U lock.

Clustered Index

If the query above is changed to use MIN rather than TOP

WHERE priority = (SELECT MIN(priority)
                  FROM JobsToProcess
                  WHERE isprocessed = 0)

Then SQL Server manages to completely eliminate the sub query from the plan and takes U locks all the way.

(execution plan screenshot)

SQLite query problem, and atomicity of transactions

If I were you, I would make a single unique 'umembers' table and fill it via triggers (on members1 and members2 alike). You can figure out the update trigger.

CREATE TRIGGER IF NOT EXISTS members1trigger
BEFORE INSERT ON members1
BEGIN
    INSERT INTO umembers VALUES (new.uname /*, other properties */);
END;
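That update trigger could look along these lines (a sketch, assuming uname is the only column mirrored into umembers):

```sql
CREATE TRIGGER IF NOT EXISTS members1updtrigger
BEFORE UPDATE OF uname ON members1
BEGIN
    -- Keep the mirrored name in umembers in sync with members1.
    UPDATE umembers SET uname = new.uname WHERE uname = old.uname;
END;
```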

Then your update looks like this:

#!/usr/bin/env bash
sqlite3 test.db <<EOF

update members1 set uname='$2'
where uname='$1' and not exists (select 1 from umembers where uname='$2');

EOF

  1. Of course you should create a unique constraint on umembers.
     If there are parallel updates, the second one could throw an exception,
     but I think the index locks it, so that can't happen.

  2. Security best practice says to use case-insensitive logins.

https://security.stackexchange.com/questions/241701/case-sensitive-logins

Do databases always lock non-existent rows after a query or update?

Answering my own question since there seems to be a lot of confusion around the topic. It seems that:

-- BAD! DO NOT DO THIS! --
insert customer (email, count)
select 'foo@example.com', 0
where not exists (
    select 1 from customer
    where email = 'foo@example.com'
)

is open to race-conditions (see Only inserting a row if it's not already there). From what I've been able to gather, this is the only portable solution to the problem:

  1. Pick a key to merge against. This could be the primary key, or another unique key, but it must have a unique constraint.
  2. Try to insert a new row. You must catch the error that will occur if the row already exists.
  3. The hard part is over. At this point, the row is guaranteed to exist and you are protected from race-conditions by the fact that you are holding a write-lock on it (due to the insert from the previous step).
  4. Go ahead and update if needed or select its primary key.
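A sketch of those steps in T-SQL (the constraint name and error handling are illustrative; 2601/2627 are SQL Server's duplicate-key error numbers, and other engines spell the catch differently):

```sql
-- Step 1: the merge key needs a unique constraint.
ALTER TABLE customer ADD CONSTRAINT uq_customer_email UNIQUE (email);

-- Steps 2-4: try the insert, swallow only the duplicate-key error,
-- then update; either way the row now exists and is write-locked by us.
BEGIN TRANSACTION;
BEGIN TRY
    INSERT INTO customer (email, [count]) VALUES ('foo@example.com', 0);
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() NOT IN (2601, 2627)
        THROW;  -- some other error: re-raise it
END CATCH;

UPDATE customer SET [count] = [count] + 1 WHERE email = 'foo@example.com';
COMMIT TRANSACTION;
```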

Oracle Union with order by clause issue

The underlying obstacle is that a SQL dataset is inherently UNordered. As soon as you UNION two datasets you lose any guarantee of ordering previously present.

You can often get away with the following structure, but it is STILL not guaranteed...

SELECT
    *
FROM
(
    (
        select to_char(first_name||'('||substr(last_name,0,1)||')')
        from employees
        order by first_name
    ) Individuals

    UNION ALL

    (
        select to_char('There are total '||count(job_id)||' '||lower(job_id)||'s')
        from employees
        group by job_id
        order by count(job_id), job_id
    ) Totals
) Combined
;

In practice you often get what you want with this structure.

The brackets ensure the ordering is done before the UNION ALL and the database engine PROBABLY won't screw with the ordering.

But it MIGHT. The only way to guarantee the order of the results is to put an ORDER BY on the outer query. Something like the following normally works fairly well...

SELECT
    rowdata
FROM
(
    (
        select 1 AS myset, rownum AS myrow,
               to_char(first_name||'('||substr(last_name,0,1)||')') AS rowdata
        from employees
        order by first_name
    ) Individuals

    UNION ALL

    (
        select 2 AS myset, rownum AS myrow,
               to_char('There are total '||count(job_id)||' '||lower(job_id)||'s')
        from employees
        group by job_id
        order by count(job_id), job_id
    ) Totals
) Combined
ORDER BY
    myset, myrow
;

I'm on my phone, so there may be typos, but the sentiment is there...

Use ROWNUM or ROW_NUMBER() to generate an extra field in your data sets. Then union them. Then order by your new field(s).
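For instance, the second query above could use ROW_NUMBER() instead of ROWNUM, which makes the intended ordering explicit rather than relying on the inner ORDER BY (a sketch against the same employees schema):

```sql
SELECT rowdata
FROM
(
    SELECT 1 AS myset,
           ROW_NUMBER() OVER (ORDER BY first_name) AS myrow,
           first_name || '(' || SUBSTR(last_name, 0, 1) || ')' AS rowdata
    FROM employees

    UNION ALL

    -- Analytic functions are evaluated after GROUP BY,
    -- so the OVER clause may reference the aggregate.
    SELECT 2 AS myset,
           ROW_NUMBER() OVER (ORDER BY COUNT(job_id), job_id) AS myrow,
           'There are total ' || COUNT(job_id) || ' ' || LOWER(job_id) || 's'
    FROM employees
    GROUP BY job_id
) Combined
ORDER BY myset, myrow;
```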


