How to Deal with Concurrent Updates in Databases

How to deal with concurrent updates in databases?

Use transactions:

BEGIN WORK;
SELECT creds FROM credits WHERE userid = 1;
-- do your work
UPDATE credits SET creds = 150 WHERE userid = 1;
COMMIT;

Some important notes:

  • Not all storage engines support transactions. In particular, MySQL's old default engine, MyISAM (the default before version 5.5.5), doesn't. Use InnoDB (the current default) if you're on MySQL.
  • Transactions can abort for reasons beyond your control. If this happens, your application must be prepared to start all over again, from the BEGIN WORK.
  • You'll need to set the isolation level to SERIALIZABLE; otherwise, two transactions can each read the same starting value and silently overwrite each other's update (transactions aren't like mutexes in programming languages). Some databases will throw an error when concurrent SERIALIZABLE transactions conflict, and you'll have to restart the transaction.
  • Some DBMSs provide SELECT .. FOR UPDATE, which will lock the rows retrieved by the select until the transaction ends (see the sketch after this list).
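
For example, a minimal sketch of the locking variant of the transaction above (the deduction amount is made up):

BEGIN WORK;
SELECT creds FROM credits WHERE userid = 1 FOR UPDATE;
-- do your work; no other transaction can update or FOR UPDATE-select
-- this row until we COMMIT
UPDATE credits SET creds = creds - 50 WHERE userid = 1;
COMMIT;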

Combining transactions with SQL stored procedures can make the retry part easier to deal with: the application just calls a single stored procedure in a transaction, and re-calls it if the transaction aborts.
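
As an illustration, here is a minimal sketch in PostgreSQL's PL/pgSQL; the function name and parameters are hypothetical, and the credits schema is taken from the example above:

CREATE FUNCTION deduct_credits(p_userid int, p_amount int)
RETURNS void AS $$
DECLARE
    v_creds int;
BEGIN
    -- lock the row so concurrent callers queue up behind us
    SELECT creds INTO v_creds FROM credits
     WHERE userid = p_userid FOR UPDATE;
    -- "do your work" goes here, then write the result back
    UPDATE credits SET creds = v_creds - p_amount
     WHERE userid = p_userid;
END;
$$ LANGUAGE plpgsql;

The application then just runs BEGIN WORK; SELECT deduct_credits(1, 50); COMMIT; and re-calls the procedure if the transaction aborts.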

How do I deal with concurrent changes in a web application?

To the best of my knowledge, there is no general solution to the problem.

The root of the problem is that the user may retrieve data and stare at it on the screen for a long time before making an update and saving.

I know of three basic approaches:

  1. When the user reads the database, lock the record, and don't release until the user saves any updates. In practice, this is wildly impractical. What if the user brings up a screen and then goes to lunch without saving? Or goes home for the day? Or is so frustrated trying to update this stupid record that he quits and never comes back?

  2. Express your updates as deltas rather than destinations. To take the classic example, suppose you have a system that records stock in inventory. Every time there is a sale, you must subtract 1 (or more) from the inventory count.

So say the present quantity on hand is 10. User A creates a sale. Current quantity = 10. User B creates a sale. He also gets current quantity = 10. User A enters that two units are sold. New quantity = 10 - 2 = 8. Save. User B enters one unit sold. New quantity = 10 (the value he loaded) - 1 = 9. Save. Clearly, something went wrong.

Solution: Instead of writing "update inventory set quantity=9 where itemid=12345", write "update inventory set quantity=quantity-1 where itemid=12345". Then let the database queue the updates. This is very different from strategy #1, as the database only has to lock the record long enough to read it, make the update, and write it. It doesn't have to wait while someone stares at the screen.
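
Spelled out as a minimal sketch (the guard against negative stock is my addition, not part of the original point):

UPDATE inventory
   SET quantity = quantity - 1
 WHERE itemid = 12345
   AND quantity >= 1;  -- refuse the sale if stock would go negative
-- an affected-row count of 0 means the item was already out of stock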

Of course, this is only usable for changes that can be expressed as a delta. If you are, say, updating the customer's phone number, it's not going to work. (Like, old number is 555-1234. User A says to change it to 555-1235. That's a change of +1. User B says to change it to 555-1243. That's a change of +9. So the total change is +10, and the customer's new number is 555-1244. :-) ) But in cases like that, "last user to click the enter key wins" is probably the best you can do anyway.

  3. On update, check that relevant fields in the database match your "from" value. For example, say you work for a law firm negotiating contracts for your clients. You have a screen where a user can enter notes about negotiations. User A brings up a contract record. User B brings up the same contract record. User A enters that he just spoke to the other party on the phone and they are agreeable to the proposed terms. User B, who has also been trying to call the other party, enters that they are not responding to phone calls and he suspects they are stonewalling. User A clicks save. Do we want user B's comments to overwrite user A's? Probably not. Instead, we display a message indicating that the notes have been changed since he read the record, allowing him to see the new value before deciding whether to proceed with the save, abort, or enter something different.
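
In SQL terms, a minimal sketch of that check-and-write, using a hypothetical contracts table (all names and values here are mine, for illustration):

UPDATE contracts
   SET notes = 'other party agreeable to proposed terms'
 WHERE contractid = 9876
   AND notes = 'initial draft sent';  -- the value this user originally read
-- 0 affected rows means the notes changed since we read them:
-- re-read the record, show the user the new value, and let them decide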


How to handle concurrent SQL updates, given the database structure can change at runtime

Transactions are the way to go when it comes to concurrent SQL updates; in Spring you can use a transaction manager.

As for the database structure: as far as I know, MySQL does not support transactional DDL (each DDL statement implicitly commits), so if you change the structure concurrently with updates, you're likely to run into problems.

To handle multiple users working on the same data, you need to implement a manual "lock" or "version" field on the table to keep track of the last update.
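
A minimal sketch of the version-field approach, with hypothetical table and column names:

ALTER TABLE account ADD COLUMN version INT NOT NULL DEFAULT 0;

-- read the row, remembering the version you saw
SELECT balance, version FROM account WHERE id = 1;  -- say this returns version 7

-- write back only if nobody bumped the version in the meantime
UPDATE account
   SET balance = 120, version = version + 1
 WHERE id = 1 AND version = 7;
-- 0 affected rows means another update won; re-read and retry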

Maintain integrity on concurrent updates of the same row

To answer this I think it's best to remove the complexity of the goroutine (and, in fact, Go at all) and focus on the SQL. Following are the SQL statements in the order they will be run (I have ignored everything after the error occurs, as that is mostly irrelevant and the order of execution gets complex/variable!).

In the main routine

INSERT INTO "product" ("code","price") VALUES ('A',1000) RETURNING "product"."id"

In the goroutine

BEGIN TX1
SELECT * FROM "product" WHERE (code = 'A') FOR UPDATE
DELETE FROM "product" WHERE "product"."id" = 1

In the main routine

BEGIN TX2
SELECT * FROM "product" WHERE (code = 'A') FOR UPDATE -- ERROR occurs here

On to your questions.

Question 1

If I use isolation level "ReadCommitted", I get a not found error -
this makes no sense to me, because I thought that a ReadCommitted
transaction can see updates applied by others.

From the docs for Read Committed Isolation Level:

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the command start
time. However, such a target row might have already been updated (or
deleted or locked) by another concurrent transaction by the time it is
found. In this case, the would-be updater will wait for the first
updating transaction to commit or roll back (if it is still in
progress). If the first updater rolls back, then its effects are
negated and the second updater can proceed with updating the
originally found row. If the first updater commits, the second updater
will ignore the row if the first updater deleted it, otherwise it will
attempt to apply its operation to the updated version of the row.

So the SELECT * FROM "product" WHERE (code = 'A') FOR UPDATE in TX2 will wait for TX1 to complete. At that point TX1 has deleted product A, so the row is ignored and no results are returned. Now, I understand that TX1 also recreates product A, but remember that "a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began;" and as the select began before TX1 recreated the record, it will not be seen.

Question 2

If I use isolation level "Serializable", I get the error: pq: could
not serialize access due to concurrent update.

From the docs for Repeatable Read Isolation Level (Serializable is a higher level so these rules, plus some stricter ones, apply):

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the transaction
start time. However, such a target row might have already been updated
(or deleted or locked) by another concurrent transaction by the time
it is found. In this case, the repeatable read transaction will wait
for the first updating transaction to commit or roll back (if it is
still in progress). If the first updater rolls back, then its effects
are negated and the repeatable read transaction can proceed with
updating the originally found row. But if the first updater commits
(and actually updated or deleted the row, not just locked it) then the
repeatable read transaction will be rolled back with the message

ERROR: could not serialize access due to concurrent update

In your code TX1 deletes product A, meaning that the query in TX2 will be delayed until TX1 commits, at which time it will abort with the error (if TX1 had rolled back, it would have continued).

How can I make the second update happen?

Maintaining transactional integrity is a hard problem, and the functionality in PostgreSQL is the result of a lot of work by some very smart people. If you find yourself fighting the database, it's often a good idea to take a step back and consider whether you need to change your approach (or whether the problem you perceive is a real issue).

In your example you have two routines that delete and recreate the same record; I cannot foresee a situation where you would want both transactions to proceed. In a real system where this was possible, you would not have carefully arranged timers to ensure one transaction starts first. This means the state of the database after the transactions complete would depend upon which one got to the SELECT * FROM "product" WHERE (code = 'A') FOR UPDATE first. So in reality it does not matter if one fails (because the result is pretty much random in any event); it's actually a better result, because you can advise the user (who can check the record and rerun the task if needed).

So before reading the rest of this, I would suggest that you consider whether this is a problem at all (I have no background on what you are trying to accomplish, so it's difficult to comment).

If you really want to ensure the update proceeds you have a few options:

  • If using "Serializable", you need to detect the failure and retry the transaction (if that's what the business logic demands).
  • If using "Read committed", replace the DELETE/INSERT with an UPDATE; in that case PostgreSQL will re-evaluate the WHERE clause when the first transaction's lock is released (see the sketch after this list).
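
A minimal sketch of the second option: both routines modify the row in place instead of deleting and recreating it (the price value is made up):

-- with both transactions doing in-place UPDATEs under Read Committed,
-- the second UPDATE waits for the first transaction's lock, re-evaluates
-- code = 'A' against the updated row, and then applies its own change
UPDATE "product" SET "price" = 1200 WHERE "code" = 'A';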

However, I feel that a better approach is to do away with much of this and attempt to perform updates like this in a single step (which may mean bypassing the ORM). If you want to minimise the likelihood of issues like this, then minimising the number/duration of locks is important, and performing the operation in a single step helps considerably. For complicated operations, using a stored procedure speeds things up, but there is still a (reduced) chance of a conflict with other concurrently running operations.
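
For example, one way to collapse a delete-and-recreate into a single statement in PostgreSQL is an upsert; this sketch assumes a unique constraint on "code", which may not match the question's schema:

INSERT INTO "product" ("code", "price")
VALUES ('A', 1000)
ON CONFLICT ("code") DO UPDATE SET "price" = EXCLUDED."price";
-- one statement, one brief row lock: no window between DELETE and INSERT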

You may also want to take a look at Optimistic Locking, because in some cases it makes more sense (e.g. where you read info, display it to the user, and wait for changes, while in the meantime another user could have made changes); the version-field sketch shown earlier is one way to implement it.

What is a good approach for safe concurrent updates in a relational database?

SERIALIZABLE transaction isolation is certainly the most reliable way to achieve your goal, but it could mean that performance will suffer.

There is one option you have not considered and that is to build your own semaphore.

You could create a static ConcurrentHashMap of items currently being processed: at the start of each insert process, put a record in, and when done, delete it.

Then each thread could consult this semaphore before starting its inserts.
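
If you would rather keep such a semaphore in the database than in application memory, and your database is PostgreSQL, advisory locks serve the same purpose; this is an alternative sketch on my part, not part of the answer above (the lock key is arbitrary):

SELECT pg_advisory_lock(12345);    -- block until we own the lock for this item
-- ... perform the inserts for this item ...
SELECT pg_advisory_unlock(12345);  -- release so other workers can proceed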


