Redundant data in update statements
Due to PostgreSQL MVCC, an UPDATE is effectively much like a DELETE plus an INSERT, with the notable exception of toasted values - see:
- Does Postgres rewrite entire row on update?
(And minor differences for heap-only tuples - DELETE + INSERT starts a new HOT chain - but that has no bearing on the case at hand.)
To be precise, the "deleted" row merely becomes invisible to any transaction starting after the delete has been committed, and is vacuumed later. Therefore, on the database side, including index manipulation, there is in effect no difference between the two statements. (Exceptions apply, keep reading.) Sending unchanged columns along increases network traffic a bit (depending on your data) and needs a bit more parsing.
I studied HOT updates some more after @araqnid's input and ran some tests. Updates on columns that don't actually change the value make no difference whatsoever as far as HOT updates are concerned. My answer holds. See details below.
This also applies to toasted attributes, since those are also not touched unless the values actually change.
However, if you use per-column triggers (introduced with pg 9.0), this may have undesired side effects!
I quote the manual on triggers:
... a command such as UPDATE ... SET x = x ... will fire a trigger on column x, even though the column's value did not change.
Bold emphasis mine.
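To see the behavior from the quote, and one way to guard against it, here is a minimal sketch for Postgres. The table trg_demo, the function trg_demo_notice and both trigger names are hypothetical, made up for illustration. The second trigger uses a WHEN condition so that it only fires when x actually changes.

```sql
-- Hypothetical scratch objects, for illustration only.
CREATE TABLE trg_demo (id int PRIMARY KEY, x int, y int);
INSERT INTO trg_demo VALUES (1, 10, 20);

CREATE FUNCTION trg_demo_notice() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
   RAISE NOTICE 'trigger % fired for id %', TG_NAME, NEW.id;
   RETURN NEW;
END
$$;

-- Per-column trigger: fires whenever x appears in the SET list.
CREATE TRIGGER trg_demo_x
BEFORE UPDATE OF x ON trg_demo
FOR EACH ROW EXECUTE PROCEDURE trg_demo_notice();

-- Same, but the WHEN condition skips rows where x did not actually change.
CREATE TRIGGER trg_demo_x_changed
BEFORE UPDATE OF x ON trg_demo
FOR EACH ROW
WHEN (OLD.x IS DISTINCT FROM NEW.x)
EXECUTE PROCEDURE trg_demo_notice();

UPDATE trg_demo SET x = x, y = 21 WHERE id = 1;  -- x is listed but unchanged
UPDATE trg_demo SET y = 22 WHERE id = 1;         -- x is not listed
```

Going by the manual, the first UPDATE should produce a notice from trg_demo_x only, and the second UPDATE none at all.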
Abstraction layers are for convenience. They are useful for SQL-illiterate developers or if the application needs to be portable between different RDBMS. On the downside, they can butcher performance and introduce additional points of failure. I avoid them wherever possible.
HOT (Heap-only tuple) updates
Heap-Only Tuples were introduced with Postgres 8.3, with important improvements in 8.3.4 and 8.4.9.
The release notes for Postgres 8.3:
UPDATEs and DELETEs leave dead tuples behind, as do failed INSERTs. Previously only VACUUM could reclaim space taken by dead tuples. With HOT dead tuple space can be automatically reclaimed at the time of INSERT or UPDATE if no changes are made to indexed columns. This allows for more consistent performance. Also, HOT avoids adding duplicate index entries.
Emphasis mine. And "no changes" includes cases where columns are updated with the same value as they already hold. I actually tested, as I wasn't sure.
Ultimately, the extensive README.HOT in the source code confirms it.
Toasted columns also don't stand in the way of HOT updates. The HOT-updated tuple just links to the same, unchanged tuple(s) in the toast fork of the relation. HOT updates even work with toasted values in the target list (actually changed or not). If toasted values are changed, it entails writes to the toast relation fork, obviously. I tested all of that, too.
Don't take my word for it, see for yourself. Postgres provides a couple of functions to check statistics. Run your UPDATE with and without all columns and check if it makes any difference.
-- Number of rows HOT-updated in table:
SELECT pg_stat_get_tuples_hot_updated('table_name'::regclass::oid);
-- Number of rows HOT-updated in table, in the current transaction:
SELECT pg_stat_get_xact_tuples_hot_updated('table_name'::regclass::oid);
Or use pgAdmin. Select your table and inspect the "Statistics" tab in the main window.
Be aware that HOT updates are only possible when there is room for the new tuple version on the same page of the main relation fork. One simple way to force that condition is to test with a small table that holds only a few rows. Page size is typically 8k, so there must be free space on the page.
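The test described above could look roughly like this. It is only a sketch: the table hot_demo and its index are hypothetical, and the counter values are not shown because they should be checked on your own installation.

```sql
-- Hypothetical scratch table; a few rows leave plenty of free space on the page.
CREATE TABLE hot_demo (id int PRIMARY KEY, val text, note text);
CREATE INDEX hot_demo_val_idx ON hot_demo (val);
INSERT INTO hot_demo VALUES (1, 'a', 'x'), (2, 'b', 'y');

BEGIN;
-- Only the non-indexed column is listed.
UPDATE hot_demo SET note = 'x1' WHERE id = 1;
SELECT pg_stat_get_xact_tuples_hot_updated('hot_demo'::regclass::oid);

-- All columns are listed, but the indexed ones keep their values.
UPDATE hot_demo SET id = id, val = val, note = 'y1' WHERE id = 2;
SELECT pg_stat_get_xact_tuples_hot_updated('hot_demo'::regclass::oid);

-- For contrast: an indexed column really changes.
UPDATE hot_demo SET val = 'c' WHERE id = 1;
SELECT pg_stat_get_xact_tuples_hot_updated('hot_demo'::regclass::oid);
ROLLBACK;
```

If the claims in this answer hold, the counter should go up after each of the first two updates and stay the same after the third.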
Update one of 2 duplicates in an SQL Server database table
Try this with a CTE and PARTITION BY:
;WITH cte AS
(
   SELECT ROW_NUMBER() OVER (PARTITION BY Column1 ORDER BY Column1) AS rno,
          Column1
   FROM   Clients
)
-- Only the second row of each group of duplicates gets the suffix.
UPDATE cte
SET    Column1 = Column1 + ' 1 '
WHERE  rno = 2;
Avoid redundant updates
Given the nature of how SQL works, this is exactly what you need to do. If you tell it:
update table_name
set field_one = 'one';
that means something entirely different in SQL than
update table_name
set field_one = 'one'
where field_one != 'one';
The database can only process what you told it to process. In the first case, because there is no WHERE clause, you have told it to process all the records.
In the second case you have put a filter on it to process only some specific records.
It is up to the code writer, not the database, to determine the content of the query. If you didn't want every record updated, you should not have told it to do so. The database is quite literal about the commands you give it. Yes, the second set of queries is longer because they are more specific. They have a different meaning than the original queries. That is all to the good, as it is far faster to update the ten records you are interested in than all 1,000,000 records in the table.
You need to get over the idea that longer is somehow a bad thing in database queries. Often it is a good thing, as you are being more correct in what you are asking for. Your original queries were simply incorrect. And now you have to pay the price to fix what was a systemically bad practice.
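One detail worth checking when adding such a filter: a plain != comparison never matches rows where the column is NULL. The sketch below assumes PostgreSQL (which supports IS DISTINCT FROM) and uses a hypothetical table upd_demo with made-up data.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE upd_demo (id int PRIMARY KEY, field_one text);
INSERT INTO upd_demo VALUES (1, 'one'), (2, 'two'), (3, NULL);

-- Skips id 1 (already 'one'), but also skips id 3:
-- NULL != 'one' evaluates to NULL, which is not true.
UPDATE upd_demo SET field_one = 'one' WHERE field_one != 'one';

-- NULL-safe variant: treats NULL as different from 'one', so id 3 is updated too.
UPDATE upd_demo SET field_one = 'one' WHERE field_one IS DISTINCT FROM 'one';
```

Whether NULL rows should be updated depends on what the statement is meant to do, so this is a choice to make deliberately rather than a rule.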
How to update the duplicate records instead of deleting them
(Can only answer based on the info you've provided on your question up to now)
You could group by id as you are already doing and get the max t_id_pk (assuming they are integers), then feed those max primary keys into an update query, excluding them from processing.
Use that update query to turn all the remaining t_version_ind into FALSE.
This is best explained in the following code:
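UPDATE T1
SET    t_version_ind = 'FALSE'
WHERE  t_version_ind = 'TRUE'   -- skip rows that are already FALSE
AND    t_id_pk NOT IN (
   SELECT t_id_pk
   FROM  (
      -- newest row per t_id; these keep t_version_ind = 'TRUE'
      SELECT t_id, MAX(t_id_pk) AS t_id_pk
      FROM   T1
      WHERE  t_version_ind = 'TRUE'
      GROUP  BY t_id
   ) latest
);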
Update Query For Duplicate Records Oracle
You may use the count analytic function with LPAD:
SELECT card_no
,LPAD('D', count(civil_no) OVER (
PARTITION BY civil_no ORDER BY card_no
), 'D') || civil_no as output
FROM t;
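To make the effect of the running count and LPAD concrete, here is the same query against a few made-up rows. The sample data in the WITH clause is hypothetical and only stands in for the real table t.

```sql
-- Hypothetical sample rows standing in for table t.
WITH t AS (
   SELECT 1 AS card_no, 100 AS civil_no FROM dual UNION ALL
   SELECT 2, 100 FROM dual UNION ALL
   SELECT 3, 200 FROM dual
)
SELECT card_no
     , LPAD('D', COUNT(civil_no) OVER (PARTITION BY civil_no ORDER BY card_no), 'D')
       || civil_no AS output
FROM   t
ORDER  BY card_no;
-- card_no 1 -> D100, card_no 2 -> DD100, card_no 3 -> D200
```

The ORDER BY inside the window turns COUNT into a running count per civil_no, and LPAD('D', n, 'D') pads a single 'D' out to n characters, so each further duplicate gets one more 'D'.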
It's not clear which table you want to update. You may do so with a correlated update using the above select, or with a MERGE INTO:
UPDATE t t1
SET output = (SELECT output
FROM (SELECT card_no,
lpad('D', COUNT(civil_no)
over (
PARTITION BY civil_no
ORDER BY card_no ), 'D')
|| civil_no AS output
FROM t) t2
WHERE t1.card_no = t2.card_no);