How to Bulk Update with SQL Server

Bulk Record Update with SQL

Your way is correct, and here is another way you can do it:

update t1
set t1.Description = t2.Description
from Table1 t1
inner join Table2 t2
on t1.DescriptionID = t2.ID;

The nested select is the long way of just doing a join.

How to Bulk Update with SQL Server?

First of all, there is no such thing as a BULK UPDATE statement. A few things that you can do are as follows:

  1. If possible, put your database into the simple recovery model before doing this operation.
  2. Drop indexes before doing the update and recreate them once the update is completed.
  3. Do the updates in smaller batches, something like:

    WHILE (1 = 1)
    BEGIN
        -- update 10,000 rows at a time
        UPDATE TOP (10000) O
        SET O.SomeColumn = ...   -- your SET clause here
        FROM Table O
        INNER JOIN ... bla bla

        IF (@@ROWCOUNT = 0)
            BREAK;
    END;
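The batching loop above can be sketched as runnable code. This is an illustrative analogue only, using Python's sqlite3 module in place of SQL Server (SQLite has no UPDATE TOP, so the batch is limited with a LIMIT subquery); the table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO Orders (status) VALUES (?)", [("old",)] * 25)

BATCH = 10  # in SQL Server, keep this below 5000 to avoid lock escalation
total = 0
while True:
    cur = conn.execute(
        """UPDATE Orders SET status = 'new'
           WHERE id IN (SELECT id FROM Orders WHERE status = 'old' LIMIT ?)""",
        (BATCH,))
    conn.commit()          # each batch commits on its own, like auto-commit
    if cur.rowcount == 0:  # mirrors IF (@@ROWCOUNT = 0) BREAK
        break
    total += cur.rowcount

print(total)  # 25 rows updated, in batches of at most 10
```

Each pass touches at most one batch of rows, so the transaction log (and the lock footprint) per statement stays small.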

Note

If you go with the simple recovery mode option, don't forget to take a full backup after you switch the recovery mode back to full. Simply switching it back to full recovery mode will not restart the log chain until you take a full backup.

How to update large table with millions of rows in SQL Server?

  1. You should not be updating 10k rows in a set unless you are certain that the operation is getting Page Locks (due to multiple rows per page being part of the UPDATE operation). The issue is that Lock Escalation (from either Row or Page to Table locks) occurs at 5000 locks. So it is safest to keep it just below 5000, just in case the operation is using Row Locks.

  2. You should not be using SET ROWCOUNT to limit the number of rows that will be modified. There are two issues here:

    1. It has been deprecated since SQL Server 2005 was released (11 years ago):

      Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax.

    2. It can affect more than just the statement you are dealing with:

      Setting the SET ROWCOUNT option causes most Transact-SQL statements to stop processing when they have been affected by the specified number of rows. This includes triggers. The ROWCOUNT option does not affect dynamic cursors, but it does limit the rowset of keyset and insensitive cursors. This option should be used with caution.

    Instead, use the TOP () clause.

  3. There is no purpose in having an explicit transaction here. It complicates the code and you have no handling for a ROLLBACK, which isn't even needed since each statement is its own transaction (i.e. auto-commit).

  4. Assuming you find a reason to keep the explicit transaction, then you do not have a TRY / CATCH structure. Please see my answer on DBA.StackExchange for a TRY / CATCH template that handles transactions:

    Are we required to handle Transaction in C# Code as well as in Store procedure

I suspect that the real WHERE clause is not being shown in the example code in the Question, so simply relying upon what has been shown, a better model (please see note below regarding performance) would be:

DECLARE @Rows INT,
        @BatchSize INT; -- keep below 5000 to be safe

SET @BatchSize = 2000;

SET @Rows = @BatchSize; -- initialize just to enter the loop

BEGIN TRY
    WHILE (@Rows = @BatchSize)
    BEGIN
        UPDATE TOP (@BatchSize) tab
        SET    tab.Value = 'abc1'
        FROM   TableName tab
        WHERE  tab.Parameter1 = 'abc'
        AND    tab.Parameter2 = 123
        AND    tab.Value <> 'abc1' COLLATE Latin1_General_100_BIN2;
        -- Use a binary Collation (ending in _BIN2, not _BIN) to make sure
        -- that you don't skip differences that compare the same due to
        -- insensitivity of case, accent, etc, or linguistic equivalence.

        SET @Rows = @@ROWCOUNT;
    END;
END TRY
BEGIN CATCH
    RAISERROR(stuff);
    RETURN;
END CATCH;

By testing @Rows against @BatchSize, you can avoid that final UPDATE query (in most cases) because the final set is typically some number of rows less than @BatchSize, in which case we know that there are no more to process (which is what you see in the output shown in your answer). Only in those cases where the final set of rows is equal to @BatchSize will this code run a final UPDATE affecting 0 rows.

I also added a condition to the WHERE clause to prevent rows that have already been updated from being updated again.
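The "@Rows = @BatchSize" exit test can be sketched as a runnable loop. This is an illustrative analogue using Python's sqlite3 (names are invented, and a LIMIT subquery stands in for UPDATE TOP): the loop continues only while a full batch was modified, so the usual final zero-row UPDATE is skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (id INTEGER PRIMARY KEY, Value TEXT)")
conn.executemany("INSERT INTO TableName (Value) VALUES (?)", [("abc",)] * 7)

batch_size = 3
rows = batch_size          # initialize just to enter the loop
statements = 0
while rows == batch_size:  # a short batch means nothing is left to do
    cur = conn.execute(
        """UPDATE TableName SET Value = 'abc1'
           WHERE id IN (SELECT id FROM TableName
                        WHERE Value <> 'abc1' LIMIT ?)""",
        (batch_size,))
    conn.commit()
    rows = cur.rowcount
    statements += 1

print(statements)  # 3 statements: batches of 3, 3, then 1 (< batch size, so stop)
```

A naive `while rows > 0` loop would have issued a fourth, zero-row UPDATE before exiting.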

NOTE REGARDING PERFORMANCE

I emphasized "better" above (as in, "this is a better model") because this has several improvements over the O.P.'s original code, and works fine in many cases, but is not perfect for all cases. For tables of at least a certain size (which varies due to several factors so I can't be more specific), performance will degrade as there are fewer rows to fix if either:

  1. there is no index to support the query, or
  2. there is an index, but at least one column in the WHERE clause is a string data type that does not use a binary collation, hence a COLLATE clause is added to the query here to force the binary collation, and doing so invalidates the index (for this particular query).

This is the situation that @mikesigs encountered, thus requiring a different approach. The updated method copies the IDs for all rows to be updated into a temporary table, then uses that temp table to INNER JOIN to the table being updated on the clustered index key column(s). (It's important to capture and join on the clustered index columns, whether or not those are the primary key columns!).

Please see @mikesigs answer below for details. The approach shown in that answer is a very effective pattern that I have used myself on many occasions. The only changes I would make are:

  1. Explicitly create the #targetIds table rather than using SELECT INTO...
  2. For the #targetIds table, declare a clustered primary key on the column(s).
  3. For the #batchIds table, declare a clustered primary key on the column(s).
  4. For inserting into #targetIds, use INSERT INTO #targetIds (column_name(s)) SELECT and remove the ORDER BY as it's unnecessary.

So, if you don't have an index that can be used for this operation, and can't temporarily create one that will actually work (a filtered index might work, depending on your WHERE clause for the UPDATE query), then try the approach shown in @mikesigs answer (and if you use that solution, please up-vote it).
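The capture-the-keys approach described above can be sketched as follows, again with sqlite3 standing in for SQL Server and hypothetical table names; a correlated key lookup replaces SQL Server's UPDATE ... FROM ... INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Big (id INTEGER PRIMARY KEY, Parameter1 TEXT, Value TEXT);
    CREATE TEMP TABLE targetIds (id INTEGER PRIMARY KEY);  -- clustered-PK analogue
""")
conn.executemany("INSERT INTO Big (Parameter1, Value) VALUES (?, ?)",
                 [("abc", "old"), ("abc", "old"), ("xyz", "old")])

# 1) one scan of the big table to collect the keys that need updating
conn.execute("INSERT INTO targetIds (id) SELECT id FROM Big WHERE Parameter1 = 'abc'")

# 2) drive the UPDATE off the captured keys (no re-evaluation of the WHERE clause)
cur = conn.execute(
    "UPDATE Big SET Value = 'new' WHERE id IN (SELECT id FROM targetIds)")
conn.commit()
print(cur.rowcount)  # 2
```

The expensive predicate is evaluated once, up front; every subsequent batch is a cheap key lookup against the scratch table.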

Conditional Bulk Update in SQL Server for multiple columns

I don't see value in updating both columns at the same time, so I would suggest something like this:

update people
set firstname = 'N/A'
where firstname = 'XXX';

update people
set lastname = 'N/A'
where lastname = 'xxx';

If you want to put these in loops, then you can just repeat:

declare @reccnt int;
set @reccnt = 1;

while @reccnt > 0
begin
    update top (2000) people
    set firstname = 'N/A'
    where firstname = 'XXX';

    set @reccnt = @@ROWCOUNT;
end;

And for the last name as well.

Fastest way of performing Bulk Update in C# / .NET

The fastest way would be to bulk insert the data into a temporary table using the built-in SqlBulkCopy class, and then update using a join to that table.

Or you can use a tool such as SqlBulkTools which does exactly this in an easy way.

var bulk = new BulkOperations();

using (TransactionScope trans = new TransactionScope())
{
    using (SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=mydb;Integrated Security=SSPI"))
    {
        bulk.Setup()
            .ForCollection(items)
            .WithTable("Items")
            .AddColumn(x => x.QuantitySold)
            .BulkUpdate()
            .MatchTargetOn(x => x.ItemID)
            .Commit(conn);
    }

    trans.Complete();
}
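The staging-table pattern that SqlBulkTools wraps (bulk-load the incoming rows, then one set-based UPDATE joined on the match key) can be sketched like this. It is an illustrative analogue only: Python's sqlite3 replaces SqlBulkCopy, executemany plays the role of the bulk insert, and the table/column names mirror the C# example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, QuantitySold INTEGER);
    CREATE TEMP TABLE ItemsStaging (ItemID INTEGER PRIMARY KEY, QuantitySold INTEGER);
""")
conn.executemany("INSERT INTO Items VALUES (?, ?)", [(1, 0), (2, 0), (3, 0)])

incoming = [(1, 10), (3, 7)]  # the in-memory collection to push down

# 1) "bulk copy" the rows into the staging table in one round trip
conn.executemany("INSERT INTO ItemsStaging VALUES (?, ?)", incoming)

# 2) one set-based UPDATE joined to the staging table (MatchTargetOn = ItemID)
conn.execute("""
    UPDATE Items
       SET QuantitySold = (SELECT s.QuantitySold
                             FROM ItemsStaging s
                            WHERE s.ItemID = Items.ItemID)
     WHERE ItemID IN (SELECT ItemID FROM ItemsStaging)
""")
conn.commit()
print(dict(conn.execute("SELECT ItemID, QuantitySold FROM Items")))
# {1: 10, 2: 0, 3: 7}
```

The win over row-by-row UPDATEs is that the network cost is paid once (the bulk load) and the database does the matching as a single set operation.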

Bulk update of the same row column using trigger in SQL Server

I would recommend splitting this up into two separate triggers - one for INSERT, one for UPDATE. That makes the code much simpler to work with.

Also: your trigger will be called once per statement - not once per row - so you cannot just select a value from the Inserted pseudo table like this (since that pseudo table will contain all 25 newly inserted rows - not just one):

SELECT 
    @Id = Id,
    @DateJoining = DateofJoining
FROM INSERTED

Which of the 25 inserted rows would you be looking at? It's arbitrary and undetermined - plus you would look at just one row and ignore the other 24.

So the INSERT trigger should look like this:

CREATE TRIGGER [dbo].[trigger_JoiningDate_Insert] 
ON [dbo].[JoiningDate]
AFTER INSERT
AS
BEGIN
    UPDATE jd
    SET DaysCount = [Formula To Calculate Days]
    FROM JoiningDate jd
    INNER JOIN Inserted i ON i.Id = jd.Id;
END

Since you have no previous values, there's no need to check if one of those two columns was updated.

Your UPDATE trigger should look like this:

CREATE TRIGGER [dbo].[trigger_JoiningDate_Update] 
ON [dbo].[JoiningDate]
AFTER UPDATE
AS
BEGIN
    UPDATE jd
    SET DaysCount = DATEDIFF(DAY, jd.DateofJoining, SYSDATETIME())
    FROM JoiningDate jd
    INNER JOIN Inserted i ON i.Id = jd.Id
    INNER JOIN Deleted d ON d.Id = i.Id
    -- check whether "DateOfJoining" or "Status" have been updated
    WHERE i.DateOfJoining <> d.DateOfJoining
       OR i.Status <> d.Status;
END

How to do bulk update using the Below query

When you do an UPDATE that will potentially affect a million rows, it is best to do it in batches. Try batches of 50,000 rows at a time.


