Bulk Record Update with SQL
Your way is correct, and here is another way you can do it:
update t1
set Description = t2.Description
from Table1 t1
inner join Table2 t2
    on t1.DescriptionID = t2.ID;
The nested select (subquery) is just the long way of doing a join.
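For comparison, the "nested select" form can be sketched with Python's built-in sqlite3 module. SQLite stands in for SQL Server here, and the sample rows are hypothetical. One detail worth knowing: the subquery version needs a WHERE EXISTS guard, otherwise rows in Table1 with no match in Table2 are set to NULL, whereas the INNER JOIN version simply leaves them alone.

```python
import sqlite3

def update_descriptions(conn):
    # Correlated-subquery ("nested select") version of the join update.
    # WHERE EXISTS leaves unmatched Table1 rows untouched, like INNER JOIN does.
    conn.execute("""
        UPDATE Table1
        SET Description = (SELECT t2.Description FROM Table2 t2
                           WHERE t2.ID = Table1.DescriptionID)
        WHERE EXISTS (SELECT 1 FROM Table2 t2
                      WHERE t2.ID = Table1.DescriptionID)
    """)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, DescriptionID INTEGER, Description TEXT);
    CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Description TEXT);
    INSERT INTO Table1 VALUES (1, 10, 'old'), (2, 20, 'old'), (3, 99, 'keep');
    INSERT INTO Table2 VALUES (10, 'ten'), (20, 'twenty');
""")
update_descriptions(conn)
rows = conn.execute("SELECT ID, Description FROM Table1 ORDER BY ID").fetchall()
print(rows)  # [(1, 'ten'), (2, 'twenty'), (3, 'keep')]
```

Row 3 has no partner in Table2 and keeps its value; dropping the WHERE EXISTS clause would turn it into NULL.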
Mass/bulk update in rails without using update_all with a single query?
Mass update without using update_all can be achieved with the activerecord-import gem.
Please refer to the gem's documentation for more information and for details of its methods.
Example:
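Let's say there is a table named "Services" (the ProvidedService model in the code below) with a "booked" column. We want to change that value on each record inside a loop, then save all of the records with the gem in a single call outside the loop.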
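# services: the ProvidedService records to change (e.g. loaded with ProvidedService.where(...))
services.each do |service|
  service.booked = false
  # only bump updated_at for records that actually changed
  service.updated_at = DateTime.current if service.changed?
end
# write all records back in one import call; existing rows are upserted and
# only the listed columns are overwritten
ProvidedService.import services.to_ary, on_duplicate_key_update: { columns: %i[booked updated_at] }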
activerecord-import does not update the "updated_at" column by default, so we have to set it explicitly (and list it in the columns to update).
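The on_duplicate_key_update option relies on the database's upsert statement (ON DUPLICATE KEY UPDATE in MySQL, INSERT ... ON CONFLICT ... DO UPDATE in PostgreSQL and SQLite). The sketch below reproduces that idea by hand with Python's sqlite3 module, assuming SQLite 3.24+ (needed for ON CONFLICT); the services table layout, the unbook_all function, and the sample rows are hypothetical. Records are modified in memory, then written back with a single multi-row upsert.

```python
import sqlite3

def unbook_all(conn, now):
    """Change records in memory, then write them all back in one upsert statement."""
    records = conn.execute("SELECT id, booked, updated_at FROM services").fetchall()
    rows = []
    for rec_id, booked, updated_at in records:
        new_booked = 0
        # like `updated_at = ... if service.changed?`: only touch changed rows
        new_updated_at = now if booked != new_booked else updated_at
        rows.append((rec_id, new_booked, new_updated_at))
    if not rows:
        return
    placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
    params = [value for row in rows for value in row]
    # Every id already exists, so each row takes the conflict branch and
    # only the listed columns are overwritten.
    conn.execute(
        f"INSERT INTO services (id, booked, updated_at) VALUES {placeholders} "
        "ON CONFLICT(id) DO UPDATE SET booked = excluded.booked, "
        "updated_at = excluded.updated_at",
        params,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (id INTEGER PRIMARY KEY, booked INTEGER, updated_at TEXT)")
conn.executemany("INSERT INTO services VALUES (?, ?, ?)",
                 [(1, 1, "2020-01-01"), (2, 1, "2020-01-01"), (3, 0, "2020-01-01")])
unbook_all(conn, "2024-06-01")
print(conn.execute("SELECT * FROM services ORDER BY id").fetchall())
# [(1, 0, '2024-06-01'), (2, 0, '2024-06-01'), (3, 0, '2020-01-01')]
```

For large sets the statement would have to be split into batches, since databases cap the number of bound parameters per statement.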
How to bulk update 1000 records using C#/SQL
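You have two options: use a MERGE statement, or use UPDATE.
I will go with the UPDATE option, as it's the easiest one: bulk-copy the modified rows into a temp table with SqlBulkCopy, then update the real table with a single UPDATE ... JOIN against it. (This needs the FastMember NuGet package.)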
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using FastMember; // ObjectReader comes from the FastMember NuGet package
// Shape of one row of [dbo].[Client]; adjust the types to match your schema.
public class Client
{
    public int ClientID { get; set; }
    public string ServerName { get; set; }
    public string Text { get; set; }
}
private void ExecuteSql(SqlConnection connection , string sql , SqlParameter[] parameters = null)
{
    if(connection == null)
    {
        throw new ArgumentNullException(nameof(connection));
    }
    if(string.IsNullOrWhiteSpace(sql))
    {
        throw new ArgumentNullException(nameof(sql));
    }
    using(var command = new SqlCommand(sql , connection))
    {
        if(parameters?.Length > 0)
        {
            command.Parameters.AddRange(parameters);
        }
        if(connection.State != ConnectionState.Open)
            connection.Open();
        command.ExecuteNonQuery();
    }
}
private void ExecuteBulkCopy<T>(SqlConnection connection , IEnumerable<T> entries , string destinationTableName , string[] columns = null , int batchSize = 1000000)
{
    if(connection == null)
    {
        throw new ArgumentNullException(nameof(connection));
    }
    if(entries == null)
    {
        throw new ArgumentNullException(nameof(entries));
    }
    if(string.IsNullOrWhiteSpace(destinationTableName))
    {
        throw new ArgumentNullException(nameof(destinationTableName));
    }
    if(connection.State != ConnectionState.Open)
        connection.Open();
    // KeepIdentity: SELECT ... INTO copies an IDENTITY property, so without it SQL Server
    // would generate new values instead of keeping the ClientIDs we send.
    using(var sbc = new SqlBulkCopy(connection , SqlBulkCopyOptions.KeepIdentity , null)
    {
        BulkCopyTimeout = 0 ,
        DestinationTableName = destinationTableName ,
        BatchSize = batchSize
    })
    {
        // ObjectReader exposes the objects as an IDataReader; columns fixes the column order
        using(var reader = ObjectReader.Create(entries , columns))
        {
            sbc.WriteToServer(reader);
        }
    }
}
private IEnumerable<Client> GetUpdatedClients(SqlConnection connection)
{
    using(var command = new SqlCommand("SELECT [ClientID], [ServerName], [Text] FROM [dbo].[Client]", connection))
    {
        // the caller may already have opened the connection
        if(connection.State != ConnectionState.Open)
            connection.Open();
        using(SqlDataReader reader = command.ExecuteReader())
        {
            while(reader.Read())
            {
                if(reader.IsDBNull(reader.GetOrdinal("ClientID"))) { continue; }
                var clientId = (int)reader["ClientID"];
                var serverName = reader["ServerName"]?.ToString();
                var text = reader["Text"]?.ToString();
                // Modify Text & set ServerName (UpdateText is your own logic)
                string textUpdated = UpdateText(text);
                if(textUpdated.StartsWith("doo"))
                {
                    serverName = "re";
                }
                yield return new Client
                {
                    ClientID = clientId,
                    ServerName = serverName,
                    Text = textUpdated
                };
            }
        }
    }
}
private void BulkUpdateClients(SqlConnection connection, IEnumerable<Client> clients)
{
    const string dropTempTable = "IF OBJECT_ID('[tempdb].[dbo].[##Client]') IS NOT NULL DROP TABLE [tempdb].[dbo].[##Client];";
    // drop temp table if exists
    ExecuteSql(connection, dropTempTable);
    // create an empty temp table with the same columns (TOP 0 copies the structure, no rows)
    ExecuteSql(connection, "SELECT TOP 0 [ClientID], [ServerName], [Text] INTO [tempdb].[dbo].[##Client] FROM [dbo].[Client];");
    // copy rows to the temp table
    ExecuteBulkCopy(connection, clients, "[tempdb].[dbo].[##Client]", new[] { "ClientID", "ServerName", "Text" });
    // Use UPDATE JOIN
    ExecuteSql(connection, "UPDATE t1 SET [ServerName] = t2.[ServerName], [Text] = t2.[Text] FROM [dbo].[Client] t1 JOIN [tempdb].[dbo].[##Client] t2 ON t1.[ClientID] = t2.[ClientID];");
    // drop temp table
    ExecuteSql(connection, dropTempTable);
}
public void BulkUpdateClients()
{
    // strConn: your connection string
    using(var connection = new SqlConnection(strConn))
    {
        connection.Open();
        // Materialize first: GetUpdatedClients is lazy (yield return), and its open reader
        // would otherwise still be using this connection while the bulk copy runs.
        var clients = GetUpdatedClients(connection).ToList();
        // it's important to use the same connection and keep it alive,
        // otherwise the temp table will be dropped.
        BulkUpdateClients(connection, clients);
    }
}
If you don't want to use a temp table, you can use a permanent staging table instead (just change the temp table name).
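The same staging-table idea can be sketched with Python's built-in sqlite3 module, with SQLite standing in for SQL Server. There is no SqlBulkCopy, so executemany fills a TEMP table, and because older SQLite versions lack UPDATE ... FROM, the final step uses correlated subqueries. The Client columns mirror the C# example; the bulk_update_clients function, the client_stage table, and the sample rows are hypothetical.

```python
import sqlite3

def bulk_update_clients(conn, clients):
    """Stage (ClientID, ServerName, Text) tuples, then update Client from the stage."""
    # TEMP tables live only as long as this connection, like ##Client above.
    conn.execute("DROP TABLE IF EXISTS temp.client_stage")
    conn.execute("CREATE TEMP TABLE client_stage "
                 "(ClientID INTEGER PRIMARY KEY, ServerName TEXT, Text TEXT)")
    conn.executemany("INSERT INTO client_stage VALUES (?, ?, ?)", clients)
    # One UPDATE for all staged rows; untouched clients are filtered out by IN.
    conn.execute("""
        UPDATE Client
        SET ServerName = (SELECT s.ServerName FROM client_stage s WHERE s.ClientID = Client.ClientID),
            Text       = (SELECT s.Text       FROM client_stage s WHERE s.ClientID = Client.ClientID)
        WHERE ClientID IN (SELECT ClientID FROM client_stage)
    """)
    conn.execute("DROP TABLE temp.client_stage")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (ClientID INTEGER PRIMARY KEY, ServerName TEXT, Text TEXT)")
conn.executemany("INSERT INTO Client VALUES (?, ?, ?)",
                 [(1, "a", "x"), (2, "b", "y"), (3, "c", "z")])
bulk_update_clients(conn, [(1, "re", "doo1"), (3, "c", "z2")])
print(conn.execute("SELECT * FROM Client ORDER BY ClientID").fetchall())
# [(1, 're', 'doo1'), (2, 'b', 'y'), (3, 'c', 'z2')]
```

As with ##Client, staging, updating and dropping all have to happen on the same connection, because the TEMP table is invisible to (and dropped independently of) any other connection.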
How to update large table with millions of rows in SQL Server?
You should not be updating 10k rows in a set unless you are certain that the operation is getting Page Locks (due to multiple rows per page being part of the UPDATE operation). The issue is that Lock Escalation (from either Row or Page to Table locks) is triggered once a single statement holds 5000 locks on one table. So it is safest to keep it just below 5000, just in case the operation is using Row Locks.
You should not be using SET ROWCOUNT to limit the number of rows that will be modified. There are two issues here:
- It has been deprecated since SQL Server 2005 was released (11 years ago at the time of writing):
"Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax."
- It can affect more than just the statement you are dealing with:
"Setting the SET ROWCOUNT option causes most Transact-SQL statements to stop processing when they have been affected by the specified number of rows. This includes triggers. The ROWCOUNT option does not affect dynamic cursors, but it does limit the rowset of keyset and insensitive cursors. This option should be used with caution."
Instead, use the TOP () clause.
There is no purpose in having an explicit transaction here. It complicates the code and you have no handling for a ROLLBACK, which isn't even needed since each statement is its own transaction (i.e. auto-commit).
Assuming you find a reason to keep the explicit transaction, note that you do not have a TRY/CATCH structure. Please see my answer on DBA.StackExchange for a TRY/CATCH template that handles transactions: "Are we required to handle Transaction in C# Code as well as in Store procedure".
I suspect that the real WHERE clause is not being shown in the example code in the Question, so simply relying upon what has been shown, a better model (please see note below regarding performance) would be:
DECLARE @Rows INT,
        @BatchSize INT; -- keep below 5000 to be safe
SET @BatchSize = 2000;
SET @Rows = @BatchSize; -- initialize just to enter the loop
BEGIN TRY
  WHILE (@Rows = @BatchSize)
  BEGIN
      UPDATE TOP (@BatchSize) tab
      SET    tab.Value = 'abc1'
      FROM   TableName tab
      WHERE  tab.Parameter1 = 'abc'
      AND    tab.Parameter2 = 123
      AND    tab.Value <> 'abc1' COLLATE Latin1_General_100_BIN2;
      -- Use a binary Collation (ending in _BIN2, not _BIN) to make sure
      -- that you don't skip differences that compare the same due to
      -- insensitivity of case, accent, etc, or linguistic equivalence.
      SET @Rows = @@ROWCOUNT;
  END;
END TRY
BEGIN CATCH
  -- Re-raise the original error. THROW needs SQL Server 2012 or later;
  -- on older versions, use RAISERROR with ERROR_MESSAGE() instead.
  THROW;
END CATCH;
By testing @Rows against @BatchSize, you can avoid that final UPDATE query (in most cases) because the final set is typically some number of rows less than @BatchSize, in which case we know that there are no more to process (which is what you see in the output shown in your answer). Only in those cases where the final set of rows is equal to @BatchSize will this code run a final UPDATE affecting 0 rows.
I also added a condition to the WHERE clause to prevent rows that have already been updated from being updated again.
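To see the loop-exit logic in action, here is a rough Python/sqlite3 translation of the batching loop above. SQLite has no UPDATE TOP (n), so each batch is limited with a rowid subquery instead (a swapped-in technique), and SQLite's default BINARY collation already compares case-sensitively, so no COLLATE clause is needed. Table and column names come from the T-SQL; batched_update and make_table are hypothetical. The Value <> 'abc1' filter is what lets the loop finish: without it, every batch could keep picking rows that were already updated.

```python
import sqlite3

def batched_update(conn, batch_size=2000):
    """Set Value = 'abc1' on matching rows, at most batch_size rows per UPDATE.
    Returns how many UPDATE statements were executed."""
    statements = 0
    rows = batch_size  # initialize just to enter the loop, like @Rows
    while rows == batch_size:
        cur = conn.execute(
            """UPDATE TableName SET Value = 'abc1'
               WHERE rowid IN (SELECT rowid FROM TableName
                               WHERE Parameter1 = 'abc' AND Parameter2 = 123
                                 AND Value <> 'abc1'
                               LIMIT ?)""",
            (batch_size,),
        )
        conn.commit()        # each batch commits on its own, keeping transactions short
        rows = cur.rowcount  # plays the role of @@ROWCOUNT
        statements += 1
    return statements

def make_table(matching_rows):
    """Hypothetical test data: matching rows plus a few that must not change."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableName (Parameter1 TEXT, Parameter2 INTEGER, Value TEXT)")
    conn.executemany("INSERT INTO TableName VALUES (?, ?, ?)",
                     [("abc", 123, "old")] * matching_rows + [("xyz", 123, "old")] * 10)
    return conn

conn = make_table(5000)
print(batched_update(conn))  # 3
print(conn.execute("SELECT COUNT(*) FROM TableName WHERE Value = 'abc1'").fetchone()[0])  # 5000
```

With 5000 matching rows and a batch size of 2000 the loop runs three UPDATEs (2000, 2000, 1000) and stops because the last one is short; with exactly 4000 rows it also runs three, the last affecting 0 rows.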
NOTE REGARDING PERFORMANCE
I emphasized "better" above (as in, "this is a better model") because this has several improvements over the O.P.'s original code, and works fine in many cases, but is not perfect for all cases. For tables of at least a certain size (which varies due to several factors so I can't be more specific), performance will degrade as there are fewer rows to fix if either:
- there is no index to support the query, or
- there is an index, but at least one column in the WHERE clause is a string data type that does not use a binary collation, hence a COLLATE clause is added to the query here to force the binary collation, and doing so invalidates the index (for this particular query).
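The index-invalidating effect of COLLATE can be observed in SQLite, used here only as a stand-in for SQL Server. SQLite's default collation is already binary, so this sketch forces the opposite (NOCASE) and compares the query plans; the index name is hypothetical. When the comparison's collation matches the index, the planner can seek it; when an explicit COLLATE differs, it cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (Value TEXT)")
conn.execute("CREATE INDEX IX_TableName_Value ON TableName (Value)")  # BINARY collation

def query_plan(sql):
    """Join the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Comparison uses the index's own collation: the planner can seek the index.
same_collation = query_plan("SELECT Value FROM TableName WHERE Value = 'abc1'")
# Explicit COLLATE that differs from the index: no index seek is possible.
forced_collation = query_plan("SELECT Value FROM TableName WHERE Value = 'abc1' COLLATE NOCASE")
print(same_collation)    # a SEARCH step using IX_TableName_Value
print(forced_collation)  # a SCAN, with no SEARCH step
```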
This is the situation that @mikesigs encountered, thus requiring a different approach. The updated method copies the IDs for all rows to be updated into a temporary table, then uses that temp table to INNER JOIN to the table being updated on the clustered index key column(s). (It's important to capture and join on the clustered index columns, whether or not those are the primary key columns!)
Please see @mikesigs' answer below for details. The approach shown in that answer is a very effective pattern that I have used myself on many occasions. The only changes I would make are:
- Explicitly create the #targetIds table rather than using SELECT INTO...
- For the #targetIds table, declare a clustered primary key on the column(s).
- For the #batchIds table, declare a clustered primary key on the column(s).
- For inserting into #targetIds, use INSERT INTO #targetIds (column_name(s)) SELECT and remove the ORDER BY as it's unnecessary.
So, if you don't have an index that can be used for this operation, and can't temporarily create one that will actually work (a filtered index might work, depending on your WHERE clause for the UPDATE query), then try the approach shown in @mikesigs' answer (and if you use that solution, please up-vote it).