Update Rows in One Table with Data from Another Table Based on One Column in Each Being Equal

Update rows in one table with data from another table based on one column in each being equal

update 
table1 t1
set
(
t1.column1,
t1.column2
) = (
select
t2.column1,
t2.column2
from
table2 t2
where
t2.column1 = t1.column1
)
where exists (
select
null
from
table2 t2
where
t2.column1 = t1.column1
);

Or this (if t2.column1 <=> t1.column1 are many to one and anyone of them is good):

update 
table1 t1
set
(
t1.column1,
t1.column2
) = (
select
t2.column1,
t2.column2
from
table2 t2
where
t2.column1 = t1.column1
and
rownum = 1
)
where exists (
select
null
from
table2 t2
where
t2.column1 = t1.column1
);

How can I UPDATE a table based on another table, using values from groups of rows?

Getting a value from a column different to the column used in a MIN/MAX expression in a GROUP BY query still remains a surprisingly difficult thing to do in SQL, and while modern versions of the SQL language (and SQL Server) make it easier, they're completely non-obvious and counter-intuitive to most people as it necessarily involves more advanced topics like CTEs, derived-tables (aka inner-queries), self-joins and windowing-functions despite the conceptually simple nature of the query.

Anyway, as-ever in modern SQL, there's usually 3 or 4 different ways to accomplish the same task, with a few gotchas.

Preface:

  • As Site, Date, Year, and Month are all keywords in T-SQL, I've escaped them with double-quotes, which is the ISO/ANSI SQL Standards compliant way to escape reserved words.

    • SQL Server supports this by default. If (for some ungodly reason) you have SET QUOTED IDENTIFIER OFF then change the double-quotes to square-brackets: []
  • I assume that the Site column in both tables is just a plain' ol' data column, as such:

    • It is not a PRIMARY KEY member column.
    • It should not be used as a GROUP BY.
    • It should not be used in a JOIN predicate.
  • All of the approaches below assume this database state:

CREATE TABLE "Employee" (
"Site" int NOT NULL,
WorkTypeId char(2) NOT NULL,
Emp_NO int NOT NULL,
"Date" date NOT NULL
);

CREATE TABLE "PTO" (
"Site" int NOT NULL,
WorkTypeId char(2) NULL,
Emp_NO int NOT NULL,
"Date" date NOT NULL
);

GO

INSERT INTO "Employee" ( "Site", WorkTypeId, Emp_NO, "Date" )
VALUES
( 5015, 'MB', 1005, '2022-02-01' ),
( 5015, 'MI', 1005, '2022-02-04' ),
( 5015, 'PO', 1005, '2022-02-04' ),
( 5015, 'ME', 2003, '2022-01-01' ),
( 5015, 'TT', 2003, '2022-01-10' );

INSERT INTO "PTO" ( "Site", WorkTypeId, Emp_NO, "Date" )
VALUES
( 5015, NULL, 1005, '2022-02-03' ),
( 5015, NULL, 1005, '2022-02-14' ),
( 5014, NULL, 2003, '2022-01-09' );
  • Both approaches define CTEs e and p that extend Employee and PTO respectively to add computed "Year" and "Month" columns, which avoids having to repeatedly use YEAR( "Date" ) AS "Year" in GROUP BY and JOIN expressions.
    • I suggest you add those as computed-columns in your base tables, if you're able, as they'll be useful generally anyway. Don't forget to index them appropriately too.

Approach 1: Composed CTEs with elementary aggregates, then UPDATE:

WITH
-- Step 1: Extend both the `Employee` and `PTO` tables with YEAR and MONTH columns (this simplifies things later on):
e AS (
SELECT
Emp_No,
"Site",
WorkTypeId,
"Date",

YEAR( "Date" ) AS "Year",
MONTH( "Date" ) AS "Month"
FROM
Employee
),
p AS (
SELECT
Emp_No,
"Site",
WorkTypeId,
"Date",

YEAR( "Date" ) AS "Year",
MONTH( "Date" ) AS "Month"
FROM
PTO
),
-- Step 2: Get the MIN( "Date" ) value for each group:
minDatesForEachEmployeeMonthYearGroup AS (
SELECT
e.Emp_No,
e."Year",
e."Month",

MIN( "Date" ) AS "FirstDate"
FROM
e
GROUP BY
e.Emp_No,
e."Year",
e."Month"
),
-- Step 3: INNER JOIN back on `e` to get the first WorkTypeId in each group:
firstWorkTypeIdForEachEmployeeMonthYearGroup AS (
/* WARNING: This query will fail if multiple rows (for the same Emp_NO, Year and Month) have the same "Date" value. This can be papered-over with GROUP BY and MIN, but I don't think that's a good idea at all). */
SELECT
e.Emp_No,
e."Year",
e."Month",

e.WorkTypeId AS FirstWorkTypeId
FROM
e
INNER JOIN minDatesForEachEmployeeMonthYearGroup AS q ON
e.Emp_NO = q.Emp_NO
AND
e."Date" = q.FirstDate
)
-- Step 4: Do the UPDATE.
-- *Yes*, you can UPDATE a CTE (provided the CTE is "simple" and has a 1:1 mapping back to source rows on-disk).
UPDATE
p
SET
p.WorkTypeId = f.FirstWorkTypeId
FROM
p
INNER JOIN firstWorkTypeIdForEachEmployeeMonthYearGroup AS f ON
p.Emp_No = f.Emp_No
AND
p."Year" = f."Year"
AND
p."Month" = f."Month"
WHERE
p.WorkTypeId IS NULL;

Here's a screenshot of SSMS showing the contents of the PTO table from before, and after, the above query runs:

Sample Image

Approach 2: Skip the self-JOIN with FIRST_VALUE:

This approach gives a shorter, slightly simpler query, but requires SQL Server 2012 or later (and that your database is running in compatibility-level 110 or higher).

Surprisingly, you cannot use FIRST_VALUE in a GROUP BY query, despite its obvious similarities with MIN, but an equivalent query can be built with SELECT DISTINCT:

WITH
-- Step 1: Extend the `Employee` table with YEAR and MONTH columns:
e AS (
SELECT
Emp_No,
"Site",
WorkTypeId,
"Date",

YEAR( "Date" ) AS "Year",
MONTH( "Date" ) AS "Month"
FROM
Employee
),
firstWorkTypeIdForEachEmployeeMonthYearGroup AS (

SELECT
DISTINCT
e.Emp_No,
e."Year",
e."Month",
FIRST_VALUE( WorkTypeId ) OVER (
PARTITION BY
Emp_No,
e."Year",
e."Month"
ORDER BY
"Date" ASC
) AS FirstWorkTypeId
FROM
e
)
-- Step 3: UPDATE PTO:
UPDATE
p
SET
p.WorkTypeId = f.FirstWorkTypeId
FROM
PTO AS p
INNER JOIN firstWorkTypeIdForEachEmployeeMonthYearGroup AS f ON
p.Emp_No = f.Emp_No
AND
YEAR( p."Date" ) = f."Year"
AND
MONTH( p."Date" ) = f."Month"
WHERE
p.WorkTypeId IS NULL;

Doing a SELECT * FROM PTO after this runs gives me the exact same output as Approach 2.

Approach 2b, but made shorter:

Just so @SOS doesn't feel too smug about their SQL being considerably more shorter than mine , the Approach 2 SQL above can be compacted down to this:

WITH empYrMoGroups AS (
SELECT
DISTINCT
e.Emp_No,
YEAR( e."Date" ) AS "Year",
MONTH( e."Date" ) AS "Month",
FIRST_VALUE( e.WorkTypeId ) OVER (
PARTITION BY
e.Emp_No,
YEAR( e."Date" ),
MONTH( e."Date" )
ORDER BY
e."Date" ASC
) AS FirstWorkTypeId
FROM
Employee AS e
)
UPDATE
p
SET
p.WorkTypeId = f.FirstWorkTypeId
FROM
PTO AS p
INNER JOIN empYrMoGroups AS f ON
p.Emp_No = f.Emp_No
AND
YEAR( p."Date" ) = f."Year"
AND
MONTH( p."Date" ) = f."Month"
WHERE
p.WorkTypeId IS NULL;
  • The execution-plans for both Approach 2 and Approach 2b are almost identical, excepting that Approach 2b has an extra Computed Scalar step for some reason.
  • The execution plans for Approach 1 and Approach 2 are very different, however, with Approach 1 having more branches than Approach 2 despite their similar semantics.
  • But my execution-plans won't match yours because it's very context-dependent, especially w.r.t. what indexes and PKs you have, and if there's any other columns involved, etc.

Approach 1's plan looks like this:

Sample Image

Approach 2b's plan looks like this:

Sample Image

@SOS's plan, for comparison, is a lot simpler... and I honestly don't know why, but it does show how good SQL Server's query optimizer is thesedays:

Sample Image

Update column with values from another table if ID exists in another table

Use an INNER JOIN not a subquery. This will implicitly filter to only rows where the related row is found:

UPDATE T1
SET [Value] = T2.Value
FROM dbo.Table1 T1
JOIN dbo.Table2 T2 ON T1.ID = T2.ID;

db<>fiddle

Update Table's rows based on another table'rows

Just Join the tables Based on col3 = col33 and Update the other columns.

UPDATE T2
SET
col11 = T1.col1,
col22 = T1.col2
FROM Tab1 T1
INNER JOIN Tab2 T2
ON T2.col33 = T1.col3

How to update each row of a column from one table with a list of values from another?

Given the following table structure:

CREATE TABLE #tempA (stringEntry NVARCHAR(50));
INSERT INTO #tempA (stringEntry) VALUES ('abcd'), ('efgh'), ('ijkl');

CREATE TABLE #tempB (stringEntry NVARCHAR(50));
INSERT INTO #tempB (stringEntry) VALUES ('mnop'), ('qrst'), ('uvwx');

You can do the following:

SELECT
ROW_NUMBER() OVER(ORDER BY #tempA.stringEntry) AS RowNumber,
#tempA.stringEntry AS entryA
INTO #tempA2
FROM #tempA;

SELECT
ROW_NUMBER() OVER(ORDER BY #tempB.stringEntry) AS RowNumber,
#tempB.stringEntry AS entryB
INTO #tempB2
FROM #tempB;

UPDATE #tempA
SET #tempA.stringEntry = #tempB2.entryB
FROM #tempA
INNER JOIN #tempA2 ON #tempA.stringEntry = #tempA2.entryA
INNER JOIN #tempB2 ON #tempB2.RowNumber = #tempA2.RowNumber;

This assumes that you have equal number of rows in each table, as you indicated, or are okay with having the "excess" entries in your first table not being updated.

Update column with values from field in the same table if field is not null

I figured it out. The select was off as intimated by the posted who questioned above. So I changed the select statement but even that threw an error. Subquery returned more that 1 value. This is not permitted... So I read up on that error and included MAX in my query and it worked.

The correct answer is

update [DMSEngine].[dbo].[IndexForm_ePermitsResults] 
set Name_Field = (select MAX (b2.Name_Field)
from [DMSEngine].[dbo].[IndexForm_ePermitsResults] b2
where b2.Name_Field <>'' and
b2.ParcelID_Field = ParcelID_Field and
ParcelID_Field = 12257
)
where ParcelID_Field = 12257 and Name_Field = ' ';


Related Topics



Leave a reply



Submit