Tsql Comparing Two Sets

TSQL Comparing two Sets

Use:

SELECT CASE 
WHEN COUNT(*) = (SELECT COUNT(*) FROM a)
AND COUNT(*) = (SELECT COUNT(*) FROM b) THEN 'Elements in TableA and TableB contains identical sets'
ELSE 'TableA and TableB do NOT contain identical sets'
END
FROM (SELECT a.col
FROM a
INTERSECT
SELECT b.col
FROM b) x

Test with:

WITH a AS (
SELECT 'a' AS col
UNION ALL
SELECT 'b'
UNION ALL
SELECT 'c'
UNION ALL
SELECT 'd'),
b AS (
SELECT 'b' AS col
UNION ALL
SELECT 'c'
UNION ALL
SELECT 'd'
UNION ALL
SELECT 'a')
SELECT CASE
WHEN COUNT(*) = (SELECT COUNT(*) FROM a)
AND COUNT(*) = (SELECT COUNT(*) FROM b) THEN 'yes'
ELSE 'no'
END
FROM (SELECT a.col
FROM a
INTERSECT
SELECT b.col
FROM b) x

Compare two sets of an SQL GROUP BY result

I assumed a TrainRoutes table with one row for each of R1, R2 etc. You could replace this with select distinct RouteID from Stops if required.

Select
r1.RouteID Route1,
r2.RouteID Route2
From
-- cross to compare each route with each route
dbo.TrainRoutes r1
Cross Join
dbo.TrainRoutes r2
Inner Join
dbo.Stops s1
On r1.RouteID = s1.RouteID
Inner Join
dbo.Stops s2
On r2.RouteID = s2.RouteID
Where
r1.RouteID < r2.RouteID -- no point in comparing R1 with R2 and R2 with R1
Group By
r1.RouteID,
r2.RouteID
Having
-- check each route has the same number of stations
count(Distinct s1.stationID) = count(Distinct s2.stationID) And
-- check each route has the same stops
Sum(Case When s1.StationID = s2.StationID Then 1 Else 0 End) = count(Distinct s1.StationID) And
-- check each route has different halts
sum(Case When s1.StationID = s2.StationID And s1.Halts = s2.Halts Then 1 Else 0 End) != count(Distinct s1.StationID)

You can also do this without the TrainRoute table like so, but you're now cross joining two larger tables:

Select
s1.RouteID Route1,
s2.RouteID Route2
From
dbo.Stops s1
Cross Join
dbo.Stops s2
Where
s1.RouteID < s2.RouteID
Group By
s1.RouteID,
s2.RouteID
Having
count(Distinct s1.stationID) = count(Distinct s2.stationID) And
Sum(Case When s1.StationID = s2.StationID Then 1 Else 0 End) = count(Distinct s1.StationID) And
sum(Case When s1.StationID = s2.StationID And s1.Halts = s2.Halts Then 1 Else 0 End) != count(Distinct s1.StationID)

http://sqlfiddle.com/#!6/76978/8

SQL comparing sets, part II: How to join sets of sets

OK, let's solve question 2 step by step:

(1) Inner join sets and probes on their individual elements. This way we'll see how do test sets and probe sets relate (which sets have what elements in common with which probe):

SELECT
e.set_no AS [test set],
m.set_no AS [probe set],
e.elem [common element]
FROM
@elements e
JOIN
@multi_probe m ON e.elem = m.elem

Result:

test set    probe set   common element
----------- ----------- --------------
1 3 A
1 1 B
1 3 B
1 1 C
1 2 C
1 3 C
1 1 F
1 2 F
2 3 A
2 1 B
2 3 B
2 1 C
2 2 C
2 3 C
3 1 F
3 2 F
4 1 B
4 3 B
4 1 C
4 2 C
4 3 C
4 1 F
4 2 F
5 1 F
5 2 F

(2) Count how many common elements between each test set and probe set (inner joins mean we already left the "no matches" aside)

SELECT
e.set_no AS [test set],
m.set_no AS [probe set],
COUNT(*) AS [common element count]
FROM
@elements e
JOIN
@multi_probe m ON e.elem = m.elem
GROUP BY
e.set_no, m.set_no
ORDER BY
e.set_no, m.set_no

Result:

 test set    probe set   common element count
----------- ----------- --------------------
1 1 3
1 2 2
1 3 3
2 1 2
2 2 1
2 3 3
3 1 1
3 2 1
4 1 3
4 2 2
4 3 2
5 1 1
5 2 1

(3) Bring the counts of the test set and probe set on each row (subqueries may not be the most elegant)

SELECT
e.set_no AS [test set],
m.set_no AS [probe set],
COUNT(*) AS [common element count],
(SELECT COUNT(*) FROM @elements e1 WHERE e1.set_no = e.set_no) AS [test set count],
(SELECT COUNT(*) FROM @multi_probe m1 WHERE m1.set_no = m.set_no) AS [probe set count]
FROM
@elements e
JOIN @multi_probe m ON e.elem = m.elem
GROUP BY
e.set_no, m.set_no
ORDER BY
e.set_no, m.set_no

Result:

test set    probe set   common element count test set count probe set count
----------- ----------- -------------------- -------------- ---------------
1 1 3 6 3
1 2 2 6 2
1 3 3 6 3
2 1 2 3 3
2 2 1 3 2
2 3 3 3 3
3 1 1 3 3
3 2 1 3 2
4 1 3 3 3
4 2 2 3 2
4 3 2 3 3
5 1 1 1 3
5 2 1 1 2

(4) Find the solution: only retain those test sets and probe sets that have the same number of elements AND this number is also the number of common elements, i.e. the test set and the probe set are identical

SELECT
e.set_no AS [test set],
m.set_no AS [probe set]
FROM
@elements e
JOIN
@multi_probe m ON e.elem = m.elem
GROUP BY
e.set_no, m.set_no
HAVING
COUNT(*) = (SELECT COUNT(*) FROM @elements e1 WHERE e1.set_no = e.set_no)
AND (SELECT COUNT(*) FROM @elements e1 WHERE e1.set_no = e.set_no) = (SELECT COUNT(*) FROM @multi_probe m1 WHERE m1.set_no = m.set_no)
ORDER BY
e.set_no, m.set_no

Result:

test set    probe set
----------- -----------
2 3
4 1

Excuse the @s instead of #s, I like table variables better :)

Comparing two T-SQL tables for diffs

SELECT t1.id
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id
WHERE ISNULL(t1.field1,'') <> ISNULL(t2.field1,'')
OR ISNULL(t1.field2,'') <> ISNULL(t2.field2,'')
OR ...

To produce long WHERE part you can use this function:

CREATE PROCEDURE compareTables
@db1 NVARCHAR(100),
@table1 NVARCHAR(100),
@db2 NVARCHAR(100),
@table2 NVARCHAR(100)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @where NVARCHAR(MAX)
DECLARE @cmd NVARCHAR(MAX)

SET @where = ''

SELECT @where = @where + 'ISNULL(t1.' + name + ','''') <> ISNULL(t2.' + name + ','''') OR '
FROM sys.columns WHERE object_id = OBJECT_ID(@table1)

SET @where = SUBSTRING(@where,1,LEN(@where)-3)

SET @cmd = 'SELECT t1.id FROM ' + @db1 + '.' + @table1 + ' t1 '
SET @cmd = @cmd + 'INNER JOIN ' + @db2 + '.' + @table2 + ' t2 ON t1.id = t2.id '
SET @cmd = @cmd + 'WHERE ' + @where

EXEC sp_executesql @cmd
END
GO

Example usage:

EXEC compareTables 'db1_name','dbo.table1','db2_name','dbo.table1'

Remember to put schema in the table name.

SQL comparing two sets of complex data

Q#1:

select id
from table1
group by id
having count(*) =
(
select count(*)
from table2
group by table2.id
having table2.id = table1.id
)
and count(*) =
(
select count(*)
from table1 table1_1
inner join table2 on table1_1.id = table2.id and table1_1.name = table2.name
group by table1_1.id
having table1_1.id = table1.id
)

Explanation of this query:

  • It is grouping table1 by ID
  • For each group (for each ID), it is counting the number of rows in table1 that have this ID.
  • For each group, it is counting the number of rows in table2 that have this ID.
  • For each group, it is counting the number of rows where the name appears in both tables for this ID (it does that by inner joining table1 and table2 on the ID and Name which means only rows where both ID and Name match in both tables will be counted, for each ID).
  • It then returns IDs (from table1) where each of the above counts are equal. This is what results in returning IDs where all names are in both tables (no more, no less).

Q#2 - In this case you don't care that table2 has the same number of names per ID. So remove the first sub-query (that counts matching rows in table2).

select id
from table1
group by id
having count(*) =
(
select count(*)
from table1 table1_1
inner join table2 on table1_1.id = table2.id and table1_1.name = table2.name
group by table1_1.id
having table1_1.id = table1.id
)

Although the above is easy enough to understand following the same logic as Q#1, it is probably more efficient to do the following, and more straightforward. It only matters if you find it running too slow for your data (which is subjective and context dependent).

select table1.id
from table1
left join table2 on table1.id = table2.id and table1.name = table2.name
group by table1.id
having count(table1.id) = count(table2.id)

Here, the two tables are LEFT (outer) joined which means all records from table1 are gathered and records in table2 that match by ID and Name are also included alongside. Then, we group them by ID and we compare the count of each group in table1 with those that had matching names in table2.

Q#3 - This case is the same as Q#2 except table1 and table2 are swapped.

Q#4 - In this case you only care about IDs that have at least one name that appears in both tables. So join the tables and return the distinct IDs:

select distinct id
from table1
inner join table2 on table1.id = table2.id and table1.name = table2.name

Here is a SQLFiddle to play with containing the four queries: http://www.sqlfiddle.com/#!18/3fc71/22

How do I compare two columns for equality in SQL Server?

What's wrong with CASE for this? In order to see the result, you'll need at least a byte, and that's what you get with a single character.

CASE WHEN COLUMN1 = COLUMN2 THEN '1' ELSE '0' END AS MyDesiredResult

should work fine, and for all intents and purposes accomplishes the same thing as using a bit field.

SQL: Compare Columns for Exact Match of sets

Try with this:

Create table t(id int, c char(1))

Insert into t values
(1, 'a'),
(1, 'a'),
(2, 'b'),
(2, null),
(3, null),
(3, null),
(4, 'c'),
(4, 'd')

;with cte as(
select id, count(*) c1, count(c) c2, count(distinct c) c3 from t
group by id)
select t.id, t.c, ca.m from t
Cross apply(select case when c2 = 0 and c3 = 0 then 'empty'
when c1 = c2 and c3 = 1 then 'yes'
else 'no' end as m
from cte where cte.id = t.id) ca

Output:

id  c       m
1 a yes
1 a yes
2 b no
2 (null) no
3 (null) empty
3 (null) empty
4 c no
4 d no

sql query to return differences between two tables

IF you have tables A and B, both with colum C, here are the records, which are present in table A but not in B:

SELECT A.*
FROM A
LEFT JOIN B ON (A.C = B.C)
WHERE B.C IS NULL

To get all the differences with a single query, a full join must be used, like this:

SELECT A.*, B.*
FROM A
FULL JOIN B ON (A.C = B.C)
WHERE A.C IS NULL OR B.C IS NULL

What you need to know in this case is, that when a record can be found in A, but not in B, than the columns which come from B will be NULL, and similarly for those, which are present in B and not in A, the columns from A will be null.



Related Topics



Leave a reply



Submit