TSQL Comparing two Sets
Use:
SELECT CASE
WHEN COUNT(*) = (SELECT COUNT(*) FROM a)
AND COUNT(*) = (SELECT COUNT(*) FROM b) THEN 'Elements in TableA and TableB contains identical sets'
ELSE 'TableA and TableB do NOT contain identical sets'
END
FROM (SELECT a.col
FROM a
INTERSECT
SELECT b.col
FROM b) x
Test with:
WITH a AS (
SELECT 'a' AS col
UNION ALL
SELECT 'b'
UNION ALL
SELECT 'c'
UNION ALL
SELECT 'd'),
b AS (
SELECT 'b' AS col
UNION ALL
SELECT 'c'
UNION ALL
SELECT 'd'
UNION ALL
SELECT 'a')
SELECT CASE
WHEN COUNT(*) = (SELECT COUNT(*) FROM a)
AND COUNT(*) = (SELECT COUNT(*) FROM b) THEN 'yes'
ELSE 'no'
END
FROM (SELECT a.col
FROM a
INTERSECT
SELECT b.col
FROM b) x
Compare two sets of an SQL GROUP BY result
I assumed a TrainRoutes table with one row for each of R1, R2 etc. You could replace this with select distinct RouteID from Stops if required.
Select
r1.RouteID Route1,
r2.RouteID Route2
From
-- cross to compare each route with each route
dbo.TrainRoutes r1
Cross Join
dbo.TrainRoutes r2
Inner Join
dbo.Stops s1
On r1.RouteID = s1.RouteID
Inner Join
dbo.Stops s2
On r2.RouteID = s2.RouteID
Where
r1.RouteID < r2.RouteID -- no point in comparing R1 with R2 and R2 with R1
Group By
r1.RouteID,
r2.RouteID
Having
-- check each route has the same number of stations
count(Distinct s1.stationID) = count(Distinct s2.stationID) And
-- check each route has the same stops
Sum(Case When s1.StationID = s2.StationID Then 1 Else 0 End) = count(Distinct s1.StationID) And
-- check each route has different halts
sum(Case When s1.StationID = s2.StationID And s1.Halts = s2.Halts Then 1 Else 0 End) != count(Distinct s1.StationID)
You can also do this without the TrainRoute table like so, but you're now cross joining two larger tables:
Select
s1.RouteID Route1,
s2.RouteID Route2
From
dbo.Stops s1
Cross Join
dbo.Stops s2
Where
s1.RouteID < s2.RouteID
Group By
s1.RouteID,
s2.RouteID
Having
count(Distinct s1.stationID) = count(Distinct s2.stationID) And
Sum(Case When s1.StationID = s2.StationID Then 1 Else 0 End) = count(Distinct s1.StationID) And
sum(Case When s1.StationID = s2.StationID And s1.Halts = s2.Halts Then 1 Else 0 End) != count(Distinct s1.StationID)
http://sqlfiddle.com/#!6/76978/8
SQL comparing sets, part II: How to join sets of sets
OK, let's solve question 2 step by step:
(1) Inner join sets and probes on their individual elements. This way we'll see how do test sets and probe sets relate (which sets have what elements in common with which probe):
SELECT
e.set_no AS [test set],
m.set_no AS [probe set],
e.elem [common element]
FROM
@elements e
JOIN
@multi_probe m ON e.elem = m.elem
Result:
test set probe set common element
----------- ----------- --------------
1 3 A
1 1 B
1 3 B
1 1 C
1 2 C
1 3 C
1 1 F
1 2 F
2 3 A
2 1 B
2 3 B
2 1 C
2 2 C
2 3 C
3 1 F
3 2 F
4 1 B
4 3 B
4 1 C
4 2 C
4 3 C
4 1 F
4 2 F
5 1 F
5 2 F
(2) Count how many common elements between each test set and probe set (inner joins mean we already left the "no matches" aside)
SELECT
e.set_no AS [test set],
m.set_no AS [probe set],
COUNT(*) AS [common element count]
FROM
@elements e
JOIN
@multi_probe m ON e.elem = m.elem
GROUP BY
e.set_no, m.set_no
ORDER BY
e.set_no, m.set_no
Result:
test set probe set common element count
----------- ----------- --------------------
1 1 3
1 2 2
1 3 3
2 1 2
2 2 1
2 3 3
3 1 1
3 2 1
4 1 3
4 2 2
4 3 2
5 1 1
5 2 1
(3) Bring the counts of the test set and probe set on each row (subqueries may not be the most elegant)
SELECT
e.set_no AS [test set],
m.set_no AS [probe set],
COUNT(*) AS [common element count],
(SELECT COUNT(*) FROM @elements e1 WHERE e1.set_no = e.set_no) AS [test set count],
(SELECT COUNT(*) FROM @multi_probe m1 WHERE m1.set_no = m.set_no) AS [probe set count]
FROM
@elements e
JOIN @multi_probe m ON e.elem = m.elem
GROUP BY
e.set_no, m.set_no
ORDER BY
e.set_no, m.set_no
Result:
test set probe set common element count test set count probe set count
----------- ----------- -------------------- -------------- ---------------
1 1 3 6 3
1 2 2 6 2
1 3 3 6 3
2 1 2 3 3
2 2 1 3 2
2 3 3 3 3
3 1 1 3 3
3 2 1 3 2
4 1 3 3 3
4 2 2 3 2
4 3 2 3 3
5 1 1 1 3
5 2 1 1 2
(4) Find the solution: only retain those test sets and probe sets that have the same number of elements AND this number is also the number of common elements, i.e. the test set and the probe set are identical
SELECT
e.set_no AS [test set],
m.set_no AS [probe set]
FROM
@elements e
JOIN
@multi_probe m ON e.elem = m.elem
GROUP BY
e.set_no, m.set_no
HAVING
COUNT(*) = (SELECT COUNT(*) FROM @elements e1 WHERE e1.set_no = e.set_no)
AND (SELECT COUNT(*) FROM @elements e1 WHERE e1.set_no = e.set_no) = (SELECT COUNT(*) FROM @multi_probe m1 WHERE m1.set_no = m.set_no)
ORDER BY
e.set_no, m.set_no
Result:
test set probe set
----------- -----------
2 3
4 1
Excuse the @
s instead of #
s, I like table variables better :)
Comparing two T-SQL tables for diffs
SELECT t1.id
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id
WHERE ISNULL(t1.field1,'') <> ISNULL(t2.field1,'')
OR ISNULL(t1.field2,'') <> ISNULL(t2.field2,'')
OR ...
To produce long WHERE part you can use this function:
CREATE PROCEDURE compareTables
@db1 NVARCHAR(100),
@table1 NVARCHAR(100),
@db2 NVARCHAR(100),
@table2 NVARCHAR(100)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @where NVARCHAR(MAX)
DECLARE @cmd NVARCHAR(MAX)
SET @where = ''
SELECT @where = @where + 'ISNULL(t1.' + name + ','''') <> ISNULL(t2.' + name + ','''') OR '
FROM sys.columns WHERE object_id = OBJECT_ID(@table1)
SET @where = SUBSTRING(@where,1,LEN(@where)-3)
SET @cmd = 'SELECT t1.id FROM ' + @db1 + '.' + @table1 + ' t1 '
SET @cmd = @cmd + 'INNER JOIN ' + @db2 + '.' + @table2 + ' t2 ON t1.id = t2.id '
SET @cmd = @cmd + 'WHERE ' + @where
EXEC sp_executesql @cmd
END
GO
Example usage:
EXEC compareTables 'db1_name','dbo.table1','db2_name','dbo.table1'
Remember to put schema in the table name.
SQL comparing two sets of complex data
Q#1:
select id
from table1
group by id
having count(*) =
(
select count(*)
from table2
group by table2.id
having table2.id = table1.id
)
and count(*) =
(
select count(*)
from table1 table1_1
inner join table2 on table1_1.id = table2.id and table1_1.name = table2.name
group by table1_1.id
having table1_1.id = table1.id
)
Explanation of this query:
- It is grouping table1 by ID
- For each group (for each ID), it is counting the number of rows in table1 that have this ID.
- For each group, it is counting the number of rows in table2 that have this ID.
- For each group, it is counting the number of rows where the name appears in both tables for this ID (it does that by
inner join
ing table1 and table2 on the ID and Name which means only rows where both ID and Name match in both tables will be counted, for each ID). - It then returns IDs (from table1) where each of the above counts are equal. This is what results in returning IDs where all names are in both tables (no more, no less).
Q#2 - In this case you don't care that table2 has the same number of names per ID. So remove the first sub-query (that counts matching rows in table2).
select id
from table1
group by id
having count(*) =
(
select count(*)
from table1 table1_1
inner join table2 on table1_1.id = table2.id and table1_1.name = table2.name
group by table1_1.id
having table1_1.id = table1.id
)
Although the above is easy enough to understand following the same logic as Q#1, it is probably more efficient to do the following, and more straightforward. It only matters if you find it running too slow for your data (which is subjective and context dependent).
select table1.id
from table1
left join table2 on table1.id = table2.id and table1.name = table2.name
group by table1.id
having count(table1.id) = count(table2.id)
Here, the two tables are LEFT
(outer) joined which means all records from table1 are gathered and records in table2 that match by ID and Name are also included alongside. Then, we group them by ID and we compare the count of each group in table1 with those that had matching names in table2.
Q#3 - This case is the same as Q#2 except table1 and table2 are swapped.
Q#4 - In this case you only care about IDs that have at least one name that appears in both tables. So join the tables and return the distinct IDs:
select distinct id
from table1
inner join table2 on table1.id = table2.id and table1.name = table2.name
Here is a SQLFiddle to play with containing the four queries: http://www.sqlfiddle.com/#!18/3fc71/22
How do I compare two columns for equality in SQL Server?
What's wrong with CASE for this? In order to see the result, you'll need at least a byte, and that's what you get with a single character.
CASE WHEN COLUMN1 = COLUMN2 THEN '1' ELSE '0' END AS MyDesiredResult
should work fine, and for all intents and purposes accomplishes the same thing as using a bit field.
SQL: Compare Columns for Exact Match of sets
Try with this:
Create table t(id int, c char(1))
Insert into t values
(1, 'a'),
(1, 'a'),
(2, 'b'),
(2, null),
(3, null),
(3, null),
(4, 'c'),
(4, 'd')
;with cte as(
select id, count(*) c1, count(c) c2, count(distinct c) c3 from t
group by id)
select t.id, t.c, ca.m from t
Cross apply(select case when c2 = 0 and c3 = 0 then 'empty'
when c1 = c2 and c3 = 1 then 'yes'
else 'no' end as m
from cte where cte.id = t.id) ca
Output:
id c m
1 a yes
1 a yes
2 b no
2 (null) no
3 (null) empty
3 (null) empty
4 c no
4 d no
sql query to return differences between two tables
IF you have tables A
and B
, both with colum C
, here are the records, which are present in table A
but not in B
:
SELECT A.*
FROM A
LEFT JOIN B ON (A.C = B.C)
WHERE B.C IS NULL
To get all the differences with a single query, a full join must be used, like this:
SELECT A.*, B.*
FROM A
FULL JOIN B ON (A.C = B.C)
WHERE A.C IS NULL OR B.C IS NULL
What you need to know in this case is, that when a record can be found in A
, but not in B
, than the columns which come from B
will be NULL, and similarly for those, which are present in B
and not in A
, the columns from A
will be null.
Related Topics
Sum of Digits of a Number in SQL Server Without Using Traditional Loops Like While
Update an Excel Sheet Using Vba/Ado
Sql:Find Rows and Sort According to Number of Matching Columns
Join/Pivot Items with Eav Table
Why Isn't Postgres Using the Index
Time Based Priority in Active Record Query
Introducing Foreign Key Constraint May Cause Cycles or Multiple Cascade Paths
Sql: Parse Comma-Delimited String and Use as Join
SQL Server:Find Duplicates in a Table Based on Values in a Single Column
Oracle (11.2.0.1):How to Identify the Row Which Is Currently Updated by the Update Statement
SQL Server Normalization Tactic: Varchar VS Int Identity
Modify(Replace) Xml for Conditions
Choose As400 Query Records Directly from Excel
Query a Table and a Column Name Stored in a Table