SQL Query That Gives Distinct Results That Match Multiple Columns

SQL query that gives distinct results that match multiple columns

SELECT document_id
FROM table
WHERE tag = 'tag1' OR tag = 'tag2'
GROUP BY document_id
HAVING COUNT(DISTINCT tag) = 2

Edit:

Updated for lack of constraints...

SQL Query Multiple Columns Using Distinct on One Column Only

select * from tblFruit where
tblFruit_ID in (Select max(tblFruit_ID) FROM tblFruit group by tblFruit_FruitType)

Select distinct values from multiple columns in same table

It's better to include code in your question, rather than ambiguous text data, so that we are all working with the same data. Here is the sample schema and data I have assumed:

CREATE TABLE tbl_data (
id INT NOT NULL,
code_1 CHAR(2),
code_2 CHAR(2)
);

INSERT INTO tbl_data (
id,
code_1,
code_2
)
VALUES
(1, 'AB', 'BC'),
(2, 'BC', NULL),
(3, 'DE', 'EF'),
(4, NULL, 'BC');

As Blorgbeard commented, the DISTINCT clause in your solution is unnecessary because the UNION operator eliminates duplicate rows. There is a UNION ALL operator that does not elimiate duplicates, but it is not appropriate here.

Rewriting your query without the DISTINCT clause is a fine solution to this problem:

SELECT code_1
FROM tbl_data
WHERE code_1 IS NOT NULL
UNION
SELECT code_2
FROM tbl_data
WHERE code_2 IS NOT NULL;

It doesn't matter that the two columns are in the same table. The solution would be the same even if the columns were in different tables.

If you don't like the redundancy of specifying the same filter clause twice, you can encapsulate the union query in a virtual table before filtering that:

SELECT code
FROM (
SELECT code_1
FROM tbl_data
UNION
SELECT code_2
FROM tbl_data
) AS DistinctCodes (code)
WHERE code IS NOT NULL;

I find the syntax of the second more ugly, but it is logically neater. But which one performs better?

I created a sqlfiddle that demonstrates that the query optimizer of SQL Server 2005 produces the same execution plan for the two different queries:

The query optimizer produces this execution plan for both queries: two table scans, a concatenation, a distinct sort, and a select.

If SQL Server generates the same execution plan for two queries, then they are practically as well as logically equivalent.

Compare the above to the execution plan for the query in your question:

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation.

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation, because the query optimizer does not know that any duplicates filtered out by the DISTINCT in the first query would be filtered out by the UNION later anyway.

This query is logically equivalent to the other two, but the redundant operation makes it less efficient. On a large data set, I would expect your query to take longer to return a result set than the two here. Don't take my word for it; experiment in your own environment to be sure!

Counting DISTINCT over multiple columns

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to your query.

How do I (or can I) SELECT DISTINCT on multiple columns?

SELECT DISTINCT a,b,c FROM t

is roughly equivalent to:

SELECT a,b,c FROM t GROUP BY a,b,c

It's a good idea to get used to the GROUP BY syntax, as it's more powerful.

For your query, I'd do it like this:

UPDATE sales
SET status='ACTIVE'
WHERE id IN
(
SELECT id
FROM sales S
INNER JOIN
(
SELECT saleprice, saledate
FROM sales
GROUP BY saleprice, saledate
HAVING COUNT(*) = 1
) T
ON S.saleprice=T.saleprice AND s.saledate=T.saledate
)

MySQL: Select DISTINCT / UNIQUE, but return all columns?

You're looking for a group by:

select *
from table
group by field1

Which can occasionally be written with a distinct on statement:

select distinct on field1 *
from table

On most platforms, however, neither of the above will work because the behavior on the other columns is unspecified. (The first works in MySQL, if that's what you're using.)

You could fetch the distinct fields and stick to picking a single arbitrary row each time.

On some platforms (e.g. PostgreSQL, Oracle, T-SQL) this can be done directly using window functions:

select *
from (
select *,
row_number() over (partition by field1 order by field2) as row_number
from table
) as rows
where row_number = 1

On others (MySQL, SQLite), you'll need to write subqueries that will make you join the entire table with itself (example), so not recommended.

Select distinct of multiple columns from prestodb

You can use group by on all required columns:

SELECT tid, open_dt 
FROM table_name
GROUP BY tid, open_dt

How to select DISTINCT records based on multiple columns and without considering their order

I think I understand what you are looking for but it seems over simplified to your actual problem. Your query you posted was incredibly close to working. You can't reference columns by their alias in the where predicates so you will need to use the string concatenation you had in your column. Then you can simply change the <> to either > or < so you only get one match. This example should work for your problem as I understand it.

declare @Customer table
(
CustID int identity
, Name varchar(10)
, Surname varchar(10)
, City varchar(10)
)

insert @Customer
select 'Foo', 'Foo', 'New York' union all
select 'Bar', 'Bar', 'New York' union all
select 'Smith', 'Smith', 'New York' union all
select 'Alice', 'A', 'London' union all
select 'Bob', 'B', 'London'

SELECT CustomerA = C1.Name + ' ' + C1.Surname
, CustomerB = C2.Name + ' ' + C2.Surname
, C1.City
FROM @Customer C1
JOIN @Customer C2 ON C1.City = C2.City
where C1.Name + ' ' + C1.Surname > C2.Name + ' ' + C2.Surname

Applying distinct in multiple columns in SQL server

Group by the column you want to be unique and use an aggregate function on the other column. You want the lowest id for every message, so use MIN()

select min(id) as id,
message
from your_table
group by message

MySQL SELECT DISTINCT multiple columns

can this help?

select 
(SELECT group_concat(DISTINCT a) FROM my_table) as a,
(SELECT group_concat(DISTINCT b) FROM my_table) as b,
(SELECT group_concat(DISTINCT c) FROM my_table) as c,
(SELECT group_concat(DISTINCT d) FROM my_table) as d


Related Topics



Leave a reply



Submit