Select Distinct Values from Multiple Columns in Same Table

Select distinct values from multiple columns in same table

It's better to include code in your question, rather than ambiguous text data, so that we are all working with the same data. Here is the sample schema and data I have assumed:

CREATE TABLE tbl_data (
id INT NOT NULL,
code_1 CHAR(2),
code_2 CHAR(2)
);

INSERT INTO tbl_data (
id,
code_1,
code_2
)
VALUES
(1, 'AB', 'BC'),
(2, 'BC', NULL),
(3, 'DE', 'EF'),
(4, NULL, 'BC');

As Blorgbeard commented, the DISTINCT clause in your solution is unnecessary because the UNION operator eliminates duplicate rows. There is a UNION ALL operator that does not elimiate duplicates, but it is not appropriate here.

Rewriting your query without the DISTINCT clause is a fine solution to this problem:

SELECT code_1
FROM tbl_data
WHERE code_1 IS NOT NULL
UNION
SELECT code_2
FROM tbl_data
WHERE code_2 IS NOT NULL;

It doesn't matter that the two columns are in the same table. The solution would be the same even if the columns were in different tables.

If you don't like the redundancy of specifying the same filter clause twice, you can encapsulate the union query in a virtual table before filtering that:

SELECT code
FROM (
SELECT code_1
FROM tbl_data
UNION
SELECT code_2
FROM tbl_data
) AS DistinctCodes (code)
WHERE code IS NOT NULL;

I find the syntax of the second more ugly, but it is logically neater. But which one performs better?

I created a sqlfiddle that demonstrates that the query optimizer of SQL Server 2005 produces the same execution plan for the two different queries:

The query optimizer produces this execution plan for both queries: two table scans, a concatenation, a distinct sort, and a select.

If SQL Server generates the same execution plan for two queries, then they are practically as well as logically equivalent.

Compare the above to the execution plan for the query in your question:

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation.

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation, because the query optimizer does not know that any duplicates filtered out by the DISTINCT in the first query would be filtered out by the UNION later anyway.

This query is logically equivalent to the other two, but the redundant operation makes it less efficient. On a large data set, I would expect your query to take longer to return a result set than the two here. Don't take my word for it; experiment in your own environment to be sure!

Select multiple distinct columns from same table separately

One problem with what you're asking is that you're expected resultset doesn't really make any sense from a relational database standpoint. Every column in a row of data should have a relationship with the other columns in the row.

The best way to approach this, IMO, is to return two result sets and process each one for each of your drop down boxes:

SELECT DISTINCT column_1 FROM My_Table

and

SELECT DISTINCT column_2 FROM My_Table

I'd also look into why you have that data in the same table to begin with if the two columns are not related. If they are related and you're trying to have a drop down for one column that then filters the items in the second drop down list then you really should return the full set of rows and let your front end application handle the filtering (and displaying unique results). Most drop down widgets should allow this kind of linking.

How do I (or can I) SELECT DISTINCT on multiple columns?

SELECT DISTINCT a,b,c FROM t

is roughly equivalent to:

SELECT a,b,c FROM t GROUP BY a,b,c

It's a good idea to get used to the GROUP BY syntax, as it's more powerful.

For your query, I'd do it like this:

UPDATE sales
SET status='ACTIVE'
WHERE id IN
(
SELECT id
FROM sales S
INNER JOIN
(
SELECT saleprice, saledate
FROM sales
GROUP BY saleprice, saledate
HAVING COUNT(*) = 1
) T
ON S.saleprice=T.saleprice AND s.saledate=T.saledate
)

How do I select unique values from multiple columns (each value being unique not the total row)

I would do a union of queries to get all distinct telephones by column then do a query on this union of telephones to get them only once:

SELECT DISTINCT `Tel` FROM
(
SELECT DISTINCT `Tel1` AS `Tel` FROM products
UNION
SELECT DISTINCT `Tel2` AS `Tel` FROM products
UNION
SELECT DISTINCT `Tel3` AS `Tel` FROM products
UNION
SELECT DISTINCT `Tel4` AS `Tel` FROM products
) all_telephones

How to select DISTINCT records based on multiple columns and without considering their order

I think I understand what you are looking for but it seems over simplified to your actual problem. Your query you posted was incredibly close to working. You can't reference columns by their alias in the where predicates so you will need to use the string concatenation you had in your column. Then you can simply change the <> to either > or < so you only get one match. This example should work for your problem as I understand it.

declare @Customer table
(
CustID int identity
, Name varchar(10)
, Surname varchar(10)
, City varchar(10)
)

insert @Customer
select 'Foo', 'Foo', 'New York' union all
select 'Bar', 'Bar', 'New York' union all
select 'Smith', 'Smith', 'New York' union all
select 'Alice', 'A', 'London' union all
select 'Bob', 'B', 'London'

SELECT CustomerA = C1.Name + ' ' + C1.Surname
, CustomerB = C2.Name + ' ' + C2.Surname
, C1.City
FROM @Customer C1
JOIN @Customer C2 ON C1.City = C2.City
where C1.Name + ' ' + C1.Surname > C2.Name + ' ' + C2.Surname

Select only distinct values from two columns from a table

You could use a union to create a table of all values from both columns:

select  col1 as BothColumns
from YourTable
union
select col2
from YourTable

Unlike union all, union removes duplicates, even if they come from the same side of the union.

Select distinct values one column into multiple columns

You want to pivot your table, but your table doesn't currently contain the field that you want to pivot on ("col1", "col2", "col3", etc...). You need a row number, partitioned by col1. The Jet database does not provide a ROW_NUMBER function, so you have to fake it by joining the table to itself:

select t1.col1, t1.col2, count(*) as row_num
from [Sheet1$] t1
inner join [Sheet1$] t2 on t2.col1 = t1.col1 and t2.col2 <= t1.col2
group by t1.col1, t1.col2

Now you can pivot on row_num:

transform Min(x.col2) select x.col1
from(
select t1.col1, t1.col2, count(*) as row_num
from [Sheet1$] t1
inner join [Sheet1$] t2 on t2.col1 = t1.col1 and t2.col2 <= t1.col2
group by t1.col1, t1.col2
) x
group by x.col1
pivot x.row_num

Selecting distinct values for multiple columns

Seems like you are simply trying to have all the distinct values at hand. Why? For displaying purposes? It's the application's job, not the server's. You could simply have three queries like this:

SELECT DISTINCT [Food Group] FROM atable;

SELECT DISTINCT Name FROM atable;

SELECT DISTINCT [Caloric Value] FROM atable;

and display their results accordingly.

But if you insist on having them all in one table, you might try this:

WITH atable ([Food Group], Name, [Caloric Value]) AS (
SELECT 'Vegetables', 'Broccoli', 100 UNION ALL
SELECT 'Vegetables', 'Carrots', 80 UNION ALL
SELECT 'Fruits', 'Apples', 120 UNION ALL
SELECT 'Fruits', 'Bananas', 120 UNION ALL
SELECT 'Fruits', 'Oranges', 90
),
atable_numbered AS (
SELECT
[Food Group], Name, [Caloric Value],
fg_rank = DENSE_RANK() OVER (ORDER BY [Food Group]),
n_rank = DENSE_RANK() OVER (ORDER BY Name),
cv_rank = DENSE_RANK() OVER (ORDER BY [Caloric Value])
FROM atable
)
SELECT
fg.[Food Group],
n.Name,
cv.[Caloric Value]
FROM (
SELECT fg_rank FROM atable_numbered UNION
SELECT n_rank FROM atable_numbered UNION
SELECT cv_rank FROM atable_numbered
) r (rank)
LEFT JOIN (
SELECT DISTINCT [Food Group], fg_rank
FROM atable_numbered) fg ON r.rank = fg.fg_rank
LEFT JOIN (
SELECT DISTINCT Name, n_rank
FROM atable_numbered) n ON r.rank = n.n_rank
LEFT JOIN (
SELECT DISTINCT [Caloric Value], cv_rank
FROM atable_numbered) cv ON r.rank = cv.cv_rank
ORDER BY r.rank


Related Topics



Leave a reply



Submit