How to Select Top 3 Values from Each Group in a Table with SQL Which Have Duplicates

Get top 1 row of each group

;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1

If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead

As for normalised or not, it depends if you want to:

  • maintain status in 2 places
  • preserve status history
  • ...

As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.

SQL-select top 3 values per group WITH CONDITION

First ROW_NUMBER removes duplicate rows per tag and product_category, second ROW_NUMBER selects top 3 selling products per tag

;WITH cte AS
(SELECT *, ROW_NUMBER() OVER(PARTITION BY tag, product_category ORDER BY order_count DESC) AS rn
FROM yourtable
), cte2 AS
(SELECT *, ROW_NUMBER() OVER(PARTITION BY tag ORDER BY order_count DESC) AS rn2
FROM cte
WHERE rn = 1
)
SELECT *
FROM cte2
WHERE rn2 <= 3

Demo on SQLFiddle

Next one uses a derived table

;WITH cte AS
(SELECT t2.tag, t2.product_name, t2.product_category, t2.order_count,
ROW_NUMBER() OVER(PARTITION BY t2.tag ORDER BY order_count DESC) AS rn
FROM (SELECT tag, product_category, MAX(order_count) AS maxCount
FROM yourtable
GROUP BY tag, product_category
) t1 JOIN yourtable t2 ON t1.tag = t2.tag
AND t1.product_category = t2.product_category
AND maxCount = order_count
)
SELECT *
FROM cte
WHERE rn <= 3

Demo on SQLFiddle

How to group by a column with duplicate values while pulling all associated columns in table with SQL?

So, not sure if you want to end up with a completely deduped table or not, but this window function + QUALIFY allow you to choose a record to keep for each dupe:

SELECT *
FROM F1_TABLE
QUALIFY row_number() OVER (PARTITION BY Group_Num ORDER BY Party_ID) = 1;

This example chooses the first Party_ID, but you can choose whatever you want (or let Snowflake choose for you by excluding the ORDER BY clause.

If this isn't what you are looking for, can you please provide what your resulting table would look like in your question?

Finding duplicate values in a SQL table

SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1

Simply group on both of the columns.

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • Recent PostgreSQL supports it.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable and you need sql_mode=only_full_group_by:

    • GROUP BY lname ORDER BY showing wrong results;
    • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

Finding Duplicates: GROUP BY and DISTINCT giving different answers

Your counting logic is off, and mine was too, until I came up with a simple example to better understand your question. Imagine a simple table with only one column, text:

text
----
A
B
B
C
C
C

Running SELECT COUNT(*) just yields 6 records, as expected. SELECT DISTINCT text returns 3 records, for A,B,C. Finally, SELECT text with HAVING COUNT(*) > 1 returns only two records, for the B and C groups.

None of these numbers add up at all. The issue here is that a distinct select also returns records which are not duplicate, in addition to records which are duplicate. Also, a given duplicate record could occur more than two times. Your current comparison is somewhat apples to oranges.

Edit:

If you want to remove all duplicates in your six-column table, leaving only one distinct record from all columns, then try using a deletable CTE:

WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ScanNumber, DB_ID, PluginID,
PluginID_Version, Result, ActualValue
ORDER BY (SELECT NULL)) rn
FROM DBAScanResults
)

DELETE
FROM cte
WHERE rn > 1;

Select top 3 most count group by - SQL

Depending on what RDBMS you are using:

SQL SERVER:

SELECT TOP 3 COUNTRY, count(*) 
FROM DRUG_SEIZURE
WHERE COUNTRY IS NOT NULL
GROUP BY COUNTRY
ORDER BY count(*) DESC

MySQL:

SELECT COUNTRY, count(*) 
FROM DRUG_SEIZURE
WHERE COUNTRY IS NOT NULL
GROUP BY COUNTRY
ORDER BY count(*) DESC
LIMIT 3

Oracle:

SELECT *
FROM (
SELECT COUNTRY, count(*)
FROM DRUG_SEIZURE
WHERE COUNTRY IS NOT NULL
GROUP BY COUNTRY
ORDER BY count(*) DESC
) mr
WHERE rownum <= 3
ORDER BY rownum;


Related Topics



Leave a reply



Submit