Get top 1 row of each group
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1
If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead
As for normalised or not, it depends if you want to:
- maintain status in 2 places
- preserve status history
- ...
As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.
SQL-select top 3 values per group WITH CONDITION
First ROW_NUMBER removes duplicate rows per tag and product_category, second ROW_NUMBER selects top 3 selling products per tag
;WITH cte AS
(SELECT *, ROW_NUMBER() OVER(PARTITION BY tag, product_category ORDER BY order_count DESC) AS rn
FROM yourtable
), cte2 AS
(SELECT *, ROW_NUMBER() OVER(PARTITION BY tag ORDER BY order_count DESC) AS rn2
FROM cte
WHERE rn = 1
)
SELECT *
FROM cte2
WHERE rn2 <= 3
Demo on SQLFiddle
Next one uses a derived table
;WITH cte AS
(SELECT t2.tag, t2.product_name, t2.product_category, t2.order_count,
ROW_NUMBER() OVER(PARTITION BY t2.tag ORDER BY order_count DESC) AS rn
FROM (SELECT tag, product_category, MAX(order_count) AS maxCount
FROM yourtable
GROUP BY tag, product_category
) t1 JOIN yourtable t2 ON t1.tag = t2.tag
AND t1.product_category = t2.product_category
AND maxCount = order_count
)
SELECT *
FROM cte
WHERE rn <= 3
Demo on SQLFiddle
How to group by a column with duplicate values while pulling all associated columns in table with SQL?
So, not sure if you want to end up with a completely deduped table or not, but this window function + QUALIFY
allow you to choose a record to keep for each dupe:
SELECT *
FROM F1_TABLE
QUALIFY row_number() OVER (PARTITION BY Group_Num ORDER BY Party_ID) = 1;
This example chooses the first Party_ID, but you can choose whatever you want (or let Snowflake choose for you by excluding the ORDER BY
clause.
If this isn't what you are looking for, can you please provide what your resulting table would look like in your question?
Finding duplicate values in a SQL table
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Simply group on both of the columns.
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need
sql_mode=only_full_group_by
:- GROUP BY lname ORDER BY showing wrong results;
- Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Finding Duplicates: GROUP BY and DISTINCT giving different answers
Your counting logic is off, and mine was too, until I came up with a simple example to better understand your question. Imagine a simple table with only one column, text
:
text
----
A
B
B
C
C
C
Running SELECT COUNT(*)
just yields 6 records, as expected. SELECT DISTINCT text
returns 3 records, for A,B,C
. Finally, SELECT text
with HAVING COUNT(*) > 1
returns only two records, for the B
and C
groups.
None of these numbers add up at all. The issue here is that a distinct select also returns records which are not duplicate, in addition to records which are duplicate. Also, a given duplicate record could occur more than two times. Your current comparison is somewhat apples to oranges.
Edit:
If you want to remove all duplicates in your six-column table, leaving only one distinct record from all columns, then try using a deletable CTE:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ScanNumber, DB_ID, PluginID,
PluginID_Version, Result, ActualValue
ORDER BY (SELECT NULL)) rn
FROM DBAScanResults
)
DELETE
FROM cte
WHERE rn > 1;
Select top 3 most count group by - SQL
Depending on what RDBMS you are using:
SQL SERVER:
SELECT TOP 3 COUNTRY, count(*)
FROM DRUG_SEIZURE
WHERE COUNTRY IS NOT NULL
GROUP BY COUNTRY
ORDER BY count(*) DESC
MySQL:
SELECT COUNTRY, count(*)
FROM DRUG_SEIZURE
WHERE COUNTRY IS NOT NULL
GROUP BY COUNTRY
ORDER BY count(*) DESC
LIMIT 3
Oracle:
SELECT *
FROM (
SELECT COUNTRY, count(*)
FROM DRUG_SEIZURE
WHERE COUNTRY IS NOT NULL
GROUP BY COUNTRY
ORDER BY count(*) DESC
) mr
WHERE rownum <= 3
ORDER BY rownum;
Related Topics
SQL Query to Search for Room Availability
How to Find the Number of Occurrences of a Particular Character in a String Using SQL
How to Change Date Format in Hive
Select "Where Clause" Evaluation Order
Ways to Validate T-SQL Queries
How to Set a Size Limit for an "Int" Datatype in Postgresql 9.5
Window Functions: Partition by One Column After Order by Another
Ora-30926: Unable to Get a Stable Set of Rows in the Source Tables When Merging Tables
Database/Sql: How to Store Longitude/Latitude Data
SQL Coalesce with Empty String
How to Use a SQL Update Statement to Add 1 Year to a Datetime Column
SQL Server 2008 Express Concat() Doesn't Exist
If I Stop a Long Running Query, Does It Rollback
Concatenate One Field After Group By
Differencein These Two Queries as Getting Two Different Result Set