Count Distinct Over Partition by SQL

Partition Function COUNT() OVER possible using DISTINCT

There is a very simple solution using dense_rank():

dense_rank() over (partition by [Mth] order by [UserAccountKey]) 
+ dense_rank() over (partition by [Mth] order by [UserAccountKey] desc)
- 1

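This will give you exactly what you were asking for: the number of distinct UserAccountKey values within each month. Note that, unlike COUNT(DISTINCT ...), DENSE_RANK() treats NULL as a value of its own, so a month containing NULL keys is counted one higher.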
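To see why the two ranks add up to the distinct count, it can help to run the expression on a few rows. The sketch below is not from the original answer: it uses Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions) and a made-up logins table. Only the Mth and UserAccountKey column names come from the snippet above.

```python
import sqlite3

# Hypothetical sample data: (Mth, UserAccountKey).
rows = [
    ("2024-01", 10), ("2024-01", 10), ("2024-01", 11),
    ("2024-02", 10), ("2024-02", 12), ("2024-02", 13), ("2024-02", 13),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (Mth TEXT, UserAccountKey INTEGER)")
conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)

# Ascending rank + descending rank - 1 is the same on every row of a partition
# and equals the number of distinct keys in that partition.
query = """
SELECT Mth,
       UserAccountKey,
       DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey)
     + DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey DESC)
     - 1 AS distinct_keys_in_month
FROM logins
ORDER BY Mth, UserAccountKey
"""
result = conn.execute(query).fetchall()

# Reference answer from a plain GROUP BY, for comparison.
expected = dict(conn.execute(
    "SELECT Mth, COUNT(DISTINCT UserAccountKey) FROM logins GROUP BY Mth"
).fetchall())

for mth, key, distinct_count in result:
    print(mth, key, distinct_count, "(GROUP BY says", expected[mth], ")")
```

For a key with ascending rank r among n distinct values, the descending rank is n - r + 1, so the sum minus one is always n.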

Count distinct over partition by

Unfortunately, SQL Server (like several other databases) doesn't support COUNT(DISTINCT ...) as a window function. Fortunately, there is a simple workaround: the sum of the ascending and descending DENSE_RANK()s, minus one.

select a.Name, a.Role,
(dense_rank() over (partition by a.Role order by a.Name asc) +
dense_rank() over (partition by a.Role order by a.Name desc) -
1
) as distinct_names_in_role
from YourTable a -- placeholder name; "table" is a reserved word and cannot be used unquoted
group by a.Name, a.Role

Count Distinct over partition by sql

Try this:

DECLARE @DataSource TABLE
(
[col1ID] INT
,[col2String] VARCHAR(12)
,[Col3ID] INT
,[Col4String] VARCHAR(12)
,[Col5Data] DATE
);

INSERT INTO @DataSource
VALUES (1, 'xxx', 20, 'abc', '2018-09-14')
,(1, 'xxx', 20, 'xyz', '2018-09-14')
,(2, 'xxx', 30, 'abc', '2018-09-14')
,(2, 'xxx', 30, 'abc', '2018-09-14');

SELECT *
,dense_rank() over (partition by col1ID, col3ID order by [Col4String])
+ dense_rank() over (partition by col1ID, col3ID order by [Col4String] desc)
- 1 AS [DistinctCol4Count]
FROM @DataSource;


SQL count distinct over partition by cumulatively

One option is

  • creating a new column that flags the first time each "category" is seen (partitioning on "id" and "category", ordering on "year_" and "month_")
  • computing a running sum over this column, partitioned on "id" only and ordered the same way

WITH cte AS (
SELECT *,
CASE WHEN ROW_NUMBER() OVER(
PARTITION BY id, category
ORDER BY year_, month_) = 1
THEN 1
ELSE 0
END AS rn1
FROM base
)
SELECT id,
category,
year_,
month_,
SUM(rn1) OVER(
PARTITION BY id
ORDER BY year_, month_
) AS sumC
FROM cte
ORDER BY id,
year_,
month_
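The two steps above can be checked on a small table. This is only a sketch under assumptions: the rows are invented, the flag column is renamed is_first for readability, and the SQL runs through Python's built-in sqlite3 module (SQLite 3.25+) rather than the asker's database.

```python
import sqlite3

# Hypothetical rows: (id, category, year_, month_).
rows = [
    (1, "a", 2023, 1), (1, "b", 2023, 1),
    (1, "a", 2023, 2),
    (1, "c", 2023, 3), (1, "b", 2023, 3),
    (2, "a", 2023, 1), (2, "a", 2023, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (id INTEGER, category TEXT, year_ INTEGER, month_ INTEGER)")
conn.executemany("INSERT INTO base VALUES (?, ?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT *,
           -- 1 on the first row of each (id, category), 0 on later repeats
           CASE WHEN ROW_NUMBER() OVER (
                    PARTITION BY id, category
                    ORDER BY year_, month_) = 1
                THEN 1 ELSE 0 END AS is_first
    FROM base
)
SELECT id, category, year_, month_,
       -- running sum of the flags = distinct categories seen so far for this id
       SUM(is_first) OVER (PARTITION BY id ORDER BY year_, month_) AS running_distinct
FROM cte
ORDER BY id, year_, month_, category
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the window has an ORDER BY but no explicit frame, rows that share the same year_ and month_ are peers and receive the same running total, which is what a per-month cumulative count needs.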

How to apply: count(distinct ...) over (partition by ... order by) in BigQuery?

I think some of your sample data is incorrect, but I played with it and got a matching result, for the MPE data at least. You can accomplish this by first tagging the rows to be counted: number the rows within an extra partition on CUST_ID, ordering on FLAG DESC first so that a flagged row, if there is one, comes first, and keep FLAG only on row 1. Then you sum over that tag in the same way you hoped to apply count(distinct <expr>) over ...

WITH SE AS (
SELECT 1 LINE_ID, 'TW' MARKET_ID, 'X' LOCAL_POS_ID, 'MPE' BC_ID,
1 CUST_ID, '20200201' SALE_CREATION_DATE, 1 FLAG UNION ALL
SELECT 2, 'TW', 'X', 'MPE', 2, '20201005', 1 UNION ALL
SELECT 3, 'TW', 'X', 'MPE', 3, '20200415', 0 UNION ALL
SELECT 4, 'TW', 'X', 'MPE', 1, '20200223', 1 UNION ALL
SELECT 5, 'TW', 'X', 'MPE', 6, '20200217', 1 UNION ALL
SELECT 6, 'TW', 'X', 'MPE', 9, '20200715', 1 UNION ALL
SELECT 7, 'TW', 'X', 'MPE', 4, '20200223', 1 UNION ALL
SELECT 8, 'TW', 'X', 'MPE', 1, '20201008', 1 UNION ALL
SELECT 9, 'TW', 'X', 'MPE', 2, '20201019', 1 UNION ALL
SELECT 10, 'TW', 'X', 'MPE', 1, '20200516', 1 UNION ALL
SELECT 11, 'TW', 'X', 'MPE', 1, '20200129', 1 UNION ALL
SELECT 12, 'TW', 'X', 'MPE', 1, '20201007', 1 UNION ALL
SELECT 13, 'TW', 'X', 'MPE', 2, '20201005', 1 UNION ALL
SELECT 14, 'TW', 'X', 'MPE', 3, '20200505', 1 UNION ALL
SELECT 15, 'TW', 'X', 'MPE', 8, '20201103', 1 UNION ALL
SELECT 16, 'TW', 'X', 'MPE', 9, '20200820', 1
),
DATA AS (
SELECT *,
LEFT(SALE_CREATION_DATE, 6) AS SALE_MONTH,
LEFT(SALE_CREATION_DATE, 4) AS SALE_YEAR,
CASE ROW_NUMBER() OVER (
PARTITION BY MARKET_ID, LOCAL_POS_ID, BC_ID,
LEFT(SALE_CREATION_DATE, 4), CUST_ID
ORDER BY FLAG DESC, LEFT(SALE_CREATION_DATE, 6)
) WHEN 1 THEN FLAG END AS COUNTER /* assumes possible to have no flagged row */
FROM SE
)
SELECT MARKET_ID, LOCAL_POS_ID, BC_ID, SALE_MONTH,
SUM(SUM(COUNTER)) OVER (
PARTITION BY MARKET_ID, LOCAL_POS_ID, BC_ID, SALE_YEAR
ORDER BY SALE_MONTH
) AS NB_ACTIVE_CUSTOMERS
FROM DATA
GROUP BY MARKET_ID, LOCAL_POS_ID, BC_ID, SALE_YEAR, SALE_MONTH
ORDER BY MARKET_ID, LOCAL_POS_ID, BC_ID, SALE_YEAR, SALE_MONTH
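A cut-down version of the same idea may make the tagging step easier to follow. This sketch is an assumption-laden stand-in, not the BigQuery query itself: it keeps a single market, uses invented rows, runs on Python's built-in sqlite3 module (SQLite 3.25+), and uses substr() where the query above uses LEFT().

```python
import sqlite3

# Hypothetical rows: (cust_id, sale_date as YYYYMMDD text, flag).
rows = [
    (1, "20200115", 0), (1, "20200210", 1),
    (2, "20200120", 1), (2, "20200305", 1),
    (3, "20200301", 0),
    (4, "20200215", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE se (cust_id INTEGER, sale_date TEXT, flag INTEGER)")
conn.executemany("INSERT INTO se VALUES (?, ?, ?)", rows)

query = """
WITH data AS (
    SELECT substr(sale_date, 1, 6) AS sale_month,
           substr(sale_date, 1, 4) AS sale_year,
           -- Row 1 per customer and year is the earliest flagged row if one
           -- exists, so counter is 1 once per flagged customer, else 0 or NULL.
           CASE ROW_NUMBER() OVER (
                    PARTITION BY substr(sale_date, 1, 4), cust_id
                    ORDER BY flag DESC, substr(sale_date, 1, 6)
                ) WHEN 1 THEN flag END AS counter
    FROM se
)
SELECT sale_month,
       -- inner SUM: newly counted customers per month; outer SUM: running total
       SUM(SUM(counter)) OVER (
           PARTITION BY sale_year ORDER BY sale_month
       ) AS nb_active_customers
FROM data
GROUP BY sale_year, sale_month
ORDER BY sale_year, sale_month
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Customer 1 is first seen unflagged in January but is only counted in February, the month of its first flagged sale, and customer 3 is never counted.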

Count(Distinct [value]) OVER (Partition by) in SQL Server 2008

Here's what I recently came across. I got it from this post. So far it works really well for me.

DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields ASC) +
DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields DESC) - 1 AS DistinctCount

Running Count Distinct using Over Partition By

Return each member only once for the first month they make a purchase, count by month and then apply a Cumulative Sum:

select Year, Country, State, month,
sum(cnt)
over (partition by Year, Country, State
order by month
rows unbounded preceding) AS YTD_Active_Member_Count
from
(
Select Year, Country, State, month,
COUNT(*) as cnt -- 1st purchases per month
From
( -- this assumes there's at least one new active member per year/month/country
-- otherwise there would be missing rows
Select *
from MemberActivity
where ActiveUserFlag > 0 -- only active members
and Month <= 5
-- and year = 2019 -- seems to be for this year only
qualify row_number() -- only first purchase per member/year
over (partition by MBR_ID, year
order by month) = 1 -- ? probably there's a purchase_date to order by instead
) as first_purchases
group by 1,2,3,4
) as dt
;
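QUALIFY is not available in every engine, so here is a rough equivalent of the same three steps that filters on a row number in a derived table instead. It is a sketch only: the rows are invented, the Month <= 5 filter is left out, and it runs on Python's built-in sqlite3 module (SQLite 3.25+), not on the asker's system.

```python
import sqlite3

# Hypothetical rows: (Year, Country, State, Month, MBR_ID, ActiveUserFlag).
rows = [
    (2019, "US", "CA", 1, "m1", 1), (2019, "US", "CA", 1, "m2", 1),
    (2019, "US", "CA", 2, "m1", 1), (2019, "US", "CA", 2, "m3", 1),
    (2019, "US", "CA", 3, "m4", 0), (2019, "US", "CA", 3, "m2", 1),
    (2019, "US", "CA", 4, "m5", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MemberActivity "
    "(Year INTEGER, Country TEXT, State TEXT, Month INTEGER, MBR_ID TEXT, ActiveUserFlag INTEGER)"
)
conn.executemany("INSERT INTO MemberActivity VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT Year, Country, State, Month,
       SUM(cnt) OVER (PARTITION BY Year, Country, State
                      ORDER BY Month
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                     ) AS YTD_Active_Member_Count
FROM (
    SELECT Year, Country, State, Month, COUNT(*) AS cnt  -- new members per month
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY MBR_ID, Year ORDER BY Month) AS rn
        FROM MemberActivity
        WHERE ActiveUserFlag > 0            -- only active members
    ) AS first_purchases
    WHERE rn = 1                            -- stands in for QUALIFY ... = 1
    GROUP BY Year, Country, State, Month
) AS monthly
ORDER BY Year, Country, State, Month
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In this sample no member is new in month 3, so that month is absent from the output, which is the missing-rows caveat mentioned in the query comment.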

Count distinct customers over rolling window partition

For this operation:

select p_date, seconds_read, 
count(distinct customer_id) over (order by p_date rows between unbounded preceding and current row) as total_cumulative_customer
from table_x;

You can do pretty much what you want with two levels of aggregation:

select min_p_date,
sum(count(*)) over (order by min_p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, min(p_date) as min_p_date
from table_x
group by customer_id
) c
group by min_p_date;

Summing the seconds read as well is a bit tricky, but you can use the same idea:

select p_date,
sum(sum(seconds_read)) over (order by p_date rows between unbounded preceding and current row) as seconds_read,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, p_date, seconds_read,
row_number() over (partition by customer_id order by p_date) as seqnum
from table_x
) c
group by p_date;
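As a quick check of the second query, here is a sketch that runs it on a handful of invented rows and compares the running count with a brute-force count in Python. It assumes the outer query groups by p_date (the only date column the subquery exposes) and uses Python's built-in sqlite3 module (SQLite 3.25+) in place of the original database.

```python
import sqlite3

# Hypothetical rows: (p_date, customer_id, seconds_read).
rows = [
    ("2024-01-01", "c1", 10), ("2024-01-01", "c2", 20),
    ("2024-01-02", "c1", 5),  ("2024-01-02", "c3", 7),
    ("2024-01-03", "c2", 1),  ("2024-01-03", "c1", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (p_date TEXT, customer_id TEXT, seconds_read INTEGER)")
conn.executemany("INSERT INTO table_x VALUES (?, ?, ?)", rows)

query = """
SELECT p_date,
       SUM(SUM(seconds_read)) OVER (
           ORDER BY p_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS seconds_read,
       -- seqnum = 1 marks each customer's first row, so this counts new customers
       SUM(SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)) OVER (
           ORDER BY p_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_distinct_customers
FROM (SELECT customer_id, p_date, seconds_read,
             ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY p_date) AS seqnum
      FROM table_x
     ) c
GROUP BY p_date
ORDER BY p_date
"""
result = conn.execute(query).fetchall()

# Brute-force reference: distinct customers seen on or before each date.
brute_force = [
    len({cust for d, cust, _ in rows if d <= p_date})
    for p_date, _, _ in result
]
for row, reference in zip(result, brute_force):
    print(row, "reference:", reference)
```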

Distinct Counts in a Window Function

Unfortunately, SQL Server does not support COUNT(DISTINCT ...) as a window function.

So you need to nest window functions. I find the simplest and most efficient method is MAX over a DENSE_RANK, but there are others.

The partitioning clause is the equivalent of GROUP BY in a normal aggregate, then the value you are DISTINCTing goes in the ORDER BY of the DENSE_RANK. So you calculate a ranking, while ignoring tied results, then take the maximum rank, per partition.

SELECT
PRODUCT_ID,
KEY_ID,
STORECLUSTER,
STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
FROM (
SELECT *,
rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
FROM YourTable t
) t;
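The MAX-over-DENSE_RANK pattern is easy to try on a few rows. The sketch below is an illustration under assumptions: the data is invented, the SQL Server "alias = expression" form is rewritten with AS, and it runs on Python's built-in sqlite3 module (SQLite 3.25+).

```python
import sqlite3

# Hypothetical rows: (PRODUCT_ID, KEY_ID, STORECLUSTER).
rows = [
    (1, "A", "north"), (1, "A", "south"), (1, "A", "north"),
    (1, "B", "east"),
    (2, "A", "west"), (2, "A", "west"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (PRODUCT_ID INTEGER, KEY_ID TEXT, STORECLUSTER TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

query = """
SELECT PRODUCT_ID, KEY_ID, STORECLUSTER,
       -- the highest dense rank in a partition is its number of distinct values
       MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID) AS STORECLUSTER_COUNT
FROM (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID
                              ORDER BY STORECLUSTER) AS rn
    FROM YourTable
) t
ORDER BY PRODUCT_ID, KEY_ID, STORECLUSTER
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every row keeps its own STORECLUSTER value and gains the per-partition distinct count, which is what COUNT(DISTINCT STORECLUSTER) OVER (...) would have returned if it were allowed.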



