Partition Function COUNT() OVER possible using DISTINCT
There is a very simple solution using dense_rank():
dense_rank() over (partition by [Mth] order by [UserAccountKey])
+ dense_rank() over (partition by [Mth] order by [UserAccountKey] desc)
- 1
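This gives you exactly what you were asking for: the number of distinct UserAccountKeys within each month. It works because, within a partition, a row's ascending dense_rank() counts the distinct keys up to and including its own, and its descending dense_rank() counts the distinct keys from its own upward. The row's own key is counted in both, hence the - 1.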
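One convenient way to check the expression is to run it somewhere small. The sketch below uses Python's built-in sqlite3 module instead of SQL Server. It assumes SQLite 3.25 or later, which added window functions; like SQL Server, SQLite rejects COUNT(DISTINCT ...) OVER. The table name `activity` and its rows are made up for illustration, and the windowed result is compared with an ordinary GROUP BY count.

```python
import sqlite3


def distinct_users_per_month(rows):
    """Apply the DENSE_RANK sum trick and return sorted (Mth, count) pairs."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the question's data.
    con.execute("CREATE TABLE activity (Mth TEXT, UserAccountKey INTEGER)")
    con.executemany("INSERT INTO activity VALUES (?, ?)", rows)
    windowed = con.execute("""
        SELECT Mth,
               DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey)
             + DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey DESC)
             - 1 AS distinct_users
        FROM activity
    """).fetchall()
    # Reference answer from an ordinary aggregate.
    grouped = con.execute(
        "SELECT Mth, COUNT(DISTINCT UserAccountKey) FROM activity GROUP BY Mth"
    ).fetchall()
    con.close()
    # Every row of a month carries the same value, so collapse duplicates.
    return sorted(set(windowed)), sorted(grouped)


if __name__ == "__main__":
    sample = [("2024-01", 1), ("2024-01", 1), ("2024-01", 2), ("2024-02", 3)]
    via_window, via_group_by = distinct_users_per_month(sample)
    print(via_window)  # [('2024-01', 2), ('2024-02', 1)]
    assert via_window == via_group_by
```

In January, user 1 has ascending rank 1 and descending rank 2, user 2 has ranks 2 and 1, so both rows get 1 + 2 - 1 = 2.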
Count distinct over partition by
Unfortunately, SQL Server (and some other databases as well) does not support COUNT(DISTINCT ...) as a window function. Fortunately, there is a simple trick to work around this: add an ascending and a descending DENSE_RANK() and subtract one:
select a.Name, a.Role,
(dense_rank() over (partition by a.Role order by a.Name asc) +
dense_rank() over (partition by a.Role order by a.Name desc) -
1
) as distinct_names_in_role
from YourTable a -- "table" is a reserved word; substitute your own table name
group by a.Name, a.Role
Count Distinct over partition by sql
Try this:
DECLARE @DataSource TABLE
(
[col1ID] INT
,[col2String] VARCHAR(12)
,[Col3ID] INT
,[Col4String] VARCHAR(12)
,[Col5Data] DATE
);
INSERT INTO @DataSource
VALUES (1, 'xxx', 20, 'abc', '2018-09-14')
,(1, 'xxx', 20, 'xyz', '2018-09-14')
,(2, 'xxx', 30, 'abc', '2018-09-14')
,(2, 'xxx', 30, 'abc', '2018-09-14');
SELECT *
,dense_rank() over (partition by col1ID, col3ID order by [Col4String]) + dense_rank() over (partition by col1ID, col3ID order by [Col4String] desc) - 1 AS DistinctCol4Count
FROM @DataSource;
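To see what this returns for the four sample rows, here is a sketch that replays them in SQLite through Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The table name `DataSource` replaces the T-SQL table variable, dates are stored as text, and the alias `DistinctCol4Count` is just a descriptive name.

```python
import sqlite3

# The four rows inserted into @DataSource above (date kept as text in SQLite).
ROWS = [
    (1, "xxx", 20, "abc", "2018-09-14"),
    (1, "xxx", 20, "xyz", "2018-09-14"),
    (2, "xxx", 30, "abc", "2018-09-14"),
    (2, "xxx", 30, "abc", "2018-09-14"),
]


def distinct_col4_per_group():
    """Count distinct Col4String per (col1ID, Col3ID) with the DENSE_RANK trick."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE DataSource (col1ID INT, col2String TEXT, Col3ID INT,"
        " Col4String TEXT, Col5Data TEXT)"
    )
    con.executemany("INSERT INTO DataSource VALUES (?, ?, ?, ?, ?)", ROWS)
    result = con.execute("""
        SELECT col1ID, Col3ID, Col4String,
               DENSE_RANK() OVER (PARTITION BY col1ID, Col3ID ORDER BY Col4String)
             + DENSE_RANK() OVER (PARTITION BY col1ID, Col3ID ORDER BY Col4String DESC)
             - 1 AS DistinctCol4Count
        FROM DataSource
        ORDER BY col1ID, Col3ID, Col4String
    """).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in distinct_col4_per_group():
        print(row)
```

Group (1, 20) has two different Col4String values, so both of its rows show 2; group (2, 30) repeats 'abc', so both of its rows show 1.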
SQL count distinct over partition by cumulatively
One option is to:
- create a new column that marks when each "category" is seen for the first time (partitioning on "id" and "category", ordering on "year" and "month")
- compute a running sum over this column, partitioned by "id" only
WITH cte AS (
SELECT *,
CASE WHEN ROW_NUMBER() OVER(
PARTITION BY id, category
ORDER BY year_, month_) = 1
THEN 1
ELSE 0
END AS rn1 -- 1 on the first appearance of each category per id
FROM base
)
SELECT id,
category,
year_,
month_,
SUM(rn1) OVER(
PARTITION BY id
ORDER BY year_, month_
) AS sumC
FROM cte
ORDER BY id, year_, month_
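A sketch of the two steps running end to end, using SQLite through Python's sqlite3 module (assuming SQLite 3.25+). The column names `year_` and `month_` are an assumption about the `base` table, and the sample rows are invented.

```python
import sqlite3


def running_distinct_categories(rows):
    """Cumulative count of distinct categories per id, month by month."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema; the column names year_ and month_ are assumed.
    con.execute("CREATE TABLE base (id INT, category TEXT, year_ INT, month_ INT)")
    con.executemany("INSERT INTO base VALUES (?, ?, ?, ?)", rows)
    result = con.execute("""
        WITH cte AS (
            SELECT *,
                   CASE WHEN ROW_NUMBER() OVER (
                            PARTITION BY id, category
                            ORDER BY year_, month_) = 1
                        THEN 1 ELSE 0 END AS rn1  -- first sighting of the category
            FROM base
        )
        SELECT id, category, year_, month_,
               SUM(rn1) OVER (PARTITION BY id ORDER BY year_, month_) AS sumC
        FROM cte
        ORDER BY id, year_, month_, category
    """).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    data = [
        (1, "a", 2020, 1),
        (1, "b", 2020, 2),
        (1, "a", 2020, 3),  # repeat of "a": contributes 0
        (1, "c", 2020, 3),
        (2, "a", 2020, 1),
    ]
    for row in running_distinct_categories(data):
        print(row)
```

Because the running SUM uses the default RANGE frame, rows sharing the same year_ and month_ are peers and all show that month's total (3 for both March rows of id 1). Add ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if you want a row-by-row count instead.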
How to apply: count(distinct ...) over (partition by ... order by) in big query?
I think some of your sample data is incorrect, but I played with it and got a matching result, at least for the MPE data. You can accomplish this by first tagging the "distinctly counted" rows: add CUST_ID as an extra partition column and order first on FLAG DESC, so that only each customer's first flagged month per year gets a 1. Then sum over that column in the same way you hoped to apply count(distinct <expr>) over (...).
WITH SE AS (
SELECT 1 LINE_ID, 'TW' MARKET_ID, 'X' LOCAL_POS_ID, 'MPE' BC_ID,
1 CUST_ID, '20200201' SALE_CREATION_DATE, 1 FLAG UNION ALL
SELECT 2, 'TW', 'X', 'MPE', 2, '20201005', 1 UNION ALL
SELECT 3, 'TW', 'X', 'MPE', 3, '20200415', 0 UNION ALL
SELECT 4, 'TW', 'X', 'MPE', 1, '20200223', 1 UNION ALL
SELECT 5, 'TW', 'X', 'MPE', 6, '20200217', 1 UNION ALL
SELECT 6, 'TW', 'X', 'MPE', 9, '20200715', 1 UNION ALL
SELECT 7, 'TW', 'X', 'MPE', 4, '20200223', 1 UNION ALL
SELECT 8, 'TW', 'X', 'MPE', 1, '20201008', 1 UNION ALL
SELECT 9, 'TW', 'X', 'MPE', 2, '20201019', 1 UNION ALL
SELECT 10, 'TW', 'X', 'MPE', 1, '20200516', 1 UNION ALL
SELECT 11, 'TW', 'X', 'MPE', 1, '20200129', 1 UNION ALL
SELECT 12, 'TW', 'X', 'MPE', 1, '20201007', 1 UNION ALL
SELECT 13, 'TW', 'X', 'MPE', 2, '20201005', 1 UNION ALL
SELECT 14, 'TW', 'X', 'MPE', 3, '20200505', 1 UNION ALL
SELECT 15, 'TW', 'X', 'MPE', 8, '20201103', 1 UNION ALL
SELECT 16, 'TW', 'X', 'MPE', 9, '20200820', 1
),
DATA AS (
SELECT *,
LEFT(SALE_CREATION_DATE, 6) AS SALE_MONTH,
LEFT(SALE_CREATION_DATE, 4) AS SALE_YEAR,
CASE ROW_NUMBER() OVER (
PARTITION BY MARKET_ID, LOCAL_POS_ID, BC_ID,
LEFT(SALE_CREATION_DATE, 4), CUST_ID
ORDER BY FLAG DESC, LEFT(SALE_CREATION_DATE, 6)
) WHEN 1 THEN FLAG END AS COUNTER /* FLAG is 0 when the customer has no flagged row that year, so they are not counted */
FROM SE
)
SELECT MARKET_ID, LOCAL_POS_ID, BC_ID, SALE_MONTH,
SUM(SUM(COUNTER)) OVER (
PARTITION BY MARKET_ID, LOCAL_POS_ID, BC_ID, SALE_YEAR
ORDER BY SALE_MONTH
) AS NB_ACTIVE_CUSTOMERS
FROM DATA
GROUP BY MARKET_ID, LOCAL_POS_ID, BC_ID, SALE_YEAR, SALE_MONTH
ORDER BY MARKET_ID, LOCAL_POS_ID, BC_ID, SALE_YEAR, SALE_MONTH
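For sanity-checking the approach outside BigQuery, here is a sketch in SQLite via Python's sqlite3 module (assuming SQLite 3.25+). Several substitutions are assumptions of this sketch, not part of the answer: substr() replaces LEFT(), the MARKET_ID, LOCAL_POS_ID and BC_ID columns are dropped because they are constant in the sample, and SUM(SUM(COUNTER)) OVER is split into a monthly CTE (`MONTHLY`, with the made-up column `NEW_ACTIVE`) followed by a running SUM, which should be equivalent.

```python
import sqlite3

# (CUST_ID, SALE_CREATION_DATE, FLAG) for the 16 MPE rows in SE above.
SE = [
    (1, "20200201", 1), (2, "20201005", 1), (3, "20200415", 0), (1, "20200223", 1),
    (6, "20200217", 1), (9, "20200715", 1), (4, "20200223", 1), (1, "20201008", 1),
    (2, "20201019", 1), (1, "20200516", 1), (1, "20200129", 1), (1, "20201007", 1),
    (2, "20201005", 1), (3, "20200505", 1), (8, "20201103", 1), (9, "20200820", 1),
]


def active_customers_by_month():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE SE (CUST_ID INT, SALE_CREATION_DATE TEXT, FLAG INT)")
    con.executemany("INSERT INTO SE VALUES (?, ?, ?)", SE)
    result = con.execute("""
        WITH DATA AS (
            SELECT *,
                   substr(SALE_CREATION_DATE, 1, 6) AS SALE_MONTH,
                   substr(SALE_CREATION_DATE, 1, 4) AS SALE_YEAR,
                   CASE ROW_NUMBER() OVER (
                            PARTITION BY substr(SALE_CREATION_DATE, 1, 4), CUST_ID
                            ORDER BY FLAG DESC, substr(SALE_CREATION_DATE, 1, 6))
                        WHEN 1 THEN FLAG END AS COUNTER
            FROM SE
        ),
        MONTHLY AS (  -- first level: SUM(COUNTER) per month
            SELECT SALE_YEAR, SALE_MONTH, SUM(COUNTER) AS NEW_ACTIVE
            FROM DATA
            GROUP BY SALE_YEAR, SALE_MONTH
        )
        SELECT SALE_MONTH,  -- second level: running total within the year
               SUM(NEW_ACTIVE) OVER (PARTITION BY SALE_YEAR ORDER BY SALE_MONTH)
                   AS NB_ACTIVE_CUSTOMERS
        FROM MONTHLY
        ORDER BY SALE_MONTH
    """).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for month, total in active_customers_by_month():
        print(month, total)
```

Months such as 202004 and 202008 contain only repeat or unflagged customers, so their monthly SUM is NULL and the running total simply carries forward from the previous month.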
Count(Distinct [value]) OVER (Partition by) in SQL Server 2008
Here's what I recently came across. I got it from this post. So far it works really well for me.
DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields ASC) +
DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields DESC) - 1 AS DistinctCount
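One caveat worth checking before relying on this: DENSE_RANK() treats NULL as a value of its own, so a NULL in the ordered column adds one to the result, whereas COUNT(DISTINCT ...) ignores NULLs. The sketch below shows the difference in SQLite via Python's sqlite3 (assuming SQLite 3.25+); the table `t` with columns `g` and `v` stands in for PartitionByFields and OrderByFields and is hypothetical. Subtracting a per-partition "has a NULL" flag is one possible correction.

```python
import sqlite3


def compare_null_handling():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (g TEXT, v INTEGER)")
    con.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [("a", 1), ("a", 2), ("a", None), ("b", 5), ("b", 5)],
    )
    rows = con.execute("""
        SELECT g,
               -- plain trick: the NULL forms its own rank group
               DENSE_RANK() OVER (PARTITION BY g ORDER BY v ASC)
             + DENSE_RANK() OVER (PARTITION BY g ORDER BY v DESC) - 1 AS trick,
               -- corrected: subtract 1 when the partition contains a NULL
               DENSE_RANK() OVER (PARTITION BY g ORDER BY v ASC)
             + DENSE_RANK() OVER (PARTITION BY g ORDER BY v DESC) - 1
             - MAX(CASE WHEN v IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY g)
                   AS trick_no_nulls
        FROM t
    """).fetchall()
    per_group = sorted(set(rows))
    reference = dict(con.execute("SELECT g, COUNT(DISTINCT v) FROM t GROUP BY g").fetchall())
    con.close()
    return per_group, reference


if __name__ == "__main__":
    print(compare_null_handling())
```

For group "a" the plain expression returns 3 while COUNT(DISTINCT v) returns 2; the corrected column agrees with COUNT(DISTINCT v) for both groups.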
Running Count Distinct using Over Partition By
Return each member only once, for the first month they make a purchase, count by month, and then apply a cumulative sum:
select Year, Country, State, month,
sum(cnt)
over (partition by Year, Country, State
order by month
rows unbounded preceding) AS YTD_Active_Member_Count
from
(
Select Year, Country, State, month,
COUNT(*) as cnt -- 1st purchases per month
From
( -- this assumes there's at least one new active member per year/month/country
-- otherwise there would be missing rows
Select *
from MemberActivity
where ActiveUserFlag > 0 -- only active members
and Month <= 5
-- and year = 2019 -- seems to be for this year only
qualify row_number() -- only first purchase per member/year
over (partition by MBR_ID, year
order by month) = 1 -- ? probably there's a purchase_date to order by instead
) as dt
group by 1,2,3,4
) as dt
;
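QUALIFY is available in engines such as Teradata and Snowflake but not in SQLite, so this sketch of the same logic filters ROW_NUMBER() in an outer WHERE instead. QUALIFY is applied after WHERE, so the result should be the same. It runs through Python's sqlite3 module (assuming SQLite 3.25+), and the MemberActivity rows are invented.

```python
import sqlite3

# Hypothetical rows: (MBR_ID, Year, Country, State, Month, ActiveUserFlag)
ACTIVITY = [
    (1, 2019, "US", "CA", 1, 1),
    (1, 2019, "US", "CA", 2, 1),
    (2, 2019, "US", "CA", 2, 1),
    (3, 2019, "US", "CA", 3, 0),  # inactive row: ignored
    (3, 2019, "US", "CA", 4, 1),
    (2, 2019, "US", "CA", 4, 1),  # member 2 was already counted in month 2
    (4, 2019, "US", "CA", 6, 1),  # outside Month <= 5: ignored
]


def ytd_active_members():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE MemberActivity (MBR_ID INT, Year INT, Country TEXT,"
        " State TEXT, Month INT, ActiveUserFlag INT)"
    )
    con.executemany("INSERT INTO MemberActivity VALUES (?, ?, ?, ?, ?, ?)", ACTIVITY)
    result = con.execute("""
        SELECT Year, Country, State, Month,
               SUM(cnt) OVER (PARTITION BY Year, Country, State
                              ORDER BY Month
                              ROWS UNBOUNDED PRECEDING) AS YTD_Active_Member_Count
        FROM (
            SELECT Year, Country, State, Month, COUNT(*) AS cnt
            FROM (
                -- replaces QUALIFY: number each member's active months per year
                SELECT *,
                       ROW_NUMBER() OVER (PARTITION BY MBR_ID, Year
                                          ORDER BY Month) AS rn
                FROM MemberActivity
                WHERE ActiveUserFlag > 0 AND Month <= 5
            ) AS firsts
            WHERE rn = 1  -- keep only each member's first purchase month
            GROUP BY Year, Country, State, Month
        ) AS monthly
        ORDER BY Year, Country, State, Month
    """).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in ytd_active_members():
        print(row)
```

Month 3 does not appear in the output because its only row is inactive, which is the "missing rows" situation the comment in the query warns about.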
Count distinct customers over rolling window partition
For this operation:
select p_date, seconds_read,
count(distinct customer_id) over (order by p_date rows between unbounded preceding and current row) as total_cumulative_customer
from table_x;
You can do pretty much what you want with two levels of aggregation:
select min_p_date,
sum(count(*)) over (order by min_p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, min(p_date) as min_p_date
from table_x
group by customer_id
) c
group by min_p_date;
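The two-level aggregation can be tried out with Python's sqlite3 module (assuming SQLite 3.25+). In this sketch the outer sum(count(*)) over (...) is written as a COUNT(*) in a derived table followed by a running SUM, which should give the same numbers. The rows of `table_x` are invented.

```python
import sqlite3

# Hypothetical rows: (customer_id, p_date, seconds_read)
TABLE_X = [
    (1, "2024-01-01", 10),
    (2, "2024-01-01", 5),
    (1, "2024-01-02", 7),
    (3, "2024-01-03", 4),
    (2, "2024-01-03", 3),
]


def running_distinct_customers():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_x (customer_id INT, p_date TEXT, seconds_read INT)")
    con.executemany("INSERT INTO table_x VALUES (?, ?, ?)", TABLE_X)
    result = con.execute("""
        SELECT min_p_date,
               SUM(n) OVER (ORDER BY min_p_date
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                   AS running_distinct_customers
        FROM (
            -- how many customers appear for the first time on each date
            SELECT min_p_date, COUNT(*) AS n
            FROM (SELECT customer_id, MIN(p_date) AS min_p_date
                  FROM table_x
                  GROUP BY customer_id) AS c
            GROUP BY min_p_date
        ) AS daily
        ORDER BY min_p_date
    """).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(running_distinct_customers())
```

Note that 2024-01-02 has no first-time customers and therefore no output row; grouping by p_date over all rows, as the next query does, avoids that gap.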
Summing the seconds read as well is a bit tricky, but you can use the same idea:
select p_date,
sum(sum(seconds_read)) over (order by p_date rows between unbounded preceding and current row) as seconds_read,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, p_date, seconds_read,
row_number() over (partition by customer_id order by p_date) as seqnum
from table_x
) c
group by p_date;
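A sketch of the seqnum idea in SQLite via Python's sqlite3 (assuming SQLite 3.25+), grouping by p_date, the column the inner query produces. As an assumption of this sketch, each sum(sum(...)) over (...) is split into a daily aggregate in a derived table followed by a running SUM. The `table_x` rows are invented.

```python
import sqlite3

# Hypothetical rows: (customer_id, p_date, seconds_read)
TABLE_X = [
    (1, "2024-01-01", 10),
    (2, "2024-01-01", 5),
    (1, "2024-01-02", 7),
    (3, "2024-01-03", 4),
    (2, "2024-01-03", 3),
]


def running_seconds_and_customers():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_x (customer_id INT, p_date TEXT, seconds_read INT)")
    con.executemany("INSERT INTO table_x VALUES (?, ?, ?)", TABLE_X)
    result = con.execute("""
        SELECT p_date,
               SUM(secs) OVER (ORDER BY p_date
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                   AS seconds_read,
               SUM(new_customers) OVER (ORDER BY p_date
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                   AS running_distinct_customers
        FROM (
            -- daily totals; seqnum = 1 marks a customer's first day
            SELECT p_date,
                   SUM(seconds_read) AS secs,
                   SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS new_customers
            FROM (SELECT customer_id, p_date, seconds_read,
                         ROW_NUMBER() OVER (PARTITION BY customer_id
                                            ORDER BY p_date) AS seqnum
                  FROM table_x) AS c
            GROUP BY p_date
        ) AS daily
        ORDER BY p_date
    """).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in running_seconds_and_customers():
        print(row)
```

Every date with any activity gets a row here, including 2024-01-02, where the seconds total grows but the customer count stays at 2.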
Distinct Counts in a Window Function
Unfortunately, SQL Server does not support COUNT(DISTINCT ...) as a window function.
So you need to nest window functions. I find the simplest and most efficient method is MAX over a DENSE_RANK, but there are others.
The partitioning clause is the equivalent of GROUP BY in a normal aggregate, and the value you are DISTINCT-ing goes in the ORDER BY of the DENSE_RANK. So you calculate a ranking in which tied values share the same rank with no gaps, then take the maximum rank per partition.
SELECT
PRODUCT_ID,
KEY_ID,
STORECLUSTER,
STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
FROM (
SELECT *,
rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
FROM YourTable t
) t;
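The same MAX-over-DENSE_RANK query can be run in SQLite through Python's sqlite3 module (assuming SQLite 3.25+). SQLite does not accept T-SQL's `alias = expression` column syntax, so this sketch uses AS instead; `YourTable` and its rows are hypothetical.

```python
import sqlite3


def storecluster_counts(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (PRODUCT_ID INT, KEY_ID INT, STORECLUSTER TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT PRODUCT_ID, KEY_ID, STORECLUSTER,
               MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID) AS STORECLUSTER_COUNT
        FROM (
            -- tied STORECLUSTER values share a rank, so the top rank = distinct count
            SELECT *,
                   DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID
                                      ORDER BY STORECLUSTER) AS rn
            FROM YourTable
        ) AS t
    """).fetchall()
    con.close()
    # One (PRODUCT_ID, KEY_ID, count) entry per partition.
    return sorted({(p, k, n) for p, k, _, n in result})


if __name__ == "__main__":
    sample = [
        (1, 10, "A"), (1, 10, "B"), (1, 10, "A"),
        (1, 20, "C"),
        (2, 10, "A"), (2, 10, "B"), (2, 10, "C"),
    ]
    print(storecluster_counts(sample))
```

Partition (1, 10) contains A, B, A, which DENSE_RANK numbers 1, 2, 1, so its maximum and distinct count is 2.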