Window functions to count distinct records
Doing a count(distinct)
as a windows function requires a trick. Several levels of tricks, actually.
Because your request is actually truly simple -- the value is always 1 because rx.drugClass is in the partitioning clause -- I will make an assumption. Let's say you want to count the number of unique drug classes per patid.
If so, do a row_number()
partitioned by patid and drugClass. When this is 1, within a patid, , then a new drugClass is starting. Create a flag that is 1 in this case and 0 in all other cases.
Then, you can simply do a sum
with a partitioning clause to get the number of distinct values.
The query (after formatting it so I can read it), looks like:
select rx.patid, d2.fillDate, d2.scriptEndDate, rx.drugName, rx.drugClass,
SUM(IsFirstRowInGroup) over (partition by rx.patid) as NumDrugCount
from (select distinct rx.patid, d2.fillDate, d2.scriptEndDate, rx.drugName, rx.drugClass,
(case when 1 = ROW_NUMBER() over (partition by rx.drugClass, rx.patid order by (select NULL))
then 1 else 0
end) as IsFirstRowInGroup
from (select ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn,
d.patid, d.fillDate, d.scriptEndDate, d.uniqueDrugsInTimeFrame
from DrugsPerTimeFrame as d
) d2 inner join
rx
on rx.patid = d2.patid inner join
DrugTable dt
on dt.drugClass = rx.drugClass
where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate and
dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
) t
order by patid
Distinct Counts in a Window Function
Unfortunately, SQL Server does not support COUNT(DISTINCT
as a window function.
So you need to nest window functions. I find the simplest and most efficient method is MAX
over a DENSE_RANK
, but there are others.
The partitioning clause is the equivalent of GROUP BY
in a normal aggregate, then the value you are DISTINCT
ing goes in the ORDER BY
of the DENSE_RANK
. So you calculate a ranking, while ignoring tied results, then take the maximum rank, per partition.
SELECT
PRODUCT_ID,
KEY_ID,
STORECLUSTER,
STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
FROM (
SELECT *,
rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
FROM YourTable t
) t;
db<>fiddle
How to do a COUNT(DISTINCT) using window functions with a frame in SQL Server
dense_rank()
gives the dense ranking of the the current record. When you run that with ASC
sort order first, you get the current record's dense rank (unique value rank) from the first element. When you run with DESC
order, then you get the current record's dense rank from the last record. Then you remove 1 because the dense ranking of the current record is counted twice. This gives the total unique values in the whole partition (and repeated for every row).
Since, dense_rank
does not support frames
, you can't use this solution directly. You need to generate the frame
by other means. One way could be JOIN
ing the same table with proper unique id
comparisons. Then, you can use dense_rank
on the combined version.
Please check out the following solution proposal. The assumption there is you have a unique record key (record_id
) available in your table. If you don't have a unique key, add another CTE before the first CTE and generate a unique key for each record (using new_id()
function OR combining multiple columns using concat()
with delimiter in between to account for NULLs
)
; WITH cte AS (
SELECT
record_id
, record_id_6_record_earlier = LEAD(machine_id, 6, NULL) OVER (PARTITION BY model ORDER BY _timestamp)
, .... other columns
FROM mainTable
)
, cte2 AS (
SELECT
c.*
, DistinctCntWithin6PriorRec = dense_rank() OVER (PARTITION BY c.model, c.record_id ORDER BY t._timestamp)
+ dense_rank() OVER (PARTITION BY c.model, c.record_id ORDER BY t._timestamp DESC)
- 1
, RN = ROW_NUMBER() OVER (PARTITION BY c.record_id ORDER BY t._timestamp )
FROM cte c
LEFT JOIN mainTable t ON t.record_id BETWEEN c.record_id_6_record_earlier and c.record_id
)
SELECT *
FROM cte2
WHERE RN = 1
There are 2 LIMITATIONS of this solution:
If the frame has less than 6 records, then the
LAG()
function will beNULL
and thus this solution will not work. This can be handled in different ways: One quick way I can think of is to generate 6 LEAD columns (1 record prior, 2 records prior, etc.) and then change theBETWEEN
clause to something like thisBETWEEN COALESCE(c.record_id_6_record_earlier, c.record_id_5_record_earlier, ...., c.record_id_1_record_earlier, c.record_id) and c.record_id
COUNT()
does not countNULL
. ButDENSE_RANK
does. You need account for that too if it applies to your data
SQL Server Count Distinct records with a specific condition in window functions
Unfortunately you can't do COUNT(DISTINC ...) OVER ()
, but here is one workaround
with
cte as
(
select *,
dr = dense_rank() over (partition by [Group], flag order by [TradeMonth])
from yourtable
)
select [Group], [TradeMonth], flag,
max(case when flag = 1 then dr end) over (partition by [Group])
from cte
dbfiddle demo
count distinct window function Databricks
Using collect_set
+ size
functions:
select *, size(collect_set(Marks)) over(partition by Name) from data
count distinct and window functions
You can use a correlated subquery for this:
SELECT id, trxn_dt, trxn_amt, trxn_category,
(SELECT COUNT(DISTINCT trxn_category)
FROM mytable AS t2
WHERE t2.id = t1.id) AS cnt
FROM mytable AS t1
Demo here
Count Distinct Window Function with Groupby
This might work , you could wrangle to get into the format you're after but it produces the answer without subquery.
Uses the awesome GROUPING SETS which allows multiple group-by clauses in a single statement - the exact error you were hitting :-).
Awesome question!
SELECT
COUNT(DISTINCT PURCHASE_ID)
, USER_NAME
, MARKET
FROM
CTE
GROUP BY
GROUPING SETS (USER_NAME, MARKET);
Copy|Paste|Run
WITH CTE AS (SELECT 'JOHN SMITH' USER_NAME, 'NYC' MARKET, 1
PURCHASE_ID
UNION SELECT 'JOHN SMITH' USER_NAME, 'NYC' MARKET, 2 PURCHASE_ID
UNION SELECT 'BOB MILLER' USER_NAME, 'NYC' MARKET, 2 PURCHASE_ID
UNION SELECT 'BOB MILLER' USER_NAME, 'NYC' MARKET, 4 PURCHASE_ID
UNION SELECT 'TIM WILSON' USER_NAME, 'NYC' MARKET, 3 PURCHASE_ID)
SELECT
COUNT(DISTINCT PURCHASE_ID)
, USER_NAME
, MARKET
FROM
CTE
GROUP BY
GROUPING SETS (USER_NAME, MARKET);
Rolling Windows function for count distinct customer state
I don't think you can do this with window functions. You can do this using a lateral join:
select cd.*, cd2.*
from customer_details cd cross join lateral
(select count(distinct cd2.customer_status) as cnt1,
count(distinct case when cd2.calendar_date > cd.calendar_date - interval '3 month' then cd2.customer_status end) as cnt2,
count(distinct case when cd2.calendar_date > cd.calendar_date - interval '6 month' then cd2.customer_status end) as cnt3
from customer_details cd2
where cd2.customer_id = cd.customer_id
) cd2
Related Topics
Limit Characters Returned in Oracle SQL Query
From What Do SQL Parameters Protect You
How to Concatenate All Columns in a Select with SQL Server
Select Distinct Is Slower Than Expected on My Table in Postgresql
Execute Sp_Executesql for Select...Into #Table But Can't Select Out Temp Table Data
How to Pass Table Name as a Parameter in Oracle
Oracle SQL:Get All Integers Between Two Numbers
In VS or of Oracle, Which Faster
Running Total by Grouped Records in Table
Join a Count Query on Generate_Series() and Retrieve Null Values as '0'
Pivot Without Aggregate Function in Mssql 2008 R2
How to Bulk Insert Only New Rows in Postresql
Oracle - Update Join - Non Key-Preserved Table
How to Get List of Values in Group_By Clause
Sql: Sum 3 Columns When One Column Has a Null Value
Select from Table by Knowing Only Date Without Time (Oracle)