Partition Function COUNT() OVER possible using DISTINCT
There is a very simple solution using dense_rank():
dense_rank() over (partition by [Mth] order by [UserAccountKey])
+ dense_rank() over (partition by [Mth] order by [UserAccountKey] desc)
- 1
This will give you exactly what you were asking for: The number of distinct UserAccountKeys within each month.
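The trick is easy to check outside SQL Server as well. Here is a minimal sketch in SQLite (3.25+ for window functions); the table and data are hypothetical, only the column names (Mth, UserAccountKey) come from the answer above:

```python
# Minimal SQLite sketch of the double-DENSE_RANK() distinct count.
# Table and data are hypothetical; column names mirror the answer.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (Mth TEXT, UserAccountKey INT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("2024-01", 1), ("2024-01", 1), ("2024-01", 2), ("2024-02", 3)],
)
rows = con.execute("""
    SELECT Mth,
           dense_rank() OVER (PARTITION BY Mth ORDER BY UserAccountKey)
         + dense_rank() OVER (PARTITION BY Mth ORDER BY UserAccountKey DESC)
         - 1 AS distinct_users
    FROM t
""").fetchall()
# Every 2024-01 row reports 2 distinct keys; the 2024-02 row reports 1.
print(rows)
```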
Count Distinct over partition by sql
Try this:
DECLARE @DataSource TABLE
(
[col1ID] INT
,[col2String] VARCHAR(12)
,[Col3ID] INT
,[Col4String] VARCHAR(12)
,[Col5Data] DATE
);
INSERT INTO @DataSource
VALUES (1, 'xxx', 20, 'abc', '2018-09-14')
,(1, 'xxx', 20, 'xyz', '2018-09-14')
,(2, 'xxx', 30, 'abc', '2018-09-14')
,(2, 'xxx', 30, 'abc', '2018-09-14');
SELECT *
,dense_rank() over (partition by col1ID, col3ID order by [Col4String]) + dense_rank() over (partition by col1ID, col3ID order by [Col4String] desc) - 1 AS [DistinctCount]
FROM @DataSource
COUNT(DISTINCT [value]) OVER (PARTITION BY ...) in SQL Server 2008
Here's an approach I came across recently; it has worked well for me so far.
DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields ASC) +
DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields DESC) - 1 AS DistinctCount
How to do a COUNT(DISTINCT) using window functions with a frame in SQL Server
dense_rank() gives the dense ranking of the current record. When you run it with ASC sort order, you get the current record's dense rank (its unique-value rank) counted from the first element; with DESC order, you get its dense rank counted from the last element. You then subtract 1 because the current record's rank is counted in both directions. The result is the total number of unique values in the whole partition, repeated on every row.
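The arithmetic can be illustrated with a tiny pure-Python sketch (hypothetical values):

```python
# Why rank_asc + rank_desc - 1 equals the number of distinct values:
# a value's ascending dense rank counts distinct values up to and
# including it; its descending dense rank counts distinct values from
# it to the end. The value itself is counted in both, hence the -1.
vals = ["a", "a", "b", "c"]            # one partition (hypothetical data)
distinct_sorted = sorted(set(vals))

def dense_rank(v, order):
    """1-based dense rank of v within the given distinct ordering."""
    return order.index(v) + 1

for v in vals:
    asc = dense_rank(v, distinct_sorted)
    desc = dense_rank(v, distinct_sorted[::-1])
    assert asc + desc - 1 == len(distinct_sorted)

print(len(distinct_sorted))  # 3 distinct values, reported for every row
```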
Since dense_rank() does not support frames, you can't use this solution directly. You need to generate the frame by other means. One way is to JOIN the table to itself with appropriate unique-id comparisons, and then apply dense_rank() to the combined result.
Please check out the following solution proposal. It assumes you have a unique record key (record_id) available in your table. If you don't have a unique key, add another CTE before the first CTE and generate a unique key for each record (using the NEWID() function, or by combining multiple columns with CONCAT() and a delimiter in between to account for NULLs).
; WITH cte AS (
SELECT
record_id
, record_id_6_record_earlier = LAG(record_id, 6, NULL) OVER (PARTITION BY model ORDER BY _timestamp)
, .... other columns
FROM mainTable
)
, cte2 AS (
SELECT
c.*
, DistinctCntWithin6PriorRec = dense_rank() OVER (PARTITION BY c.model, c.record_id ORDER BY t._timestamp)
+ dense_rank() OVER (PARTITION BY c.model, c.record_id ORDER BY t._timestamp DESC)
- 1
, RN = ROW_NUMBER() OVER (PARTITION BY c.record_id ORDER BY t._timestamp )
FROM cte c
LEFT JOIN mainTable t ON t.record_id BETWEEN c.record_id_6_record_earlier and c.record_id
)
SELECT *
FROM cte2
WHERE RN = 1
There are two limitations to this solution:
1. If the frame has fewer than 6 records, the LAG() function returns NULL and the solution breaks down. This can be handled in different ways; one quick option is to generate 6 lagged columns (1 record prior, 2 records prior, etc.) and change the BETWEEN clause to something like BETWEEN COALESCE(c.record_id_6_record_earlier, c.record_id_5_record_earlier, ..., c.record_id_1_record_earlier, c.record_id) AND c.record_id.
2. COUNT() does not count NULL, but DENSE_RANK() does. You need to account for that too if it applies to your data.
Count distinct over partition by
Unfortunately, SQL Server (like many other databases) doesn't support COUNT(DISTINCT) as a window function. Fortunately, there is a simple trick to work around this: the sum of two DENSE_RANK()s minus one:
select a.Name, a.Role,
(dense_rank() over (partition by a.Role order by a.Name asc) +
dense_rank() over (partition by a.Role order by a.Name desc) -
1
) as distinct_names_in_role
from table a
group by a.name, a.role
Distinct Counts in a Window Function
Unfortunately, SQL Server does not support COUNT(DISTINCT ...) as a window function. So you need to nest window functions. I find the simplest and most efficient method is MAX over a DENSE_RANK, but there are others.
The partitioning clause is the equivalent of GROUP BY in a normal aggregate, and the value you are DISTINCTing goes in the ORDER BY of the DENSE_RANK. So you calculate a ranking, ignoring tied results, then take the maximum rank per partition.
SELECT
PRODUCT_ID,
KEY_ID,
STORECLUSTER,
STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
FROM (
SELECT *,
rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
FROM YourTable t
) t;
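The nesting can be sketched in SQLite as well; the column names below follow the answer above, while the data is hypothetical:

```python
# SQLite sketch of the nested MAX-over-DENSE_RANK pattern.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE YourTable (PRODUCT_ID INT, KEY_ID INT, STORECLUSTER TEXT)"
)
con.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 10, "A"), (1, 10, "A"), (1, 10, "B"), (2, 20, "C")],
)
rows = con.execute("""
    SELECT PRODUCT_ID, KEY_ID, STORECLUSTER,
           MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID) AS STORECLUSTER_COUNT
    FROM (SELECT *,
                 DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID
                                    ORDER BY STORECLUSTER) AS rn
          FROM YourTable) t
""").fetchall()
# (1, 10) has two distinct clusters; (2, 20) has one.
print(rows)
```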
SQL count distinct over partition by cumulatively
One option is:
- creating a new column that flags, for each "category", the first time it is seen (partitioning on "id", "category" and ordering on "year_", "month_")
- computing a running sum over this column, with the same partition
WITH cte AS (
    SELECT *,
           CASE WHEN ROW_NUMBER() OVER(
                         PARTITION BY id, category
                         ORDER BY year_, month_) = 1
                THEN 1
                ELSE 0
           END AS rn1
    FROM base
)
SELECT id,
       category,
       year_,
       month_,
       SUM(rn1) OVER(
           PARTITION BY id
           ORDER BY year_, month_
       ) AS sumC
FROM cte
ORDER BY id, year_, month_
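The same first-seen-flag plus running-sum idea can be verified in SQLite; the column names follow the query above, the data is hypothetical:

```python
# SQLite sketch of the first-seen-flag + running-sum technique.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE base (id INT, category TEXT, year_ INT, month_ INT)")
con.executemany(
    "INSERT INTO base VALUES (?, ?, ?, ?)",
    [(1, "a", 2023, 1), (1, "b", 2023, 2), (1, "a", 2023, 3)],
)
rows = con.execute("""
    WITH cte AS (
        SELECT *,
               CASE WHEN ROW_NUMBER() OVER (PARTITION BY id, category
                                            ORDER BY year_, month_) = 1
                    THEN 1 ELSE 0 END AS rn1
        FROM base
    )
    SELECT id, category, year_, month_,
           SUM(rn1) OVER (PARTITION BY id ORDER BY year_, month_) AS sumC
    FROM cte
    ORDER BY year_, month_
""").fetchall()
# The cumulative distinct-category count is 1, 2, 2 across the months.
print([r[4] for r in rows])
```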
Count distinct customers over rolling window partition
For this operation:
select p_date, seconds_read,
count(distinct customer_id) over (order by p_date rows between unbounded preceding and current row) as total_cumulative_customer
from table_x;
You can do pretty much what you want with two levels of aggregation:
select min_p_date,
sum(count(*)) over (order by min_p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, min(p_date) as min_p_date
from table_x
group by customer_id
) c
group by min_p_date;
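The two-level aggregation can be sketched in SQLite: each customer contributes once, on their first p_date, and a window sum accumulates those firsts. Column names follow the answer; the data is hypothetical:

```python
# SQLite sketch of the two-level aggregation for a running distinct
# customer count.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_x (customer_id INT, p_date TEXT)")
con.executemany(
    "INSERT INTO table_x VALUES (?, ?)",
    [(1, "2024-01-01"), (1, "2024-01-02"), (2, "2024-01-02"), (3, "2024-01-03")],
)
rows = con.execute("""
    SELECT min_p_date,
           SUM(COUNT(*)) OVER (ORDER BY min_p_date
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS running_distinct_customers
    FROM (SELECT customer_id, MIN(p_date) AS min_p_date
          FROM table_x
          GROUP BY customer_id) c
    GROUP BY min_p_date
""").fetchall()
# Customers first appear on days 1, 2 and 3, so the running count is 1, 2, 3.
print(rows)
```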
Summing the seconds read as well is a bit tricky, but you can use the same idea:
select p_date,
sum(sum(seconds_read)) over (order by p_date rows between unbounded preceding and current row) as seconds_read,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, p_date, seconds_read,
row_number() over (partition by customer_id order by p_date) as seqnum
from table_x
) c
group by p_date;
Spark sql distinct count over window function
The simplest method is to use row_number()
to identify the first occurrence of each week, and then use a cumulative sum:
select t.*,
sum(case when seqnum = 1 then 1 else 0 end) over (partition by id order by days) as num_unique_weeks
from (select t.*,
row_number() over (partition by id, weeks order by days) as seqnum
from t
) t
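The same pattern can be checked locally with a SQLite stand-in for the Spark query (the table t and its columns follow the answer; the data is hypothetical):

```python
# SQLite stand-in for the Spark query above: row_number() flags the
# first row of each (id, weeks) pair, and a cumulative sum counts the
# flags.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INT, weeks TEXT, days INT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "w1", 1), (1, "w1", 2), (1, "w2", 3)],
)
rows = con.execute("""
    SELECT t.*,
           SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY days) AS num_unique_weeks
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id, weeks ORDER BY days) AS seqnum
          FROM t) t
""").fetchall()
# num_unique_weeks runs 1, 1, 2 across the three rows.
print([r[-1] for r in rows])
```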