Window Functions: Partition by One Column After Order by Another

Window function based on transition of a column value

For the query to work, the rows must be ordered by some field that preserves the order shown in your result set.

You would look for rows where the current value is 0 and the previous value is 1, and start a new group (grp) at each such row, as below.

Here is a way to do this.

create table t(id int, dest int, emp int);

insert into t 
select 1,893106,0 union all
select 2,717205,1 union all
select 3,888305,0 union all
select 4,312301,1 union all
select 5,645100,0 union all
select 6,222001,0 union all
select 7,761104,1;

commit;

with main_data
as (
select *,case when emp=0 and lag(emp) over(order by id)=1 then
                   1
                   else 0
         end as grp_val
  from t
    )
select *,sum(grp_val) over(order by id) as grp
  from main_data;

+====+========+=====+=========+=====+
| id | dest   | emp | grp_val | grp |
+====+========+=====+=========+=====+
| 1  | 893106 | 0   | 0       | 0   |
+----+--------+-----+---------+-----+
| 2  | 717205 | 1   | 0       | 0   |
+----+--------+-----+---------+-----+
| 3  | 888305 | 0   | 1       | 1   |
+----+--------+-----+---------+-----+
| 4  | 312301 | 1   | 0       | 1   |
+----+--------+-----+---------+-----+
| 5  | 645100 | 0   | 1       | 2   |
+----+--------+-----+---------+-----+
| 6  | 222001 | 0   | 0       | 2   |
+----+--------+-----+---------+-----+
| 7  | 761104 | 1   | 0       | 2   |
+----+--------+-----+---------+-----+

https://sqlize.online/sql/psql14/053971a469e423ef65d97984f9017fbf/
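Once grp exists, it can serve as the PARTITION BY key for a further window function, which is the "partition by one column after ordering by another" pattern. The following is a minimal sketch, not part of the original answer: it assumes Python's bundled sqlite3 module linked against SQLite 3.25 or later (for window function support), and the pos_in_grp column and the grouped CTE are hypothetical names added for illustration.

```python
import sqlite3

# Sample rows copied from the example table t: (id, dest, emp).
ROWS = [
    (1, 893106, 0), (2, 717205, 1), (3, 888305, 0), (4, 312301, 1),
    (5, 645100, 0), (6, 222001, 0), (7, 761104, 1),
]

# Same two steps as the answer (flag transitions, running sum of flags),
# followed by a hypothetical third step that numbers rows inside each grp.
QUERY = """
with main_data as (
    select id, emp,
           case when emp = 0 and lag(emp) over (order by id) = 1
                then 1 else 0 end as grp_val
      from t
),
grouped as (
    select id, emp, sum(grp_val) over (order by id) as grp
      from main_data
)
select id, grp,
       row_number() over (partition by grp order by id) as pos_in_grp
  from grouped
 order by id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table t(id int, dest int, emp int)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()

for row in result:
    print(row)  # (id, grp, pos_in_grp)
```

The numbering restarts at 1 whenever grp changes, because the second window is partitioned by the value that the first window computed.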

The field in ORDER BY affects the result of window functions

For aggregate functions, an ORDER BY in the window definition is generally not required unless you want the aggregation applied row by row in order, as in a running total. Simply removing the ORDER BY clauses will fix the problem.

Put another way, with ORDER BY the window expands row by row as you move through the partition. It starts at the first row, and for each row the aggregate is computed over all rows from the start of the partition up to the current row (which, for the first row, is just that row).

If you remove the ORDER BY, the aggregate is computed over all the rows in the partition, and no ordering is applied to the window.

You can change the ORDER BY in the window definition to see its effect.

Of course, ranking functions need the order and this point is just for the aggregations.

DECLARE @t TABLE
(
    id        varchar(100),
    volume    float,
    prev_date date
);

INSERT INTO @t VALUES
('0318610084', 100, '2019-05-16'),
('0318610084', 200, '2016-06-04');

SELECT
   row_num    = ROW_NUMBER() OVER (PARTITION BY id ORDER BY prev_date),
   rows_count = COUNT(*) OVER (PARTITION BY id),
   vol_total  = SUM(volume) OVER (PARTITION BY  id),
   *
FROM @t;
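To see the difference side by side, here is a small sketch that runs the same two rows through an aggregate with and without ORDER BY in the window. It is an illustration only: it uses Python's sqlite3 module (assuming SQLite 3.25 or later) rather than SQL Server, a regular table t instead of the table variable, and the running_total column is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t(id text, volume real, prev_date text)")
conn.executemany("insert into t values (?, ?, ?)", [
    ("0318610084", 100.0, "2019-05-16"),
    ("0318610084", 200.0, "2016-06-04"),
])

# running_total: ORDER BY in the window, so the sum grows row by row.
# vol_total: no ORDER BY, so every row sees the whole partition.
running_vs_total = conn.execute("""
    select prev_date,
           sum(volume) over (partition by id order by prev_date) as running_total,
           sum(volume) over (partition by id) as vol_total
      from t
     order by prev_date
""").fetchall()
conn.close()

for row in running_vs_total:
    print(row)
```

The earlier date sees only its own volume in running_total, while vol_total is the same for both rows.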

Support for ORDER BY in the window of an aggregate function was added in SQL Server 2012; it was not part of the first release of the feature in 2005.

For a detailed explanation of ORDER BY in windowed aggregates, this is a great help:
Producing a moving average and cumulative total - SqlServer Documentation

Sorting behavior specification in order by for window function in GBQ

The sort direction is specified per column, with ASC as the default, so ASC can be omitted. So yes, you should write DESC for each column, as below:

SELECT partitionDate,
  createdUTC,
  ROW_NUMBER() OVER(PARTITION BY externalid ORDER BY partitionDate DESC, createdUTC DESC NULLS LAST)
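The per-column rule can be checked with a small sketch. This one is an assumption-laden illustration: it runs on SQLite (3.25 or later) through Python's sqlite3 module rather than on BigQuery, the events table and its rows are hypothetical, and NULLS LAST is left out because older SQLite versions do not accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table reusing the column names from the query above.
conn.execute("create table events(externalid text, partitionDate text, createdUTC text)")
conn.executemany("insert into events values (?, ?, ?)", [
    ("a", "2024-01-01", "2024-01-01 08:00"),
    ("a", "2024-01-02", "2024-01-02 09:00"),
    ("a", "2024-01-02", "2024-01-02 10:00"),
])

# rn_mixed: DESC applies to createdUTC only; partitionDate stays ascending.
# rn_desc:  DESC is written for each column.
rows_out = conn.execute("""
    select createdUTC,
           row_number() over (partition by externalid
                              order by partitionDate, createdUTC desc) as rn_mixed,
           row_number() over (partition by externalid
                              order by partitionDate desc, createdUTC desc) as rn_desc
      from events
     order by createdUTC
""").fetchall()
conn.close()

for row in rows_out:
    print(row)
```

Only rn_desc puts the most recent row first; in rn_mixed the oldest partitionDate still gets row number 1.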

PARTITION BY to consider only two specific columns for aggregation?

You need COUNT(DISTINCT ...) as a window function, which unfortunately SQL Server does not support.

But you can simulate it with DENSE_RANK and MAX:

SELECT
    T.REF_NO,
    T.PRD_GRP,
    T.ACC_NO,
    MAX(T.rn) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC
FROM (
    SELECT *,
        DENSE_RANK() OVER (PARTITION BY T.REF_NO ORDER BY T.ACC_NO) AS rn
    FROM [TABLE] T
) T;

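DENSE_RANK numbers the rows in ACC_NO order and gives tied values the same rank without gaps, so the MAX of that rank within each REF_NO is the number of distinct ACC_NO values. Note that NULLs get a rank of their own, whereas COUNT(DISTINCT ...) would ignore them.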
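As a sanity check, the windowed result can be compared with an ordinary GROUP BY COUNT(DISTINCT ...). The sketch below assumes SQLite 3.25 or later via Python's sqlite3 module instead of SQL Server; the acc table and its rows are hypothetical, and there are no NULLs in ACC_NO.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: R1 has two distinct accounts (10 appears twice), R2 has one.
conn.execute("create table acc(ref_no text, prd_grp text, acc_no int)")
conn.executemany("insert into acc values (?, ?, ?)", [
    ("R1", "A", 10), ("R1", "B", 10), ("R1", "A", 20), ("R2", "A", 30),
])

# DENSE_RANK + MAX, as in the answer: one distinct count on every row.
windowed = conn.execute("""
    select ref_no, acc_no,
           max(rn) over (partition by ref_no) as num_of_acc
      from (select acc.*,
                   dense_rank() over (partition by ref_no order by acc_no) as rn
              from acc) as ranked
     order by ref_no, acc_no
""").fetchall()

# Reference result using the non-windowed COUNT(DISTINCT ...).
grouped = conn.execute("""
    select ref_no, count(distinct acc_no)
      from acc
     group by ref_no
     order by ref_no
""").fetchall()
conn.close()

print(windowed)
print(grouped)
```

Every row of a REF_NO carries the same NUM_OF_ACC, and that value matches the grouped count.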


Find two values of one Column based on value of another column per ID

Use FIRST_VALUE() and SUM() window functions:

SELECT DISTINCT EmpID, CodepayID,
       FIRST_VALUE(Lval) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth) LVal1,
       FIRST_VALUE(Lval) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth DESC) LVal2,
       SUM(Lint) OVER (PARTITION BY EmpID, CodepayID) sum_Lint,
       FIRST_VALUE(Lrmn) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth) Lrmn1,
       FIRST_VALUE(Lrmn) OVER (PARTITION BY EmpID, CodepayID ORDER BY YearMonth DESC) Lrmn2
FROM loan

See the demo.
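The query can also be tried on a few hypothetical rows. This sketch assumes SQLite 3.25 or later through Python's sqlite3 module; the loan rows are invented for illustration, and only the column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table loan(EmpID int, CodepayID int, YearMonth int,
                                  Lval int, Lint int, Lrmn int)""")
# Hypothetical rows: one employee, one code, three consecutive months.
conn.executemany("insert into loan values (?, ?, ?, ?, ?, ?)", [
    (1, 10, 202301, 1000, 5, 900),
    (1, 10, 202302, 900, 4, 800),
    (1, 10, 202303, 800, 3, 700),
])

# Ascending order picks the earliest month's value,
# descending order picks the latest month's value.
summary = conn.execute("""
    select distinct EmpID, CodepayID,
           first_value(Lval) over (partition by EmpID, CodepayID order by YearMonth) as LVal1,
           first_value(Lval) over (partition by EmpID, CodepayID order by YearMonth desc) as LVal2,
           sum(Lint) over (partition by EmpID, CodepayID) as sum_Lint,
           first_value(Lrmn) over (partition by EmpID, CodepayID order by YearMonth) as Lrmn1,
           first_value(Lrmn) over (partition by EmpID, CodepayID order by YearMonth desc) as Lrmn2
      from loan
""").fetchall()
conn.close()

print(summary)
```

Because every window value is constant within an (EmpID, CodepayID) pair, DISTINCT collapses the three input rows into one.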

Order preserved in window function without ORDER BY

Use EXPLAIN (VERBOSE, COSTS OFF) to see what happens:

                        QUERY PLAN                         
═══════════════════════════════════════════════════════════
 WindowAgg
   Output: array_agg(x) OVER (?), array_agg(y) OVER (?), z
   ->  Sort
         Output: z, x, y
         Sort Key: xyz.z
         ->  Seq Scan on laurenz.xyz
               Output: z, x, y

There is only a single sort, so we can deduce that the order will be the same.

But that is not guaranteed, so it is possible (albeit unlikely) that the implementation may change.

But you see that a sort is performed anyway. If you add the ORDER BY, all that will do is add another sort key, which won't slow down the execution much. So you might just as well add the ORDER BY and be safe.

How to calculate the RANK from another column than the Window order?

Very interesting problem. You seem to want a cumulative ranking of amount by date.

I cannot readily think of a way of doing this using window functions. Here is a method with an explicit JOIN and GROUP BY:

SELECT d.Product_Id, d.Date, d.Amount,
       SUM(CASE WHEN d2.Amount < d.Amount THEN 1 ELSE 0 END) + 1 as rank
FROM Data d JOIN
     Data d2
     ON d2.Product_Id = d.Product_Id AND
        d2.Date <= d.Date
GROUP BY d.Product_Id, d.Date, d.Amount;

Of course, the performance is not as good as a window-function approach would be.
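To make the behaviour of the self-join concrete, here is a minimal sketch on three hypothetical rows. It assumes Python's sqlite3 module, and it renames the output column to rnk (a hypothetical alias) to stay clear of the RANK function name; the table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Data(Product_Id int, Date text, Amount int)")
# Hypothetical rows for a single product.
conn.executemany("insert into Data values (?, ?, ?)", [
    (1, "2024-01-01", 10),
    (1, "2024-01-02", 30),
    (1, "2024-01-03", 20),
])

# For each row, count the rows of the same product up to that date
# with a smaller amount, then add 1.
cumulative_rank = conn.execute("""
    select d.Date, d.Amount,
           sum(case when d2.Amount < d.Amount then 1 else 0 end) + 1 as rnk
      from Data d
      join Data d2
        on d2.Product_Id = d.Product_Id
       and d2.Date <= d.Date
     group by d.Product_Id, d.Date, d.Amount
     order by d.Date
""").fetchall()
conn.close()

for row in cumulative_rank:
    print(row)
```

The last row gets rank 2 rather than 3 because only the amounts seen up to its own date are compared, and just one of them (10) is smaller than 20.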

One approach that would work in some databases is to accumulate the amounts into a string or array, and then use string/array manipulations to calculate the rank. However, even that might be tricky.