Group by Every N Records in T-SQL

Group by every N records in T-SQL

WITH T AS (
    SELECT RANK() OVER (ORDER BY ID) AS Rank,
           P.Field1, P.Field2, P.Value1, ...
    FROM P
)
SELECT (Rank - 1) / 1000 AS GroupID, AVG(...)
FROM T
GROUP BY (Rank - 1) / 1000;

Something like that should get you started. If you can provide your actual schema I can update as appropriate.
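
As a concrete sketch of the same idea, assuming a hypothetical table P with a unique ID column and a numeric Value1 column, and buckets of 1000 rows:

WITH T AS (
    SELECT RANK() OVER (ORDER BY ID) AS Rank,
           P.Value1
    FROM P
)
SELECT (Rank - 1) / 1000 AS GroupID,
       AVG(Value1) AS AvgValue1   -- cast Value1 if it is an integer and you want a fractional average
FROM T
GROUP BY (Rank - 1) / 1000;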

Group by every N rows within a group

You can mark where each group starts using row_number() arithmetic, then take a cumulative sum of those start markers. With pairs, a new group starts on the odd-numbered rows of each batch:

select t.*,
       sum(is_start) over (order by batch, rowid) as groupid
from (select t.*,
             (case when row_number() over (partition by batch order by rowid) % 2 = 1
                   then 1 else 0
              end) as is_start
      from t
     ) t
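
To generalize this to groups of N within each batch rather than pairs, only the modulus test changes; a sketch for N = 3, using the same assumed columns (batch, rowid):

select t.*,
       sum(is_start) over (order by batch, rowid) as groupid
from (select t.*,
             -- rows 1, 4, 7, ... of each batch start a new group of three
             (case when (row_number() over (partition by batch order by rowid) - 1) % 3 = 0
                   then 1 else 0
              end) as is_start
      from t
     ) t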

Assigning new group for every n records in each category

OK, found my solution with the help of this:

;WITH FirstRank AS
(
    SELECT DENSE_RANK() OVER (ORDER BY Subj, Gender) AS rnk, *
    FROM @students
),
SecondRank AS
(
    SELECT (ROW_NUMBER() OVER (PARTITION BY rnk ORDER BY Gender) - 1) / 2 AS rn, *
    FROM FirstRank
)
SELECT DENSE_RANK() OVER (ORDER BY rnk, rn) AS GrpNo, Subj, Gender, StdId
FROM SecondRank
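
If you want to run the query above stand-alone, a hypothetical @students table variable along these lines works (column names taken from the query; types are assumptions):

DECLARE @students TABLE (StdId INT, Subj VARCHAR(10), Gender CHAR(1));

INSERT INTO @students (StdId, Subj, Gender)
VALUES (1, 'Math', 'F'), (2, 'Math', 'F'), (3, 'Math', 'F'),
       (4, 'Math', 'M'), (5, 'Math', 'M'),
       (6, 'Sci',  'F'), (7, 'Sci',  'M'), (8, 'Sci',  'M');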

SQL Server : create group of N rows each and give group number for each group

Use row_number() to enumerate the rows, arithmetic to assign the group and then take the minimum of the id:

SELECT t.*, MIN(ID) OVER (PARTITION BY grp) AS groupnumber
FROM (SELECT t.*,
             (ROW_NUMBER() OVER (ORDER BY ID) - 1) / 3 AS grp
      FROM [Table] t
     ) t
ORDER BY ID ASC;

It is possible to do this without a subquery, but the logic is rather messy:

select t.*,
       (case when row_number() over (order by id) % 3 = 0
             then lag(id, 2) over (order by id)
             when row_number() over (order by id) % 3 = 2
             then lag(id, 1) over (order by id)
             else id
        end) as groupnumber
from [Table] t
order by id;
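
As a quick check of the CASE logic, assuming ids 1 through 7 exist in the table, each block of three rows is labelled with its first id:

-- id          : 1  2  3  4  5  6  7
-- groupnumber : 1  1  1  4  4  4  7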

Selecting the first N rows of each group ordered by date

As well as the row_number() solution, another option is CROSS APPLY (SELECT TOP ...):

SELECT m.masterid,
       d.detailid,
       m.numbers,
       d.date_time,
       d.value
FROM masters AS m
CROSS APPLY (
    SELECT TOP (3) *
    FROM details AS d
    WHERE d.date_time >= '2020-01-01'
      AND m.masterid = d.masterid
    ORDER BY d.date_time
) AS d
WHERE m.tags LIKE '%Tag2%'
ORDER BY m.masterid DESC,
         d.date_time;

This may be faster or slower than row_number, mostly depending on cardinalities (quantity of rows) and indexing.

If indexing is good and it's a small number of rows, it will usually be faster. If the inner table needs sorting, or you are selecting most of the rows anyway, then use row_number().
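
If you go the CROSS APPLY route, the index that usually makes it fast is one that lets each per-master probe seek on masterid and read the TOP rows already ordered by date_time. A hypothetical definition, using only the columns referenced in the query above:

CREATE INDEX IX_details_masterid_date_time
    ON details (masterid, date_time)
    INCLUDE (detailid, value);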

grouping every N values

The first approach that comes to mind is to use row_number() to annotate the table, then group by blocks of 16 rows.

SELECT min(id) AS first_id, max(id) AS last_id, avg(rainfall) AS avg_this_16
FROM (
    SELECT id, rainfall, row_number() OVER (ORDER BY id) AS n
    FROM the_table
) x(id, rainfall, n)
GROUP BY (n - 1) / 16
ORDER BY (n - 1) / 16;

Note that this won't necessarily include 16 samples for the last group.

Alternately you can calculate a running average by using avg() as a window function:

SELECT id, avg(rainfall) OVER (ORDER BY id ROWS 15 PRECEDING)
FROM the_table;

... possibly annotating that with the row number and selecting the ones you want:

SELECT id AS greatest_id_in_group, avg_last_16_inclusive
FROM (
    SELECT id,
           avg(rainfall) OVER (ORDER BY id ROWS 15 PRECEDING) AS avg_last_16_inclusive,
           row_number() OVER (ORDER BY id) AS n
    FROM the_table
) x
WHERE n % 16 = 0;

This will disregard the last n<16 samples, not returning a row for them.

Note that I'm assuming the IDs aren't guaranteed to be contiguous. If they are gap-less, you can just group by id/16 and avoid the window function.
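
A sketch of that gap-free shortcut, assuming id starts at 1 and has no gaps:

SELECT min(id) AS first_id, max(id) AS last_id, avg(rainfall) AS avg_this_16
FROM the_table
GROUP BY (id - 1) / 16
ORDER BY (id - 1) / 16;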

Get top 1 row of each group

;WITH cte AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
    FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1

If you expect two entries per day, this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead:
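
That is the same query with the ranking function swapped; rows that tie on DateCreated within a DocumentID all come back with rn = 1:

;WITH cte AS
(
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
    FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1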

As for normalised or not, it depends if you want to:

  • maintain status in 2 places
  • preserve status history
  • ...

As it stands, you preserve status history. If you want the latest status in the parent table too (which is denormalisation), you'd need a trigger to maintain "status" in the parent, or drop this status history table.
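
A minimal sketch of such a trigger, assuming a hypothetical parent table Documents(DocumentID, Status) and Status/DateCreated columns on DocumentStatusLogs (adjust names to your schema):

CREATE TRIGGER trg_DocumentStatusLogs_SyncStatus
ON DocumentStatusLogs
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- push the newest logged status per document into the parent row
    UPDATE d
    SET d.Status = x.Status
    FROM Documents d
    JOIN (SELECT i.DocumentID, i.Status,
                 ROW_NUMBER() OVER (PARTITION BY i.DocumentID ORDER BY i.DateCreated DESC) AS rn
          FROM inserted i) x
      ON x.DocumentID = d.DocumentID
     AND x.rn = 1;
END;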

GROUP BY every n records (Keeping first record's ID)

Below is for BigQuery Standard SQL (and actually works):

#standardSQL
WITH skus AS (
    SELECT sku, MIN(dt) AS start_date
    FROM `project.dataset.table`
    GROUP BY sku
)
SELECT
    MIN(dt) begin_date,
    sku,
    SUM(sales) sales,
    DIV(DATE_DIFF(dt, start_date, DAY) + 7, 7) week
FROM `project.dataset.table` t
JOIN skus s USING(sku)
GROUP BY sku, week

You can test and play with the above using the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2019-02-13' dt, '123' sku, 10 sales UNION ALL
SELECT '2019-02-14', '123', 10 UNION ALL
SELECT '2019-02-15', '123', 10 UNION ALL
SELECT '2019-02-16', '123', 10 UNION ALL
SELECT '2019-02-17', '123', 10 UNION ALL
SELECT '2019-02-18', '123', 10 UNION ALL
SELECT '2019-02-19', '123', 10 UNION ALL
SELECT '2019-02-20', '123', 10 UNION ALL
SELECT '2019-02-21', '456', 10 UNION ALL
SELECT '2019-02-22', '456', 10
), skus AS (
SELECT sku, MIN(dt) AS start_date
FROM `project.dataset.table`
GROUP BY sku
)
SELECT
MIN(dt) begin_date,
sku,
SUM(sales) sales,
DIV(DATE_DIFF(dt, start_date, DAY) + 7, 7) week
FROM `project.dataset.table` t
JOIN skus s USING(sku)
GROUP BY sku, week
-- ORDER BY sku, week

with result:

Row  begin_date   sku  sales  week
1    2019-02-13   123  70     1
2    2019-02-20   123  10     2
3    2019-02-21   456  20     1
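
The week number comes from DIV(DATE_DIFF(dt, start_date, DAY) + 7, 7); walking it through for sku 123, whose start_date is 2019-02-13 in the sample data:

-- dt = 2019-02-13: DIV(0 + 7, 7) = 1  -> week 1
-- dt = 2019-02-19: DIV(6 + 7, 7) = 1  -> week 1
-- dt = 2019-02-20: DIV(7 + 7, 7) = 2  -> week 2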

T-SQL update top n rows for each group with n variable for each group (cross apply alternative)

You could try using a simple JOIN instead of a correlated subquery:

WITH cte AS
(
    SELECT d.id, d.col, d.dest, s.source
    FROM (SELECT *,
                 rn = ROW_NUMBER() OVER (PARTITION BY col ORDER BY id)
          FROM #desttable) d
    JOIN #sourcetable s
      ON d.col = s.col
     AND d.rn <= s.rownum
)
UPDATE cte
SET dest = source;

SELECT *
FROM #desttable;
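
To try it stand-alone, a hypothetical shape for the two temp tables (column names taken from the query; rownum in #sourcetable is how many rows of that col should be updated):

CREATE TABLE #desttable (id INT, col INT, dest VARCHAR(50));
CREATE TABLE #sourcetable (col INT, source VARCHAR(50), rownum INT);

INSERT INTO #desttable (id, col, dest)
VALUES (1, 1, NULL), (2, 1, NULL), (3, 1, NULL),
       (4, 2, NULL), (5, 2, NULL);

INSERT INTO #sourcetable (col, source, rownum)
VALUES (1, 'A', 2), (2, 'B', 1);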

You should post your real data sample, data structures and query plans. Otherwise we can only guess how to improve it.


