Group by every N records in T-SQL
WITH T AS (
SELECT RANK() OVER (ORDER BY ID) Rank,
P.Field1, P.Field2, P.Value1, ...
FROM P
)
SELECT (Rank - 1) / 1000 GroupID, AVG(...)
FROM T
GROUP BY ((Rank - 1) / 1000)
;
Something like that should get you started. If you can provide your actual schema I can update as appropriate.
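To sanity-check the `(Rank - 1) / N` bucketing, here is a quick SQLite sketch (window functions need SQLite 3.25+). The table name `p`, the column `value1`, and a group size of 3 instead of 1000 are illustrative stand-ins for the real schema:

```python
import sqlite3

# Bucket rows into groups of 3 via integer division on a rank.
# Table/column names are made up; 3 stands in for 1000.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE p (id INTEGER PRIMARY KEY, value1 REAL)")
con.executemany("INSERT INTO p VALUES (?, ?)",
                [(i, i * 10.0) for i in range(1, 8)])

rows = con.execute("""
    WITH t AS (
        SELECT RANK() OVER (ORDER BY id) AS rnk, value1
        FROM p
    )
    SELECT (rnk - 1) / 3 AS group_id, AVG(value1) AS avg_value
    FROM t
    GROUP BY (rnk - 1) / 3
    ORDER BY group_id
""").fetchall()
print(rows)  # ids 1-3 -> group 0, ids 4-6 -> group 1, id 7 -> group 2
```

Note the `- 1`: without it the first bucket would hold only N-1 rows.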
Group by every N rows within a group
Hmmm . . . you can mark where a group starts using row_number() arithmetic, then take a cumulative sum of those markers. A new group starts on the odd-numbered rows of each batch:
select t.*,
sum(is_start) over (order by batch, rowid) as groupid
from (select t.*,
(case when row_number() over (partition by batch order by rowid) % 2 = 1
then 1 else 0
end) as is_start
from t
) t
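The same is_start / cumulative-sum trick can be sketched in SQLite (3.25+ for window functions). The table `t`, its `batch` column, and the `rowid_` column (renamed to avoid SQLite's implicit `rowid`) are invented for the demo:

```python
import sqlite3

# Mark the 1st row of every pair within a batch, then running-sum
# the markers to get a group id. All names/data are illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (batch TEXT, rowid_ INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2)])

rows = con.execute("""
    SELECT batch, rowid_,
           SUM(is_start) OVER (ORDER BY batch, rowid_) AS groupid
    FROM (SELECT t.*,
                 CASE WHEN ROW_NUMBER() OVER (PARTITION BY batch
                                              ORDER BY rowid_) % 2 = 1
                      THEN 1 ELSE 0 END AS is_start
          FROM t) t
    ORDER BY batch, rowid_
""").fetchall()
print(rows)  # pairs (a1,a2), then a3 alone, then (b1,b2)
```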
Assigning new group for every n records in each category
OK, found my solution with the help of this:
;with FirstRank as
(select DENSE_RANK() over (order by Subj, Gender) as rnk, * from @students),
SecondRank AS
(select (ROW_NUMBER() OVER (PARTITION BY rnk ORDER by gender)-1)
/ 2 as rn,*
from FirstRank
)
select DENSE_RANK() OVER (ORDER BY rnk,rn) as GrpNo, Subj, Gender, StdId from SecondRank
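A runnable SQLite version of the same two-step ranking, with `@students` turned into an ordinary table and the data invented. One hedge: I order the inner row_number() by stdid as well, purely so the result is deterministic:

```python
import sqlite3

# DENSE_RANK makes a key per (subj, gender) category; ROW_NUMBER
# arithmetic then makes pairs of 2 inside each category.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (stdid INTEGER, subj TEXT, gender TEXT)")
con.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    (1, "Math", "F"), (2, "Math", "F"), (3, "Math", "F"),
    (4, "Math", "M"), (5, "Sci", "F"),
])

rows = con.execute("""
    WITH firstrank AS (
        SELECT DENSE_RANK() OVER (ORDER BY subj, gender) AS rnk, *
        FROM students
    ), secondrank AS (
        SELECT (ROW_NUMBER() OVER (PARTITION BY rnk
                                   ORDER BY gender, stdid) - 1) / 2 AS rn, *
        FROM firstrank
    )
    SELECT DENSE_RANK() OVER (ORDER BY rnk, rn) AS grpno, subj, gender, stdid
    FROM secondrank
    ORDER BY grpno, stdid
""").fetchall()
print(rows)  # 3 Math/F rows split into groups 1 and 2
```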
SQL Server : create group of N rows each and give group number for each group
Use row_number() to enumerate the rows, arithmetic to assign the group, and then take the minimum of the id:
SELECT t.*, MIN(ID) OVER (PARTITION BY grp) as groupnumber
FROM (SELECT t.*,
( (ROW_NUMBER() OVER (ORDER BY ID) - 1) / 3) as grp
FROM Table
) t
ORDER BY ID ASC;
It is possible to do this without a subquery, but the logic is rather messy:
select t.*,
(case when row_number() over (order by id) % 3 = 0
then lag(id, 2) over (order by id)
when row_number() over (order by id) % 3 = 2
then lag(id, 1) over (order by id)
else id
end) as groupnumber
from table t
order by id;
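The subquery version is easy to check in SQLite (3.25+); the table name `items` is made up, and the gaps in the ids show that the group number is the smallest id of the block, not a simple id/3:

```python
import sqlite3

# Groups of 3 by row number; group number = MIN(id) of each block.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO items VALUES (?)",
                [(i,) for i in (1, 2, 3, 5, 8, 9, 11)])

rows = con.execute("""
    SELECT id, MIN(id) OVER (PARTITION BY grp) AS groupnumber
    FROM (SELECT id, (ROW_NUMBER() OVER (ORDER BY id) - 1) / 3 AS grp
          FROM items) t
    ORDER BY id
""").fetchall()
print(rows)  # blocks {1,2,3}, {5,8,9}, {11} -> group numbers 1, 5, 11
```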
Selecting the first N rows of each group ordered by date
As well as the row_number() solution, another option is CROSS APPLY (SELECT TOP ...):
SELECT m.masterid,
d.detailid,
m.numbers,
d.date_time,
d.value
FROM masters AS m
CROSS APPLY (
SELECT TOP (3) *
FROM details AS d
WHERE d.date_time >= '2020-01-01'
AND m.masterid = d.masterid
ORDER BY d.date_time
) AS d
WHERE m.tags LIKE '%Tag2%'
ORDER BY m.masterid DESC,
d.date_time;
This may be faster or slower than row_number, mostly depending on cardinalities (row counts) and indexing. If indexing is good and each group is small, it will usually be faster. If the inner table needs sorting, or you are selecting most rows anyway, then use row_number.
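SQLite has no CROSS APPLY, but the same top-3-per-master shape can be sketched with a correlated LIMIT subquery. Names follow the example above; all data here is invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE masters (masterid INTEGER PRIMARY KEY, tags TEXT);
    CREATE TABLE details (detailid INTEGER PRIMARY KEY, masterid INTEGER,
                          date_time TEXT, value REAL);
""")
con.executemany("INSERT INTO masters VALUES (?, ?)",
                [(1, "Tag1,Tag2"), (2, "Tag3")])
con.executemany("INSERT INTO details VALUES (?, ?, ?, ?)",
                [(i, 1, f"2020-01-{i:02d}", i * 1.0) for i in range(1, 6)])

# Keep a detail row only if it is among the first 3 (by date_time)
# for its master -- the correlated stand-in for CROSS APPLY ... TOP (3).
rows = con.execute("""
    SELECT m.masterid, d.detailid, d.date_time
    FROM masters m
    JOIN details d ON d.masterid = m.masterid
    WHERE m.tags LIKE '%Tag2%'
      AND d.detailid IN (SELECT d2.detailid
                         FROM details d2
                         WHERE d2.masterid = m.masterid
                           AND d2.date_time >= '2020-01-01'
                         ORDER BY d2.date_time
                         LIMIT 3)
    ORDER BY d.date_time
""").fetchall()
print(rows)  # only the 3 earliest of master 1's five detail rows
```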
grouping every N values
The first approach that comes to mind is to use row_number() to annotate the table, then group by blocks of 16 rows.
SELECT min(id) as first_id, max(id) AS last_id, avg(rainfall) AS avg_this_16
FROM (
SELECT id, rainfall, row_number() OVER (order by id) AS n
FROM the_table
) x(id,rainfall,n)
GROUP BY (n-1)/16
ORDER BY (n-1)/16;
Note that this won't necessarily include 16 samples for the last group.
Alternately you can calculate a running average by using avg()
as a window function:
SELECT id, avg(rainfall) OVER (ORDER BY id ROWS 15 PRECEDING)
FROM the_table;
... possibly annotating that with the row number and selecting the ones you want:
SELECT id AS greatest_id_in_group, avg_last_16_inclusive FROM (
SELECT
id,
avg(rainfall) OVER (ORDER BY id ROWS 15 PRECEDING) AS avg_last_16_inclusive,
row_number() OVER (ORDER BY id) AS n
FROM the_table
) x WHERE n % 16 = 0;
This will disregard the last n<16 samples, not returning a row for them.
Note that I'm assuming the IDs aren't guaranteed to be contiguous. If they are gap-less, you can just group by id/16
and avoid the window function.
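The windowed-average variant is easy to verify in SQLite, shrunk here from 16 samples to 4 so the data fits on screen; the table and column names are made up:

```python
import sqlite3

# Rolling 4-sample average, then keep every 4th row --
# the small-scale analogue of ROWS 15 PRECEDING with n % 16 = 0.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rain (id INTEGER PRIMARY KEY, rainfall REAL)")
con.executemany("INSERT INTO rain VALUES (?, ?)",
                [(i, float(i)) for i in range(1, 9)])

rows = con.execute("""
    SELECT id AS greatest_id_in_group, avg_last_4_inclusive
    FROM (SELECT id,
                 AVG(rainfall) OVER (ORDER BY id ROWS 3 PRECEDING)
                     AS avg_last_4_inclusive,
                 ROW_NUMBER() OVER (ORDER BY id) AS n
          FROM rain) x
    WHERE n % 4 = 0
""").fetchall()
print(rows)  # id 4 averages 1..4, id 8 averages 5..8
```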
Get top 1 row of each group
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1
If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead.
As for normalised or not, it depends if you want to:
- maintain status in 2 places
- preserve status history
- ...
As it stands, you preserve status history. If you want the latest status in the parent table too (which is denormalisation), you'd need a trigger to maintain "status" in the parent, or drop this status history table.
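The rn = 1 pattern runs as-is in SQLite (3.25+). Table and column names follow the answer; the rows are invented:

```python
import sqlite3

# Latest status per document: ROW_NUMBER per DocumentID,
# newest DateCreated first, keep rn = 1.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE DocumentStatusLogs
               (id INTEGER PRIMARY KEY, DocumentID INTEGER,
                Status TEXT, DateCreated TEXT)""")
con.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", [
    (1, 1, 'S1', '2021-01-01'), (2, 1, 'S2', '2021-01-02'),
    (3, 2, 'S1', '2021-01-01'),
])

rows = con.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY DocumentID
                                     ORDER BY DateCreated DESC) AS rn
        FROM DocumentStatusLogs
    )
    SELECT DocumentID, Status FROM cte WHERE rn = 1
    ORDER BY DocumentID
""").fetchall()
print(rows)  # latest status per document
```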
GROUP BY every n records (Keeping first record's ID)
Below is for BigQuery Standard SQL (and actually works ... )
#standardSQL
WITH skus AS (
SELECT sku, MIN(dt) AS start_date
FROM `project.dataset.table`
GROUP BY sku
)
SELECT
MIN(dt) begin_date,
sku,
SUM(sales) sales,
DIV(DATE_DIFF(dt, start_date, DAY) + 7, 7) week
FROM `project.dataset.table` t
JOIN skus s USING(sku)
GROUP BY sku, week
You can test, play with above using sample data from your question as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2019-02-13' dt, '123' sku, 10 sales UNION ALL
SELECT '2019-02-14', '123', 10 UNION ALL
SELECT '2019-02-15', '123', 10 UNION ALL
SELECT '2019-02-16', '123', 10 UNION ALL
SELECT '2019-02-17', '123', 10 UNION ALL
SELECT '2019-02-18', '123', 10 UNION ALL
SELECT '2019-02-19', '123', 10 UNION ALL
SELECT '2019-02-20', '123', 10 UNION ALL
SELECT '2019-02-21', '456', 10 UNION ALL
SELECT '2019-02-22', '456', 10
), skus AS (
SELECT sku, MIN(dt) AS start_date
FROM `project.dataset.table`
GROUP BY sku
)
SELECT
MIN(dt) begin_date,
sku,
SUM(sales) sales,
DIV(DATE_DIFF(dt, start_date, DAY) + 7, 7) week
FROM `project.dataset.table` t
JOIN skus s USING(sku)
GROUP BY sku, week
-- ORDER BY sku, week
with this result:
Row begin_date sku sales week
1 2019-02-13 123 70 1
2 2019-02-20 123 10 2
3 2019-02-21 456 20 1
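The same query transcribed to SQLite for a quick check: `DATE_DIFF(... DAY)` becomes a `julianday()` difference, and `DIV(x + 7, 7)` becomes SQLite's integer division. The table name `sales` is a stand-in for `project.dataset.table`:

```python
import sqlite3

# Week buckets counted from each sku's first sale date.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (dt TEXT, sku TEXT, sales INTEGER)")
data = [("2019-02-%02d" % d, "123", 10) for d in range(13, 21)] + \
       [("2019-02-21", "456", 10), ("2019-02-22", "456", 10)]
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", data)

rows = con.execute("""
    WITH skus AS (SELECT sku, MIN(dt) AS start_date FROM sales GROUP BY sku)
    SELECT MIN(dt) AS begin_date, sku, SUM(sales) AS sales,
           (CAST(julianday(dt) - julianday(start_date) AS INTEGER) + 7) / 7
               AS week
    FROM sales JOIN skus USING (sku)
    GROUP BY sku, week
    ORDER BY sku, week
""").fetchall()
print(rows)  # matches the result table above
```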
T-SQL update top n rows for each group with n variable for each group (cross apply alternative)
You could try to use a simple JOIN instead of a correlated subquery:
WITH cte AS
(
SELECT d.id, d.col, d.dest, s.source
FROM (SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY col ORDER BY id) FROM #desttable) d
JOIN #sourcetable s
ON d.col = s.col
AND d.rn <= s.rownum
)
UPDATE cte
SET dest = source;
SELECT *
FROM #desttable;
You should post your real data sample, data structures and query plans. Otherwise we could only guess how to improve it.
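SQLite has no updatable CTE, so the same "update the first rownum rows of each col group" logic can be sketched by moving the window calculation into an IN subquery. Table and column names mirror the answer; the data is invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dest (id INTEGER PRIMARY KEY, col TEXT, val TEXT);
    CREATE TABLE src  (col TEXT, source TEXT, rownum INTEGER);
    INSERT INTO dest VALUES (1,'a','old'),(2,'a','old'),(3,'a','old'),(4,'b','old');
    INSERT INTO src  VALUES ('a','new',2),('b','new',1);
""")

# Update only the first s.rownum rows (by id) of each col group.
con.execute("""
    UPDATE dest
    SET val = (SELECT s.source FROM src s WHERE s.col = dest.col)
    WHERE id IN (SELECT d.id
                 FROM (SELECT id, col,
                              ROW_NUMBER() OVER (PARTITION BY col
                                                 ORDER BY id) AS rn
                       FROM dest) d
                 JOIN src s ON s.col = d.col AND d.rn <= s.rownum)
""")
rows = con.execute("SELECT id, val FROM dest ORDER BY id").fetchall()
print(rows)  # ids 1,2 updated (rownum 2 for 'a'), id 4 updated, id 3 untouched
```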