Group by Values That Are in Sequence

How can I group identical values that appear consecutively (in sequence order)?

This is a gaps and islands problem. Here is one way to solve it using lag() and a cumulative sum():

select
    min(num) num,
    count(*) count_num
from (
    select
        t.*,
        sum(case when num = lag_num then 0 else 1 end) over(order by id) grp
    from (
        select
            t.*,
            lag(num) over(order by id) lag_num
        from #temp t
    ) t
) t
group by grp

Demo on DB Fiddle:


num | count_num
--: | --------:
  1 |         3
  2 |         1
  1 |         1
  2 |         2
  3 |         3

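The sample table behind that demo isn't shown here; hypothetical data along these lines (assuming an identity id column and an int num column) would reproduce the output above:

create table #temp (id int identity(1,1), num int)

-- made-up values: three 1s, a 2, a lone 1, two 2s, then three 3s
insert into #temp(num) values (1), (1), (1), (2), (1), (2), (2), (3), (3), (3)
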
SUM Values by Sequence Number and Group By Flag

This is a classic gaps-and-islands problem.

However, in this case the start of each island is clearly delineated by a P (or a row that is not C). So we don't need LAG for that.

We just need to assign a grouping ID for each island, which we can do using a windowed conditional COUNT. Then we simply group by that ID.

SELECT
    pv.SalesOrder,
    SalesOrderLine = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.SalesOrderLine END),
    MStockCode = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MStockCode END),
    MPrice = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MPrice END),
    MBomFlag = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MBomFlag END)
FROM (
    SELECT *,
        GroupingId = COUNT(NULLIF(t.MBomFlag, 'C')) OVER
            (PARTITION BY t.SalesOrder ORDER BY t.SalesOrderLine ROWS UNBOUNDED PRECEDING)
    FROM @tbl t
) pv
GROUP BY
    pv.SalesOrder,
    pv.GroupingId;

Note that NULLIF(t.MBomFlag, 'C') returns NULL if the flag is 'C', so COUNT will only count the other rows. You could also write that explicitly as COUNT(CASE WHEN t.MBomFlag = 'P' THEN 1 END).

db<>fiddle
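
To see how the GroupingId is assigned, here is a minimal, made-up @tbl (the column names come from the query above; the data is hypothetical):

declare @tbl table (SalesOrder int, SalesOrderLine int, MStockCode varchar(20), MPrice decimal(10,2), MBomFlag char(1));

insert into @tbl values
    (1, 1, 'PARENT-A', 100.00, 'P'),  -- non-'C' row: running count becomes 1
    (1, 2, 'CHILD-A1',   0.00, 'C'),  -- NULLIF(...) is NULL, count stays 1
    (1, 3, 'CHILD-A2',   0.00, 'C'),  -- count stays 1
    (1, 4, 'PARENT-B',  50.00, 'P');  -- next island: count becomes 2

-- Lines 1-3 get GroupingId = 1 and line 4 gets GroupingId = 2,
-- so each parent row and its 'C' components collapse into a single group.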

Group by numbers that are in sequence

create table #temp
(
IDUnique int Identity(1,1),
ID int,
grp int
)

Insert into #temp(ID) Values(1)
Insert into #temp(ID) Values(36)
Insert into #temp(ID) Values(37)
Insert into #temp(ID) Values(38)
Insert into #temp(ID) Values(50)
Insert into #temp(ID) Values(51)

declare @IDUnique int
declare @PreviousUnique int
declare @ID int
declare @grp int
declare @Previous int
declare @Row int

DECLARE @getAccountID CURSOR
SET @getAccountID = CURSOR FOR
    SELECT Row_Number() Over(Order by IDUnique) Row, IDUnique, ID From #temp
OPEN @getAccountID
FETCH NEXT FROM @getAccountID INTO @Row, @IDUnique, @ID
WHILE @@FETCH_STATUS = 0
BEGIN
    IF(@Row = 1)
    Begin
        update #temp set grp = 1 Where IDUnique = @IDUnique
        set @Previous = @ID
        set @grp = 1
    End
    Else If (@Previous + 1 = @ID)
    Begin
        update #temp set grp = @grp Where IDUnique = @IDUnique
        set @Previous = @ID
    End
    Else
    Begin
        set @Previous = @ID
        set @grp = @grp + 1
        update #temp set grp = @grp Where IDUnique = @IDUnique
    End
    FETCH NEXT FROM @getAccountID INTO @Row, @IDUnique, @ID
END
CLOSE @getAccountID
DEALLOCATE @getAccountID

Select * from #temp
Drop Table #temp
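
For comparison, the same grp values can be computed without a cursor, using the ID - ROW_NUMBER() gaps-and-islands trick that also appears further down this page. A set-based sketch against the same #temp table (run it before the DROP TABLE; assumes SQL Server 2005 or later):

;with d as (
    -- consecutive IDs share the same (ID - row_number) value
    select IDUnique, ID, ID - row_number() over (order by ID) as island
    from #temp
), g as (
    -- number the islands 1, 2, 3, ...
    select IDUnique, dense_rank() over (order by island) as new_grp
    from d
)
update t
set t.grp = g.new_grp
from #temp t
join g on g.IDUnique = t.IDUnique;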

Group by sequence of True

Identify the groups with cumsum, then filter the rows having True values and use factorize to assign an ordinal number to each unique group:

m = df['val_a']
df.loc[m, 'tx'] = (~m).cumsum()[m].factorize()[0]

Alternatively, you can use groupby + ngroup:

m = df['val_a']
df['tx'] = m[m].groupby((~m).cumsum()).ngroup()


    val_a   tx
0    True  0.0
1    True  0.0
2   False  NaN
3   False  NaN
4   False  NaN
5    True  1.0
6   False  NaN
7   False  NaN
8    True  2.0
9    True  2.0
10   True  2.0
11   True  2.0
12  False  NaN
13   True  3.0
14   True  3.0
GROUP BY and aggregate sequential numeric values

There's much value to @a_horse_with_no_name's answer, both as a correct solution and, like I already said in a comment, as good material for learning how to use different kinds of window functions in PostgreSQL.

And yet I cannot help feeling that the approach taken in that answer is a bit more effort than this problem calls for. Basically, what you need is an additional criterion for grouping before you go on to aggregate years into arrays. You've already got company and profession; now you only need something to distinguish years that belong to different sequences.

That is just what the above-mentioned answer provides, and that is precisely what I think can be done in a simpler way. Here's how:

WITH MarkedForGrouping AS (
    SELECT
        company,
        profession,
        year,
        year - ROW_NUMBER() OVER (
            PARTITION BY company, profession
            ORDER BY year
        ) AS seqID
    FROM atable
)
SELECT
    company,
    profession,
    array_agg(year) AS years
FROM MarkedForGrouping
GROUP BY
    company,
    profession,
    seqID;

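To see why this works, here is a hypothetical company/profession with two runs of years: within a run, both year and ROW_NUMBER() increase by 1, so their difference (seqID) stays constant and only changes when a gap appears.

SELECT company, profession, year,
       year - ROW_NUMBER() OVER (PARTITION BY company, profession ORDER BY year) AS seqID
FROM (VALUES
    ('Acme', 'welder', 2001),
    ('Acme', 'welder', 2002),
    ('Acme', 'welder', 2003),
    ('Acme', 'welder', 2007),
    ('Acme', 'welder', 2008)
) AS atable (company, profession, year);

-- seqID is 2000 for the first three rows and 2003 for the last two,
-- so the outer GROUP BY produces the arrays {2001,2002,2003} and {2007,2008}.
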
Group by sequential data in R

Here is one dplyr option -

library(dplyr)

df %>%
  group_by(gene_name) %>%
  mutate(grp = gene_number - lag(gene_number, default = 0) > 2) %>%
  group_by(grp = cumsum(grp)) %>%
  filter(n() > 1) %>%
  ungroup

#  gene_name          gene_number   grp
#  <chr>                    <int> <int>
#1 ENSMUSG00000000001        4732     1
#2 ENSMUSG00000000001        4733     1
#3 ENSMUSG00000000058        7603     2
#4 ENSMUSG00000000058        7604     2
#5 ENSMUSG00000000058        8246     3
#6 ENSMUSG00000000058        8248     3

For each gene_name, subtract the previous gene_number value from the current one and increment the group count when the difference is greater than 2. Rows whose group contains only a single row are then dropped.

data

df <- structure(list(gene_name = c("ENSMUSG00000000001", "ENSMUSG00000000001", 
"ENSMUSG00000000058", "ENSMUSG00000000058", "ENSMUSG00000000058",
"ENSMUSG00000000058", "ENSMUSG00000000058"), gene_number = c(4732L,
4733L, 7603L, 7604L, 8246L, 8248L, 9001L)),
class = "data.frame", row.names = c(NA, -7L))

SELECT / GROUP BY - sequence of a value

A bit of a complex version due to the lack of analytic functions in MySQL; I'm assuming it's the SUM of val1 you need:

SELECT SUM(val1) val1, MAX(val2) val2
FROM (
    SELECT s1.val1, COUNT(s3.val1) grouping, s1.val2 val2
    FROM sequence s1
    LEFT JOIN sequence s2 ON s1.val1 >= s2.val1
    LEFT JOIN sequence s3 ON s3.val1 = s2.val1 - 1 AND s2.val2 <> s3.val2
    GROUP BY s1.val1
) a
GROUP BY grouping
ORDER BY grouping

An SQLfiddle to test with.

If you add grouping to the selection list to make the order obvious, you get:

val1    val2    grouping
  15       1           0
  40       2           1
  65       1           2

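If you are on MySQL 8.0 or later, window functions are available, and the self-joins can be replaced by the LAG-plus-running-SUM island technique shown at the top of this page. A sketch against the same sequence table (untested; it assumes val1 defines the row order, as in the query above):

SELECT SUM(val1) AS val1, MAX(val2) AS val2
FROM (
    SELECT val1, val2,
           -- start a new group whenever val2 differs from the previous row's val2
           SUM(CASE WHEN val2 = lag_val2 THEN 0 ELSE 1 END) OVER (ORDER BY val1) AS grp
    FROM (
        SELECT val1, val2, LAG(val2) OVER (ORDER BY val1) AS lag_val2
        FROM sequence
    ) s
) g
GROUP BY grp
ORDER BY grp;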
