How to group the same values which is in sequence order
This is a gaps and islands problem. Here is one way to solve it using lag()
and a cumulative sum()
:
select
min(num) num,
count(*) count_num
from (
select
t.*,
sum(case when num = lag_num then 0 else 1 end) over(order by id) grp
from (
select
t.*,
lag(num) over(order by id) lag_num
from #temp t
) t
) t
group by grp
Demo on DB Fiddlde:
num | count_num
--: | --------:
1 | 3
2 | 1
1 | 1
2 | 2
3 | 3
SUM Values by Sequence Number and Group By Flag
This is a classic gaps-and-islands problem.
However, in this case the start of each island is clearly delineated by a P
(or a row that is not C
). So we don't need LAG
for that.
We just need to assign a grouping ID for each island, which we can do using a windowed conditional COUNT
. Then we simply group by that ID.
SELECT
pv.SalesOrder,
SalesOrderLine = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.SalesOrderLine END),
MStockCode = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MStockCode END),
MPrice = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MPrice END),
MBomFlag = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MBomFlag END)
FROM (
SELECT *,
GroupingId = COUNT(NULLIF(t.MBomFlag, 'C')) OVER (PARTITION BY t.SalesOrder ORDER BY t.SalesOrderLine ROWS UNBOUNDED PRECEDING)
FROM @tbl t
) pv
GROUP BY
pv.SalesOrder,
pv.GroupingId;
Note that NULLIF(t.MBomFlag, 'C')
returns null if the flag is C
, so COUNT
will only count the other rows. You could also write that explicitly using COUNT(CASE WHEN t.MBomFlag = 'P' THEN 1 END)
db<>fiddle
Group by numbers that are in sequence
create table #temp
(
IDUnique int Identity(1,1),
ID int,
grp int
)
Insert into #temp(ID) Values(1)
Insert into #temp(ID) Values(36)
Insert into #temp(ID) Values(37)
Insert into #temp(ID) Values(38)
Insert into #temp(ID) Values(50)
Insert into #temp(ID) Values(51)
declare @IDUnique int
declare @PreviousUnique int
declare @ID int
declare @grp int
declare @Previous int
declare @Row int
DECLARE @getAccountID CURSOR SET @getAccountID = CURSOR FOR SELECT Row_Number() Over(Order by IDUnique) Row, IDUnique, ID From #temp
OPEN @getAccountID
FETCH NEXT FROM @getAccountID INTO @Row, @IDUnique, @ID
WHILE @@FETCH_STATUS = 0
BEGIN
IF(@Row = 1)
Begin
update #temp set grp = 1 Where IDUnique = @IDUnique
set @Previous = @ID
set @grp = 1
End
Else If (@Previous + 1 = @ID)
Begin
update #temp set grp = @grp Where IDUnique = @IDUnique
set @Previous = @ID
End
Else
Begin
set @Previous = @ID
set @grp = @grp + 1
update #temp set grp = @grp Where IDUnique = @IDUnique
End
FETCH NEXT FROM @getAccountID INTO @Row, @IDUnique, @ID
END
CLOSE @getAccountID
DEALLOCATE @getAccountID
Select * from #temp
Drop Table #temp
Group by sequence of True
Identify the groups with cumsum
then filter the rows having True
values and use factorize
to assign the ordinal number to each unique group
m = df['val_a']
df.loc[m, 'tx'] = (~m).cumsum()[m].factorize()[0]
Alternatively you can also use groupby
+ ngroup
m = df['val_a']
df['tx'] = m[m].groupby((~m).cumsum()).ngroup()
val_a tx
0 True 0.0
1 True 0.0
2 False NaN
3 False NaN
4 False NaN
5 True 1.0
6 False NaN
7 False NaN
8 True 2.0
9 True 2.0
10 True 2.0
11 True 2.0
12 False NaN
13 True 3.0
14 True 3.0
GROUP BY and aggregate sequential numeric values
There's much value to @a_horse_with_no_name's answer, both as a correct solution and, like I already said in a comment, as a good material for learning how to use different kinds of window functions in PostgreSQL.
And yet I cannot help feeling that the approach taken in that answer is a bit too much of an effort for a problem like this one. Basically, what you need is an additional criterion for grouping before you go on aggregating years in arrays. You've already got company
and profession
, now you only need something to distinguish years that belong to different sequences.
That is just what the above mentioned answer provides and that is precisely what I think can be done in a simpler way. Here's how:
WITH MarkedForGrouping AS (
SELECT
company,
profession,
year,
year - ROW_NUMBER() OVER (
PARTITION BY company, profession
ORDER BY year
) AS seqID
FROM atable
)
SELECT
company,
profession,
array_agg(year) AS years
FROM MarkedForGrouping
GROUP BY
company,
profession,
seqID
Group by sequential data in R
Here is one dplyr
option -
library(dplyr)
df %>%
group_by(gene_name) %>%
mutate(grp = gene_number - lag(gene_number, default = 0) > 2) %>%
group_by(grp = cumsum(grp)) %>%
filter(n() > 1) %>%
ungroup
# gene_name gene_number grp
# <chr> <int> <int>
#1 ENSMUSG00000000001 4732 1
#2 ENSMUSG00000000001 4733 1
#3 ENSMUSG00000000058 7603 2
#4 ENSMUSG00000000058 7604 2
#5 ENSMUSG00000000058 8246 3
#6 ENSMUSG00000000058 8248 3
For each gene_name
subtract the current gene_number
value with the previous one and increment the group count if the difference is greater than 2. Drop the row if a group has a single row in it.
data
df <- structure(list(gene_name = c("ENSMUSG00000000001", "ENSMUSG00000000001",
"ENSMUSG00000000058", "ENSMUSG00000000058", "ENSMUSG00000000058",
"ENSMUSG00000000058", "ENSMUSG00000000058"), gene_number = c(4732L,
4733L, 7603L, 7604L, 8246L, 8248L, 9001L)),
class = "data.frame", row.names = c(NA, -7L))
SELECT / GROUP BY - sequence of a value
A bit of a complex version due to the lack of analytic functions in MySQL; I'm assuming it's the SUM
of val1
you need;
SELECT SUM(val1) val1, MAX(val2) val2
FROM (
SELECT s1.val1, COUNT(s3.val1) grouping, s1.val2 val2
FROM sequence s1
LEFT JOIN sequence s2 ON s1.val1 >= s2.val1
LEFT JOIN sequence s3 ON s3.val1 = s2.val1 - 1 AND s2.val2 <> s3.val2
GROUP BY s1.val1
) a
GROUP BY grouping
ORDER BY grouping
An SQLfiddle to test with.
If you add grouping
to the selection list to make the order obvious, you get
val1 val2 grouping
15 1 0
40 2 1
65 1 2
Related Topics
Is There a Performance Difference Between Between and in with MySQL or in SQL in General
SQL to Find Time Elapsed from Multiple Overlapping Intervals
Does Oracle Roll Back the Transaction on an Error
SQL - Remove the Duplicate Results
SQL Oracle Left Join and Subquery Error: Ora-00905: Missing Keyword
Issues with SQL Comparison and Null Values
SQL Query Pervious Row Optimisation
Shows Blanks for Repeating Values in a Result Set
Postgresql List and Order Tables by Size
SQL Query Where Field Does Not Contain $X
Get Size of Large Object in Postgresql Query
Split Date Range into One Row Per Month in SQL Server
How to Transform Rows to Columns
Can Insert [...] on Conflict Be Used for Foreign Key Violations