Create group number for contiguous runs of equal values
If you have numeric values like this, you can use diff
and cumsum
to add up changes in values
x <- c(2,3,9,2,4,4,3,4,4,5,5,5,1)
cumsum(c(1,diff(x)!=0))
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
in r group values if they are in a row
A tidyverse option.
library(dplyr)
library(tidyr)
mutate(df, groupID = cumsum(replace_na(lag(groups) != groups, FALSE)) + 1)
# groups type groupID
# 1 A Apple 1
# 2 A Apple 1
# 3 A Apple 1
# 4 A Apple 1
# 5 A Apple 1
# 6 B Bread 2
# 7 B Bread 2
# 8 B Bread 2
# 9 B Bread 2
# 10 B Bread 2
# 11 B Bread 2
# 12 A Apple 3
# 13 A Apple 3
# 14 F Fish 4
# 15 F Fish 4
# 16 F Fish 4
# 17 F Fish 4
# 18 A Apple 5
# 19 A Apple 5
# 20 C curry 6
# 21 C curry 6
# 22 C curry 6
# 23 C curry 6
# 24 B Bread 7
# 25 B Bread 7
# 26 B Bread 7
# 27 B Bread 7
# 28 B Bread 7
Grouping first few rows with positive value followed by another group with negative values and so on using R
Using rle
:
df$direction <- with(rle(sign(df$strand)), rep(seq_along(values), lengths))
df
# name strand direction
#1 thrL 1 1
#2 thrA 1 1
#3 thrB 1 1
#4 yaaA -1 2
#5 yaaJ -1 2
#6 talB 1 3
#7 mog 1 3
This can be made shorter with data.table
rleid
.
df$direction <- data.table::rleid(sign(df$strand))
SQL Server - group and number matching contiguous values
Sorry for making this question a bit more complicated than it needed to be but for the sake of closure I have found a solution using the lag function.
In order to achieve what I wanted I continued my cte above with the following:
, a as
(
SELECT
*,
Lag(LSF, 1, LSF) OVER(Partition By code_id ORDER BY id) AS prev_LSF,
Lag(code_id, 1, code_id) OVER(Partition By code_id ORDER BY id) AS prev_code
FROM
LongShortFlat
), b as
(
SELECT
id,
LSF,
code_id,
Sum(CASE
WHEN LSF <> prev_LSF AND code_id = prev_code
THEN 1
ELSE 0
END) OVER(Partition By code_id ORDER BY id) AS grp
FROM
a
)
select * from b order by id
Here is the updated sqlfiddle.
Create group ID for runs of non-zero values
Using rle()
. First create a new vector replacing the zeros with NA.
x <- match(value != 0, TRUE)
with(rle(!is.na(x)), {
lv <- lengths[values]
replace(x, !is.na(x), rep(seq_along(lv), lv))
})
# [1] 1 1 1 1 1 1 NA NA NA 2 2 2 2 2 NA NA 3 3
Group observations chronologically and by group R / data.table
data.table::rleid()
: Consecutive runs of identical values belong to the same group
dt[, tracker := rleid(track), by = student]
student year track tracker
1: 1 2001 Highschool 1
2: 1 2002 Highschool 1
3: 1 2003 Highschool 1
4: 1 2004 Vocational 2
5: 1 2005 Vocational 2
6: 1 2006 Uni 3
7: 1 2007 Vocational 4
8: 1 2008 Vocational 4
9: 2 2001 Vocational 1
10: 2 2002 Vocational 1
11: 2 2003 Highschool 2
12: 2 2004 Highschool 2
13: 2 2005 Highschool 2
14: 2 2006 Highschool 2
15: 2 2007 Vocational 3
16: 2 2008 Vocational 3
Without rleid()
just for fun:
dt[, tracker := cumsum(shift(track, fill = track[1]) != track) + 1L, by = student]
Create group index to group by repeating values in a sorted data.table
data.table::rleid
is specifically designed for such jobs. It generates run-length-type IDs as described in ?rleid
.
In this case you can try:
test[, group_id := rleid(val)]
Add index to runs of equal values, accounting for NA
Using rleid
from data.table
and cumsum
.
library(data.table)
df$event <- rleid(df$dyad) - cumsum(is.na(df$dyad))
df$event[is.na(df$dyad)] <- NA
df
# dyad event
#1 a 1
#2 a 1
#3 b 2
#4 <NA> NA
#5 c 3
#6 <NA> NA
#7 c 4
#8 b 5
Well the above solution does not work when you have consecutive NA
's, in that case we can use :
x = c("a", NA, NA, "a", "b", "b", "c", NA)
y <- cumsum(!duplicated(rleid(x)) & !is.na(x))
y[is.na(x)] <- NA
y
#[1] 1 NA NA 2 3 3 4 NA
Create counter within consecutive runs of values
You need to use sequence
and rle
:
> sequence(rle(as.character(dataset$input))$lengths)
[1] 1 1 2 1 2 1 1 2 3 4 1 1
Group values by unique elements
First of all, (I assume) this is your vector
a <- c("A110","A110","A110","B220","B220","C330","D440","D440","D440","D440","D440","D440","E550")
As per possible solutions, here are few (can't find a good dupe right now)
as.integer(factor(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Or
cumsum(!duplicated(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Or
match(a, unique(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Also rle
will work the similarly in your specific scenario
with(rle(a), rep(seq_along(values), lengths))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Or (which is practically the same)
data.table::rleid(a)
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Though be advised that all 4 solutions have their unique behavior in different scenarios, consider the following vector
a <- c("B110","B110","B110","A220","A220","C330","D440","D440","B110","B110","E550")
And the results of the 4 different solutions:
1.
as.integer(factor(a))
# [1] 2 2 2 1 1 3 4 4 2 2 5
The factor
solution begins with 2
because a
is unsorted and hence the first values are getting higher integer
representation within the factor
function. Hence, this solution is only valid if your vector is sorted, so don't use it other wise.
2.
cumsum(!duplicated(a))
# [1] 1 1 1 2 2 3 4 4 4 4 5
This cumsum/duplicated
solution got confused because of "B110"
already been present at the beginning and hence grouped "D440","D440","B110","B110"
into the same group.
3.
match(a, unique(a))
# [1] 1 1 1 2 2 3 4 4 1 1 5
This match/unique
solution added ones at the end, because it is sensitive to "B110"
showing up in more than one sequences (because of unique
) and hence grouping them all into same group regardless of where they appear
4.
with(rle(a), rep(seq_along(values), lengths))
# [1] 1 1 1 2 2 3 4 4 5 5 6
This solution only cares about sequences, hence different sequences of "B110"
were grouped into different groups
Related Topics
How Does One Reorder Columns in a Data Frame
Group by Multiple Columns in Dplyr, Using String Vector Input
Select First and Last Row from Grouped Data
Method to Extract Stat_Smooth Line Fit
How to Extract a Single Column from a Data.Frame as a Data.Frame
Painless Way to Install a New Version of R
How to Remove All Whitespace from a String
Incomplete Final Line' Warning When Trying to Read a .Csv File into R
How to Use Grep()/Gsub() to Find Exact Match
Why Do R Objects Not Print in a Function or a "For" Loop
Using the Rjava Package on Win7 64 Bit With R
How to Match Fuzzy Match Strings from Two Datasets
Plotting Contours on an Irregular Grid
Explain a Lazy Evaluation Quirk
Calculating Cumulative Sum For Each Row
Efficiently Generate a Random Sample of Times and Dates Between Two Dates