Group rows based on consecutive line numbers
Convert the line numbers to numeric, take the difference between consecutive values, and increment the group counter whenever that difference is greater than 1.
transform(df, group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
# line group
#1 0001 1
#2 0002 1
#3 0003 1
#4 0011 2
#5 0012 2
#6 0234 3
#7 0235 3
#8 0236 3
If you want to use dplyr:
library(dplyr)
df %>% mutate(group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
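The individual steps can be traced on a small data frame. The following is a minimal base R sketch; the data frame is rebuilt from the output shown above, and the intermediate names (num, gaps, new_run) are introduced here purely for illustration.

```r
# Example input rebuilt from the output above: zero-padded line numbers as text
df <- data.frame(
  line = c("0001", "0002", "0003", "0011", "0012", "0234", "0235", "0236"),
  stringsAsFactors = FALSE
)

num      <- as.numeric(df$line)  # "0011" becomes 11
gaps     <- diff(num)            # 1 1 8 1 222 1 1
new_run  <- c(TRUE, gaps > 1)    # the first row always starts a run
df$group <- cumsum(new_run)      # running count of run starts

print(df)
```

The leading TRUE is needed because diff returns one element fewer than its input.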
Group data frame row by consecutive value in R
We could use diff on the adjacent values of 'time', check whether the difference is not equal to 1, and then convert the logical vector to a numeric index by taking the cumulative sum (cumsum), so that the index increments by 1 at each TRUE value.
library(dplyr)
df1 %>%
mutate(grp = cumsum(c(TRUE, diff(time) != 1)))
Output:
# A tibble: 12 x 2
time grp
<dbl> <int>
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 10 2
7 11 2
8 20 3
9 30 4
10 31 4
11 32 4
12 40 5
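Whether the difference is tested with != 1 or with > 1 matters once the values are not strictly increasing. A small sketch of the difference, using a made-up vector (x is hypothetical and not taken from the question):

```r
# Hypothetical vector containing a repeated value
x <- c(1, 2, 2, 3, 7, 8)

grp_neq <- cumsum(c(TRUE, diff(x) != 1))  # any step other than +1 starts a group
grp_gt  <- cumsum(c(TRUE, diff(x) > 1))   # only a jump larger than 1 starts a group

print(rbind(x, grp_neq, grp_gt))
```

With != 1 the repeated 2 opens a new group, while with > 1 it stays in the current one, so the choice depends on how duplicates should be treated.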
Group Data in R for consecutive rows
In dplyr, I would do this by creating another grouping variable that identifies the consecutive rows. This is what cumsum(c(1, diff(weight) != 0)) is doing in the code chunk below: the counter increases by 1 every time weight changes from one row to the next.
The grouping variable can be created directly within group_by, and you can then proceed with any summaries by group.
library(dplyr)
df_in %>%
group_by(ID, group_weight = cumsum(c(1, diff(weight) != 0)), weight) %>%
summarise(start_day = min(start_day), end_day = max(end_day))
Source: local data frame [5 x 5]
Groups: ID, group_weight [?]
ID group_weight weight start_day end_day
(dbl) (dbl) (dbl) (dbl) (dbl)
1 1 1 150 1 7
2 1 2 151 7 10
3 1 3 150 10 30
4 2 4 170 5 20
5 2 5 171 20 30
This approach does leave you with the extra grouping variable in the dataset, which can be removed, if needed, with select(-group_weight) after ungrouping.
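One reason to build a run counter rather than grouping by weight alone is that the same weight can occur in separate stretches. A minimal base R sketch with made-up values (the weight vector below is hypothetical):

```r
# Hypothetical weights: 150 appears in two separate stretches
weight <- c(150, 150, 151, 150, 150)

# Counter goes up by 1 whenever the value differs from the previous row
group_weight <- cumsum(c(1, diff(weight) != 0))

# Row positions belonging to each run; the two stretches of 150 stay apart
runs <- split(seq_along(weight), group_weight)
print(runs)
```

Grouping by weight alone would merge rows 1, 2, 4 and 5 into a single group.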
How to group consecutive rows having same event and find average?
We can create a grouping variable with rleid from data.table, and use it to get the mean of 'pt' as well as to return the first value of 'Event'.
library(dplyr)
library(data.table)
group %>%
group_by(grp = rleid(Event)) %>%
summarise(Event = first(Event), Value = mean(pt)) %>%
select(-grp)
# A tibble: 4 x 2
# Event Value
# <dbl> <dbl>
#1 1 2.5
#2 2 4
#3 1 12.5
#4 2 4
Or using tapply/rle in base R:
with(group, tapply(pt, with(rle(Event),
rep(seq_along(values), lengths)), FUN = mean))
# 1 2 3 4
# 2.5 4.0 12.5 4.0
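The rle expression can be wrapped in a small helper so that the run id is reusable. This is only a sketch: run_id is a hypothetical name for a base R stand-in for rleid, and the data below are made up for illustration.

```r
# Hypothetical helper: id of the consecutive run each element belongs to
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$values), r$lengths)
}

# Made-up data for illustration
group <- data.frame(Event = c(1, 1, 2, 1, 1, 2),
                    pt    = c(2, 3, 4, 12, 13, 4))

group$grp <- run_id(group$Event)            # 1 1 2 3 3 4
means <- tapply(group$pt, group$grp, mean)  # mean of 'pt' per run
print(means)
```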
How to group by consecutive rows in a R dataframe?
With R, an implementation using dplyr would be to take the cumulative sum of the logical comparison between 'pv_type' and the lag of 'pv_type' as a grouping column, and then get the min and max of 'price' as two new columns.
library(dplyr)
segmentation %>%
group_by(pv_type_group = cumsum(pv_type != lag(pv_type,
    default = first(pv_type)))) %>%
mutate(min_v = min(price), max_p = max(price))
Update
With the OP's example, the expected output is summarised, so we use summarise instead of mutate. We also use rleid (from data.table) instead of the logical cumulative sum.
library(data.table)
segmentation %>%
group_by(grp = rleid(types)) %>%
summarise(types = first(types), expectedvalues = min(values)) %>%
ungroup %>%
select(-grp)
# A tibble: 4 x 2
# types expectedvalues
# <fct> <dbl>
#1 peak 1
#2 valley 0.4
#3 peak 1.2
#4 valley 0.1
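The comparison against the previous element can also be written without dplyr. A minimal base R sketch, assuming made-up 'types' and 'values' vectors (both hypothetical):

```r
# Made-up data for illustration
types  <- c("peak", "peak", "valley", "peak", "valley")
values <- c(1, 2, 0.4, 1.2, 0.1)

# TRUE where an element differs from the one before it; the first is never a change
changed <- c(FALSE, types[-1] != types[-length(types)])
grp     <- cumsum(changed) + 1              # 1 1 2 3 4

run_type <- tapply(types, grp, function(v) v[1])  # label of each run
run_min  <- tapply(values, grp, min)              # minimum within each run

print(data.frame(types = as.vector(run_type), expectedvalues = as.vector(run_min)))
```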
Select random consecutive rows by group as a proportion of group length
Maybe something like:
df[df[ , {
k = ceiling(0.1 * .N)
sample(head(.I, -k), 1L) + (0L:(k-1L))
}, cell]$V1]
The idea is to sample a starting position from the group's index vector (.I), but the start must be far enough from the end of the vector that the k consecutive rows beginning there still fall inside the group. To do this we use head(.I, -k), which removes the last k indices. sample(..., 1L) then randomly picks one of the remaining elements and, since we need k elements, we take the picked element and the subsequent k-1 elements.
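The window logic can be tried on its own, outside data.table. The sketch below is a variation rather than the code above: pick_window is a hypothetical helper, it uses sample.int so that a single candidate is not expanded to 1:x the way sample() does for a length-one numeric input, and it also allows the window to end on the last position.

```r
# Hypothetical helper: choose k = ceiling(prop * n) consecutive positions out of n
pick_window <- function(n, prop = 0.1) {
  k <- ceiling(prop * n)
  # valid starting positions are 1 .. n - k + 1
  start <- sample.int(n - k + 1L, 1L)
  start + seq_len(k) - 1L
}

set.seed(1)
w <- pick_window(25)  # k = 3 consecutive positions somewhere in 1..25
print(w)
```

Inside a grouped data.table call, something like .I[pick_window(.N)] might be used in place of the braces above, though that combination is an assumption and has not been checked against the original data.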
Assign unique id to consecutive rows within a grouping variable in dplyr
We can use gl to build an id that increases by 1 every two rows within each group:
library(dplyr)
df <- df %>%
group_by(group) %>%
mutate(id = as.integer(gl(n(), 2, n()))) %>%
ungroup
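What gl(n(), 2, n()) produces is easiest to see on a single group. A small sketch for a hypothetical group of five rows, together with an arithmetic alternative that is offered here only as an illustration:

```r
n <- 5  # hypothetical group size

# n levels, each repeated twice, truncated to length n
id_gl <- as.integer(gl(n, 2, n))

# Same ids from integer arithmetic on the row position
id_ceiling <- as.integer(ceiling(seq_len(n) / 2))

print(id_gl)
```

With an odd group size the last id covers a single row.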
R - build unique groups based on consecutive rows and factor level
We can group by 'letter' and the run-length id (rleid from data.table) of 'letter', summarise to get the mean of 'time', create the sequence column with row_number(), and drop the 'grp' column with select.
library(dplyr)
library(data.table)
test %>%
group_by(letter, grp = rleid(letter)) %>%
summarise(mean_time = mean(time)) %>%
mutate(id = row_number()) %>%
ungroup %>%
select(-grp)
# A tibble: 4 x 3
# letter mean_time id
# <fct> <dbl> <int>
#1 a 2 1
#2 a 6 2
#3 b 4 1
#4 b 9 2
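The same steps can be sketched in base R, which may help to see what each verb contributes. The data frame below is made up for illustration and the approach (rle, aggregate, ave) is a substitute for the dplyr/data.table pipeline, not the answer's own method.

```r
# Made-up data for illustration
test <- data.frame(letter = c("a", "a", "b", "a", "b", "b"),
                   time   = c(1, 3, 4, 6, 8, 10),
                   stringsAsFactors = FALSE)

# Run id over consecutive identical letters
r <- rle(test$letter)
test$grp <- rep(seq_along(r$values), r$lengths)

# Mean time per (letter, run), ordered by letter and then by run
res <- aggregate(time ~ letter + grp, data = test, FUN = mean)
res <- res[order(res$letter, res$grp), ]
rownames(res) <- NULL

# Sequence number of each run within its letter
res$id <- ave(res$grp, res$letter, FUN = seq_along)
res$grp <- NULL
print(res)
```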
Group rows in data frame based on time difference between consecutive rows
Here is another possibility, which groups rows as long as the time difference between consecutive rows is at most 4 days; a gap of more than 4 days starts a new group.
# create date variable
df$date <- with(df, as.Date(paste(YEAR, MONTH, DAY, sep = "-")))
# calculate successive differences between dates
# and identify gaps larger than 4
df$gap <- c(0, diff(df$date) > 4)
# cumulative sum of 'gap' variable
df$group <- cumsum(df$gap) + 1
df
# YEAR MONTH DAY HOUR LON LAT date gap group
# 1 1860 10 3 13 -19.5 3 1860-10-03 0 1
# 2 1860 10 3 17 -19.5 4 1860-10-03 0 1
# 3 1860 10 3 21 -19.5 5 1860-10-03 0 1
# 4 1860 10 5 5 -20.5 6 1860-10-05 0 1
# 5 1860 10 5 13 -21.5 7 1860-10-05 0 1
# 6 1860 10 5 17 -21.5 8 1860-10-05 0 1
# 7 1860 10 6 1 -22.5 9 1860-10-06 0 1
# 8 1860 10 6 5 -22.5 10 1860-10-06 0 1
# 9 1860 12 5 9 -22.5 -7 1860-12-05 1 2
# 10 1860 12 5 18 -23.5 -8 1860-12-05 0 2
# 11 1860 12 5 22 -23.5 -9 1860-12-05 0 2
# 12 1860 12 6 6 -24.5 -10 1860-12-06 0 2
# 13 1860 12 6 10 -24.5 -11 1860-12-06 0 2
# 14 1860 12 6 18 -24.5 -12 1860-12-06 0 2
Disclaimer: the diff & cumsum part is inspired by this Q&A: How to partition a vector into groups of regular, consecutive sequences?
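The gap logic can be checked on a plain vector of dates. A minimal sketch using a few of the dates from the output above; the variable names are chosen here for illustration.

```r
dates <- as.Date(c("1860-10-03", "1860-10-05", "1860-10-06",
                   "1860-12-05", "1860-12-06"))

day_diff <- as.numeric(diff(dates))  # differences in days: 2 1 60 1
gap      <- c(FALSE, day_diff > 4)   # TRUE where a gap larger than 4 days precedes the row
group    <- cumsum(gap) + 1          # 1 1 1 2 2

print(data.frame(date = dates, gap = gap, group = group))
```

Changing the threshold only requires changing the 4 in the comparison.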