Conditional grouping and summarizing data frame in [R]
A possible option with data.table
library(data.table)
unique(setDT(df)[, time.diff := max(time) - min(time), by = ID][
  time.diff <= 0.3,
  c('time', 'intensity') := list(mean(time), sum(intensity)),
  by = ID])
# ID time intensity time.diff
#1: A 3.15 30 0.1
#2: B 6.50 30 5.8
#3: B 12.30 40 5.8
#4: C 3.30 110 0.2
Or using dplyr
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(time.diff = max(time) - min(time),
         indx = all(time.diff <= 0.3),
         intensity = ifelse(indx, sum(intensity), intensity),
         time = ifelse(indx, mean(time), time)) %>%
  filter(!indx | row_number() == 1) %>%
  select(-indx)
# ID time intensity time.diff
#1 A 3.15 30 0.1
#2 B 6.50 30 5.8
#3 B 12.30 40 5.8
#4 C 3.30 110 0.2
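For readers who want to see the rule without either package, here is a minimal base R sketch. The input data frame is hypothetical (the question's data is not reproduced here); its values were chosen only to be consistent with the output above, and the helper name collapse_close is made up for illustration.

```r
# Hypothetical input, chosen to be consistent with the output shown above
df <- data.frame(
  ID        = c("A", "A", "B", "B", "C", "C"),
  time      = c(3.1, 3.2, 6.5, 12.3, 3.2, 3.4),
  intensity = c(10, 20, 30, 40, 50, 60)
)

# Collapse one group to a single row when its time span is within `tol`;
# otherwise keep every row and only attach the span as time.diff
collapse_close <- function(g, tol = 0.3) {
  span <- max(g$time) - min(g$time)
  if (span <= tol) {
    data.frame(ID = g$ID[1], time = mean(g$time),
               intensity = sum(g$intensity), time.diff = span)
  } else {
    cbind(g, time.diff = span)
  }
}

res <- do.call(rbind, lapply(split(df, df$ID), collapse_close))
rownames(res) <- NULL
res
```

With this input, groups A and C collapse to one row each, while group B keeps both of its rows.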
How to do conditional grouping of data in R?
I would use a couple of temporary variables to keep track here. Essentially, you need to know which model is placed first in the final year, as well as each model's cumulative share in the final year. Any model that meets the condition 'cumulative share less than 90 in the final year OR first entry in the final year' is retained; every other model is relabeled 'insignificant'.
library(dplyr)
df %>%
  group_by(Year) %>%
  mutate(Share = 100 * sales / sum(sales),
         order = order(order(-Share))) %>%   # rank within year, 1 = largest share
  arrange(Year, order) %>%
  mutate(CumulativeShare = cumsum(Share)) %>%
  ungroup() %>%
  mutate(finalyear = Year == max(Year),
         # cumulative share each model reaches in the final year
         finval = CumulativeShare[finalyear][match(model, model[finalyear])],
         # TRUE only on the first row of the final year
         finlast = c(FALSE, diff(finalyear) == 1),
         keep = finval < 90 | finlast[finalyear][match(model, model[finalyear])],
         model = ifelse(keep, model, 'insignificant')) %>%
  select(-finalyear, -finval, -finlast, -keep)
With your first example data set, this would look like
#> # A tibble: 20 x 6
#> model Year sales Share order CumulativeShare
#> <chr> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 A 2017 900 43.3 1 43.3
#> 2 insignificant 2017 456 21.9 2 65.2
#> 3 insignificant 2017 345 16.6 3 81.8
#> 4 insignificant 2017 235 11.3 4 93.1
#> 5 E 2017 144 6.92 5 100
#> 6 insignificant 2018 555 25.0 1 25.0
#> 7 insignificant 2018 456 20.5 2 45.5
#> 8 insignificant 2018 445 20.0 3 65.5
#> 9 E 2018 434 19.5 4 85.0
#> 10 A 2018 333 15.0 5 100
#> 11 A 2019 8911 31.6 1 31.6
#> 12 E 2019 5555 19.7 2 51.4
#> 13 insignificant 2019 4567 16.2 3 67.6
#> 14 insignificant 2019 4566 16.2 4 83.8
#> 15 insignificant 2019 4560 16.2 5 100
#> 16 E 2020 1180 73.6 1 73.6
#> 17 A 2020 224 14.0 2 87.6
#> 18 insignificant 2020 170 10.6 3 98.2
#> 19 insignificant 2020 15 0.936 4 99.1
#> 20 insignificant 2020 14 0.873 5 100
And with your second data set, it would look like this:
#> # A tibble: 20 x 6
#> model Year sales Share order CumulativeShare
#> <chr> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 insignificant 2017 900 43.3 1 43.3
#> 2 insignificant 2017 456 21.9 2 65.2
#> 3 insignificant 2017 345 16.6 3 81.8
#> 4 insignificant 2017 235 11.3 4 93.1
#> 5 E 2017 144 6.92 5 100
#> 6 insignificant 2018 555 25.0 1 25.0
#> 7 insignificant 2018 456 20.5 2 45.5
#> 8 insignificant 2018 445 20.0 3 65.5
#> 9 E 2018 434 19.5 4 85.0
#> 10 insignificant 2018 333 15.0 5 100
#> 11 insignificant 2019 8911 31.6 1 31.6
#> 12 E 2019 5555 19.7 2 51.4
#> 13 insignificant 2019 4567 16.2 3 67.6
#> 14 insignificant 2019 4566 16.2 4 83.8
#> 15 insignificant 2019 4560 16.2 5 100
#> 16 E 2020 20000 97.9 1 97.9
#> 17 insignificant 2020 224 1.10 2 99.0
#> 18 insignificant 2020 170 0.832 3 99.9
#> 19 insignificant 2020 15 0.0734 4 99.9
#> 20 insignificant 2020 14 0.0686 5 100
Created on 2022-07-14 by the reprex package (v2.0.1)
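The retention rule from this answer can also be checked in isolation on the final year alone. The sketch below uses base R and the 2020 sales figures from the first output; the model names B, C and D are hypothetical placeholders, since the output only shows them as 'insignificant'.

```r
# Final-year sales; names B, C, D are placeholders for illustration
final <- data.frame(
  model = c("A", "B", "C", "D", "E"),
  sales = c(224, 170, 15, 14, 1180)
)

# Sort by sales (largest first) and accumulate the percentage share
final <- final[order(-final$sales), ]
final$Share <- 100 * final$sales / sum(final$sales)
final$CumulativeShare <- cumsum(final$Share)

# Keep a model if its cumulative share is below 90, or if it is ranked first
final$keep <- final$CumulativeShare < 90 | seq_len(nrow(final)) == 1
final$model[final$keep]
```

The second condition matters when a single model already exceeds 90 on its own, as in the second data set, where the top model would otherwise be dropped.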
conditional filtering based on grouped data in R using dplyr
Here's another method that selects directly using math rather than %in%
data %>% filter(col * sign((group < 3) - 0.5) > 0)
#> # A tibble: 76 x 3
#> group year col
#> <int> <int> <dbl>
#> 1 2 1985 2.20
#> 2 3 1986 -0.205
#> 3 4 1991 -2.10
#> 4 3 1994 -0.113
#> 5 2 1997 1.90
#> 6 1 2000 1.37
#> 7 3 2002 -0.805
#> 8 4 2003 -0.535
#> 9 1 2004 0.792
#> 10 3 2006 -1.28
#> # ... with 66 more rows
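To make the arithmetic explicit: (group < 3) - 0.5 is +0.5 for groups 1 and 2 and -0.5 otherwise, so sign() turns it into a multiplier of +1 or -1. Multiplying col by that value and keeping positive products keeps positive col in groups below 3 and negative col in the remaining groups. A small base R sketch with made-up data:

```r
# Made-up data for illustration only
data <- data.frame(
  group = c(1L, 2L, 3L, 4L, 1L, 4L),
  col   = c(1.5, -0.7, -2.0, 0.3, -1.1, -0.4)
)

# +1 for groups 1-2, -1 for groups 3 and above
multiplier <- sign((data$group < 3) - 0.5)

# Positive product: col > 0 in groups 1-2, col < 0 in groups 3+
kept <- data[data$col * multiplier > 0, ]
kept
```

Rows where col is exactly 0 are dropped for every group, because the product is then 0 rather than positive.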
Conditional grouping in column in data frame in R
We can use
library(dplyr)
df1 %>%
group_by(b) %>%
summarise_at(vars(starts_with("alpha")), sum)
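In dplyr 1.0.0 and later, summarise_at() is superseded by across(), so the same aggregation can be written as below. The data frame df1 here is a made-up stand-in, assuming a grouping column b and several columns whose names start with "alpha".

```r
library(dplyr)

# Made-up stand-in for df1
df1 <- data.frame(
  b      = c("x", "x", "y"),
  alpha1 = c(1, 2, 3),
  alpha2 = c(10, 20, 30),
  beta   = c(5, 5, 5)
)

# Sum every column starting with "alpha" within each level of b
out <- df1 %>%
  group_by(b) %>%
  summarise(across(starts_with("alpha"), sum))
out
```

Columns that are not selected, such as beta in this sketch, do not appear in the result.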
How to create conditional group tags with nested data in R?
Let's assume that your data is stored in a data frame called df. The most straightforward approach is to first sort the rows of the table by "Level" in descending order and set "new_group" to the values of "Name". We'll also track the per-group totals in a column called "new_values". Then iterate through the rows until a row with new_values < 8 is encountered, at which point that row's "new_group" is changed to its parent's "Name", and its "Parent" is updated to match its parent's "Parent". The row loop then restarts. The outer loop terminates when no "new_group" has new_values < 8:
library(tidyverse)
df_sorted <- df %>%
  arrange(desc(Level)) %>%
  mutate(new_group = Name) %>%
  group_by(new_group) %>%
  mutate(new_values = sum(n_values)) %>%
  ungroup()
while (any(df_sorted$new_values < 8, na.rm = TRUE)) {
  for (i in seq_len(nrow(df_sorted))) {
    # isTRUE() skips rows whose total is NA instead of raising an error
    if (isTRUE(df_sorted$new_values[i] < 8)) {
      to_id <- df_sorted$Parent[i]
      to_row <- which(df_sorted$ID == to_id)
      df_sorted$new_group[i] <- df_sorted$Name[to_row]
      df_sorted$Parent[i] <- df_sorted$Parent[to_row]
      # recompute the per-group totals after the merge
      df_sorted <- df_sorted %>%
        group_by(new_group) %>%
        mutate(new_values = sum(n_values)) %>%
        ungroup()
      break # terminate the for loop immediately and return to the outer while loop
    }
  }
}
ID Parent Level Name n_values new_group new_values
<dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
1 50 12 5 Times Square 2 New York 21
2 41 12 4 Manhattan 3 New York 21
3 3 12 3 New York 16 New York 21
4 6 12 3 Boston 13 Boston 13
5 83 19 3 London 7 UK 19
6 9 77 3 Oxford 8 Oxford 8
7 11 105 3 Vancouver 8 Vancouver 8
8 12 19 2 USA 17 USA 17
9 77 19 2 UK 12 UK 19
10 105 19 2 Canada 9 Canada 9
11 19 NA 1 Countries NA Countries NA
Edit: The version below adds a "touched" column to track rows that have been modified in the loop, and also adds some checks for NA values. For the data set used above, it produces an identical result to the previous version. It also appears to work correctly on the data set below.
df <- structure(list(ID = c(19,12,3,41,50,6,77,83,9,105,11), Parent = c(NA,19,12,3,41,12,19,77,77,19,105), Level = c(1,2,3,4,5,3,2,3,3,2,3), Name = c("Countries","USA","New York","Manhattan","Times Square", "Boston","UK","London","Oxford","Canada","Vancouver"), n_values = c(NA,0,0,3,2,0,12,7,8,9,8)), class = "data.frame", row.names = c(NA, -11L))
df_sorted <- df %>%
  arrange(desc(Level)) %>%
  mutate(new_group = Name) %>%
  group_by(new_group) %>%
  mutate(
    new_values = sum(n_values),
    touched = is.na(n_values) | n_values >= 8
  ) %>%
  ungroup()
while (any(!df_sorted$touched)) {
  for (i in seq_len(nrow(df_sorted))) {
    if (df_sorted$new_values[i] < 8 && !is.na(df_sorted$Parent[i]) && any(!df_sorted$touched)) {
      to_id <- df_sorted$Parent[i]
      to_row <- which(df_sorted$ID == to_id)
      df_sorted$new_group[i] <- df_sorted$Name[to_row]
      df_sorted$Parent[i] <- df_sorted$Parent[to_row]
      df_sorted$touched[i] <- TRUE
      df_sorted <- df_sorted %>%
        group_by(new_group) %>%
        mutate(new_values = sum(n_values, na.rm = TRUE)) %>%
        ungroup()
      break # terminate the for loop immediately and return to the outer while loop
    }
  }
}
ID Parent Level Name n_values new_group new_values touched
<dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <lgl>
1 50 NA 5 Times Square 2 Countries 5 TRUE
2 41 NA 4 Manhattan 3 Countries 5 TRUE
3 3 NA 3 New York 0 Countries 5 TRUE
4 6 NA 3 Boston 0 Countries 5 TRUE
5 83 19 3 London 7 UK 19 TRUE
6 9 77 3 Oxford 8 Oxford 8 TRUE
7 11 105 3 Vancouver 8 Vancouver 8 TRUE
8 12 NA 2 USA 0 Countries 5 TRUE
9 77 19 2 UK 12 UK 19 TRUE
10 105 19 2 Canada 9 Canada 9 TRUE
11 19 NA 1 Countries NA Countries 5 TRUE
R conditional grouping of rows and numbering of groups
You can do:
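library(dplyr)
x <- rle(df$CRIT)       # runs of consecutive TRUE/FALSE values
mask <- x$values        # TRUE for the runs where CRIT is TRUE
x$values[mask] <- 0     # TRUE runs get group 0 (this also coerces the values to numeric)
x$values[!mask] <- cumsum(!x$values[!mask])   # number the FALSE runs 1, 2, 3, ...
mutate(df, GRP = inverse.rle(x))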
# THR CRIT GRP
#1 13 TRUE 0
#2 17 TRUE 0
#3 19 FALSE 1
#4 22 FALSE 1
#5 21 FALSE 1
#6 19 FALSE 1
#7 17 TRUE 0
#8 12 TRUE 0
#9 12 TRUE 0
#10 17 TRUE 0
#11 20 FALSE 2
#12 20 FALSE 2
#13 20 FALSE 2
#14 17 TRUE 0
#15 17 TRUE 0
#16 13 TRUE 0
#17 20 FALSE 3
#18 20 FALSE 3
#19 17 TRUE 0
#20 13 TRUE 0
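The same idea can be followed step by step on a short vector. This is a base R sketch with a made-up logical vector standing in for df$CRIT; it builds the run ids directly instead of overwriting the rle object, which is a small variation on the code above.

```r
# Made-up stand-in for df$CRIT
crit <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

runs <- rle(crit)                 # values: T F T F T, lengths: 2 2 1 1 1
is_false_run <- !runs$values      # which runs should receive a group number

# One id per run: 0 for TRUE runs, 1, 2, ... for successive FALSE runs
ids <- integer(length(runs$values))
ids[is_false_run] <- seq_len(sum(is_false_run))

# Expand the per-run ids back to one value per row
grp <- rep(ids, runs$lengths)
grp
```

Each stretch of consecutive FALSE values gets its own number, and every TRUE row stays at 0.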
How to mutate and map conditional on values of grouping variables?
You can use the function purrr::map_if() to accomplish this. It takes a predicate and applies .f to the elements for which the predicate is TRUE and .else to the elements for which it is FALSE, like this:
purrr::map_if(
  .x = data,
  .p = ~ group2 %in% c("a", "b", "c"),
  .f = ~ lm(var1 ~ var2, .x),
  .else = ~ lm(var1 ~ 1, .x)
)
Full reprex
Here is a reprex based on your data (I add a column to verify that the logic is correct):
library(dplyr, warn.conflicts = FALSE)
tibble(
  group1 = rep(letters[1:10], 100),
  group2 = rep(letters[1:10], 100),
  var1 = rnorm(1000),
  var2 = rnorm(1000)
) %>%
  group_by(group1, group2) %>%
  tidyr::nest() %>%
  mutate(
    model = purrr::map_if(
      .x = data,
      .p = ~ group2 %in% c("a", "b", "c"),
      .f = ~ lm(var1 ~ var2, .x),
      .else = ~ lm(var1 ~ 1, .x)
    )
  ) %>%
  # Note: I add this column to verify the logic
  mutate(
    formula = purrr::map_chr(.x = model, ~ .x$call %>% rlang::as_label())
  )
#> # A tibble: 10 x 5
#> # Groups: group1, group2 [10]
#> group1 group2 data model formula
#> <chr> <chr> <list> <list> <chr>
#> 1 a a <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 2 b b <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 3 c c <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 4 d d <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 5 e e <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 6 f f <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 7 g g <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 8 h h <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 9 i i <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 10 j j <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
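As a smaller illustration of how map_if() dispatches between .f and .else, here is a sketch on a plain list. The input values are made up; the only assumption is a purrr version that supports the .else argument, which the reprex above already relies on.

```r
library(purrr)

# Numeric elements are multiplied by 10; everything else is upper-cased
out <- map_if(
  .x = list(1, "a", 3),
  .p = is.numeric,
  .f = ~ .x * 10,
  .else = ~ toupper(.x)
)
out
```

In the reprex the predicate does not inspect .x at all: it reads group2 from the grouped data frame, which holds a single value per group, so each nested data set gets exactly one TRUE or FALSE.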
Ifelse with conditional on grouped data
Another possible solution, based on a nested ifelse:
library(dplyr)
example2 <- tibble::tribble(
~Group, ~Code, ~Value,
"1", "A", 1,
"1", "B", 1,
"1", "C", 5,
"2", "A", 1,
"2", "B", 5
)
example2 %>%
  group_by(Group) %>%
  mutate(GroupStatus = ifelse("C" %in% Code,
                              ifelse(Value[Code == "C"] == 5, 1, 0), 0)) %>%
  ungroup()
#> # A tibble: 5 × 4
#> Group Code Value GroupStatus
#> <chr> <chr> <dbl> <dbl>
#> 1 1 A 1 1
#> 2 1 B 1 1
#> 3 1 C 5 1
#> 4 2 A 1 0
#> 5 2 B 5 0
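A possible simplification, offered as a sketch rather than a drop-in replacement: the two nested conditions can be folded into a single any() call per group. It gives the same result on this example, but it differs if a group contains several "C" rows, because any() flags the group when at least one of them has Value 5, whereas the nested ifelse only looks at the first.

```r
library(dplyr)

example2 <- data.frame(
  Group = c("1", "1", "1", "2", "2"),
  Code  = c("A", "B", "C", "A", "B"),
  Value = c(1, 1, 5, 1, 5)
)

# 1 when the group has a row with Code "C" and Value 5, otherwise 0
out <- example2 %>%
  group_by(Group) %>%
  mutate(GroupStatus = as.integer(any(Code == "C" & Value == 5))) %>%
  ungroup()
out
```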
Group by with condition using r
You can use:
library(dplyr)
df %>%
group_by(A) %>%
mutate(final = if(any(B == 'yes')) 'C' else 'U')
#Without if/else
#mutate(final = c('U', 'C')[any(B == 'yes') + 1])
# A B C final
# <chr> <chr> <chr> <chr>
#1 w yes C C
#2 x No U U
#3 w yes C C
#4 z No U U
If there are many conditions to check, you can use case_when:
df %>%
group_by(A) %>%
mutate(final = case_when(any(B == 'yes') ~'C',
TRUE ~ 'U'))
conditional grouping by value and number of rows in R
One option using dplyr is to create a new column that keeps track of row_number. For each group with fewer than 3 rows, compare the group's mean v1 with the v1 value of the row directly above the group and of the row directly below it, and assign the group of whichever neighbor is closer. Here change is the final output.
library(dplyr)
dt1 <- dt %>%
  mutate(group = case_when(v1 < 5 ~ 1,
                           v1 >= 5 & v1 < 10 ~ 2,
                           v1 >= 10 ~ 3),
         row = row_number())
dt1 %>%
  group_by(group) %>%
  mutate(change = if (n() < 3) {
    c(dt1$group[first(row) - 1L], dt1$group[last(row) + 1L])[
      which.min(c(abs(mean(v1) - dt1$v1[first(row) - 1L]),
                  abs(mean(v1) - dt1$v1[last(row) + 1L])))]
  } else group)
# v1 group row change
# <dbl> <dbl> <int> <dbl>
# 1 3 1 1 1
# 2 1 1 2 1
# 3 1 1 3 1
# 4 5 2 4 1
# 5 6 2 5 1
# 6 12 3 6 3
# 7 13 3 7 3
# 8 11 3 8 3
# 9 10 3 9 3
#10 0 1 10 1
#11 2 1 11 1
#12 1 1 12 1
#13 3 1 13 1