How to Do Conditional Grouping of Data in R

Conditional grouping and summarizing data frame in [R]

A possible option with data.table

library(data.table)
unique(setDT(df)[, time.diff := max(time)-min(time), ID][
time.diff <= 0.3, c('time', 'intensity') := list(mean(time),
sum(intensity)), ID])
# ID time intensity time.diff
#1: A 3.15 30 0.1
#2: B 6.50 30 5.8
#3: B 12.30 40 5.8
#4: C 3.30 110 0.2

Or using dplyr

library(dplyr)
df %>%
group_by(ID) %>%
mutate(time.diff=max(time)-min(time), indx=all(time.diff<=0.3),
intensity=ifelse(indx, sum(intensity), intensity),
time=ifelse(indx, mean(time), time)) %>%
filter(!indx|row_number()==1) %>%
select(-indx)
# ID time intensity time.diff
#1 A 3.15 30 0.1
#2 B 6.50 30 5.8
#3 B 12.30 40 5.8
#4 C 3.30 110 0.2
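
The input data frame is not shown above, so here is a minimal base R sketch of the same idea on made-up data. The values in `df`, the helper name `collapse_close`, and the `tol` argument are all assumptions chosen for illustration; they are not taken from the original question.

```r
# Hypothetical input: three IDs, two of them with closely spaced times
df <- data.frame(
  ID        = c("A", "A", "B", "B", "C", "C"),
  time      = c(3.1, 3.2, 6.5, 12.3, 3.2, 3.4),
  intensity = c(10, 20, 30, 40, 50, 60)
)

# Collapse a group to one row (mean time, summed intensity) only when
# its time range is within the tolerance; otherwise keep its rows as they are
collapse_close <- function(g, tol = 0.3) {
  g$time.diff <- max(g$time) - min(g$time)
  if (g$time.diff[1] <= tol) {
    g <- data.frame(ID = g$ID[1],
                    time = mean(g$time),
                    intensity = sum(g$intensity),
                    time.diff = g$time.diff[1])
  }
  g
}

result <- do.call(rbind, lapply(split(df, df$ID), collapse_close))
rownames(result) <- NULL
print(result)
```

Groups A and C collapse to a single row each, while B keeps both of its rows because its times are far apart.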

How to do conditional grouping of data in R?

I think I would use a couple of temporary variables to help you keep track here. Essentially you need to know the first-placed model in the final year as well as the cumulative values of the final year. Then any model that meets the conditions 'Less than 90 in the final year OR first entry in the final year' is retained.

library(dplyr)

df %>%
  group_by(Year) %>%
  mutate(Share = 100 * sales / sum(sales),
         order = order(order(-Share))) %>%   # rank within year, largest share first
  arrange(Year, order) %>%
  mutate(CumulativeShare = cumsum(Share)) %>%
  ungroup() %>%
  mutate(finalyear = Year == max(Year),
         # cumulative share of each model in the final year
         finval = CumulativeShare[finalyear][match(model, model[finalyear])],
         # TRUE only for the first row of the final year
         finlast = c(FALSE, diff(finalyear) == 1),
         keep = finval < 90 | finlast[finalyear][match(model, model[finalyear])],
         model = ifelse(keep, model, 'insignificant')) %>%
  select(-finalyear, -finval, -finlast, -keep)

With your first example data set, this would look like

#> # A tibble: 20 x 6
#> model Year sales Share order CumulativeShare
#> <chr> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 A 2017 900 43.3 1 43.3
#> 2 insignificant 2017 456 21.9 2 65.2
#> 3 insignificant 2017 345 16.6 3 81.8
#> 4 insignificant 2017 235 11.3 4 93.1
#> 5 E 2017 144 6.92 5 100
#> 6 insignificant 2018 555 25.0 1 25.0
#> 7 insignificant 2018 456 20.5 2 45.5
#> 8 insignificant 2018 445 20.0 3 65.5
#> 9 E 2018 434 19.5 4 85.0
#> 10 A 2018 333 15.0 5 100
#> 11 A 2019 8911 31.6 1 31.6
#> 12 E 2019 5555 19.7 2 51.4
#> 13 insignificant 2019 4567 16.2 3 67.6
#> 14 insignificant 2019 4566 16.2 4 83.8
#> 15 insignificant 2019 4560 16.2 5 100
#> 16 E 2020 1180 73.6 1 73.6
#> 17 A 2020 224 14.0 2 87.6
#> 18 insignificant 2020 170 10.6 3 98.2
#> 19 insignificant 2020 15 0.936 4 99.1
#> 20 insignificant 2020 14 0.873 5 100

And with your second data set, it would look like this:

#> # A tibble: 20 x 6
#> model Year sales Share order CumulativeShare
#> <chr> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 insignificant 2017 900 43.3 1 43.3
#> 2 insignificant 2017 456 21.9 2 65.2
#> 3 insignificant 2017 345 16.6 3 81.8
#> 4 insignificant 2017 235 11.3 4 93.1
#> 5 E 2017 144 6.92 5 100
#> 6 insignificant 2018 555 25.0 1 25.0
#> 7 insignificant 2018 456 20.5 2 45.5
#> 8 insignificant 2018 445 20.0 3 65.5
#> 9 E 2018 434 19.5 4 85.0
#> 10 insignificant 2018 333 15.0 5 100
#> 11 insignificant 2019 8911 31.6 1 31.6
#> 12 E 2019 5555 19.7 2 51.4
#> 13 insignificant 2019 4567 16.2 3 67.6
#> 14 insignificant 2019 4566 16.2 4 83.8
#> 15 insignificant 2019 4560 16.2 5 100
#> 16 E 2020 20000 97.9 1 97.9
#> 17 insignificant 2020 224 1.10 2 99.0
#> 18 insignificant 2020 170 0.832 3 99.9
#> 19 insignificant 2020 15 0.0734 4 99.9
#> 20 insignificant 2020 14 0.0686 5 100

Created on 2022-07-14 by the reprex package (v2.0.1)
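
The `order(order(-Share))` idiom in the pipeline above can look cryptic, so here is a small base R sketch of what it computes. The three sales figures are reused from the 2017 rows of the output purely for illustration, and the variable names are assumptions.

```r
# Three sales figures for a single year
sales <- c(345, 900, 144)
share <- 100 * sales / sum(sales)

# order(-share) lists positions from largest to smallest share;
# applying order() again turns those positions into ranks
rank_desc <- order(order(-share))
print(rank_desc)

# Without ties this matches rank() on the negated values
print(rank(-share))

# Cumulative share once the rows are sorted by rank
cum_share <- cumsum(share[order(rank_desc)])
print(cum_share)
```

The largest share gets rank 1, and the cumulative share of the last-ranked row always reaches 100.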

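Conditional filtering based on grouped data in R using dplyr

Here's another method that selects rows directly with arithmetic rather than %in%. The expression sign((group < 3) - 0.5) is 1 when group < 3 and -1 otherwise, so the filter keeps positive col values for groups below 3 and negative col values for the remaining groups: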

data %>% filter(col * sign((group < 3) - 0.5) > 0)
#> # A tibble: 76 x 3
#> group year col
#> <int> <int> <dbl>
#> 1 2 1985 2.20
#> 2 3 1986 -0.205
#> 3 4 1991 -2.10
#> 4 3 1994 -0.113
#> 5 2 1997 1.90
#> 6 1 2000 1.37
#> 7 3 2002 -0.805
#> 8 4 2003 -0.535
#> 9 1 2004 0.792
#> 10 3 2006 -1.28
#> # ... with 66 more rows
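
To check that the arithmetic trick matches the explicit rule, here is a short base R sketch on a made-up data frame. The values are hypothetical; only the column names `group` and `col` come from the answer above.

```r
# Hypothetical data with the same column names as the answer
data <- data.frame(
  group = c(1L, 2L, 3L, 4L, 1L, 3L),
  col   = c(1.5, -0.2, -0.7, 0.4, -1.1, 2.0)
)

# Arithmetic version: flip the sign of col for groups 3 and above
keep_math <- data$col * sign((data$group < 3) - 0.5) > 0

# Explicit version: positive col for group < 3, negative col otherwise
keep_logic <- ifelse(data$group < 3, data$col > 0, data$col < 0)

print(identical(keep_math, keep_logic))
print(data[keep_math, ])
```

Both versions drop rows where col is exactly zero, since neither `> 0` nor `< 0` holds there.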

Conditional grouping in column in data frame in R

We can use

library(dplyr)
df1 %>%
group_by(b) %>%
summarise_at(vars(starts_with("alpha")), sum)
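
In dplyr 1.0.0 and later, the scoped verbs such as summarise_at() are superseded by across(). A minimal sketch of the same summary with across() follows; the contents of `df1` are made up for illustration, since the original data is not shown.

```r
library(dplyr)

# Hypothetical data: two "alpha" columns and one unrelated column
df1 <- data.frame(
  b      = c("x", "x", "y"),
  alpha1 = c(1, 2, 3),
  alpha2 = c(10, 20, 30),
  beta   = c(5, 5, 5)
)

# Sum every column whose name starts with "alpha", per value of b
out <- df1 %>%
  group_by(b) %>%
  summarise(across(starts_with("alpha"), sum))

print(out)
```

Columns that do not match the selection (here `beta`) are dropped from the summary.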

How to create conditional group tags with nested data in R?

Let's assume that your data is stored in a data frame called df. The most straightforward approach would be to first sort the rows of the table by "Level" in descending order and set "new_group" to the values of "Name". We'll also track the per-group totals in a column called "new_values". Then iterate through the rows until a row with new_values < 8 is encountered, at which point that row's "new_group" is changed to that of its parent, and its "Parent" is also updated to match its parent's "Parent". At that point, the row loop restarts. The outer loop terminates when no "new_group"s have new_values < 8:

library(tidyverse)

df_sorted <- df %>%
arrange(desc(Level)) %>%
mutate(new_group = Name) %>%
group_by(new_group) %>%
mutate(new_values = sum(n_values)) %>%
ungroup

while (any(df_sorted$new_values < 8, na.rm = TRUE)) {

for (i in 1:nrow(df_sorted)) {

if (df_sorted$new_values[i] < 8) {

to_id <- df_sorted$Parent[i]
to_row <- which(df_sorted$ID == to_id)

df_sorted$new_group[i] <- df_sorted$Name[to_row]
df_sorted$Parent[i] <- df_sorted$Parent[to_row]

df_sorted <- df_sorted %>%
group_by(new_group) %>%
mutate(new_values = sum(n_values)) %>%
ungroup

break # terminate the for loop immediately and return to the outer while loop
}
}
}

ID Parent Level Name n_values new_group new_values
<dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
1 50 12 5 Times Square 2 New York 21
2 41 12 4 Manhattan 3 New York 21
3 3 12 3 New York 16 New York 21
4 6 12 3 Boston 13 Boston 13
5 83 19 3 London 7 UK 19
6 9 77 3 Oxford 8 Oxford 8
7 11 105 3 Vancouver 8 Vancouver 8
8 12 19 2 USA 17 USA 17
9 77 19 2 UK 12 UK 19
10 105 19 2 Canada 9 Canada 9
11 19 NA 1 Countries NA Countries NA

Edit: The version below adds a "touched" column to track rows that have been modified in the loop, and also adds some checks for NA values. For the data set used above, it produces an identical result to the previous version. It also appears to work correctly on the data set below.

df <- structure(list(ID = c(19,12,3,41,50,6,77,83,9,105,11), Parent = c(NA,19,12,3,41,12,19,77,77,19,105), Level = c(1,2,3,4,5,3,2,3,3,2,3), Name = c("Countries","USA","New York","Manhattan","Times Square", "Boston","UK","London","Oxford","Canada","Vancouver"), n_values = c(NA,0,0,3,2,0,12,7,8,9,8)), class = "data.frame", row.names = c(NA, -11L))

df_sorted <- df %>%
arrange(desc(Level)) %>%
mutate(new_group = Name) %>%
group_by(new_group) %>%
mutate(
new_values = sum(n_values),
touched = is.na(n_values) | n_values >= 8
) %>%
ungroup

while (any(!df_sorted$touched)) {

for (i in 1:nrow(df_sorted)) {

if (df_sorted$new_values[i] < 8 & !is.na(df_sorted$Parent[i]) & any(!df_sorted$touched)) {

to_id <- df_sorted$Parent[i]
to_row <- which(df_sorted$ID == to_id)

df_sorted$new_group[i] <- df_sorted$Name[to_row]
df_sorted$Parent[i] <- df_sorted$Parent[to_row]
df_sorted$touched[i] <- TRUE

df_sorted <- df_sorted %>%
group_by(new_group) %>%
mutate(new_values = sum(n_values, na.rm = TRUE)) %>%
ungroup

break # terminate the for loop immediately and return to the outer while loop
}
}
}

ID Parent Level Name n_values new_group new_values touched
<dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <lgl>
1 50 NA 5 Times Square 2 Countries 5 TRUE
2 41 NA 4 Manhattan 3 Countries 5 TRUE
3 3 NA 3 New York 0 Countries 5 TRUE
4 6 NA 3 Boston 0 Countries 5 TRUE
5 83 19 3 London 7 UK 19 TRUE
6 9 77 3 Oxford 8 Oxford 8 TRUE
7 11 105 3 Vancouver 8 Vancouver 8 TRUE
8 12 NA 2 USA 0 Countries 5 TRUE
9 77 19 2 UK 12 UK 19 TRUE
10 105 19 2 Canada 9 Canada 9 TRUE
11 19 NA 1 Countries NA Countries 5 TRUE
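
Both loops above rely on following the "Parent" column upward one step at a time. As a small sketch of that lookup in isolation, the helper below lists every ancestor of a node. The ID, Parent and Name values are copied from the data set above; the function name `ancestors` is an assumption introduced here.

```r
# ID / Parent / Name columns from the example data set
df <- data.frame(
  ID     = c(19, 12, 3, 41, 50, 6, 77, 83, 9, 105, 11),
  Parent = c(NA, 19, 12, 3, 41, 12, 19, 77, 77, 19, 105),
  Name   = c("Countries", "USA", "New York", "Manhattan", "Times Square",
             "Boston", "UK", "London", "Oxford", "Canada", "Vancouver")
)

# Walk up the hierarchy from `id` until a node without a parent is reached
ancestors <- function(id, df) {
  out <- numeric(0)
  p <- df$Parent[match(id, df$ID)]
  while (!is.na(p)) {
    out <- c(out, p)
    p <- df$Parent[match(p, df$ID)]
  }
  out
}

path <- ancestors(50, df)              # start from Times Square
print(df$Name[match(path, df$ID)])
```

This makes it easier to see which group a small node can end up in as it is merged upward.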

R conditional grouping of rows and numbering of groups

You can do:

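library(dplyr)

x <- rle(df$CRIT)        # runs of consecutive TRUE/FALSE values
mask <- x$values         # TRUE for runs where CRIT is TRUE
x$values[mask] <- 0      # every TRUE run gets group 0
x$values[!mask] <- cumsum(!x$values[!mask])   # FALSE runs are numbered 1, 2, 3, ...

mutate(df, GRP = inverse.rle(x))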

# THR CRIT GRP
#1 13 TRUE 0
#2 17 TRUE 0
#3 19 FALSE 1
#4 22 FALSE 1
#5 21 FALSE 1
#6 19 FALSE 1
#7 17 TRUE 0
#8 12 TRUE 0
#9 12 TRUE 0
#10 17 TRUE 0
#11 20 FALSE 2
#12 20 FALSE 2
#13 20 FALSE 2
#14 17 TRUE 0
#15 17 TRUE 0
#16 13 TRUE 0
#17 20 FALSE 3
#18 20 FALSE 3
#19 17 TRUE 0
#20 13 TRUE 0
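
The same run-length idea can be tried on a short vector without any data frame. The sketch below uses a made-up logical vector `crit`; the variable names are assumptions, and only base R functions are used.

```r
# Hypothetical criterion vector
crit <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

runs <- rle(crit)                  # consecutive runs of TRUE / FALSE
is_true <- runs$values

# TRUE runs stay at 0; FALSE runs are numbered in order of appearance
grp <- integer(length(is_true))
grp[!is_true] <- seq_len(sum(!is_true))

runs$values <- grp
print(inverse.rle(runs))           # expand the run labels back to full length
```

Because rle() merges adjacent equal values, each block of consecutive FALSE rows receives exactly one number.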

How to mutate and map conditional on values of grouping variables?

You can use purrr::map_if() to accomplish this. It takes a predicate and applies one function (.f) to the elements for which the predicate is TRUE and another (.else) to the rest, like this:

purrr::map_if(
.x = data,
.p = ~ group2 %in% c("a", "b", "c"),
.f = ~lm(var1 ~ var2, .x),
.else = ~lm(var1 ~ 1, .x)
)

Full reprex

Here is a reprex based on your data (I added a column to verify that the logic is correct):

library(dplyr, warn.conflicts = FALSE)

tibble(
group1 = rep(letters[1:10],100),
group2 = rep(letters[1:10],100),
var1 = rnorm(1000),
var2 = rnorm(1000)
) %>%
group_by(group1, group2) %>%
tidyr::nest() %>%
mutate(
model = purrr::map_if(
.x = data,
.p = ~ group2 %in% c("a", "b", "c"),
.f = ~lm(var1 ~ var2, .x),
.else = ~lm(var1 ~ 1, .x)
)
) %>%
# Note: I add this column to verify the logic
mutate(
formula = purrr::map_chr(.x = model, ~.x$call %>% rlang::as_label())
)
#> # A tibble: 10 x 5
#> # Groups: group1, group2 [10]
#> group1 group2 data model formula
#> <chr> <chr> <list> <list> <chr>
#> 1 a a <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 2 b b <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 3 c c <tibble [100 x 2]> <lm> lm(formula = var1 ~ var2, data = .x)
#> 4 d d <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 5 e e <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 6 f f <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 7 g g <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 8 h h <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 9 i i <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
#> 10 j j <tibble [100 x 2]> <lm> lm(formula = var1 ~ 1, data = .x)
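
For readers new to map_if(), a smaller sketch on a plain list may help before the nested-model version. The list `x` and the two functions applied to it are made up for illustration; the `.else` argument assumes purrr 0.3.0 or later.

```r
library(purrr)

# Hypothetical list with numeric and character elements
x <- list(a = 1:3, b = letters[1:2], c = 4:6)

# Sum the numeric elements, count the others
out <- map_if(x, is.numeric, ~ sum(.x), .else = ~ length(.x))

str(out)
```

The result is a list with the same names as the input, so it can be stored directly in a list column.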

Ifelse with conditional on grouped data

Another possible solution, based on a nested ifelse:

library(dplyr)

example2 <- tibble::tribble(
~Group, ~Code, ~Value,
"1", "A", 1,
"1", "B", 1,
"1", "C", 5,
"2", "A", 1,
"2", "B", 5
)

example2 %>%
group_by(Group) %>%
mutate(GroupStatus = ifelse("C" %in% Code,
ifelse(Value[Code == "C"] == 5, 1, 0), 0)) %>%
ungroup

#> # A tibble: 5 × 4
#> Group Code Value GroupStatus
#> <chr> <chr> <dbl> <dbl>
#> 1 1 A 1 1
#> 2 1 B 1 1
#> 3 1 C 5 1
#> 4 2 A 1 0
#> 5 2 B 5 0
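
As a possible simplification, the two nested conditions can be folded into a single any() call per group. This is a sketch rather than the original answer's method; it rebuilds the same example data as a plain data frame and stores the result in a hypothetical object `out`.

```r
library(dplyr)

example2 <- data.frame(
  Group = c("1", "1", "1", "2", "2"),
  Code  = c("A", "B", "C", "A", "B"),
  Value = c(1, 1, 5, 1, 5)
)

# A group gets status 1 if any of its rows has Code "C" with Value 5
out <- example2 %>%
  group_by(Group) %>%
  mutate(GroupStatus = as.numeric(any(Code == "C" & Value == 5))) %>%
  ungroup()

print(out)
```

Groups without a "C" row need no special handling here, because any() of an all-FALSE vector is FALSE.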

Group by with condition using R

You can use:

library(dplyr)

df %>%
group_by(A) %>%
mutate(final = if(any(B == 'yes')) 'C' else 'U')
#Without if/else
#mutate(final = c('U', 'C')[any(B == 'yes') + 1])

# A B C final
# <chr> <chr> <chr> <chr>
#1 w yes C C
#2 x No U U
#3 w yes C C
#4 z No U U

If there are many conditions to check, you can use case_when:

df %>%
group_by(A) %>%
mutate(final = case_when(any(B == 'yes') ~'C',
TRUE ~ 'U'))
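
If dplyr is not available, the same per-group flag can be sketched in base R with ave(). The data below is made up (it adds a group with mixed answers so the effect is visible) and the variable name `any_yes` is an assumption.

```r
# Hypothetical data: group "w" has one "yes" and one "No"
df <- data.frame(
  A = c("w", "w", "x", "z"),
  B = c("yes", "No", "No", "No")
)

# ave() applies any() within each level of A and recycles the result
any_yes <- ave(df$B == "yes", df$A, FUN = any)

df$final <- ifelse(any_yes, "C", "U")
print(df)
```

Every row of a group is labelled "C" as soon as one row in that group has B equal to "yes".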

Conditional grouping by value and number of rows in R

One option with dplyr is to first assign each row to a group based on v1 and record its row number. Then, for every group with fewer than 3 rows, compare the group's mean v1 with the v1 value of the row just before the group's first row and of the row just after its last row, and reassign the group to whichever neighbor is closer. The change column is the final output.

library(dplyr)
dt1 <- dt %>%
mutate(group = case_when(v1 < 5 ~ 1,
v1 >=5 & v1 <10 ~ 2,
v1 >= 10 ~3),
row = row_number())

dt1 %>%
group_by(group) %>%
mutate(change = if (n() < 3) {
c(dt1$group[first(row) - 1L], dt1$group[last(row) + 1L])[
which.min(c(abs(mean(v1) - dt1$v1[first(row) - 1L]),
abs(mean(v1) - dt1$v1[last(row) + 1L])))]
} else group)

# v1 group row change
# <dbl> <dbl> <int> <dbl>
# 1 3 1 1 1
# 2 1 1 2 1
# 3 1 1 3 1
# 4 5 2 4 1
# 5 6 2 5 1
# 6 12 3 6 3
# 7 13 3 7 3
# 8 11 3 8 3
# 9 10 3 9 3
#10 0 1 10 1
#11 2 1 11 1
#12 1 1 12 1
#13 3 1 13 1
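
The initial binning step can also be sketched in base R with findInterval(), which may be easier to extend when there are many cut points. The v1 values below are copied from the output above; the variable names are assumptions.

```r
# v1 values as shown in the output table
v1 <- c(3, 1, 1, 5, 6, 12, 13, 11, 10, 0, 2, 1, 3)

# findInterval() counts how many break points are <= each value,
# so adding 1 gives groups 1 (< 5), 2 (5 to < 10) and 3 (>= 10)
group <- findInterval(v1, c(5, 10)) + 1
print(group)

# Run lengths show which consecutive blocks are shorter than 3 rows
runs <- rle(group)
print(runs$lengths)
print(which(runs$lengths < 3))
```

Here only the second block (the two rows with v1 of 5 and 6) is short enough to be reassigned.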
