Dplyr Mutate Rowwise Max of Range of Columns

Rowwise operations across columns

For a general solution, add rowwise():

library(dplyr)

data.frame(a = c(1:5, 6:10),
           b = c(6:10, 1:5)) %>%
  rowwise() %>%
  mutate(MAX_COLUMN = max(c_across(a:b)))

# a b MAX_COLUMN
# <int> <int> <int>
# 1 1 6 6
# 2 2 7 7
# 3 3 8 8
# 4 4 9 9
# 5 5 10 10
# 6 6 1 6
# 7 7 2 7
# 8 8 3 8
# 9 9 4 9
#10 10 5 10

If you only need the maximum, a faster option is pmax with do.call:

data.frame(a = c(1:5, 6:10),
           b = c(6:10, 1:5)) %>%
  mutate(MAX_COLUMN = do.call(pmax, .))
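
Note that do.call(pmax, .) passes every column of the data frame to pmax, so it only works when all columns are comparable. As a minimal sketch, assuming a hypothetical extra id column and some missing values (neither is in the original example), the columns can be restricted with select() and na.rm = TRUE appended to the argument list:

```r
library(dplyr)

# Hypothetical data: the 'id' column and the NA values are assumptions
# added for illustration; they are not part of the original example.
df <- data.frame(id = c("x", "y", "z"),
                 a  = c(1, NA, 3),
                 b  = c(2, 5, NA))

res <- df %>%
  # select() keeps only the columns to compare; c() appends na.rm = TRUE
  # as one more named argument that do.call() hands to pmax().
  mutate(MAX_COLUMN = do.call(pmax, c(select(., a:b), na.rm = TRUE)))

res$MAX_COLUMN
```

With na.rm = TRUE, a row only yields NA when every selected column is NA in that row.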

Use max() in dplyr mutate with columns referred to by named index

Perhaps we can use pmax as a vectorized option:

library(dplyr)
library(tidyr)
summary %>%
  pivot_wider(names_from = label, values_from = average) %>%
  transmute(version, slowest = pmax(pagex, pagey))

Or, if the data should be pivoted the other way:

summary %>%
  pivot_wider(names_from = version, values_from = average) %>%
  mutate(label, slowest = pmax(`0.1`, `0.2`))

If there are more than two columns:

library(purrr)
summary %>%
  pivot_wider(names_from = version, values_from = average) %>%
  mutate(label, slowest = select(., -label) %>% reduce(pmax))
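
The summary data frame comes from the question and is not shown here. As a self-contained sketch of the reduce(pmax) approach, assuming a hypothetical long-format tibble named timings with the same three columns (version, label, average):

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for the 'summary' data frame used above.
timings <- tibble(
  version = rep(c("0.1", "0.2", "0.3"), each = 2),
  label   = rep(c("pagex", "pagey"), times = 3),
  average = c(10, 40, 30, 20, 25, 35)
)

res <- timings %>%
  pivot_wider(names_from = version, values_from = average) %>%
  # reduce(pmax) folds pmax over the version columns two at a time,
  # so it works for any number of columns.
  mutate(slowest = select(., -label) %>% reduce(pmax))

res
```

Each row of res holds one label, and slowest is the largest average across all version columns for that label.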

dplyr mutate column while checking range of columns

Here you can use mutate() with c_across() to query all the columns in your condition.

library(tidyverse)

a <- matrix(1:2, 12, 12) %>%
  as.data.frame()

a %>%
  rowwise() %>%
  mutate(col = as.numeric(!all(c_across(V1:V12) == 1)))
#> # A tibble: 12 x 13
#> # Rowwise:
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 col
#> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <dbl>
#> 1 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 2 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 3 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 4 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 5 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 6 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 7 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 8 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 9 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 10 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 11 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 12 2 2 2 2 2 2 2 2 2 2 2 2 1

As Ian Campbell points out in the comments, dplyr::if_all() can also be used here.

a %>%
  mutate(col = as.numeric(!if_all(everything(), ~ . == 1)))
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 col
#> 1 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 2 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 3 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 4 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 5 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 6 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 7 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 8 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 9 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 10 2 2 2 2 2 2 2 2 2 2 2 2 1
#> 11 1 1 1 1 1 1 1 1 1 1 1 1 0
#> 12 2 2 2 2 2 2 2 2 2 2 2 2 1

Created on 2022-02-26 by the reprex package (v2.0.1)
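
For comparison, the same flag can probably be computed without rowwise() at all. The following is a minimal sketch, assuming the same 12 x 12 data frame a as above; the rowSums() variant is a swapped-in base R technique, not part of the original answer:

```r
library(dplyr)

# Same data as above: odd rows are all 1, even rows are all 2.
a <- as.data.frame(matrix(1:2, 12, 12))

# Base R: count the cells per row that differ from 1;
# any non-zero count flags the row.
flag_base <- as.numeric(rowSums(a != 1) > 0)

# dplyr version from the answer above, for comparison.
flag_dplyr <- a %>%
  mutate(col = as.numeric(!if_all(everything(), ~ . == 1))) %>%
  pull(col)

identical(flag_base, flag_dplyr)
```

Both variants are vectorized over rows, which tends to matter once the data frame has many rows.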

Rowwise average over increasing no. of columns using for loop inside mutate : dplyr R

You can use purrr::reduce (or base::Reduce) to do the iteration.

library(tidyverse)

reduce(2:4, ~ mutate(.x, !!paste0("col1to", .y) := mean(c_across(1:.y))), .init = rowwise(a))

# A tibble: 3 x 7
# Rowwise:
# A B C D col1to2 col1to3 col1to4
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 2 3 4 1.5 2 2.5
# 2 5 6 7 8 5.5 6 6.5
# 3 9 10 11 12 9.5 10 10.5

The base::Reduce version (the \(x, y) lambda shorthand requires R 4.1 or later):

Reduce(\(x, y) mutate(x, !!paste0("col1to", y) := mean(c_across(1:y))), 2:4, init = rowwise(a))

To fix your for loop, you need to give each new column a different name. Otherwise, every new column gets the same name, i.e. "mean(c_across(1:i))", and overwrites the previous one.

b <- rowwise(a)
for (i in 2:4) {
  b <- b %>% mutate(!!paste0("col1to", i) := mean(c_across(1:i)))
}

b
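
The input a is not defined in this answer. As a runnable sketch of the fixed loop, assuming a is the four-column tibble implied by the printed output above (columns A to D, three rows):

```r
library(dplyr)

# 'a' rebuilt from the printed output above; this is an assumption
# about the original input, which is not shown.
a <- tibble(A = c(1, 5, 9), B = c(2, 6, 10), C = c(3, 7, 11), D = c(4, 8, 12))

b <- rowwise(a)
for (i in 2:4) {
  # !!paste0(...) := gives each new column its own name.
  b <- b %>% mutate(!!paste0("col1to", i) := mean(c_across(1:i)))
}
b <- ungroup(b)

# Cross-check the last column against base R row means of A to D.
check <- unname(rowMeans(a[, 1:4]))

b
```

Calling ungroup() at the end drops the rowwise grouping so later verbs operate on whole columns again.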

Another choice using tidyr::unnest_wider():

a %>%
  rowwise() %>%
  mutate(mean = list(cummean(c_across(1:4))[-1])) %>%
  unnest_wider(mean, names_sep = "_")

How to use dplyr to get column with max value for each row

I think you can try max.col, as below:

mat_hard %>%
  mutate(Class = names(.)[max.col(.)])

which gives

           V1         V2          V3         V4         V5 Class
1 2.1253660 26.4273345 13.72893486 1.7513873 6.2979783 V2
2 1.8323489 10.2414409 5.30846484 0.3347235 29.2477737 V5
3 6.1496678 5.6894422 5.45460715 4.5035747 11.6467207 V5
4 3.5326145 10.3974636 6.55605448 4.4221171 1.7139087 V2
5 21.1107027 9.9280219 0.22849661 0.2101213 1.0334978 V1
6 2.8714553 4.7813566 3.32465853 15.8878010 4.0049670 V4
7 0.6601019 14.6976125 1.37343714 13.4155430 7.5144204 V2
8 5.3986340 9.9330388 28.30681662 5.9243824 8.6695885 V3
9 7.1672128 0.1135649 0.02006355 7.4839158 27.4311080 V5
10 0.3579145 3.3261009 3.59446750 11.3528078 31.4819959 V5
11 3.5569986 1.4915687 11.81571650 12.5108163 10.5650964 V4
12 15.6411692 14.9843178 13.01627289 1.4870455 13.9162441 V1
13 4.0105209 11.6297626 14.03933859 9.1182125 16.6013583 V5
14 0.8267777 19.6671308 25.39573774 1.5730764 22.6813765 V3
15 16.0518859 7.9446867 5.52230477 6.9886905 31.3423870 V5
16 11.1804892 1.2474887 32.80866682 6.0927374 5.4666769 V3
17 1.9020065 0.8736180 0.76056537 6.2290362 22.8229062 V5
18 0.4354699 4.8834713 1.48728908 2.7705605 5.1947573 V5
19 13.9564746 0.4376033 32.46160917 33.5775243 3.6361463 V4
20 0.9488887 11.3126093 21.76888266 1.1800891 9.1619501 V3
21 0.4105029 30.8768108 6.77986834 6.4456033 3.3375528 V2
22 4.8383899 3.3213757 8.96105336 5.3539974 2.9596863 V3
23 23.5980692 0.8854953 15.77773701 17.3438544 3.6268837 V1
24 5.7302813 20.6837055 0.23557108 3.8622885 1.9313057 V2
25 23.7223308 1.6956027 1.29271191 3.6884809 3.7486600 V1
26 0.8390799 11.1018979 16.73708472 1.0896291 5.1491888 V3
27 6.4742757 15.4374730 8.76199843 0.3349979 2.2843753 V2
28 3.0712249 2.8939230 8.65244642 3.1096128 1.3245159 V3
29 8.4365271 30.2740673 30.79814652 5.8697589 1.8603535 V3
30 15.6024932 5.5718871 4.07631202 24.6346215 35.3187257 V5
31 3.7759064 1.6237925 13.80958004 7.4002858 10.5098296 V3
32 2.3559053 8.5405451 11.09127093 16.6616195 10.9618053 V4
33 21.7985378 18.3840789 1.24258382 32.7283077 1.8425573 V4
34 0.5718545 22.2466535 7.35903634 5.6994226 31.8928204 V5
35 0.8731764 11.4922204 1.36448644 0.2167550 8.1839797 V2
36 4.7162801 10.8743625 33.72675944 1.7916643 4.5028127 V3
37 13.7097611 16.1319530 0.84351757 8.1407995 5.7692484 V2
38 0.5347331 7.1313409 10.23327786 24.1837711 0.2850878 V4
39 0.3738863 12.0495186 4.61309257 6.2158783 5.7180108 V2
40 18.9056686 1.7171729 4.53560492 0.8193901 7.8306692 V1
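
One caveat: by default max.col() breaks ties at random (ties.method = "random"), so rows with repeated maxima can give different results between runs. A minimal sketch, assuming a hypothetical small data frame with a tie in its second row:

```r
library(dplyr)

# Hypothetical data (not from the original question): row 2 has a tie
# between V1 and V3.
df <- data.frame(V1 = c(1, 7, 2),
                 V2 = c(5, 3, 9),
                 V3 = c(4, 7, 1))

res <- df %>%
  # ties.method = "first" always picks the leftmost maximum,
  # which makes the result reproducible.
  mutate(Class = names(.)[max.col(., ties.method = "first")])

res
```

Use ties.method = "last" instead if the rightmost maximum should win.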

dplyr mutate find max of n next values in column

We can use rollmax from the zoo library with align = "left", so that each window starts at the current observation and includes the following two observations:

library(zoo)
my_tbl %>%
  mutate(max_3_next = rollmax(z, 3, fill = NA, align = "left"))


# A tibble: 250 x 4
x y z max_3_next
<chr> <int> <dbl> <dbl>
1 a 1 45.9 64.4
2 a 2 52.3 84.5
3 a 3 64.4 84.5
4 a 4 84.5 84.5
5 a 5 42.1 86.7
6 a 6 83.9 86.7
7 a 7 86.7 86.7
8 a 8 69.7 69.7
9 a 9 67.8 67.8
10 a 10 33.7 42.3
# ... with 240 more rows

Sorry, I believe I misunderstood the OP. Here is the corrected solution, I hope, inspired by Joshua Ulrich's answer to this question. I will keep the previous answer in case future readers need it.

my_tbl %>%
  mutate(max_3_next = rollapply(z, list(1:3), max, fill = NA, align = "left", partial = TRUE))

# A tibble: 250 x 4
x y z max_3_next
<chr> <int> <dbl> <dbl>
1 a 1 45.9 84.5
2 a 2 52.3 84.5
3 a 3 64.4 84.5
4 a 4 84.5 86.7
5 a 5 42.1 86.7
6 a 6 83.9 86.7
7 a 7 86.7 69.7
8 a 8 69.7 67.8
9 a 9 67.8 42.3
10 a 10 33.7 71.2
# ... with 240 more rows
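
If adding zoo is not an option, the same "maximum of the next three values" can be sketched with dplyr::lead() and pmax(). This is a swapped-in technique rather than the answer's method, and the five-row my_tbl below is only a stand-in built from the first z values printed above:

```r
library(dplyr)

# Stand-in for my_tbl: the first five z values from the output above.
my_tbl <- tibble(z = c(45.9, 52.3, 64.4, 84.5, 42.1))

res <- my_tbl %>%
  # lead(z, k) shifts z up by k rows; pmax() then takes the element-wise
  # maximum of the three shifted copies. na.rm = TRUE ignores the NAs
  # that lead() pads in near the end of the column.
  mutate(max_3_next = pmax(lead(z, 1), lead(z, 2), lead(z, 3), na.rm = TRUE))

res
```

The last row has no following values, so pmax() leaves it as NA. Rows near the end differ from the printed output above only because this stand-in stops after five rows.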

dplyr rowwise by some columns

You can combine tidyr's spread and gather with dplyr to get the following single pipeline:

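library(dplyr)
library(tidyr)

x <- data.frame(A = LETTERS[1:5], as.data.frame(matrix(sample(0:5, 25, TRUE), ncol = 5)))

y <- x %>%
  gather(V, val, -A) %>%               # wide to long: one row per (A, V) pair
  group_by(A) %>%
  mutate(perc = val / sum(val)) %>%    # share of each value within its row
  select(-val) %>%
  spread(V, perc)                      # long back to wide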

With tidy data it's quite easy to get any group-wise sum (rows, columns or any nested index-level) and compute percentages. The spread and gather will get you to and from your input data format.
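
Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same pipeline with the newer functions, assuming hypothetical fixed values instead of the random sample so the result is reproducible:

```r
library(dplyr)
library(tidyr)

# Hypothetical fixed data; the original example draws random values.
x <- data.frame(A  = LETTERS[1:3],
                V1 = c(1, 2, 3),
                V2 = c(1, 2, 1),
                V3 = c(2, 0, 0))

y <- x %>%
  pivot_longer(-A, names_to = "V", values_to = "val") %>%  # wide to long
  group_by(A) %>%
  mutate(perc = val / sum(val)) %>%                        # share within each A
  ungroup() %>%
  select(-val) %>%
  pivot_wider(names_from = V, values_from = perc)          # long back to wide

y
```

Each row of y now sums to 1 across V1 to V3. A row whose values are all zero would produce NaN, which the random data in the original example can occasionally trigger.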

dplyr rowwise sum and other functions like max

In short: you are expecting the "sum" function to be aware of dplyr data structures like a data frame grouped by row. sum is not aware of it so it just takes the sum of the whole data.frame.

Here is a brief explanation. This:

select(iris, starts_with('Petal')) %>% rowwise() %>% sum()

Can be rewritten without using the pipe operator as the following:

data <- select(iris, starts_with('Petal'))
data <- rowwise(data)
sum(data)

As you can see, you were constructing something called a tibble. The rowwise call then adds information to this object, specifying that it should be grouped row-wise.

However, only functions that are aware of this grouping, such as summarize and mutate, work as intended. Base R functions like sum are not aware of these objects and treat them like any standard data frame. And the standard behavior of sum() is to sum the entire data frame.

Using mutate works:

select(iris, starts_with('Petal')) %>%
  rowwise() %>%
  mutate(sum = sum(Petal.Width, Petal.Length))

Result:

Source: local data frame [150 x 3]
Groups: <by row>

# A tibble: 150 x 3
Petal.Length Petal.Width sum
<dbl> <dbl> <dbl>
1 1.40 0.200 1.60
2 1.40 0.200 1.60
3 1.30 0.200 1.50
...
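
To make the contrast concrete, here is a minimal sketch on the same iris columns. It compares base sum() on the rowwise object, a rowwise mutate() using c_across(), and rowSums() as a vectorized alternative that the original answer does not mention:

```r
library(dplyr)

petal <- select(iris, starts_with("Petal"))

# Base sum() ignores the rowwise grouping and returns a single number.
total <- sum(rowwise(petal))

# Inside mutate(), c_across() collects the current row's values,
# so sum() sees one row at a time.
by_row <- petal %>%
  rowwise() %>%
  mutate(sum = sum(c_across(starts_with("Petal")))) %>%
  ungroup()

# Vectorized alternative that needs no rowwise grouping.
fast <- rowSums(petal)

length(total)
head(by_row)
```

For plain sums and means, rowSums() and rowMeans() are usually much faster than rowwise(); rowwise() is mainly worth it for functions that have no vectorized row equivalent.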

