Using mutate rowwise over a subset of columns
Here's one way to approach row-wise computation in the tidyverse using purrr::pmap. This is best used with functions that actually need to be run row by row; simple addition could probably be done a faster way. Basically, we use select to provide the input list to pmap, which lets us use the select helpers such as starts_with, or matches if you need regex.
library(tidyverse)
df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)
df %>%
  mutate(
    SumA = pmap_dbl(
      .l = select(., starts_with("A")),
      .f = function(...) sum(...)
    )
  )
#> # A tibble: 3 x 5
#> ID A1 A2 A3 SumA
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 one 1 2 3 6
#> 2 two 1 2 3 6
#> 3 three 1 2 3 6
Created on 2019-01-30 by the reprex package (v0.2.1)
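For the plain-sum case, where the answer notes that a faster way probably exists, a vectorized version might look like the sketch below. It assumes dplyr 1.0.0 or later (for across()) and rebuilds the same df; rowSums handles the whole selection at once instead of calling a function once per row.

```r
library(dplyr)

df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

# across() returns the selected columns as a tibble; rowSums() adds them up per row
res <- df %>%
  mutate(SumA = rowSums(across(starts_with("A"))))
```

This only covers functions that already have a vectorized row-wise form (rowSums, rowMeans, pmax, ...); for anything else the pmap approach still applies.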
Mutate, row-wise, based on matching strings or NA across subset of columns
An efficient, vectorized base R option is rowSums on a logical matrix:
nm1 <- startsWith(names(df), 'b')
df$b_is_na <- rowSums(df[nm1] == 'c'|is.na(df[nm1])) > 0
df$b_is_na
#[1] FALSE TRUE TRUE TRUE
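The definition of df is not shown in this answer, so the sketch below rebuilds it from the printed output further down (an assumption) and splits the one-liner into its intermediate logical matrices. It relies on the fact that NA | TRUE is TRUE in R, so missing cells never leave an NA in the combined matrix.

```r
# Data reconstructed from the printed tibble; the original definition is not shown
df <- data.frame(
  a  = c("a", "a", "a", NA),
  b1 = c("b", "c", NA, NA),
  b2 = c("d", NA, NA, NA),
  b3 = c("e", NA, NA, NA),
  b4 = c("f", NA, NA, NA)
)

b_cols     <- startsWith(names(df), "b")
is_c       <- df[b_cols] == "c"     # logical matrix; NA where the cell is missing
is_missing <- is.na(df[b_cols])     # logical matrix; TRUE where the cell is missing
hit        <- is_c | is_missing     # NA | TRUE evaluates to TRUE, so no NA remains

# A row is flagged when at least one of its "b" cells is "c" or NA
df$b_is_na <- rowSums(hit) > 0
```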
It can also be used with mutate:
library(dplyr)
df %>%
  mutate(b_is_na = rowSums(select(., starts_with('b')) == 'c' |
                             is.na(select(., starts_with('b')))) > 0)
# A tibble: 4 x 6
# a b1 b2 b3 b4 b_is_na
# <chr> <chr> <chr> <chr> <chr> <lgl>
#1 a b d e f FALSE
#2 a c <NA> <NA> <NA> TRUE
#3 a <NA> <NA> <NA> <NA> TRUE
#4 <NA> <NA> <NA> <NA> <NA> TRUE
NOTE: Using rowwise would be an inefficient way. It can also be done with c_across, but that may not be optimal either:
df %>%
  rowwise %>%
  mutate(b_is_na = {
    tmp <- c_across(starts_with('b'))
    any(is.na(tmp) | tmp == 'c')
  }) %>%
  ungroup
# A tibble: 4 x 6
# a b1 b2 b3 b4 b_is_na
# <chr> <chr> <chr> <chr> <chr> <lgl>
#1 a b d e f FALSE
#2 a c <NA> <NA> <NA> TRUE
#3 a <NA> <NA> <NA> <NA> TRUE
#4 <NA> <NA> <NA> <NA> <NA> TRUE
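If the installed dplyr is 1.0.4 or later, if_any() gives a column-wise alternative that avoids rowwise altogether. The sketch below is a suggestion, not part of the original answer; the data is again reconstructed from the printed output.

```r
library(dplyr)

# Data reconstructed from the printed tibble (assumption)
df <- tibble(
  a  = c("a", "a", "a", NA),
  b1 = c("b", "c", NA, NA),
  b2 = c("d", NA, NA, NA),
  b3 = c("e", NA, NA, NA),
  b4 = c("f", NA, NA, NA)
)

# The predicate is applied to each "b" column as a whole vector;
# if_any() then combines the per-column results with OR for every row
res <- df %>%
  mutate(b_is_na = if_any(starts_with("b"), ~ is.na(.x) | .x == "c"))
```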
Performing dplyr mutate on subset of columns
Am I missing something or would this work as expected:
cols <- paste0("X", c(2,4))
dd %>% mutate(evensum = rowSums(.[cols]), evenmean = rowMeans(.[cols]))
# id X1 X2 X3 X4 X5 evensum evenmean
#1 a 0.6021140 0.3670719 0.6872308 0.5090904 0.4474437 0.8761623 0.4380811
#2 b 0.1950439 0.9888592 0.8314290 0.7066286 0.9646670 1.6954878 0.8477439
#3 c 0.9664587 0.8151934 0.1046694 0.8623137 0.1411871 1.6775071 0.8387535
#4 d 0.6509055 0.2539684 0.6461509 0.8417851 0.7767125 1.0957535 0.5478768
Or are you specifically looking for a custom function to do this?
Not exactly what you are looking for, but if you want to do it inside a pipe you could use select explicitly inside mutate, like this:
dd %>% mutate(xy = select(., num_range("X", c(2,4))) %>% rowSums)
# id X1 X2 X3 X4 X5 xy
#1 a 0.6021140 0.3670719 0.6872308 0.5090904 0.4474437 0.8761623
#2 b 0.1950439 0.9888592 0.8314290 0.7066286 0.9646670 1.6954878
#3 c 0.9664587 0.8151934 0.1046694 0.8623137 0.1411871 1.6775071
#4 d 0.6509055 0.2539684 0.6461509 0.8417851 0.7767125 1.0957535
However, it is a bit more complicated if you want to apply several functions. You could use a helper function along these lines (not thoroughly tested):
f <- function(x, ...) {
  n <- nrow(x)
  x <- lapply(list(...), function(y) if (length(y) == 1L) rep(y, n) else y)
  matrix(unlist(x), nrow = n, byrow = FALSE)
}
And then apply it like this:
dd %>% mutate(xy = select(., num_range("X", c(2,4))) %>% f(., rowSums(.), max(.)))
# id X1 X2 X3 X4 X5 xy.1 xy.2
#1 a 0.6021140 0.3670719 0.6872308 0.5090904 0.4474437 0.8761623 0.9888592
#2 b 0.1950439 0.9888592 0.8314290 0.7066286 0.9646670 1.6954878 0.9888592
#3 c 0.9664587 0.8151934 0.1046694 0.8623137 0.1411871 1.6775071 0.9888592
#4 d 0.6509055 0.2539684 0.6461509 0.8417851 0.7767125 1.0957535 0.9888592
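A possible alternative to the matrix-returning helper, assuming dplyr 1.0.0 or later: an unnamed mutate argument that evaluates to a data frame is spliced into separate, properly named columns. The dd below and the column names evensum and evenmax are hypothetical, since the original data is not shown.

```r
library(dplyr)

# Hypothetical stand-in for dd
dd <- tibble(
  id = c("a", "b"),
  X1 = c(1, 2), X2 = c(3, 5), X3 = c(0, 0), X4 = c(4, 1), X5 = c(9, 9)
)

res <- dd %>%
  mutate({
    # Select the subset once, then derive as many summaries as needed
    m <- as.matrix(across(num_range("X", c(2, 4))))
    tibble(
      evensum = rowSums(m),  # one value per row
      evenmax = max(m)       # a single overall value, recycled to every row
    )
  })
```

Compared with the helper f, the new columns get explicit names instead of xy.1 and xy.2.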
Conditional rowwise sum of subset of columns in specific row dplyr
If you want to efficiently update a single row (or a small subset of rows), I would use direct assignment, not dplyr.
var_cols = grep("var[0-9]+", names(df), value = TRUE)
recalc_id = 2
df[df$ID %in% recalc_id, "var_total"] = apply(df[df$ID %in% recalc_id, var_cols], 1, \(x) sum(x[x > 0]))
As akrun points out in the comments, if it is just a single row, the apply can be skipped:
i = which(df$ID == recalc_id)
row = unlist(df[i, var_cols])
df$var_total[i] = sum(row[row > 0])
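Since the data frame itself is not shown, here is a small self-contained sketch of the direct-assignment approach. The column values and the choice of IDs 2 and 3 are made up for illustration.

```r
# Hypothetical data: three var columns plus a var_total column to be refreshed
df <- data.frame(
  ID        = 1:3,
  var1      = c(1, -2, 3),
  var2      = c(-1, 5, 2),
  var3      = c(4, 6, -7),
  var_total = c(0, 0, 0)
)

# Anchored pattern so that var_total itself is never picked up
var_cols   <- grep("^var[0-9]+$", names(df), value = TRUE)
recalc_ids <- c(2, 3)
rows       <- df$ID %in% recalc_ids

# Sum only the positive values, and only for the selected rows
df$var_total[rows] <- apply(df[rows, var_cols], 1, function(x) sum(x[x > 0]))
```

Row 1 keeps its old var_total because it is not among the recalculated IDs.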
Here's the same thing with dplyr::case_when, for a dplyr solution:
df = df %>%
  rowwise() %>%
  mutate(var_total = case_when(
    ID %in% 2 ~ {
      # matches() takes a regex; starts_with() would treat it as a literal prefix
      x <- c_across(matches('^var[0-9]+$'))
      sum(x[x > 0])
    },
    TRUE ~ var_total
  )
  )
(Note that in both cases we need to change the column name pattern to not include var_total in the sum.)
rowwise breaks some vectorization and slows things down, so if you are so concerned about efficiency that recalculating the sum is "too slow", I'd strongly recommend the base solution. You might even find a non-conditional base solution to be plenty fast enough for this row-wise operation.
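One way such a non-conditional base solution could look is sketched below: negative values are clamped to zero with pmax, after which rowSums recomputes every row in one vectorized pass. The data is hypothetical.

```r
# Hypothetical data with the same layout as in the question
df <- data.frame(
  ID   = 1:3,
  var1 = c(1, -2, 3),
  var2 = c(-1, 5, 2),
  var3 = c(4, 6, -7)
)

var_cols <- grep("^var[0-9]+$", names(df), value = TRUE)

# pmax(m, 0) keeps the matrix shape and replaces negatives with 0,
# so the row sum equals the sum of the positive values in each row
df$var_total <- rowSums(pmax(as.matrix(df[var_cols]), 0))
```

Note that zeros contribute nothing either way, so this matches sum(x[x > 0]) for every row.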
Combine: rowwise(), mutate(), across(), for multiple functions
Using pmap() from purrr might be preferable, since you need to select the data just once and you can use the select helpers:
df %>%
  mutate(pmap_dfr(across(where(is.numeric)),
                  ~ data.frame(max = max(c(...)),
                               min = min(c(...)),
                               avg = mean(c(...)))))
a b c d e max min avg
<int> <int> <int> <chr> <int> <int> <int> <dbl>
1 1 6 11 a 1 11 1 4.75
2 2 7 12 b 2 12 2 5.75
3 3 8 13 c 3 13 3 6.75
4 4 9 14 d 4 14 4 7.75
5 5 10 15 e 5 15 5 8.75
Or with the addition of tidyr:
df %>%
  mutate(res = pmap(across(where(is.numeric)),
                    ~ list(max = max(c(...)),
                           min = min(c(...)),
                           avg = mean(c(...))))) %>%
  unnest_wider(res)
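For these three particular summaries there are vectorized equivalents (pmax, pmin, rowMeans), so a version without per-row function calls is possible. This is a sketch only; df is reconstructed from the printed output above, and the approach does not generalize to arbitrary functions the way pmap does.

```r
library(dplyr)

# Data reconstructed from the printed result (assumption)
df <- tibble(a = 1:5, b = 6:10, c = 11:15, d = letters[1:5], e = 1:5)

# Select the numeric columns once and reuse them
num_cols <- df %>% select(where(is.numeric))
col_list <- unname(as.list(num_cols))

res <- df %>%
  mutate(
    max = do.call(pmax, col_list),  # element-wise maximum across the columns
    min = do.call(pmin, col_list),  # element-wise minimum across the columns
    avg = rowMeans(num_cols)
  )
```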
Is there a way to use rowwise to get means across rows the correct way?
You need to wrap your columns in c_across:
mydf %>%
  rowwise() %>%
  mutate(allmeanrow = mean(c_across(a1:a3))) %>%
  ungroup()
which gives:
# A tibble: 10 x 4
a1 a2 a3 allmeanrow
<dbl> <dbl> <dbl> <dbl>
1 3 9 4 5.33
2 4 7 7 6
3 8 7 4 6.33
4 5 6 5 5.33
5 5 4 6 5
6 8 11 2 7
7 5 7 10 7.33
8 2 0 8 3.33
9 3 8 4 5
10 4 4 11 6.33
Note: I would always ungroup after a rowwise operation, because rowwise groups your data by row, so any subsequent operation would still be performed row by row.
See also here: https://dplyr.tidyverse.org/articles/rowwise.html
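To illustrate why the ungroup matters, here is a small sketch using the first three rows of the output above. The column name total_a1 is made up for the demonstration.

```r
library(dplyr)

mydf <- tibble(a1 = c(3, 4, 8), a2 = c(9, 7, 7), a3 = c(4, 7, 4))

rw <- mydf %>%
  rowwise() %>%
  mutate(allmeanrow = mean(c_across(a1:a3)))

# Still row-wise: sum(a1) only ever sees one value at a time
still_rowwise <- rw %>% mutate(total_a1 = sum(a1))

# After ungroup(): sum(a1) sees the whole column again
ungrouped <- rw %>% ungroup() %>% mutate(total_a1 = sum(a1))
```

In still_rowwise the new column simply repeats a1, while in ungrouped it holds the column total in every row.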
Rowwise average over increasing no. of columns using for loop inside mutate : dplyr R
You can use purrr::reduce (or base::Reduce) to do the iteration.
library(tidyverse)
reduce(2:4, ~ mutate(.x, !!paste0("col1to", .y) := mean(c_across(1:.y))), .init = rowwise(a))
# A tibble: 3 x 7
# Rowwise:
A B C D col1to2 col1to3 col1to4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 3 4 1.5 2 2.5
2 5 6 7 8 5.5 6 6.5
3 9 10 11 12 9.5 10 10.5
base::Reduce version:
Reduce(\(x, y) mutate(x, !!paste0("col1to", y) := mean(c_across(1:y))), 2:4, init = rowwise(a))
To fix your for loop, you need to give each new column a different name. Otherwise, every new column will have the same name, i.e. "mean(c_across(1:i))", and will overwrite the previous one.
b <- rowwise(a)
for(i in 2:4) {
  b <- b %>% mutate(!!paste0("col1to", i) := mean(c_across(1:i)))
}
b
Another option, using tidyr::unnest_wider():
a %>%
  rowwise() %>%
  mutate(mean = list(cummean(c_across(1:4))[-1])) %>%
  unnest_wider(mean, names_sep = "_")
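If row-wise grouping turns out to be slow, the same running means can be computed column-wise in base R, since the mean of the first k columns is just their running sum divided by k. This is a sketch, with a reconstructed from the printed output above.

```r
# Data reconstructed from the printed result (assumption)
a <- data.frame(
  A = c(1, 5, 9),
  B = c(2, 6, 10),
  C = c(3, 7, 11),
  D = c(4, 8, 12)
)

# Element k of the list is A + B + ... + (k-th column), computed as whole vectors
running_sums <- Reduce(`+`, a, accumulate = TRUE)

# Divide each running sum by the number of columns it covers
for (k in 2:4) {
  a[[paste0("col1to", k)]] <- running_sums[[k]] / k
}
```

The running sums are taken before any new column is added, so the new columns never feed back into the calculation.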
Row-wise cor() on subset of columns using dplyr::mutate()
You could try
df %>%
  rowwise() %>%
  do(data.frame(., Cor=cor(unlist(.[1:3]), unlist(.[4:6]))))
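Since do() is superseded as of dplyr 1.0.0, the same idea might be written with c_across, as sketched below. The data and the column names x1..x3 and y1..y3 are hypothetical; the only thing carried over from the answer is that columns 1 to 3 are correlated with columns 4 to 6.

```r
library(dplyr)

# Hypothetical data: three "x" columns followed by three "y" columns
df <- tibble(
  x1 = c(1, 2), x2 = c(2, 4), x3 = c(3, 7),
  y1 = c(3, 1), y2 = c(2, 2), y3 = c(1, 3)
)

res <- df %>%
  rowwise() %>%
  # c_across() turns each positional selection into a plain vector for the current row
  mutate(Cor = cor(c_across(1:3), c_across(4:6))) %>%
  ungroup()
```

In the first row the y values decrease exactly as the x values increase, so the correlation there is -1.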
How to change the direction of application of mutate across from column-wise to row-wise?
Try this. (Note row 4 also sums across to 1.)
library(tidyverse)
data <-
  data.frame(
    To = c("A","B","C"),
    A = c(1,2,4),
    B = c(3,4,5),
    C = c(5,6,7)
  )
data <- data %>%
  replace(is.na(.), 0) %>%
  bind_rows(summarise_all(., ~(if(is.numeric(.)) sum(.) else "Sum")))
data <- cbind(data, Sum = rowSums(data[,-1]))
data %>%
  rowwise() %>%
  mutate(across(A:Sum, ~ sum(.) / Sum))
#> # A tibble: 4 × 5
#> # Rowwise:
#> To A B C Sum
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 A 0.111 0.333 0.556 1
#> 2 B 0.167 0.333 0.5 1
#> 3 C 0.25 0.312 0.438 1
#> 4 Sum 0.189 0.324 0.486 1
Created on 2022-05-04 by the reprex package (v2.0.1)
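Because the Sum column already holds one total per row, dividing each column by it is element-wise and therefore already row-wise, so rowwise() may not be needed at all. The sketch below starts from the data after the Sum row has been appended (values computed from the example above) and leaves the Sum column itself untouched, unlike the version above.

```r
library(dplyr)

# The example data with the "Sum" row already appended
data <- data.frame(
  To = c("A", "B", "C", "Sum"),
  A  = c(1, 2, 4, 7),
  B  = c(3, 4, 5, 12),
  C  = c(5, 6, 7, 18)
)
data$Sum <- rowSums(data[, -1])

# Each column is divided by the Sum column element by element
res <- data %>%
  mutate(across(A:C, ~ .x / Sum))
```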
How to get the selected max/min value (i.e. second largest/smallest) across row by dplyr
You can use c_across along with sort. The use of rev here reverses the sorted data, making it easy to select the largest value with index 1, the second-largest with index 2, etc.
Note that column "max2" in your example output contains errors in certain rows (I think you may have been including the "max1" column in some cases).
a %>%
  rowwise() %>%
  mutate(
    max1 = max(gp1, gp2, gp3),
    max2 = rev(sort(c_across(c(gp1, gp2, gp3))))[2]
  )
gp1 gp2 gp3 max1 max2
<int> <int> <dbl> <dbl> <dbl>
1 3 1 8 8 3
2 4 2 8 8 4
3 5 3 2 5 3
4 6 4 6 6 6
5 7 5 6 7 6
6 8 6 6 8 6
7 9 7 12 12 9
8 10 8 12 12 10
9 11 9 6 11 9
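The same indexing idea can be wrapped in a small helper so that any rank can be requested. nth_largest is a hypothetical name, and a is reconstructed from the output above; sort(decreasing = TRUE) is used in place of rev(sort()) and gives the same ordering.

```r
library(dplyr)

# Data reconstructed from the printed result (assumption)
a <- tibble(
  gp1 = 3:11,
  gp2 = 1:9,
  gp3 = c(8, 8, 2, 6, 6, 6, 12, 12, 6)
)

# Hypothetical helper: n-th largest value of a vector (ties count separately)
nth_largest <- function(x, n) sort(x, decreasing = TRUE)[n]

res <- a %>%
  rowwise() %>%
  mutate(
    max2 = nth_largest(c_across(gp1:gp3), 2),
    max3 = nth_largest(c_across(gp1:gp3), 3)
  ) %>%
  ungroup()
```

With only three columns, max3 is simply the row minimum; for the second-smallest value in a wider table, sort(x)[2] works the same way.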