Mutating multiple columns in a data frame using dplyr
You are really close.
df2 <-
df %>%
mutate(v1v3 = v1 * v3,
v2v4 = v2 * v4)
such a beautifully simple language, right?
For more great tricks please see here.
EDIT:
Thanks to @Facottons pointer to this answer: https://stackoverflow.com/a/34377242/5088194, here is a tidy approach to resolving this issue. It keeps one from having to write a line to hard code in each new column desired. While it is a bit more verbose than the Base R approach, the logic is at least more immediately transparent/readable. It is also worth noting that there must be at least half as many rows as there are columns for this approach to work.
# prep the product column names (also acting as row numbers)
df <-
df %>%
mutate(prod_grp = paste0("v", row_number(), "v", row_number() + 2))
# converting data to tidy format and pairing columns to be multiplied together.
tidy_df <-
df %>%
gather(column, value, -prod_grp) %>%
mutate(column = as.numeric(sub("v", "", column)),
pair = column - 2) %>%
mutate(pair = if_else(pair < 1, pair + 2, pair))
# summarize the products for each column
prod_df <-
tidy_df %>%
group_by(prod_grp, pair) %>%
summarize(val = prod(value)) %>%
spread(prod_grp, val) %>%
mutate(pair = paste0("v", pair, "v", pair + 2)) %>%
rename(prod_grp = pair)
# put the original frame and summary frames together
final_df <-
df %>%
left_join(prod_df) %>%
select(-prod_grp)
Mutate multiple columns using the dplyr framework
You can use pivot_longer
to have just one column to modify, which is an alternative to mutate(across())
.
You can use case_when
to have multiple conditions, so you do not need to nest multiple if statements. The value will be the one of the first true statement.
library(tidyverse)
apcd_hud_ex <- structure(list(studyid = 1:5, SMOKEFREE_DATE = structure(c(
16283,
16283, 16071, 16071, 16648
), class = "Date"), x2014_03_15 = c(
1,
1, 1, 0, 1
), x2014_04_15 = c(1, 1, 1, 1, 1), x2014_05_15 = c(
1,
1, 1, 1, 1
), x2014_06_15 = c(1, 1, 1, 1, 1), x2014_07_15 = c(
1,
1, 1, 1, 1
), x2014_08_15 = c(1, 1, 1, 1, 1), x2014_09_15 = c(
1,
1, 1, 1, 1
), x2014_10_15 = c(1, 1, 1, 1, 1), x2014_11_15 = c(
1,
1, 1, 1, 1
), x2014_12_15 = c(1, 1, 1, 1, 1), x2015_01_15 = c(
1,
1, 1, 1, 1
)), row.names = c(NA, -5L), class = c(
"tbl_df", "tbl",
"data.frame"
))
apcd_hud_ex %>%
pivot_longer(starts_with("x")) %>%
mutate(
insDate = name %>% str_remove("^x") %>% str_replace_all("_", "-") %>% as.Date(),
value = case_when(
value == 0 ~ 0,
insDate < SMOKEFREE_DATE ~ 1,
insDate >= SMOKEFREE_DATE ~ 2
)
) %>%
select(-insDate) %>%
pivot_wider()
#> # A tibble: 5 × 13
#> studyid SMOKEFREE_DATE x2014_03_15 x2014_04_15 x2014_05_15 x2014_06_15
#> <int> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2014-08-01 1 1 1 1
#> 2 2 2014-08-01 1 1 1 1
#> 3 3 2014-01-01 2 2 2 2
#> 4 4 2014-01-01 0 2 2 2
#> 5 5 2015-08-01 1 1 1 1
#> # … with 7 more variables: x2014_07_15 <dbl>, x2014_08_15 <dbl>,
#> # x2014_09_15 <dbl>, x2014_10_15 <dbl>, x2014_11_15 <dbl>, x2014_12_15 <dbl>,
#> # x2015_01_15 <dbl>
Created on 2022-05-05 by the reprex package (v2.0.0)
Mutate across multiple columns using dplyr
Two possibilities using dplyr
:
library(dplyr)
mtcars %>%
rowwise() %>%
mutate(varmean = mean(c_across(mpg:vs)))
This returns
# A tibble: 32 x 12
# Rowwise:
mpg cyl disp hp drat wt qsec vs am gear carb varmean
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 40.0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 40.1
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 31.7
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 52.8
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 73.2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 47.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 81.2
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 33.1
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 36.7
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 42.8
# ... with 22 more rows
and without rowwise()
and using base R
s rowMeans()
:
mtcars %>%
mutate(varmean = rowMeans(across(mpg:vs)))
returns
mpg cyl disp hp drat wt qsec vs am gear carb varmean
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 39.99750
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 40.09938
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 31.69750
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 52.76687
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 73.16375
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 47.69250
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 81.24000
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 33.12250
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 36.69625
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 42.80750
Dplyr: add multiple columns with mutate/across from character vector
The !!
works for a single element
for(nm in add_cols) test <- test %>%
mutate(!! nm := NA)
-output
> test
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA
Or another option is
test %>%
bind_cols(setNames(rep(list(NA), length(add_cols)), add_cols))
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA
In base R
, this is easier
test[add_cols] <- NA
Which can be used in a pipe
test %>%
`[<-`(., add_cols, value = NA)
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA
across
works only if the columns are already present i.e. it is suggesting to loop across
the columns present in the data and do some modification/create new columns with .names
modification
We could make use add_column
from tibble
library(tibble)
library(janitor)
add_column(test, !!! add_cols) %>%
clean_names %>%
mutate(across(all_of(add_cols), ~ NA))
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA
How to write for loop to mutate several columns using dplyr?
When you want to mutate several columns the same way, the answer is across()
, not a loop. I'm having trouble matching your code/description with your desired output, so here's a small example that (almost) matches your desired output. The difference is that I kept the original data with the original column names and added _edited
to the modified values - it's easier that way.
df %>%
mutate(across(everything(),
~ coalesce(as.integer(.x > 0), 0),
.names = "{.col}_new"
)) %>%
mutate(across(!contains("new"), I, .names = "{.col}_backup"))
# q1_1 q1_2 q1_1_new q1_2_new q1_1_backup q1_2_backup
# 1 1 2 1 1 1 2
# 2 1 2 1 1 1 2
# 3 1 2 1 1 1 2
# 4 NA NA 0 0 NA NA
# 5 0 0 0 0 0 0
You can see how the new names are defined with {.col}
being the original column name.
The colwise vignette is a good read if you want to learn more about across()
.
Return multiple columns in dplyr mutate
Well, you don't have to modify your function. Just do this
CO2 %>%
as_tibble() %>%
mutate(
conc2 = conc^2,
across(c(Treatment), one_hot)$Treatment # see here
)
Output
# A tibble: 84 x 7
Plant Type Treatment conc uptake conc2 Isnonchilled
<ord> <fct> <fct> <dbl> <dbl> <dbl> <int>
1 Qn1 Quebec nonchilled 95 16 9025 1
2 Qn1 Quebec nonchilled 175 30.4 30625 1
3 Qn1 Quebec nonchilled 250 34.8 62500 1
4 Qn1 Quebec nonchilled 350 37.2 122500 1
5 Qn1 Quebec nonchilled 500 35.3 250000 1
6 Qn1 Quebec nonchilled 675 39.2 455625 1
7 Qn1 Quebec nonchilled 1000 39.7 1000000 1
8 Qn2 Quebec nonchilled 95 13.6 9025 1
9 Qn2 Quebec nonchilled 175 27.3 30625 1
10 Qn2 Quebec nonchilled 250 37.1 62500 1
# ... with 74 more rows
For mutation across many columns,
CO2 %>%
as_tibble() %>%
mutate(
conc2 = conc^2,
bind_cols(as.list(across(starts_with("T"), one_hot)))
)
Output
# A tibble: 84 x 8
Plant Type Treatment conc uptake conc2 IsQuebec Isnonchilled
<ord> <fct> <fct> <dbl> <dbl> <dbl> <int> <int>
1 Qn1 Quebec nonchilled 95 16 9025 1 1
2 Qn1 Quebec nonchilled 175 30.4 30625 1 1
3 Qn1 Quebec nonchilled 250 34.8 62500 1 1
4 Qn1 Quebec nonchilled 350 37.2 122500 1 1
5 Qn1 Quebec nonchilled 500 35.3 250000 1 1
6 Qn1 Quebec nonchilled 675 39.2 455625 1 1
7 Qn1 Quebec nonchilled 1000 39.7 1000000 1 1
8 Qn2 Quebec nonchilled 95 13.6 9025 1 1
9 Qn2 Quebec nonchilled 175 27.3 30625 1 1
10 Qn2 Quebec nonchilled 250 37.1 62500 1 1
# ... with 74 more rows
Add two columns simulataneously via mutate
You can do this by having your function (or a wrapper function) return a data.frame
. When you call it in mutate
, don’t specify a column name (or else you’ll end up with a nested data.frame
column). If you want to specify names for the new columns, you can include them as function arguments as in the below.
library(dplyr)
n <- 1e2; M <- 1e3
variance <- 1
x <- rnorm(n*M, 0, variance)
s <- rep(1:M, each = n)
dat <- data.frame(s = s, x = x)
ci_studclt <- function(x, alpha = 0.05) {
n <- length(x)
S_n <- var(x)
mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}
ci_wrapper <- function(x, alpha = 0.05, names_out = c("ci_lower", "ci_upper")) {
ci <- ci_studclt(x, alpha = alpha)
out <- data.frame(ci[[1]], ci[[2]])
names(out) <- names_out
out
}
# original code was ci_studclt(x, variance)
# but ci_studclt() doesn't take a variance argument, so I omitted
dat %>%
group_by(s) %>%
mutate(ci_wrapper(x))
output:
# A tibble: 100,000 x 4
# Groups: s [1,000]
s x ci_lower ci_upper
<int> <dbl> <dbl> <dbl>
1 1 0.233 -0.223 0.139
2 1 1.03 -0.223 0.139
3 1 1.53 -0.223 0.139
4 1 0.0150 -0.223 0.139
5 1 -0.211 -0.223 0.139
6 1 -1.13 -0.223 0.139
7 1 -1.51 -0.223 0.139
8 1 0.371 -0.223 0.139
9 1 1.80 -0.223 0.139
10 1 -0.137 -0.223 0.139
# ... with 99,990 more rows
With specified column names:
dat %>%
group_by(s) %>%
mutate(ci_wrapper(x, names_out = c("ci.lo", "ci.hi")))
output:
# A tibble: 100,000 x 4
# Groups: s [1,000]
s x ci.lo ci.hi
<int> <dbl> <dbl> <dbl>
1 1 0.233 -0.223 0.139
2 1 1.03 -0.223 0.139
3 1 1.53 -0.223 0.139
4 1 0.0150 -0.223 0.139
5 1 -0.211 -0.223 0.139
6 1 -1.13 -0.223 0.139
7 1 -1.51 -0.223 0.139
8 1 0.371 -0.223 0.139
9 1 1.80 -0.223 0.139
10 1 -0.137 -0.223 0.139
# ... with 99,990 more rows
R Dplyr mutate new column by calculating from other columns with conditionally replaced values
In base R
, we can subtract 1 from the data, use pmin
to restrict the value greater than 3 to 3 and get the rowSums
df$x3 <- rowSums(pmin(as.matrix(df-1), 3))
-output
> df
x1 x2 x3
1 1 5 3
2 2 4 4
3 3 3 4
4 4 2 4
5 5 1 3
Related Topics
Sort Matrix According to First Column in R
R Shiny - Disable/Able Shinyui Elements
Why Do Logicals (Booleans) in R Require 4 Bytes
R: Determine If a Script Is Running in Windows or Linux
How to Create Datatable with Complex Header in R Shiny
Shiny Selectinput to Select All from Dropdown
Add Annotation and Segments to Groups of Legend Elements
Save Object Using Variable with Object Name
How to Give Color to Each Class in Scatter Plot in R
Multiple Lines for Text Per Legend Label in Ggplot2
Ggplot2 - Shade Area Between Two Vertical Lines
Calling a Function from a Namespace
How to Display Widgets Inline in Shiny
Note in R Cran Check: No Repository Set, So Cyclic Dependency Check Skipped
R: Robust Se's and Model Diagnostics in Stargazer Table
How to Fix Outofmemoryerror (Java): Gc Overhead Limit Exceeded in R