apply a function to several columns at once with mutate
I think it was nearly there. I've used mutate_at
(I think mutate_each
is deprecated) and included the variable names inside vars
:
library(dplyr)
x %>% mutate_at(vars(ADR:star_rating), funs(stringr::str_replace_all(., ",", "")))
#> X. ADR hotel_id city_id star_rating accommodation_type_name
#> 1 4,084 1099.69 2313076 9395 5 Hotel
#> 2 4,084 68.66 583666 17193 2 Bungalow
#> 3 4,084 232.72 1251372 5085 3 Hotel
#> 4 4,084 195.66 1545890 16808 4 Hotel
#> 5 4,084 98 298160 8584 4 Hotel
#> chain_hotel booking_date checkin_date checkout_date city
#> 1 chain 10/5/2016 10/27/2016 10/30/2016 A
#> 2 non-chain 12/4/2016 12/9/2016 12/12/2016 B
#> 3 non-chain 11/6/2016 11/18/2016 11/20/2016 C
#> 4 non-chain 10/22/2016 11/3/2016 11/4/2016 D
#> 5 non-chain 12/11/2016 12/11/2016 12/12/2016 E
How to mutate multiple columns as function of multiple columns systematically?
You can do something like this with mutate(across...
, however, for renaming columns there must be a shortcut.
df %>%
mutate(across(.cols = c(varA.t1, varA.t2),
.fns = ~ .x / get(glue::glue(str_replace(cur_column(), "varA", "varB"))),
.names = "V_{.col}")) %>%
rename_with(~str_replace(., "V_varA", "varC"), starts_with("V_"))
# A tibble: 2 x 7
id varA.t1 varA.t2 varB.t1 varB.t2 varC.t1 varC.t2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 row_1 5 10 2 4 2.5 2.5
2 row_2 20 50 4 6 5 8.33
If there is a long time series you can also create a vector for .cols
beforehand.
Use mutate_at with contains to apply function on multiple columns
Use matches
library(dplyr)
df %>%
mutate_at(vars(matches('a', 'b')), sqrt)
Or specify the match
as a character vector as the documentation says
match - A character vector. If length > 1, the union of the matches is taken.
df %>%
mutate_at(vars(contains(match = c('a', 'b'))), sqrt)
ab ba c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20
_at/_all
are deprecated in favor of across
df %>%
mutate(across(matches('a', 'b'), sqrt))
-output
ab ba c dc
1 1.000000 2.449490 11 16
2 1.414214 2.645751 12 17
3 1.732051 2.828427 13 18
4 2.000000 3.000000 14 19
5 2.236068 3.162278 15 20
Using functions of multiple columns in a dplyr mutate_at call
This was answered by @eipi10 in @eipi10's comment on the question, but I'm writing it here for posterity.
The solution here is to use:
df %>%
mutate_at(.vars = vars(y, z),
.funs = list(~ ifelse(x, ., NA)))
You can also use the new across()
function with mutate()
, like so:
df %>%
mutate(across(c(y, z), ~ ifelse(x, ., NA)))
The use of the formula operator (as in ~ ifelse(...)
) here indicates that ifelse(x, ., NA)
is an anonymous function that is being defined within the call to mutate_at()
.
This works similarly to defining the function outside of the call to mutate_at()
, like so:
temp_fn <- function(input) ifelse(test = df[["x"]],
yes = input,
no = NA)
df %>%
mutate_at(.vars = vars(y, z),
.funs = temp_fn)
Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA))
, but the funs()
function is being deprecated and will soon be removed from dplyr.
Mutate multiple columns using the dplyr framework
You can use pivot_longer
to have just one column to modify, which is an alternative to mutate(across())
.
You can use case_when
to have multiple conditions, so you do not need to nest multiple if statements. The value will be the one of the first true statement.
library(tidyverse)
apcd_hud_ex <- structure(list(studyid = 1:5, SMOKEFREE_DATE = structure(c(
16283,
16283, 16071, 16071, 16648
), class = "Date"), x2014_03_15 = c(
1,
1, 1, 0, 1
), x2014_04_15 = c(1, 1, 1, 1, 1), x2014_05_15 = c(
1,
1, 1, 1, 1
), x2014_06_15 = c(1, 1, 1, 1, 1), x2014_07_15 = c(
1,
1, 1, 1, 1
), x2014_08_15 = c(1, 1, 1, 1, 1), x2014_09_15 = c(
1,
1, 1, 1, 1
), x2014_10_15 = c(1, 1, 1, 1, 1), x2014_11_15 = c(
1,
1, 1, 1, 1
), x2014_12_15 = c(1, 1, 1, 1, 1), x2015_01_15 = c(
1,
1, 1, 1, 1
)), row.names = c(NA, -5L), class = c(
"tbl_df", "tbl",
"data.frame"
))
apcd_hud_ex %>%
pivot_longer(starts_with("x")) %>%
mutate(
insDate = name %>% str_remove("^x") %>% str_replace_all("_", "-") %>% as.Date(),
value = case_when(
value == 0 ~ 0,
insDate < SMOKEFREE_DATE ~ 1,
insDate >= SMOKEFREE_DATE ~ 2
)
) %>%
select(-insDate) %>%
pivot_wider()
#> # A tibble: 5 × 13
#> studyid SMOKEFREE_DATE x2014_03_15 x2014_04_15 x2014_05_15 x2014_06_15
#> <int> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2014-08-01 1 1 1 1
#> 2 2 2014-08-01 1 1 1 1
#> 3 3 2014-01-01 2 2 2 2
#> 4 4 2014-01-01 0 2 2 2
#> 5 5 2015-08-01 1 1 1 1
#> # … with 7 more variables: x2014_07_15 <dbl>, x2014_08_15 <dbl>,
#> # x2014_09_15 <dbl>, x2014_10_15 <dbl>, x2014_11_15 <dbl>, x2014_12_15 <dbl>,
#> # x2015_01_15 <dbl>
Created on 2022-05-05 by the reprex package (v2.0.0)
Dplyr: add multiple columns with mutate/across from character vector
The !!
works for a single element
for(nm in add_cols) test <- test %>%
mutate(!! nm := NA)
-output
> test
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA
Or another option is
test %>%
bind_cols(setNames(rep(list(NA), length(add_cols)), add_cols))
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA
In base R
, this is easier
test[add_cols] <- NA
Which can be used in a pipe
test %>%
`[<-`(., add_cols, value = NA)
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA
across
works only if the columns are already present i.e. it is suggesting to loop across
the columns present in the data and do some modification/create new columns with .names
modification
We could make use add_column
from tibble
library(tibble)
library(janitor)
add_column(test, !!! add_cols) %>%
clean_names %>%
mutate(across(all_of(add_cols), ~ NA))
a col_1 col_2
1 1 NA NA
2 2 NA NA
3 3 NA NA
Mutate across multiple columns to create new variable sets
This might be easier in long format, but here's an option you can pursue as wide data.
Using the latest version of dplyr
you can mutate
across
and include .names
argument to define how your want your new columns to look.
library(tidyverse)
my_col <- c("var1", "var2", "var3", "var4")
df %>%
group_by(year) %>%
mutate(across(my_col, mean, .names = "mean_{col}")) %>%
mutate(across(my_col, .names = "relmean_{col}") / across(paste0("mean_", my_col)))
Output
year country var1 var2 var3 var4 mean_var1 mean_var2 mean_var3 mean_var4 relmean_var1 relmean_var2 relmean_var3 relmean_var4
<int> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1910 GER 1 4 10 6 3 5 9 7.5 0.333 0.8 1.11 0.8
2 1911 GER 2 3 11 7 1.5 3.5 10.5 8 1.33 0.857 1.05 0.875
3 1910 FRA 5 6 8 9 3 5 9 7.5 1.67 1.2 0.889 1.2
4 1911 FRA 1 4 10 9 1.5 3.5 10.5 8 0.667 1.14 0.952 1.12
Put multiple functions into a single dplyr mutate across everything e.g. change mulitple different strings
I guess the tidyverse way would be to chain multiple functions together using the pipe operator. This means you only need to call across
once.
df1 %>%
mutate(across(everything(), ~ str_replace(.,"(?i)bar", "open") %>%
str_replace("(?i)foo", "done")))
x1 x2
1 done done
2 open fix
3 done broke
4 open fix
5 done done
6 open fix
7 done broke
8 open fix
mutate(across) to generate multiple new columns in tidyverse
In this case, you can use cur_data()
and cur_column()
to take advantage that we are wanting to sum together columns that have the same suffix but just need to swap out the numbers.
library(dplyr)
df <- data.frame(
oldvar1_a = 1:3,
oldvar2_a = 4:6,
oldvar1_i = 7:9,
oldvar2_i = 10:12,
z = c(1,10,20)
)
mutate(
df,
across(
starts_with("oldvar1"),
~ (.x + cur_data()[gsub("1", "2", cur_column())]) - z,
.names = "{col}_new"
)
)
#> oldvar1_a oldvar2_a oldvar1_i oldvar2_i z oldvar2_a oldvar2_i
#> 1 1 4 7 10 1 4 16
#> 2 2 5 8 11 10 -3 9
#> 3 3 6 9 12 20 -11 1
If you want to use with case_when
, just make sure to index using [[
, you can read more here.
df <- data.frame(
oldvar1_a = 1:3,
oldvar2_a = 4:6,
oldvar1_i = 7:9,
oldvar2_i = 10:12,
z = c(1,2,0)
)
mutate(
df,
across(
starts_with("oldvar1"),
~ case_when(
z == 1 ~ .x,
z == 2 ~ cur_data()[[gsub("1", "2", cur_column())]],
TRUE ~ NA_integer_
),
.names = "{col}_new"
)
)
#> oldvar1_a oldvar2_a oldvar1_i oldvar2_i z oldvar1_a_new oldvar1_i_new
#> 1 1 4 7 10 1 1 7
#> 2 2 5 8 11 2 5 11
#> 3 3 6 9 12 0 NA NA
Related Topics
Convert List to Data Frame While Keeping List-Element Names
Changing Format of Some Axis Labels in Ggplot2 According to Condition
Problems Formatting Date into Format "%Y-%M"
Is There an Error in Round Function in R
R Shiny, How to Make Datatable React to Checkboxes in Datatable
How to Change Angle of Line in Customized Legend in Ggplot2
Remove Text After Final Period in String
Handle Continuous Missing Values in Time-Series Data
Unknown Timezone Name in R Strptime/As.Posixct
How to Put Values on a Boxplot for Median, 1St Quartile and Last Quartile
Dynamic Linking with Rpath Not Working Under Ubuntu 17.10
Create Barplot from Data.Frame
Aligning Data Frame with Missing Values
Time Series Plot with X Axis in "Year"-"Month" in R
Show Content for Menuitem When Menusubitems Exist in Shiny Dashboard