Create New Variables With Mutate_At While Keeping the Original Ones

Create new variables with mutate_at while keeping the original ones

Update 2020-06 for dplyr 1.0.0

Starting in dplyr 1.0.0, the across() function supersedes the "scoped variants" of functions such as mutate_at(). The code should look pretty familiar within across(), which is nested inside mutate().

Adding a name to the function(s) you give in the list adds the function name as a suffix.

dataframe %>%
mutate( across(contains('oo'),
.fns = list(cat = ~ntile(., 2))) )

# A tibble: 6 x 5
helloo ooooHH ahaaa helloo_cat ooooHH_cat
<dbl> <dbl> <dbl> <int> <int>
1 1 1 200 1 1
2 2 1 400 1 1
3 3 1 120 1 1
4 4 2 300 2 2
5 5 2 100 2 2
6 6 2 100 2 2

Changing the new columns names is a little easier in 1.0.0 with the .names argument in across(). Here is an example of adding the function name as a prefix instead of a suffix. This uses glue syntax.

dataframe %>%
mutate( across(contains('oo'),
.fns = list(cat = ~ntile(., 2)),
.names = "{fn}_{col}" ) )

# A tibble: 6 x 5
helloo ooooHH ahaaa cat_helloo cat_ooooHH
<dbl> <dbl> <dbl> <int> <int>
1 1 1 200 1 1
2 2 1 400 1 1
3 3 1 120 1 1
4 4 2 300 2 2
5 5 2 100 2 2
6 6 2 100 2 2

Original answer with mutate_at()

Edited to reflect changes in dplyr. As of dplyr 0.8.0, funs() is deprecated and list() with ~ should be used instead.

You can give names to the functions to the list you pass to .funs to make new variables with the names as suffixes attached.

dataframe %>% mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2)))

# A tibble: 6 x 5
helloo ooooHH ahaaa helloo_cat ooooHH_cat
<dbl> <dbl> <dbl> <int> <int>
1 1 1 200 1 1
2 2 1 400 1 1
3 3 1 120 1 1
4 4 2 300 2 2
5 5 2 100 2 2
6 6 2 100 2 2

If you want it as a prefix instead, you could then use rename_at to change the names.

dataframe %>% 
mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2))) %>%
rename_at( vars( contains( "_cat") ), list( ~paste("cat", gsub("_cat", "", .), sep = "_") ) )

# A tibble: 6 x 5
helloo ooooHH ahaaa cat_helloo cat_ooooHH
<dbl> <dbl> <dbl> <int> <int>
1 1 1 200 1 1
2 2 1 400 1 1
3 3 1 120 1 1
4 4 2 300 2 2
5 5 2 100 2 2
6 6 2 100 2 2

Previous code with funs() from earlier versions of dplyr:

dataframe %>% 
mutate_at(vars(contains('oo')), .funs = funs(cat = ntile(., 2))) %>%
rename_at( vars( contains( "_cat") ), funs( paste("cat", gsub("_cat", "", .), sep = "_") ) )

Keeping original variables with dplyr::mutate_at & varying length of dynamic variables

EDIT

Since , the strings are there in a variable, we cannot hardcode it.

varlist <- c('helloo', 'ooooHH')

we can modify the function to rename varlist object instead.

dataframe %>% 
mutate_at(
vars(varlist),
.funs = funs(cat = ntile(., 2))
) %>%
rename_at(vars(grep("^cat$", names(.))),
funs(sub("cat", paste0(varlist, "_cat"), .)))

Original Answer

A hackish way from the same link would be using rename_at to replace only when we find an exact match for "cat"

library(dplyr)
dataframe %>%
mutate_at(
vars('ooooHH'),
.funs = funs(cat = ntile(., 2))
) %>%
rename_at(vars(grep("^cat$", names(.))), funs(sub("cat", "ooooHH_cat", .)))

# helloo ooooHH ahaaa ooooHH_cat
# <dbl> <dbl> <dbl> <int>
#1 1.00 1.00 200 1
#2 2.00 1.00 400 1
#3 3.00 1.00 120 1
#4 4.00 2.00 300 2
#5 5.00 2.00 100 2
#6 6.00 2.00 100 2

It would not impact when there are more than one column and the renaming is already applied.

dataframe %>%
mutate_at(
vars(contains("oo")),
.funs = funs(cat = ntile(., 2))
) %>%
rename_at(vars(grep("^cat$", names(.))), funs(sub("cat", "ooh_cat", .)))


# helloo ooooHH ahaaa helloo_cat ooooHH_cat
# <dbl> <dbl> <dbl> <int> <int>
#1 1.00 1.00 200 1 1
#2 2.00 1.00 400 1 1
#3 3.00 1.00 120 1 1
#4 4.00 2.00 300 2 2
#5 5.00 2.00 100 2 2
#6 6.00 2.00 100 2 2

mutate_at does not create variable suffixes in some cases?

Here is another idea. We can add setNames(sub("^sqrt$", "other_sqrt", names(.))) after the mutate_at call. The idea is to replace the column name sqrt with other_sqrt. The pattern ^sqrt$ should only match the derived column sqrt if there is only one column named other, which is demonstrated in Example 1. If there are more than one columns with other, such as Example 2, the setNames would not change the column names.

library(dplyr)

# Example 1
df <- data.frame(var1 = 1:2, var2 = 4:5, other = 9)

df %>%
mutate_at(vars(contains("other")), funs("sqrt" = sqrt(.))) %>%
setNames(sub("^sqrt$", "other_sqrt", names(.)))
# var1 var2 other other_sqrt
# 1 1 4 9 3
# 2 2 5 9 3

# Example 2
df2 <- data.frame(var1 = 1:2, var2 = 4:5, other1 = 9, other2 = 16)

df2 %>%
mutate_at(vars(contains("other")), funs("sqrt" = sqrt(.))) %>%
setNames(sub("^sqrt$", "other_sqrt", names(.)))
# var1 var2 other1 other2 other1_sqrt other2_sqrt
# 1 1 4 9 16 3 4
# 2 2 5 9 16 3 4

Or we can design a function to check how many columns contain the string other before manipulating the data frame.

mutate_sqrt <- function(df, string){
string_col <- grep(string, names(df), value = TRUE)
df2 <- df %>% mutate_at(vars(contains(string)), funs("sqrt" = sqrt(.)))
if (length(string_col) == 1){
df2 <- df2 %>% setNames(sub("^sqrt$", paste(string_col, "sqrt", sep = "_"), names(.)))
}
return(df2)
}

mutate_sqrt(df, "other")
# var1 var2 other other_sqrt
# 1 1 4 9 3
# 2 2 5 9 3

mutate_sqrt(df2, "other")
# var1 var2 other1 other2 other1_sqrt other2_sqrt
# 1 1 4 9 16 3 4
# 2 2 5 9 16 3 4

Why is mutate_at not creating a different name for new column when I pass it only one column in vars()?

I don't think this is the expected behavior (or at least shouldn't be), and the good news is that the newest version of dplyr gets rid of this behavior. Currently you can install it using remotes::install_github('tidyverse/dplyr'), but should be on CRAN in the coming month or 2.

mutate_at (and other scoped verbs like mutate_if, summarize_all, etc.) has been replaced by the use of across within existing verbs, and this provides the behavior you are looking for.

library(dplyr)

variables <- c("var1", "var2")

df1 %>%
mutate(across(all_of(variables), .fns = list(cat = as.factor)))
#> # A tibble: 6 x 5
#> var1 var2 var3 var1_cat var2_cat
#> <dbl> <dbl> <dbl> <fct> <fct>
#> 1 1 1 10 1 1
#> 2 2 1 30 2 1
#> 3 3 1 50 3 1
#> 4 4 2 70 4 2
#> 5 5 2 90 5 2
#> 6 6 2 110 6 2

variables <- c("var1")

df1 %>%
mutate(across(all_of(variables), .fns = list(cat = as.factor)))
#> # A tibble: 6 x 4
#> var1 var2 var3 var1_cat
#> <dbl> <dbl> <dbl> <fct>
#> 1 1 1 10 1
#> 2 2 1 30 2
#> 3 3 1 50 3
#> 4 4 2 70 4
#> 5 5 2 90 5
#> 6 6 2 110 6

Session info

sessionInfo()
#> R version 3.6.3 (2020-02-29)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17763)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252
#> [2] LC_CTYPE=English_United Kingdom.1252
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_0.8.99.9001
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.3 knitr_1.28 magrittr_1.5 tidyselect_1.0.0
#> [5] R6_2.4.1 rlang_0.4.5.9000 fansi_0.4.1 stringr_1.4.0
#> [9] highr_0.8 tools_3.6.3 xfun_0.12 utf8_1.1.4
#> [13] cli_2.0.2 htmltools_0.4.0 ellipsis_0.3.0 assertthat_0.2.1
#> [17] yaml_2.2.1 digest_0.6.25 tibble_2.1.3 lifecycle_0.2.0
#> [21] crayon_1.3.4 purrr_0.3.3 vctrs_0.2.99.9010 glue_1.3.2
#> [25] evaluate_0.14 rmarkdown_2.1 stringi_1.4.6 compiler_3.6.3
#> [29] pillar_1.4.3 pkgconfig_2.0.3

Mutate_at , add column by specifying column names

You can use :

library(dplyr)
iris %>% mutate_at(vars(contains('Sepal')), list(log = ~log(.)))

However, mutate_at has been deprecated, use across from dplyr 1.0.0

iris %>% mutate(across(contains('Sepal'), log, .names = 'log_{col}'))

# Sepal.Length Sepal.Width Petal.Length Petal.Width Species log_Sepal.Length log_Sepal.Width
#1 5.1 3.5 1.4 0.2 setosa 1.629241 1.252763
#2 4.9 3.0 1.4 0.2 setosa 1.589235 1.098612
#3 4.7 3.2 1.3 0.2 setosa 1.547563 1.163151
#4 4.6 3.1 1.5 0.2 setosa 1.526056 1.131402
#5 5.0 3.6 1.4 0.2 setosa 1.609438 1.280934
#6 5.4 3.9 1.7 0.4 setosa 1.686399 1.360977
#...
#...

Using mutate_at to create new columns by using other columns in the same data frame

You could directly do

x[myvars]/x[dvars]

# b c d
#1 0.4000000 0.5000000 0.5714286
#2 0.5000000 0.5714286 0.6250000
#3 0.5714286 0.6250000 0.6666667

I am not sure if you could do this with mutate_at however, you can use map2_df from purrr

purrr::map2_df(x[myvars], x[dvars], ~ .x/.y)

OR with mapply

mapply("/", x[myvars], x[dvars])

Mutate multiple variable to create multiple new variables

Because you are operating on column names, you need to use mutate_at rather than mutate_if which uses the values within columns

tb %>% mutate_at(vars(starts_with("y")), funs(. - z))
#> # A tibble: 3 x 5
#> x y1 y2 y3 z
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 2 4 2
#> 2 2 -2 -1 0 3
#> 3 3 5 3 1 1

To create new columns, instead of overwriting existing ones, we can give name to funs

# add suffix
tb %>% mutate_at(vars(starts_with("y")), funs(mod = . - z))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z y1_mod y2_mod y3_mod
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1

# remove suffix, add prefix
tb %>%
mutate_at(vars(starts_with("y")), funs(mod = . - z)) %>%
rename_at(vars(ends_with("_mod")), funs(paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1

Edit: In dplyr 0.8.0 or higher versions, funs() will be deprecated (source1 & source2), need to use list() instead

tb %>% mutate_at(vars(starts_with("y")), list(~ . - z))
#> # A tibble: 3 x 5
#> x y1 y2 y3 z
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 2 4 2
#> 2 2 -2 -1 0 3
#> 3 3 5 3 1 1

tb %>% mutate_at(vars(starts_with("y")), list(mod = ~ . - z))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z y1_mod y2_mod y3_mod
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1

tb %>%
mutate_at(vars(starts_with("y")), list(mod = ~ . - z)) %>%
rename_at(vars(ends_with("_mod")), list(~ paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1

Edit 2: dplyr 1.0.0+ has across() function which simplifies this task even further

Basic usage

across() has two primary arguments:

  • The first argument, .cols, selects the columns you want to operate on.
    It uses tidy selection (like select()) so you can pick variables by
    position, name, and type.
  • The second argument, .fns, is a function or list of functions to apply to
    each column. This can also be a purrr style formula (or list of formulas)
    like ~ .x / 2. (This argument is optional, and you can omit it if you just want
    to get the underlying data; you'll see that technique used in
    vignette("rowwise").)
# Control how the names are created with the `.names` argument which 
# takes a [glue](http://glue.tidyverse.org/) spec:
tb %>%
mutate(
across(starts_with("y"), ~ .x - z, .names = "mod_{col}")
)
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1

tb %>%
mutate(
across(num_range(prefix = "y", range = 1:3), ~ .x - z, .names = "mod_{col}")
)
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1

### Multiple functions
tb %>%
mutate(
across(c(matches("x"), contains("z")), ~ max(.x, na.rm = TRUE), .names = "max_{col}"),
across(c(y1:y3), ~ .x - z, .names = "mod_{col}")
)
#> # A tibble: 3 x 10
#> x y1 y2 y3 z max_x max_z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 3 3 0 2 4
#> 2 2 1 2 3 3 3 3 -2 -1 0
#> 3 3 6 4 2 1 3 3 5 3 1

Created on 2018-10-29 by the reprex package (v0.2.1)



Related Topics



Leave a reply



Submit