Automatically Generate New Variable Names Using Dplyr Mutate

Weird things with Automatically generate new variable names using dplyr mutate

scale() returns a one-column matrix, and dplyr/tibble does not automatically coerce it to a vector. Changing your mutate_all() call as shown below makes it return a plain vector instead. I confirmed this was the cause by calling class(df1$speed_scaled), which returned "matrix".

library(tidyverse)
link <- "https://raw.githubusercontent.com/guru99-edu/R-Programming/master/computers.csv"
df <- read_csv(link)
#> Warning: Missing column names filled in: 'X1' [1]
#> Parsed with column specification:
#> cols(
#>   X1 = col_double(),
#>   price = col_double(),
#>   speed = col_double(),
#>   hd = col_double(),
#>   ram = col_double(),
#>   screen = col_double(),
#>   cd = col_character(),
#>   multi = col_character(),
#>   premium = col_character(),
#>   ads = col_double(),
#>   trend = col_double()
#> )

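df %>% discard(is.character) %>%
  select(-X1) %>% 
  mutate_all(
    # scale() returns a one-column matrix; [, 1] drops it to a plain vector.
    # ([[1]] would keep only the first scaled value and recycle it down the
    # column, which is what the constant values in the printout below show.)
    list("scaled" = function(x) scale(x)[, 1])
  ) 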
#> # A tibble: 6,259 x 14
#>    price speed    hd   ram screen   ads trend price_scaled speed_scaled
#>    <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>        <dbl>        <dbl>
#>  1  1499    25    80     4     14    94     1        -1.24        -1.28
#>  2  1795    33    85     2     14    94     1        -1.24        -1.28
#>  3  1595    25   170     4     15    94     1        -1.24        -1.28
#>  4  1849    25   170     8     14    94     1        -1.24        -1.28
#>  5  3295    33   340    16     14    94     1        -1.24        -1.28
#>  6  3695    66   340    16     14    94     1        -1.24        -1.28
#>  7  1720    25   170     4     14    94     1        -1.24        -1.28
#>  8  1995    50    85     2     14    94     1        -1.24        -1.28
#>  9  2225    50   210     8     14    94     1        -1.24        -1.28
#> 10  2575    50   210     4     15    94     1        -1.24        -1.28
#> # ... with 6,249 more rows, and 5 more variables: hd_scaled <dbl>,
#> #   ram_scaled <dbl>, screen_scaled <dbl>, ads_scaled <dbl>,
#> #   trend_scaled <dbl>
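
To make the matrix-versus-vector point concrete, here is a minimal base R sketch. The input values are made up for illustration; it only shows how the two subsetting forms differ on the result of scale().

```r
x <- c(25, 33, 25, 66)   # made-up values, not taken from the dataset

m <- scale(x)            # centred and scaled, returned as a 4 x 1 matrix
is.matrix(m)
dim(m)

v <- m[, 1]              # selects the whole first column as a plain vector
is.matrix(v)
length(v)

first <- m[[1]]          # selects only the first element
length(first)
```

Inside mutate(), a length-one result is recycled to every row, so `[[1]]` produces a constant column, whereas `[, 1]` keeps one scaled value per row.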

Automatically generate new variable names using dplyr mutate

You can use mutate_all (or mutate_at for specific columns) then prepend lag_ to the column names.

data(iris)
library(dplyr) 

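lag_iris <- iris %>%
  group_by(Species) %>%
  mutate_all(lag) %>%   # funs() is deprecated; pass the function directly
  ungroup()
# Species is the grouping column, so it is renamed here but not lagged
colnames(lag_iris) <- paste0('lag_', colnames(lag_iris))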

head(lag_iris)

  lag_Sepal.Length lag_Sepal.Width lag_Petal.Length lag_Petal.Width lag_Species
             <dbl>           <dbl>            <dbl>           <dbl>      <fctr>
1               NA              NA               NA              NA      setosa
2              5.1             3.5              1.4             0.2      setosa
3              4.9             3.0              1.4             0.2      setosa
4              4.7             3.2              1.3             0.2      setosa
5              4.6             3.1              1.5             0.2      setosa
6              5.0             3.6              1.4             0.2      setosa
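
With dplyr 1.0 or later, the renaming step can be folded into the mutate() call itself. The following is a sketch that assumes across() and its .names argument are available; unlike the approach above, it keeps the original columns and adds the lagged ones next to them.

```r
library(dplyr)

lag_iris <- iris %>%
  group_by(Species) %>%
  # across() skips the grouping column; .names builds each new column name
  mutate(across(everything(), dplyr::lag, .names = "lag_{.col}")) %>%
  ungroup()

names(lag_iris)
```

Because the lag is computed within each Species group, the first row of every group is NA in the lag_ columns.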

Use dynamic name for new column/variable in `dplyr`

Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df[[varname]] <- with(df, Petal.Width * n)
    df
}

The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.

dplyr version >= 1.0

With dplyr 1.0 or later you can use glue-style syntax in the name on the left-hand side of :=. Here the {} in the name is replaced by the value of the expression inside it.

multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}
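
As a quick usage sketch (assuming dplyr 1.0 or later), the function can be called in a loop to add several dynamically named columns. The loop and the `out` variable are illustrative and not part of the original answer.

```r
library(dplyr)

multipetal <- function(df, n) {
  # "petal.{n}" is interpolated with the value of n to form the column name
  mutate(df, "petal.{n}" := Petal.Width * n)
}

out <- iris
for (n in 2:4) {
  out <- multipetal(out, n)
}

names(out)
```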

If you are passing an unquoted column name to your function, you can use {{}} inside the name string as well as for the column itself:

meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
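
To see which name this produces, the sketch below repeats the function and inspects the last column name. It assumes dplyr 1.0 or later; `res` is just an illustrative variable.

```r
library(dplyr)

meanofcol <- function(df, col) {
  # {{col}} in the string becomes the label of the column the caller passed
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}

res <- meanofcol(iris, Petal.Width)
tail(names(res), 1)
```

The new column is named after the expression supplied by the caller, so passing Sepal.Length instead would give "Mean of Sepal.Length".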

dplyr version >= 0.7

dplyr starting with version 0.7 allows you to use := to dynamically assign parameter names. You can write your function as:

# --- dplyr version 0.7+---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    mutate(df, !!varname := Petal.Width * n)
}

For more information, see the documentation available from vignette("programming", "dplyr").

dplyr (>=0.3 & <0.7)

Slightly earlier versions of dplyr (>=0.3, <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).

So here, the answer is to use mutate_() rather than mutate() and do:

# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    varval <- lazyeval::interp(~Petal.Width * n, n=n)
    mutate_(df, .dots= setNames(list(varval), varname))
}

dplyr < 0.3

Note that this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:

# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
    do.call("mutate", pp)
}

Mutate with dynamic variable names

Another simple thing to do is to use .data to reference the data from the pipe. You can then select out the variable as usual with [[.

df <- data.frame(x = 1:150, y = 1:150)

variable <- "x"

df %>%
  mutate(x = lag(.data[[variable]], 1))
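
The .data pronoun can also be combined with a dynamically built output name. This is a sketch assuming dplyr 1.0 or later for the glue-style name; the small data frame and the lag_ prefix are made up for illustration.

```r
library(dplyr)

df <- data.frame(x = 1:5, y = 6:10)
variable <- "y"

out <- df %>%
  # the input column is looked up by string, and the output name is built from it
  mutate("lag_{variable}" := dplyr::lag(.data[[variable]], 1))

out
```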

Dynamic variable names to mutate variables in for-loop

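Because we are passing a string, convert it to a symbol with sym() and unquote it (!!):

func <- function(i) {
   # i holds a column name as a string; sym() turns that string into a symbol.
   # (ensym() would capture the argument expression `i` itself, not its value.)
   mutate(df1, !!i := case_when(!is.na(!!rlang::sym(i)) ~ as.character(!!rlang::sym(i)),
                              is.na(!!rlang::sym(i)) & var0 != '1' ~ '4444',
                              TRUE ~ '0'))
 }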

Testing:

for(i in vars) {
   df1 <- func(i)
 }
df1
  var0 var1 var2 var3
1    1    0    1   NA
2    2    1 4444    1
3    2    0    0    0
4    1    1    0 4444
5    1    0    1    1
6    2 4444 4444   NA
7    2 4444 4444    1

We can do this with across() as well:

df1 %>%
    mutate(across(all_of(vars), 
    ~ case_when(!is.na(.) ~ as.character(.), 
       is.na(.) & var0 != '1' ~ '4444', TRUE ~ '0')))
  var0 var1 var2 var3
1    1    0    1   NA
2    2    1 4444    1
3    2    0    0    0
4    1    1    0 4444
5    1    0    1    1
6    2 4444 4444   NA
7    2 4444 4444    1
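
Since the original df1 and vars are not shown, here is a self-contained sketch of the across() variant on made-up data. The values are invented purely so the recoding rules can be traced by hand.

```r
library(dplyr)

# Made-up data: var0 is a character flag, var1 and var2 contain missing values
df1 <- tibble(
  var0 = c("1", "2", "2", "1"),
  var1 = c(0, NA, 1, NA),
  var2 = c(NA, NA, 1, 0)
)
vars <- c("var1", "var2")

out <- df1 %>%
  mutate(across(all_of(vars),
    ~ case_when(!is.na(.x) ~ as.character(.x),          # keep observed values
                is.na(.x) & var0 != "1" ~ "4444",       # missing and flag not "1"
                TRUE ~ "0")))                           # missing and flag is "1"

out
```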

Manipulating dynamically created variable names in `dplyr`

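You need sym() to turn the name strings into symbols before unquoting them:

# Names taken from the output shown below
name_v1 <- "first_variable"
name_v2 <- "second_variable"
name_v3 <- "third_variable"

tibble(
  !!name_v1 := c(1, 2),
  !!name_v2 := c(3, 4),
  !!name_v3 := !!sym(name_v1) / !!sym(name_v2)
)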

# A tibble: 2 x 3
#   first_variable second_variable third_variable
#            <dbl>           <dbl>          <dbl>
# 1              1               3          0.333
# 2              2               4          0.5

How to get dplyr::mutate() to work with variable names when called inside a function?

Thanks to the helpful comments I was able to learn all about non-standard evaluation and figure out a solution:

label <- function(data, variable, lookup) {
  variable <- enquo(variable)
  data %>%
    mutate(!!variable := factor(!!variable, 
                                 levels = read_csv(path(lookup))$id,
                                 labels = read_csv(path(lookup))$identifier))
}

The key features are enquo(), which captures the argument as a quoted expression (a quosure) instead of evaluating it; !!, which unquotes it so that mutate() sees the column the caller supplied; and :=, which allows unquoting on the left-hand side of the assignment as well as the right.
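
The same pattern can be tried without any files on disk. The sketch below is a simplified, hypothetical variant: it takes the levels and labels as plain vectors (`lvls` and `labs` are invented argument names) and uses the {{ }} shorthand, which assumes a recent dplyr and rlang and behaves like enquo() followed by !!.

```r
library(dplyr)

# Hypothetical variant of label(): lookup values are passed in directly
label_simple <- function(data, variable, lvls, labs) {
  mutate(data, {{ variable }} := factor({{ variable }}, levels = lvls, labels = labs))
}

df <- data.frame(id = c(2, 1, 2))
out <- label_simple(df, id, lvls = c(1, 2), labs = c("one", "two"))

out$id
```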

I tried and failed to implement a solution that avoided dplyr entirely, but at least this works.

Mutate across multiple columns to create new variable sets

This might be easier in long format, but here's an option you can pursue as wide data.

Using dplyr 1.0 or later, you can mutate with across() and use its .names argument to define how you want your new columns to be named.

library(tidyverse)

my_col <- c("var1", "var2", "var3", "var4")

df %>%
  group_by(year) %>%
  # all_of() makes it explicit that my_col is an external character vector
  mutate(across(all_of(my_col), mean, .names = "mean_{.col}")) %>%
  mutate(across(all_of(my_col), .names = "relmean_{.col}") / across(all_of(paste0("mean_", my_col))))

Output

   year country  var1  var2  var3  var4 mean_var1 mean_var2 mean_var3 mean_var4 relmean_var1 relmean_var2 relmean_var3 relmean_var4
  <int> <chr>   <int> <int> <int> <int>     <dbl>     <dbl>     <dbl>     <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1  1910 GER         1     4    10     6       3         5         9         7.5        0.333        0.8          1.11         0.8  
2  1911 GER         2     3    11     7       1.5       3.5      10.5       8          1.33         0.857        1.05         0.875
3  1910 FRA         5     6     8     9       3         5         9         7.5        1.67         1.2          0.889        1.2  
4  1911 FRA         1     4    10     9       1.5       3.5      10.5       8          0.667        1.14         0.952        1.12
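
As an alternative sketch, both sets of columns can be created in a single across() call by passing a named list of functions and using {.fn} in .names. The data frame below is reconstructed from the printed output above; the single-call approach is not the original answer's method, and it orders the new columns per variable (mean_var1, relmean_var1, mean_var2, ...) rather than per function.

```r
library(dplyr)

# Data reconstructed from the output shown above
df <- tibble(
  year    = c(1910L, 1911L, 1910L, 1911L),
  country = c("GER", "GER", "FRA", "FRA"),
  var1    = c(1L, 2L, 5L, 1L),
  var2    = c(4L, 3L, 6L, 4L),
  var3    = c(10L, 11L, 8L, 10L),
  var4    = c(6L, 7L, 9L, 9L)
)
my_col <- c("var1", "var2", "var3", "var4")

out <- df %>%
  group_by(year) %>%
  mutate(across(all_of(my_col),
                list(mean = mean, relmean = ~ .x / mean(.x)),
                .names = "{.fn}_{.col}")) %>%
  ungroup()

names(out)
```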