Weird things with Automatically generate new variable names using dplyr mutate
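scale() returns a matrix, and dplyr/tibble does not automatically coerce it to a vector. Changing your mutate_all() call to the one below makes it return a plain value instead of a matrix. I worked out what was happening by calling class(df1$speed_scaled) and seeing the result "matrix".
Note that [[1]] keeps only the first scaled value, which mutate then recycles down the whole column (hence the repeated values in the printout). Use scale(x)[, 1] or as.vector(scale(x)) to keep one scaled value per row.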
library(tidyverse)
link <- "https://raw.githubusercontent.com/guru99-edu/R-Programming/master/computers.csv"
df <- read_csv(link)
#> Warning: Missing column names filled in: 'X1' [1]
#> Parsed with column specification:
#> cols(
#> X1 = col_double(),
#> price = col_double(),
#> speed = col_double(),
#> hd = col_double(),
#> ram = col_double(),
#> screen = col_double(),
#> cd = col_character(),
#> multi = col_character(),
#> premium = col_character(),
#> ads = col_double(),
#> trend = col_double()
#> )
df %>% discard(is.character) %>%
select(-X1) %>%
mutate_all(
list("scaled" = function(x) scale(x)[[1]])
)
#> # A tibble: 6,259 x 14
#> price speed hd ram screen ads trend price_scaled speed_scaled
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1499 25 80 4 14 94 1 -1.24 -1.28
#> 2 1795 33 85 2 14 94 1 -1.24 -1.28
#> 3 1595 25 170 4 15 94 1 -1.24 -1.28
#> 4 1849 25 170 8 14 94 1 -1.24 -1.28
#> 5 3295 33 340 16 14 94 1 -1.24 -1.28
#> 6 3695 66 340 16 14 94 1 -1.24 -1.28
#> 7 1720 25 170 4 14 94 1 -1.24 -1.28
#> 8 1995 50 85 2 14 94 1 -1.24 -1.28
#> 9 2225 50 210 8 14 94 1 -1.24 -1.28
#> 10 2575 50 210 4 15 94 1 -1.24 -1.28
#> # ... with 6,249 more rows, and 5 more variables: hd_scaled <dbl>,
#> # ram_scaled <dbl>, screen_scaled <dbl>, ads_scaled <dbl>,
#> # trend_scaled <dbl>
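A minimal sketch of the per-row variant, using the built-in mtcars data as a stand-in for the CSV above (the choice of mtcars and of the columns mpg and hp is an assumption made only for illustration). as.vector() drops the matrix attributes that scale() adds, so each new column should come back as an ordinary numeric vector:

```r
library(dplyr)

# mtcars stands in for the computers data; any all-numeric data frame would do.
scaled <- mtcars %>%
  select(mpg, hp) %>%
  # a named list makes mutate_all() append "_scaled" to each column name
  mutate_all(list(scaled = function(x) as.vector(scale(x))))

names(scaled)
class(scaled$mpg_scaled)
```

Checking class() on one of the new columns, as in the answer above, is a quick way to confirm that no matrix column slipped through.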
Automatically generate new variable names using dplyr mutate
You can use mutate_all (or mutate_at for specific columns), then prepend lag_ to the column names.
data(iris)
library(dplyr)
lag_iris <- iris %>%
  group_by(Species) %>%
  mutate_all(lag) %>%  # funs() is deprecated since dplyr 0.8; pass the function directly
  ungroup()
colnames(lag_iris) <- paste0('lag_', colnames(lag_iris))
head(lag_iris)
lag_Sepal.Length lag_Sepal.Width lag_Petal.Length lag_Petal.Width lag_Species
<dbl> <dbl> <dbl> <dbl> <fctr>
1 NA NA NA NA setosa
2 5.1 3.5 1.4 0.2 setosa
3 4.9 3.0 1.4 0.2 setosa
4 4.7 3.2 1.3 0.2 setosa
5 4.6 3.1 1.5 0.2 setosa
6 5.0 3.6 1.4 0.2 setosa
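If the original columns should stay next to the lagged ones, a sketch along these lines lets dplyr build the names itself. It assumes dplyr 1.0.0 or later, where across() accepts a .names glue pattern; the object name lagged is only illustrative.

```r
library(dplyr)

lagged <- iris %>%
  group_by(Species) %>%
  # across() skips the grouping column; .names builds lag_<column> for each input
  mutate(across(everything(), lag, .names = "lag_{.col}")) %>%
  ungroup()

names(lagged)
```

This avoids the separate colnames() step, at the cost of requiring a newer dplyr.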
Use dynamic name for new column/variable in `dplyr`
Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  df[[varname]] <- with(df, Petal.Width * n)
  df
}
The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.
dplyr version >= 1.0
With the latest dplyr version you can use glue-package syntax in the name on the left-hand side of :=. Here the {} in the name is replaced by the value of the expression inside it.
multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}
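As a usage sketch (the loop and the object name out are illustrative, not part of the original answer), calling the glue-style function repeatedly adds one column per multiplier:

```r
library(dplyr)

multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}

out <- iris
for (n in 2:3) {
  out <- multipetal(out, n)  # adds petal.2, then petal.3
}

tail(names(out), 2)
```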
If you are passing a column name to your function, you can use {{}} in the string as well as for the column name:
meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
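Both forms can appear in the same name string. The sketch below assumes a hypothetical helper col_mean() with an extra suffix argument, introduced here only to show {} and {{}} side by side:

```r
library(dplyr)

# `suffix` is a hypothetical extra argument used only to show both forms together.
col_mean <- function(df, col, suffix = "mean") {
  # {{ col }} inserts the column name the caller passed; {suffix} inserts a plain value
  summarise(df, "{{ col }}_{suffix}" := mean({{ col }}))
}

res <- iris %>%
  group_by(Species) %>%
  col_mean(Petal.Width)

names(res)
```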
dplyr version >= 0.7
Starting with version 0.7, dplyr allows you to use := to dynamically assign parameter names. You can write your function as:
# --- dplyr version 0.7+---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  mutate(df, !!varname := Petal.Width * n)
}
For more information, see the documentation available from vignette("programming", "dplyr").
dplyr (>=0.3 & <0.7)
Slightly earlier versions of dplyr (>=0.3, <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).
So here, the answer is to use mutate_() rather than mutate() and do:
# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  varval <- lazyeval::interp(~Petal.Width * n, n = n)
  mutate_(df, .dots = setNames(list(varval), varname))
}
dplyr < 0.3
Note this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:
# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
  do.call("mutate", pp)
}
Mutate with dynamic variable names
Another simple thing to do is to use .data to reference the data from the pipe. You can then select out the variable as usual with [[.
df <- data.frame(x = 1:150, y = 1:150)
variable <- "x"
df %>%
  mutate(x = lag(.data[[variable]], 1))
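The same string can also name the new column. A small sketch, assuming dplyr 1.0 or later for the glue-style name and using a shortened data frame so the result is easy to read:

```r
library(dplyr)

df <- data.frame(x = 1:5, y = 6:10)
variable <- "y"

out <- df %>%
  # the string in `variable` picks the input column and names the output column
  mutate("{variable}_lag" := lag(.data[[variable]], 1))

out
```

Here the input is read through .data[[variable]] and the output is written to y_lag, so the original column is left untouched.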
Dynamic variable names to mutate variables in for-loop
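Since we are passing a string, convert it to a symbol with sym() and evaluate it (!!). Note that ensym() would capture the argument name i itself rather than the column name stored in it, so sym() is the safer choice when the function is called from a loop.
func <- function(i) {
  col <- rlang::sym(i)  # i holds a column name as a string
  mutate(df1, !!i := case_when(!is.na(!!col) ~ as.character(!!col),
                               is.na(!!col) & var0 != '1' ~ '4444',
                               TRUE ~ '0'))
}
Testing: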
for(i in vars) {
df1 <- func(i)
}
df1
var0 var1 var2 var3
1 1 0 1 NA
2 2 1 4444 1
3 2 0 0 0
4 1 1 0 4444
5 1 0 1 1
6 2 4444 4444 NA
7 2 4444 4444 1
We may do this with across as well:
df1 %>%
  mutate(across(all_of(vars),
                ~ case_when(!is.na(.) ~ as.character(.),
                            is.na(.) & var0 != '1' ~ '4444',
                            TRUE ~ '0')))
var0 var1 var2 var3
1 1 0 1 NA
2 2 1 4444 1
3 2 0 0 0
4 1 1 0 4444
5 1 0 1 1
6 2 4444 4444 NA
7 2 4444 4444 1
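The original df1 and vars are not shown, so the following is a self-contained sketch on hypothetical data (the three-row df1, the vars vector, and the helper name recode_col are all made up for illustration). It uses .data[[i]] in place of a symbol, which works the same way when i holds a column name as a string:

```r
library(dplyr)

# Hypothetical input; the original df1 and vars are not shown in the text above.
df1 <- tibble(var0 = c("1", "2", "2"),
              var1 = c(NA, NA, 0),
              var2 = c(1, NA, NA))
vars <- c("var1", "var2")

recode_col <- function(df, i) {
  # !!i := names the output column; .data[[i]] reads the input column
  mutate(df, !!i := case_when(!is.na(.data[[i]]) ~ as.character(.data[[i]]),
                              is.na(.data[[i]]) & var0 != "1" ~ "4444",
                              TRUE ~ "0"))
}

for (i in vars) {
  df1 <- recode_col(df1, i)
}
df1
```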
Manipulating dynamically created variable names in `dplyr`
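You need sym:
# names inferred from the output shown below
name_v1 <- "first_variable"
name_v2 <- "second_variable"
name_v3 <- "third_variable"
tibble(
  !!name_v1 := c(1, 2),
  !!name_v2 := c(3, 4),
  !!name_v3 := !!sym(name_v1) / !!sym(name_v2)
)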
# A tibble: 2 x 3
# first_variable second_variable third_variable
# <dbl> <dbl> <dbl>
# 1 1 3 0.333
# 2 2 4 0.5
How to get dplyr::mutate() to work with variable names when called inside a function?
Thanks to the helpful comments I was able to learn all about non-standard evaluation and figure out a solution:
label <- function(data, variable, lookup) {
  variable <- enquo(variable)
  codes <- read_csv(path(lookup))  # read the lookup file once; path() is assumed to be fs::path()
  data %>%
    mutate(!!variable := factor(!!variable,
                                levels = codes$id,
                                labels = codes$identifier))
}
The key features are enquo(), which captures the argument's expression as a quosure instead of evaluating it; !!, which unquotes that quosure so the column it names is used inside mutate(); and :=, which allows unquoting on the left-hand side of the assignment as well as the right.
I tried and failed to implement a solution that avoided dplyr entirely, but at least this works.
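With a recent rlang (0.4.0 or later), the enquo()/!! pair can be written with the {{ }} shorthand. The sketch below is a variant under stated assumptions: the lookup table is passed in memory rather than read from a file, and the codes and survey data are hypothetical.

```r
library(dplyr)

# `lookup` is an in-memory table here (an assumption); the original reads it from a CSV file.
label <- function(data, variable, lookup) {
  mutate(data, "{{ variable }}" := factor({{ variable }},
                                          levels = lookup$id,
                                          labels = lookup$identifier))
}

codes  <- tibble(id = c(1, 2), identifier = c("low", "high"))
survey <- tibble(q1 = c(2, 1, 2))

labelled <- label(survey, q1, codes)
labelled$q1
```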
Mutate across multiple columns to create new variable sets
This might be easier in long format, but here's an option you can pursue with wide data.
Using the latest version of dplyr, you can mutate with across and include the .names argument to define how you want your new columns to be named. Wrapping the character vector in all_of() avoids the ambiguity warning for external vectors.
library(tidyverse)
my_col <- c("var1", "var2", "var3", "var4")
df %>%
  group_by(year) %>%
  mutate(across(all_of(my_col), mean, .names = "mean_{col}")) %>%
  mutate(across(all_of(my_col), .names = "relmean_{col}") / across(all_of(paste0("mean_", my_col))))
Output
year country var1 var2 var3 var4 mean_var1 mean_var2 mean_var3 mean_var4 relmean_var1 relmean_var2 relmean_var3 relmean_var4
<int> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1910 GER 1 4 10 6 3 5 9 7.5 0.333 0.8 1.11 0.8
2 1911 GER 2 3 11 7 1.5 3.5 10.5 8 1.33 0.857 1.05 0.875
3 1910 FRA 5 6 8 9 3 5 9 7.5 1.67 1.2 0.889 1.2
4 1911 FRA 1 4 10 9 1.5 3.5 10.5 8 0.667 1.14 0.952 1.12
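If only the relative values are needed, a single across() call with a lambda can skip the intermediate mean_ columns. This is a sketch rather than the answer's own method; the input is rebuilt from the first six columns of the printed output above, and the object name rel is illustrative.

```r
library(dplyr)

# Input rebuilt from the first six columns of the printed output above.
df <- tibble(
  year    = c(1910L, 1911L, 1910L, 1911L),
  country = c("GER", "GER", "FRA", "FRA"),
  var1 = c(1L, 2L, 5L, 1L),
  var2 = c(4L, 3L, 6L, 4L),
  var3 = c(10L, 11L, 8L, 10L),
  var4 = c(6L, 7L, 9L, 9L)
)
my_col <- c("var1", "var2", "var3", "var4")

rel <- df %>%
  group_by(year) %>%
  # one pass: divide each column by its within-year mean
  mutate(across(all_of(my_col), ~ .x / mean(.x), .names = "relmean_{.col}")) %>%
  ungroup()

rel
```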