mutate_each / summarise_each in dplyr: how do I select certain columns and give new names to mutated columns?
Update for dplyr >= 0.4.3.9000
In the dplyr development version 0.4.3.9000 (at time of writing), naming inside mutate_each
and summarise_each
has been simplified as noted in the News:
The naming behaviour of
summarise_each()
andmutate_each()
has been
tweaked so that you can force inclusion of both the function and the
variable name:summarise_each(mtcars, funs(mean = mean), everything())
This is mainly important if you want to apply only 1 function inside mutate_each
/ summarise_each
and you want to give those column new names.
To show the difference, here's the output from dplyr 0.4.3.9000 using the new naming functionality, in contrast to option a.2 below:
library(dplyr) # >= 0.4.3.9000
iris %>% mutate_each(funs(mysum = sum(.)), -Species) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_mysum Sepal.Width_mysum
#1 5.1 3.5 1.4 0.2 setosa 876.5 458.6
#2 4.9 3.0 1.4 0.2 setosa 876.5 458.6
#3 4.7 3.2 1.3 0.2 setosa 876.5 458.6
#4 4.6 3.1 1.5 0.2 setosa 876.5 458.6
#5 5.0 3.6 1.4 0.2 setosa 876.5 458.6
#6 5.4 3.9 1.7 0.4 setosa 876.5 458.6
# Petal.Length_mysum Petal.Width_mysum
#1 563.7 179.9
#2 563.7 179.9
#3 563.7 179.9
#4 563.7 179.9
#5 563.7 179.9
#6 563.7 179.9
If you don't supply new names and you only supply 1 function, dplyr will change the existing columns (as it did in previous versions):
iris %>% mutate_each(funs(sum), -Species) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 876.5 458.6 563.7 179.9 setosa
#2 876.5 458.6 563.7 179.9 setosa
#3 876.5 458.6 563.7 179.9 setosa
#4 876.5 458.6 563.7 179.9 setosa
#5 876.5 458.6 563.7 179.9 setosa
#6 876.5 458.6 563.7 179.9 setosa
I assume that this new functionality will be available via CRAN in the next release version 0.4.4.
dplyr verions <= 0.4.3:
How can I give these new columns appropriate names, like I can in
mutate?
a) 1 function applied in mutate_each
/summarise_each
If you apply only 1 function inside the mutate_each
or summarise_each
, the existing columns will be transformed and the names will be kept as they used to be, unless you supply a named vector to mutate_each_
/summarise_each_
(see option a.4)
Here are some examples:
a.1 only 1 function -> will keep the existing names
iris %>% mutate_each(funs(sum), -Species) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 876 459 564 180 setosa
#2 876 459 564 180 setosa
#3 876 459 564 180 setosa
#4 876 459 564 180 setosa
#5 876 459 564 180 setosa
#6 876 459 564 180 setosa
a.2 also if you specify a new column name extension:
iris %>% mutate_each(funs(mysum = sum(.)), -Species) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 876 459 564 180 setosa
#2 876 459 564 180 setosa
#3 876 459 564 180 setosa
#4 876 459 564 180 setosa
#5 876 459 564 180 setosa
#6 876 459 564 180 setosa
a.3 Manually specify a new name per column (but only practical for few columns):
iris %>% mutate_each(funs(sum), SLsum = Sepal.Length,SWsum = Sepal.Width, -Species) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species SLsum SWsum
#1 5.1 3.5 1.4 0.2 setosa 876 459
#2 4.9 3.0 1.4 0.2 setosa 876 459
#3 4.7 3.2 1.3 0.2 setosa 876 459
#4 4.6 3.1 1.5 0.2 setosa 876 459
#5 5.0 3.6 1.4 0.2 setosa 876 459
#6 5.4 3.9 1.7 0.4 setosa 876 459
a.4 Use a named vector to create additional columns with new names:
case 1: keep original columns
In contrast to options a.1, a.2 and a.3, dplyr will keep the existing columns unchanged and create new columns in this approach. The names of the new columns equal the names of the named vector you create in advance (vars
in this case).
vars <- names(iris)[1:2] # choose which columns should be mutated
vars <- setNames(vars, paste0(vars, "_sum")) # create new column names
iris %>% mutate_each_(funs(sum), vars) %>% head
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum
#1 5.1 3.5 1.4 0.2 setosa 876.5 458.6
#2 4.9 3.0 1.4 0.2 setosa 876.5 458.6
#3 4.7 3.2 1.3 0.2 setosa 876.5 458.6
#4 4.6 3.1 1.5 0.2 setosa 876.5 458.6
#5 5.0 3.6 1.4 0.2 setosa 876.5 458.6
#6 5.4 3.9 1.7 0.4 setosa 876.5 458.6
case 2: remove original columns
As you can see, this approach keeps the existing columns unchanged and adds new columns with specified names. In case you don't want to keep the original columns, but just the newly created columns (and the other columns) you can just add a select
statement afterwards:
iris %>% mutate_each_(funs(sum), vars) %>% select(-one_of(vars)) %>% head
# Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum
#1 1.4 0.2 setosa 876.5 458.6
#2 1.4 0.2 setosa 876.5 458.6
#3 1.3 0.2 setosa 876.5 458.6
#4 1.5 0.2 setosa 876.5 458.6
#5 1.4 0.2 setosa 876.5 458.6
#6 1.7 0.4 setosa 876.5 458.6
b) more than 1 function applied in mutate_each
/summarise_each
b.1 Let dplyr figure out new names
If you applied more than 1 function, you can let dplyr figure out names by itself (and it will keep the existing columns):
iris %>% mutate_each(funs(sum, mean), -Species) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum Petal.Length_sum
#1 5.1 3.5 1.4 0.2 setosa 876 459 564
#2 4.9 3.0 1.4 0.2 setosa 876 459 564
#3 4.7 3.2 1.3 0.2 setosa 876 459 564
#4 4.6 3.1 1.5 0.2 setosa 876 459 564
#5 5.0 3.6 1.4 0.2 setosa 876 459 564
#6 5.4 3.9 1.7 0.4 setosa 876 459 564
# Petal.Width_sum Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean
#1 180 5.84 3.06 3.76 1.2
#2 180 5.84 3.06 3.76 1.2
#3 180 5.84 3.06 3.76 1.2
#4 180 5.84 3.06 3.76 1.2
#5 180 5.84 3.06 3.76 1.2
#6 180 5.84 3.06 3.76 1.2
b.2 Manually specify new column names
Another option, when using more than 1 function, is to specify the column name extension on your own:
iris %>% mutate_each(funs(MySum = sum(.), MyMean = mean(.)), -Species) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_MySum Sepal.Width_MySum Petal.Length_MySum
#1 5.1 3.5 1.4 0.2 setosa 876 459 564
#2 4.9 3.0 1.4 0.2 setosa 876 459 564
#3 4.7 3.2 1.3 0.2 setosa 876 459 564
#4 4.6 3.1 1.5 0.2 setosa 876 459 564
#5 5.0 3.6 1.4 0.2 setosa 876 459 564
#6 5.4 3.9 1.7 0.4 setosa 876 459 564
# Petal.Width_MySum Sepal.Length_MyMean Sepal.Width_MyMean Petal.Length_MyMean Petal.Width_MyMean
#1 180 5.84 3.06 3.76 1.2
#2 180 5.84 3.06 3.76 1.2
#3 180 5.84 3.06 3.76 1.2
#4 180 5.84 3.06 3.76 1.2
#5 180 5.84 3.06 3.76 1.2
#6 180 5.84 3.06 3.76 1.2
How can I select certain columns that I wish to mutate, like I did
with select in the first case?
You can do that by referencing the columns to be mutated (or left out) by giving their names like here (mutate Sepal.Length, but not Species):
iris %>% mutate_each(funs(sum), Sepal.Length, -Species) %>% head()
In addition, you can use special functions to select columns to be mutated, all columns that start with or contain a certain word etc by using for example:
iris %>% mutate_each(funs(sum), contains("Sepal"), -Species) %>% head()
For more information of those functions, see ?mutate_each
and ?select
.
Edit 1 after comment:
If you want to use standard evaluation, dplyr supplies SE-versions of most functions ending with an addtional "_". So in this case you would use:
x <- c("Sepal.Width", "Sepal.Length") # vector of column names
iris %>% mutate_each_(funs(sum), x) %>% head()
Notice the mutate_each_
I used here.
Edit 2: updated with option a.4
Selecting specific columns when using mutate_each function from dplyr
For these cases, matches
would be more appropriate
df %>%
mutate_each(funs(.*Freq), matches("^[A-Z]\\.", ignore.case=FALSE))
Here, I am assuming that you wanted to select
only column names that start with a capital letter (^[A-Z]
) followed by a .
. We have to escape the .
(\\.
), otherwise it will be considered as any single character.
I am not changing anything except in the starts_with
part. In the mutate_each
if we need to pass a function, it can be passed inside a funs
call. In the above code, we are multiplying each of the columns (.
) selected by the matches
with the 'Freq' column.
According to ?select
‘matches(x, ignore.case = TRUE)’: selects all variables whose
name matches the regular expression ‘x’
EDIT: Added @docendodiscimus comment's
pmin of columns with names matching a pattern
You could use do.call
with pmin
after selecting necessary columns by the given name pattern (using startsWith
)
> transform(df, p.min = do.call(pmin, df[startsWith(names(df), "gwas_p")]))
Suggested.Symbol gwas_p.onset gwas_p.dc14 gwas_p.tfc6 gwas_p.tms30 p.min
1 CCT4 9.378e-01 3.975e-01 2.078e-01 5.724e-01 2.078e-01
2 DHRS2 5.983e-01 3.707e-01 8.960e-01 3.409e-01 3.409e-01
3 PMS2 7.674e-10 6.117e-17 7.388e-19 4.594e-13 7.388e-19
4 FARSB 9.781e-02 2.975e-01 5.896e-01 2.403e-01 9.781e-02
5 RPL31 5.495e-01 4.443e-01 3.043e-01 1.357e-01 1.357e-01
6 ASNS 7.841e-01 7.661e-01 6.696e-01 3.422e-01 3.422e-01
R Dplyr mutate new column by calculating from other columns with conditionally replaced values
In base R
, we can subtract 1 from the data, use pmin
to restrict the value greater than 3 to 3 and get the rowSums
df$x3 <- rowSums(pmin(as.matrix(df-1), 3))
-output
> df
x1 x2 x3
1 1 5 3
2 2 4 4
3 3 3 4
4 4 2 4
5 5 1 3
Forcing mutate_each to create new column names for a subset of columns evaluate via ifelse
You can create additional columns when using only a single function in the funs
argument if you supply a named vector to the vars
or ...
argument within mutate_each
. Here's an example using setNames
:
mutate_each(df, funs(ifelse(. == 1, "val1",
ifelse(. == 10, "val10", NA))),
setNames(contains("1"), c("x", "y")))
# columnA1 columnB1 columnB99 x y
#1 1 1 0.26550866 val1 val1
#2 2 2 0.37212390 <NA> <NA>
#3 3 3 0.57285336 <NA> <NA>
#4 4 4 0.90820779 <NA> <NA>
#5 5 5 0.20168193 <NA> <NA>
#6 6 6 0.89838968 <NA> <NA>
#7 7 7 0.94467527 <NA> <NA>
#8 8 8 0.66079779 <NA> <NA>
#9 9 9 0.62911404 <NA> <NA>
#10 10 10 0.06178627 val10 val10
This is also described in another Q&A.
Can I mutate many columns according to many other columns with mutate() and across()?
We can get
the value of the corresponding column after replacing the substring of column name (cur_column()
) using str_replace
library(dplyr)
library(stringr)
df %>%
mutate(across(starts_with("Y_"),
~ case_when(get(str_replace(cur_column(), "Y_", "X_")) == "No" ~
"C", TRUE ~ .x)))
-output
X_home X_school Y_home Y_school
1 Yes No A C
2 Yes Yes B A
3 No No C C
mutate_each for selected columns
As we need to create a new column, and the function phase_diff
utilizes all the columns in the input dataset as argument, the OP may need mutate
instead of mutate_each. The
mutate_each` is for applying to each column in the dataset.
res <- df %>%
mutate(phase = phase_diff(y1, y2, time))
head(res,2)
# y1 y2 time phase
# 1 0.01398857 0.010296090 1 -0.1349023
# 2 0.01334217 0.009990988 2 -0.1349023
The above operation can be done using base R
phase_diff(df$y1, df$y2, df$time)
Related Topics
How to Update R Packages in Default Library on Windows 7
Can't Print to PDF Ggplot Charts
Animated Sorted Bar Chart with Bars Overtaking Each Other
Differencebetween Assign() and <<- in R
Geometric Mean: Is There a Built-In
Interpretation of Ordered and Non-Ordered Factors, VS. Numerical Predictors in Model Summary
Use Merge() to Update a Data Frame with Values from a Second Data Frame
What's the Difference Between '1L' and '1'
Creating "Radar Chart" (A.K.A. Star Plot; Spider Plot) Using Ggplot2 in R
How to Get a Reversed, Log10 Scale in Ggplot2
Handling Java.Lang.Outofmemoryerror When Writing to Excel from R