dplyr mutate rowSums calculations or custom functions
You can use the rowwise() function:
iris %>%
  rowwise() %>%
  mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width)))
#> # A tibble: 150 x 6
#> # Rowwise:
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 10.2
#> 2 4.9 3 1.4 0.2 setosa 9.5
#> 3 4.7 3.2 1.3 0.2 setosa 9.4
#> 4 4.6 3.1 1.5 0.2 setosa 9.4
#> 5 5 3.6 1.4 0.2 setosa 10.2
#> 6 5.4 3.9 1.7 0.4 setosa 11.4
#> 7 4.6 3.4 1.4 0.3 setosa 9.7
#> 8 5 3.4 1.5 0.2 setosa 10.1
#> 9 4.4 2.9 1.4 0.2 setosa 8.9
#> 10 4.9 3.1 1.5 0.1 setosa 9.6
#> # ... with 140 more rows
"c_across() uses tidy selection syntax so you can succinctly select many variables."
Finally, if you want, you can add %>% ungroup() at the end to drop the rowwise grouping.
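Since the heading mentions rowSums, a vectorised alternative may be worth comparing. The sketch below assumes dplyr >= 1.0 and applies rowSums() to the columns picked by across(); the object names by_row and by_rowsums are illustrative only.

```r
library(dplyr)

# Row-wise version from above, with the rowwise grouping dropped at the end
by_row <- iris %>%
  rowwise() %>%
  mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width))) %>%
  ungroup()

# Vectorised version: across() returns the selected columns as a data frame,
# which rowSums() then sums row by row
by_rowsums <- iris %>%
  mutate(sumVar = rowSums(across(Sepal.Length:Petal.Width)))

# Compare the two results (names are dropped before comparing)
all.equal(by_row$sumVar, unname(by_rowsums$sumVar))
```

The vectorised form leaves no grouping behind, so no ungroup() is needed afterwards.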
How do I use custom functions after the dplyr %>% operator?
I've adjusted the function to take the data frame as an argument, namespace-qualified the package calls inside the function definition, and passed the inputs as character strings rather than numbers. The conversion to character could also be done inside the function definition if required.
library(dplyr)
library(stringr)
State <- function(df, x, y){
  dplyr::mutate(
    df,
    Account = stringr::str_remove_all(Account, "-"),
    Account = case_when(
      startsWith(Account, y) ~ stringr::str_c(stringr::str_c("WC", x), "Policy #"),
      startsWith(Account, x) ~ stringr::str_c("WC", "Policy #")
    )
  )
}
df %>% State("32", "90")
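If the conversion to character is moved into the function, a variant could look like the sketch below. The name state_chr and the toy Account values are made up for illustration; the matching logic is copied from the function above.

```r
library(dplyr)
library(stringr)

# Hypothetical variant: coerce x and y to character inside the function,
# so callers may pass numbers or strings
state_chr <- function(df, x, y) {
  x <- as.character(x)
  y <- as.character(y)
  mutate(
    df,
    Account = str_remove_all(Account, "-"),
    Account = case_when(
      startsWith(Account, y) ~ str_c("WC", x, "Policy #"),
      startsWith(Account, x) ~ str_c("WC", "Policy #")
    )
  )
}

# Toy data (assumed, not from the original question)
toy <- tibble(Account = c("90-123", "32-456", "11-000"))
res <- toy %>% state_chr(32, 90)
res
```

Rows that match neither prefix fall through case_when() and become NA, exactly as in the original function.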
Custom function with dplyr mutate or summarise for different levels within a factor?
Your example code is most of the way there. You can do:
df1 %>%
  mutate(Diff = newvar[gear == "3"] - newvar[gear == "5"])
Or:
df1 %>%
  summarise(Diff = newvar[gear == "3"] - newvar[gear == "5"])
Logical subsetting still works in mutate() and summarise() calls, just as with any other vector.
Note that this works because after the summarise() call in your example code, df1 is still grouped by cyl. Otherwise you would need a group_by() call to create the correct grouping.
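The grouping behaviour described above can be checked with a small sketch. The original df1 is not shown here, so the version below is an assumed stand-in built from mtcars, with newvar taken to be a mean of mpg; gear is numeric in mtcars, so it is compared with 3 and 5 rather than "3" and "5".

```r
library(dplyr)

# Assumed stand-in for df1: one summary value per cyl/gear combination.
# Dropping only the last grouping level leaves df1 grouped by cyl.
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = mean(mpg), .groups = "drop_last")

group_vars(df1)

# Within each cyl group, subtract the gear-5 value from the gear-3 value
diffs <- df1 %>%
  summarise(Diff = newvar[gear == 3] - newvar[gear == 5])
diffs
```

This relies on every cyl group containing exactly one row for each of the two gear values; a group with a missing level would yield a zero-length result for that group.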
Using mutate in custom function with mutation condition as argument
If your formula always has the form original = do_something_original(), this may help (for dplyr version >= 1.0).
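library(dplyr)
library(stringr)
library(gapminder)  # provides the gapminder data used below
update_mut <- function(df, mutation){
  # the target column name is the first word of the string, e.g. "year"
  xx <- word(mutation, 1)
  df %>%
    mutate("{xx}" := eval(parse(text = mutation)))
}
update_mut(gapminder, "year = 2*year")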
country continent year lifeExp pop gdpPercap
<fct> <fct> <dbl> <dbl> <int> <dbl>
1 Afghanistan Asia 3904 28.8 8425333 779.
2 Afghanistan Asia 3914 30.3 9240934 821.
3 Afghanistan Asia 3924 32.0 10267083 853.
4 Afghanistan Asia 3934 34.0 11537966 836.
5 Afghanistan Asia 3944 36.1 13079460 740.
6 Afghanistan Asia 3954 38.4 14880372 786.
7 Afghanistan Asia 3964 39.9 12881816 978.
8 Afghanistan Asia 3974 40.8 13867957 852.
9 Afghanistan Asia 3984 41.7 16317921 649.
10 Afghanistan Asia 3994 41.8 22227415 635.
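As a variation on the same idea, the right-hand side can be turned into an expression with rlang::parse_expr() and injected with !!, instead of calling eval(parse()) inside mutate(). This is a sketch of a different technique than the answer uses; the name update_mut2 and the toy data frame are illustrative, and it assumes the string has the form "column = expression".

```r
library(dplyr)
library(rlang)

update_mut2 <- function(df, mutation) {
  # split "col = rhs" at the first "="; later "=" signs are put back
  parts <- strsplit(mutation, "=", fixed = TRUE)[[1]]
  col <- trimws(parts[1])
  rhs <- parse_expr(trimws(paste(parts[-1], collapse = "=")))
  # inject the parsed expression so it is evaluated against df's columns
  mutate(df, "{col}" := !!rhs)
}

# Toy data (assumed): the first two gapminder years
toy <- data.frame(year = c(1952, 1957))
res <- update_mut2(toy, "year = 2*year")
res
```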
Conditional mutate in a custom function to change a character column in R
Try the function below:
library(dplyr)
library(lubridate)  # ymd()
library(bizdays)    # bizdays()
my_function <- function(date1, date2, variable, quota, monthly_business_days) {
  # keep the column name as a string to choose the labels below
  value <- deparse(substitute(variable))
  my_data %>%
    filter(between(DATE, ymd(date1), ymd(date2))) %>%
    summarize(total = sum({{variable}})) %>%
    add_row(total = quota, .before = 1) %>%
    rbind(.$total[[2]]/bizdays(date1, date2)*monthly_business_days) %>%
    mutate(indicator = if(value == 'UNITS') c("Quota (Units)", "Sales (Units)", "Forecast (Units)")
           else c("Quota (USD)", "Sales (USD)", "Forecast (USD)"))
}
R mutate() with rowSums()
The difference in result might be because part_langs is a grouped data frame, as can be seen from the output of str shown in your post:
grouped_df [7 x 15] (S3: grouped_df/tbl_df/tbl/data.frame).
If this is the reason, then ungroup first and rerun your code:
library(dplyr)
part_langs <- part_langs %>% ungroup
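The original code is not shown here, so the following is only one plausible way grouping can change a rowSums() result: inside a grouped mutate(), across() leaves out the grouping columns. The toy data and column names are assumptions.

```r
library(dplyr)

toy <- tibble(g = c(1, 1, 2), a = c(1, 2, 3), b = c(10, 20, 30))

# Ungrouped: everything() covers g, a and b
ungrouped_sum <- toy %>%
  mutate(total = rowSums(across(everything())))

# Grouped: across() skips the grouping column g, so only a and b are summed
grouped_sum <- toy %>%
  group_by(g) %>%
  mutate(total = rowSums(across(everything())))

ungrouped_sum$total
grouped_sum$total
```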
Writing a custom function that works inside dplyr::mutate()
We can place the ... at the end:
rowwise_sum <- function(data, na.rm = FALSE, ...) {
  columns <- rlang::enquos(...)
  data %>%
    select(!!!columns) %>%
    rowSums(na.rm = na.rm)
}
cars %>%
  mutate(sum = rowwise_sum(., na.rm = TRUE, speed, dist))
# A tibble: 50 x 3
# speed dist sum
# <dbl> <dbl> <dbl>
# 1 4 2 6
# 2 4 10 14
# 3 7 4 11
# 4 7 22 29
# 5 8 16 24
# 6 9 10 19
# 7 10 18 28
# 8 10 26 36
# 9 10 34 44
#10 11 17 28
# ... with 40 more rows
It would also work without changing the position of ... (though moving it is generally recommended). The main issue here is that the data (which is .) was not specified in the argument list within mutate.
It would be easier to create the whole flow in the function instead of doing only a part:
rowwise_sum2 <- function(data, na.rm = FALSE, ...) {
  columns <- rlang::enquos(...)
  data %>%
    select(!!! columns) %>%
    # pass the na.rm argument through instead of hard-coding TRUE
    transmute(sum = rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}
rowwise_sum2(cars, na.rm = TRUE, speed, dist)
# A tibble: 50 x 3
# speed dist sum
# <dbl> <dbl> <dbl>
# 1 4 2 6
# 2 4 10 14
# 3 7 4 11
# 4 7 22 29
# 5 8 16 24
# 6 9 10 19
# 7 10 18 28
# 8 10 26 36
# 9 10 34 44
#10 11 17 28
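With dplyr >= 1.0 the same wrapper can probably be written more compactly by forwarding ... into across(). This is a sketch rather than the answer's method, and add_row_sum is a made-up name.

```r
library(dplyr)

# Hypothetical wrapper: sum the columns named in ... and append the result
add_row_sum <- function(data, ..., na.rm = FALSE) {
  mutate(data, sum = rowSums(across(c(...)), na.rm = na.rm))
}

res <- add_row_sum(cars, speed, dist, na.rm = TRUE)
head(res)
```

Because the function takes the data frame as its first argument, it can also be used in a pipe, as in cars %>% add_row_sum(speed, dist).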
How to use custom functions in mutate (dplyr)?
The problem seems to be binom.test rather than dplyr: binom.test is not vectorized, so you cannot expect it to work on vectors. You can use mapply on the two columns inside mutate:
table %>%
  mutate(Ratio = mapply(function(x, y) binom.test.p(c(x,y)),
                        ref_SG1_E2_1_R1_Sum,
                        alt_SG1_E2_1_R1_Sum))
# geneId ref_SG1_E2_1_R1_Sum alt_SG1_E2_1_R1_Sum Ratio
#1 a 10 10 1
#2 b 20 20 1
#3 c 10 10 1
#4 d 15 15 1
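For a runnable version of the mapply approach, the sketch below rebuilds the table from the values in the output above and assumes that binom.test.p simply returns the p-value of binom.test(); the real helper from the question is not shown, so that definition is a guess, and gene_counts is an illustrative name.

```r
library(dplyr)

# Toy table matching the values shown in the output above
gene_counts <- data.frame(
  geneId = c("a", "b", "c", "d"),
  ref_SG1_E2_1_R1_Sum = c(10, 20, 10, 15),
  alt_SG1_E2_1_R1_Sum = c(10, 20, 10, 15)
)

# Assumed helper: p-value of a two-sided binomial test on a pair of counts
binom.test.p <- function(x) binom.test(x)$p.value

# mapply() calls the helper once per row, pairing the two columns element-wise
res <- gene_counts %>%
  mutate(Ratio = mapply(function(x, y) binom.test.p(c(x, y)),
                        ref_SG1_E2_1_R1_Sum,
                        alt_SG1_E2_1_R1_Sum))
res
```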
As for the last one, you need mutate_at instead of mutate:
table %>%
  mutate_at(.vars=c(2:3), .funs=funs(sum=sum(.)))
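In current dplyr, funs() is deprecated and mutate_at() has been superseded by across(). A roughly equivalent call, assuming dplyr >= 1.0 and the same toy values as in the output above (gene_counts is an illustrative name), might look like this:

```r
library(dplyr)

# Toy table matching the values shown in the output above
gene_counts <- data.frame(
  geneId = c("a", "b", "c", "d"),
  ref_SG1_E2_1_R1_Sum = c(10, 20, 10, 15),
  alt_SG1_E2_1_R1_Sum = c(10, 20, 10, 15)
)

# Add a "<column>_sum" column holding the column total for columns 2 and 3
res_sums <- gene_counts %>%
  mutate(across(2:3, ~ sum(.x), .names = "{.col}_sum"))
res_sums
```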
mutate with across, apply two functions in a row
You can supply custom functions as well as built-ins to across:
diamonds %>%
  group_by(cut) %>%
  summarise(across(x:z, function(x) round(mean(x))), .groups = 'drop')
# A tibble: 5 x 4
cut x y z
* <ord> <dbl> <dbl> <dbl>
1 Fair 6 6 4
2 Good 6 6 4
3 Very Good 6 6 4
4 Premium 6 6 4
5 Ideal 6 6 3
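The same pattern works with a named helper, which can be easier to read when two functions are chained. The sketch below uses mtcars so that it runs without ggplot2; round_mean is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper that applies mean() and then round()
round_mean <- function(x) round(mean(x))

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, hp, wt), round_mean), .groups = "drop")
res
```

A formula shorthand such as ~ round(mean(.x)) could be passed to across() in place of round_mean.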
R: Custom Function - Mutate Existing Column
I think you can achieve this more simply with the following:
library(dplyr)
clean_func <- function(df){
  df %>% mutate(across(everything(), ~gsub(" & ", " and ", .) %>%
                         gsub("[[:punct:]]$", "", .)))
}
df1 <- clean_func(df1)
df2 <- clean_func(df2)
You can update the function by adding further gsub, str_replace, or other calls as needed.
Edit:
Based on your update, you can do something like this to target your variables specifically:
add_symbol <- function(col.name){
  gsub(" & ", " and ", col.name)
}
rm_trail_punc <- function(col.name){
  gsub("[[:punct:]]$", "", col.name)
}
standardise_col <- function(df, col.name){
  col.name <- enquo(col.name)
  df %>%
    mutate(!!col.name := add_symbol(!!col.name),
           !!col.name := rm_trail_punc(!!col.name))
}
Your code won't ever work as written, but you could do something like this:
new_df <- standardise_col(df1, a) %>%
  left_join(., standardise_col(df2, c), by = c("a"="c"))
Which gives us:
# A tibble: 3 x 3
a b d
<chr> <chr> <chr>
1 apple and pear cat car
2 kiwi dog bike
3 plum cow truck
You can read up on tidy evaluation here: https://tidyeval.tidyverse.org/dplyr.html
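With rlang >= 0.4 and dplyr >= 1.0, the enquo() and !! pair can usually be replaced by the embrace operator {{ }}. The sketch below assumes that is acceptable here; standardise_col2 and the toy data are illustrative, not taken from the question.

```r
library(dplyr)

# Hypothetical variant of standardise_col() using {{ }} instead of enquo()/!!
standardise_col2 <- function(df, col) {
  df %>%
    mutate({{ col }} := gsub("[[:punct:]]$", "",
                             gsub(" & ", " and ", {{ col }})))
}

# Toy data (assumed)
toy <- tibble(a = c("apple & pear", "kiwi.", "plum!"))
res <- standardise_col2(toy, a)
res
```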