Why is R dplyr::mutate inconsistent with custom functions
sin and ^ are vectorized: given a vector, they return one result per element. f is not vectorized, so it returns a single value for the whole vector and mutate recycles that value into every row. You can do f = Vectorize(f) and it will operate on each individual value as well.
y1 <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
y1
    a   asq     fout       gout
1   0     0 3640.889  0.0000000
2  10   100 3640.889 -0.5440211
3 100 10000 3640.889 -0.5063656
f = Vectorize(f)
y1a <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
y1a
    a   asq        fout       gout
1   0     0    10.88874  0.0000000
2  10   100  1010.88874 -0.5440211
3 100 10000 10010.88874 -0.5063656
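The recycling effect can be reproduced with a small sketch. The function f_total below is hypothetical (the f and g from the question are not shown here); it stands in for any function that collapses its input to one number.

```r
library(dplyr)

df <- data.frame(a = c(0, 10, 100))

# Hypothetical function: sum() collapses the whole vector to a single number
f_total <- function(x) sum(x) + 1

# Called on the whole column, the single result is recycled to every row
recycled <- mutate(df, fout = f_total(a))

# Vectorize() wraps the function so it is called once per element
f_each <- Vectorize(f_total)
per_row <- mutate(df, fout = f_each(a))

print(recycled$fout)
print(per_row$fout)
```

Vectorize() is convenient but calls the function once per element, so rewriting the function with vectorized operations is usually faster when that is possible.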
mutate/transform in R dplyr (Pass custom function)
With transform your function has to operate on the vector. You can use ifelse instead, which works on vectors:
isOdd <- function(x){ ifelse(x %% 2 == 0, "even", "odd") }
Alternatively, you can apply the function to every value in the column with one of the apply functions:
isOdd <- function(x) {
  # sapply calls the inner function once per element of x
  sapply(x, function(x) {
    if (x %% 2 == 0) {
      return("even")
    } else {
      return("odd")
    }
  })
}
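As a variation on the sapply version, here is a sketch using vapply, which additionally checks that every call returns a single string. The name is_odd_label is made up for this example.

```r
# vapply is like sapply but enforces the type and length of each result
is_odd_label <- function(x) {
  vapply(x, function(v) if (v %% 2 == 0) "even" else "odd", character(1))
}

labels <- is_odd_label(1:4)

# Because the function now returns one value per element, it works in transform
d <- transform(data.frame(n = 1:4), parity = is_odd_label(n))
print(d)
```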
Unexpected values while applying custom function in dplyr::mutate
You can group rowwise so the function gets evaluated separately for each row:
df %>%
rowwise() %>%
mutate(test = fun_b(x = 1, y = y_val, z = z_val, times = 1))
## Source: local data frame [5 x 3]
## Groups: <by row>
##
## # A tibble: 5 × 3
##   y_val z_val     test
##   <dbl> <dbl>    <dbl>
## 1     2     4 10.12500
## 2     5     3  3.15000
## 3     8     2  1.40625
## 4     1     1  6.75000
## 5     9     3  1.75000
Alternatively, edit fun_b so that it is vectorized, or just let R do it with Vectorize:
df %>% mutate(test = Vectorize(fun_b)(x = 1, y = y_val, z = z_val, times = 1))
## # A tibble: 5 × 3
##   y_val z_val     test
##   <dbl> <dbl>    <dbl>
## 1     2     4 10.12500
## 2     5     3  3.15000
## 3     8     2  1.40625
## 4     1     1  6.75000
## 5     9     3  1.75000
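Vectorize is a wrapper around mapply, so the same result can be obtained by calling mapply directly. The sketch below uses a hypothetical scalar function, scale_once, since the body of fun_b is not shown here.

```r
# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE
scale_once <- function(x, y, z) if (y > z) x * y else x * z

df <- data.frame(y_val = c(2, 5, 8), z_val = c(4, 3, 2))

# Vectorize() iterates over the arguments in parallel; x = 1 is recycled
via_vectorize <- Vectorize(scale_once)(x = 1, y = df$y_val, z = df$z_val)

# Equivalent direct mapply call, with the constant passed through MoreArgs
via_mapply <- mapply(scale_once, y = df$y_val, z = df$z_val,
                     MoreArgs = list(x = 1))

print(via_vectorize)
print(via_mapply)
```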
dplyr inconsistent behaviour inside function
You can't do that with dplyr, which relies heavily on "non-standard evaluation" (NSE). Inside your function, dplyr sees coldf1 = 1 and assigns a new column literally named coldf1, just like you can do df1 %>% mutate(somethingnew = 3.1415).
You need to use either rlang's escaping mechanisms (with :=) ...
fun <- function(df1, coldf1) {
  df1 %>% mutate(!!coldf1 := 1)
}
data1
#   a x1
# 1 1  2
# 2 2  3
fun(data1, "a")
#   a x1
# 1 1  2
# 2 1  3
or base R:
fun <- function(df1, coldf1) { df1[[coldf1]] <- 1; df1 }
fun(data1, "a")
#   a x1
# 1 1  2
# 2 1  3
(though I'm assuming your example is simplified, where this might not be as simple)
Regardless, look into "programming with dplyr", https://dplyr.tidyverse.org/articles/programming.html.
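As a further sketch along the lines of that article, a column name held in a string can be read with the .data pronoun and used to build a new column name with glue syntax. The function double_col and the _doubled suffix are invented for illustration, and this assumes a dplyr version of 1.0 or later.

```r
library(dplyr)

# col is a string; .data[[col]] reads that column, "{col}_doubled" names the new one
double_col <- function(df, col) {
  df %>% mutate("{col}_doubled" := .data[[col]] * 2)
}

data1 <- data.frame(a = 1:2, x1 = 2:3)
out <- double_col(data1, "a")
print(out)
```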
R: Use dplyr::mutate/dplyr::transmute with a function which acts on an entire row
I think you're running into a size mismatch.
If I do
library(dplyr)
transmute(head(women, n=10),
some_index=calc_some_index(head(women,10)))
then it works (the error in your code complained about differing sizes).
Alternatively, you can use the pipe, which also works:
head(women, 10) %>%
transmute(calc_some_index(.))
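The size mismatch can be reproduced with a stand-in function. calc_some_index from the question is not shown here, so the sketch below uses a hypothetical bmi_like_index that returns one value per row of the data frame it is given.

```r
library(dplyr)

# Hypothetical whole-data-frame function: one value per row of d
bmi_like_index <- function(d) d$weight / d$height^2

first10 <- head(women, 10)

# `.` refers to the 10-row data frame being piped in, so the sizes agree
ok <- first10 %>% transmute(some_index = bmi_like_index(.))

# Passing the full 15-row data set into a 10-row transmute is expected to fail
mismatch <- tryCatch(
  transmute(first10, some_index = bmi_like_index(women)),
  error = function(e) "size mismatch"
)

print(nrow(ok))
print(mismatch)
```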
Why can't I apply a function to create a new column with mutate() using dplyr?
As pointed out, + and sum() differ in behaviour. Consider:
> sum(1:10,1:10)
[1] 110
> `+`(1:10,1:10)
[1] 2 4 6 8 10 12 14 16 18 20
If you really want to sum() the variables along each row, you want rowwise():
library(dplyr)
df <- tibble(w = letters[1:3], x = 1:3, y = x^2, z = y - x)  # data_frame() is deprecated
# Source: local data frame [3 x 4]
#
#   w x y z
# 1 a 1 1 0
# 2 b 2 4 2
# 3 c 3 9 6
df %>% rowwise() %>% mutate(result = sum(x, y, z))
# Source: local data frame [3 x 5]
# Groups: <by row>
#
#   w x y z result
# 1 a 1 1 0      2
# 2 b 2 4 2      8
# 3 c 3 9 6     18
Compare this to:
df %>% mutate(result = x + y + z)
# Source: local data frame [3 x 5]
#
#   w x y z result
# 1 a 1 1 0      2
# 2 b 2 4 2      8
# 3 c 3 9 6     18
df %>% mutate(result = sum(x, y, z)) # sums over all of x, y and z and recycles the result!
# Source: local data frame [3 x 5]
#
#   w x y z result
# 1 a 1 1 0     28
# 2 b 2 4 2     28
# 3 c 3 9 6     28
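For row totals specifically, a vectorized alternative to rowwise() is rowSums(). The sketch below assumes dplyr 1.0 or later, where across() can select the columns to sum.

```r
library(dplyr)

df <- tibble(w = letters[1:3], x = 1:3, y = x^2, z = y - x)

# across() returns the selected columns as a data frame; rowSums() adds them per row
out <- df %>% mutate(result = rowSums(across(c(x, y, z))))
print(out)
```

This avoids per-row function calls, which tends to matter on larger data frames.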
dplyr error: strange issue when combining group_by, mutate and ifelse. Is it a bug?
Wrap it all in as.numeric to force the output type, so that the NAs, which are logical by default, don't override the class of the output variable:
df1 %>%
group_by(group.id) %>%
mutate( hits.consumed = as.numeric(ifelse(hits.diff<=0,-hits.diff,0)) )
#   crawl.id group.id hits.diff hits.consumed
# 1        1        1        NA            NA
# 2        1        2        NA            NA
# 3        2        2         0             0
# 4        1        3        NA            NA
# 5        1        3        NA            NA
# 6        1        3        NA            NA
Pretty sure this is the same issue as in "Custom sum function in dplyr returns inconsistent results", as this result suggests:
out <- df1[1:2,] %>% mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "logical"
out <- df1[1:3,] %>% mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "numeric"
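The class change can be seen without any grouping. The sketch below uses made-up vectors in place of df1$hits.diff and also shows dplyr::if_else, which takes its output type from the yes/no values rather than from the test.

```r
library(dplyr)

hits_all_na <- c(NA_real_, NA_real_)      # stand-in for a group with only NAs
hits_mixed  <- c(NA_real_, NA_real_, 0)   # stand-in for a group with one real value

# When every test value is NA, ifelse() returns the logical test vector unchanged
cls_all_na <- class(ifelse(hits_all_na <= 0, -hits_all_na, 0))

# As soon as one test value is TRUE, a numeric value is filled in
cls_mixed <- class(ifelse(hits_mixed <= 0, -hits_mixed, 0))

# if_else() yields a numeric result even when every test value is NA
cls_if_else <- class(if_else(hits_all_na <= 0, -hits_all_na, 0))

print(c(cls_all_na, cls_mixed, cls_if_else))
```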
default arguments not being recognized in custom function using dplyr
We could do this with missing and cur_data_all (note that in dplyr 1.1.0 and later, cur_data_all() is deprecated in favour of pick()):
foo <- function(x = cyl) {
  # when no argument is supplied, fetch the column from the current data
  if (missing(x)) x <- cur_data_all()[["cyl"]]
  case_when(
    x == 6 ~ TRUE,
    x == 8 ~ FALSE,
    x == 4 ~ NA
  )
}
Testing:
> out1 <- mtcars %>%
+ mutate(cyl_refactor = foo(cyl)) %>%
+ select(cyl, cyl_refactor)
> out2 <- mtcars %>%
+ mutate(cyl_refactor = foo()) %>%
+ select(cyl, cyl_refactor)
>
> identical(out1, out2)
[1] TRUE
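Why the plain default fails can be shown with a small sketch. The function is_six is hypothetical; its default argument is evaluated inside the function, where the data frame's columns are not visible.

```r
library(dplyr)

# Hypothetical function with a column name as its default argument
is_six <- function(x = cyl) x == 6

# The default is looked up from the function's environment, not the data frame,
# so this is expected to fail unless a global object named cyl exists
no_arg <- tryCatch(
  mutate(mtcars, six = is_six()),
  error = function(e) "error"
)

# Passing the column explicitly works, because mutate evaluates cyl in the data
with_arg <- mutate(mtcars, six = is_six(cyl))

print(no_arg)
print(sum(with_arg$six))
```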
User Defined Function not working in dplyr pipe
The cause is that your UDF cannot handle a vector argument. Wrap it with Vectorize, naming the argument to iterate over:
vectorized_extractInfo <- Vectorize(extractInfo, "GInumber")
DataGranulomeTidy %>%
mutate(NewVar = vectorized_extractInfo(GIaccessionNumber))
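Naming the argument in Vectorize matters when the function has other arguments that should be passed whole. A minimal sketch, using a hypothetical lookup_label function rather than the extractInfo from the question:

```r
# Hypothetical scalar function: looks up one key in a named character vector
lookup_label <- function(key, table) {
  if (key %in% names(table)) table[[key]] else NA_character_
}

labels <- c(a = "alpha", b = "beta")

# Only `key` is iterated over; `table` is handed to every call unchanged
lookup_each <- Vectorize(lookup_label, vectorize.args = "key")

found <- lookup_each(c("a", "z", "b"), table = labels)
print(found)
```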