Why Is R dplyr::mutate Inconsistent with Custom Functions

Why is R dplyr::mutate inconsistent with custom functions

sin and ^ are vectorized: given a vector, they operate on each element and return a vector of the same length. f is not vectorized, so it treats the whole column as a single input and returns one value, which mutate then recycles down every row. Wrapping it with f = Vectorize(f) makes it operate on each individual value as well.

y1 <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
y1
    a   asq     fout       gout
1   0     0 3640.889  0.0000000
2  10   100 3640.889 -0.5440211
3 100 10000 3640.889 -0.5063656

f = Vectorize(f)

y1a <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
y1a
    a   asq        fout       gout
1   0     0    10.88874  0.0000000
2  10   100  1010.88874 -0.5440211
3 100 10000 10010.88874 -0.5063656
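
The definitions of f and g are not shown above, so here is a minimal sketch of the same effect with a hypothetical stand-in, f_demo (the names df_demo, f_demo and f_demo_vec are assumptions made for illustration only):

```r
library(dplyr)

df_demo <- data.frame(a = c(0, 10, 100))

# Hypothetical stand-in for f: sum() collapses its input to a single number.
f_demo <- function(x) sum(x, 1)

# Not vectorized: one value (0 + 10 + 100 + 1) is computed and recycled to every row.
recycled <- mutate(df_demo, fout = f_demo(a))

# Vectorize() wraps f_demo in mapply(), so it is called once per element.
f_demo_vec <- Vectorize(f_demo)
per_row <- mutate(df_demo, fout = f_demo_vec(a))

recycled$fout  # 111 111 111
per_row$fout   # 1 11 101
```

Vectorize() is convenient, but it still calls the function once per element, so rewriting the function with vectorized building blocks is usually faster on large columns.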

mutate/transform in R dplyr (Pass custom function)

With transform your function has to operate on the vector. You can use ifelse instead, which works on vectors:

 isOdd <- function(x){ ifelse(x %% 2 == 0, "even", "odd") }

Alternatively you can apply the function to every value in the column with one of the apply functions:

 isOdd <- function(x){
   sapply(x, function(x){
     if(x %% 2 == 0){
       return("even")
     }else{
       return("odd")
     }
   })
 }
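
To check that the two approaches agree, here is a small base R sketch. The names parity_ifelse and parity_vapply are hypothetical, and vapply() is swapped in for sapply() only because it also checks that every result is a single string:

```r
# Vectorized with ifelse(): the test and the choice both happen element-wise.
parity_ifelse <- function(x) ifelse(x %% 2 == 0, "even", "odd")

# Scalar if/else applied to one element at a time.
parity_vapply <- function(x) {
  vapply(x, function(v) if (v %% 2 == 0) "even" else "odd", character(1))
}

x <- 1:4
res_ifelse <- parity_ifelse(x)
res_vapply <- parity_vapply(x)

res_ifelse  # "odd" "even" "odd" "even"
```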

Unexpected values while applying custom function in dplyr::mutate

You can group rowwise so the function gets evaluated separately for each row:

df %>%
  rowwise() %>%
  mutate(test = fun_b(x = 1, y = y_val, z = z_val, times = 1))

## Source: local data frame [5 x 3]
## Groups: <by row>
##
## # A tibble: 5 × 3
## y_val z_val test
## <dbl> <dbl> <dbl>
## 1 2 4 10.12500
## 2 5 3 3.15000
## 3 8 2 1.40625
## 4 1 1 6.75000
## 5 9 3 1.75000

Alternatively, edit fun_b so that it is vectorized, or just let R vectorize it for you with Vectorize:

df %>% mutate(test = Vectorize(fun_b)(x = 1, y = y_val, z = z_val, times = 1))

## # A tibble: 5 × 3
## y_val z_val test
## <dbl> <dbl> <dbl>
## 1 2 4 10.12500
## 2 5 3 3.15000
## 3 8 2 1.40625
## 4 1 1 6.75000
## 5 9 3 1.75000
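
Since fun_b itself is not shown, the following sketch uses a hypothetical stand-in, fun_demo, to illustrate why the ungrouped call goes wrong and that rowwise() and Vectorize() give the same per-row answer. The y_val and z_val values are taken from the output above; everything else is an assumption:

```r
library(dplyr)

df_demo <- data.frame(y_val = c(2, 5, 8, 1, 9), z_val = c(4, 3, 2, 1, 3))

# Hypothetical stand-in for fun_b: max() collapses all of its inputs to one number.
fun_demo <- function(y, z) max(y, z)

# Whole columns are passed in, so every row gets the overall maximum.
whole_column <- df_demo %>% mutate(test = fun_demo(y_val, z_val))

# rowwise(): the function sees one row at a time.
by_row <- df_demo %>%
  rowwise() %>%
  mutate(test = fun_demo(y_val, z_val)) %>%
  ungroup()

# Vectorize(): the function is called once per pair of elements.
by_vectorize <- df_demo %>% mutate(test = Vectorize(fun_demo)(y_val, z_val))

whole_column$test  # 9 9 9 9 9
by_row$test        # 4 5 8 1 9
```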

dplyr inconsistent behaviour inside function

You can't do that directly with dplyr, which relies heavily on "non-standard evaluation" (NSE). Inside your function, dplyr sees coldf1 = 1 and creates a new column literally named coldf1, just as df1 %>% mutate(somethingnew = 3.1415) creates a column named somethingnew.

You need to use either rlang's escaping mechanisms (with :=) ...

fun <- function(df1, coldf1) {
  df1 %>% mutate(!!coldf1 := 1)
}

data1
# a x1
# 1 1 2
# 2 2 3
fun(data1, "a")
# a x1
# 1 1 2
# 2 1 3

or base R:

fun <- function(df1, coldf1) { df1[[coldf1]] <- 1; df1; }
fun(data1, "a")
# a x1
# 1 1 2
# 2 1 3

(though I'm assuming your example is simplified, where this might not be as simple)

Regardless, look into "programming with dplyr", https://dplyr.tidyverse.org/articles/programming.html.
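
As a sketch of two other patterns described in that article (assuming dplyr 1.0.0 or later; the function names set_col_chr and set_col_bare are hypothetical), a glue-style name on the left of := handles a column name passed as a string, and the embrace operator {{ }} handles a bare column name:

```r
library(dplyr)

data1 <- data.frame(a = 1:2, x1 = 2:3)

# Column name supplied as a string: interpolate it into the name with "{ }".
set_col_chr <- function(df1, coldf1) {
  df1 %>% mutate("{coldf1}" := 1)
}

# Column name supplied bare (unquoted): embrace it with "{{ }}".
set_col_bare <- function(df1, coldf1) {
  df1 %>% mutate("{{ coldf1 }}" := 1)
}

out_chr  <- set_col_chr(data1, "a")
out_bare <- set_col_bare(data1, a)
```

Both calls overwrite the existing column a rather than adding a column called coldf1.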

R: Use dplyr::mutate/dplyr::transmute with a function which acts on an entire row

I think you're running into a size mismatch: the function was given more rows than the data frame passed to transmute contains.

If I do

library(dplyr)
transmute(head(women, n=10),
          some_index=calc_some_index(head(women,10)))

Then it works (the error in your code complained about sizes differing)

Alternatively, you could use the pipe and it works:

head(women, 10) %>%
  transmute(calc_some_index(.))
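
calc_some_index is not shown here, so the sketch below uses a hypothetical stand-in, index_demo, that takes a whole data frame and returns one value per row. It illustrates both the working call and the size mismatch, assuming the built-in women data set (15 rows, columns height and weight):

```r
library(dplyr)

# Hypothetical stand-in for calc_some_index: one value per row of its input.
index_demo <- function(d) d$weight / d$height^2

first10 <- head(women, 10)

# Works: the function receives the same 10 rows that transmute() operates on.
ok <- first10 %>% transmute(some_index = index_demo(.))

# Fails: 15 values are returned for a 10-row data frame.
mismatch <- tryCatch(
  transmute(first10, some_index = index_demo(women)),
  error = function(e) "size mismatch"
)
```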

Why can't I apply a function to create a new column with mutate() using dplyr?

As pointed out, + and sum() differ in behaviour: + works element by element, while sum() collapses all of its arguments into a single number. Consider:

> sum(1:10,1:10)
[1] 110
> `+`(1:10,1:10)
[1] 2 4 6 8 10 12 14 16 18 20

If you really want to sum() the variables along each row you want rowwise():

library(dplyr)
df <- tibble(w = letters[1:3], x=1:3, y = x^2, z = y - x)  # data_frame() is deprecated

# Source: local data frame [3 x 4]
#
# w x y z
# 1 a 1 1 0
# 2 b 2 4 2
# 3 c 3 9 6

df %>% rowwise() %>% mutate(result = sum(x, y, z))

# Source: local data frame [3 x 5]
# Groups: <by row>
#
# w x y z result
# 1 a 1 1 0 2
# 2 b 2 4 2 8
# 3 c 3 9 6 18

Compare this to:

df %>% mutate(result = x + y + z)
# Source: local data frame [3 x 5]
#
# w x y z result
# 1 a 1 1 0 2
# 2 b 2 4 2 8
# 3 c 3 9 6 18
df %>% mutate(result = sum(x, y, z)) # sums over all of x, y and z and recycles the result!
# Source: local data frame [3 x 5]
#
# w x y z result
# 1 a 1 1 0 28
# 2 b 2 4 2 28
# 3 c 3 9 6 28
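
If rowwise() is too slow on a large table, one vectorized alternative is base R's rowSums() on a matrix built from the columns. This is a sketch of a swapped-in technique, not part of the original answer, and df_demo is a hypothetical name:

```r
library(dplyr)

df_demo <- tibble(w = letters[1:3], x = 1:3, y = x^2, z = y - x)

# rowSums() sums across each row of the matrix, so no rowwise() grouping is needed.
out <- df_demo %>% mutate(result = rowSums(cbind(x, y, z)))

out$result  # 2 8 18
```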

dplyr error: strange issue when combining group_by, mutate and ifelse. Is it a bug?

Wrap it all in as.numeric to force the output format so the NAs, which are logical by default, don't override the class of the output variable:

df1 %>%
  group_by(group.id) %>%
  mutate( hits.consumed = as.numeric(ifelse(hits.diff<=0,-hits.diff,0)) )

# crawl.id group.id hits.diff hits.consumed
#1 1 1 NA NA
#2 1 2 NA NA
#3 2 2 0 0
#4 1 3 NA NA
#5 1 3 NA NA
#6 1 3 NA NA

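I'm pretty sure this is the same issue as in "Custom sum function in dplyr returns inconsistent results": when every value in a group is NA, ifelse returns a logical vector, which clashes with the numeric results from the other groups. The following result suggests as much: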

out <- df1[1:2,] %>%  mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "logical"
out <- df1[1:3,] %>% mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "numeric"
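
The same behaviour can be reproduced in base R without dplyr. In this sketch the vectors all_na and mixed are hypothetical stand-ins for the hits.diff values of a single group:

```r
all_na <- c(NA_real_, NA_real_)  # a group where every value is missing
mixed  <- c(NA_real_, 0)         # a group with at least one real value

# ifelse() starts from the logical test and only fills in TRUE/FALSE positions,
# so an all-NA test leaves a logical vector behind.
class_all_na <- class(ifelse(all_na <= 0, -all_na, 0))

# One non-missing element is enough to coerce the result to numeric.
class_mixed <- class(ifelse(mixed <= 0, -mixed, 0))

# Wrapping in as.numeric() gives a consistent type for every group.
class_forced <- class(as.numeric(ifelse(all_na <= 0, -all_na, 0)))

class_all_na  # "logical"
class_mixed   # "numeric"
class_forced  # "numeric"
```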

default arguments not being recognized in custom function using dplyr

We could do this with missing() and cur_data_all() (note that cur_data_all() is deprecated as of dplyr 1.1.0 in favour of pick()):

foo <- function(x = cyl){
  if(missing(x)) x <- cur_data_all()[["cyl"]]

  case_when(
    x == 6 ~ TRUE,
    x == 8 ~ FALSE,
    x == 4 ~ NA
  )
}

Testing:

> out1 <- mtcars %>% 
+ mutate(cyl_refactor = foo(cyl)) %>%
+ select(cyl, cyl_refactor)
> out2 <- mtcars %>%
+ mutate(cyl_refactor = foo()) %>%
+ select(cyl, cyl_refactor)
>
> identical(out1, out2)
[1] TRUE

User Defined Function not working in dplyr pipe

The cause is that your UDF cannot handle a vector as input. Vectorize it over the GInumber argument:

vectorized_extractInfo <- Vectorize(extractInfo, "GInumber")

DataGranulomeTidy %>%
  mutate(NewVar = vectorized_extractInfo(GIaccessionNumber))
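
The second argument to Vectorize() names the arguments to loop over; any other arguments are passed through unchanged on every call. Since extractInfo is not shown, this base R sketch uses a hypothetical function, extract_demo, with a hypothetical lookup table:

```r
# Hypothetical scalar function: looks up a single id in a named vector.
extract_demo <- function(id, lookup) {
  if (id %in% names(lookup)) lookup[[id]] else NA_character_
}

lookup <- c(a = "alpha", b = "beta")

# Loop over id only; lookup is handed to every call as a whole.
vec_extract <- Vectorize(extract_demo, vectorize.args = "id", USE.NAMES = FALSE)

res <- vec_extract(c("a", "z", "b"), lookup = lookup)
res  # "alpha" NA "beta"
```

Without vectorize.args, Vectorize() would also try to iterate over lookup element by element, which is not what a lookup table needs.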

