dplyr: Error in n(): function should not be called directly
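I presume you have dplyr and plyr loaded in the same session. dplyr is not plyr. ddply is not a function in the dplyr package.
Both dplyr and plyr have the functions summarise/summarize. If plyr is attached after dplyr, plyr::summarise masks dplyr::summarise. plyr's version evaluates n() as an ordinary R call, which reaches dplyr's error-throwing n() and produces this message.
Look at the results of conflicts() to see masked objects.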
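Masking is easy to reproduce without plyr or dplyr. The sketch below is an illustration under stated assumptions: it attaches two hypothetical stand-in environments, fake_dplyr and fake_plyr, that both define summarise(). The one attached last is searched first, and conflicts() reports the clash. In a real session the usual fixes are to call dplyr::summarise() explicitly or to load plyr before dplyr.

```r
# Two hypothetical stand-ins for packages that both export summarise()
fake_dplyr <- list(summarise = function(...) "dplyr-style summarise")
fake_plyr  <- list(summarise = function(...) "plyr-style summarise")

attach(fake_dplyr, name = "fake_dplyr")
attach(fake_plyr, name = "fake_plyr")   # attached last, so it is searched first

# An unqualified call resolves to the copy that comes first on search()
unqualified <- summarise()

# conflicts() lists names defined more than once on the search path
masked <- "summarise" %in% conflicts()

# Asking a specific environment for its copy bypasses the masking,
# which is what dplyr::summarise() does for a real package namespace
explicit <- get("summarise", envir = as.environment("fake_dplyr"))()

# Clean up the search path again
detach("fake_plyr")
detach("fake_dplyr")
```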
dplyr::n() returns Error: This function should not be called directly
So, I do not really have a problem, since I can just avoid writing dplyr::n(), but I'm curious about why it even happens.
Here's the source code for dplyr::n in dplyr 0.5.0:
function () {
stop("This function should not be called directly")
}
That's why the fully qualified form raises this error: the function always throws an error. (My guess is that the error-throwing function dplyr::n exists so that n() can have a typical documentation page with examples.)
Inside filter/mutate/summarise statements, n() does not call this function. Instead, some internal function calculates the group sizes for the expression n(). That's why the following works when dplyr is not attached:
n()
#> Error: could not find function "n"
library(magrittr)
iris %>%
dplyr::group_by(Species) %>%
dplyr::summarise(n = n())
#> # A tibble: 3 × 2
#> Species n
#> <fctr> <int>
#> 1 setosa 50
#> 2 versicolor 50
#> 3 virginica 50
Here the bare n() cannot be mapped to any function, so we get an error. But when it is used inside a dplyr verb, n() does map to something and returns the group sizes.
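The idea can be mimicked in base R. This is a simplified sketch, not dplyr's implementation. It defines a stub n() that always fails when called normally, plus a hypothetical helper, summarise_sketch(), that captures the expression unevaluated. For each group, the helper evaluates the expression in a fresh environment where n is rebound to return that group's row count.

```r
# Stand-in for dplyr's stub: calling it as ordinary R code always fails
n <- function() stop("This function should not be called directly")

# Hypothetical helper (not part of dplyr): evaluate `expr` once per group,
# with the group's columns visible by name and n() meaning "group size"
summarise_sketch <- function(data, group, expr) {
  expr <- substitute(expr)   # capture n() without calling it
  env <- parent.frame()      # caller's environment, for everything else
  groups <- split(data, data[[group]])
  vapply(groups, function(g) {
    mask <- list2env(as.list(g), parent = env)  # columns as variables
    mask$n <- function() nrow(g)                # shadows the stub
    eval(expr, mask)
  }, numeric(1))
}

direct <- tryCatch(n(), error = conditionMessage)
sizes <- summarise_sketch(iris, "Species", n())
direct  # the stub's error message
sizes   # named vector: setosa 50, versicolor 50, virginica 50
```

The stub is never reached inside summarise_sketch() because the rebound n in the evaluation environment is found first.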
dplyr error with summarise_ and n()
First, as explained in the comments, you mixed standard evaluation and non-standard evaluation. n() is not found because you can't use it like that in the *_ functions. In dplyr before 0.7.0, you would use ~n() in summarise_.
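Why does the tilde help? A one-sided formula stores its right-hand side unevaluated, so writing ~n() never calls n() at the point where you type it. Presumably the underscore verbs then evaluated that captured call in their own data context. The base-R snippet below only shows the capturing part and assumes nothing about dplyr's internals.

```r
# No function named n needs to exist here: the formula just records the call
f <- ~n()

# For a one-sided formula, element 2 is the captured right-hand side
rhs <- f[[2]]

is.call(rhs)                # TRUE: an unevaluated call object
identical(rhs, quote(n()))  # TRUE: the same call you would get from quote()
```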
However, things have changed in the tidyverse world.
Since version 0.7.0, dplyr uses a new system for programming with dplyr, called tidy evaluation, or tidy eval for short. All functions ending in _ are now deprecated and should not be used in new code, unless you want to keep a dependency on an old dplyr version. I'd advise using tidy eval now. I won't explain it here; see the Programming vignette.
For example, with dplyr (>= 0.7.0) you would now do something like this:
library(dplyr)
# quo is a tidy eval concept for quoting
grp_var <- quo(Species)
voi <- quo(Sepal.Length)
# use !! another tidy eval concept to unquote
dmp <- iris %>%
select(!! grp_var, !! voi) %>%
group_by(!! grp_var) %>%
summarise(Median_Value = median( !! voi ), Count = n())
dmp
#> # A tibble: 3 x 3
#> Species Median_Value Count
#> <fctr> <dbl> <int>
#> 1 setosa 5.0 50
#> 2 versicolor 5.9 50
#> 3 virginica 6.5 50
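Quoting becomes really useful when the column names come from a function's caller. A minimal sketch, assuming dplyr (>= 0.7.0) is installed: the hypothetical helper median_by() uses enquo() to capture bare column names as quosures and !! to unquote them inside the verbs. It should reproduce the table above.

```r
library(dplyr)

# Hypothetical helper: grp_var and voi are bare column names from the caller
median_by <- function(data, grp_var, voi) {
  grp_var <- enquo(grp_var)  # capture the argument as a quosure
  voi <- enquo(voi)
  data %>%
    group_by(!!grp_var) %>%  # unquote the captured column
    summarise(Median_Value = median(!!voi), Count = n())
}

res <- median_by(iris, Species, Sepal.Length)
res
```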
How does dplyr::n function work?
As far as I understand, dplyr uses hybrid evaluation. That means it evaluates some parts of an expression in C++ and others in R. n() is one of the functions that always gets handled by C++. This is why the R function does nothing except throw an error: it is never evaluated by R.
The relevant C++ code can be found on GitHub.
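Here is a toy version of that dispatch, written in plain R under loudly stated assumptions: hybrid_eval() and is_bare_n_call() are hypothetical, and the "fast path" merely stands in for dplyr's C++ code. An expression that is exactly the call n() is recognised by its shape and answered directly, so the R stub never runs. Anything else falls back to ordinary evaluation. The matcher also does not recognise dplyr::n(). That is one plausible reason why the fully qualified form fell through to the R stub in older versions.

```r
# Hypothetical stand-in for dplyr's R-level stub
n <- function() stop("This function should not be called directly")

# Does `expr` have exactly the shape of the bare call n()?
is_bare_n_call <- function(expr) {
  is.call(expr) && identical(expr[[1]], as.name("n")) && length(expr) == 1L
}

# Toy "hybrid" evaluator: recognised n() is answered without calling R's n;
# every other expression is evaluated normally against the data
hybrid_eval <- function(expr, data) {
  if (is_bare_n_call(expr)) {
    return(nrow(data))  # fast path: the stub is never invoked
  }
  eval(expr, data, parent.frame())  # fallback: ordinary R evaluation
}

fast <- hybrid_eval(quote(n()), iris)                 # 150 rows
slow <- hybrid_eval(quote(mean(Sepal.Length)), iris)  # plain R evaluation
qualified <- is_bare_n_call(quote(dplyr::n()))        # FALSE: not matched
```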
Why assigning dplyr's n() function makes it unexecutable within summarise and mutate?
With mutate() you use n(), but with mutate_() you use ~n().
So either use
data %>% group_by(z) %>% mutate(n = n())
or
data %>% group_by_(~z) %>% mutate_(n = ~n())