dplyr bug for select function?
This is most likely a masking issue. Another package you have loaded also exports a function named "select", so when you call select without a package prefix, R uses the version from the most recently attached package.
A very common example of this error that you can reproduce:
library(dplyr)
library(MASS)
df <- tibble(A = c(1, 2, 3), B = c(4, 5, 6))
query <- select(df, A)
This will throw an "unused argument" error, because we are trying to use the dplyr select function and it is masked by the MASS select function, which was attached later.
There are different ways to avoid this problem, but the fastest and easiest solution is to specify the package before calling the function.
Use this:
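df %>%
  group_by(ID) %>%
  # keep rows 1 up to the row just before the maximum of Var1 (slice takes row positions, not start/end arguments)
  slice(seq_len(which.max(Var1) - 1)) %>%
  top_n(n = 1, wt = Var2) %>%
  dplyr::select(ID, Var2)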
Note that you may encounter this very same issue with other functions, so you should consider always specifying the package associated with the function you're calling. Of course there are cases where this is not necessarily the best practice, but this is out of this question's scope.
Error with dplyr functions requiring dplyr:: prefix even when package is loaded
It is because the select function occurs in several packages, and one of them can mask dplyr::select if it is loaded after dplyr. When we specify the package with ::, R gets the correct function. So either,
df %>%
  dplyr::select(starts_with("a"))
Or create a new name and call it:
dpselect <- dplyr::select
df %>%
  dpselect(starts_with("a"))
In base R, we can find the functions that have conflicts:
conflicts()
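To see which attached packages define a given name, and which one R will pick, the base/utils helpers can be combined as in the sketch below. It assumes MASS is attached after dplyr to provoke the clash; the variable names where_select and masked are only illustrative.

```r
library(dplyr)
library(MASS)  # attached after dplyr, so MASS::select() masks dplyr::select()

# Search-path entries that define an object called `select`.
# The first entry is the one an unqualified select() call resolves to.
where_select <- find("select")
print(where_select)

# All names that are defined in more than one place on the search path.
masked <- conflicts()
print("select" %in% masked)
```

If the first entry is not package:dplyr, an unprefixed select() call will not reach dplyr.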
dplyr::select function clashes with MASS::select
As Pascal said, the following works:
require(MASS)
require(dplyr)
mtcars %>%
  dplyr::select(mpg)
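If MASS is needed only for its other functions, another option is to attach it without its select at all. The sketch below assumes R 3.6.0 or later, where library() accepts an exclude argument; the variable name res is only illustrative.

```r
library(dplyr)
# Attach MASS but leave its select() off the search path,
# so the unqualified name keeps pointing at dplyr::select().
library(MASS, exclude = "select")

res <- mtcars %>%
  select(mpg)

print(head(res))
```

MASS::select remains reachable with the explicit prefix if it is ever needed.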
Error using select function in R
In your chained sequence of dplyr operations, the summarise call will produce two columns: the grouping variable and the result of the summary function.
df %>%
  group_by(userId) %>%
  summarise(one = max(playCount))
# Source: local data frame [5 x 2]
#
# userId one
# 1 A 85
# 2 B 84
# 3 C 18
# 4 D 72
# 5 E 65
When you then try to select the songId variable from the data frame generated by summarise, the songId variable is not found.
df %>%
  group_by(userId) %>%
  summarise(one = max(playCount)) %>%
  select(userId, songId, playCount)
# Error in eval(expr, envir, enclos) : object 'songId' not found
A more suitable dplyr function in this case is filter. Here we select rows where the condition playCount == max(playCount) is TRUE within each group.
df %>%
  group_by(userId) %>%
  filter(playCount == max(playCount))
# Source: local data frame [5 x 3]
# Groups: userId
#
# userId songId playCount
# 1 A 568r 85
# 2 C 34n 18
# 3 E 454j 65
# 4 D 663a 72
# 5 B 35d 84
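The grouped filter can be tried on a small self-contained data set, as in the sketch below. The tibble plays and its values are made up for illustration and are not the asker's data. It also shows slice_max, which expresses the same idea and is assumed to be available (dplyr 1.0.0 or later).

```r
library(dplyr)

# Hypothetical play-count data; values are invented for this example.
plays <- tibble(
  userId    = c("A", "A", "B", "B", "C"),
  songId    = c("s1", "s2", "s3", "s4", "s5"),
  playCount = c(85, 10, 84, 3, 18)
)

# Keep, per user, the row(s) whose play count equals the group maximum.
top_filter <- plays %>%
  group_by(userId) %>%
  filter(playCount == max(playCount)) %>%
  ungroup()

# slice_max() does the same; its default with_ties = TRUE also keeps ties.
top_slice <- plays %>%
  group_by(userId) %>%
  slice_max(playCount, n = 1) %>%
  ungroup()

print(top_filter)
```

Unlike summarise, both approaches keep every original column, so songId is still available afterwards.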
You find several nice dplyr examples here.
dplyr::select Object not found in self-made function
There are two issues with your function. First, the error arises because calendario is not a column of the df passed to the function; it is only created inside the pipeline. Simply remove the df$ when specifying the aesthetics. Second, even after removing the df$, you set the y aesthetic to the string stored in the variable dato, i.e. "indice_covid" in your example. That means every date gets the same value "indice_covid", which is why you get a flat line. To tell ggplot2 that you want the column whose name is stored in dato, convert the string to a symbol using sym and unquote it with the bang-bang operator !!, i.e. !!sym(dato). Try this:
library(ggplot2)
library(dplyr)
plot_by_reg <- function(df, reg, dato) {
  df %>%
    dplyr::filter(denominazione_regione == reg) %>%
    dplyr::mutate(calendario = format(as.Date(paste(mese, giorno, sep = "-"), format = "%m-%d"), "%m-%d")) %>%
    dplyr::select(c(denominazione_regione, calendario, all_of(dato))) %>%
    #ggplot(aes(x=df$calendario, y=df$dato)) +
    ggplot(aes(x = calendario, y = !!sym(dato))) +
    geom_line(aes(group = 1)) +
    theme_dark()
}
plot_by_reg(df = data.moving, reg = "Toscana", dato = "indice_covid")
Created on 2020-05-25 by the reprex package (v0.3.0)
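An alternative to !!sym(dato) is the .data pronoun, which looks up a column by the string that holds its name. The sketch below assumes a recent ggplot2 that supports .data inside aes; the helper plot_column and the data frame toy are hypothetical and only stand in for the asker's function and data.

```r
library(ggplot2)

# Hypothetical helper: column names arrive as strings and are
# resolved against the plot data through the .data pronoun.
plot_column <- function(df, x_col, y_col) {
  ggplot(df, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_line(aes(group = 1))
}

# Made-up data, for illustration only.
toy <- data.frame(
  day   = c("05-01", "05-02", "05-03"),
  index = c(1.2, 1.5, 1.1)
)

p <- plot_column(toy, "day", "index")
```

Printing p draws the line; the y values come from the index column rather than from the literal string "index".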