Loading Dplyr After Plyr Is Causing Issues

Have to call dplyr:: before dplyr functions to get correct output

This is a common problem, but finding the culprit can be problematic. Some techniques to remedy the problem:

  • Knowing which version of the function you are trying to use is likely the first step, as it identifies which package is in conflict. For me this happens most often with summarize (dplyr, plyr, and Hmisc), lag (dplyr and stats), and less often filter (also dplyr and stats). You can see which version you are using either by typing the function name and watching the function body scroll by, or by typing:

    environment(summarise)
    # <environment: namespace:dplyr>
  • Sometimes the package was loaded through bad-luck timing (trying new code, attempting branches of analysis, etc.) and something on the search path just isn't needed anymore. Finding this culprit can be annoying, but if you know you don't need a package anymore, you can always detach("package:___") and try again (see note 1).

  • Occasionally, just restarting the R session is sufficient. This is sometimes hampered by the auto-saving of the workspace, because attached packages and the like are restored (with conflicts still in place). So restarting to a fresh R session can be helpful. If you rely on the data-saving aspects of saving the workspace, then try explicitly saving the variables you need (this can be painstaking if there are a lot of them) and reloading just those variables.

  • If none of the above works (or you don't want to restart R), you can find which of your currently attached packages pull in the conflicting package (plyr in this example) by doing something like this:

    Filter(function(p) any(grepl("\\bplyr\\b", packageDescription(p)[c("Imports","Depends","Suggests","LinkingTo")])),
    gsub("package:", "", grep("^package:", search(), value = TRUE)))

    If nothing currently in the search() path (packages, that is, not directories of executables) has "legitimately" (see note 2) caused the import of your conflict-causing package, then you'll see character(0); otherwise, you'll see the packages to blame.

  • From here, the order in which packages are loaded is key: whichever is loaded last wins. Think of loading packages as adding layers on top of each other; the one on top is the version you will see. You can always access a specific package's version of a conflicted function directly with the double-colon notation (e.g., dplyr::lag instead of just lag), and in fact that is often the preferred way to reference non-base packages when writing your own package. (It is not always required, but its verbosity is both declarative and unambiguous.)
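
As a rough sketch of the checks described in the list above, the following uses only functions that ship with R (find() and conflicts() are standard). The helper name who_defines is made up for this illustration. It looks up filter, which exists in stats in any session and is also exported by dplyr when that package is attached.

```r
# Hypothetical helper (not from any package): list every search-path entry
# that defines a function called `name`. Entries come back in search-path
# order, so the first one is what a bare call resolves to.
who_defines <- function(name) {
  find(name, mode = "function")
}

who_defines("filter")
# With dplyr attached you would expect "package:dplyr" ahead of "package:stats".

# Names that are defined in more than one place on the search path.
masked_names <- conflicts()

# The double-colon form always picks one specific package,
# whatever the load order was.
environmentName(environment(stats::filter))
```

Running who_defines("summarise") in a session where the grouped summary misbehaves should show whether plyr sits ahead of dplyr.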

Notes:

  1. Some packages do not detach well, often due to DLL loading. You can try detach(..., force=TRUE), but sometimes the safest way is to just restart R.

  2. It is always possible to cause the loading of another package from functions, even if the DESCRIPTION file does not mention it. I believe CRAN is pretty good about catching and preventing this behavior, but side-loading packages (e.g., from GitHub) can easily bypass this safety feature.

  3. The conflicts between dplyr and plyr have been known for years (e.g., https://github.com/tidyverse/dplyr/issues/347 from 2014, and loading dplyr after plyr is causing issues from 2015). That is why the plyr warning in dplyr was added to dplyr startup messages in 2014.

R: Are there any known issues when plyr/dplyr/data.table and plm packages used together

It seems that in your data (maybe due to the merging process) some individuals have the same value of the time index more than once (or more than one NA).
You could either inspect your data or run table(index(your_pdataframe), useNA = "ifany") to find out which ones.
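
If it helps, the same check can be sketched in base R without plm. The data frame and the column names id and year below are invented for illustration; substitute your own individual and time columns.

```r
# Made-up panel data: individual 1 appears twice in 2001.
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  year = c(2000, 2001, 2001, 2000, NA),
  y    = c(5, 6, 7, 8, 9)
)

# Flag every row whose (id, year) pair occurs more than once.
# duplicated() alone skips the first occurrence, so scan from both ends.
key <- panel[, c("id", "year")]
dup_rows <- duplicated(key) | duplicated(key, fromLast = TRUE)

# The offending observations.
panel[dup_rows, ]
```

duplicated() treats NA values as equal to each other, so an individual with two missing time values would be flagged as well.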

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

I believe you've loaded plyr after dplyr, which is why you are getting an overall summary instead of a grouped summary.

This is what happens with plyr loaded last.

library(dplyr)
library(plyr)
df %>%
  group_by(DRUG, FED) %>%
  summarize(mean = mean(AUC0t, na.rm = TRUE),
            low  = CI90lo(AUC0t),
            high = CI90hi(AUC0t),
            min  = min(AUC0t, na.rm = TRUE),
            max  = max(AUC0t, na.rm = TRUE),
            sd   = sd(AUC0t, na.rm = TRUE))

  mean low high min max sd
1  150 105  195 100 200 50

Now remove plyr and try again and you get the grouped summary.

detach(package:plyr)
df %>%
  group_by(DRUG, FED) %>%
  summarize(mean = mean(AUC0t, na.rm = TRUE),
            low  = CI90lo(AUC0t),
            high = CI90hi(AUC0t),
            min  = min(AUC0t, na.rm = TRUE),
            max  = max(AUC0t, na.rm = TRUE),
            sd   = sd(AUC0t, na.rm = TRUE))

Source: local data frame [4 x 8]
Groups: DRUG

  DRUG FED mean low high min max  sd
1    0   0  150 150  150 150 150 NaN
2    0   1  NaN  NA   NA  NA  NA NaN
3    1   0  100 100  100 100 100 NaN
4    1   1  200 200  200 200 200 NaN

Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?

The problem here is that you are loading dplyr first and then plyr, so plyr's function summarise is masking dplyr's function summarise. When that happens you get this warning:

library(plyr)
Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

arrange, desc, failwith, id, mutate, summarise, summarize

So for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):

library(dplyr)
dfx %>% group_by(group, sex) %>%
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group

  group sex  mean    sd
1     A   F 41.51  8.24
2     A   M 32.23 11.85
3     B   F 38.79 11.93
4     B   M 31.00  7.92
5     C   F 24.97  7.46
6     C   M 36.17  9.11

Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:

dfx %>% group_by(group, sex) %>%
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
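
If you would rather catch the bad load order up front, one possible guard is sketched below. The function name plyr_masks_dplyr is hypothetical; it relies only on the fact that entries nearer the front of search() are found first, so a package attached later sits ahead of one attached earlier.

```r
# Hypothetical guard: TRUE when plyr sits ahead of dplyr on the search path,
# i.e. plyr was attached after dplyr and masks summarise(), mutate(), etc.
plyr_masks_dplyr <- function(path = search()) {
  i_plyr  <- match("package:plyr", path)
  i_dplyr <- match("package:dplyr", path)
  !is.na(i_plyr) && !is.na(i_dplyr) && i_plyr < i_dplyr
}

# Example use at the top of a script:
if (plyr_masks_dplyr()) {
  warning("plyr is attached after dplyr; use dplyr:: or fix the load order")
}
```

The path argument exists only so the check can be tried on a hand-written search path without attaching either package.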

Having trouble using dplyr in R to group by then mutate and generate statistic by group

What you are likely looking for is the summarise function which is also part of dplyr. A quick distinction between mutate and summarise is below.

mutate() either changes an existing column or adds a new one.

summarise() calculates a single value (per group).

iris %>%
  group_by(Species) %>%
  summarise(max_sepal = max(Sepal.Length, na.rm = TRUE))

# A tibble: 3 x 2
  Species    max_sepal
  <fct>          <dbl>
1 setosa           5.8
2 versicolor       7
3 virginica        7.9
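
To make the distinction concrete, here is a small side-by-side sketch on the same iris data, assuming dplyr is installed. The object names with_max and per_species are just illustrative, and the dplyr:: prefixes are there so the result does not depend on whether plyr happens to be attached.

```r
library(dplyr)

# mutate(): keeps every row and adds the group maximum as a new column.
with_max <- iris %>%
  group_by(Species) %>%
  dplyr::mutate(max_sepal = max(Sepal.Length, na.rm = TRUE)) %>%
  ungroup()

# summarise(): collapses each group to a single row.
per_species <- iris %>%
  group_by(Species) %>%
  dplyr::summarise(max_sepal = max(Sepal.Length, na.rm = TRUE))

nrow(with_max)     # 150, one row per flower
nrow(per_species)  # 3, one row per species
```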

You can find some more examples of this below:

https://community.rstudio.com/t/what-is-difference-between-mutate-and-summarise/23103/3


