Why Are My Dplyr Group_By & Summarize Not Working Properly? (Name-Collision With Plyr)

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

I believe you've loaded plyr after dplyr, which is why you are getting an overall summary instead of a grouped summary.

This is what happens with plyr loaded last.

library(dplyr)
library(plyr)
df %>%
  group_by(DRUG, FED) %>%
  summarize(mean = mean(AUC0t, na.rm = TRUE),
            low  = CI90lo(AUC0t),
            high = CI90hi(AUC0t),
            min  = min(AUC0t, na.rm = TRUE),
            max  = max(AUC0t, na.rm = TRUE),
            sd   = sd(AUC0t, na.rm = TRUE))

  mean low high min max sd
1  150 105  195 100 200 50

Now detach plyr and try again, and you get the grouped summary.

detach(package:plyr)
df %>%
  group_by(DRUG, FED) %>%
  summarize(mean = mean(AUC0t, na.rm = TRUE),
            low  = CI90lo(AUC0t),
            high = CI90hi(AUC0t),
            min  = min(AUC0t, na.rm = TRUE),
            max  = max(AUC0t, na.rm = TRUE),
            sd   = sd(AUC0t, na.rm = TRUE))

Source: local data frame [4 x 8]
Groups: DRUG

  DRUG FED mean low high min max  sd
1    0   0  150 150  150 150 150 NaN
2    0   1  NaN  NA   NA  NA  NA NaN
3    1   0  100 100  100 100 100 NaN
4    1   1  200 200  200 200 200 NaN

dplyr: group_by + summarize not working as expected

The code extracts the whole column with $. Use the unquoted column name instead, so that summarize sees only the Frequency values within each Category.

library(dplyr)
table %>%
  group_by(Category) %>%
  summarize(meanfrequency = mean(Frequency))
# A tibble: 3 x 2
# Category meanfrequency
# <chr> <dbl>
#1 First 2
#2 Second 4.33
#3 Third 1.5

Calling table$Frequency inside the chain behaves the same as calling it outside: it returns the full column, so every group gets the overall mean. Also, R is case-sensitive, so the column must be written table$Frequency, not table$frequency.

mean(table$Frequency) 

Also, table is the name of a base R function (and class), so it is better not to give objects that name.

data

table <- structure(list(Category = c("First", "First", "Second", "First", 
"Third", "Third", "Second", "First", "Second"), Frequency = c(1L,
4L, 6L, 1L, 1L, 2L, 6L, 2L, 1L)), class = "data.frame", row.names = c(NA,
-9L))
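
To make the difference concrete, here is a minimal sketch that runs both versions side by side. The object name freq_df is an assumption chosen for illustration (it avoids shadowing the base table function); the values are the ones from the data above.

```r
library(dplyr)

# Hypothetical name `freq_df`, used instead of `table` to avoid shadowing base::table
freq_df <- data.frame(
  Category  = c("First", "First", "Second", "First", "Third",
                "Third", "Second", "First", "Second"),
  Frequency = c(1L, 4L, 6L, 1L, 1L, 2L, 6L, 2L, 1L)
)

# `$` pulls the full column, so each group is summarised with the overall mean
overall <- freq_df %>%
  group_by(Category) %>%
  summarize(meanfrequency = mean(freq_df$Frequency))

# The bare column name is evaluated separately within each group
grouped <- freq_df %>%
  group_by(Category) %>%
  summarize(meanfrequency = mean(Frequency))

print(overall)
print(grouped)
```

In the first result every row carries the same value (the mean of all nine rows), while the second reproduces the per-category means shown earlier.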

Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?

The problem here is that you are loading dplyr first and then plyr, so plyr's function summarise is masking dplyr's function summarise. When that happens you get this warning:

library(plyr)
Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

arrange, desc, failwith, id, mutate, summarise, summarize

So for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):

library(dplyr)
dfx %>%
  group_by(group, sex) %>%
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group

  group sex  mean    sd
1     A   F 41.51  8.24
2     A   M 32.23 11.85
3     B   F 38.79 11.93
4     B   M 31.00  7.92
5     C   F 24.97  7.46
6     C   M 36.17  9.11

Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:

dfx %>%
  group_by(group, sex) %>%
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
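
If you are unsure which version a bare summarise call will pick up in your session, you can ask R directly. This is a small diagnostic sketch using only base R functions (environment, environmentName, find); it assumes both packages are installed and deliberately attaches them in the problematic order. The variable names active and explicit are illustrative.

```r
library(dplyr)
library(plyr)  # attached last, so its summarise masks dplyr's

# Package whose namespace the bare name currently resolves to
active <- environmentName(environment(summarise))
print(active)

# The namespace-qualified call is unaffected by load order
explicit <- environmentName(environment(dplyr::summarise))
print(explicit)

# Every attached package that defines the name, in search-path order
print(find("summarise"))
```

With this load order, active should report plyr while explicit reports dplyr, which is why the dplyr:: prefix is a reliable workaround.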

dplyr issues when using group_by(multiple variables)

Taking Dickoa's answer one step further: as Hadley says, "summarise peels off a single layer of grouping". It peels off the grouping variables in the reverse of the order in which you applied them, so you can just use

mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  summarise(newvar2 = sum(newvar) + 5)

Note that this will give a different answer if you use group_by(gear, cyl) in the second line.

And to get your first attempt working:

df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))

df2 <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)
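
The "peeling" can be checked with dplyr's group_vars(), which lists the grouping variables still attached to a result. A short sketch, assuming a reasonably recent dplyr where summarise drops the last grouping variable by default (the names step1 and step2 are illustrative):

```r
library(dplyr)

step1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))

# Only the last grouping variable (gear) has been removed
print(group_vars(step1))

step2 <- step1 %>%
  summarise(newvar2 = sum(newvar) + 5)

# The remaining layer (cyl) is now gone as well
print(group_vars(step2))
```

After the first summarise the result is still grouped by cyl, which is why the chained second summarise gives one row per cyl without another group_by call.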

tidyverse-dplyr summarise not operating as expected

As said in the comments, the problem is that the plyr version of summarise is loaded after dplyr, so when you call summarise you get the wrong one. You should try to load plyr first (or, much better, not load it at all), but you can also play it safe by being explicit about which version of summarise you want.

library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"),
                DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))

# bad:
DF %>%
  group_by(COLUMN_NAME) %>%
  plyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
                                !(all(grepl("char", DATA_TYPE)))))

# good:
DF %>%
  group_by(COLUMN_NAME) %>%
  dplyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
                                 !(all(grepl("char", DATA_TYPE)))))

If you really need plyr loaded as well as dplyr, it is a good idea to qualify calls this way, and to do the same for other key conflicts such as mutate. But it is better to avoid having both loaded at once.

group_by doesn't work properly on retrosheet data

This is because you have both the dplyr and plyr packages loaded at the same time.

dplyr's summarize function is masked by the one from the plyr package.

Try this:

ll_data_frame %>%
  group_by(DayOfWeek) %>%
  dplyr::summarize(R = sum(HomeRunsScore))

ll_data_frame %>%
  group_by(VisitingTeam) %>%
  dplyr::summarize(R = sum(HomeRunsScore))
