Why Does Summarize or Mutate Not Work With Group_By When I Load 'Plyr' After 'Dplyr'

Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?

The problem here is that you are loading dplyr first and then plyr, so plyr's function summarise is masking dplyr's function summarise. When that happens you get this warning:

library(plyr)
Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

arrange, desc, failwith, id, mutate, summarise, summarize

So in order for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):

library(dplyr)
dfx %>% group_by(group, sex) %>%
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group

  group sex  mean    sd
1     A   F 41.51  8.24
2     A   M 32.23 11.85
3     B   F 38.79 11.93
4     B   M 31.00  7.92
5     C   F 24.97  7.46
6     C   M 36.17  9.11

Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:

dfx %>% group_by(group, sex) %>%
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
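
If it is unclear which package a bare call resolves to in the current session, base R can report it. The following is a minimal sketch; the helper name which_pkg is made up for illustration and is not part of dplyr or plyr.

```r
library(dplyr)

# Hypothetical helper: report the namespace a function was defined in.
which_pkg <- function(fn) environmentName(environment(fn))

# "dplyr" when dplyr's version is found first on the search path,
# "plyr" when plyr was attached afterwards and masks it.
which_pkg(summarise)

# The namespaced call always resolves to dplyr, whatever the load order.
which_pkg(dplyr::summarise)
```

If the first call reports plyr, the grouped pipeline above will collapse to a single overall row until the masking is resolved.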

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

I believe you've loaded plyr after dplyr, which is why you are getting an overall summary instead of a grouped summary.

This is what happens with plyr loaded last.

library(dplyr)
library(plyr)
df %>%
  group_by(DRUG, FED) %>%
  summarize(mean = mean(AUC0t, na.rm = TRUE),
            low  = CI90lo(AUC0t),
            high = CI90hi(AUC0t),
            min  = min(AUC0t, na.rm = TRUE),
            max  = max(AUC0t, na.rm = TRUE),
            sd   = sd(AUC0t, na.rm = TRUE))

  mean low high min max sd
1  150 105  195 100 200 50

Now remove plyr and try again and you get the grouped summary.

detach(package:plyr)
df %>%
  group_by(DRUG, FED) %>%
  summarize(mean = mean(AUC0t, na.rm = TRUE),
            low  = CI90lo(AUC0t),
            high = CI90hi(AUC0t),
            min  = min(AUC0t, na.rm = TRUE),
            max  = max(AUC0t, na.rm = TRUE),
            sd   = sd(AUC0t, na.rm = TRUE))

Source: local data frame [4 x 8]
Groups: DRUG

  DRUG FED mean low high min max  sd
1    0   0  150 150  150 150 150 NaN
2    0   1  NaN  NA   NA  NA  NA NaN
3    1   0  100 100  100 100 100 NaN
4    1   1  200 200  200 200 200 NaN

My dplyr code not working all of a sudden

It could be that the package plyr was also loaded along with dplyr and the mutate from plyr masked dplyr's mutate. One option is to call the function as dplyr::mutate; another is to run the code in a fresh R session with only dplyr loaded.

library(dplyr)
New_promo_store %>%
  dplyr::mutate(MiniTotal = rowSums(.[4:17], na.rm = TRUE)) %>%
  group_by(`ITEM#`) %>%
  dplyr::mutate(Total = sum(MiniTotal, na.rm = TRUE))
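
If prefixing every call is tedious, another option is to pin the dplyr versions once at the top of a script. This is a sketch that assumes it runs at the top level, where the global environment is searched before any attached package; mtcars stands in for the real data.

```r
library(dplyr)

# Bind the dplyr versions in the global environment, which is searched
# before any attached package, so a later library(plyr) cannot mask them.
mutate    <- dplyr::mutate
summarise <- dplyr::summarise
summarize <- dplyr::summarize

res <- mtcars %>%
  group_by(cyl) %>%
  mutate(sum_hp = sum(hp))   # grouped sum: one value per cyl group
```

The drawback is that these bindings live in your workspace, so they need to be recreated in every new session or script.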

dplyr issues when using group_by(multiple variables)

Taking Dickoa's answer one step further: as Hadley says, "summarise peels off a single layer of grouping". Layers are peeled off in the reverse of the order in which you applied them, so you can just use:

mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  summarise(newvar2 = sum(newvar) + 5)

Note that this will give a different answer if you use group_by(gear, cyl) in the second line, because cyl is then peeled off first and the final sums are computed per gear.

And to get your first attempt working:

df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))

df2 <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)
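
To see the peeling directly, the remaining grouping can be inspected after each step with group_vars(). This is a sketch assuming dplyr 1.0.0 or later, where the .groups argument is available; "drop_last" is spelled out only to make the default behaviour explicit and to silence the informational message.

```r
library(dplyr)

step1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop_last")

group_vars(step1)   # "cyl": only gear was peeled off

step2 <- step1 %>%
  summarise(newvar2 = sum(newvar) + 5, .groups = "drop_last")

group_vars(step2)   # character(0): no grouping left
```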

Why does a mutate following a group_by(year, month) seem to miss a row?

When you use group_by with summarise, by default only the last level of grouping is dropped.

So at this stage your data is still grouped by year.

library(dplyr)
library(lubridate)

tibble(
  date = ymd("2002-12-31") + c(0:60),
  index = 406 * exp(cumsum(rnorm(61, 0, 0.01)))
) %>%
  mutate(
    year = year(date),
    month = month(date)
  ) %>%
  group_by(year, month) %>%
  summarise(
    date = last(date),
    month.close = last(index)
  )

# A tibble: 4 x 4
# Groups: year [2] # <- Notice this
#    year month date       month.close
#   <int> <int> <date>           <dbl>
# 1  2002    12 2002-12-31        411.
# 2  2003     1 2003-01-31        393.
# 3  2003     2 2003-02-28        406.
# 4  2003     3 2003-03-01        398.

To overcome this behavior, you can specify .groups = 'drop' in summarise or call ungroup() after the step above.

tibble(
  date = ymd("2002-12-31") + c(0:60),
  index = 406 * exp(cumsum(rnorm(61, 0, 0.01)))
) %>%
  mutate(
    year = year(date),
    month = month(date)
  ) %>%
  group_by(year, month) %>%
  summarise(
    date = last(date),
    month.close = last(index),
    .groups = 'drop'
  ) %>%
  mutate(
    month.change = log(month.close / lag(month.close))
  )

#    year month date       month.close month.change
#   <int> <int> <date>           <dbl>        <dbl>
# 1  2002    12 2002-12-31        399.           NA
# 2  2003     1 2003-01-31        380.      -0.0510
# 3  2003     2 2003-02-28        381.      0.00257
# 4  2003     3 2003-03-01        381.     0.000673

In your second step the data is grouped by only one key, so that grouping is dropped after summarise and you get the expected output.
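
The effect of a leftover grouping on lag() can be seen in a small sketch. The three closing values below are made up for illustration and are not taken from the question.

```r
library(dplyr)

# Hypothetical month-end closes: one in 2002, two in 2003.
closes <- data.frame(year = c(2002, 2003, 2003),
                     month.close = c(100, 110, 121))

# Still grouped by year: lag() restarts in each group, so the first
# row of 2003 has no previous value and becomes NA.
grouped <- closes %>%
  group_by(year) %>%
  mutate(month.change = log(month.close / lag(month.close)))

# Ungrouped: lag() runs over the whole table, so only the first row is NA.
ungrouped <- closes %>%
  mutate(month.change = log(month.close / lag(month.close)))
```

Comparing the two month.change columns shows the extra NA at the start of 2003 in the grouped version, which is the row that seems to go missing.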

group_by variable and sum in dplyr

It could be a case of plyr::mutate masking dplyr::mutate when both packages are loaded. We can specify dplyr::<functionname> to correct this.

library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  dplyr::mutate(sum_hp = sum(hp))
# A tibble: 32 x 12
# Groups: cyl [3]
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb sum_hp
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#  1    21     6   160   110   3.9  2.62  16.5     0     1     4     4    856
#  2    21     6   160   110   3.9  2.88  17.0     0     1     4     4    856
#  3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1    909
#  4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1    856
#  5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2   2929
#  6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1    856
#  7  14.3     8   360   245  3.21  3.57  15.8     0     0     3     4   2929
#  8  24.4     4  147.    62  3.69  3.19    20     1     0     4     2    909
#  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2    909
# 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4    856
# … with 22 more rows

If we use plyr::mutate, the OP's output can be reproduced:

mtcars %>%
  group_by(cyl) %>%
  plyr::mutate(
    sum_hp = sum(hp)
  )
# A tibble: 32 x 12
# Groups: cyl [3]
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb sum_hp
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#  1    21     6   160   110   3.9  2.62  16.5     0     1     4     4   4694
#  2    21     6   160   110   3.9  2.88  17.0     0     1     4     4   4694
#  3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1   4694
#  4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1   4694
#  5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2   4694
#  6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1   4694
#  7  14.3     8   360   245  3.21  3.57  15.8     0     0     3     4   4694
#  8  24.4     4  147.    62  3.69  3.19    20     1     0     4     2   4694
#  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2   4694
# 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4   4694
# … with 22 more rows
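
The two sum_hp columns can be cross-checked with base R alone, without loading either package. This sketch only recomputes the numbers shown in the outputs above.

```r
# Grand total over all 32 rows: the value plyr::mutate repeats everywhere.
total <- sum(mtcars$hp)

# Per-cylinder totals: the values dplyr::mutate assigns within each group.
group_sums <- tapply(mtcars$hp, mtcars$cyl, sum)

total        # 4694
group_sums   # 909 (4 cyl), 856 (6 cyl), 2929 (8 cyl)
```

So a constant 4694 in every row is a quick sign that the grouping was ignored.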

tidyverse-dplyr summarise not operating as expected

As said in the comments, the problem is that the plyr version of summarise is loaded after dplyr, so when you call summarise you are getting the wrong one. You should try to load plyr first (or, much better, try not to load it at all), but you can also play it safe by being explicit about which version of summarise you want.

library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"),
                DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))

# bad: plyr's summarise ignores the grouping and returns a single row
DF %>% group_by(COLUMN_NAME) %>%
  plyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
                                !(all(grepl("char", DATA_TYPE)))))

# good: dplyr's summarise returns one row per COLUMN_NAME
DF %>% group_by(COLUMN_NAME) %>%
  dplyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
                                 !(all(grepl("char", DATA_TYPE)))))

If you really need plyr loaded as well as dplyr it would be a good idea to do it this way, and also with other key conflicts like mutate. But better is to avoid having both loaded at once.
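
If both packages do have to be attached, one way to limit the damage is the exclude argument of library(), available in R 3.6.0 and later. The sketch below assumes a fresh session in which plyr is not yet attached; the list of names is copied from plyr's masking warning and may need extending for other package versions.

```r
library(dplyr)

# Names taken from the masking warning plyr prints; extend the list if
# your versions report more conflicts.
clashes <- c("arrange", "desc", "failwith", "id",
             "mutate", "summarise", "summarize")

# Guarded so the sketch also runs where plyr is not installed.
if (requireNamespace("plyr", quietly = TRUE)) {
  library(plyr, exclude = clashes)   # needs R >= 3.6.0
}

# The bare name still resolves to dplyr's version.
environmentName(environment(summarise))
```

The excluded plyr functions remain reachable with the plyr:: prefix when they are really needed.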

group_by function is not working with another group_by

Since both groupings are the same, there is no need to compute them separately; you can combine them and calculate hr_rain and RAINFALL together.

library(dplyr)

df %>%
  group_by(STATION, CODE, gr = cumsum(HOUR == '09')) %>%
  mutate(hr_rain = zoo::na.approx(hr_rain, rule = 2, maxgap = 2, na.rm = FALSE),
         RAINFALL = hr_rain - lag(hr_rain, default = 0))
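
The grouping column gr relies on a running count of the '09' rows. A base R sketch of that idea, using a made-up hour vector rather than the data from the question:

```r
# Hypothetical hour stamps spanning three measurement periods.
hour <- c("07", "08", "09", "10", "11", "08", "09", "10")

# TRUE counts as 1, so the running total steps up at every "09"
# and stays constant until the next one.
gr <- cumsum(hour == "09")
gr   # 0 0 1 1 1 1 2 2
```

Each distinct value of gr then marks one block of rows that starts at 09:00, and group_by treats each block as its own group.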

data

df <- structure(list(STATION = c("SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA"), CODE = c(163, 163, 163, 163, 163, 163, 163, 163,
163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163,
163, 163, 163), DATE = c("06/09/18", "06/09/18", "06/09/18",
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18",
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18",
"06/09/18", "06/09/18", "06/10/19", "06/10/19", "06/10/19", "06/10/19",
"06/10/19", "06/10/19", "06/10/19"), HOUR = c("00", "04", "05",
"06", "07", "08", "09", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "03", "05", "06", "07", "08", "09", "10"),
hr_rain = c(1, 1, NA, 1.5, 2.5, NA, 0, 0.5, 0.5, NA, NA,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, NA, NA, NA, 0.5, 0, 0)), row.names = c(NA,
-24L), class = "data.frame")

