Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
The problem here is that you are loading dplyr first and then plyr, so plyr's summarise function is masking dplyr's summarise function. When that happens you get this warning:
library(plyr)
Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------
Attaching package: ‘plyr’
The following objects are masked from ‘package:dplyr’:
arrange, desc, failwith, id, mutate, summarise, summarize
So for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):
library(dplyr)
dfx %>% group_by(group, sex) %>%
summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group
group sex mean sd
1 A F 41.51 8.24
2 A M 32.23 11.85
3 B F 38.79 11.93
4 B M 31.00 7.92
5 C F 24.97 7.46
6 C M 36.17 9.11
Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:
dfx %>% group_by(group, sex) %>%
dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
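To check which summarise a bare call would resolve to in the current session, one option is to inspect the function's environment. This is a minimal sketch using base R's environment(), environmentName() and find(); it assumes dplyr is installed and that plyr has not been attached after it. The variable name owner is only illustrative.

```r
library(dplyr)

# Name of the namespace that a bare `summarise` currently resolves to.
# "dplyr" means grouped summaries will work; "plyr" means dplyr's version is masked.
owner <- environmentName(environment(summarise))
print(owner)

# Every attached package that defines `summarise`, in search-path order;
# the first entry is the one a bare call uses.
print(find("summarise"))
```

If owner comes back as "plyr", the fixes described above (detaching plyr, reloading in the right order, or using the dplyr:: prefix) apply.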
Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)
I believe you've loaded plyr after dplyr, which is why you are getting an overall summary instead of a grouped summary.
This is what happens with plyr loaded last.
library(dplyr)
library(plyr)
df %>%
group_by(DRUG,FED) %>%
summarize(mean=mean(AUC0t, na.rm=TRUE),
low = CI90lo(AUC0t),
high= CI90hi(AUC0t),
min=min(AUC0t, na.rm=TRUE),
max=max(AUC0t,na.rm=TRUE),
sd= sd(AUC0t, na.rm=TRUE))
mean low high min max sd
1 150 105 195 100 200 50
Now remove plyr and try again and you get the grouped summary.
detach(package:plyr)
df %>%
group_by(DRUG,FED) %>%
summarize(mean=mean(AUC0t, na.rm=TRUE),
low = CI90lo(AUC0t),
high= CI90hi(AUC0t),
min=min(AUC0t, na.rm=TRUE),
max=max(AUC0t,na.rm=TRUE),
sd= sd(AUC0t, na.rm=TRUE))
Source: local data frame [4 x 8]
Groups: DRUG
DRUG FED mean low high min max sd
1 0 0 150 150 150 150 150 NaN
2 0 1 NaN NA NA NA NA NaN
3 1 0 100 100 100 100 100 NaN
4 1 1 200 200 200 200 200 NaN
My dplyr code not working all of a sudden
It could be that the plyr package was loaded along with dplyr and that mutate from plyr masked dplyr's mutate. One option is to call the function with the dplyr:: prefix; another is to run the code in a fresh R session with only dplyr loaded:
library(dplyr)
New_promo_store%>%
dplyr::mutate(MiniTotal = rowSums(.[4:17], na.rm = TRUE)) %>%
group_by(`ITEM#`) %>%
dplyr::mutate(Total = sum(MiniTotal, na.rm = TRUE))
dplyr issues when using group_by(multiple variables)
Taking Dickoa's answer one step further: as Hadley says, "summarise peels off a single layer of grouping". It peels off the grouping variables in the reverse of the order in which you applied them, so you can just use:
mtcars %>%
group_by(cyl, gear) %>%
summarise(newvar = sum(wt)) %>%
summarise(newvar2 = sum(newvar) + 5)
Note that this will give a different answer if you use group_by(gear, cyl) in the second line.
And to get your first attempt working:
df1 <- mtcars %>%
group_by(cyl, gear) %>%
summarise(newvar = sum(wt))
df2 <- df1 %>%
group_by(cyl) %>%
summarise(newvar2 = sum(newvar)+5)
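The "peeling" can be observed directly with group_vars(), which reports the grouping variables still in effect. The following is a small sketch on mtcars; the names by_both, once and twice are only illustrative, and .groups = "drop_last" just spells out the default so that no message is printed (it assumes dplyr 1.0.0 or later).

```r
library(dplyr)

by_both <- mtcars %>% group_by(cyl, gear)
print(group_vars(by_both))   # both grouping variables are active

# The first summarise removes the last grouping variable (gear)
once <- by_both %>% summarise(newvar = sum(wt), .groups = "drop_last")
print(group_vars(once))      # only cyl is left

# The second summarise removes cyl, leaving an ungrouped result
twice <- once %>% summarise(newvar2 = sum(newvar) + 5)
print(group_vars(twice))     # no grouping variables remain
```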
Why does a mutate following a group_by(year, month) seem to miss a row?
When you use group_by with summarise, by default only the last level of grouping is dropped. So at this stage your data is still grouped by year.
library(dplyr)
library(lubridate)  # supplies ymd(), year() and month()
tibble(
date = ymd("2002-12-31") + c(0:60),
index = 406 * exp(cumsum(rnorm(61,0,0.01)))
) %>% mutate(
year = year(date),
month = month(date)
) %>% group_by(year, month) %>% summarise(
date = last(date),
month.close = last(index))
# A tibble: 4 x 4
# Groups: year [2] # <- Notice this
# year month date month.close
# <int> <int> <date> <dbl>
#1 2002 12 2002-12-31 411.
#2 2003 1 2003-01-31 393.
#3 2003 2 2003-02-28 406.
#4 2003 3 2003-03-01 398.
To overcome this behavior, you can specify .groups = 'drop' in summarise or call ungroup() after the step above.
tibble(
date = ymd("2002-12-31") + c(0:60),
index = 406 * exp(cumsum(rnorm(61,0,0.01)))
) %>% mutate(
year = year(date),
month = month(date)
) %>% group_by(year, month) %>% summarise(
date = last(date),
month.close = last(index), .groups = 'drop'
) %>% mutate(
month.change = log(month.close / lag(month.close))
)
# year month date month.close month.change
# <int> <int> <date> <dbl> <dbl>
#1 2002 12 2002-12-31 399. NA
#2 2003 1 2003-01-31 380. -0.0510
#3 2003 2 2003-02-28 381. 0.00257
#4 2003 3 2003-03-01 381. 0.000673
In the second step, since your data is grouped by only one key, that grouping is dropped after summarise and you get the expected output.
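The reason a leftover year grouping makes a row appear to be missing is that lag() restarts within each group, so the first month of every year has no previous value. Here is a minimal sketch with made-up numbers (the monthly table and its close column are hypothetical, not the OP's data):

```r
library(dplyr)

# Hypothetical month-end closes spanning a year boundary
monthly <- tibble(
  year  = c(2002, 2003, 2003),
  month = c(12, 1, 2),
  close = c(100, 110, 121)
)

# Still grouped by year: lag() cannot see December 2002 from January 2003
grouped <- monthly %>% group_by(year) %>% mutate(prev = lag(close))

# Ungrouped: lag() runs over the whole column
ungrouped <- monthly %>% mutate(prev = lag(close))

print(grouped$prev)     # NA NA 110
print(ungrouped$prev)   # NA 100 110
```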
group_by variable and sum in dplyr
It could be a case of plyr::mutate masking dplyr::mutate when both packages are loaded. We can specify dplyr::<functionname> to correct this:
library(dplyr)
mtcars%>%
group_by(cyl) %>%
dplyr::mutate(sum_hp = sum(hp))
# A tibble: 32 x 12
# Groups: cyl [3]
# mpg cyl disp hp drat wt qsec vs am gear carb sum_hp
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 856
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 856
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 909
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 856
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 2929
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 856
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 2929
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 909
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 909
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 856
# … with 22 more rows
If we use plyr::mutate, the OP's output can be reproduced:
mtcars%>%
group_by(cyl) %>%
plyr::mutate(
sum_hp = sum(hp)
)
# A tibble: 32 x 12
# Groups: cyl [3]
# mpg cyl disp hp drat wt qsec vs am gear carb sum_hp
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 4694
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 4694
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 4694
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 4694
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 4694
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 4694
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 4694
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 4694
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 4694
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 4694
# … with 22 more rows
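The two sum_hp columns above can be cross-checked in base R without either package: the per-cyl totals match what dplyr::mutate repeats within each group, and the grand total matches what plyr::mutate writes into every row because it ignores the grouping. The names per_group and overall below are only illustrative.

```r
# Per-group totals of hp, one per value of cyl (4, 6, 8)
per_group <- tapply(mtcars$hp, mtcars$cyl, sum)
print(per_group)   # 909, 856, 2929

# Grand total of hp over all 32 rows
overall <- sum(mtcars$hp)
print(overall)     # 4694
```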
tidyverse-dplyr summarise not operating as expected
As said in the comments, the problem is that the plyr version of summarise is loaded after dplyr, so when you call summarise you are getting the wrong one. You should try to load plyr first (or, much better, try not to load it at all), but you can also play it safe by being explicit about which version of summarise you want.
library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"),
DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))
# bad:
DF %>% group_by(COLUMN_NAME) %>%
plyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
!(all(grepl("char", DATA_TYPE)))))
# good:
DF %>% group_by(COLUMN_NAME) %>%
dplyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) &
!(all(grepl("char", DATA_TYPE)))))
If you really need plyr loaded as well as dplyr, it would be a good idea to do it this way, and likewise for other key conflicts such as mutate. But it is better to avoid having both loaded at once.
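If both packages really must be attached, another possible approach (not part of the original answer) is the exclude argument of library(), available in R 3.6.0 and later, which attaches plyr without the names that would clash. The sketch below assumes plyr may or may not be installed and works out the clashing names from the two namespaces instead of hard-coding them; the variable clashes is only illustrative, and a startup note about load order may still be printed.

```r
library(dplyr)

if (requireNamespace("plyr", quietly = TRUE)) {
  # Names exported by both packages, e.g. mutate and summarise
  clashes <- intersect(getNamespaceExports("plyr"),
                       getNamespaceExports("dplyr"))

  # Attach plyr without those names so dplyr's versions stay visible
  library(plyr, exclude = clashes)
}

# A bare `summarise` should still resolve to dplyr
print(environmentName(environment(summarise)))
```

The excluded plyr functions remain reachable with the plyr:: prefix when they are needed.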
group_by function is not working with another group_by
Since both groupings are the same, there is no need to calculate them separately; you can combine them and calculate hr_rain and RAINFALL together.
library(dplyr)
df %>%
group_by(STATION, CODE, gr = cumsum(HOUR == '09')) %>%
mutate(hr_rain = zoo::na.approx(hr_rain, rule = 2, maxgap = 2, na.rm = FALSE),
RAINFALL = hr_rain - lag(hr_rain, default = 0))
data
df <- structure(list(STATION = c("SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA"), CODE = c(163, 163, 163, 163, 163, 163, 163, 163,
163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163,
163, 163, 163), DATE = c("06/09/18", "06/09/18", "06/09/18",
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18",
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18",
"06/09/18", "06/09/18", "06/10/19", "06/10/19", "06/10/19", "06/10/19",
"06/10/19", "06/10/19", "06/10/19"), HOUR = c("00", "04", "05",
"06", "07", "08", "09", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "03", "05", "06", "07", "08", "09", "10"),
hr_rain = c(1, 1, NA, 1.5, 2.5, NA, 0, 0.5, 0.5, NA, NA,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, NA, NA, NA, 0.5, 0, 0)), row.names = c(NA,
-24L), class = "data.frame")
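The grouping key gr = cumsum(HOUR == '09') used above starts a new group at every row where HOUR is "09", because the running count increases by one there and stays constant otherwise. A small base R sketch on a made-up vector of hours (the hour values are illustrative, not the OP's data):

```r
# Hypothetical sequence of hours covering two 09:00 readings
hour <- c("07", "08", "09", "10", "11", "08", "09", "10")

# The running count of "09" rows acts as a group id
gr <- cumsum(hour == "09")

print(data.frame(hour, gr))
# gr is 0 0 1 1 1 1 2 2: each "09" row opens a new group
```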