Group by using base R
Here's another base R solution using by
do.call(rbind, by(df, df[, 1:3],
function(x) cbind(x[1, 1:3], sum(x$sales), mean(x$units))))
Or using the "split/apply/combine" approach:
t(sapply(split(df, df[, 1:3], drop = TRUE),
function(x) c(sumSales = sum(x$sales), meanUnits = mean(x$units))))
Or similarly
do.call(rbind, lapply(split(df, df[, 1:3], drop = TRUE),
function(x) c(sumSales = sum(x$sales), meanUnits = mean(x$units))))
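To make the snippets above easier to try, here is a minimal sketch on a made-up data frame. Only the column names (year, quarter, Channel, sales, units) follow the question; the values are invented for illustration. It uses the split/lapply variant but returns a one-row data frame per group, so the grouping columns survive as proper columns instead of being folded into row names.

```r
# Toy data; values are made up for illustration only
df <- data.frame(
  year    = c(2013, 2013, 2013, 2014),
  quarter = c("Q1", "Q1", "Q2", "Q1"),
  Channel = c("AAA", "AAA", "BBB", "AAA"),
  sales   = c(100, 200, 50, 70),
  units   = c(10, 20, 5, 7)
)

# Split on the first three columns; drop = TRUE discards empty combinations
groups <- split(df, df[, 1:3], drop = TRUE)

# One summary row per group, keeping the grouping columns
res <- do.call(rbind, lapply(groups, function(x) {
  data.frame(x[1, 1:3],
             sumSales  = sum(x$sales),
             meanUnits = mean(x$units))
}))
row.names(res) <- NULL
res
```

Note that the rows come back in the order of the split groups, which need not match the sort order of the original data.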
Edit: it seems like df is of class data.table (but for some reason you asked for a base R solution only). Here's how you would do it with your data.table object:
df[, .(sumSales = sum(sales), meanUnits = mean(units)), keyby = .(year, quarter, Channel)]
# year quarter Channel sumSales meanUnits
# 1: 2013 Q1 AAA 4855 15.0
# 2: 2013 Q1 BBB 2231 12.0
# 3: 2013 Q2 AAA 4004 17.5
# 4: 2013 Q2 BBB 2057 23.0
# 5: 2013 Q3 AAA 2558 21.0
# 6: 2013 Q3 BBB 4807 21.0
# 7: 2013 Q4 AAA 4291 12.0
# 8: 2013 Q4 BBB 1128 25.0
# 9: 2014 Q1 AAA 2169 23.0
# 10: 2014 Q1 CCC 3912 16.5
# 11: 2014 Q2 AAA 2613 21.0
# 12: 2014 Q2 BBB 1533 11.0
# 13: 2014 Q2 CCC 2114 23.0
# 14: 2014 Q3 BBB 5219 13.0
# 15: 2014 Q3 CCC 1614 15.0
# 16: 2014 Q4 AAA 2695 14.0
# 17: 2014 Q4 BBB 4177 15.0
Base R instead of dplyr: group and summarise the data?
One way to do this is to use aggregate. I think this is the most straightforward base R method. You can use other functions as well, but this one is the easiest to follow.
aggregate(Sport ~ Sex + Season, data = data,
FUN = function(x) length(unique(x)) )
Sex Season Sport
1 F Summer 40
2 M Summer 49
3 F Winter 14
4 M Winter 17
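As a small self-contained sketch of the same idea, the following uses an invented data frame (the column names Sex, Season and Sport follow the answer, the values do not come from the original data) to count distinct sports per group.

```r
# Toy data; values are made up for illustration only
data <- data.frame(
  Sex    = c("F", "F", "F", "M", "M", "M"),
  Season = c("Summer", "Summer", "Winter", "Summer", "Summer", "Summer"),
  Sport  = c("Judo", "Judo", "Luge", "Judo", "Golf", "Golf")
)

# Count distinct sports within each Sex/Season combination
n_sports <- aggregate(Sport ~ Sex + Season, data = data,
                      FUN = function(x) length(unique(x)))
n_sports
```

Combinations that never occur in the data (here, M/Winter) simply do not appear in the result.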
What is the Base R equivalent of this dplyr group_by code?
We could use proportions on the table output after subsetting to remove the NA values (complete.cases) and selecting the columns.
The data is from the forcats package, so load the package and get the data:
library(forcats)
data(gss_cat)
Use table/proportions as mentioned above:
by_age2_base <- proportions(table(subset(gss_cat, complete.cases(age),
select = c(age, marital))), 1)
-output
head(by_age2_base, 3)
marital
age No answer Never married Separated Divorced Widowed Married
18 0.000000000 0.978021978 0.000000000 0.000000000 0.000000000 0.021978022
19 0.000000000 0.939759036 0.000000000 0.012048193 0.004016064 0.044176707
20 0.000000000 0.904382470 0.003984064 0.007968127 0.000000000 0.083665339
-compare with the OP's output
head(by_age2, 3)
# A tibble: 3 x 4
# Groups: age [2]
age marital n prop
<int> <fct> <int> <dbl>
1 18 Never married 89 0.978
2 18 Married 2 0.0220
3 19 Never married 234 0.940
If we need the output in 'long' format, convert the table to a data.frame with as.data.frame:
by_age2_base_long <- subset(as.data.frame(by_age2_base), Freq > 0)
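The table/proportions steps can be tried without forcats on a tiny invented data set. This is only a sketch: the column names age and marital mirror gss_cat, but the values are made up. On older R versions, prop.table() can be substituted for proportions(), which is its newer name.

```r
# Toy data; values are made up for illustration only
dat <- data.frame(
  age     = c(18, 18, 18, 18, 19, 19, NA),
  marital = c("Never married", "Never married", "Never married", "Married",
              "Never married", "Married", "Married")
)

# Drop rows with a missing age, cross-tabulate, then take row-wise proportions
tab  <- table(subset(dat, complete.cases(age), select = c(age, marital)))
prop <- proportions(tab, 1)   # margin 1: each age row sums to 1

# Long format, keeping only the combinations that actually occur
prop_long <- subset(as.data.frame(prop), Freq > 0)
prop_long
```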
Or another option is aggregate/ave (this requires R 4.1.0 or later for the native pipe |> and the \(x) lambda syntax):
subset(gss_cat, complete.cases(age), select = c(age, marital)) |>
{\(dat) aggregate(cbind(n = age) ~ age + marital,
data = dat, FUN = length)}() |>
transform(prop = ave(n, age, FUN = \(x) x/sum(x)))
How can I group variables in when dplyr and base R functions don't work?
If you are just looking for the unique rows of MUN_RESID and V16 - you can use the duplicated function
months0606[ !duplicated( months0606[ , c( "MUN_RESID","V16")]) , ]
Since you are dealing with a large data set, you could consider data.table, but you need to decide which operations to perform by group. I took the means; in your example the result matches the duplicated approach, but it would not if the summarised columns varied within a group.
library( data.table )
months0606 <- data.table( months0606 )
months0606[ , .(
X08.2005_P=mean(X08.2005_P),
X09.2005_P=mean( X09.2005_P)
),
by=c("MUN_RESID" , "V16" )]
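To see what duplicated does on the key columns, here is a minimal sketch with an invented data frame (the column names follow the question, the values are made up).

```r
# Toy data; values are made up for illustration only
months0606 <- data.frame(
  MUN_RESID  = c(1, 1, 2, 2, 2),
  V16        = c("a", "a", "a", "b", "b"),
  X08.2005_P = c(10, 10, 20, 30, 30)
)

# Flag rows whose (MUN_RESID, V16) pair already appeared earlier
dup <- duplicated(months0606[, c("MUN_RESID", "V16")])
dup

# Keep the first row for each pair
months0606[!dup, ]
```

Because only the first row of each pair is kept, any differences in the other columns within a pair are silently discarded, which is why an explicit summary such as the mean may be preferable.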
Question Using group_by/summarise or group_by/mutate in Base R
In base R, an option is by:
by(test, test[c('ctr_n', 'yr', 'mn', 'pty')], FUN = function(x) ineq(x$vote.shares, NULL, type = "Gini", na.rm = TRUE))
Or another option is split, which returns one row per group as a data.frame:
out <- do.call(rbind, lapply(split(test, test[c('ctr_n', 'yr', 'mn', 'pty')],
drop = TRUE), function(x) data.frame(x[1,],
giniI = ineq(x$vote.shares, NULL, type = "Gini", na.rm = TRUE))))
row.names(out) <- NULL
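The following sketch shows both a summarise-like and a mutate-like result in base R. To keep it free of package dependencies it uses a hand-written gini() helper as a hypothetical stand-in for ineq::ineq(type = "Gini"); the data are invented and use only two of the grouping columns.

```r
# Hypothetical stand-in for ineq::ineq(type = "Gini"), so no package is needed
gini <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Toy data; values are made up for illustration only
test <- data.frame(
  ctr_n       = c("A", "A", "A", "B", "B"),
  yr          = c(2000, 2000, 2000, 2000, 2000),
  vote.shares = c(0.2, 0.2, 0.2, 0, 1)
)

# summarise-like: one row per group
out <- do.call(rbind, lapply(split(test, test[c("ctr_n", "yr")], drop = TRUE),
  function(x) data.frame(x[1, c("ctr_n", "yr")], giniI = gini(x$vote.shares))))
row.names(out) <- NULL
out

# mutate-like: the group value repeated on every original row, via ave()
test$giniI <- ave(test$vote.shares, test$ctr_n, test$yr, FUN = gini)
test
```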
[[base R]] Add T/F column for whether it's the minimum value for each group
df1 <- transform(df, cheapest = ave(weight, item, FUN = min) == weight)
df1
item weight cheapest
1 apple 700 FALSE
2 apple 500 TRUE
3 orange 500 FALSE
4 peach 200 TRUE
5 apple 900 FALSE
6 orange 200 TRUE
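For a version that can be run as-is, the sketch below rebuilds df from the printed output above and splits the one-liner into two steps, so the intermediate result of ave is visible.

```r
# Data reconstructed from the printed output above
df <- data.frame(
  item   = c("apple", "apple", "orange", "peach", "apple", "orange"),
  weight = c(700, 500, 500, 200, 900, 200)
)

# ave() returns the group minimum repeated for every row of the group
grp_min <- ave(df$weight, df$item, FUN = min)
grp_min

# A row is "cheapest" when its weight equals its group's minimum
df1 <- transform(df, cheapest = weight == grp_min)
df1
```

If several rows in a group tie for the minimum, all of them are flagged TRUE.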
Running multiple T-Test on variables with groupings in R (not using rstatix)
The error relates to the number of observations per level of 'Grouping': for one item, a level has only a single observation, so the t-test cannot be computed. With base R, we can guard against this as follows:
lapply(split(df, df$Item), function(x) if(any(table(x$Grouping) < 2))
NA else t.test(Cost ~ Grouping, data = x))
-output
$`Book A`
Welch Two Sample t-test
data: Cost by Grouping
t = -1.3416, df = 1.4706, p-value = 0.3499
alternative hypothesis: true difference in means between group A and group B is not equal to 0
95 percent confidence interval:
-8.418523 5.418523
sample estimates:
mean in group A mean in group B
6.5 8.0
$`Book B`
[1] NA
$`Book C`
Welch Two Sample t-test
data: Cost by Grouping
t = 1.3868, df = 1.8989, p-value = 0.3059
alternative hypothesis: true difference in means between group A and group B is not equal to 0
95 percent confidence interval:
-5.666332 10.666332
sample estimates:
mean in group A mean in group B
5.5 3.0
$`Book D`
Welch Two Sample t-test
data: Cost by Grouping
t = -0.42857, df = 1, p-value = 0.7422
alternative hypothesis: true difference in means between group A and group B is not equal to 0
95 percent confidence interval:
-45.97172 42.97172
sample estimates:
mean in group A mean in group B
4.0 5.5
Or, to get only the p-value:
stack(lapply(split(df, df$Item), function(x) if(any(table(x$Grouping) < 2))
NA else t.test(Cost ~ Grouping, data = x)$p.value))[2:1]
ind values
1 Book A 0.3498856
2 Book B NA
3 Book C 0.3058987
4 Book D 0.7422379
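The guard can also be wrapped in a small helper. This is a sketch under assumptions: safe_p is a hypothetical name, and the data are invented (only the column names Item, Grouping and Cost follow the question).

```r
# Toy data; values are made up for illustration only
df <- data.frame(
  Item     = c(rep("Book A", 4), rep("Book B", 3)),
  Grouping = c("A", "A", "B", "B", "A", "A", "B"),
  Cost     = c(5, 8, 7, 9, 4, 6, 5)
)

# Hypothetical helper: p-value of a two-sample t-test,
# or NA when any group has fewer than two observations
safe_p <- function(x) {
  if (any(table(x$Grouping) < 2)) NA_real_
  else t.test(Cost ~ Grouping, data = x)$p.value
}

# One p-value per item
p_values <- sapply(split(df, df$Item), safe_p)
p_values
```

Returning NA_real_ rather than a plain NA keeps the result a numeric vector even if every item fails the check.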
The same approach can be done with dplyr
library(dplyr)
df %>%
add_count(Item, Grouping) %>%
group_by(Item) %>%
summarise(out = list(if(any(n < 2)) NA else t.test(Cost ~ Grouping)))
-output
# A tibble: 4 × 2
Item out
<fct> <list>
1 Book A <htest>
2 Book B <lgl [1]>
3 Book C <htest>
4 Book D <htest>
If only the p-value is needed:
df %>%
add_count(Item, Grouping) %>%
group_by(Item) %>%
summarise(out = if(any(n < 2)) NA_real_ else t.test(Cost ~ Grouping)$p.value)
# A tibble: 4 × 2
Item out
<fct> <dbl>
1 Book A 0.350
2 Book B NA
3 Book C 0.306
4 Book D 0.742
Remove groups with only one individual in R without using dplyr package
Or another option is with tidyverse: after grouping by 'group', filter the rows where the number of distinct (n_distinct) elements in 'individualID' is greater than 1.
library(dplyr)
df1 %>%
group_by(group) %>%
filter(n_distinct(individualID) > 1) %>%
ungroup
# A tibble: 8 × 3
group individualID X
<dbl> <dbl> <int>
1 1 1 0
2 1 1 0
3 1 2 1
4 1 2 1
5 3 5 0
6 3 5 0
7 3 6 1
8 3 6 0
Or with subset and ave from base R:
subset(df1, ave(individualID, group, FUN = function(x) length(unique(x))) > 1)
group individualID X
1 1 1 0
2 1 1 0
3 1 2 1
4 1 2 1
7 3 5 0
8 3 5 0
9 3 6 1
10 3 6 0
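A minimal sketch of the base R variant on invented data (column names follow the question, values are made up), with the per-row count from ave shown as a separate step:

```r
# Toy data; values are made up for illustration only
df1 <- data.frame(
  group        = c(1, 1, 1, 2, 2, 3, 3),
  individualID = c(1, 1, 2, 3, 3, 5, 6)
)

# Number of distinct individuals per group, repeated for every row
n_ind <- ave(df1$individualID, df1$group, FUN = function(x) length(unique(x)))
n_ind

# Keep only the groups with more than one individual
kept <- df1[n_ind > 1, ]
kept
```

As in the output above, the original row names are preserved, so gaps in the numbering show which rows were dropped.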
How to sum a variable by group
Using aggregate:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
Category x
1 First 30
2 Second 5
3 Third 34
In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
(Embedding @thelatemail's comment:) aggregate has a formula interface too:
aggregate(Frequency ~ Category, x, sum)
Or, if you want to aggregate multiple columns, you could use the . notation (it works for one column too):
aggregate(. ~ Category, x, sum)
Or tapply:
tapply(x$Frequency, x$Category, FUN=sum)
First Second Third
30 5 34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))
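Putting the data and two of the calls together gives a runnable sketch that also checks the approaches agree. The object names by_formula and by_tapply are only illustrative.

```r
x <- data.frame(Category = factor(c("First", "First", "First", "Second",
                                    "Third", "Third", "Second")),
                Frequency = c(10, 15, 5, 2, 14, 20, 3))

# Formula interface: returns a data frame with one row per Category
by_formula <- aggregate(Frequency ~ Category, x, sum)

# tapply: returns a named array with one element per Category
by_tapply <- tapply(x$Frequency, x$Category, FUN = sum)

# Both give the same totals, in factor-level order
all.equal(by_formula$Frequency, as.vector(by_tapply))
```

Which one to use mostly depends on the shape you want back: aggregate for a data frame that can be merged or plotted directly, tapply for a quick named vector.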