How to group by all but one column?
You can do this using standard evaluation (group_by_ instead of group_by):
# Fake data
set.seed(492)
dat = data.frame(value=rnorm(1000), g1=sample(LETTERS,1000,replace=TRUE),
g2=sample(letters,1000,replace=TRUE), g3=sample(1:10, replace=TRUE),
other=sample(c("red","green","black"),1000,replace=TRUE))
dat %>% group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
summarise(meanValue=mean(value))
g1 g2 g3 other meanValue
<fctr> <fctr> <int> <fctr> <dbl>
1 A a 2 green 0.89281475
2 A b 2 red -0.03558775
3 A b 5 black -1.79184218
4 A c 10 black 0.17518610
5 A e 5 black 0.25830392
...
See this vignette for more on standard vs. non-standard evaluation in dplyr.
UPDATE for dplyr 0.7.0
To address @ÖmerAn's comment: it looks like group_by_at is the way to go in dplyr 0.7.0 (someone please correct me if I'm wrong about this). For example:
dat %>%
group_by_at(setdiff(names(dat), "value")) %>%
summarise(meanValue=mean(value))
# Groups: g1, g2, g3 [?]
g1 g2 g3 other meanValue
<fctr> <fctr> <int> <fctr> <dbl>
1 A a 2 green 0.89281475
2 A b 2 red -0.03558775
3 A b 5 black -1.79184218
4 A c 10 black 0.17518610
5 A e 5 black 0.25830392
6 A e 5 red -0.81879788
7 A e 7 green 0.30836054
8 A f 2 green 0.05537047
9 A g 1 black 1.00156405
10 A g 10 black 1.26884303
# ... with 949 more rows
Let's confirm both methods give the same output (in dplyr 0.7.0):
new = dat %>%
group_by_at(setdiff(names(dat), "value")) %>%
summarise(meanValue=mean(value))
old = dat %>%
group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
summarise(meanValue=mean(value))
identical(old, new)
# [1] TRUE
How to group a data frame by all columns except one (the data frame has more than 50 columns)
We can use dplyr::across in group_by:
library(dplyr)
df %>%
group_by(across(-logg_overall_assess_current)) %>%
summarise(mean = mean(logg_overall_assess_current, na.rm = TRUE))
#> `summarise()` has grouped output by 'Bedroom', 'TypeDwel1.2.Duplex',
#> 'TypeDwelApartment.Condo', ..., 'S.A.West.End.VW', 'S.A.Yaletown'. You
#> can override using the `.groups` argument.
#> # A tibble: 2 x 61
#> # Groups: Bedroom, TypeDwel1.2.Duplex, ..., S.A.Yaletown [2]
#>   Bedroom TypeDwel1.2.Duplex TypeDwelApartment.C… TypeDwelOther TypeDwelTownhou…
#>     <dbl>              <dbl>                <dbl>         <dbl>            <dbl>
#> 1       1                  0                    1             0                0
#> 2       2                  0                    1             0                0
#> # … with 56 more variables: Age_new.70 <dbl>, ..., S.A.Other <dbl>, mean <dbl>
Created on 2022-01-15 by the reprex package (v0.3.0)
How to group by all columns except one
You can use
Select Name, Title, Date from Books group by Name, Title, Date;
or
Select distinct Name, Title, Date from Books;
for the same purpose. It won't make any difference, as the dates are the same.
If the dates are not the same, then you can use an aggregate function to achieve the same result, e.g.:
Select Name, Title, max(Date) -- or any other aggregate function, based on your logic
from Books group by Name, Title;
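The equivalence between grouping by every selected column and SELECT DISTINCT can be checked directly. Below is a minimal sketch using Python's built-in sqlite3 module; the Books table and its rows are made up for illustration:

```python
import sqlite3

# Hypothetical Books table to illustrate the equivalence described above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Books (Name TEXT, Title TEXT, Date TEXT)")
con.executemany("INSERT INTO Books VALUES (?, ?, ?)", [
    ("Ann", "SQL 101", "2020-01-01"),
    ("Ann", "SQL 101", "2020-01-01"),   # exact duplicate row
    ("Bob", "R Intro", "2021-06-15"),
])

grouped = con.execute(
    "SELECT Name, Title, Date FROM Books GROUP BY Name, Title, Date").fetchall()
distinct = con.execute(
    "SELECT DISTINCT Name, Title, Date FROM Books").fetchall()

# When the dates repeat exactly, both queries collapse duplicates identically.
print(sorted(grouped) == sorted(distinct))  # True
```

When the dates differ within a Name/Title pair, only the aggregate version (max(Date) with GROUP BY Name, Title) collapses them to one row per pair.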
Grouping by all columns except one to count distinct values on the remaining column
One other way to do what you want would be to select distinct rows first, then apply the windowed COUNT() function:
WITH distinctrows AS (
SELECT DISTINCT
Column1,
Column2,
Column3,
Column4
FROM TableA
)
SELECT
Column1,
Column2,
Column3,
Column4,
COUNT(Column4) OVER (PARTITION BY Column1, Column2, Column3) AS Column5
FROM distinctrows
;
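The distinct-then-window-count pattern above can be exercised with Python's sqlite3 module (SQLite supports window functions from version 3.25). The TableA column names follow the answer; the rows are made up:

```python
import sqlite3

# Toy TableA to illustrate the distinct-then-window-count pattern.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE TableA (Column1 TEXT, Column2 TEXT, Column3 TEXT, Column4 TEXT)")
con.executemany("INSERT INTO TableA VALUES (?, ?, ?, ?)", [
    ("a", "x", "1", "p"),
    ("a", "x", "1", "p"),   # duplicate; removed by DISTINCT
    ("a", "x", "1", "q"),   # same group, second distinct Column4 value
    ("b", "y", "2", "r"),
])

rows = con.execute("""
    WITH distinctrows AS (
        SELECT DISTINCT Column1, Column2, Column3, Column4 FROM TableA
    )
    SELECT Column1, Column2, Column3, Column4,
           COUNT(Column4) OVER (PARTITION BY Column1, Column2, Column3) AS Column5
    FROM distinctrows
    ORDER BY Column1, Column4
""").fetchall()

for r in rows:
    print(r)
# ('a', 'x', '1', 'p', 2)  -- group (a, x, 1) has two distinct Column4 values
# ('a', 'x', '1', 'q', 2)
# ('b', 'y', '2', 'r', 1)
```

Because the CTE removes duplicates first, the windowed COUNT effectively counts distinct Column4 values per (Column1, Column2, Column3) group.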
Select multiple columns from a table, but group by one
I use this trick to group by one column when I have a multiple columns selection:
SELECT MAX(id) AS id,
Nume,
MAX(intrare) AS intrare,
MAX(iesire) AS iesire,
MAX(intrare-iesire) AS stoc,
MAX(data) AS data
FROM Produse
GROUP BY Nume
ORDER BY Nume
This works.
Select all columns with GROUP BY one column
distinct on
select distinct on (key) *
from t
order by key, name
Notice that the order by clause determines which row will win the ties.
Pandas Groupby and Sum Only One Column
The only way to do this would be to include C in your groupby (the groupby function can accept a list).
Give this a try:
df.groupby(['A','C'])['B'].sum()
One other thing to note: if you need to work with df after the aggregation, you can also use the as_index=False option to return a DataFrame object. This one gave me problems when I was first working with pandas. Example:
df.groupby(['A','C'], as_index=False)['B'].sum()
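To echo the R answers above: in pandas you can also group by every column except the value column by building the key list programmatically. A minimal sketch, with a made-up frame (the column names A, C, B mirror the example above):

```python
import pandas as pd

# Toy frame mirroring the example above (values are made up).
df = pd.DataFrame({
    "A": ["x", "x", "y"],
    "C": [1, 1, 2],
    "B": [10, 20, 30],
})

# Group by every column except the one being summed.
keys = [c for c in df.columns if c != "B"]
out = df.groupby(keys, as_index=False)["B"].sum()
print(out)
# two rows: (x, 1, 30) and (y, 2, 30)
```

This is the pandas analogue of setdiff(names(dat), "value") in the dplyr answer: compute the grouping keys from the column names rather than listing them by hand.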
How to use multiple columns but only one column in GROUP BY and ORDER BY
I think this is what you want:
SELECT
r.id_rekrutmen,
r.judul_rekrutmen,
MAX(rw.tanggal_rekrutmen) AS tanggal_rekrutmen
FROM rekrutmen r
INNER JOIN rekrutmen_waktu rw
ON r.id_rekrutmen = rw.id_rekrutmen
GROUP BY
r.id_rekrutmen,
r.judul_rekrutmen
How do I summarise all columns except one(s) I specify?
Edit:
Modified versions of the two methods below for dplyr version >= 1, since summarise_at is superseded:
df %>%
summarise(across(where(is.numeric) & !Registered, sum))
df %>%
summarise(across(-Registered, sum))
Original Answer:
I would use summarise_at, and just make a logical vector which is FALSE for non-numeric columns and for Registered, and TRUE otherwise, i.e.
df %>%
summarise_at(which(sapply(df, is.numeric) & names(df) != 'Registered'), sum)
If you wanted to just summarise all but one column you could do
df %>%
summarise_at(vars(-Registered), sum)
but in this case you have to check if it's numeric also.
Notes:
Factors are stored as integer codes under the hood, so if you want to be safe about excluding both non-numeric columns and factors, replace sapply(df, is.numeric) with sapply(df, function(x) is.numeric(x) & !is.factor(x)).
If your data is big, I think it is faster to use sapply(df[1,], is.numeric) instead of sapply(df, is.numeric). (Someone please correct me if I'm wrong.)