Pandas - dataframe groupby - how to get sum of multiple columns
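By using apply (select the columns with a list, [["col3", "col4"]]; recent pandas versions no longer accept a bare tuple of column names here):
df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x : x.astype(int).sum())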
Out[1257]:
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4
If you would rather use agg, with one aggregation per column:
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
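As a minimal sketch of the two approaches above, the following builds a small hypothetical frame (the values are invented for illustration and stored as strings, which is presumably why the answer casts with astype(int)). Casting once before grouping lets groupby use its built-in sum instead of a Python lambda per group.

```python
import pandas as pd

# Hypothetical data: col3/col4 are stored as strings, so they need a cast.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": ["1", "1", "1", "1", "1", "1"],
    "col4": ["2", "2", "2", "2", "2", "2"],
})

value_cols = ["col3", "col4"]

# Cast the value columns once, then sum them per (col1, col2) group.
out = (
    df.astype({c: int for c in value_cols})
      .groupby(["col1", "col2"])[value_cols]
      .sum()
)
print(out)
```

The result is indexed by (col1, col2); call reset_index() on it if you want the keys back as ordinary columns.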
Pandas groupby.sum for all columns
You can filter first and then pass df['group'] instead of 'group' to groupby, then add the sum column with DataFrame.assign:
df1 = (df.filter(regex=r'_name$')
.groupby(df['group']).sum()
.assign(sum = lambda x: x.sum(axis=1)))
An alternative is to filter the column names and select them after groupby:
cols = df.filter(regex=r'_name$').columns
df1 = df.groupby('group')[cols].sum()
Or:
cols = df.columns[df.columns.str.contains(r'_name$')]
df1 = df.groupby('group')[cols].sum().assign(sum = lambda x: x.sum(axis=1))
print (df1)
       a_name  b_name  q_name  sum
group
a           7      13      10   30
b          10       6      10   26
c          10       2       5   17
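To make the answer reproducible, here is a small sketch with made-up input. The column "other" and all values are assumptions chosen for illustration; it selects the columns ending in _name, sums them per group, and appends a row-wise total.

```python
import pandas as pd

# Hypothetical input; "other" is there to show that it is excluded.
df = pd.DataFrame({
    "group": ["a", "a", "b", "c"],
    "a_name": [3, 4, 10, 10],
    "b_name": [6, 7, 6, 2],
    "q_name": [5, 5, 10, 5],
    "other": [100, 200, 300, 400],
})

# Column names that end in "_name".
cols = list(df.filter(regex=r"_name$").columns)

# Sum the selected columns per group, then add a row-wise total.
df1 = df.groupby("group")[cols].sum().assign(sum=lambda x: x.sum(axis=1))
print(df1)
```

Because the new column is called sum, read it back as df1["sum"]; df1.sum still refers to the DataFrame method.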
How to calculate the sum of all columns based on a grouped variable and remove NA
Only the tilde ~ is missing:
data %>%
  group_by(ID) %>%
  summarise(across(everything(), ~sum(., na.rm = TRUE)))
# A tibble: 4 x 3
     ID  var1  var2
* <dbl> <dbl> <dbl>
1     1     3     1
2     2    15     1
3     3    28     1
4     4     1    30
If an ID group contains only NA values and you want NA rather than 0 for it, you can do this:
data %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ifelse(all(is.na(.)), NA, sum(., na.rm = TRUE))))
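For readers coming from the pandas answers on this page, a rough pandas counterpart of the two R snippets above might look like the sketch below. The data frame is hypothetical. GroupBy.sum skips NaN by default, and min_count=1 makes a group with no valid values come out as NaN instead of 0.

```python
import numpy as np
import pandas as pd

# Hypothetical data: for ID 2, var1 contains only missing values.
data = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "var1": [1.0, np.nan, np.nan, np.nan],
    "var2": [2.0, 3.0, 4.0, np.nan],
})

# NaN is skipped; an all-NaN group sums to 0.0.
default = data.groupby("ID").sum()

# Require at least one valid value; an all-NaN group stays NaN.
strict = data.groupby("ID").sum(min_count=1)

print(default)
print(strict)
```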
How to sum a variable by group
Using aggregate:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34
In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
(Embedding @thelatemail's comment:) aggregate has a formula interface too:
aggregate(Frequency ~ Category, x, sum)
Or, if you want to aggregate multiple columns, you could use the . notation (it works for one column too):
aggregate(. ~ Category, x, sum)
Or tapply:
tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))
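For comparison with the pandas answers elsewhere on this page, the same data can be summed by group in pandas. This is only a sketch of an equivalent and not part of the original R answer: totals plays the role of the tapply result (a labelled vector), and table plays the role of the aggregate result (a two-column table).

```python
import pandas as pd

# Same values as the R data frame x above.
x = pd.DataFrame({
    "Category": ["First", "First", "First", "Second", "Third", "Third", "Second"],
    "Frequency": [10, 15, 5, 2, 14, 20, 3],
})

# Series indexed by Category, comparable to tapply().
totals = x.groupby("Category")["Frequency"].sum()

# Two-column DataFrame, comparable to aggregate().
table = x.groupby("Category", as_index=False)["Frequency"].sum()

print(totals)
print(table)
```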
data.table sum of all columns by group
I think the code you're looking for is likely:
TestData[, .(a = sum(.SD)), by = .(id, year), .SDcols = Kattegori_Henter("Medicine")]
How do I Pandas group-by to get sum?
Use GroupBy.sum:
df.groupby(['Fruit','Name']).sum()
Out[31]:
               Number
Fruit   Name
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1
To specify the column to sum, use this: df.groupby(['Name', 'Fruit'])['Number'].sum()
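The sketch below uses a few invented rows (not the asker's data) to show what each variant returns. Summing the whole groupby gives a DataFrame, selecting ['Number'] first gives a Series, and as_index=False keeps the group keys as ordinary columns.

```python
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "Fruit": ["Apples", "Apples", "Apples", "Grapes", "Oranges"],
    "Name": ["Bob", "Bob", "Mike", "Tom", "Bob"],
    "Number": [7, 9, 9, 87, 67],
})

# DataFrame with a (Fruit, Name) MultiIndex.
by_frame = df.groupby(["Fruit", "Name"]).sum()

# Series with the same MultiIndex, restricted to the Number column.
by_series = df.groupby(["Fruit", "Name"])["Number"].sum()

# Flat DataFrame with Fruit and Name as regular columns.
flat = df.groupby(["Fruit", "Name"], as_index=False)["Number"].sum()

print(by_frame)
print(by_series)
print(flat)
```

Selecting the column explicitly also avoids trying to sum any non-numeric columns the frame may contain.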
Get sum of all columns greater than 0 except the highest values
Try this:
df = df.set_index('Group')
df.where(df.ne(df.max(axis=1), axis=0) & (df > 0)).sum(1)
Output:
Group
A 16.0
B 74.0
C 24.0
D 49.0
dtype: float64
Details:
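Find the max of each row, keep the values in that row that are not equal to that max AND are greater than zero, then sum along axis=1.
Or, more compactly (this gives the same result when each row's maximum is positive and occurs only once):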
df.mask(df<0).sum(1) - df.max(1)
Output:
Group
A 16.0
B 74.0
C 24.0
D 49.0
dtype: float64
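The two expressions above agree on the asker's data, but they can differ when a row's maximum is repeated. The following sketch uses invented numbers to show this: the where version drops every occurrence of the maximum, while the mask version subtracts it only once.

```python
import pandas as pd

# Hypothetical data: rows A and B contain their maximum twice.
df = pd.DataFrame(
    {"x": [10, 5, -3], "y": [6, 5, 4], "z": [10, 2, 1]},
    index=pd.Index(["A", "B", "C"], name="Group"),
)

row_max = df.max(axis=1)

# Keep positive values that differ from the row maximum, then sum per row.
excl_all_max = df.where(df.ne(row_max, axis=0) & (df > 0)).sum(axis=1)

# Drop negatives, sum per row, then subtract the maximum once.
excl_one_max = df.mask(df < 0).sum(axis=1) - row_max

print(excl_all_max)
print(excl_one_max)
```

Which behaviour is wanted for ties depends on the question being asked, so it is worth checking the data for repeated maxima before picking the shorter form.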