Summing All Columns by Group

Pandas - dataframe groupby - how to get sum of multiple columns

By using apply

df.groupby(['col1', 'col2'])[['col3', 'col4']].apply(lambda x: x.astype(int).sum())
Out[1257]:
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

If you prefer agg:

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
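To try both approaches end to end, here is a minimal sketch. The sample frame is an assumption (the question's data is not shown here); it stores the values as strings, which would explain the astype(int) step, and is built so that the grouped sums match the output printed above.

```python
import pandas as pd

# Hypothetical sample data; values are strings to show why the
# integer conversion is needed before summing.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": ["1", "1", "1", "1", "1", "1"],
    "col4": ["2", "2", "2", "2", "2", "2"],
})

# Convert the value columns once, then sum them per (col1, col2) group.
value_cols = ["col3", "col4"]
df[value_cols] = df[value_cols].astype(int)

# Column selection followed by sum(), and the equivalent agg() call.
by_selection = df.groupby(["col1", "col2"])[value_cols].sum()
by_agg = df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "sum"})

print(by_selection)
```

Converting the columns up front avoids repeating the cast inside apply, and lets the built-in sum do the work.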

Pandas groupby.sum for all columns

You can filter the columns first and then pass the Series df['group'] to groupby instead of the name 'group' (the filtered frame no longer contains that column). Finally, add the sum column with DataFrame.assign:

df1 = (df.filter(regex=r'_name$')
         .groupby(df['group']).sum()
         .assign(sum = lambda x: x.sum(axis=1)))

An alternative is to collect the matching column names first and select them after the groupby:

cols = df.filter(regex=r'_name$').columns

df1 = df.groupby('group')[cols].sum()

Or:

cols = df.columns[df.columns.str.contains(r'_name$')]

df1 = df.groupby('group')[cols].sum().assign(sum = lambda x: x.sum(axis=1))


print(df1)
       a_name  b_name  q_name  sum
group
a           7      13      10   30
b          10       6      10   26
c          10       2       5   17
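A self-contained sketch of the column-selection variant follows. The input frame, including the unrelated column named other, is hypothetical and chosen so the result reproduces the table above; str.endswith is used here in place of the regex, which is a substitution rather than the answer's own code.

```python
import pandas as pd

# Hypothetical data: three "*_name" columns plus one unrelated column.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c"],
    "a_name": [3, 4, 6, 4, 10],
    "b_name": [6, 7, 1, 5, 2],
    "q_name": [4, 6, 7, 3, 5],
    "other": [100, 200, 300, 400, 500],
})

# Pick the columns whose names end in "_name".
cols = df.columns[df.columns.str.endswith("_name")]

# Sum them per group, then add a row-wise total of the summed columns.
df1 = df.groupby("group")[cols].sum().assign(sum=lambda x: x.sum(axis=1))
print(df1)
```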

How to calculate the sum of all columns based on a grouped variable and remove NA

Only the tilde ~ is missing. It turns sum(., na.rm = T) into a lambda that across() applies to each column:

data %>%
  group_by(ID) %>%
  summarise(across(everything(), ~sum(., na.rm = T)))
# A tibble: 4 x 3
     ID  var1  var2
* <dbl> <dbl> <dbl>
1     1     3     1
2     2    15     1
3     3    28     1
4     4     1    30

If an ID group may contain only NA values, sum(., na.rm = T) returns 0 for it. To get NA instead, you can do this:

data %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ifelse(all(is.na(.)), NA, sum(., na.rm = T))))
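For readers who need the same behaviour in pandas, a rough counterpart (not a translation of the dplyr code) is the min_count argument of the grouped sum. The frame below is hypothetical; only the column names are borrowed from the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical data; IDs 2 and 3 have only missing values in var1.
data = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3],
    "var1": [1.0, 2.0, np.nan, np.nan, np.nan],
    "var2": [1.0, np.nan, 4.0, 5.0, 6.0],
})

# Default behaviour: NaN is skipped, so an all-NaN group sums to 0.
plain = data.groupby("ID").sum()

# min_count=1 requires at least one non-missing value, otherwise NaN.
strict = data.groupby("ID").sum(min_count=1)
print(strict)
```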

How to sum a variable by group

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, several grouping variables can be given in the list. Several metrics of the same data type can be aggregated together via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

As @thelatemail noted in a comment, aggregate also has a formula interface:

aggregate(Frequency ~ Category, x, sum)

Or, if you want to aggregate multiple columns, you can use the . notation (it works for one column too):

aggregate(. ~ Category, x, sum)

Or use tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                  "Third", "Third", "Second")),
                Frequency=c(10,15,5,2,14,20,3))
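For comparison with the pandas answers elsewhere on this page, the same data can be summed by group as sketched below. This is an approximate counterpart to the R calls above, not an exact equivalent; the variable names totals and totals_frame are made up for the example.

```python
import pandas as pd

# Same values as the R data frame x above.
x = pd.DataFrame({
    "Category": ["First", "First", "First", "Second",
                 "Third", "Third", "Second"],
    "Frequency": [10, 15, 5, 2, 14, 20, 3],
})

# Comparable to tapply: a Series indexed by Category.
totals = x.groupby("Category")["Frequency"].sum()

# Comparable to aggregate(. ~ Category, x, sum): Category stays a column.
totals_frame = x.groupby("Category", as_index=False).sum()
print(totals)
```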

data.table sum of all columns by group

I think the code you're looking for is likely:

TestData[, .(a = sum(.SD)), by = .(id, year), .SDcols = Kattegori_Henter("Medicine")]
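The idea of sum(.SD), one total over all selected columns per group, can be sketched in pandas as follows. Everything here is hypothetical: test_data stands in for TestData, and medicine_cols plays the role of whatever Kattegori_Henter("Medicine") returns, which is not shown in the source.

```python
import pandas as pd

# Hypothetical stand-in for TestData.
test_data = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "year": [2020, 2020, 2021, 2020],
    "med_a": [1, 2, 3, 4],
    "med_b": [10, 20, 30, 40],
})
# Assumed column selection (placeholder for the helper's result).
medicine_cols = ["med_a", "med_b"]

# Sum each selected column per (id, year), then add the column totals
# together to get a single number per group.
a = (test_data.groupby(["id", "year"])[medicine_cols]
     .sum()
     .sum(axis=1)
     .rename("a"))
print(a)
```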

How do I Pandas group-by to get sum?

Use GroupBy.sum:

df.groupby(['Fruit','Name']).sum()

Out[31]:
               Number
Fruit   Name
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1

To specify the column to sum, use this: df.groupby(['Name', 'Fruit'])['Number'].sum()

Get sum of all columns greater than 0 except the highest values

Try this:

df = df.set_index('Group')
df.where(df.ne(df.max(axis=1), axis=0) & (df > 0)).sum(axis=1)

Output:

Group
A 16.0
B 74.0
C 24.0
D 49.0
dtype: float64

Details:

Find the maximum of each row, keep only the values in that row that differ from the maximum and are greater than zero, then sum across the row (axis=1).

Or:

df.mask(df < 0).sum(axis=1) - df.max(axis=1)

Output:

Group
A 16.0
B 74.0
C 24.0
D 49.0
dtype: float64
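The two expressions above can be compared on a small frame. The data below is hypothetical (the question's frame is not shown) and deliberately includes a negative value and a row with a tied maximum.

```python
import pandas as pd

# Hypothetical data, indexed by Group as in the answer above.
df = pd.DataFrame(
    {"x": [5, -2, 7], "y": [9, 4, 7], "z": [3, 6, 1]},
    index=pd.Index(["A", "B", "C"], name="Group"),
)

row_max = df.max(axis=1)

# First approach: keep values that are positive and differ from the row max.
keep = df.ne(row_max, axis=0) & (df > 0)
result = df.where(keep).sum(axis=1)

# Second approach: sum the non-negative values, then subtract the max once.
alt = df.mask(df < 0).sum(axis=1) - row_max

print(result)
print(alt)
```

In this sketch rows A and B agree (8 and 4), but row C has a tied maximum: the where version drops both 7s and yields 1, while the mask version subtracts the maximum only once and yields 8. The two expressions match only when each row's maximum is unique.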

