How to Sum Values in a Column That Match a Given Condition Using Pandas

How do I sum values in a column that match a given condition using pandas?

The essential idea is to select the values you want and then sum them. The selection can be done in several different ways, a few of which are shown below.

Boolean indexing

Arguably the most common way to select the values is to use Boolean indexing.

With this method, you find out where column 'a' is equal to 1 and then sum the corresponding rows of column 'b'. You can use loc to handle the indexing of rows and columns:

>>> df.loc[df['a'] == 1, 'b'].sum()
15

Boolean indexing can be extended to other columns. For example, if df also contained a column 'c' and we wanted to sum the rows in 'b' where 'a' was 1 and 'c' was 2, we'd write:

df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()
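
The original df is not shown, so here is a minimal sketch with made-up data (the values in 'a', 'b' and 'c' are assumptions chosen so that the first sum comes out to 15). It runs both Boolean indexing variants from above:

```python
import pandas as pd

# Hypothetical sample data; the real df is not given in the text.
df = pd.DataFrame({
    "a": [1, 2, 1, 2, 1],
    "b": [3, 4, 5, 4, 7],
    "c": [2, 2, 1, 2, 2],
})

# Sum 'b' where 'a' equals 1.
single = df.loc[df["a"] == 1, "b"].sum()

# Sum 'b' where 'a' equals 1 and 'c' equals 2.
# Each comparison needs its own parentheses because & binds
# more tightly than == in Python.
combined = df.loc[(df["a"] == 1) & (df["c"] == 2), "b"].sum()

print(single)
print(combined)
```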

Query

Another way to select the data is to use query to filter the rows you're interested in, select column 'b' and then sum:

>>> df.query("a == 1")['b'].sum()
15

Again, the method can be extended to make more complicated selections of the data:

df.query("a == 1 and c == 2")['b'].sum()

Note this is a little more concise than the Boolean indexing approach.
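
If the value to match lives in a Python variable, query can refer to it with an @ prefix. The sketch below assumes the same kind of made-up data as before; the variable name target is only for illustration:

```python
import pandas as pd

# Hypothetical sample data; the real df is not given in the text.
df = pd.DataFrame({
    "a": [1, 2, 1, 2, 1],
    "b": [3, 4, 5, 4, 7],
    "c": [2, 2, 1, 2, 2],
})

target = 1  # illustrative variable, referenced inside the query as @target

# Filter rows with query, then select 'b' and sum.
total = df.query("a == @target and c == 2")["b"].sum()
print(total)
```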

Groupby

Another approach is to use groupby to split the DataFrame into parts according to the value in column 'a'. You can then sum each part and pull out the value that the 1s added up to:

>>> df.groupby('a')['b'].sum()[1]
15

This approach is likely to be slower than using Boolean indexing, but it is useful if you want to check the sums for other values in column 'a':

>>> df.groupby('a')['b'].sum()
a
1    15
2     8
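
One thing to watch with the groupby result is looking up a key that never occurs in column 'a'. A minimal sketch, again with made-up data, that uses .loc for an explicit label lookup and Series.get to fall back to 0 for a missing key:

```python
import pandas as pd

# Hypothetical sample data; the real df is not given in the text.
df = pd.DataFrame({
    "a": [1, 2, 1, 2, 1],
    "b": [3, 4, 5, 4, 7],
})

sums = df.groupby("a")["b"].sum()

# Label-based lookup of the group where a == 1.
ones = sums.loc[1]

# a == 3 never occurs, so .loc[3] would raise a KeyError;
# Series.get returns the supplied default instead.
threes = sums.get(3, 0)

print(ones, threes)
```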

python sum a column's value with condition

To get the sum of positive values in the column, use the appropriate condition:

import pandas as pd

df = pd.DataFrame({'price': [12, 14, 15, 10, 2, 4, -5, -4, -3, -5, 16, 15]})
total = df.loc[df['price'] > 0, 'price'].sum()
print(total) # 88

It isn't a good idea to store a single aggregate value in a column, since it has no relation to the individual rows. But to illustrate the logic:

# pad with zeros; otherwise 88 would be repeated in every row
df['total'] = [total] + [0] * (len(df) - 1)
print(df)
    price  total
0      12     88
1      14      0
2      15      0
3      10      0
4       2      0
5       4      0
6      -5      0
7      -4      0
8      -3      0
9      -5      0
10     16      0
11     15      0
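
To see why the padding matters, here is a short sketch of what happens without it: assigning the scalar directly broadcasts it to every row.

```python
import pandas as pd

df = pd.DataFrame({'price': [12, 14, 15, 10, 2, 4, -5, -4, -3, -5, 16, 15]})
total = df.loc[df['price'] > 0, 'price'].sum()

# Assigning a scalar to a column repeats it in every row.
df['total'] = total
print(df['total'].unique())
```

If the total is only needed once, keeping it in a plain variable is usually simpler than storing it in the DataFrame.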

Sum column based on another column in Pandas DataFrame

I ended up using this script:

dff = df.groupby(["SINID","EXTRA"]).MONTREGL.sum().reset_index()

It works in both test and production.
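
The original data is not shown, so the sketch below uses a few made-up rows (only the column names SINID, EXTRA and MONTREGL come from the answer) to show what grouping on two columns returns:

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the answer.
df = pd.DataFrame({
    "SINID": [1, 1, 1, 2, 2],
    "EXTRA": ["x", "x", "y", "x", "x"],
    "MONTREGL": [10.0, 5.0, 2.5, 1.0, 4.0],
})

# One output row per distinct (SINID, EXTRA) pair, with MONTREGL summed.
dff = df.groupby(["SINID", "EXTRA"])["MONTREGL"].sum().reset_index()
print(dff)
```

reset_index turns the two grouping keys back into ordinary columns, which is convenient if the result is merged or exported afterwards.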

Python: sum values in column where condition is met

You can first select the rows where type is "deposit", group them by "exchange", take the cumulative sum of "value" within each group, and assign the result back to those same rows.

import pandas as pd

# rows that are deposits
is_deposit = df["type"] == "deposit"

# running total of deposits per exchange, written only to the deposit rows
df.loc[is_deposit, "balance"] = df.loc[is_deposit].groupby("exchange", sort=False)["value"].cumsum()

Finally, you can fill the missing values with a forward fill, as you mentioned.

df = df.ffill()
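
The question's data is not reproduced here, so the following is only a sketch with invented rows (the column names exchange, type, value and balance follow the answer; everything else is an assumption). It uses the built-in groupby cumsum and forward-fills only the balance column:

```python
import pandas as pd

# Hypothetical transactions, ordered so each trade follows a deposit
# on the same exchange.
df = pd.DataFrame({
    "exchange": ["A", "A", "A", "B", "B"],
    "type": ["deposit", "trade", "deposit", "deposit", "trade"],
    "value": [100, 30, 20, 50, 10],
})

is_deposit = df["type"] == "deposit"

# Running deposit total per exchange, written only to the deposit rows;
# the other rows of the new column are NaN.
df.loc[is_deposit, "balance"] = (
    df.loc[is_deposit].groupby("exchange", sort=False)["value"].cumsum()
)

# Carry the last known balance forward onto the non-deposit rows.
df["balance"] = df["balance"].ffill()
print(df)
```

Note that a plain forward fill does not respect exchange boundaries, so it relies on the row order; with interleaved exchanges a per-group fill would be needed.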

Python sum values in column given a condition

You can use groupby to do this efficiently.

Assuming that the DataFrame is df:

ans = df.groupby('Item Code')['Units Sold'].sum()

This is the output:

Item Code
179    3
180    5
190    8
Name: Units Sold, dtype: int64

Hope this helps!

How to sum over some columns based on condition in pandas

You can use mask. The idea is to build a boolean mask from the w columns and use it to blank out the corresponding p columns before summing across each row:

df['top_p'] = df.filter(like='p').mask(df.filter(like='w').isin(['CUSTOM_MASK','CUSTOM_UNKNOWN']).to_numpy()).sum(axis=1)

Output:

    p1   p2    p3    p4   p5      w1    w2           w3              w4           w5  top_p
0  0.1  0.2  0.10  0.11  0.3  cancel  good       thanks     CUSTOM_MASK  CUSTOM_MASK   0.40
1  0.2  0.1  0.90  0.20  0.1   hello   bad  CUSTOM_MASK  CUSTOM_UNKNOWN  CUSTOM_MASK   0.30
2  0.3  0.3  0.01  0.40  0.5      hi  ugly        great          trible          job   1.51

Before summing, the output of mask looks like:

    p1   p2    p3   p4   p5
0  0.1  0.2  0.10  NaN  NaN
1  0.2  0.1   NaN  NaN  NaN
2  0.3  0.3  0.01  0.4  0.5
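
For reference, here is a self-contained sketch that rebuilds the DataFrame from the output shown above and applies the same mask step by step. The intermediate variable names are only for illustration:

```python
import pandas as pd

# Data reconstructed from the output table above.
df = pd.DataFrame({
    "p1": [0.1, 0.2, 0.3],
    "p2": [0.2, 0.1, 0.3],
    "p3": [0.10, 0.90, 0.01],
    "p4": [0.11, 0.20, 0.40],
    "p5": [0.3, 0.1, 0.5],
    "w1": ["cancel", "hello", "hi"],
    "w2": ["good", "bad", "ugly"],
    "w3": ["thanks", "CUSTOM_MASK", "great"],
    "w4": ["CUSTOM_MASK", "CUSTOM_UNKNOWN", "trible"],
    "w5": ["CUSTOM_MASK", "CUSTOM_MASK", "job"],
})

p_cols = df.filter(like="p")  # p1..p5
w_cols = df.filter(like="w")  # w1..w5

# True where the w value is one of the placeholder tokens.
# Converting to a NumPy array makes mask apply it by position,
# since the p and w column labels do not match.
is_placeholder = w_cols.isin(["CUSTOM_MASK", "CUSTOM_UNKNOWN"]).to_numpy()

# mask replaces the flagged p values with NaN; sum skips NaN by default.
df["top_p"] = p_cols.mask(is_placeholder).sum(axis=1)
print(df["top_p"])
```

Be aware that filter(like='p') matches any column whose name contains "p", so running the assignment a second time would also pick up top_p itself.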

Pandas: How to sum columns based on conditional of other column values?

The following should work. Here we select the rows where the condition is met and sum columns 'A' and 'B' for them. After the assignment, the rows where the condition isn't met hold NaN, so we call fillna on the new column:

In [67]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df

Out[67]:
          A         B         C
0  0.197334  0.707852 -0.443475
1 -1.063765 -0.914877  1.585882
2  0.899477  1.064308  1.426789
3 -0.556486 -0.150080 -0.149494
4 -0.035858  0.777523 -0.453747

In [73]:
df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
df['total'] = df['total'].fillna(0)
df

Out[73]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000

Another approach is to call where on the sum result. Its second argument is the value to use wherever the condition isn't met:

In [75]:
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
df

Out[75]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000
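
Because the session above uses random numbers, it can't be reproduced exactly. The sketch below uses fixed, made-up values (and two illustrative column names, total_loc and total_where) to check that both approaches give the same result:

```python
import pandas as pd

# Fixed hypothetical values instead of np.random.randn.
df = pd.DataFrame({
    "A": [1.0, -2.0, 3.0],
    "B": [0.5, 4.0, -1.0],
    "C": [9.0, 9.0, 9.0],
})

# Approach 1: sum only the selected rows, then fill the gaps with 0.
df["total_loc"] = df.loc[df["A"] > 0, ["A", "B"]].sum(axis=1)
df["total_loc"] = df["total_loc"].fillna(0)

# Approach 2: sum every row, then keep the sum only where A > 0.
df["total_where"] = df[["A", "B"]].sum(axis=1).where(df["A"] > 0, 0)

print(df)
```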

How do I sum up values in a column into groups that match a given condition by date in pandas?

You can extract the first number in each label with Series.str.extract, compare it with 60, and use np.where to assign each row to one of two groups:

m = df['AgeGroup'].str.extract(r'(\d+)', expand=False).astype(int) < 60
df['AgeGroup'] = np.where(m, '18 - 59', '60+')

df1 = df.groupby(['Date', 'AgeGroup'])['Quantity'].sum()
print(df1)
Date        AgeGroup
2020-12-08  18 - 59     7
            60+         6
2020-12-09  18 - 59     5
            60+         5
Name: Quantity, dtype: int64
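
The input data is not shown in the answer. As a sketch, the rows below are invented (including the original age-group labels such as '30 - 59' and '70+') so that the grouped sums match the output above:

```python
import numpy as np
import pandas as pd

# Hypothetical input; labels and quantities are assumptions.
df = pd.DataFrame({
    "Date": ["2020-12-08"] * 4 + ["2020-12-09"] * 4,
    "AgeGroup": ["18 - 29", "30 - 59", "60 - 69", "70+"] * 2,
    "Quantity": [3, 4, 2, 4, 1, 4, 3, 2],
})

# First number in each label, e.g. '30 - 59' -> 30, '70+' -> 70.
lower_bound = df["AgeGroup"].str.extract(r"(\d+)", expand=False).astype(int)

# Collapse the labels into two groups based on that number.
df["AgeGroup"] = np.where(lower_bound < 60, "18 - 59", "60+")

df1 = df.groupby(["Date", "AgeGroup"])["Quantity"].sum()
print(df1)
```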

