How to Sum Data.Frame Column Values

Get total of Pandas column

You should use sum:

Total = df['MyColumn'].sum()
print(Total)
319

Then you use loc with Series, in that case the index should be set as the same as the specific column you need to sum:

df.loc['Total'] = pd.Series(df['MyColumn'].sum(), index=['MyColumn'])
print(df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN

because if you pass scalar, the values of all rows will be filled:

df.loc['Total'] = df['MyColumn'].sum()
print(df)
X MyColumn Y Z
0 A 84 13.0 69.0
1 B 76 77.0 127.0
2 C 28 69.0 16.0
3 D 28 28.0 31.0
4 E 19 20.0 85.0
5 F 84 193.0 70.0
Total 319 319 319.0 319.0

Two other solutions are with at, and ix see the applications below:

df.at['Total', 'MyColumn'] = df['MyColumn'].sum()
print(df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN


df.ix['Total', 'MyColumn'] = df['MyColumn'].sum()
print(df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN

Note: Since Pandas v0.20, ix has been deprecated. Use loc or iloc instead.

How to sum data.frame column values?

You can just use sum(people$Weight).

sum sums up a vector, and people$Weight retrieves the weight column from your data frame.

Note - you can get built-in help by using ?sum, ?colSums, etc. (by the way, colSums will give you the sum for each column).

How do I sum values in a column that match a given condition using pandas?

The essential idea here is to select the data you want to sum, and then sum them. This selection of data can be done in several different ways, a few of which are shown below.

Boolean indexing

Arguably the most common way to select the values is to use Boolean indexing.

With this method, you find out where column 'a' is equal to 1 and then sum the corresponding rows of column 'b'. You can use loc to handle the indexing of rows and columns:

>>> df.loc[df['a'] == 1, 'b'].sum()
15

The Boolean indexing can be extended to other columns. For example if df also contained a column 'c' and we wanted to sum the rows in 'b' where 'a' was 1 and 'c' was 2, we'd write:

df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()

Query

Another way to select the data is to use query to filter the rows you're interested in, select column 'b' and then sum:

>>> df.query("a == 1")['b'].sum()
15

Again, the method can be extended to make more complicated selections of the data:

df.query("a == 1 and c == 2")['b'].sum()

Note this is a little more concise than the Boolean indexing approach.

Groupby

The alternative approach is to use groupby to split the DataFrame into parts according to the value in column 'a'. You can then sum each part and pull out the value that the 1s added up to:

>>> df.groupby('a')['b'].sum()[1]
15

This approach is likely to be slower than using Boolean indexing, but it is useful if you want check the sums for other values in column a:

>>> df.groupby('a')['b'].sum()
a
1 15
2 8

How to sum over some columns based on condition in pandas

You can use mask. The idea is to create a boolean mask with the w columns, and use it to filter the relevant w columns and sum:

df['top_p'] = df.filter(like='p').mask(df.filter(like='w').isin(['CUSTOM_MASK','CUSTOM_UNKNOWN']).to_numpy()).sum(axis=1)

Output:

    p1   p2    p3    p4   p5      w1    w2           w3              w4           w5  top_p
0 0.1 0.2 0.10 0.11 0.3 cancel good thanks CUSTOM_MASK CUSTOM_MASK 0.40
1 0.2 0.1 0.90 0.20 0.1 hello bad CUSTOM_MASK CUSTOM_UNKNOWN CUSTOM_MASK 0.30
2 0.3 0.3 0.01 0.40 0.5 hi ugly great trible job 1.51

Before summing, the output of mask looks like:

    p1   p2    p3   p4   p5
0 0.1 0.2 0.10 NaN NaN
1 0.2 0.1 NaN NaN NaN
2 0.3 0.3 0.01 0.4 0.5

Sum rows based on columns inside pandas dataframe

You can replace values of tuple by first value of tuple in Series.mask and then aggregate sum:

tup = (1, 2)

df['idbasin'] = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0])
#alternative
#df['idbasin'] = np.where(df['idbasin'].isin(tup), tup[0], df['idbasin'])
df = df.groupby(['idrun', 'idbasin','time'], as_index=False)['q'].sum()
print (df)
idrun idbasin time q
0 -192541 1 0 0.0
1 -192541 1 1 1.5
2 -192541 3 0 0.0
3 -192541 3 1 1.0
4 -192540 1 0 0.0
5 -192540 1 1 1.5
6 -192540 3 0 0.0
7 -192540 3 1 1.0

Sum values of a coulmn in specific rows in a dataframe

data[(data.index >= '2020-01-01 00:00:16') & (data.index <= '2020-01-01 00:00:17')].sum(axis=0)

simply, use axis = 0 for each coulmn sum, and check by a.index >= '2020-01-01 00:00:16' and equivalent for upper bound

if you want to use datetime module:

from datetime import datetime
data[(data.index >= datetime(2020, 1, 1, 0, 0, 16)) & (data.index <= datetime(2020, 1, 1, 0, 0, 17))].sum(axis=0)

How to sum values of one column based on other columns in pandas?

.groupby and .sum() for the home team and then do the same for the away team and add the two together:

df_new = df.groupby('home_team')['home_score'].sum() + df.groupby('away_team')['away_score'].sum()

output:

England     12
Scotland 34
Wales 1

More detailed explanation (per comment):

  1. You need to only .groupby one column home_team. In your answer, you were grouping by ['home_team', 'home_score'] Your goal (no pun intended) is to get the .sum() of the home_score -- so you should NOT .groupby() it. As you can see ['home_score'] is after the part where I use .groupby, so that I can get the .sum() of it. That gets you set for the home teams.
  2. Then, you do the same for the away_team.
  3. At that point python / pandas is smart enough that since the results of the home_team and away_team groups have the same values for countries, you can simply add them together...


Related Topics



Leave a reply



Submit