Get total of Pandas column
You should use sum:
Total = df['MyColumn'].sum()
print(Total)
319
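A small self-contained check of sum. It uses hypothetical values that match the MyColumn numbers above. By default pandas skips missing values (skipna=True), and the min_count parameter controls what a column with no valid values sums to.

```python
import numpy as np
import pandas as pd

# Hypothetical column values (the same numbers as MyColumn above)
s = pd.Series([84, 76, 28, 28, 19, 84], name='MyColumn')
total = s.sum()
print(total)  # 319

# NaN values are skipped by default (skipna=True)
with_gap = pd.Series([84, np.nan, 28])
print(with_gap.sum())  # 112.0

# An all-NaN column sums to 0.0 unless at least one valid value is required
empty = pd.Series([np.nan, np.nan])
print(empty.sum())             # 0.0
print(empty.sum(min_count=1))  # nan
```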
Then use loc with a Series. In that case, the index of the Series should be set to the name of the column you want to sum:
df.loc['Total'] = pd.Series(df['MyColumn'].sum(), index=['MyColumn'])
print(df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN
because if you pass a scalar, every column of the new row is filled with that value:
df.loc['Total'] = df['MyColumn'].sum()
print(df)
X MyColumn Y Z
0 A 84 13.0 69.0
1 B 76 77.0 127.0
2 C 28 69.0 16.0
3 D 28 28.0 31.0
4 E 19 20.0 85.0
5 F 84 193.0 70.0
Total 319 319 319.0 319.0
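For comparison, here is a self-contained sketch of the two loc assignments side by side. The frame is hypothetical and simply rebuilt from the printed table. The Series version fills only MyColumn, while the scalar version broadcasts the total to every column of the new row.

```python
import pandas as pd

# Hypothetical frame rebuilt from the table above
base = pd.DataFrame({
    'X': list('ABCDEF'),
    'MyColumn': [84, 76, 28, 28, 19, 84],
    'Y': [13, 77, 69, 28, 20, 193],
    'Z': [69, 127, 16, 31, 85, 70],
})
total = base['MyColumn'].sum()

# Series indexed by the column name: only MyColumn is filled, the rest become NaN
with_series = base.copy()
with_series.loc['Total'] = pd.Series(total, index=['MyColumn'])

# Scalar: the same value is broadcast to every column of the new row
with_scalar = base.copy()
with_scalar.loc['Total'] = total

print(with_series.loc['Total'])
print(with_scalar.loc['Total'])
```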
Two other solutions use at and ix; see the examples below:
df.at['Total', 'MyColumn'] = df['MyColumn'].sum()
print(df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN
df.ix['Total', 'MyColumn'] = df['MyColumn'].sum()
print(df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN
Note: ix has been deprecated since pandas v0.20 and was removed in pandas 1.0. Use loc or iloc instead.
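Because ix no longer exists in current pandas, the ix line can be written with loc and a (row label, column label) pair instead. The sketch below uses a small hypothetical frame. Assigning to a row label that does not exist yet creates that row, and its other cells are set to NaN, which matches the at and ix output shown above.

```python
import pandas as pd

# Hypothetical frame with the same MyColumn values as above
df = pd.DataFrame({
    'X': list('ABCDEF'),
    'MyColumn': [84, 76, 28, 28, 19, 84],
})
total = df['MyColumn'].sum()

# loc with a (row label, column label) pair replaces the old df.ix[...] call;
# the new 'Total' row is created and its other cells are NaN
df.loc['Total', 'MyColumn'] = total
print(df.tail(2))
```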
How to sum data.frame column values?
You can just use sum(people$Weight).
sum sums up a vector, and people$Weight retrieves the weight column from your data frame.
Note: you can get built-in help by using ?sum, ?colSums, etc. (By the way, colSums will give you the sum for each column.)
How do I sum values in a column that match a given condition using pandas?
The essential idea here is to select the data you want to sum, and then sum them. This selection of data can be done in several different ways, a few of which are shown below.
Boolean indexing
Arguably the most common way to select the values is to use Boolean indexing.
With this method, you find out where column 'a' is equal to 1 and then sum the corresponding rows of column 'b'. You can use loc to handle the indexing of rows and columns:
>>> df.loc[df['a'] == 1, 'b'].sum()
15
The Boolean indexing can be extended to other columns. For example, if df also contained a column 'c' and we wanted to sum the rows in 'b' where 'a' was 1 and 'c' was 2, we'd write:
df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()
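A runnable version of both Boolean-indexing calls is shown below. The data is hypothetical and was chosen so that the a == 1 rows of 'b' sum to 15, as in the answer. Each comparison needs its own parentheses because & binds more tightly than ==.

```python
import pandas as pd

# Hypothetical data: the a == 1 rows have b = 5, 4, 6 (sum 15)
df = pd.DataFrame({
    'a': [1, 2, 1, 2, 1],
    'b': [5, 3, 4, 5, 6],
    'c': [2, 2, 3, 1, 2],
})

# Single condition: rows where a is 1, column b
total_a1 = df.loc[df['a'] == 1, 'b'].sum()
print(total_a1)  # 15

# Combined conditions use & with parentheses around each comparison
total_a1_c2 = df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()
print(total_a1_c2)  # 11
```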
Query
Another way to select the data is to use query to filter the rows you're interested in, select column 'b' and then sum:
>>> df.query("a == 1")['b'].sum()
15
Again, the method can be extended to make more complicated selections of the data:
df.query("a == 1 and c == 2")['b'].sum()
Note this is a little more concise than the Boolean indexing approach.
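When the value to match lives in a Python variable, a query string can refer to it with the @ prefix. The sketch below reuses the same hypothetical data as the Boolean-indexing example, and the variable name target is illustrative only.

```python
import pandas as pd

# Hypothetical data (a == 1 rows of b sum to 15)
df = pd.DataFrame({
    'a': [1, 2, 1, 2, 1],
    'b': [5, 3, 4, 5, 6],
    'c': [2, 2, 3, 1, 2],
})

# Inside a query string, @target refers to the Python variable target
target = 1
total = df.query("a == @target")['b'].sum()
total_with_c = df.query("a == @target and c == 2")['b'].sum()
print(total, total_with_c)  # 15 11
```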
Groupby
A third approach is to use groupby to split the DataFrame into parts according to the value in column 'a'. You can then sum each part and pull out the value that the 1s added up to:
>>> df.groupby('a')['b'].sum()[1]
15
This approach is likely to be slower than using Boolean indexing, but it is useful if you want to check the sums for other values in column a:
>>> df.groupby('a')['b'].sum()
a
1 15
2 8
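Here is a self-contained sketch of the groupby route on hypothetical data that reproduces the sums above (15 for a == 1, 8 for a == 2). It uses .loc for an explicit label lookup and Series.get for a value of 'a' that might not occur at all.

```python
import pandas as pd

# Hypothetical data reproducing the grouped sums shown above
df = pd.DataFrame({
    'a': [1, 2, 1, 2, 1],
    'b': [5, 3, 4, 5, 6],
})

# One pass computes the sum of b for every distinct value of a
sums = df.groupby('a')['b'].sum()
print(sums)

# .loc makes the label lookup explicit; .get returns a default for absent groups
print(sums.loc[1])     # 15
print(sums.get(3, 0))  # 0 (no rows have a == 3)
```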
How to sum over some columns based on condition in pandas
You can use mask. The idea is to build a boolean mask from the w columns and use it to blank out the corresponding p columns, then sum what is left:
df['top_p'] = df.filter(like='p').mask(df.filter(like='w').isin(['CUSTOM_MASK','CUSTOM_UNKNOWN']).to_numpy()).sum(axis=1)
Output:
p1 p2 p3 p4 p5 w1 w2 w3 w4 w5 top_p
0 0.1 0.2 0.10 0.11 0.3 cancel good thanks CUSTOM_MASK CUSTOM_MASK 0.40
1 0.2 0.1 0.90 0.20 0.1 hello bad CUSTOM_MASK CUSTOM_UNKNOWN CUSTOM_MASK 0.30
2 0.3 0.3 0.01 0.40 0.5 hi ugly great trible job 1.51
Before summing, the output of mask looks like:
p1 p2 p3 p4 p5
0 0.1 0.2 0.10 NaN NaN
1 0.2 0.1 NaN NaN NaN
2 0.3 0.3 0.01 0.4 0.5
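A self-contained version of the mask answer follows, with the input frame rebuilt from the output table above. The .to_numpy() call matters because the mask is built from columns w1..w5, while mask is applied to columns p1..p5. Passing a plain array makes the cells line up by position instead of by column label.

```python
import pandas as pd

# Hypothetical frame rebuilt from the output table above
df = pd.DataFrame({
    'p1': [0.1, 0.2, 0.3], 'p2': [0.2, 0.1, 0.3], 'p3': [0.10, 0.90, 0.01],
    'p4': [0.11, 0.20, 0.40], 'p5': [0.3, 0.1, 0.5],
    'w1': ['cancel', 'hello', 'hi'], 'w2': ['good', 'bad', 'ugly'],
    'w3': ['thanks', 'CUSTOM_MASK', 'great'],
    'w4': ['CUSTOM_MASK', 'CUSTOM_UNKNOWN', 'trible'],
    'w5': ['CUSTOM_MASK', 'CUSTOM_MASK', 'job'],
})

p_cols = df.filter(like='p')
# True where the matching w column holds a placeholder token
hide = df.filter(like='w').isin(['CUSTOM_MASK', 'CUSTOM_UNKNOWN']).to_numpy()
# Masked cells become NaN, and sum(axis=1) skips NaN
df['top_p'] = p_cols.mask(hide).sum(axis=1)
print(df['top_p'].round(2).tolist())  # [0.4, 0.3, 1.51]
```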
Sum rows based on columns inside pandas dataframe
You can use Series.mask to replace every value that appears in the tuple with the tuple's first value, and then aggregate with sum:
tup = (1, 2)
df['idbasin'] = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0])
#alternative
#df['idbasin'] = np.where(df['idbasin'].isin(tup), tup[0], df['idbasin'])
df = df.groupby(['idrun', 'idbasin','time'], as_index=False)['q'].sum()
print(df)
idrun idbasin time q
0 -192541 1 0 0.0
1 -192541 1 1 1.5
2 -192541 3 0 0.0
3 -192541 3 1 1.0
4 -192540 1 0 0.0
5 -192540 1 1 1.5
6 -192540 3 0 0.0
7 -192540 3 1 1.0
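The question's input frame is not shown, so the sketch below uses a small hypothetical input for one idrun where basins 1 and 2 should be merged. It runs both the mask line and the commented np.where alternative, and the groupby then reproduces the q values shown for idrun -192541.

```python
import numpy as np
import pandas as pd

# Hypothetical input: basins 1 and 2 should be merged into basin 1
df = pd.DataFrame({
    'idrun':   [-192541] * 6,
    'idbasin': [1, 1, 2, 2, 3, 3],
    'time':    [0, 1, 0, 1, 0, 1],
    'q':       [0.0, 1.0, 0.0, 0.5, 0.0, 1.0],
})

tup = (1, 2)
# Relabel every idbasin found in tup as tup[0]
relabelled = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0])
# The np.where alternative yields the same labels as a plain NumPy array
alternative = np.where(df['idbasin'].isin(tup), tup[0], df['idbasin'])

out = (df.assign(idbasin=relabelled)
         .groupby(['idrun', 'idbasin', 'time'], as_index=False)['q'].sum())
print(out)
```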
Sum values of a column in specific rows in a dataframe
data[(data.index >= '2020-01-01 00:00:16') & (data.index <= '2020-01-01 00:00:17')].sum(axis=0)
Use axis=0 to get the sum of each column, and select the rows with data.index >= '2020-01-01 00:00:16' and the equivalent condition for the upper bound.
If you want to use the datetime module:
from datetime import datetime
data[(data.index >= datetime(2020, 1, 1, 0, 0, 16)) & (data.index <= datetime(2020, 1, 1, 0, 0, 17))].sum(axis=0)
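A self-contained check of the Boolean-mask version follows, built on hypothetical per-second data. It also shows label slicing with .loc. On a sorted DatetimeIndex, a .loc slice includes both endpoints, so it selects the same two rows here.

```python
import pandas as pd

# Hypothetical data: one row per second starting at 2020-01-01 00:00:00
index = pd.to_datetime('2020-01-01') + pd.to_timedelta(list(range(20)), unit='s')
data = pd.DataFrame({'v': list(range(20)), 'w': [1] * 20}, index=index)

# Boolean mask with both bounds, as in the answer
by_mask = data[(data.index >= '2020-01-01 00:00:16') &
               (data.index <= '2020-01-01 00:00:17')].sum(axis=0)

# On a sorted DatetimeIndex, .loc label slicing includes both endpoints
by_slice = data.loc['2020-01-01 00:00:16':'2020-01-01 00:00:17'].sum(axis=0)
print(by_slice)
```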
How to sum values of one column based on other columns in pandas?
Use .groupby and .sum() for the home team, then do the same for the away team and add the two together:
df_new = df.groupby('home_team')['home_score'].sum() + df.groupby('away_team')['away_score'].sum()
output:
England 12
Scotland 34
Wales 1
More detailed explanation (per comment):
- You only need to .groupby one column, home_team. In your answer, you were grouping by ['home_team', 'home_score']. Your goal (no pun intended) is to get the .sum() of home_score, so you should NOT .groupby() it. As you can see, ['home_score'] comes after the part where I use .groupby, so that I can get the .sum() of it. That gets you set for the home teams.
- Then, you do the same for the away_team.
- At that point pandas is smart enough that, since the home_team and away_team results are indexed by the same country names, you can simply add them together and the values are aligned by country.
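One caveat of adding the two grouped Series with +: the addition aligns on team name, so a team that appears only as home (or only as away) ends up as NaN. The sketch below uses hypothetical fixtures in which Ireland never plays away. It contrasts + with Series.add(..., fill_value=0), which treats the missing side as zero goals.

```python
import pandas as pd

# Hypothetical fixtures; Ireland never plays away in this data
df = pd.DataFrame({
    'home_team':  ['England', 'Scotland', 'Wales', 'Ireland'],
    'away_team':  ['Scotland', 'Wales', 'England', 'England'],
    'home_score': [2, 3, 1, 2],
    'away_score': [1, 0, 4, 1],
})

home = df.groupby('home_team')['home_score'].sum()
away = df.groupby('away_team')['away_score'].sum()

# '+' aligns on team name; a team missing from one side gets NaN
plain = home + away
# .add with fill_value=0 treats a missing side as zero goals instead
total = home.add(away, fill_value=0)
print(total)
```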