Pandas: sum DataFrame rows for given columns
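You can call sum with axis=1 to add up the values in each row. Older pandas versions silently skipped non-numeric columns; recent versions (2.0 and later) raise a TypeError on the string column unless you pass numeric_only=True:
In [91]:
df = pd.DataFrame({'a': [1,2,3], 'b': [2,3,4], 'c':['dd','ee','ff'], 'd':[5,9,1]})
df['e'] = df.sum(axis=1, numeric_only=True)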
df
Out[91]:
   a  b   c  d   e
0  1  2  dd  5   8
1  2  3  ee  9  14
2  3  4  ff  1   8
If you only want to sum specific columns, build a list of the column names and remove the ones you are not interested in (shown here on the original four-column frame, before column e was added):
In [98]:
col_list = list(df)
col_list.remove('d')
col_list
Out[98]:
['a', 'b', 'c']
In [99]:
df['e'] = df[col_list].sum(axis=1, numeric_only=True)
df
Out[99]:
   a  b   c  d  e
0  1  2  dd  5  3
1  2  3  ee  9  5
2  3  4  ff  1  7
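When the columns to add up are known in advance, selecting them directly can be simpler than removing names from the full list. A minimal sketch on the same example frame; the names ab_sum and numeric_total are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4],
                   'c': ['dd', 'ee', 'ff'], 'd': [5, 9, 1]})

# Sum only the named columns, row by row (axis=1).
df['ab_sum'] = df[['a', 'b']].sum(axis=1)

# Sum every numeric column per row; numeric_only=True skips the string column 'c'.
numeric_total = df[['a', 'b', 'c', 'd']].sum(axis=1, numeric_only=True)

print(df)
print(numeric_total.tolist())
```

Selecting by name also avoids accidentally including a previously computed total column in a later sum.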
Sum rows based on columns inside pandas dataframe
You can use Series.mask to replace every value that appears in the tuple with the tuple's first value, then group and aggregate with sum:
tup = (1, 2)
df['idbasin'] = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0])
# alternative with numpy.where (requires: import numpy as np)
#df['idbasin'] = np.where(df['idbasin'].isin(tup), tup[0], df['idbasin'])
df = df.groupby(['idrun', 'idbasin','time'], as_index=False)['q'].sum()
print(df)
    idrun  idbasin  time    q
0 -192541        1     0  0.0
1 -192541        1     1  1.5
2 -192541        3     0  0.0
3 -192541        3     1  1.0
4 -192540        1     0  0.0
5 -192540        1     1  1.5
6 -192540        3     0  0.0
7 -192540        3     1  1.0
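The answer does not show the input frame, so here is a self-contained sketch of the same mask-then-groupby idea. The input rows are hypothetical, chosen only so that basins 1 and 2 get merged into basin 1:

```python
import pandas as pd

# Hypothetical input: basins 1 and 2 should be merged into basin 1.
df = pd.DataFrame({
    'idrun':   [-192541] * 6,
    'idbasin': [1, 2, 3, 1, 2, 3],
    'time':    [0, 0, 0, 1, 1, 1],
    'q':       [0.0, 0.0, 0.0, 1.0, 0.5, 1.0],
})

tup = (1, 2)
# Where idbasin is one of the values in tup, replace it with tup[0].
df['idbasin'] = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0])

# Rows that now share (idrun, idbasin, time) are summed together.
out = df.groupby(['idrun', 'idbasin', 'time'], as_index=False)['q'].sum()
print(out)
```

Relabelling first and grouping second keeps the aggregation a single groupby call, whatever the number of values in the tuple.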
Summing values of a pandas data frame given a list of columns
You can select a subset of the columns of df and call sum on it:
print(df)
   x1  x2  x3  x4  x5  x6
0   1   2   3   4   5   6
1   3   4   5   6   3   3
2   1   2   3   6   1   2
print(df[['x1', 'x3', 'x4']])
   x1  x3  x4
0   1   3   4
1   3   5   6
2   1   3   6
li = ['x1', 'x3', 'x4']
print(df[li])
   x1  x3  x4
0   1   3   4
1   3   5   6
2   1   3   6
print(df[li].sum())
x1     5
x3    11
x4    16
dtype: int64
print(df[li].sum(axis=1))
0     8
1    14
2    10
dtype: int64
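The transcript above never constructs df, so the following sketch rebuilds it from the printed values and stores the per-row result. The column name total is an assumption added for illustration:

```python
import pandas as pd

# Frame rebuilt from the printed output above.
df = pd.DataFrame({'x1': [1, 3, 1], 'x2': [2, 4, 2], 'x3': [3, 5, 3],
                   'x4': [4, 6, 6], 'x5': [5, 3, 1], 'x6': [6, 3, 2]})
li = ['x1', 'x3', 'x4']

col_totals = df[li].sum()        # one total per listed column
row_totals = df[li].sum(axis=1)  # one total per row, across the listed columns

# assign() returns a new frame with the extra column; the source columns are untouched.
df = df.assign(total=row_totals)
print(df)
```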
Sum Rows at Bottom of Pandas Dataframe
You can sum the columns of interest only:
## recreate your data
df = pd.DataFrame({'name':['joe','jane'],'age':[25,55],'sales':[100,40],'commissions':[10,4]})
df.loc['Total'] = df[['sales','commissions']].sum()
Result:
>>> df
       name   age  sales  commissions
0       joe  25.0  100.0         10.0
1      jane  55.0   40.0          4.0
Total   NaN   NaN  140.0         14.0
If you don't want the NaN to appear, you can replace them with an empty string: df = df.fillna('')
Result:
>>> df
       name   age  sales  commissions
0       joe  25.0  100.0         10.0
1      jane  55.0   40.0          4.0
Total              140.0         14.0
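If you would rather keep the original frame free of the totals row, one possible variant is to build the row separately and concatenate it. This is a sketch, not the approach from the answer; the names totals and report are made up here:

```python
import pandas as pd

df = pd.DataFrame({'name': ['joe', 'jane'], 'age': [25, 55],
                   'sales': [100, 40], 'commissions': [10, 4]})

# Column totals as a one-row frame labelled 'Total'.
totals = df[['sales', 'commissions']].sum().to_frame().T
totals.index = ['Total']

# Concatenating leaves df unchanged; columns missing from totals become NaN.
report = pd.concat([df, totals])
print(report.fillna(''))
```

Keeping the totals out of df means later calculations on df do not count the Total row as data.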
For each day get the sum of all rows in a very large Pandas DataFrame which match in two specific columns
As in the previous answer, use groupby and agg, but sum over each unique key combination, so that a pair counts as the same key regardless of the order of its elements:
result = my_df.groupby(['day', my_df.pair.apply(set).apply(tuple)])[['amount']].agg('sum').reset_index()
On a random DataFrame of 5000 rows, looping over the days with your function took 4.38 s ± 204 ms for me; with the groupby version I'm at 9.86 ms ± 185 µs.
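A small self-contained sketch of the idea, on hypothetical data. Note one deliberate swap: it uses tuple(sorted(p)) instead of apply(set).apply(tuple), because two equal sets are not guaranteed to iterate in the same order, so sorting is a safer way to get one canonical key per unordered pair. The timings quoted above do not apply to this variant:

```python
import pandas as pd

# Hypothetical data: 'pair' holds two ids whose order should not matter.
my_df = pd.DataFrame({
    'day':    [1, 1, 1, 2],
    'pair':   [('a', 'b'), ('b', 'a'), ('a', 'c'), ('b', 'a')],
    'amount': [10, 5, 1, 7],
})

# Sorting each pair gives one canonical key per unordered combination.
key = my_df['pair'].apply(lambda p: tuple(sorted(p)))

result = my_df.groupby(['day', key])[['amount']].sum().reset_index()
print(result)
```

Here ('a', 'b') and ('b', 'a') on day 1 fall into the same group, so their amounts are added together.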
sum rows in dataframe based on different columns
Try grouping by both columns and summing:
df.groupby(['acccount','currency'])['sum'].sum().reset_index()
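To see what this one-liner returns, here is a sketch on made-up data that reuses the column names from the snippet (including the spelling acccount, kept as it appears there):

```python
import pandas as pd

# Hypothetical data using the column names from the snippet above.
df = pd.DataFrame({
    'acccount': ['A', 'A', 'B', 'B'],
    'currency': ['EUR', 'EUR', 'EUR', 'USD'],
    'sum':      [10, 20, 5, 7],
})

# Bracket access is needed: df.sum is the DataFrame method, not the 'sum' column.
totals = df.groupby(['acccount', 'currency'])['sum'].sum().reset_index()
print(totals)
```

The result has one row per (acccount, currency) combination, with the group keys restored as ordinary columns by reset_index.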
sum of specific rows pandas dataframe
You could build a separate DataFrame and concatenate it back onto the original DataFrame, something like this (this code is untested):
# Filter to the desired attributes (copy so the assignment below does not act on a view)
sum_yz = df[df['attribute'].isin(['y', 'z'])].copy()
# Set the new 'attribute' value
sum_yz['attribute'] = 'sum_yz'
# Group by and sum
sum_yz = sum_yz.groupby(['prod', 'attribute']).sum().reset_index()
# Add it to the end of the data frame
df = pd.concat([df, sum_yz])
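A runnable sketch of the same steps on hypothetical data. The value column and the sample rows are assumptions, since the original frame is not shown:

```python
import pandas as pd

# Hypothetical data with the column names used above plus a made-up 'value' column.
df = pd.DataFrame({
    'prod':      ['p1', 'p1', 'p1', 'p2', 'p2'],
    'attribute': ['x', 'y', 'z', 'y', 'z'],
    'value':     [1, 2, 3, 10, 20],
})

subset = df[df['attribute'].isin(['y', 'z'])]
# assign() returns a new frame, so the rows in df are not modified.
sum_yz = (subset.assign(attribute='sum_yz')
                .groupby(['prod', 'attribute'], as_index=False)['value'].sum())

# ignore_index=True renumbers the rows of the combined frame from 0.
combined = pd.concat([df, sum_yz], ignore_index=True)
print(combined)
```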