Pandas: Sum Dataframe Rows for Given Columns

Pandas: sum DataFrame rows for given columns

You can just call sum and set the parameter axis=1 to sum across the rows. Older pandas versions (as used here) silently skip non-numeric columns; recent versions require numeric_only=True to do so:

In [91]:

df = pd.DataFrame({'a': [1,2,3], 'b': [2,3,4], 'c':['dd','ee','ff'], 'd':[5,9,1]})
df['e'] = df.sum(axis=1)
df
Out[91]:
   a  b   c  d   e
0  1  2  dd  5   8
1  2  3  ee  9  14
2  3  4  ff  1   8
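
In recent pandas versions, summing a mixed-dtype frame across rows raises a TypeError instead of silently skipping strings; passing numeric_only=True makes the intent explicit and works on every version:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4],
                   'c': ['dd', 'ee', 'ff'], 'd': [5, 9, 1]})

# numeric_only=True makes the row sum skip the string column 'c'
# regardless of pandas version.
df['e'] = df.sum(axis=1, numeric_only=True)
print(df)
```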

If you only want to sum specific columns, you can create a list of the columns and remove the ones you are not interested in:

In [98]:

col_list = list(df)
col_list.remove('d')
col_list
Out[98]:
['a', 'b', 'c']
In [99]:

df['e'] = df[col_list].sum(axis=1)
df
Out[99]:
   a  b   c  d  e
0  1  2  dd  5  3
1  2  3  ee  9  5
2  3  4  ff  1  7
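
As an alternative to building col_list by hand, select_dtypes can pick out the numeric columns automatically before dropping the unwanted one; this sketch reuses the example frame from above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4],
                   'c': ['dd', 'ee', 'ff'], 'd': [5, 9, 1]})

# Select the numeric columns automatically, then drop the ones to exclude.
cols = df.select_dtypes('number').columns.drop('d')
df['e'] = df[cols].sum(axis=1)
print(df)
```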

Sum rows based on columns inside pandas dataframe

You can replace both values of the tuple with its first value using Series.mask, then aggregate with sum:

tup = (1, 2)

df['idbasin'] = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0])
# alternative (requires: import numpy as np)
# df['idbasin'] = np.where(df['idbasin'].isin(tup), tup[0], df['idbasin'])
df = df.groupby(['idrun', 'idbasin', 'time'], as_index=False)['q'].sum()
print(df)
    idrun  idbasin  time    q
0 -192541        1     0  0.0
1 -192541        1     1  1.5
2 -192541        3     0  0.0
3 -192541        3     1  1.0
4 -192540        1     0  0.0
5 -192540        1     1  1.5
6 -192540        3     0  0.0
7 -192540        3     1  1.0
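
The original frame isn't shown, so here is a minimal sketch with hypothetical toy data (same column names as the answer) demonstrating the mask-then-groupby step:

```python
import pandas as pd

# Toy stand-in for the original frame.
df = pd.DataFrame({'idrun': [-192541] * 4,
                   'idbasin': [1, 2, 3, 3],
                   'time': [1, 1, 0, 1],
                   'q': [1.0, 0.5, 0.0, 1.0]})

tup = (1, 2)
# Collapse basin 2 into basin 1, then sum q per remaining key.
df['idbasin'] = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0])
out = df.groupby(['idrun', 'idbasin', 'time'], as_index=False)['q'].sum()
print(out)
```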

Summing values of a pandas data frame given a list of columns

You can take a subset of df and call sum on it:

print(df)
   x1  x2  x3  x4  x5  x6
0   1   2   3   4   5   6
1   3   4   5   6   3   3
2   1   2   3   6   1   2

print(df[['x1', 'x3', 'x4']])
   x1  x3  x4
0   1   3   4
1   3   5   6
2   1   3   6

li = ['x1', 'x3', 'x4']
print(df[li])
   x1  x3  x4
0   1   3   4
1   3   5   6
2   1   3   6

print(df[li].sum())
x1     5
x3    11
x4    16
dtype: int64

print(df[li].sum(axis=1))
0     8
1    14
2    10
dtype: int64
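
If you also need a single grand total over the selected columns, you can chain the two sums; this sketch reuses the toy data from above:

```python
import pandas as pd

df = pd.DataFrame({'x1': [1, 3, 1], 'x2': [2, 4, 2], 'x3': [3, 5, 3],
                   'x4': [4, 6, 6], 'x5': [5, 3, 1], 'x6': [6, 3, 2]})
li = ['x1', 'x3', 'x4']

row_totals = df[li].sum(axis=1)        # one total per row
grand_total = int(df[li].sum().sum())  # single scalar over the subset
print(row_totals.tolist(), grand_total)
```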

Sum Rows at Bottom of Pandas Dataframe

You can sum the columns of interest only:

# recreate your data
df = pd.DataFrame({'name':['joe','jane'],'age':[25,55],'sales':[100,40],'commissions':[10,4]})

df.loc['Total'] = df[['sales','commissions']].sum()

Result:

>>> df
       name   age  sales  commissions
0       joe  25.0  100.0         10.0
1      jane  55.0   40.0          4.0
Total   NaN   NaN  140.0         14.0

If you don't want the NaN values to appear, you can replace them with empty strings: df = df.fillna('')

Result:

>>> df
       name   age  sales  commissions
0       joe  25.0  100.0         10.0
1      jane  55.0   40.0          4.0
Total               140.0         14.0
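
An alternative sketch that builds the Total row explicitly with pd.concat, so the non-summed columns hold empty strings from the start instead of being filled afterwards:

```python
import pandas as pd

df = pd.DataFrame({'name': ['joe', 'jane'], 'age': [25, 55],
                   'sales': [100, 40], 'commissions': [10, 4]})

# Build the Total row only over the columns being summed; the other
# columns get '' directly, so no fillna pass is needed.
total_row = pd.DataFrame({'name': [''], 'age': [''],
                          'sales': [df['sales'].sum()],
                          'commissions': [df['commissions'].sum()]},
                         index=['Total'])
out = pd.concat([df, total_row])
print(out)
```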

For each day get the sum of all rows in a very large Pandas DataFrame which match in two specific columns

Similar to the previous answer, using groupby and agg, but summing on a unique, order-insensitive key combination:

result = my_df.groupby(['day', my_df.pair.apply(set).apply(tuple)])[['amount']].agg('sum').reset_index()

With a random DataFrame of 5000 rows, looping over the days with the original function took 4.38 s ± 204 ms; this version takes 9.86 ms ± 185 µs.
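
A minimal sketch of the idea with hypothetical toy data: frozenset is hashable and ignores element order, so it can serve directly as the order-insensitive key (sort=False avoids having to order the frozensets between groups):

```python
import pandas as pd

# Toy frame: 'pair' holds 2-tuples whose element order should not matter.
my_df = pd.DataFrame({'day': [1, 1, 1, 2],
                      'pair': [('a', 'b'), ('b', 'a'), ('a', 'c'), ('a', 'b')],
                      'amount': [1.0, 2.0, 5.0, 3.0]})

# ('a', 'b') and ('b', 'a') map to the same frozenset, so they land
# in the same group.
key = my_df['pair'].apply(frozenset)
result = my_df.groupby(['day', key], sort=False)['amount'].sum().reset_index()
print(result)
```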

sum rows in dataframe based on different columns

Try:

df.groupby(['acccount','currency'])['sum'].sum().reset_index()
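
A runnable sketch with hypothetical toy data; the column names (including 'acccount' and the 'sum' column) are taken from the answer above:

```python
import pandas as pd

# Toy data; summing the 'sum' column per account/currency pair.
df = pd.DataFrame({'acccount': ['A', 'A', 'B'],
                   'currency': ['EUR', 'EUR', 'USD'],
                   'sum': [10, 5, 7]})

out = df.groupby(['acccount', 'currency'])['sum'].sum().reset_index()
print(out)
```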

sum of specific rows pandas dataframe

You could make a separate DataFrame and append it back to the original DataFrame, something like this (this code is untested):

# Filter to the desired attributes (.copy() avoids modifying a view of df)
sum_yz = df[df['attribute'].isin(['y', 'z'])].copy()
# Set the new 'attribute' value
sum_yz['attribute'] = 'sum_yz'
# Group by and sum
sum_yz = sum_yz.groupby(['prod', 'attribute']).sum().reset_index()

# Add it to the end of the data frame
df = pd.concat([df, sum_yz])
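
A runnable version of the steps above with hypothetical toy data ('prod', 'attribute', 'value' columns), including the .copy() that keeps the filtered slice independent of df:

```python
import pandas as pd

# Toy frame matching the column names used in the snippet.
df = pd.DataFrame({'prod': ['p1', 'p1', 'p1'],
                   'attribute': ['x', 'y', 'z'],
                   'value': [1, 2, 3]})

# Filter, relabel, aggregate, then append the summary rows back.
sum_yz = df[df['attribute'].isin(['y', 'z'])].copy()
sum_yz['attribute'] = 'sum_yz'
sum_yz = sum_yz.groupby(['prod', 'attribute'], as_index=False).sum()
out = pd.concat([df, sum_yz], ignore_index=True)
print(out)
```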

