Pandas: Sum Dataframe Rows for Given Columns

Pandas: sum DataFrame rows for given columns

You can just call sum and set the parameter axis=1 to sum across the rows. Older pandas versions (as used here) silently skip non-numeric columns; recent versions require numeric_only=True to do so:

In [91]:

df = pd.DataFrame({'a': [1,2,3], 'b': [2,3,4], 'c':['dd','ee','ff'], 'd':[5,9,1]})
df['e'] = df.sum(axis=1)
df
Out[91]:
   a  b   c  d   e
0  1  2  dd  5   8
1  2  3  ee  9  14
2  3  4  ff  1   8
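
In recent pandas versions, summing a mixed-dtype frame across rows raises a TypeError instead of silently skipping strings; passing numeric_only=True makes the intent explicit and works on every version:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4],
                   'c': ['dd', 'ee', 'ff'], 'd': [5, 9, 1]})

# numeric_only=True makes the row sum skip the string column 'c'
# regardless of pandas version.
df['e'] = df.sum(axis=1, numeric_only=True)
print(df)
```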

If you only want to sum specific columns, you can create a list of the columns and remove the ones you are not interested in:

In [98]:

col_list = list(df)
col_list.remove('d')
col_list
Out[98]:
['a', 'b', 'c']
In [99]:

df['e'] = df[col_list].sum(axis=1)
df
Out[99]:
   a  b   c  d  e
0  1  2  dd  5  3
1  2  3  ee  9  5
2  3  4  ff  1  7
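
As an alternative to building col_list by hand, select_dtypes can pick out the numeric columns automatically before dropping the unwanted one; this sketch reuses the example frame from above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4],
                   'c': ['dd', 'ee', 'ff'], 'd': [5, 9, 1]})

# Select the numeric columns automatically, then drop the ones to exclude.
cols = df.select_dtypes('number').columns.drop('d')
df['e'] = df[cols].sum(axis=1)
print(df)
```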

Sum rows based on columns inside pandas dataframe

You can replace both values of the tuple with its first value using Series.mask, then aggregate with sum:

tup = (1, 2)

df['idbasin'] = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0])
# alternative (requires: import numpy as np)
# df['idbasin'] = np.where(df['idbasin'].isin(tup), tup[0], df['idbasin'])
df = df.groupby(['idrun', 'idbasin', 'time'], as_index=False)['q'].sum()
print(df)
    idrun  idbasin  time    q
0 -192541        1     0  0.0
1 -192541        1     1  1.5
2 -192541        3     0  0.0
3 -192541        3     1  1.0
4 -192540        1     0  0.0
5 -192540        1     1  1.5
6 -192540        3     0  0.0
7 -192540        3     1  1.0
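
The original frame isn't shown, so here is a minimal sketch with hypothetical toy data (same column names as the answer) demonstrating the mask-then-groupby step:

```python
import pandas as pd

# Toy stand-in for the original frame.
df = pd.DataFrame({'idrun': [-192541] * 4,
                   'idbasin': [1, 2, 3, 3],
                   'time': [1, 1, 0, 1],
                   'q': [1.0, 0.5, 0.0, 1.0]})

tup = (1, 2)
# Collapse basin 2 into basin 1, then sum q per remaining key.
df['idbasin'] = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0])
out = df.groupby(['idrun', 'idbasin', 'time'], as_index=False)['q'].sum()
print(out)
```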

Summing values of a pandas data frame given a list of columns

You can take a subset of df and call sum on it:

print(df)
   x1  x2  x3  x4  x5  x6
0   1   2   3   4   5   6
1   3   4   5   6   3   3
2   1   2   3   6   1   2

print(df[['x1', 'x3', 'x4']])
   x1  x3  x4
0   1   3   4
1   3   5   6
2   1   3   6

li = ['x1', 'x3', 'x4']
print(df[li])
   x1  x3  x4
0   1   3   4
1   3   5   6
2   1   3   6

print(df[li].sum())
x1     5
x3    11
x4    16
dtype: int64

print(df[li].sum(axis=1))
0     8
1    14
2    10
dtype: int64
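
If you also need a single grand total over the selected columns, you can chain the two sums; this sketch reuses the toy data from above:

```python
import pandas as pd

df = pd.DataFrame({'x1': [1, 3, 1], 'x2': [2, 4, 2], 'x3': [3, 5, 3],
                   'x4': [4, 6, 6], 'x5': [5, 3, 1], 'x6': [6, 3, 2]})
li = ['x1', 'x3', 'x4']

row_totals = df[li].sum(axis=1)        # one total per row
grand_total = int(df[li].sum().sum())  # single scalar over the subset
print(row_totals.tolist(), grand_total)
```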

Sum Rows at Bottom of Pandas Dataframe

You can sum the columns of interest only:

# recreate your data
df = pd.DataFrame({'name':['joe','jane'],'age':[25,55],'sales':[100,40],'commissions':[10,4]})

df.loc['Total'] = df[['sales','commissions']].sum()

Result:

>>> df
       name   age  sales  commissions
0       joe  25.0  100.0         10.0
1      jane  55.0   40.0          4.0
Total   NaN   NaN  140.0         14.0

If you don't want the NaN values to appear, you can replace them with empty strings: df = df.fillna('')

Result:

>>> df
       name   age  sales  commissions
0       joe  25.0  100.0         10.0
1      jane  55.0   40.0          4.0
Total               140.0         14.0
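
An alternative sketch that builds the Total row explicitly with pd.concat, so the non-summed columns hold empty strings from the start instead of being filled afterwards:

```python
import pandas as pd

df = pd.DataFrame({'name': ['joe', 'jane'], 'age': [25, 55],
                   'sales': [100, 40], 'commissions': [10, 4]})

# Build the Total row only over the columns being summed; the other
# columns get '' directly, so no fillna pass is needed.
total_row = pd.DataFrame({'name': [''], 'age': [''],
                          'sales': [df['sales'].sum()],
                          'commissions': [df['commissions'].sum()]},
                         index=['Total'])
out = pd.concat([df, total_row])
print(out)
```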

For each day get the sum of all rows in a very large Pandas DataFrame which match in two specific columns

Similar to the previous answer, using groupby and agg, but summing on a unique, order-insensitive key combination:

result = my_df.groupby(['day', my_df.pair.apply(set).apply(tuple)])[['amount']].agg('sum').reset_index()

With a random DataFrame of 5000 rows, looping over the days with the original function took 4.38 s ± 204 ms; this version takes 9.86 ms ± 185 µs.
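
A minimal sketch of the idea with hypothetical toy data: frozenset is hashable and ignores element order, so it can serve directly as the order-insensitive key (sort=False avoids having to order the frozensets between groups):

```python
import pandas as pd

# Toy frame: 'pair' holds 2-tuples whose element order should not matter.
my_df = pd.DataFrame({'day': [1, 1, 1, 2],
                      'pair': [('a', 'b'), ('b', 'a'), ('a', 'c'), ('a', 'b')],
                      'amount': [1.0, 2.0, 5.0, 3.0]})

# ('a', 'b') and ('b', 'a') map to the same frozenset, so they land
# in the same group.
key = my_df['pair'].apply(frozenset)
result = my_df.groupby(['day', key], sort=False)['amount'].sum().reset_index()
print(result)
```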

sum rows in dataframe based on different columns

Try:

df.groupby(['acccount','currency'])['sum'].sum().reset_index()
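
A runnable sketch with hypothetical toy data; the column names (including 'acccount' and the 'sum' column) are taken from the answer above:

```python
import pandas as pd

# Toy data; summing the 'sum' column per account/currency pair.
df = pd.DataFrame({'acccount': ['A', 'A', 'B'],
                   'currency': ['EUR', 'EUR', 'USD'],
                   'sum': [10, 5, 7]})

out = df.groupby(['acccount', 'currency'])['sum'].sum().reset_index()
print(out)
```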

sum of specific rows pandas dataframe

You could make a separate DataFrame and append it back to the original DataFrame, something like this (this code is untested):

# Filter to the desired attributes (.copy() avoids modifying a view of df)
sum_yz = df[df['attribute'].isin(['y', 'z'])].copy()
# Set the new 'attribute' value
sum_yz['attribute'] = 'sum_yz'
# Group by and sum
sum_yz = sum_yz.groupby(['prod', 'attribute']).sum().reset_index()

# Add it to the end of the data frame
df = pd.concat([df, sum_yz])
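
A runnable version of the steps above with hypothetical toy data ('prod', 'attribute', 'value' columns), including the .copy() that keeps the filtered slice independent of df:

```python
import pandas as pd

# Toy frame matching the column names used in the snippet.
df = pd.DataFrame({'prod': ['p1', 'p1', 'p1'],
                   'attribute': ['x', 'y', 'z'],
                   'value': [1, 2, 3]})

# Filter, relabel, aggregate, then append the summary rows back.
sum_yz = df[df['attribute'].isin(['y', 'z'])].copy()
sum_yz['attribute'] = 'sum_yz'
sum_yz = sum_yz.groupby(['prod', 'attribute'], as_index=False).sum()
out = pd.concat([df, sum_yz], ignore_index=True)
print(out)
```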

