Pandas sum by groupby, but exclude certain columns
You can select the columns of a groupby:
In [11]: df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum()
Out[11]:
                       Y1961  Y1962  Y1963
Country     Item_Code
Afghanistan 15            10     20     30
            25            10     20     30
Angola      15            30     40     50
            25            30     40     50
Note that the list passed must be a subset of the columns, otherwise you'll see a KeyError.
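As a minimal, self-contained sketch of the column-subset selection (the data below is invented to mirror the example above):

```python
import pandas as pd

# Invented data mirroring the example above; 'Unit' is a column we
# deliberately leave out of the aggregation
df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan', 'Angola', 'Angola'],
    'Item_Code': [15, 25, 15, 25],
    'Unit': ['t', 't', 't', 't'],
    'Y1961': [10, 10, 30, 30],
    'Y1962': [20, 20, 40, 40],
    'Y1963': [30, 30, 50, 50],
})

# Selecting a list of columns after groupby restricts the sum to them
out = df.groupby(['Country', 'Item_Code'])[['Y1961', 'Y1962', 'Y1963']].sum()
print(out)
```

Passing a name that is not an actual column raises the KeyError mentioned above.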
pandas groupby excluding when a column takes some value
Use:
print (df)
   ID Company  Cost
0   1      Us     2
1   1    Them     1
2   1    Them     1
3   2      Us     1
4   2    Them     2
5   2    Them     1
6   3      Us     1   <- new row added to show the difference
If you need to filter first, and unmatched groups (if any) are not important, use:
df1 = df[df.Company!="Us"].groupby('ID', as_index=False).Cost.sum()
print (df1)
   ID  Cost
0   1     2
1   2     3
df1 = df.query('Company!="Us"').groupby('ID', as_index=False).Cost.sum()
print (df1)
   ID  Cost
0   1     2
1   2     3
If you need all ID groups, with Cost=0 for Us, first set Cost to 0 and then aggregate:
df2 = (df.assign(Cost=df.Cost.where(df.Company != "Us", 0))
         .groupby('ID', as_index=False).Cost
         .sum())
print (df2)
   ID  Cost
0   1     2
1   2     3
2   3     0
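Both variants can be run side by side; a sketch using the same toy data as above:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 3],
                   'Company': ['Us', 'Them', 'Them', 'Us', 'Them', 'Them', 'Us'],
                   'Cost': [2, 1, 1, 1, 2, 1, 1]})

# Variant 1: filter first -- ID 3 disappears because its only row is "Us"
df1 = df[df.Company != 'Us'].groupby('ID', as_index=False).Cost.sum()

# Variant 2: zero out "Us" costs first -- every ID survives, ID 3 gets 0
df2 = (df.assign(Cost=df.Cost.where(df.Company != 'Us', 0))
         .groupby('ID', as_index=False).Cost.sum())

print(df1)
print(df2)
```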
Groupby in pandas by including the columns which are in group by condition
You need to specify the argument as_index=False:
df.groupby(['Country', 'Item_Code'],as_index=False)[["Y1961", "Y1962", "Y1963"]].sum()
       Country  Item_Code  Y1961  Y1962  Y1963
0  Afghanistan         15     10     20     30
1  Afghanistan         25     10     20     30
2       Angola         15     30     40     50
3       Angola         25     30     40     50
df.columns
Index(['Code', 'Country', 'Item_Code', 'Item', 'Ele_Code', 'Unit', 'Y1961',
'Y1962', 'Y1963'],
dtype='object')
You could also do:
df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum().reset_index()
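Both forms should give the same frame; a quick sketch with invented data:

```python
import pandas as pd

# Invented miniature of the dataset above
df = pd.DataFrame({'Country': ['Afghanistan', 'Afghanistan', 'Angola'],
                   'Item_Code': [15, 25, 15],
                   'Y1961': [10, 10, 30]})

# as_index=False keeps the group keys as ordinary columns...
a = df.groupby(['Country', 'Item_Code'], as_index=False)['Y1961'].sum()
# ...which is equivalent to aggregating and then resetting the index
b = df.groupby(['Country', 'Item_Code'])['Y1961'].sum().reset_index()
print(a)
```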
How to ignore specific column in dataframe when doing an aggregation
Yes, indeed you can use first for the name column:
df.groupby('car_id').agg({'name': 'first',
                          'aa': 'sum',
                          'bb': 'sum',
                          'cc': 'sum'})
Output:
         name     aa     bb   cc
car_id
100    buicks  0.001  0.004  0.0
101     chevy  0.002  0.000  0.0
102      olds  0.003  0.006  0.0
103    nissan  0.000  0.140  0.1
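A runnable sketch of the same idea, with invented numbers (assuming name is constant within each car_id, so 'first' is safe):

```python
import pandas as pd

df = pd.DataFrame({'car_id': [100, 100, 101, 101],
                   'name': ['buicks', 'buicks', 'chevy', 'chevy'],
                   'aa': [0.001, 0.000, 0.001, 0.001],
                   'bb': [0.002, 0.002, 0.000, 0.000]})

# 'first' keeps the (constant) name; 'sum' aggregates the numeric columns
out = df.groupby('car_id').agg({'name': 'first', 'aa': 'sum', 'bb': 'sum'})
print(out)
```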
Pandas Groupby and Sum Only One Column
The only way to do this would be to include C in your groupby (the groupby function can accept a list).
Give this a try:
df.groupby(['A','C'])['B'].sum()
One other thing to note, if you need to work with df after the aggregation you can also use the as_index=False
option to return a dataframe object. This one gave me problems when I was first working with Pandas. Example:
df.groupby(['A','C'], as_index=False)['B'].sum()
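A sketch with invented data, showing that columns outside the key and the selection (here a hypothetical D) simply drop out:

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'x', 'y'],
                   'C': ['c1', 'c1', 'c2'],
                   'B': [1, 2, 3],
                   'D': [10, 20, 30]})  # neither a key nor selected

# C is part of the key, B is summed, D is excluded entirely
out = df.groupby(['A', 'C'], as_index=False)['B'].sum()
print(out)
```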
Exclude date column from groupby dataframe with sum function on it
Use aggregation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.aggregate.html
In : df = pd.DataFrame([[1, 2, 3],
                        [4, 5, 6],
                        [1, 5, 7]],
                       columns=['A', 'B', 'C'])
In : df
Out:
   A  B  C
0  1  2  3
1  4  5  6
2  1  5  7
In : df.groupby('A').agg({'B': 'sum', 'C': 'first'})
Out:
   B  C
A
1  7  3
4  5  6
Hence you can decide which operation to use on each column. You just have to say what you want for the 'date' column (first might be ok).
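Applied to the date case specifically, a sketch (column names invented for illustration):

```python
import pandas as pd

# Invented frame with a date column that should not be summed
df = pd.DataFrame({'key': ['a', 'a', 'b'],
                   'value': [1, 2, 3],
                   'date': pd.to_datetime(['2020-01-01', '2020-01-02',
                                           '2020-01-03'])})

# Per-column aggregation: sum the numbers, keep the first date per group
out = df.groupby('key').agg({'value': 'sum', 'date': 'first'})
print(out)
```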
Find the sum of a column by grouping two columns
Given your response to @berkayln, I think you want to project that sum back onto your original dataframe. Does this suit your need?
df['sumPerYearLengthGroupPortOfLanding'] = df.groupby(['Year', 'Length Group', 'Port of Landing'])['Value(£)'].transform('sum')
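transform returns a result aligned to the original index, so the group total lands on every row; a sketch with invented data and a single grouping key:

```python
import pandas as pd

df = pd.DataFrame({'Year': [2019, 2019, 2020],
                   'Value(£)': [5, 7, 3]})

# Unlike sum() on a groupby, transform('sum') keeps the original shape,
# broadcasting each group's total back to its member rows
df['total'] = df.groupby('Year')['Value(£)'].transform('sum')
print(df)
```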
Python for sum operation by groupby, but exclude the non-numeric data
I think you need to_numeric with errors='coerce' to convert non-numeric values to NaN; groupby + sum then omits those rows:
df = (pd.to_numeric(df['#Line_Changed'], errors='coerce')
        .groupby(df['filename'])
        .sum()
        .to_frame()
        .add_prefix('SUM ')
        .reset_index())
print (df)
               filename  SUM #Line_Changed
0  analyze/dir_list.txt               20.0
1  metrics/metrics1.csv               22.0
2  metrics/metrics2.csv               19.0
Or assign to a new column, which is then used for the groupby:
df['SUM #Line_Changed'] = pd.to_numeric(df['#Line_Changed'], errors='coerce')
df = df.groupby('filename', as_index=False)['SUM #Line_Changed'].sum()
print (df)
               filename  SUM #Line_Changed
0  analyze/dir_list.txt               20.0
1  metrics/metrics1.csv               22.0
2  metrics/metrics2.csv               19.0
Detail:
df['SUM #Line_Changed'] = pd.to_numeric(df['#Line_Changed'], errors='coerce')
print (df)
   id              filename #Line_Changed  SUM #Line_Changed
0   1  analyze/dir_list.txt            16               16.0
1   2  metrics/metrics1.csv            11               11.0
2   3  metrics/metrics2.csv            15               15.0
3   4  analyze/dir_list.txt            =>                NaN
4   5  metrics/metrics1.csv            11               11.0
5   6  metrics/metrics2.csv           bin                NaN
6   7  metrics/metrics2.csv             4                4.0
7   8  analyze/dir_list.txt             4                4.0
EDIT: If you want to drop non-numeric rows from the original DataFrame:
df['#Line_Changed'] = pd.to_numeric(df['#Line_Changed'], errors='coerce')
df = df.dropna(subset=['#Line_Changed'])
print (df)
   id              filename  #Line_Changed
0   1  analyze/dir_list.txt           16.0
1   2  metrics/metrics1.csv           11.0
2   3  metrics/metrics2.csv           15.0
4   5  metrics/metrics1.csv           11.0
6   7  metrics/metrics2.csv            4.0
7   8  analyze/dir_list.txt            4.0
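The coercion step in one runnable piece, with an invented frame containing a non-numeric entry:

```python
import pandas as pd

df = pd.DataFrame({'filename': ['a.txt', 'a.txt', 'b.csv'],
                   '#Line_Changed': ['16', 'bin', '4']})

# 'bin' cannot be parsed, so errors='coerce' turns it into NaN,
# and the subsequent sum skips it
df['#Line_Changed'] = pd.to_numeric(df['#Line_Changed'], errors='coerce')
out = df.groupby('filename', as_index=False)['#Line_Changed'].sum()
print(out)
```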