Add a New Column of the Sum by Group

Groupby & Sum - Create new column with added If Condition

We can use Series.where to replace the values that don't match the condition with NaN, then just groupby transform 'sum' since NaN values are ignored by 'sum' by default:

df['Overspend Total'] = (
df['Variance'].where(df['Variance'] > 0).groupby(df['ID']).transform('sum')
)

Or explicitly replace with the additive identity (0) which will not affect the sum:

df['Overspend Total'] = (
df['Variance'].where(df['Variance'] > 0, 0)
.groupby(df['ID']).transform('sum')
)

Or with a lambda inside groupby transform:

df['Overspend Total'] = df.groupby('ID')['Variance'].transform(
lambda s: s[s > 0].sum()
)

In any case df is:

    ID      Start        End  Variance  Overspend Total
0 1 100000.00 120000.00 20000.00 20000.0
1 1 1.00 0.00 -1.00 20000.0
2 1 7815.58 7815.58 0.00 20000.0
3 1 5261.00 5261.00 0.00 20000.0
4 1 138783.20 89969.37 -48813.83 20000.0
5 1 2459.92 2459.92 0.00 20000.0
6 2 101421.99 93387.45 -8034.54 3000.0
7 2 940.04 940.04 0.00 3000.0
8 2 63.06 63.06 0.00 3000.0
9 2 2454.86 2454.86 0.00 3000.0
10 2 830.00 830.00 0.00 3000.0
11 2 299.00 299.00 0.00 3000.0
12 2 14000.00 12000.00 2000.00 3000.0
13 2 1500.00 500.00 1000.00 3000.0

Power Query Sum of column by group as new column

Here are two other options. The examples assume your source table is named Table1. Here's how mine looks at its source in Excel:

Sample Image

Note it does not have a Group Sum column. The query will derive that.

Option 1.

Click Add Column then Custom Column and fill out the screen like this and click OK:

Sample Image

You should see a table like this:

Sample Image

Then just click the table in the first row of the Custom column and you should get a table that looks like this:

Sample Image

Then you can merge this new table with the original source table (Table1). Click Home > Merge Queries and fill out the information for the merge like this and click OK. (Note that the same query "Table1" is being merged to itself at this point, and only the Group column is selected for each entry.)

Sample Image

You should see a table like this:

Sample Image

Then, in the formula bar above that table, where you see = Table.NestedJoin(Custom, {"Group"}, Custom, {"Group"}, "Custom", JoinKind.LeftOuter), change the first instance of Custom to Source, so the line reads = Table.NestedJoin(Source, {"Group"}, Custom, {"Group"}, "Custom", JoinKind.LeftOuter) instead.

That is, change it from:

Sample Image

To:

Sample Image

Then expand the new Custom column by clicking the Sample Image button, only selecting the Group Sum column, clearing the checkbox beside "Use original column name as prefix," and clicking OK:

Sample Image

You should get this result:

Sample Image

Here's the M code:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Group", type text}, {"Gender", type text}, {"Number", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Table.Group(Source, {"Group"}, {{"Group Sum", each List.Sum([Number]), type nullable number}})),
Custom = #"Added Custom"{0}[Custom],
#"Merged Queries" = Table.NestedJoin(Source, {"Group"}, Custom, {"Group"}, "Custom", JoinKind.LeftOuter),
#"Expanded Custom" = Table.ExpandTableColumn(#"Merged Queries", "Custom", {"Group Sum"}, {"Group Sum"})
in
#"Expanded Custom"

(You can replace Table1, Source and Changed Type with Tablelle1, Quelle, and #"Geänderter Typ", respectively throughout the code above to align with Max's language.)

Option 2.

Click Transform then Group By and fill out the screen like this and click OK:

Sample Image

Then expand the AllData column with only the Gender and Number columns selected like this:

Sample Image

The result:

Sample Image

Here's the M code:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Group"}, {{"AllData", each _, type table [Group=text, Gender=text, Number=number]}, {"Group Sum", each List.Sum([Number]), type number}}),
#"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Gender", "Number"}, {"Gender", "Number"})
in
#"Expanded AllData"

Add a new column of the sum by group

I agree with @mnel at least on his first point. I didn't see ave demonstrated in the answers he cited and I think it's the "simplest" base-R method. Using that data.frame(cbind( ...)) construction should be outlawed and teachers who demonstrate it should be stripped of their credentials.

set.seed(123)
df<-data.frame(y=sample( c("A","B","C"), 10, T),
X=sample(c (1,2,3), 10, T))
df<-df[order(df$y),] # that step is not necessary for success.
df

df$sum <- ave(df$X, df$y, FUN=sum)
df
y X sum
1 A 3 6
6 A 3 6
3 B 3 8
7 B 1 8
9 B 1 8
10 B 3 8
2 C 2 6
4 C 2 6
5 C 1 6
8 C 1 6

How to add a new column that summarize rows

I think this will work:

Select Year(Modifieddate) AS Year,
coalesce(convert(varchar(255), month(modifieddate)), 'Grand Total') as Month,
Sum(linetotal) as Sum_price,
sum(sum(linetotal)) over (partition by Year(Modifieddate)
order by coalesce(month(modifieddate), 100)
) as ytd_sum_price
from Sales.SalesOrderDetail
Group by rollup( Year(Modifieddate), month(modifieddate))

The coalesce() in the order by is to put the summary row last for the cumulative sum.

Sum multiple variables by group and create new column with their sum

You can use mutate after summarize:

data %>% 
group_by(group) %>%
summarise_all(sum) %>%
mutate(tt1 = n1 + n2)

# A tibble: 3 x 4
# group n1 n2 tt1
# <fctr> <int> <int> <int>
#1 a 3 5 8
#2 b 3 4 7
#3 c 9 11 20

If need to sum all numeric columns, you can use rowSums with select_if (to select numeric columns) to sum columns up:

data %>% 
group_by(group) %>%
summarise_all(sum) %>%
mutate(tt1 = rowSums(select_if(., is.numeric)))

# A tibble: 3 x 4
# group n1 n2 tt1
# <fctr> <int> <int> <dbl>
#1 a 3 5 8
#2 b 3 4 7
#3 c 9 11 20

How to assign group by sum results to new columns in Pandas

We do pivot here I am using crosstab then merge

s=pd.crosstab(df.SKU,df.Calendar.dt.year,df.Quantity,aggfunc='sum').fillna(0).add_prefix('Year_Quantity_').reset_index()
df=df.merge(s,how='left')
Calendar SKU Quantity Year_Quantity_2017 Year_Quantity_2018
0 2017-10-01 1001 10 50.0 160.0
1 2017-10-01 1002 20 70.0 80.0
2 2017-10-01 1003 30 90.0 0.0
3 2017-11-01 1001 40 50.0 160.0
4 2017-11-01 1002 50 70.0 80.0
5 2017-11-01 1003 60 90.0 0.0
6 2018-11-01 1001 70 50.0 160.0
7 2018-11-01 1002 80 70.0 80.0
8 2018-03-01 1001 90 50.0 160.0

Groupby multiple columns & Sum - Create new column with added If Condition

Cause of error

  • The syntax to select multiple columns df['column1', 'column2'] is wrong. This should be df[['column1', 'column2']]
  • Even if you use df[['column1', 'column2']] for groupby, pandas will raise another error complaining that the grouper should be one dimensional. This is because df[['column1', 'column2']] returns a dataframe which is a two dimensional object.

How to fix the error?

Hard way:

Pass each of the grouping columns as one dimensional series to groupby

df['new_column'] = (
df['value']
.where(df['value'] > 0)
.groupby([df['column1'], df['column2']]) # Notice the change
.transform('sum')
)
Easy way:

First assign the masked column values to the target column, then do groupby + transform as you would normally do

df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')

Group by Sum as new column name

I think you need remove parameter as_index=False and use Series.reset_index, because this parameter return df and then DataFrame.reset_index with parameter name failed:

df = df.groupby('Id', sort=False)["Amount"].sum().reset_index(name ='Total Amount')

Or rename column first:

d = {'Amount':'Total Amount'}
df = df.rename(columns=d).groupby('Id', sort=False, as_index=False)["Total Amount"].sum()

Sample:

df = pd.DataFrame({'Id':[1,2,2],'Amount':[10, 30,50]})
print (df)
Amount Id
0 10 1
1 30 2
2 50 2

df1 = df.groupby('Id', sort=False)["Amount"].sum().reset_index(name ='Total Amount')
print (df1)
Id Total Amount
0 1 10
1 2 80

d = {'Amount':'Total Amount'}
df1 = df.rename(columns=d).groupby('Id', sort=False, as_index=False)["Total Amount"].sum()
print (df1)
Id Total Amount
0 1 10
1 2 80

But if need new column with sum in original df use transform and assign output to new column:

df['Total Amount'] = df.groupby('Id', sort=False)["Amount"].transform('sum')
print (df)
Amount Id Total Amount
0 10 1 10
1 30 2 80
2 50 2 80


Related Topics



Leave a reply



Submit