Groupby & Sum - Create new column with added If Condition
We can use Series.where to replace the values that don't match the condition with NaN, then groupby transform 'sum', since NaN values are ignored by 'sum' by default:
df['Overspend Total'] = (
df['Variance'].where(df['Variance'] > 0).groupby(df['ID']).transform('sum')
)
Or explicitly replace with the additive identity (0) which will not affect the sum:
df['Overspend Total'] = (
df['Variance'].where(df['Variance'] > 0, 0)
.groupby(df['ID']).transform('sum')
)
Or with a lambda inside groupby transform:
df['Overspend Total'] = df.groupby('ID')['Variance'].transform(
lambda s: s[s > 0].sum()
)
In any case, df is:
ID Start End Variance Overspend Total
0 1 100000.00 120000.00 20000.00 20000.0
1 1 1.00 0.00 -1.00 20000.0
2 1 7815.58 7815.58 0.00 20000.0
3 1 5261.00 5261.00 0.00 20000.0
4 1 138783.20 89969.37 -48813.83 20000.0
5 1 2459.92 2459.92 0.00 20000.0
6 2 101421.99 93387.45 -8034.54 3000.0
7 2 940.04 940.04 0.00 3000.0
8 2 63.06 63.06 0.00 3000.0
9 2 2454.86 2454.86 0.00 3000.0
10 2 830.00 830.00 0.00 3000.0
11 2 299.00 299.00 0.00 3000.0
12 2 14000.00 12000.00 2000.00 3000.0
13 2 1500.00 500.00 1000.00 3000.0
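As a runnable sketch of the first approach (the DataFrame here is a small hypothetical subset, not the full data above):

```python
import pandas as pd

# Hypothetical subset of the data above; column names match the answer.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Variance': [20000.0, -1.0, 0.0, 2000.0, 1000.0],
})

# Keep only positive variances (others become NaN), then broadcast
# the per-ID sum back onto every row of the group.
df['Overspend Total'] = (
    df['Variance'].where(df['Variance'] > 0).groupby(df['ID']).transform('sum')
)
print(df['Overspend Total'].tolist())  # [20000.0, 20000.0, 20000.0, 3000.0, 3000.0]
```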
Pandas create new column with count from groupby
That's not a new column, that's a new DataFrame:
In [11]: df.groupby(["item", "color"]).count()
Out[11]:
id
item color
car black 2
truck blue 1
red 2
To get the result you want, use reset_index:
In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
item color count
0 car black 2
1 truck blue 1
2 truck red 2
To get a "new column" you could use transform:
In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0 2
1 2
2 2
3 1
4 2
dtype: int64
I recommend reading the split-apply-combine section of the docs.
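A minimal sketch contrasting the two forms, on a hypothetical item/color frame:

```python
import pandas as pd

# Hypothetical frame matching the item/color example above.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'item': ['car', 'car', 'truck', 'truck', 'truck'],
    'color': ['black', 'black', 'blue', 'red', 'red'],
})

# reset_index(name=...) flattens the grouped count into a new DataFrame...
counts = df.groupby(['item', 'color'])['id'].count().reset_index(name='count')

# ...while transform broadcasts each group's count back onto the original rows.
df['count'] = df.groupby(['item', 'color'])['id'].transform('count')
```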
How to create a new column that increments within a subgroup of a group in Python?
You could use groupby + ngroup:
df['colC'] = df.groupby('colA').apply(lambda x: x.groupby('colB').ngroup()+1).droplevel(0)
Output:
colA colB colC
0 1 a 1
1 1 a 1
2 1 c 2
3 1 c 2
4 1 f 3
5 1 z 4
6 1 z 4
7 1 z 4
8 2 a 1
9 2 b 2
10 2 b 2
11 2 b 2
12 3 c 1
13 3 d 2
14 3 k 3
15 3 k 3
16 3 m 4
17 3 m 4
18 3 m 4
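A runnable sketch of the same idea on a smaller hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with two outer groups.
df = pd.DataFrame({'colA': [1, 1, 1, 2, 2],
                   'colB': ['a', 'a', 'c', 'b', 'b']})

# Within each colA group, ngroup() numbers the colB subgroups 0, 1, ...;
# +1 makes the numbering 1-based, and droplevel(0) drops the added colA
# index level so the result aligns with df's original index.
df['colC'] = (df.groupby('colA')
                .apply(lambda x: x.groupby('colB').ngroup() + 1)
                .droplevel(0))
```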
pandas - groupby a column, apply a function to create a new column - giving incorrect results
You can remove .values:
df['num_col1_SMA'] = things_groupby['num_col1'].apply(pandas_rolling)
df['num_col2_SMA'] = things_groupby['num_col2'].apply(pandas_rolling)
Or:
df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
.apply(pandas_rolling))
If you want to avoid groupby.apply, it is necessary to remove the first level of the resulting MultiIndex:
df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1','num_col2']]
.rolling(window=N)
.mean()
.droplevel(0))
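Assuming things_groupby is a groupby over some key column and N is the window size (both come from the question, so the names and data here are hypothetical), the last variant runs like this:

```python
import pandas as pd

# Hypothetical setup: 'thing' is the grouping key, N the rolling window.
df = pd.DataFrame({'thing': ['x', 'x', 'x', 'y', 'y', 'y'],
                   'num_col1': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   'num_col2': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]})
N = 2
things_groupby = df.groupby('thing')

# groupby.rolling yields a (group key, original index) MultiIndex;
# droplevel(0) restores the original index so the assignment aligns row-by-row.
df[['num_col1_SMA', 'num_col2_SMA']] = (things_groupby[['num_col1', 'num_col2']]
    .rolling(window=N)
    .mean()
    .droplevel(0))
```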
How to assign group by sum results to new columns in Pandas
We pivot here; I am using crosstab, then merge:
s = (pd.crosstab(df.SKU, df.Calendar.dt.year, df.Quantity, aggfunc='sum')
       .fillna(0)
       .add_prefix('Year_Quantity_')
       .reset_index())
df = df.merge(s, how='left')
Calendar SKU Quantity Year_Quantity_2017 Year_Quantity_2018
0 2017-10-01 1001 10 50.0 160.0
1 2017-10-01 1002 20 70.0 80.0
2 2017-10-01 1003 30 90.0 0.0
3 2017-11-01 1001 40 50.0 160.0
4 2017-11-01 1002 50 70.0 80.0
5 2017-11-01 1003 60 90.0 0.0
6 2018-11-01 1001 70 50.0 160.0
7 2018-11-01 1002 80 70.0 80.0
8 2018-03-01 1001 90 50.0 160.0
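The approach can be reproduced end-to-end on a tiny hypothetical frame (one SKU across two years):

```python
import pandas as pd

# Hypothetical subset: one SKU observed across two years.
df = pd.DataFrame({'Calendar': pd.to_datetime(['2017-10-01', '2017-11-01', '2018-11-01']),
                   'SKU': [1001, 1001, 1001],
                   'Quantity': [10, 40, 70]})

# Pivot Quantity into one column per calendar year...
s = (pd.crosstab(df.SKU, df.Calendar.dt.year, df.Quantity, aggfunc='sum')
       .fillna(0)
       .add_prefix('Year_Quantity_')
       .reset_index())

# ...then merge the yearly totals back onto every original row (joining on SKU).
df = df.merge(s, how='left')
```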
create a new column with pandas groupby division between two columns excluding the current row
Try with transform:
g = df.groupby('Group')
df['New'] = (g['Col_2'].transform('sum')-df.Col_2)/(g['Col_1'].transform('sum')-df.Col_1)
df
Out[339]:
Group Col_1 Col_2 New
0 A 100 55 0.286000
1 A 200 66 0.330000
2 A 300 77 0.403333
3 B 400 88 0.198000
4 B 500 99 0.220000
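The same leave-one-out idea as a self-contained sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with the column layout above.
df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B'],
                   'Col_1': [100, 200, 400, 500],
                   'Col_2': [55, 66, 88, 99]})

g = df.groupby('Group')
# Group total minus the row's own value = the sum over the *other* rows
# of the group, applied to numerator and denominator alike.
df['New'] = ((g['Col_2'].transform('sum') - df.Col_2)
             / (g['Col_1'].transform('sum') - df.Col_1))
```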
Pandas create new column base on groupby and apply lambda if statement
Use GroupBy.transform with a lambda function, then compare against the threshold and convert the True/False result to 1/0 integers:
from scipy import stats
s = df.groupby('A')['B'].transform(lambda x: np.abs(stats.zscore(x, nan_policy='omit')))
df['C'] = (s > 2).astype(int)
Or use numpy.where
:
df['C'] = np.where(s > 2, 1, 0)
The error in your solution arises because zscore is computed per group and returns an array:
from scipy import stats
df = df.groupby('A')['B'].apply(lambda x: 1 if np.abs(stats.zscore(x, nan_policy='omit')) > 2 else 0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This is the gotcha described in the pandas docs:
pandas follows the NumPy convention of raising an error when you try to convert something to a bool. This happens in an if-statement or when using the boolean operations: and, or, and not.
So use a vectorized comparison instead of if-else:
from scipy import stats
df = df.groupby('A')['B'].apply(lambda x: (np.abs(stats.zscore(x, nan_policy='omit')) > 2).astype(int))
print (df)
A
a [0, 0, 0]
b [0, 0, 0, 0]
Name: B, dtype: object
But then you would still need to convert the result back into a column; to avoid this problem, groupby.transform is used instead.
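A scipy-free sketch of the transform approach: the per-group z-score is computed by hand as (x - mean) / std(ddof=0), which is what stats.zscore does (the data and the 2-sigma threshold here are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'a' contains one clear outlier.
df = pd.DataFrame({'A': ['a'] * 6 + ['b'] * 3,
                   'B': [1.0, 1.0, 1.0, 1.0, 1.0, 100.0, 5.0, 6.0, 7.0]})

# Per-group z-score by hand; (x - mean) / std(ddof=0) matches stats.zscore.
s = df.groupby('A')['B'].transform(lambda x: np.abs((x - x.mean()) / x.std(ddof=0)))
df['C'] = (s > 2).astype(int)
```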
How to create new column in pandas based on result of groupby without needing to use join
You can use transform():
df["max_date"] = df.groupby("name")['date'].transform('max')
Output:
date name max_date
0 2020-01-01 Romulo 2020-03-01
1 2020-02-01 Romulo 2020-03-01
2 2020-03-01 Romulo 2020-03-01
3 2020-01-01 Daniel 2020-03-01
4 2020-02-01 Daniel 2020-03-01
5 2020-03-01 Daniel 2020-03-01
6 2020-01-01 Fernando 2020-03-01
7 2020-02-01 Fernando 2020-03-01
8 2020-03-01 Fernando 2020-03-01
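A small runnable sketch with hypothetical dates:

```python
import pandas as pd

# Hypothetical frame with two people and out-of-order dates.
df = pd.DataFrame({'date': pd.to_datetime(['2020-01-01', '2020-03-01', '2020-02-01']),
                   'name': ['Romulo', 'Romulo', 'Daniel']})

# transform('max') broadcasts each group's latest date to all of its rows,
# so no merge/join back onto df is needed.
df['max_date'] = df.groupby('name')['date'].transform('max')
```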