How to Create a New Column from the Output of Pandas Groupby().Sum()

Groupby & Sum - Create new column with added If Condition

We can use Series.where to replace the values that don't match the condition with NaN, then groupby-transform with 'sum', since NaN values are skipped by 'sum' by default:

df['Overspend Total'] = (
    df['Variance'].where(df['Variance'] > 0).groupby(df['ID']).transform('sum')
)

Or explicitly replace with the additive identity (0) which will not affect the sum:

df['Overspend Total'] = (
    df['Variance'].where(df['Variance'] > 0, 0)
    .groupby(df['ID']).transform('sum')
)

Or with a lambda inside groupby transform:

df['Overspend Total'] = df.groupby('ID')['Variance'].transform(
    lambda s: s[s > 0].sum()
)

In any case df is:

    ID      Start        End   Variance  Overspend Total
0    1  100000.00  120000.00   20000.00          20000.0
1    1       1.00       0.00      -1.00          20000.0
2    1    7815.58    7815.58       0.00          20000.0
3    1    5261.00    5261.00       0.00          20000.0
4    1  138783.20   89969.37  -48813.83          20000.0
5    1    2459.92    2459.92       0.00          20000.0
6    2  101421.99   93387.45   -8034.54           3000.0
7    2     940.04     940.04       0.00           3000.0
8    2      63.06      63.06       0.00           3000.0
9    2    2454.86    2454.86       0.00           3000.0
10   2     830.00     830.00       0.00           3000.0
11   2     299.00     299.00       0.00           3000.0
12   2   14000.00   12000.00    2000.00           3000.0
13   2    1500.00     500.00    1000.00           3000.0
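Putting it together, a runnable sketch of the first approach, using the ID and Variance values reconstructed from the table above (Start/End are omitted since only Variance feeds the calculation):

```python
import pandas as pd

# ID and Variance taken from the example table above; Start/End omitted
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    'Variance': [20000.0, -1.0, 0.0, 0.0, -48813.83, 0.0,
                 -8034.54, 0.0, 0.0, 0.0, 0.0, 0.0, 2000.0, 1000.0],
})

# Mask non-positive variances to NaN, then broadcast each ID's sum to its rows
df['Overspend Total'] = (
    df['Variance'].where(df['Variance'] > 0).groupby(df['ID']).transform('sum')
)
```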

Pandas create new column with count from groupby

That's not a new column, that's a new DataFrame:

In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2

To get the result you want, use reset_index:

In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2

To get a "new column" you could use transform:

In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0    2
1    2
2    2
3    1
4    2
dtype: int64

I recommend reading the split-apply-combine section of the docs.
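As a runnable sketch, the sample data here is hypothetical but consistent with the outputs above:

```python
import pandas as pd

# Hypothetical rows consistent with the counts shown above
df = pd.DataFrame({
    'item': ['car', 'car', 'truck', 'truck', 'truck'],
    'color': ['black', 'black', 'red', 'blue', 'red'],
    'id': [1, 2, 3, 4, 5],
})

# Aggregated counts as their own DataFrame
counts = df.groupby(['item', 'color'])['id'].count().reset_index(name='count')

# The same counts broadcast back onto the original rows as a new column
df['count'] = df.groupby(['item', 'color'])['id'].transform('count')
```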

How to create a new column that increments within a subgroup of a group in Python?

You could use groupby + ngroup:

df['colC'] = (
    df.groupby('colA')
      .apply(lambda x: x.groupby('colB').ngroup() + 1)
      .droplevel(0)
)

Output:

    colA colB  colC
0      1    a     1
1      1    a     1
2      1    c     2
3      1    c     2
4      1    f     3
5      1    z     4
6      1    z     4
7      1    z     4
8      2    a     1
9      2    b     2
10     2    b     2
11     2    b     2
12     3    c     1
13     3    d     2
14     3    k     3
15     3    k     3
16     3    m     4
17     3    m     4
18     3    m     4
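A runnable sketch with the data from the table above: ngroup numbers the distinct colB values within each colA group (in sorted order, starting at 0, hence the +1), and droplevel(0) strips the outer colA index level so the result aligns with df:

```python
import pandas as pd

# colA/colB values taken from the example table above
df = pd.DataFrame({
    'colA': [1] * 8 + [2] * 4 + [3] * 7,
    'colB': ['a', 'a', 'c', 'c', 'f', 'z', 'z', 'z',
             'a', 'b', 'b', 'b',
             'c', 'd', 'k', 'k', 'm', 'm', 'm'],
})

# Within each colA group, label each distinct colB value 1, 2, 3, ...
df['colC'] = (
    df.groupby('colA')
      .apply(lambda x: x.groupby('colB').ngroup() + 1)
      .droplevel(0)
)
```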

pandas - groupby a column, apply a function to create a new column - giving incorrect results

You can remove the conversion to values and assign each result directly, relying on index alignment:

df['num_col1_SMA'] = things_groupby['num_col1'].apply(pandas_rolling)
df['num_col2_SMA'] = things_groupby['num_col2'].apply(pandas_rolling)

Or:

df[['num_col1_SMA', 'num_col2_SMA']] = (
    things_groupby[['num_col1', 'num_col2']].apply(pandas_rolling)
)

If you want to avoid groupby.apply, you need to remove the first level of the MultiIndex created by groupby.rolling:

df[['num_col1_SMA', 'num_col2_SMA']] = (
    things_groupby[['num_col1', 'num_col2']]
    .rolling(window=N)
    .mean()
    .droplevel(0)
)
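A runnable sketch of the rolling-mean variant under assumed names: things_groupby, N, and the column data are hypothetical stand-ins for the question's setup.

```python
import pandas as pd

# Hypothetical stand-ins: group key 'thing', window size N, sample columns
N = 2
df = pd.DataFrame({
    'thing': ['x', 'x', 'x', 'y', 'y'],
    'num_col1': [1.0, 2.0, 3.0, 10.0, 20.0],
    'num_col2': [4.0, 5.0, 6.0, 40.0, 50.0],
})
things_groupby = df.groupby('thing')

# groupby.rolling produces a MultiIndex (group key, original index);
# dropping the group level lets the result align with df on assignment
df[['num_col1_SMA', 'num_col2_SMA']] = (
    things_groupby[['num_col1', 'num_col2']]
    .rolling(window=N)
    .mean()
    .droplevel(0)
)
```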

How to assign group by sum results to new columns in Pandas

We can pivot here; I am using crosstab and then merge:

s = (pd.crosstab(df.SKU, df.Calendar.dt.year, df.Quantity, aggfunc='sum')
       .fillna(0)
       .add_prefix('Year_Quantity_')
       .reset_index())
df = df.merge(s, how='left')

    Calendar   SKU  Quantity  Year_Quantity_2017  Year_Quantity_2018
0 2017-10-01  1001        10                50.0               160.0
1 2017-10-01  1002        20                70.0                80.0
2 2017-10-01  1003        30                90.0                 0.0
3 2017-11-01  1001        40                50.0               160.0
4 2017-11-01  1002        50                70.0                80.0
5 2017-11-01  1003        60                90.0                 0.0
6 2018-11-01  1001        70                50.0               160.0
7 2018-11-01  1002        80                70.0                80.0
8 2018-03-01  1001        90                50.0               160.0
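A runnable sketch using the data reconstructed from the output above: crosstab pivots yearly Quantity sums per SKU, and merge broadcasts them back onto every row.

```python
import pandas as pd

# Calendar/SKU/Quantity values taken from the output table above
df = pd.DataFrame({
    'Calendar': pd.to_datetime(
        ['2017-10-01', '2017-10-01', '2017-10-01',
         '2017-11-01', '2017-11-01', '2017-11-01',
         '2018-11-01', '2018-11-01', '2018-03-01']),
    'SKU': [1001, 1002, 1003, 1001, 1002, 1003, 1001, 1002, 1001],
    'Quantity': [10, 20, 30, 40, 50, 60, 70, 80, 90],
})

# Pivot yearly Quantity sums per SKU, then merge back onto every row
s = (pd.crosstab(df.SKU, df.Calendar.dt.year, df.Quantity, aggfunc='sum')
       .fillna(0)
       .add_prefix('Year_Quantity_')
       .reset_index())
df = df.merge(s, how='left')
```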

create a new column with pandas groupby division between two columns excluding the current row

Try with transform:

g = df.groupby('Group')
df['New'] = (g['Col_2'].transform('sum')-df.Col_2)/(g['Col_1'].transform('sum')-df.Col_1)
df
Out[339]:
  Group  Col_1  Col_2       New
0     A    100     55  0.286000
1     A    200     66  0.330000
2     A    300     77  0.403333
3     B    400     88  0.198000
4     B    500     99  0.220000
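A runnable sketch with the data from the output above: each row's New value is the group's Col_2 sum excluding that row, divided by the group's Col_1 sum excluding that row.

```python
import pandas as pd

# Group/Col_1/Col_2 values taken from the output table above
df = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B'],
    'Col_1': [100, 200, 300, 400, 500],
    'Col_2': [55, 66, 77, 88, 99],
})

# Subtract the current row from each group total before dividing
g = df.groupby('Group')
df['New'] = (
    (g['Col_2'].transform('sum') - df.Col_2)
    / (g['Col_1'].transform('sum') - df.Col_1)
)
```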

Pandas create new column based on groupby and apply lambda if statement

Use GroupBy.transform with a lambda function, then compare, and convert the True/False results to 1/0 by casting to integers:

import numpy as np
from scipy import stats

s = df.groupby('A')['B'].transform(lambda x: np.abs(stats.zscore(x, nan_policy='omit')))
df['C'] = (s > 2).astype(int)

Or use numpy.where:

df['C'] = np.where(s > 2, 1, 0)

The error in your solution occurs because the function is applied per group, so the lambda receives a whole array and the if condition is ambiguous:

import numpy as np
from scipy import stats

df = df.groupby('A')['B'].apply(lambda x: 1 if np.abs(stats.zscore(x, nan_policy='omit')) > 2 else 0)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This gotcha is covered in the pandas docs:

pandas follows the NumPy convention of raising an error when you try to convert something to a bool. This happens in an if-statement or when using the boolean operations: and, or, and not.

So if you use one of the suggested solutions instead of the if-else:

import numpy as np
from scipy import stats

df = df.groupby('A')['B'].apply(lambda x: (np.abs(stats.zscore(x, nan_policy='omit')) > 2).astype(int))

print(df)
A
a       [0, 0, 0]
b    [0, 0, 0, 0]
Name: B, dtype: object

but then you would need to convert the result back into a column; to avoid this problem, groupby.transform is used.
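The same transform idea can be sketched without scipy by computing the z-score manually (ddof=0 matches scipy.stats.zscore's population standard deviation; the data here is hypothetical, with one clear outlier in group 'b'):

```python
import pandas as pd

# Hypothetical data: group 'b' contains one clear outlier (100)
df = pd.DataFrame({
    'A': ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
    'B': [1.0, 2.0, 3.0, 10.0, 10.0, 10.0, 10.0, 10.0, 100.0],
})

# Per-group absolute z-score (ddof=0 as in scipy.stats.zscore), flag |z| > 2
z = df.groupby('A')['B'].transform(lambda x: (x - x.mean()).abs() / x.std(ddof=0))
df['C'] = (z > 2).astype(int)
```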

How to create new column in pandas based on result of groupby without needing to use join

You can use transform():

df["max_date"] = df.groupby("name")['date'].transform('max')

Output:

        date      name   max_date
0 2020-01-01    Romulo 2020-03-01
1 2020-02-01    Romulo 2020-03-01
2 2020-03-01    Romulo 2020-03-01
3 2020-01-01    Daniel 2020-03-01
4 2020-02-01    Daniel 2020-03-01
5 2020-03-01    Daniel 2020-03-01
6 2020-01-01  Fernando 2020-03-01
7 2020-02-01  Fernando 2020-03-01
8 2020-03-01  Fernando 2020-03-01
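A runnable sketch with the data from the output above:

```python
import pandas as pd

# date/name values taken from the output table above
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01'] * 3),
    'name': ['Romulo'] * 3 + ['Daniel'] * 3 + ['Fernando'] * 3,
})

# Broadcast each name's latest date to all of its rows -- no merge/join needed
df['max_date'] = df.groupby('name')['date'].transform('max')
```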

