Pandas Add Column to Groupby Dataframe

Use transform to add a column from a groupby aggregation back to the original DataFrame. transform returns a Series whose index is aligned with the original DataFrame:

In [123]:
g = df.groupby('c')['type'].value_counts().reset_index(name='t')
g['size'] = df.groupby('c')['type'].transform('size')
g

Out[123]:
c type t size
0 1 m 1 3
1 1 n 1 3
2 1 o 1 3
3 2 m 2 4
4 2 n 2 4
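
Note that the snippet above assigns a Series indexed like df to g, so the values line up only when row i of g and row i of df happen to belong to the same group. A minimal self-contained sketch that keeps both counts on the original rows instead; the column names c and type come from the answer, while the data values are made up for illustration:

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer above.
df = pd.DataFrame({
    "c": [1, 1, 1, 2, 2, 2, 2],
    "type": ["m", "n", "o", "m", "m", "n", "n"],
})

# transform("size") returns one value per original row, indexed like df,
# so it can be assigned straight back as a new column.
df["size"] = df.groupby("c")["type"].transform("size")

# Count of each (c, type) pair, also broadcast back to every row.
df["t"] = df.groupby(["c", "type"])["type"].transform("size")

print(df)
```

Because both columns are computed with transform on df itself, no index coincidence between two different frames is needed.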

Adding column to pandas dataframe using group name in function when iterating through groupby

Use lambda function:

df['ycalc'] = df.groupby(['a','b'])['x'].transform(lambda x: func(x, p[x.name]))
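
A small runnable sketch of that one-liner may help. Inside transform, x is the group's Series and x.name is the group key, which is a tuple when grouping by two columns. Here func, p, and the data are hypothetical stand-ins, since the question's own definitions are not shown:

```python
import pandas as pd

# Hypothetical data using the column names from the one-liner above.
df = pd.DataFrame({
    "a": ["u", "u", "v", "v"],
    "b": [1, 1, 2, 2],
    "x": [1.0, 2.0, 3.0, 4.0],
})

# Hypothetical per-group parameters, keyed by the (a, b) group key.
p = {("u", 1): 10.0, ("v", 2): 100.0}


def func(x, param):
    # Hypothetical function: scale every value of the group by its parameter.
    return x * param


# x.name is the key of the current group, here a tuple (a, b).
df["ycalc"] = df.groupby(["a", "b"])["x"].transform(lambda x: func(x, p[x.name]))

print(df)
```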

Pandas DataFrame adding column after groupby

You're using groupby on the wrong columns.

Your question suggests that "country" and "account" are the same for all "sku". In this case you should use:

df.groupby(['sku', 'country', 'account'], as_index=False).quantity.sum()
Out []:
sku country account quantity
0 CB-BB-AMB12-CA usa hch 2
1 CB-BB-CLR12-CA usa hch 2
2 CHG-FOOD1COMP-CA usa hch 3
3 CHG-FOOD2COMP-CA usa hch 2
4 CHG-FOODCONT1-CA usa hch 2
5 CHG-FRY-12PT5-CA usa hch 4
6 CHG-FRY-9PT5-CA usa hch 1
7 Q7-QDH0-EBB5-CA usa hch 3

Note: I removed two lines from your example where there is no "sku" or "quantity". If these cases should be handled, just tell me in a comment.

Pandas create new column with count from groupby

That's not a new column, that's a new DataFrame:

In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2

To get the result you want, use reset_index:

In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
item color count
0 car black 2
1 truck blue 1
2 truck red 2

To get a "new column" you could use transform:

In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0 2
1 2
2 2
3 1
4 2
dtype: int64

I recommend reading the split-apply-combine section of the docs.
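
To make the difference concrete, here is a hedged, self-contained sketch. The rows are made up so that they reproduce the counts shown above; the question's actual data is not shown here:

```python
import pandas as pd

# Made-up rows chosen to reproduce the counts shown above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "item": ["car", "truck", "car", "truck", "truck"],
    "color": ["black", "red", "black", "blue", "red"],
})

# One row per (item, color) group: a new, smaller DataFrame.
summary = df.groupby(["item", "color"])["id"].count().reset_index(name="count")

# One value per original row: can be assigned as a new column of df.
df["count"] = df.groupby(["item", "color"])["id"].transform("count")

print(summary)
print(df)
```

Note that count skips missing values in the selected column, while size counts every row in the group.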

Make a new column based on group by conditionally in Python

Almost there. Change filter to transform and use a condition:

df['new_group'] = df.groupby("id")["group"] \
.transform(lambda x: 'two' if (x.nunique() == 2) else x)
print(df)

# Output:
id group new_group
0 x1 A two
1 x1 B two
2 x2 A A
3 x2 A A
4 x3 B B
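
As an alternative to the lambda, the same result can probably be reached without calling a Python function per group, by computing the number of unique values with transform and then using Series.where. This is a sketch of a different technique than the answer's, using the data from the output above:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["x1", "x1", "x2", "x2", "x3"],
    "group": ["A", "B", "A", "A", "B"],
})

# Number of distinct group values per id, broadcast to every row.
n_unique = df.groupby("id")["group"].transform("nunique")

# Keep the original value where the id does not have exactly two
# distinct groups; otherwise replace it with the label "two".
df["new_group"] = df["group"].where(n_unique != 2, "two")

print(df)
```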

Pandas add column to df after group_by and value_counts

Alternatively, join the normalized counts back on group and color:

counts = df.groupby('group')['color'].value_counts(normalize=True)
df = df.join(counts.rename('freq'), on=['group', 'color'])
    group  color      freq
0       A    red  0.400000
1       A    red  0.400000
2       A  green  0.400000
3       A   blue  0.200000
4       A  green  0.400000
5       B    red  0.750000
6       B    red  0.750000
7       B    red  0.750000
8       B  green  0.250000
9       C   blue  0.333333
10      C  green  0.333333
11      C    red  0.333333

Or calculate the normalized value counts manually with groupby transform, dividing the count of each group and color pair by the count of its group:

df['freq'] = (
df.groupby(['group', 'color'])['color'].transform('count') /
df.groupby('group')['group'].transform('count')
)
    group  color      freq
0       A    red  0.400000
1       A    red  0.400000
2       A  green  0.400000
3       A   blue  0.200000
4       A  green  0.400000
5       B    red  0.750000
6       B    red  0.750000
7       B    red  0.750000
8       B  green  0.250000
9       C   blue  0.333333
10      C  green  0.333333
11      C    red  0.333333
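
For anyone who wants to check that the two approaches agree, here is a self-contained sketch. The input frame is reconstructed from the output shown above, and the names joined and manual are only illustrative:

```python
import pandas as pd

# Input reconstructed from the output above.
df = pd.DataFrame({
    "group": list("AAAAABBBBCCC"),
    "color": ["red", "red", "green", "blue", "green",
              "red", "red", "red", "green",
              "blue", "green", "red"],
})

# Approach 1: normalized value counts, joined back on (group, color).
counts = df.groupby("group")["color"].value_counts(normalize=True)
joined = df.join(counts.rename("freq"), on=["group", "color"])

# Approach 2: per-pair count divided by per-group count, both via transform.
manual = (
    df.groupby(["group", "color"])["color"].transform("count")
    / df.groupby("group")["color"].transform("count")
)

# Largest absolute difference between the two results.
print((joined["freq"] - manual).abs().max())
```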

Pandas - Add Column Name to Results of groupby

Method 1:

Use the argument as_index=False in your groupby:

df2 = df.groupby(['timeIndex'], as_index=False)['isZero'].sum()

>>> df2
timeIndex isZero
0 1 1
1 2 0

>>> df2['isZero']
0 1
1 0
Name: isZero, dtype: int64

Method 2:

You can use to_frame with your desired column name and then reset_index:

df2 = df.groupby(['timeIndex'])['isZero'].sum().to_frame('isZero').reset_index()

>>> df2
timeIndex isZero
0 1 1
1 2 0

>>> df2['isZero']
0 1
1 0
Name: isZero, dtype: int64
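
A runnable comparison of both methods, plus a third spelling with reset_index(name=...), is sketched below. The input values are assumptions chosen to reproduce the sums shown above:

```python
import pandas as pd

# Hypothetical input that reproduces the sums shown above.
df = pd.DataFrame({"timeIndex": [1, 1, 2, 2], "isZero": [1, 0, 0, 0]})

# Method 1: keep the grouping key as a regular column.
m1 = df.groupby(["timeIndex"], as_index=False)["isZero"].sum()

# Method 2: name the aggregated Series, then move the key out of the index.
m2 = df.groupby(["timeIndex"])["isZero"].sum().to_frame("isZero").reset_index()

# Variant: Series.reset_index can name the value column directly.
m3 = df.groupby(["timeIndex"])["isZero"].sum().reset_index(name="isZero")

print(m1)
```

All three should give a frame with the columns timeIndex and isZero, so the choice is mostly a matter of taste.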

Add column with previous values by group

Use shift, which moves the values of a column down by one row:

df2['PreviousValues'] = df2['FN'].shift()

Output:

Date FN AuM PreviousValues
0 01012021 A 10 NaN
1 01012021 B 20 A
2 02012021 A 12 B
3 02012021 B 23 A
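
The snippet above shifts the whole FN column without any grouping. If what is wanted is the previous value within each group, one option is to combine groupby with shift. The sketch below assumes that FN is the grouping column and AuM is the value to carry forward, and the column name PreviousAuM is made up; none of this is confirmed by the question:

```python
import pandas as pd

df2 = pd.DataFrame({
    "Date": ["01012021", "01012021", "02012021", "02012021"],
    "FN": ["A", "B", "A", "B"],
    "AuM": [10, 20, 12, 23],
})

# Assumption: "previous value by group" means the previous AuM of the same FN.
# The first row of each group has no predecessor and becomes NaN.
df2["PreviousAuM"] = df2.groupby("FN")["AuM"].shift()

print(df2)
```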

