pandas add column to groupby dataframe
Use transform to add a column from a groupby aggregation back to the original df. transform returns a Series whose index is aligned with the original df:
In [123]:
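g = df.groupby('c')['type'].value_counts().reset_index(name='t')
# Sum the per-type counts within each 'c' group. Running transform on g
# (rather than df) keeps the result aligned with g's own index.
g['size'] = g.groupby('c')['t'].transform('sum')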
g
Out[123]:
c type t size
0 1 m 1 3
1 1 n 1 3
2 1 o 1 3
3 2 m 2 4
4 2 n 2 4
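As a minimal sketch of the index alignment described above, assuming a small hypothetical frame with columns c and type (the data is made up for illustration and is not the asker's):

```python
import pandas as pd

# Hypothetical data: 3 rows in group 1, 4 rows in group 2.
df = pd.DataFrame({
    "c": [1, 1, 1, 2, 2, 2, 2],
    "type": ["m", "n", "o", "m", "m", "n", "n"],
})

# transform returns one value per original row, indexed like df,
# so the result can be assigned straight back as a new column.
df["size"] = df.groupby("c")["type"].transform("size")
print(df)
```

Every row of a group receives that group's size, which is what distinguishes transform from a plain aggregation that returns one row per group.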
Adding column to pandas dataframe using group name in function when iterating through groupby
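Use a lambda function. Inside transform, x.name holds the key of the current group (a tuple of the a and b values here), so it can be used to look up per-group parameters: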
df['ycalc'] = df.groupby(['a','b'])['x'].transform(lambda x: func(x, p[x.name]))
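A runnable sketch of that one-liner follows. The names func and p come from the question, but their definitions here are hypothetical stand-ins: func simply scales the values, and p is a dict keyed by the (a, b) group name.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["u", "u", "v", "v"],
    "b": ["k", "k", "m", "m"],
    "x": [1.0, 2.0, 3.0, 4.0],
})

# Hypothetical per-group parameters, keyed by the (a, b) group name.
p = {("u", "k"): 10.0, ("v", "m"): 100.0}

def func(x, param):
    # Hypothetical function: scale each value by the group's parameter.
    return x * param

# x.name is the key of the group currently being transformed.
df["ycalc"] = df.groupby(["a", "b"])["x"].transform(lambda x: func(x, p[x.name]))
print(df)
```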
Pandas DataFrame adding column after groupby
You're calling groupby on the wrong columns.
Your question suggests that "country" and "account" are the same for every row of a given "sku". In that case, group on all three columns:
df.groupby(['sku', 'country', 'account'], as_index=False).quantity.sum()
Out []:
sku country account quantity
0 CB-BB-AMB12-CA usa hch 2
1 CB-BB-CLR12-CA usa hch 2
2 CHG-FOOD1COMP-CA usa hch 3
3 CHG-FOOD2COMP-CA usa hch 2
4 CHG-FOODCONT1-CA usa hch 2
5 CHG-FRY-12PT5-CA usa hch 4
6 CHG-FRY-9PT5-CA usa hch 1
7 Q7-QDH0-EBB5-CA usa hch 3
Note: I removed two lines from your example that had neither a "sku" nor a "quantity". If these cases should be handled, just tell me in a comment.
Pandas create new column with count from groupby
That's not a new column, that's a new DataFrame:
In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2
To get the result you want, use reset_index:
In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
item color count
0 car black 2
1 truck blue 1
2 truck red 2
To get a "new column" you could use transform:
In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0 2
1 2
2 2
3 1
4 2
dtype: int64
I recommend reading the split-apply-combine section of the docs.
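To actually attach the count as a column, the transform result can be assigned back to the frame. A minimal sketch, assuming a hypothetical frame whose id, item and color values are chosen to match the outputs shown above:

```python
import pandas as pd

# Hypothetical data consistent with the outputs above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "item": ["car", "truck", "car", "truck", "truck"],
    "color": ["black", "red", "black", "blue", "red"],
})

# One count per original row, so it fits as a new column on df.
df["count"] = df.groupby(["item", "color"])["id"].transform("count")
print(df)
```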
Make a new column based on group by conditionally in Python
Almost there. Change filter to transform and use a condition:
df['new_group'] = df.groupby("id")["group"] \
.transform(lambda x: 'two' if (x.nunique() == 2) else x)
print(df)
# Output:
id group new_group
0 x1 A two
1 x1 B two
2 x2 A A
3 x2 A A
4 x3 B B
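A possible alternative that avoids the Python-level lambda is to compute the number of unique values per group once and then overwrite the matching rows with Series.mask. This is a different technique from the answer above, sketched with data reconstructed from the printed output:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["x1", "x1", "x2", "x2", "x3"],
    "group": ["A", "B", "A", "A", "B"],
})

# Number of distinct 'group' values per id, broadcast to every row.
n_unique = df.groupby("id")["group"].transform("nunique")

# Keep the original value unless the id has exactly two distinct groups.
df["new_group"] = df["group"].mask(n_unique == 2, "two")
print(df)
```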
Pandas add column to df after group_by and value_counts
Alternatively, join the counts on group and color:
counts = df.groupby('group')['color'].value_counts(normalize=True)
df = df.join(counts.rename('freq'), on=['group', 'color'])
group color freq
0 A red 0.400000
1 A red 0.400000
2 A green 0.400000
3 A blue 0.200000
4 A green 0.400000
5 B red 0.750000
6 B red 0.750000
7 B red 0.750000
8 B green 0.250000
9 C blue 0.333333
10 C green 0.333333
11 C red 0.333333
Or calculate the normalized value counts manually by dividing the group + color counts by the group counts, both obtained via groupby transform:
df['freq'] = (
    df.groupby(['group', 'color'])['color'].transform('count') /
    df.groupby('group')['group'].transform('count')
)
group color freq
0 A red 0.400000
1 A red 0.400000
2 A green 0.400000
3 A blue 0.200000
4 A green 0.400000
5 B red 0.750000
6 B red 0.750000
7 B red 0.750000
8 B green 0.250000
9 C blue 0.333333
10 C green 0.333333
11 C red 0.333333
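As a quick check that the join approach and the manual transform approach agree, here is a self-contained sketch. The frame is reconstructed from the printed output; the variable names joined and manual are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "group": list("AAAAABBBBCCC"),
    "color": ["red", "red", "green", "blue", "green",
              "red", "red", "red", "green",
              "blue", "green", "red"],
})

# Approach 1: normalized value counts per group, joined back on both keys.
counts = df.groupby("group")["color"].value_counts(normalize=True)
joined = df.join(counts.rename("freq"), on=["group", "color"])

# Approach 2: (group, color) row counts divided by group row counts.
manual = (
    df.groupby(["group", "color"])["color"].transform("count")
    / df.groupby("group")["group"].transform("count")
)

# Largest absolute difference between the two results.
print((joined["freq"] - manual).abs().max())
```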
Pandas - Add Column Name to Results of groupby
Method 1:
Pass the argument as_index=False to your groupby:
df2 = df.groupby(['timeIndex'], as_index=False)['isZero'].sum()
>>> df2
timeIndex isZero
0 1 1
1 2 0
>>> df2['isZero']
0 1
1 0
Name: isZero, dtype: int64
Method 2:
You can use to_frame with your desired column name and then reset_index:
df2 = df.groupby(['timeIndex'])['isZero'].sum().to_frame('isZero').reset_index()
>>> df2
timeIndex isZero
0 1 1
1 2 0
>>> df2['isZero']
0 1
1 0
Name: isZero, dtype: int64
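Both methods should produce the same frame. A small sketch to compare them side by side, assuming hypothetical input data chosen to reproduce the sums shown above (m1 and m2 are names introduced here):

```python
import pandas as pd

# Hypothetical input: timeIndex 1 sums to 1, timeIndex 2 sums to 0.
df = pd.DataFrame({
    "timeIndex": [1, 1, 2, 2],
    "isZero": [0, 1, 0, 0],
})

# Method 1: keep the grouping key as a regular column.
m1 = df.groupby(["timeIndex"], as_index=False)["isZero"].sum()

# Method 2: aggregate to a Series, name it, then move the key out of the index.
m2 = df.groupby(["timeIndex"])["isZero"].sum().to_frame("isZero").reset_index()

print(m1)
print(m2)
```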
Add column with previous values by group
Use shift. Note that this shifts the whole FN column by one row and does not group:
df2['PreviousValues'] = df2['FN'].shift()
output:
Date FN AuM PreviousValues
0 01012021 A 10 NaN
1 01012021 B 20 A
2 02012021 A 12 B
3 02012021 B 23 A
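The snippet above shifts a column without grouping. If the goal is the previous value within each group, as the title suggests, one option is to call shift on a groupby. This is a sketch under the assumption that the wanted value is the previous AuM for the same FN; the column name PreviousAuM is made up for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Date": ["01012021", "01012021", "02012021", "02012021"],
    "FN": ["A", "B", "A", "B"],
    "AuM": [10, 20, 12, 23],
})

# Shift AuM by one row within each FN group; the first row of
# every group has no predecessor and becomes NaN.
df2["PreviousAuM"] = df2.groupby("FN")["AuM"].shift()
print(df2)
```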