Pandas Groupby Mean - into a Dataframe

Pandas groupby mean - into a dataframe?

If you call .reset_index() on the series that you have, it will get you a dataframe like you want (each level of the index will be converted into a column):

df.groupby(['name', 'id', 'dept'])['total_sale'].mean().reset_index()

EDIT: to respond to the OP's comment, adding this column back to your original dataframe is a little trickier. You don't have the same number of rows as in the original dataframe, so you can't assign it as a new column yet. However, if you set the index the same, pandas is smart and will fill in the values properly for you. Try this:

import pandas as pd

cols = ['date', 'name', 'id', 'dept', 'sale1', 'sale2', 'sale3', 'total_sale']
data = [
    ['1/1/17', 'John', 50, 'Sales', 50.0, 60.0, 70.0, 180.0],
    ['1/1/17', 'Mike', 21, 'Engg', 43.0, 55.0, 2.0, 100.0],
    ['1/1/17', 'Jane', 99, 'Tech', 90.0, 80.0, 70.0, 240.0],
    ['1/2/17', 'John', 50, 'Sales', 60.0, 70.0, 80.0, 210.0],
    ['1/2/17', 'Mike', 21, 'Engg', 53.0, 65.0, 12.0, 130.0],
    ['1/2/17', 'Jane', 99, 'Tech', 100.0, 90.0, 80.0, 270.0],
    ['1/3/17', 'John', 50, 'Sales', 40.0, 50.0, 60.0, 150.0],
    ['1/3/17', 'Mike', 21, 'Engg', 53.0, 55.0, 12.0, 120.0],
    ['1/3/17', 'Jane', 99, 'Tech', 80.0, 70.0, 60.0, 210.0],
]
df = pd.DataFrame(data, columns=cols)

mean_col = df.groupby(['name', 'id', 'dept'])['total_sale'].mean()  # don't reset the index!
df = df.set_index(['name', 'id', 'dept'])  # make the same index here
df['mean_col'] = mean_col                  # pandas aligns the values on the shared index
df = df.reset_index()  # to take the hierarchical index off again
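
Side note (not part of the original answer): if all you need is the extra column, groupby(...).transform('mean') returns a result already aligned to the original rows, so the index round-trip above can be skipped:

# transform('mean') broadcasts each group's mean back onto every row,
# so it can be assigned directly without setting the index first.
df['mean_col'] = df.groupby(['name', 'id', 'dept'])['total_sale'].transform('mean')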

group by in group by and average

If you want to first take the mean over each ['cluster', 'org'] combination and then average those per cluster, you can use:

In [59]: (df.groupby(['cluster', 'org'], as_index=False).mean()
            .groupby('cluster')['time'].mean())
Out[59]:
cluster
1    15
2    54
3     6
Name: time, dtype: int64

If you want the mean of cluster groups only, then you can use:

In [58]: df.groupby(['cluster']).mean()
Out[58]:
              time
cluster
1        12.333333
2        54.000000
3         6.000000

You can also use groupby on ['cluster', 'org'] and then use mean():

In [57]: df.groupby(['cluster', 'org']).mean()
Out[57]:
               time
cluster org
1       a    438886
        c        23
2       d      9874
        h        34
3       w         6

Pandas DataFrame groupby.mean() including string columns

You can use a custom aggregation function:

import numpy as np

dct = {
    'p1': 'mean',
    'p2': 'mean',
    'p3': 'mean',
    # keep the string only if it is constant within the group, otherwise NaN
    'p4': lambda col: col.mode().iat[0] if col.nunique() == 1 else np.nan,
}
agg = df.groupby(['ID', 'ID2']).agg(**{k: (k, v) for k, v in dct.items()})

Or, by type:

dct = {
    'number': 'mean',
    'object': lambda col: col.mode().iat[0] if col.nunique() == 1 else np.nan,
}

groupby_cols = ['ID', 'ID2']
# map each non-grouping column to the aggregation chosen for its dtype
dct = {
    col: agg_func
    for tp, agg_func in dct.items()
    for col in df.select_dtypes(tp).columns.difference(groupby_cols)
}
agg = df.groupby(groupby_cols).agg(**{k: (k, v) for k, v in dct.items()})

Output for both:

>>> agg
              p1         p2         p3   p4
ID ID2
1  A    1.333333   1.333333   1.333333    A
2  B   34.000000  34.000000  34.000000    B
3  C    4.000000   4.250000   5.000000  NaN

Python Dataframe Groupby Mean and STD

Remove the () from the np functions so you pass the function objects themselves rather than their return values:

xdf = df.groupby("a").agg([np.mean, np.std])
print(xdf)

Prints:

            b           c              d
         mean std    mean       std mean std
a
Apple       3 0.0     4.5  0.707107    7 0.0
Banana      4 NaN     4.0       NaN    8 NaN
Cherry      7 NaN     1.0       NaN    3 NaN

EDIT: To "flatten" the column MultiIndex:

xdf = df.groupby("a").agg([np.mean, np.std])
xdf.columns = xdf.columns.map("_".join)
print(xdf)

Prints:

        b_mean  b_std  c_mean     c_std  d_mean  d_std
a
Apple        3    0.0     4.5  0.707107       7    0.0
Banana       4    NaN     4.0       NaN       8    NaN
Cherry       7    NaN     1.0       NaN       3    NaN
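
As an aside, recent pandas versions warn when NumPy functions are passed to agg; as far as I know, the built-in string names give the same result (a minimal sketch):

# Same aggregation using the string names instead of np.mean / np.std
xdf = df.groupby("a").agg(["mean", "std"])
xdf.columns = xdf.columns.map("_".join)
print(xdf)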

pandas group by mean and add back into dataframe on another index

You can use:

df['affect'] = df['affect'].bfill().ffill()
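
For context, here is a minimal sketch of what that chain does, on a hypothetical 'affect' column where the per-group means were only written onto some rows:

import numpy as np
import pandas as pd

# hypothetical column with gaps
s = pd.Series([np.nan, 2.5, np.nan, np.nan, 4.0, np.nan], name='affect')

# bfill() fills each gap from the next valid value,
# ffill() then fills anything still missing at the end from the previous one.
print(s.bfill().ffill().tolist())  # [2.5, 2.5, 4.0, 4.0, 4.0, 4.0]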

Converting a Pandas GroupBy output from Series to DataFrame

g1 here is a DataFrame. It has a hierarchical index, though:

In [19]: type(g1)
Out[19]: pandas.core.frame.DataFrame

In [20]: g1.index
Out[20]:
MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
            ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

In [21]: g1.add_suffix('_Count').reset_index()
Out[21]:
      Name      City  City_Count  Name_Count
0    Alice   Seattle           1           1
1      Bob   Seattle           2           2
2  Mallory  Portland           2           2
3  Mallory   Seattle           1           1

Or something like:

In [36]: DataFrame({'count' : df1.groupby(["Name", "City"]).size()}).reset_index()
Out[36]:
      Name      City  count
0    Alice   Seattle      1
1      Bob   Seattle      2
2  Mallory  Portland      2
3  Mallory   Seattle      1
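
A slightly shorter spelling of that last option, if you prefer to skip the explicit DataFrame constructor, is a sketch using reset_index(name=...):

df1.groupby(["Name", "City"]).size().reset_index(name="count")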

Pandas Groupby: Count and mean combined

You can use groupby with aggregate:

df = df.groupby('source') \
       .agg({'text': 'size', 'sent': 'mean'}) \
       .rename(columns={'text': 'count', 'sent': 'mean_sent'}) \
       .reset_index()
print (df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
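
On pandas 0.25 or later, named aggregation gives the same result without the rename step (a sketch):

df = (df.groupby('source')
        .agg(count=('text', 'size'), mean_sent=('sent', 'mean'))
        .reset_index())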

Transform pandas groupby / aggregate result to dataframe

Assuming you have the following pandas Series:

In [227]: result
Out[227]:
Exporter     Importer  sitc4
Afghanistan  World     11        59.0
                       12       892.0
                       113       19.0
Austria      World     11        41.0
                       113        8.0
                       118        4.0
Name: val, dtype: float64

you can pivot it as follows:

In [228]: (result.reset_index(name='Value')
     ...:        .pivot_table(index='Exporter', columns='sitc4', values='Value',
     ...:                     aggfunc='sum', fill_value=0)
     ...: )
Out[228]:
sitc4        11   12   113  118
Exporter
Afghanistan  59  892   19    0
Austria      41    0    8    4
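
A roughly equivalent sketch that stays on the Series: sum over the two index levels you want to keep and unstack the sitc4 level into columns:

(result.groupby(level=['Exporter', 'sitc4']).sum()
       .unstack('sitc4', fill_value=0))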

Calculate the mean on a Groupby Object in Pandas after applying .nsmallest(2)

I think you need to pass mean into the apply method after nsmallest:

x = grupper['FINISH'].apply(lambda x: x.nsmallest(2).mean())

Your solution should also work like this:

x = grupper.apply(lambda x: x.nsmallest(2, 'FINISH').mean())

Pandas groupby mean issue

The groupby mean aggregation excludes NaN values but includes zeros, so you need to either replace the missing markers with 0 or keep them as NaN, depending on the result you're after.

This will set all the '-' and NaN values to 0:

import numpy as np
import pandas as pd

cols = ['R1', 'R2', 'R3', 'R4']

for col in cols:
    df[col] = np.where((df[col] == '-') | (df[col].isnull()), 0, df[col])
    df[col] = pd.to_numeric(df[col])

df.groupby('event').mean()

If you want NaN instead of 0, simply replace the 0 in np.where() with np.nan.
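
A more compact variant of the same cleanup (a sketch): let pd.to_numeric coerce the '-' markers to NaN, then fill with 0 (or leave the NaN in place) before averaging:

df[cols] = df[cols].apply(pd.to_numeric, errors='coerce').fillna(0)
df.groupby('event').mean()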


