Pandas groupby mean - into a dataframe?
If you call .reset_index() on the series that you have, it will get you a dataframe like you want (each level of the index will be converted into a column):
df.groupby(['name', 'id', 'dept'])['total_sale'].mean().reset_index()
EDIT: to respond to the OP's comment, adding this column back to your original dataframe is a little trickier. The aggregated series has fewer rows than the original dataframe, so you can't simply assign it as a new column. However, if you give both the same index, pandas is smart enough to align the values for you. Try this:
import pandas as pd

cols = ['date','name','id','dept','sale1','sale2','sale3','total_sale']
data = [
['1/1/17', 'John', 50, 'Sales', 50.0, 60.0, 70.0, 180.0],
['1/1/17', 'Mike', 21, 'Engg', 43.0, 55.0, 2.0, 100.0],
['1/1/17', 'Jane', 99, 'Tech', 90.0, 80.0, 70.0, 240.0],
['1/2/17', 'John', 50, 'Sales', 60.0, 70.0, 80.0, 210.0],
['1/2/17', 'Mike', 21, 'Engg', 53.0, 65.0, 12.0, 130.0],
['1/2/17', 'Jane', 99, 'Tech', 100.0, 90.0, 80.0, 270.0],
['1/3/17', 'John', 50, 'Sales', 40.0, 50.0, 60.0, 150.0],
['1/3/17', 'Mike', 21, 'Engg', 53.0, 55.0, 12.0, 120.0],
['1/3/17', 'Jane', 99, 'Tech', 80.0, 70.0, 60.0, 210.0]
]
df = pd.DataFrame(data, columns=cols)
mean_col = df.groupby(['name', 'id', 'dept'])['total_sale'].mean() # don't reset the index!
df = df.set_index(['name', 'id', 'dept']) # make the same index here
df['mean_col'] = mean_col
df = df.reset_index() # to take the hierarchical index off again
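The same result can be obtained in one step with transform, which broadcasts each group's mean back onto the original rows, avoiding the set_index/reset_index round trip. A minimal sketch using a trimmed-down version of the data above:

```python
import pandas as pd

# Trimmed-down version of the sample data above
df = pd.DataFrame({
    'name': ['John', 'John', 'Mike'],
    'id': [50, 50, 21],
    'dept': ['Sales', 'Sales', 'Engg'],
    'total_sale': [180.0, 210.0, 100.0],
})

# transform('mean') computes the group mean and broadcasts it back to
# every row of the original frame, so no index juggling is needed
df['mean_col'] = df.groupby(['name', 'id', 'dept'])['total_sale'].transform('mean')
```

Here both John rows get the same group mean (195.0) while Mike's single row keeps its own value.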
group by in group by and average
If you want to first take the mean over the combination of ['cluster', 'org'] and then take the mean over cluster groups, you can use:
In [59]: (df.groupby(['cluster', 'org'], as_index=False).mean()
.groupby('cluster')['time'].mean())
Out[59]:
cluster
1 15
2 54
3 6
Name: time, dtype: int64
If you want the mean of cluster groups only, then you can use:
In [58]: df.groupby(['cluster']).mean()
Out[58]:
time
cluster
1 12.333333
2 54.000000
3 6.000000
You can also use groupby on ['cluster', 'org'] and then call mean():
In [57]: df.groupby(['cluster', 'org']).mean()
Out[57]:
time
cluster org
1 a 438886
c 23
2 d 9874
h 34
3 w 6
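Since the printed outputs above come from different sample frames, here is a self-contained sketch with made-up data showing how the mean of per-(cluster, org) means differs from the plain per-cluster mean when the orgs have unequal sizes:

```python
import pandas as pd

# Made-up data: cluster 1 has two orgs of unequal size
df = pd.DataFrame({
    'cluster': [1, 1, 1, 2, 2],
    'org':     ['a', 'a', 'c', 'd', 'd'],
    'time':    [10, 20, 30, 40, 60],
})

# Mean per (cluster, org) first, then mean of those means per cluster
per_org = df.groupby(['cluster', 'org'], as_index=False)['time'].mean()
mean_of_means = per_org.groupby('cluster')['time'].mean()

# Plain per-cluster mean weights every row equally instead
plain_mean = df.groupby('cluster')['time'].mean()
```

For cluster 1, the mean of means is (15 + 30) / 2 = 22.5, while the row-weighted mean is 60 / 3 = 20.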
Pandas DataFrame groupby.mean() including string columns
You can use a custom aggregation function:
import numpy as np

dct = {
    'p1': 'mean',
    'p2': 'mean',
    'p3': 'mean',
    'p4': lambda col: col.mode().iloc[0] if col.nunique() == 1 else np.nan,  # scalar, not a Series
}
agg = df.groupby(['ID','ID2']).agg(**{k: (k, v) for k, v in dct.items()})
Or, by type:
dct = {
    'number': 'mean',
    'object': lambda col: col.mode().iloc[0] if col.nunique() == 1 else np.nan,
}
groupby_cols = ['ID','ID2']
dct = {
    col: agg
    for tp, agg in dct.items()
    for col in df.select_dtypes(tp).columns.difference(groupby_cols)
}
agg = df.groupby(groupby_cols).agg(**{k: (k, v) for k, v in dct.items()})
Output for both:
>>> agg
p1 p2 p3 p4
ID ID2
1 A 1.333333 1.333333 1.333333 A
2 B 34.000000 34.000000 34.000000 B
3 C 4.000000 4.250000 5.000000 NaN
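A minimal runnable sketch of the first pattern, with tiny made-up data (column names taken from the question):

```python
import numpy as np
import pandas as pd

# Made-up data: p1 is numeric, p4 is a string column
df = pd.DataFrame({
    'ID':  [1, 1, 2, 2],
    'ID2': ['A', 'A', 'B', 'B'],
    'p1':  [1.0, 2.0, 3.0, 5.0],
    'p4':  ['x', 'x', 'y', 'z'],
})

dct = {
    'p1': 'mean',
    # keep the string only when the group has a single unique value
    'p4': lambda col: col.mode().iloc[0] if col.nunique() == 1 else np.nan,
}
agg = df.groupby(['ID', 'ID2']).agg(**{k: (k, v) for k, v in dct.items()})
```

Group (1, 'A') keeps its unambiguous string 'x', while group (2, 'B') gets NaN because it contains two different strings.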
Python Dataframe Groupby Mean and STD
Remove the () from the np. functions; agg expects the function objects themselves, not the values they return:
xdf = df.groupby("a").agg([np.mean, np.std])
print(xdf)
Prints:
b c d
mean std mean std mean std
a
Apple 3 0.0 4.5 0.707107 7 0.0
Banana 4 NaN 4.0 NaN 8 NaN
Cherry 7 NaN 1.0 NaN 3 NaN
EDIT: To "flatten" column multi-index:
xdf = df.groupby("a").agg([np.mean, np.std])
xdf.columns = xdf.columns.map("_".join)
print(xdf)
Prints:
b_mean b_std c_mean c_std d_mean d_std
a
Apple 3 0.0 4.5 0.707107 7 0.0
Banana 4 NaN 4.0 NaN 8 NaN
Cherry 7 NaN 1.0 NaN 3 NaN
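A runnable version, with the sample data reconstructed from the printed output above. Recent pandas versions emit a FutureWarning when raw NumPy callables are passed to agg, so the equivalent string names 'mean' and 'std' are used here instead:

```python
import pandas as pd

# Sample data reconstructed from the printed output above
df = pd.DataFrame({
    'a': ['Apple', 'Apple', 'Banana', 'Cherry'],
    'b': [3, 3, 4, 7],
    'c': [4, 5, 4, 1],
    'd': [7, 7, 8, 3],
})

# String names are equivalent to np.mean/np.std and avoid the
# FutureWarning newer pandas raises for raw NumPy callables
xdf = df.groupby('a').agg(['mean', 'std'])

# Flatten the (column, stat) MultiIndex into single-level names
xdf.columns = xdf.columns.map('_'.join)
```

Single-row groups such as Banana get NaN for std, since the sample standard deviation of one value is undefined.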
pandas group by mean and add back into dataframe on another index
You can use:
df['affect'] = df['affect'].bfill().ffill()
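A small sketch of what this fill does, with a hypothetical column containing gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical column with gaps (values assumed for illustration)
df = pd.DataFrame({'affect': [np.nan, 1.0, np.nan, 2.0, np.nan]})

# bfill pulls the next valid value backwards; ffill then covers
# any trailing NaN with the last valid value
df['affect'] = df['affect'].bfill().ffill()
```

Every NaN is replaced by the nearest following value, and the trailing NaN by the last preceding one.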
Converting a Pandas GroupBy output from Series to DataFrame
g1 here is a DataFrame. It has a hierarchical index, though:
In [19]: type(g1)
Out[19]: pandas.core.frame.DataFrame
In [20]: g1.index
Out[20]:
MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
('Mallory', 'Seattle')], dtype=object)
Perhaps you want something like this?
In [21]: g1.add_suffix('_Count').reset_index()
Out[21]:
Name City City_Count Name_Count
0 Alice Seattle 1 1
1 Bob Seattle 2 2
2 Mallory Portland 2 2
3 Mallory Seattle 1 1
Or something like:
In [36]: DataFrame({'count' : df1.groupby( [ "Name", "City"] ).size()}).reset_index()
Out[36]:
Name City count
0 Alice Seattle 1
1 Bob Seattle 2
2 Mallory Portland 2
3 Mallory Seattle 1
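The same count can be written a little more directly by naming the size column in reset_index; a sketch using data that matches the output above:

```python
import pandas as pd

# Data matching the counts printed above
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Mallory', 'Mallory', 'Mallory'],
    'City': ['Seattle', 'Seattle', 'Seattle', 'Portland', 'Portland', 'Seattle'],
})

# size() returns a Series; naming it in reset_index avoids the
# intermediate DataFrame({'count': ...}) construction
counts = df1.groupby(['Name', 'City']).size().reset_index(name='count')
```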
Pandas Groupby: Count and mean combined
You can use groupby
with aggregate
:
df = df.groupby('source') \
.agg({'text':'size', 'sent':'mean'}) \
.rename(columns={'text':'count','sent':'mean_sent'}) \
.reset_index()
print (df)
source count mean_sent
0 bar 2 0.415
1 foo 3 -0.500
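With named aggregation the output columns can be set directly, so the separate rename step disappears. A sketch with assumed sample data matching the printed result:

```python
import pandas as pd

# Assumed sample data matching the printed result above
df = pd.DataFrame({
    'source': ['bar', 'bar', 'foo', 'foo', 'foo'],
    'text':   ['t1', 't2', 't3', 't4', 't5'],
    'sent':   [0.40, 0.43, -0.5, -0.5, -0.5],
})

# Named aggregation names the output columns in one step
out = (df.groupby('source')
         .agg(count=('text', 'size'), mean_sent=('sent', 'mean'))
         .reset_index())
```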
Transform pandas groupby / aggregate result to dataframe
Assuming you have the following pandas Series:
In [227]: result
Out[227]:
Exporter Importer sitc4
Afghanistan World 11 59.0
12 892.0
113 19.0
Austria World 11 41.0
113 8.0
118 4.0
Name: val, dtype: float64
you can pivot it as follows:
In [228]: (result.reset_index(name='Value')
...: .pivot_table(index='Exporter', columns='sitc4', values='Value',
...: aggfunc='sum', fill_value=0)
...: )
...:
Out[228]:
sitc4 11 12 113 118
Exporter
Afghanistan 59 892 19 0
Austria 41 0 8 4
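A self-contained sketch of the same pivot, with an assumed Series built to match the shape of `result` above:

```python
import pandas as pd

# Assumed sample Series with the same index shape as `result` above
idx = pd.MultiIndex.from_tuples(
    [('Afghanistan', 'World', 11), ('Afghanistan', 'World', 12),
     ('Austria', 'World', 11), ('Austria', 'World', 118)],
    names=['Exporter', 'Importer', 'sitc4'])
result = pd.Series([59.0, 892.0, 41.0, 4.0], index=idx, name='val')

# reset_index(name=...) turns the Series into a long DataFrame;
# pivot_table then spreads sitc4 into columns, filling gaps with 0
wide = (result.reset_index(name='Value')
              .pivot_table(index='Exporter', columns='sitc4',
                           values='Value', aggfunc='sum', fill_value=0))
```

Combinations that never occur, such as Austria with sitc4 12, come out as 0 thanks to fill_value.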
Calculate the mean on a Groupby Object in Pandas after applying .nsmallest(2)
I think you need to pass mean into apply after nsmallest:
x = grupper['FINISH'].apply(lambda x: x.nsmallest(2).mean())
Your solution should also work:
x = grupper.apply(lambda x: x.nsmallest(2, 'FINISH').mean())
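A runnable sketch of the first form, with hypothetical data (only the FINISH column name comes from the question):

```python
import pandas as pd

# Hypothetical grouped data; GROUP is an assumed key column
df = pd.DataFrame({'GROUP': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'FINISH': [5, 1, 3, 10, 2, 8]})
grupper = df.groupby('GROUP')

# Mean of the two smallest FINISH values within each group
x = grupper['FINISH'].apply(lambda s: s.nsmallest(2).mean())
```

Group 'a' keeps its two smallest values 1 and 3 (mean 2.0), group 'b' keeps 2 and 8 (mean 5.0).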
Pandas groupby mean issue
The groupby mean aggregation excludes NaN values but includes zeros, so you need to either replace with 0 or keep the NaN, depending on the result you're after. This will set all the - and NaN values to 0:
cols = ['R1', 'R2', 'R3', 'R4']
for col in cols:
    df[col] = np.where((df[col] == '-') | (df[col].isnull()), 0, df[col])
    df[col] = pd.to_numeric(df[col])
df.groupby('event').mean()
If you want NaN instead of 0, simply replace the 0 in np.where() with np.nan.
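A sketch of the NaN-keeping variant with made-up data, so the '-' entries are skipped by the mean instead of dragging it toward zero:

```python
import numpy as np
import pandas as pd

# Made-up data: '-' marks a missing reading
df = pd.DataFrame({'event': ['e1', 'e1', 'e2'],
                   'R1': ['1', '-', '3'],
                   'R2': ['2', '6', '4']})

for col in ['R1', 'R2']:
    # replacing '-' with NaN means groupby().mean() skips it
    df[col] = pd.to_numeric(df[col].replace('-', np.nan))

out = df.groupby('event').mean()
```

For event e1, R1 averages only the valid value 1 (giving 1.0) rather than (1 + 0) / 2 = 0.5.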