Pandas groupby mean - into a dataframe?
If you call .reset_index() on the series that you have, it will get you a dataframe like you want (each level of the index will be converted into a column):
df.groupby(['name', 'id', 'dept'])['total_sale'].mean().reset_index()
EDIT: to respond to the OP's comment, adding this column back to your original dataframe is a little trickier. The aggregated series has fewer rows than the original dataframe, so you can't simply assign it as a new column. However, if you give both the same index, pandas is smart enough to align the values for you. Try this:
import pandas as pd

cols = ['date','name','id','dept','sale1','sale2','sale3','total_sale']
data = [
['1/1/17', 'John', 50, 'Sales', 50.0, 60.0, 70.0, 180.0],
['1/1/17', 'Mike', 21, 'Engg', 43.0, 55.0, 2.0, 100.0],
['1/1/17', 'Jane', 99, 'Tech', 90.0, 80.0, 70.0, 240.0],
['1/2/17', 'John', 50, 'Sales', 60.0, 70.0, 80.0, 210.0],
['1/2/17', 'Mike', 21, 'Engg', 53.0, 65.0, 12.0, 130.0],
['1/2/17', 'Jane', 99, 'Tech', 100.0, 90.0, 80.0, 270.0],
['1/3/17', 'John', 50, 'Sales', 40.0, 50.0, 60.0, 150.0],
['1/3/17', 'Mike', 21, 'Engg', 53.0, 55.0, 12.0, 120.0],
['1/3/17', 'Jane', 99, 'Tech', 80.0, 70.0, 60.0, 210.0]
]
df = pd.DataFrame(data, columns=cols)
mean_col = df.groupby(['name', 'id', 'dept'])['total_sale'].mean() # don't reset the index!
df = df.set_index(['name', 'id', 'dept']) # make the same index here
df['mean_col'] = mean_col
df = df.reset_index() # to take the hierarchical index off again
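The same result can be obtained in one step with transform, which broadcasts each group's mean back onto the original rows, avoiding the set_index/reset_index round trip. A minimal sketch using a trimmed-down version of the data above:

```python
import pandas as pd

# Trimmed-down version of the sample data above
df = pd.DataFrame({
    'name': ['John', 'John', 'Mike'],
    'id': [50, 50, 21],
    'dept': ['Sales', 'Sales', 'Engg'],
    'total_sale': [180.0, 210.0, 100.0],
})

# transform('mean') computes the group mean and broadcasts it back to
# every row of the original frame, so no index juggling is needed
df['mean_col'] = df.groupby(['name', 'id', 'dept'])['total_sale'].transform('mean')
```

Here both John rows get the same group mean (195.0) while Mike's single row keeps its own value.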
group by in group by and average
If you want to first take the mean over the combination of ['cluster', 'org'] and then take the mean over cluster groups, you can use:
In [59]: (df.groupby(['cluster', 'org'], as_index=False).mean()
.groupby('cluster')['time'].mean())
Out[59]:
cluster
1 15
2 54
3 6
Name: time, dtype: int64
If you want the mean of cluster groups only, then you can use:
In [58]: df.groupby(['cluster']).mean()
Out[58]:
time
cluster
1 12.333333
2 54.000000
3 6.000000
You can also use groupby on ['cluster', 'org'] and then call mean():
In [57]: df.groupby(['cluster', 'org']).mean()
Out[57]:
time
cluster org
1 a 438886
c 23
2 d 9874
h 34
3 w 6
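Since the printed outputs above come from different sample frames, here is a self-contained sketch with made-up data showing how the mean of per-(cluster, org) means differs from the plain per-cluster mean when the orgs have unequal sizes:

```python
import pandas as pd

# Made-up data: cluster 1 has two orgs of unequal size
df = pd.DataFrame({
    'cluster': [1, 1, 1, 2, 2],
    'org':     ['a', 'a', 'c', 'd', 'd'],
    'time':    [10, 20, 30, 40, 60],
})

# Mean per (cluster, org) first, then mean of those means per cluster
per_org = df.groupby(['cluster', 'org'], as_index=False)['time'].mean()
mean_of_means = per_org.groupby('cluster')['time'].mean()

# Plain per-cluster mean weights every row equally instead
plain_mean = df.groupby('cluster')['time'].mean()
```

For cluster 1, the mean of means is (15 + 30) / 2 = 22.5, while the row-weighted mean is 60 / 3 = 20.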
Pandas DataFrame groupby.mean() including string columns
You can use a custom aggregation function:
import numpy as np

dct = {
    'p1': 'mean',
    'p2': 'mean',
    'p3': 'mean',
    'p4': lambda col: col.mode().iloc[0] if col.nunique() == 1 else np.nan,  # scalar, not a Series
}
agg = df.groupby(['ID','ID2']).agg(**{k: (k, v) for k, v in dct.items()})
Or, by type:
dct = {
    'number': 'mean',
    'object': lambda col: col.mode().iloc[0] if col.nunique() == 1 else np.nan,
}
groupby_cols = ['ID','ID2']
dct = {
    col: agg
    for tp, agg in dct.items()
    for col in df.select_dtypes(tp).columns.difference(groupby_cols)
}
agg = df.groupby(groupby_cols).agg(**{k: (k, v) for k, v in dct.items()})
Output for both:
>>> agg
p1 p2 p3 p4
ID ID2
1 A 1.333333 1.333333 1.333333 A
2 B 34.000000 34.000000 34.000000 B
3 C 4.000000 4.250000 5.000000 NaN
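A minimal runnable sketch of the first pattern, with tiny made-up data (column names taken from the question):

```python
import numpy as np
import pandas as pd

# Made-up data: p1 is numeric, p4 is a string column
df = pd.DataFrame({
    'ID':  [1, 1, 2, 2],
    'ID2': ['A', 'A', 'B', 'B'],
    'p1':  [1.0, 2.0, 3.0, 5.0],
    'p4':  ['x', 'x', 'y', 'z'],
})

dct = {
    'p1': 'mean',
    # keep the string only when the group has a single unique value
    'p4': lambda col: col.mode().iloc[0] if col.nunique() == 1 else np.nan,
}
agg = df.groupby(['ID', 'ID2']).agg(**{k: (k, v) for k, v in dct.items()})
```

Group (1, 'A') keeps its unambiguous string 'x', while group (2, 'B') gets NaN because it contains two different strings.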
Python Dataframe Groupby Mean and STD
Remove the () from the np. functions; agg expects the function objects themselves, not the values they return:
xdf = df.groupby("a").agg([np.mean, np.std])
print(xdf)
Prints:
b c d
mean std mean std mean std
a
Apple 3 0.0 4.5 0.707107 7 0.0
Banana 4 NaN 4.0 NaN 8 NaN
Cherry 7 NaN 1.0 NaN 3 NaN
EDIT: To "flatten" column multi-index:
xdf = df.groupby("a").agg([np.mean, np.std])
xdf.columns = xdf.columns.map("_".join)
print(xdf)
Prints:
b_mean b_std c_mean c_std d_mean d_std
a
Apple 3 0.0 4.5 0.707107 7 0.0
Banana 4 NaN 4.0 NaN 8 NaN
Cherry 7 NaN 1.0 NaN 3 NaN
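A runnable version, with the sample data reconstructed from the printed output above. Recent pandas versions emit a FutureWarning when raw NumPy callables are passed to agg, so the equivalent string names 'mean' and 'std' are used here instead:

```python
import pandas as pd

# Sample data reconstructed from the printed output above
df = pd.DataFrame({
    'a': ['Apple', 'Apple', 'Banana', 'Cherry'],
    'b': [3, 3, 4, 7],
    'c': [4, 5, 4, 1],
    'd': [7, 7, 8, 3],
})

# String names are equivalent to np.mean/np.std and avoid the
# FutureWarning newer pandas raises for raw NumPy callables
xdf = df.groupby('a').agg(['mean', 'std'])

# Flatten the (column, stat) MultiIndex into single-level names
xdf.columns = xdf.columns.map('_'.join)
```

Single-row groups such as Banana get NaN for std, since the sample standard deviation of one value is undefined.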
pandas group by mean and add back into dataframe on another index
You can use:
df['affect'] = df['affect'].bfill().ffill()
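A small sketch of what this fill does, with a hypothetical column containing gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical column with gaps (values assumed for illustration)
df = pd.DataFrame({'affect': [np.nan, 1.0, np.nan, 2.0, np.nan]})

# bfill pulls the next valid value backwards; ffill then covers
# any trailing NaN with the last valid value
df['affect'] = df['affect'].bfill().ffill()
```

Every NaN is replaced by the nearest following value, and the trailing NaN by the last preceding one.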
Converting a Pandas GroupBy output from Series to DataFrame
g1 here is a DataFrame. It has a hierarchical index, though:
In [19]: type(g1)
Out[19]: pandas.core.frame.DataFrame
In [20]: g1.index
Out[20]:
MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
('Mallory', 'Seattle')], dtype=object)
Perhaps you want something like this?
In [21]: g1.add_suffix('_Count').reset_index()
Out[21]:
Name City City_Count Name_Count
0 Alice Seattle 1 1
1 Bob Seattle 2 2
2 Mallory Portland 2 2
3 Mallory Seattle 1 1
Or something like:
In [36]: DataFrame({'count' : df1.groupby( [ "Name", "City"] ).size()}).reset_index()
Out[36]:
Name City count
0 Alice Seattle 1
1 Bob Seattle 2
2 Mallory Portland 2
3 Mallory Seattle 1
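The same count can be written a little more directly by naming the size column in reset_index; a sketch using data that matches the output above:

```python
import pandas as pd

# Data matching the counts printed above
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Mallory', 'Mallory', 'Mallory'],
    'City': ['Seattle', 'Seattle', 'Seattle', 'Portland', 'Portland', 'Seattle'],
})

# size() returns a Series; naming it in reset_index avoids the
# intermediate DataFrame({'count': ...}) construction
counts = df1.groupby(['Name', 'City']).size().reset_index(name='count')
```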
Pandas Groupby: Count and mean combined
You can use groupby
with aggregate
:
df = df.groupby('source') \
.agg({'text':'size', 'sent':'mean'}) \
.rename(columns={'text':'count','sent':'mean_sent'}) \
.reset_index()
print (df)
source count mean_sent
0 bar 2 0.415
1 foo 3 -0.500
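With named aggregation the output columns can be set directly, so the separate rename step disappears. A sketch with assumed sample data matching the printed result:

```python
import pandas as pd

# Assumed sample data matching the printed result above
df = pd.DataFrame({
    'source': ['bar', 'bar', 'foo', 'foo', 'foo'],
    'text':   ['t1', 't2', 't3', 't4', 't5'],
    'sent':   [0.40, 0.43, -0.5, -0.5, -0.5],
})

# Named aggregation names the output columns in one step
out = (df.groupby('source')
         .agg(count=('text', 'size'), mean_sent=('sent', 'mean'))
         .reset_index())
```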
Transform pandas groupby / aggregate result to dataframe
Assuming you have the following pandas Series:
In [227]: result
Out[227]:
Exporter Importer sitc4
Afghanistan World 11 59.0
12 892.0
113 19.0
Austria World 11 41.0
113 8.0
118 4.0
Name: val, dtype: float64
you can pivot it as follows:
In [228]: (result.reset_index(name='Value')
...: .pivot_table(index='Exporter', columns='sitc4', values='Value',
...: aggfunc='sum', fill_value=0)
...: )
...:
Out[228]:
sitc4 11 12 113 118
Exporter
Afghanistan 59 892 19 0
Austria 41 0 8 4
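A self-contained sketch of the same pivot, with an assumed Series built to match the shape of `result` above:

```python
import pandas as pd

# Assumed sample Series with the same index shape as `result` above
idx = pd.MultiIndex.from_tuples(
    [('Afghanistan', 'World', 11), ('Afghanistan', 'World', 12),
     ('Austria', 'World', 11), ('Austria', 'World', 118)],
    names=['Exporter', 'Importer', 'sitc4'])
result = pd.Series([59.0, 892.0, 41.0, 4.0], index=idx, name='val')

# reset_index(name=...) turns the Series into a long DataFrame;
# pivot_table then spreads sitc4 into columns, filling gaps with 0
wide = (result.reset_index(name='Value')
              .pivot_table(index='Exporter', columns='sitc4',
                           values='Value', aggfunc='sum', fill_value=0))
```

Combinations that never occur, such as Austria with sitc4 12, come out as 0 thanks to fill_value.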
Calculate the mean on a Groupby Object in Pandas after applying .nsmallest(2)
I think you need to pass mean into apply after nsmallest:
x = grupper['FINISH'].apply(lambda x: x.nsmallest(2).mean())
Your solution should also work:
x = grupper.apply(lambda x: x.nsmallest(2, 'FINISH').mean())
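A runnable sketch of the first form, with hypothetical data (only the FINISH column name comes from the question):

```python
import pandas as pd

# Hypothetical grouped data; GROUP is an assumed key column
df = pd.DataFrame({'GROUP': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'FINISH': [5, 1, 3, 10, 2, 8]})
grupper = df.groupby('GROUP')

# Mean of the two smallest FINISH values within each group
x = grupper['FINISH'].apply(lambda s: s.nsmallest(2).mean())
```

Group 'a' keeps its two smallest values 1 and 3 (mean 2.0), group 'b' keeps 2 and 8 (mean 5.0).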
Pandas groupby mean issue
The groupby mean aggregation excludes NaN values but includes zeros, so you need to either replace with 0 or keep the NaN, depending on the result you're after. This will set all the - and NaN values to 0:
cols = ['R1', 'R2', 'R3', 'R4']
for col in cols:
    df[col] = np.where((df[col] == '-') | (df[col].isnull()), 0, df[col])
    df[col] = pd.to_numeric(df[col])
df.groupby('event').mean()
If you want NaN instead of 0, simply replace the 0 in np.where() with np.nan.
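A sketch of the NaN-keeping variant with made-up data, so the '-' entries are skipped by the mean instead of dragging it toward zero:

```python
import numpy as np
import pandas as pd

# Made-up data: '-' marks a missing reading
df = pd.DataFrame({'event': ['e1', 'e1', 'e2'],
                   'R1': ['1', '-', '3'],
                   'R2': ['2', '6', '4']})

for col in ['R1', 'R2']:
    # replacing '-' with NaN means groupby().mean() skips it
    df[col] = pd.to_numeric(df[col].replace('-', np.nan))

out = df.groupby('event').mean()
```

For event e1, R1 averages only the valid value 1 (giving 1.0) rather than (1 + 0) / 2 = 0.5.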