Can Pandas Groupby Aggregate into a List, Rather Than Sum, Mean, etc

Can pandas groupby aggregate into a list, rather than sum, mean, etc?

my solution is a bit longer than you may expect, I'm sure it could be shortened, but:

g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"])))
k = g.reset_index()
k["i"] = k1.index
k["rn"] = k1.groupby("A")["i"].rank()
k.pivot_table(rows="A", cols="rn", values=0)

# output
# rn   1   2   3   4   5   6
# A                         
# 1   10  12  11  22  20   8
# 2   10  11  10  13 NaN NaN
# 3   14  10 NaN NaN NaN NaN

A bit of explanation. First line, g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"]))). This one group df by A and then put columns B and C into one column:

Then k = g.reset_index(), creating sequential index, result is:

    A  level_1   0
0   1        0  10
1   1        1  12
2   1        2  11
3   1        0  22
4   1        1  20
5   1        2   8
6   2        3  10
7   2        4  11
8   2        3  10
9   2        4  13
10  3        5  14
11  3        5  10

Now I want to move this index into column (I'd like to hear how I can make a sequential column without resetting index), k["i"] = k1.index:

    A  level_1   0   i
0   1        0  10   0
1   1        1  12   1
2   1        2  11   2
3   1        0  22   3
4   1        1  20   4
5   1        2   8   5
6   2        3  10   6
7   2        4  11   7
8   2        3  10   8
9   2        4  13   9
10  3        5  14  10
11  3        5  10  11

Now, k["rn"] = k1.groupby("A")["i"].rank() will add row_number inside each A (like row_number() over(partition by A order by i) in SQL:

    A  level_1   0   i  rn
0   1        0  10   0   1
1   1        1  12   1   2
2   1        2  11   2   3
3   1        0  22   3   4
4   1        1  20   4   5
5   1        2   8   5   6
6   2        3  10   6   1
7   2        4  11   7   2
8   2        3  10   8   3
9   2        4  13   9   4
10  3        5  14  10   1
11  3        5  10  11   2

And finally, just pivoting with k.pivot_table(rows="A", cols="rn", values=0):

rn   1   2   3   4   5   6
A                         
1   10  12  11  22  20   8
2   10  11  10  13 NaN NaN
3   14  10 NaN NaN NaN NaN

Multiple aggregations of the same column using pandas GroupBy.agg()

As of 2022-06-20, the below is the accepted practice for aggregations:

df.groupby('dummy').agg(
    Mean=('returns', np.mean),
    Sum=('returns', np.sum))

Below the fold included for historical versions of pandas.

You can simply pass the functions as a list:

In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
Out[20]:         
           mean       sum
dummy                    
1      0.036901  0.369012

or as a dictionary:

In [21]: df.groupby('dummy').agg({'returns':
                                  {'Mean': np.mean, 'Sum': np.sum}})
Out[21]: 
        returns          
           Mean       Sum
dummy                    
1      0.036901  0.369012

pandas groupby aggregate element-wise list addition

It's possible to use apply on the grouped dataframe to get element-wise addition:

df.groupby('X')['Y'].apply(lambda x: [sum(y) for y in zip(*x)])

Which results in a pandas series object:

X
57674     [54.0, 114.0, 124.0, 103.0]
71455    [337.0, 327.0, 311.0, 333.0]

Group dataframe and get sum AND count?

try this:

In [110]: (df.groupby('Company Name')
   .....:    .agg({'Organisation Name':'count', 'Amount': 'sum'})
   .....:    .reset_index()
   .....:    .rename(columns={'Organisation Name':'Organisation Count'})
   .....: )
Out[110]:
          Company Name   Amount  Organisation Count
0  Vifor Pharma UK Ltd  4207.93                   5

or if you don't want to reset index:

df.groupby('Company Name')['Amount'].agg(['sum','count'])

df.groupby('Company Name').agg({'Amount': ['sum','count']})

Demo:

In [98]: df.groupby('Company Name')['Amount'].agg(['sum','count'])
Out[98]:
                         sum  count
Company Name
Vifor Pharma UK Ltd  4207.93      5

In [99]: df.groupby('Company Name').agg({'Amount': ['sum','count']})
Out[99]:
                      Amount
                         sum count
Company Name
Vifor Pharma UK Ltd  4207.93     5

Groupby with both sum() and mean()

Use DataFrameGroupBy.agg with a dictionary:

df.groupby([pd.Grouper(freq="60min", key="ds"), "name"]).agg({'val1': 'mean', 'val2': sum, 'val3': 'mean'})

How can I aggregate (sum, mean, etc.) values and create a new Pandas dataframe based on that?

This operation can be done by;

agg_col = "Year"

new_df = df.groupby(by=agg_col, as_index=False).agg({"Count1": "sum", "Count2": "sum"})

And you can change agg_col to Month if you want to group by month.

When using Pandas .groupby, why use .agg versus directly using the function eg .sum()

Setup

df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

The primary benefit of using agg is stated in the docs:

Aggregate using one or more operations over the specified axis.

If you have separate operations that need to be applied to each individual column, agg takes a dictionary (or a function, string, or list of strings/functions) that allows you to create that mapping in a single statement. So if you'd like the sum of column a, and the mean of column b:

df.agg({'a': 'sum', 'b': 'mean'})

a    6.0
b    5.0
dtype: float64

It also allows you to apply multiple operations to a single column in a single statement. For example, to find the sum, mean, and std of column a:

df.agg({'a': ['sum', 'mean', 'std']})

        a
sum   6.0
mean  2.0
std   1.0

There's no difference in outcome when you use agg with a single operation. I'd argue that df.agg('sum') is less clear than df.sum(), but the results will be the same:

df.agg('sum')

a     6
b    15
dtype: int64

df.sum()

a     6
b    15
dtype: int64

The main benefit agg provides is the convenience of applying multiple operations.

Can Pandas Groupby Aggregate into a List, Rather Than Sum, Mean, etc