How to Group Dataframe Rows into List in Pandas Groupby

How to group dataframe rows into list in pandas groupby

You can do this by using groupby to group on the column of interest and then applying list to every group:

In [1]: import pandas as pd
   ...: df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'], 'b': [1, 2, 5, 5, 4, 6]})
   ...: df

Out[1]:
   a  b
0  A  1
1  A  2
2  B  5
3  B  5
4  B  4
5  C  6

In [2]: df.groupby('a')['b'].apply(list)
Out[2]:
a
A       [1, 2]
B    [5, 5, 4]
C          [6]
Name: b, dtype: object

In [3]: df1 = df.groupby('a')['b'].apply(list).reset_index(name='new')
df1
Out[3]:
   a        new
0  A     [1, 2]
1  B  [5, 5, 4]
2  C        [6]
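An equivalent, and arguably more idiomatic, spelling uses agg(list) instead of apply(list); a minimal runnable sketch of the same example:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# agg(list) states the intent directly: one list of b-values per group
df1 = df.groupby('a')['b'].agg(list).reset_index(name='new')
print(df1)
```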

groupby and convert rows into list using pandas

This selects every column whose name contains 'col_' and collects each row's values across those columns into a list:

df['e_values'] = df.filter(like='col_').apply(list, axis=1)
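A self-contained sketch (the col_1/col_2 column names here are hypothetical): filter(like='col_') keeps the matching columns, and apply(list, axis=1) turns each row of them into a list:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4], 'other': ['x', 'y']})

# Collect the 'col_*' values of each row into a single list column
df['e_values'] = df.filter(like='col_').apply(list, axis=1)
print(df['e_values'].tolist())
```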

Group by and aggregate rows into list of series or dicts in Pandas

Try this

compact_df = df.groupby('ID').apply(lambda group: group.to_dict(orient='records'))
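A runnable sketch with hypothetical data: to_dict(orient='records') turns each group's rows into a list of dicts, so the result is a Series indexed by ID:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2], 'val': ['a', 'b', 'c']})

# Each group becomes a list of per-row dicts
compact_df = df.groupby('ID').apply(lambda group: group.to_dict(orient='records'))
print(compact_df.loc[1])
```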

Can pandas groupby aggregate into a list, rather than sum, mean, etc?

My solution is a bit longer than you might expect; I'm sure it could be shortened, but:

g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"])))
k = g.reset_index()
k["i"] = k.index
k["rn"] = k.groupby("A")["i"].rank()
k.pivot_table(index="A", columns="rn", values=0)

# output
# rn    1   2   3   4   5   6
# A
# 1    10  12  11  22  20   8
# 2    10  11  10  13 NaN NaN
# 3    14  10 NaN NaN NaN NaN

A bit of explanation. The first line, g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"]))), groups df by A and then stacks columns B and C into one column:

A
1  0    10
   1    12
   2    11
   0    22
   1    20
   2     8
2  3    10
   4    11
   3    10
   4    13
3  5    14
   5    10

Then k = g.reset_index() creates a sequential index; the result is:

    A  level_1   0
0   1        0  10
1   1        1  12
2   1        2  11
3   1        0  22
4   1        1  20
5   1        2   8
6   2        3  10
7   2        4  11
8   2        3  10
9   2        4  13
10  3        5  14
11  3        5  10

Now I want to move this index into a column (I'd like to hear how I could make a sequential column without resetting the index), k["i"] = k.index:

    A  level_1   0   i
0   1        0  10   0
1   1        1  12   1
2   1        2  11   2
3   1        0  22   3
4   1        1  20   4
5   1        2   8   5
6   2        3  10   6
7   2        4  11   7
8   2        3  10   8
9   2        4  13   9
10  3        5  14  10
11  3        5  10  11

Now, k["rn"] = k.groupby("A")["i"].rank() adds a row number inside each A (like row_number() over (partition by A order by i) in SQL):

    A  level_1   0   i  rn
0   1        0  10   0   1
1   1        1  12   1   2
2   1        2  11   2   3
3   1        0  22   3   4
4   1        1  20   4   5
5   1        2   8   5   6
6   2        3  10   6   1
7   2        4  11   7   2
8   2        3  10   8   3
9   2        4  13   9   4
10  3        5  14  10   1
11  3        5  10  11   2

And finally, just pivoting with k.pivot_table(index="A", columns="rn", values=0):

rn    1   2   3   4   5   6
A
1    10  12  11  22  20   8
2    10  11  10  13 NaN NaN
3    14  10 NaN NaN NaN NaN
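On current pandas the same wide result can be sketched more directly: stack B and C into one long value column, number the rows within each A with cumcount(), and pivot. The sample data below is reconstructed from the output shown:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [10, 12, 11, 10, 11, 14],
                   'C': [22, 20, 8, 10, 13, 10]})

# Stack B and C into one value column; B-rows come before C-rows
long = pd.concat([df[['A', 'B']].rename(columns={'B': 'v'}),
                  df[['A', 'C']].rename(columns={'C': 'v'})])
long['rn'] = long.groupby('A').cumcount() + 1  # row number within each A
wide = long.pivot_table(index='A', columns='rn', values='v')
print(wide)
```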

How to combine dataframe rows, and combine their string column into list?

This groups by the name column and collapses the values of A in each group into an array of unique values:

import pandas as pd
import numpy as np

df.groupby(['name'])['A'].apply(lambda x : np.unique(list(x))).reset_index()
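A minimal runnable sketch with hypothetical data; note that np.unique also sorts, and returns a NumPy array rather than a list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['x', 'x', 'x', 'y'],
                   'A': ['b', 'a', 'a', 'c']})

# np.unique de-duplicates and sorts each group's values
out = df.groupby(['name'])['A'].apply(lambda x: np.unique(list(x))).reset_index()
print(out)
```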

Combine top n-th rows of a group into a single row of list with Pandas

As your table is already sorted descending by the amount column, you can get the top n rows for each group with GroupBy.head(n). To then collect the item column of those rows into a list, use GroupBy.agg(), as follows:

n = 2      # define n

(df.groupby('store_id').head(n)
   .groupby('store_id')['item'].agg(list)
 ).reset_index()

Result:

   store_id           item
0         1  [shirt, sock]
1         2  [sock, pants]
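Putting it together as a runnable sketch (the sample data is reconstructed to match the result shown, sorted descending by amount within each store):

```python
import pandas as pd

df = pd.DataFrame({'store_id': [1, 1, 1, 2, 2, 2],
                   'item': ['shirt', 'sock', 'hat', 'sock', 'pants', 'cap'],
                   'amount': [90, 70, 30, 80, 60, 20]})

n = 2  # define n
out = (df.groupby('store_id').head(n)          # top n rows per store
         .groupby('store_id')['item'].agg(list)
       ).reset_index()
print(out)
```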

Pandas groupby rows into list and sum

Use groupby with sum to aggregate the numeric data, and apply(tuple) to aggregate the index level.

g = df.reset_index(level=-1).groupby(level=[0, 1])
res = g[['F', 'M']].sum().set_index(g['level_2'].apply(tuple), append=True)

print(res)
                        F     M
                        0  5  10  30
      level_2
x  y  (a1, a2, a3, a4)  1  3   0   4
x1 y1 (a1, a2, a3, a4)  3  4   2   8
x2 y2 (a1, a2)          0  0   0   0

Note, the index can only contain hashable values, and lists are not hashable, so tuples are the next best thing.
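A self-contained sketch of the same idea on hypothetical data: g['level_2'].apply(tuple) collapses the innermost index level into one tuple per group, while the numeric columns are summed over the outer two levels:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('x', 'y', 'a1'), ('x', 'y', 'a2'), ('x2', 'y2', 'a1')],
    names=[None, None, 'level_2'])
df = pd.DataFrame({'F': [1, 2, 0], 'M': [3, 4, 0]}, index=idx)

# Move the last index level into a column, group on the outer two levels
g = df.reset_index(level=-1).groupby(level=[0, 1])

# Sum numeric columns; re-attach the collapsed level as a tuple index level
res = g[['F', 'M']].sum().set_index(g['level_2'].apply(tuple), append=True)
print(res)
```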

Pandas: groupby to list

You can use apply(list):

>>> df.groupby(['id', 'time'])['value'].apply(list)

id  time
1   2000    [5, 6, 7]
    2001          [5]
2   2000          [3]
    2001          [3]
    2005       [4, 5]
3   2000          [3]
    2005          [6]
Name: value, dtype: object

If you really want it in the exact format you displayed, you can then group by id and apply list again, but this is not efficient, and that format is arguably harder to work with:

>>> df.groupby(['id','time'])['value'].apply(list).groupby('id').apply(list).tolist()
[[[5, 6, 7], [5]], [[3], [3], [4, 5]], [[3], [6]]]
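As a runnable sketch on hypothetical data (grouping the intermediate Series by 'id' works because that index level is named):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2],
                   'time': [2000, 2000, 2000, 2001, 2000],
                   'value': [5, 6, 7, 5, 3]})

# First a list per (id, time), then nest those lists per id
lists = df.groupby(['id', 'time'])['value'].apply(list)
nested = lists.groupby('id').apply(list).tolist()
print(nested)
```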

Filtering rows in Pandas Groupby based on a condition within the group

You can group by account_id, drop the rows before the first initial_balance in each group, and then take a running cumsum() over the amount column:

out = df.groupby('account_id').apply(lambda g: g[g['data_type'].eq('initial_balance').cumsum().eq(1)]).reset_index(drop=True)
out['amount'] = out.groupby('account_id')['amount'].cumsum()
print(out)

  account_id        data_type transaction_date  amount
0       1001  initial_balance       2022-04-01     100
1       1001          payment       2022-04-14      80
2       1002  initial_balance       2022-04-02     200
3       1002          payment       2022-04-13     180
4       1002          payment       2022-05-01     160
5       1002          payment       2022-05-03     140
6       1003  initial_balance       2022-04-10     150
7       1003          payment       2022-04-20     100
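A self-contained variant sketch with hypothetical data, building the mask with transform instead of groupby.apply (which newer pandas deprecation-warns about when the grouping column is involved):

```python
import pandas as pd

df = pd.DataFrame({
    'account_id': [1001, 1001, 1001],
    'data_type': ['fee', 'initial_balance', 'payment'],
    'amount': [5, 100, -20],
})

# True from the first initial_balance row of each account onwards
mask = (df.groupby('account_id')['data_type']
          .transform(lambda s: s.eq('initial_balance').cumsum().eq(1)))
out = df[mask].reset_index(drop=True)
out['amount'] = out.groupby('account_id')['amount'].cumsum()
print(out)
```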

