How to group dataframe rows into list in pandas groupby
You can do this using groupby to group on the column of interest and then apply list to every group:
In [1]: df = pd.DataFrame( {'a':['A','A','B','B','B','C'], 'b':[1,2,5,5,4,6]})
df
Out[1]:
a b
0 A 1
1 A 2
2 B 5
3 B 5
4 B 4
5 C 6
In [2]: df.groupby('a')['b'].apply(list)
Out[2]:
a
A [1, 2]
B [5, 5, 4]
C [6]
Name: b, dtype: object
In [3]: df1 = df.groupby('a')['b'].apply(list).reset_index(name='new')
df1
Out[3]:
a new
0 A [1, 2]
1 B [5, 5, 4]
2 C [6]
groupby and convert rows into list using pandas
df['e_values'] = df.filter(like='col_').apply(list, axis=1)
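A minimal runnable sketch of the line above, with hypothetical column names (any frame with columns whose names contain "col_" works the same way):

```python
import pandas as pd

# Hypothetical frame with two "col_" columns (names assumed for illustration)
df = pd.DataFrame({'col_a': [1, 4], 'col_b': [2, 5], 'other': ['x', 'y']})

# filter(like='col_') keeps only columns whose name contains 'col_';
# apply(list, axis=1) turns each such row into a Python list
df['e_values'] = df.filter(like='col_').apply(list, axis=1)
print(df['e_values'].tolist())  # [[1, 2], [4, 5]]
```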
Group by and aggregate rows into list of series or dicts in Pandas
Try this
compact_df = df.groupby('ID').apply(lambda group: group.to_dict(orient='records'))
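To see what this produces, here is a sketch on a toy frame (data assumed for illustration); each group becomes a list of row dicts, one dict per row:

```python
import pandas as pd

# Toy frame assumed for illustration
df = pd.DataFrame({'ID': [1, 1, 2], 'val': [10, 20, 30]})

# to_dict(orient='records') turns each group's rows into a list of dicts,
# so compact_df is a Series of lists indexed by ID
compact_df = df.groupby('ID').apply(lambda group: group.to_dict(orient='records'))
print(compact_df.loc[1])
```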
Can pandas groupby aggregate into a list, rather than sum, mean, etc?
My solution is a bit longer than you might expect; I'm sure it could be shortened, but:
g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"])))
k = g.reset_index()
k["i"] = k.index
k["rn"] = k.groupby("A")["i"].rank()
k.pivot_table(index="A", columns="rn", values=0)
# output
# rn 1 2 3 4 5 6
# A
# 1 10 12 11 22 20 8
# 2 10 11 10 13 NaN NaN
# 3 14 10 NaN NaN NaN NaN
A bit of explanation. The first line, g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"]))), groups df by A and then puts columns B and C into one column:
A
1 0 10
1 12
2 11
0 22
1 20
2 8
2 3 10
4 11
3 10
4 13
3 5 14
5 10
Then k = g.reset_index(), creating a sequential index; the result is:
A level_1 0
0 1 0 10
1 1 1 12
2 1 2 11
3 1 0 22
4 1 1 20
5 1 2 8
6 2 3 10
7 2 4 11
8 2 3 10
9 2 4 13
10 3 5 14
11 3 5 10
Now I want to move this index into a column (I'd like to hear how I can make a sequential column without resetting the index), k["i"] = k.index:
A level_1 0 i
0 1 0 10 0
1 1 1 12 1
2 1 2 11 2
3 1 0 22 3
4 1 1 20 4
5 1 2 8 5
6 2 3 10 6
7 2 4 11 7
8 2 3 10 8
9 2 4 13 9
10 3 5 14 10
11 3 5 10 11
Now, k["rn"] = k.groupby("A")["i"].rank() will add a row number inside each A (like row_number() over (partition by A order by i) in SQL):
A level_1 0 i rn
0 1 0 10 0 1
1 1 1 12 1 2
2 1 2 11 2 3
3 1 0 22 3 4
4 1 1 20 4 5
5 1 2 8 5 6
6 2 3 10 6 1
7 2 4 11 7 2
8 2 3 10 8 3
9 2 4 13 9 4
10 3 5 14 10 1
11 3 5 10 11 2
And finally, just pivot with k.pivot_table(index="A", columns="rn", values=0):
rn 1 2 3 4 5 6
A
1 10 12 11 22 20 8
2 10 11 10 13 NaN NaN
3 14 10 NaN NaN NaN NaN
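On the parenthetical question above: GroupBy.cumcount() produces exactly that sequential per-group column, with no index reset. On current pandas the whole pipeline can also be sketched more compactly with melt (same toy data, reconstructed from the output above):

```python
import pandas as pd

# Toy data matching the walkthrough above
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [10, 12, 11, 10, 11, 14],
                   'C': [22, 20, 8, 10, 13, 10]})

# melt stacks B and C into one long "value" column (all B rows, then all C rows)
g = df.melt(id_vars='A', value_vars=['B', 'C'])

# cumcount() numbers rows 0, 1, 2, ... inside each A group -- a sequential
# column without resetting the index, replacing the rank() step
g['rn'] = g.groupby('A').cumcount()

out = g.pivot_table(index='A', columns='rn', values='value')
print(out)
```

The result is the same table as above, with columns numbered from 0 instead of 1.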
How to combine dataframe rows, and combine their string column into list?
This will group by the name column and collapse the values in A into a list of unique values:
import pandas as pd
import numpy as np
df.groupby(['name'])['A'].apply(lambda x : np.unique(list(x))).reset_index()
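A small runnable sketch of the above (toy data assumed; note np.unique also sorts the values within each group):

```python
import pandas as pd
import numpy as np

# Toy frame assumed for illustration
df = pd.DataFrame({'name': ['n1', 'n1', 'n1', 'n2'],
                   'A': ['x', 'y', 'x', 'z']})

# np.unique drops duplicates within each group and returns a sorted array
out = df.groupby(['name'])['A'].apply(lambda x: np.unique(list(x))).reset_index()
print(out)
```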
Combine top n-th rows of a group into a single row of list with Pandas
As your table is already sorted in descending order by the amount column, you can get the top n rows of each group with GroupBy.head(n). To further collect the item column of these top n rows into a list, you can use GroupBy.agg(), as follows:
n = 2 # define n
(df.groupby('store_id').head(n)
.groupby('store_id')['item'].agg(list)
).reset_index()
Result:
store_id item
0 1 [shirt, sock]
1 2 [sock, pants]
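End to end, on toy data assumed to match the question's layout (already sorted descending by amount within each store):

```python
import pandas as pd

# Toy data assumed from the question; sorted descending by amount per store
df = pd.DataFrame({'store_id': [1, 1, 1, 2, 2],
                   'item': ['shirt', 'sock', 'hat', 'sock', 'pants'],
                   'amount': [30, 20, 5, 40, 25]})

n = 2  # define n
out = (df.groupby('store_id').head(n)        # first n rows of each group
         .groupby('store_id')['item'].agg(list)
       ).reset_index()
print(out)
```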
Pandas groupby rows into list and sum
Use groupby and sum for aggregating the numeric data, and apply(tuple) for aggregating the index level.
g = df.reset_index(level=-1).groupby(level=[0, 1])
res = g.sum().set_index(g.level_2.apply(tuple), append=True)
print(res)
                          F      M
                          0  5  10  30
       level_2
x   y  (a1, a2, a3, a4)   1  3   0   4
x1  y1 (a1, a2, a3, a4)   3  4   2   8
x2  y2 (a1, a2)           0  0   0   0
Note, the index can only contain hashable values, and lists are not hashable, so tuples are the next best thing.
Pandas: groupby to list
You can use apply(list):
>>> df.groupby(['id', 'time'])['value'].apply(list)
id time
1 2000 [5, 6, 7]
2001 [5]
2 2000 [3]
2001 [3]
2005 [4, 5]
3 2000 [3]
2005 [6]
Name: value, dtype: object
If you really want it in the exact format you displayed, you can then groupby id and apply list again, but this is not efficient, and that format is arguably harder to work with...
>>> df.groupby(['id','time'])['value'].apply(list).groupby('id').apply(list).tolist()
[[[5, 6, 7], [5]], [[3], [3], [4, 5]], [[3], [6]]]
Filtering rows in Pandas Groupby based on a condition within the group
You can group by account_id and keep only the rows from the first initial_balance onward, then take cumsum() on the amount column:
out = df.groupby('account_id').apply(lambda g: g[g['data_type'].eq('initial_balance').cumsum().eq(1)]).reset_index(drop=True)
out['amount'] = out.groupby('account_id')['amount'].cumsum()
print(out)
account_id data_type transaction_date amount
0 1001 initial_balance 2022-04-01 100
1 1001 payment 2022-04-14 80
2 1002 initial_balance 2022-04-02 200
3 1002 payment 2022-04-13 180
4 1002 payment 2022-05-01 160
5 1002 payment 2022-05-03 140
6 1003 initial_balance 2022-04-10 150
7 1003 payment 2022-04-20 100
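The same approach run on a smaller toy ledger (data assumed from the question's output, with payments stored as negative amounts):

```python
import pandas as pd

# Toy ledger assumed for illustration; payments stored as negative amounts
df = pd.DataFrame({
    'account_id': [1001, 1001, 1001, 1002, 1002, 1002],
    'data_type': ['payment', 'initial_balance', 'payment',
                  'initial_balance', 'payment', 'payment'],
    'amount': [50, 100, -20, 200, -20, -20]})

# The boolean flag's cumsum() is 0 before the first initial_balance and 1
# from that row onward, so eq(1) drops everything before it in each group
out = (df.groupby('account_id')
         .apply(lambda g: g[g['data_type'].eq('initial_balance').cumsum().eq(1)])
         .reset_index(drop=True))
out['amount'] = out.groupby('account_id')['amount'].cumsum()
print(out)
```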