How to loop over grouped Pandas dataframe?
df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you can no longer loop over the groups.
In general, df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), and you can iterate through its groups (as explained in the docs). You can do something like:
grouped = df.groupby('A')
for name, group in grouped:
    ...
When you apply a function to the groupby, in your example df.groupby(...).agg(...) (but it could also be transform, apply, mean, ...), the results of applying the function to the different groups are combined into one DataFrame (the apply and combine steps of the 'split-apply-combine' paradigm of groupby). So the result will always be a DataFrame again (or a Series, depending on the applied function).
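To make the distinction concrete, here is a minimal sketch. The data and the "product" column are made up for illustration; only the column name l_customer_id_i comes from the question. It shows that the GroupBy object can be iterated, while the result of agg is an ordinary DataFrame:

```python
import pandas as pd

# Hypothetical data; only the grouping column name comes from the question.
df = pd.DataFrame({
    "l_customer_id_i": [1, 1, 2],
    "product": ["a", "b", "c"],
})

grouped = df.groupby("l_customer_id_i")

# The GroupBy object yields (key, sub-DataFrame) pairs.
group_sizes = {name: len(group) for name, group in grouped}

# After agg, the per-group results are combined into a single DataFrame,
# indexed by the group key.
joined = grouped.agg(lambda x: ",".join(x))
print(type(joined).__name__)
print(joined)
```

If you need both, keep a reference to the GroupBy object (grouped above) and call agg on it separately, rather than chaining everything in one expression.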
How to iterate over pandas DataFrameGroupBy and select all entries per grouped variable for specific column?
I think I would do it like this:
Create some data for testing
import numpy as np
import pandas as pd
df = pd.DataFrame({'Id': np.random.randint(1, 10, 100), 'Type': np.random.choice(list('ABCD'), 100), 'Guid': np.random.randint(10000, 99999, 100)})
print(df.head())
Id Type Guid
0 2 A 89247
1 4 B 39262
2 3 C 45522
3 1 B 99724
4 4 C 51322
Choose n for number of records to return and groupby
n = 5
df_groups = df.groupby('Id')
Iterate through df_groups with a for loop and print the first n rows of each group
for name, group in df_groups:
    print('ID: ' + str(name))
    print(group.head(n))
    print("\n")
Output:
ID: 1
Id Type Guid
3 1 B 99724
5 1 B 74182
37 1 D 49219
47 1 B 81464
65 1 C 84925
ID: 2
Id Type Guid
0 2 A 89247
6 2 A 16499
7 2 A 79956
34 2 C 56393
40 2 A 49883
.
.
.
EDIT To print all the Guids in a list for each ID you can use the following:
for name, group in df_groups:
    print('ID: ' + str(name))
    print(group.Guid.tolist())
    print("\n")
Output:
ID: 1
[99724, 74182, 49219, 81464, 84925, 67834, 43275, 35743, 36478, 94662, 21183]
ID: 2
[89247, 16499, 79956, 56393, 49883, 97633, 11768, 14639, 88591, 31263, 98729]
ID: 3
[45522, 13971, 75882, 96489, 58414, 22051, 80304, 46144, 22481, 11278, 84622, 61145]
ID: 4
[39262, 51322, 76930, 83740, 60152, 90735, 42039, 22114, 76077, 83234, 96134, 93559, 87903, 98199, 76096, 64378]
ID: 5
[13444, 55762, 13206, 94768, 19665, 75761, 90755, 45737, 23506, 89345, 94912, 81200, 91868]
.
.
.
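If you only need the lists and not the printing, the loop can be avoided. The sketch below uses a small fixed DataFrame (made-up values, not the random data above) and collects the Guid values per Id with apply(list):

```python
import pandas as pd

# Small deterministic stand-in for the random test data (values are made up).
df = pd.DataFrame({
    "Id": [1, 2, 1, 2, 3],
    "Guid": [11111, 22222, 33333, 44444, 55555],
})

# Select the column on the GroupBy object, then turn each group into a list.
guids_per_id = df.groupby("Id")["Guid"].apply(list)

# Convert to a plain dict if that is more convenient downstream.
guid_dict = guids_per_id.to_dict()
print(guid_dict)
```

The result is a Series indexed by Id whose values are Python lists, so guids_per_id[1] returns the Guids for ID 1.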
Grouping a grouped dataframe in a nested loop
We can't know with what you've given us, but your code should work fine... it does for me.
import pandas as pd
df = pd.DataFrame({'EmployeeNo': [11111, 11112, 11113, 11115, 11116, 11128],
                   'OutletName': ['Outlet1', 'Outlet2', 'Outlet3', 'Outlet4', 'Outlet5', 'Outlet6'],
                   'EmployeeName': ['John', 'Tom', 'Bob', 'Sam', 'Sean', 'Zac'],
                   'TargetAmount': [1000, 500, 400, 500, 300, 800]})
g = df.groupby('OutletName')
for a, b in g:
    for c, d in b.groupby('EmployeeName'):
        print(c)
Output:
John
Tom
Bob
Sam
Sean
Zac
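As an alternative to nesting the loops, you can group by both columns at once. This is a sketch on a trimmed, modified variant of the data (two employees share Outlet1 here, which is an assumption made so the nesting is visible); each key is then an (OutletName, EmployeeName) tuple:

```python
import pandas as pd

# Modified sample data: two employees share Outlet1 (illustrative assumption).
df = pd.DataFrame({
    "EmployeeNo": [11111, 11112, 11113],
    "OutletName": ["Outlet1", "Outlet1", "Outlet2"],
    "EmployeeName": ["John", "Tom", "Bob"],
    "TargetAmount": [1000, 500, 400],
})

pairs = []
# Grouping by a list of columns yields tuple keys, one per combination.
for (outlet, employee), rows in df.groupby(["OutletName", "EmployeeName"]):
    pairs.append((outlet, employee, int(rows["TargetAmount"].sum())))

print(pairs)
```

The nested version is still useful when you need to do something once per outlet before visiting its employees; the flat version is simpler when you only care about the innermost groups.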
Iterating over groups (Python pandas dataframe)
The object returned by .groupby() has a .groups attribute that maps each group key to the index labels of the rows in that group. In this case:
In [26]: df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
....: 'B': ['me', 'you', 'me'] * 2,
....: 'C': [5, 2, 3, 4, 6, 9]})
In [27]: groups = df.groupby('A')
In [28]: groups.groups
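Out[28]: {'bar': [1, 3, 5], 'foo': [0, 2, 4]}
You can iterate over consecutive pairs of groups as follows:
# dict keys are not indexable in Python 3, so materialise them as a list
keys = list(groups.groups.keys())
for index in range(len(keys) - 1):
    # .ix has been removed from pandas; .loc selects rows by index label
    g1 = df.loc[groups.groups[keys[index]]]
    g2 = df.loc[groups.groups[keys[index + 1]]]
    # Do something with g1, g2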
However, please remember that using for loops to iterate over Pandas objects is generally slower than vectorized operations. Depending on what you need done, and if it needs to be fast, you may want to try other approaches.
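As a rough illustration of that trade-off, the sketch below computes the same per-group total twice on the example DataFrame: once with an explicit loop and once with a built-in aggregation. The choice of summing column C is only an example; the original question may need something else.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar"] * 3,
                   "B": ["me", "you", "me"] * 2,
                   "C": [5, 2, 3, 4, 6, 9]})

# Explicit loop: one Python-level iteration per group.
loop_totals = {}
for name, group in df.groupby("A"):
    loop_totals[name] = int(group["C"].sum())

# Built-in aggregation: the per-group work happens inside pandas.
vector_totals = df.groupby("A")["C"].sum()

print(loop_totals)
print(vector_totals)
```

On a six-row frame the difference is negligible; it tends to matter once there are many groups.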
How to for loop on pandas dataframe by grouping the values by dates then extracting the filtered group to be saved as a new dataframe
You can create an empty dictionary, group the rows by date, and store each group under its date in a loop.
df_dict = {}
for name, group in df.groupby('Date_UTC'):
    df_dict[name] = group
df_dict.values()
dict_values([    Date_UTC    Magnitude    Vector Station
0 2020-01-05    26.474679 -0.730455       A
1 2020-01-05    30.746291  0.020503       B
2 2020-01-05    37.829401  0.252316       C
3 2020-01-05  1904.611372  0.977388       D,     Date_UTC    Magnitude    Vector Station
4 2020-01-19    38.441813 -0.044736       B
5 2020-01-19    31.067455  0.419826       C
6 2020-01-19    15.972198 -0.592661       A
7 2020-01-19  1261.038155  0.977394       D,     Date_UTC    Magnitude    Vector Station
...])
You can also simply iterate over the groups and print each one:
for name, group in df.groupby('Date_UTC'):
    print('split_date:', name)
    print(group)
split_date: 2020-01-05 00:00:00
    Date_UTC    Magnitude    Vector Station
0 2020-01-05    26.474679 -0.730455       A
1 2020-01-05    30.746291  0.020503       B
2 2020-01-05    37.829401  0.252316       C
3 2020-01-05  1904.611372  0.977388       D
split_date: 2020-01-19 00:00:00
    Date_UTC    Magnitude    Vector Station
4 2020-01-19    38.441813 -0.044736       B
5 2020-01-19    31.067455  0.419826       C
6 2020-01-19    15.972198 -0.592661       A
7 2020-01-19  1261.038155  0.977394       D
....
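Create a list of dataframes, one per date
# Assigning groups into a list of strings such as 'df_0' does not create
# variables with those names, so collect the groups and access them by position.
df_list = []
for name, group in df.groupby('Date_UTC'):
    df_list.append(group)
df_list[0]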
    Date_UTC    Magnitude    Vector Station
0 2020-01-05    26.474679 -0.730455       A
1 2020-01-05    30.746291  0.020503       B
2 2020-01-05    37.829401  0.252316       C
3 2020-01-05  1904.611372  0.977388       D
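If you really want names such as df_0, df_1, ... for the pieces, a dictionary keyed by those names is usually safer than creating variables dynamically. The following is a sketch on made-up data (only the column names Date_UTC, Magnitude and Station are taken from the example above; the values are invented):

```python
import pandas as pd

# Made-up rows; column names follow the example, values are illustrative.
df = pd.DataFrame({
    "Date_UTC": pd.to_datetime(["2020-01-05", "2020-01-05", "2020-01-19"]),
    "Magnitude": [26.5, 30.7, 38.4],
    "Station": ["A", "B", "B"],
})

# Number the groups in iteration order and key each one by "df_<i>".
named_frames = {
    f"df_{i}": group.reset_index(drop=True)
    for i, (_, group) in enumerate(df.groupby("Date_UTC"))
}

df_0 = named_frames["df_0"]
print(df_0)
```

Here reset_index(drop=True) gives each piece its own 0-based index; drop that call if you want to keep the original row labels.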