How to Loop Over Grouped Pandas Dataframe

How to loop over grouped Pandas dataframe?

df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a dataframe, so you can no longer loop over the groups.

In general:

  • df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), and with this you can iterate through the groups (as explained in the pandas docs). You can do something like:

    grouped = df.groupby('A')

    for name, group in grouped:
        ...
  • When you apply a function on the groupby, as in your example df.groupby(...).agg(...) (but this can also be transform, apply, mean, ...), you combine the results of applying the function to the different groups into one dataframe (the apply and combine steps of the 'split-apply-combine' paradigm of groupby). So the result will always again be a DataFrame (or a Series, depending on the applied function).
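To make the distinction concrete, here is a minimal sketch (the column names `A` and `C` are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'x', 'y'], 'C': [1, 2, 3]})

# Aggregation: split-apply-combine, the result is a single DataFrame
agg_result = df.groupby('A').agg('sum')
print(type(agg_result).__name__)   # DataFrame

# Iteration: the GroupBy object itself yields (name, group) pairs
for name, group in df.groupby('A'):
    print(name, len(group))
```

So you iterate over the GroupBy object itself, before any aggregating function has been applied.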

How to iterate over pandas DataFrameGroupBy and select all entries per grouped variable for specific column?

I think I would do it like this:

Create some data for testing

import numpy as np
import pandas as pd

df = pd.DataFrame({'Id': np.random.randint(1, 10, 100),
                   'Type': np.random.choice(list('ABCD'), 100),
                   'Guid': np.random.randint(10000, 99999, 100)})

print(df.head())
   Id Type   Guid
0   2    A  89247
1   4    B  39262
2   3    C  45522
3   1    B  99724
4   4    C  51322

Choose n for number of records to return and groupby

n = 5
df_groups = df.groupby('Id')

Iterate through df_groups with a for loop and print

for name, group in df_groups:
    print('ID: ' + str(name))
    print(group.head(n))
    print("\n")

Output:

ID: 1
    Id Type   Guid
3    1    B  99724
5    1    B  74182
37   1    D  49219
47   1    B  81464
65   1    C  84925


ID: 2
    Id Type   Guid
0    2    A  89247
6    2    A  16499
7    2    A  79956
34   2    C  56393
40   2    A  49883
.
.
.

EDIT To print all the Guids in a list for each ID you can use the following:

for name, group in df_groups:
    print('ID: ' + str(name))
    print(group.Guid.tolist())
    print("\n")

Output:

ID: 1
[99724, 74182, 49219, 81464, 84925, 67834, 43275, 35743, 36478, 94662, 21183]


ID: 2
[89247, 16499, 79956, 56393, 49883, 97633, 11768, 14639, 88591, 31263, 98729]


ID: 3
[45522, 13971, 75882, 96489, 58414, 22051, 80304, 46144, 22481, 11278, 84622, 61145]


ID: 4
[39262, 51322, 76930, 83740, 60152, 90735, 42039, 22114, 76077, 83234, 96134, 93559, 87903, 98199, 76096, 64378]


ID: 5
[13444, 55762, 13206, 94768, 19665, 75761, 90755, 45737, 23506, 89345, 94912, 81200, 91868]
.
.
.
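If the per-ID lists are the end goal, the same result can be produced without an explicit loop. A sketch, assuming the same `Id`/`Guid` columns as above (with a small fixed frame here so the output is deterministic):

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 1, 2, 2, 2],
                   'Guid': [99724, 74182, 89247, 16499, 79956]})

# Collect all Guids per Id into a list in one step
guids = df.groupby('Id')['Guid'].apply(list)
print(guids.loc[1])   # [99724, 74182]
```

This gives a Series indexed by `Id`, with one Python list per group.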

Grouping a grouped dataframe in a nested loop

It's hard to say for certain from what you've given us, but your code should work fine; it does for me:

df = pd.DataFrame({'EmployeeNo': [11111, 11112, 11113, 11115, 11116, 11128],
                   'OutletName': ['Outlet1', 'Outlet2', 'Outlet3', 'Outlet4', 'Outlet5', 'Outlet6'],
                   'EmployeeName': ['John', 'Tom', 'Bob', 'Sam', 'Sean', 'Zac'],
                   'TargetAmount': [1000, 500, 400, 500, 300, 800]})

g = df.groupby('OutletName')
for a, b in g:
    for c, d in b.groupby('EmployeeName'):
        print(c)

Output:

John
Tom
Bob
Sam
Sean
Zac
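If the goal is simply one group per (outlet, employee) combination, both keys can be passed to a single groupby instead of nesting loops. A sketch on a trimmed-down version of the same toy data:

```python
import pandas as pd

df = pd.DataFrame({'OutletName': ['Outlet1', 'Outlet2'],
                   'EmployeeName': ['John', 'Tom'],
                   'TargetAmount': [1000, 500]})

# Grouping by both columns at once replaces the nested loops;
# each iteration yields an (outlet, employee) tuple as the key
for (outlet, employee), group in df.groupby(['OutletName', 'EmployeeName']):
    print(outlet, employee)
```

This is equivalent to the nested version when every inner group is visited, and it keeps the loop flat.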

Iterating over groups (Python pandas dataframe)

The .groupby() object has a .groups attribute that returns a Python dict of indices. In this case:

In [26]: df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
....: 'B': ['me', 'you', 'me'] * 2,
....: 'C': [5, 2, 3, 4, 6, 9]})

In [27]: groups = df.groupby('A')

In [28]: groups.groups
Out[28]: {'bar': [1, 3, 5], 'foo': [0, 2, 4]}

You can iterate over this as follows:

keys = list(groups.groups.keys())
for index in range(len(keys) - 1):
    g1 = df.loc[groups.groups[keys[index]]]
    g2 = df.loc[groups.groups[keys[index + 1]]]
    # Do something with g1, g2

However, please remember that using for loops to iterate over Pandas objects is generally slower than vector operations. Depending on what you need done, and if it needs to be fast, you may want to try other approaches.
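As an illustration of that point, a per-group statistic computed in a loop can usually be replaced by a single aggregation call (using the same toy frame as above):

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'C': [5, 2, 3, 4, 6, 9]})

# Looping to compute a per-group mean...
means_loop = {name: group['C'].mean() for name, group in df.groupby('A')}

# ...is usually better written as one vectorized aggregation
means_vec = df.groupby('A')['C'].mean()
print(means_vec)
```

Both give the same numbers, but the aggregation version stays inside pandas' optimized code paths.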

How to for loop on pandas dataframe by grouping the values by dates then extracting the filtered group to be saved as a new dataframe

You can create an empty dictionary, group the rows by date, and fill the dictionary in a loop:

df_dict = {}
for name, group in df.groupby('Date_UTC'):
    df_dict[name] = group

df_dict.values()
dict_values([     Date_UTC    Magnitude    Vector Station
0  2020-01-05    26.474679 -0.730455       A
1  2020-01-05    30.746291  0.020503       B
2  2020-01-05    37.829401  0.252316       C
3  2020-01-05  1904.611372  0.977388       D,
      Date_UTC    Magnitude    Vector Station
4  2020-01-19    38.441813 -0.044736       B
5  2020-01-19    31.067455  0.419826       C
6  2020-01-19    15.972198 -0.592661       A
7  2020-01-19  1261.038155  0.977394       D,
...])

You can also simply print each group as you iterate:

for name, group in df.groupby('Date_UTC'):
    print('split_date:', name)
    print(group)

split_date: 2020-01-05 00:00:00
     Date_UTC    Magnitude    Vector Station
0  2020-01-05    26.474679 -0.730455       A
1  2020-01-05    30.746291  0.020503       B
2  2020-01-05    37.829401  0.252316       C
3  2020-01-05  1904.611372  0.977388       D
split_date: 2020-01-19 00:00:00
     Date_UTC    Magnitude    Vector Station
4  2020-01-19    38.441813 -0.044736       B
5  2020-01-19    31.067455  0.419826       C
6  2020-01-19    15.972198 -0.592661       A
7  2020-01-19  1261.038155  0.977394       D
....

Create dataframe

Note that assigning groups into a list of strings like 'df_0' does not create variables with those names; append the groups to a list and index into it instead:

df_list = []
for name, group in df.groupby('Date_UTC'):
    df_list.append(group)

df_list[0]
     Date_UTC    Magnitude    Vector Station
0  2020-01-05    26.474679 -0.730455       A
1  2020-01-05    30.746291  0.020503       B
2  2020-01-05    37.829401  0.252316       C
3  2020-01-05  1904.611372  0.977388       D
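A dictionary keyed by group name is usually more robust than positional storage, and it can be built in one expression (equivalent to the dictionary loop above; the small frame here is invented so the output is deterministic):

```python
import pandas as pd

df = pd.DataFrame({'Date_UTC': ['2020-01-05', '2020-01-05', '2020-01-19'],
                   'Magnitude': [26.5, 30.7, 38.4]})

# Build the {date: sub-DataFrame} mapping in one expression;
# iterating a GroupBy yields (name, group) pairs, which dict() accepts
df_dict = dict(tuple(df.groupby('Date_UTC')))
print(list(df_dict.keys()))   # ['2020-01-05', '2020-01-19']
```

Each value in `df_dict` is the sub-DataFrame for that date, ready to be saved or processed further.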

