How to Access Pandas Groupby Dataframe by Key


You can use the get_group method:

In [21]: gb.get_group('foo')
Out[21]:
     A         B   C
0  foo  1.624345   5
2  foo -0.528172  11
4  foo  0.865408  14

Note: this doesn't require building an intermediate dictionary holding a copy of every sub-DataFrame, so it is much more memory-efficient than the naive dict(iter(gb)). It works from data structures the groupby object already holds.
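A minimal, self-contained sketch of the same idea. It uses a small hypothetical frame: the columns A, B, C and their values are made up for illustration and are not the data behind the output above. It checks that get_group returns the same rows as the naive dictionary approach, without first materializing every group:

```python
import pandas as pd

# Hypothetical data: column "A" holds the group keys.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": [1.5, -0.6, -0.5, -1.1, 0.9, -2.3],
    "C": [5, 7, 11, 3, 14, 2],
})
gb = df.groupby("A")

# Pull out a single group directly from the groupby object.
foo = gb.get_group("foo")
print(foo)

# The naive alternative copies every group into a dict first.
groups = dict(iter(gb))
print(foo.equals(groups["foo"]))  # True
```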


You can select specific columns by indexing the groupby object before calling get_group:

In [22]: gb[["A", "B"]].get_group("foo")
Out[22]:
     A         B
0  foo  1.624345
2  foo -0.528172
4  foo  0.865408

In [23]: gb["C"].get_group("foo")
Out[23]:
0     5
2    11
4    14
Name: C, dtype: int64
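The next sketch uses a similar hypothetical frame with invented data. It shows that selecting a list of columns before get_group yields a DataFrame, while selecting a single column yields a Series:

```python
import pandas as pd

# Hypothetical data with grouping column "A".
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": [1.5, -0.6, -0.5, -1.1, 0.9, -2.3],
    "C": [5, 7, 11, 3, 14, 2],
})
gb = df.groupby("A")

# A list of columns gives back a DataFrame restricted to those columns.
sub = gb[["B", "C"]].get_group("foo")
print(type(sub).__name__, list(sub.columns))  # DataFrame ['B', 'C']

# A single column label gives back a Series.
c = gb["C"].get_group("foo")
print(type(c).__name__, c.tolist())  # Series [5, 11, 14]
```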

How to access group keys during aggregation in pandas groupby?

Use:

df1 = df.groupby('score').agg(score_age = ('age', lambda x: x.name + x.sum()))
print(df1)
       score_age
score
2             42
3             48
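You may prefer not to rely on the .name of the Series passed to the lambda. An alternative sketch aggregates first and then combines the result with its index, which holds the group keys. The frame below is hypothetical, with values picked so the result matches the output above:

```python
import pandas as pd

# Hypothetical data chosen to reproduce the output above.
df = pd.DataFrame({"score": [2, 2, 3, 3], "age": [20, 20, 20, 25]})

# Aggregate first; the group keys end up in the index.
age_sum = df.groupby("score")["age"].sum()

# Add each group's key (taken from the index) to its aggregated value.
df1 = (age_sum + age_sum.index.to_series()).to_frame("score_age")
print(df1)
```

Because the addition aligns on the index, each key is added to its own group's sum.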

Get all keys from GroupBy object in Pandas

You can access this via the .groups attribute of the groupby object. It returns a dict whose keys are the groups:

In [40]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'group':[0,1,1,1,2,2,3,3,3], 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()

Out[40]:
dict_keys([0, 1, 2, 3])

Here is the output of gp.groups:

In [41]:
gp.groups

Out[41]:
{0: Int64Index([0], dtype='int64'),
1: Int64Index([1, 2, 3], dtype='int64'),
2: Int64Index([4, 5], dtype='int64'),
3: Int64Index([6, 7, 8], dtype='int64')}

Update

Because groups is a dict, the group order isn't necessarily maintained when you call keys() (this reflects older Python versions, where dicts did not preserve insertion order):

In [65]:
df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()

Out[65]:
dict_keys(['b', 'e', 'g', 'a', 'x'])

If you display gp.groups itself, the keys are shown in sorted order:

In [79]:
gp.groups

Out[79]:
{'a': Int64Index([2, 3, 4], dtype='int64'),
'b': Int64Index([0, 5, 8], dtype='int64'),
'e': Int64Index([7], dtype='int64'),
'g': Int64Index([1], dtype='int64'),
'x': Int64Index([6], dtype='int64')}

A hack around the keys() ordering is to access the .name attribute of each group, which holds the group key:

In [78]:
gp.apply(lambda x: x.name)

Out[78]:
group
a    a
b    b
e    e
g    g
x    x
dtype: object

This isn't great because it isn't vectorised. However, if you already have an aggregated object, you can simply take its index values:

In [81]:
agg = gp.sum()
agg

Out[81]:
       val
group
a        9
b       13
e        7
g        1
x        6

In [83]:
agg.index.get_level_values(0)

Out[83]:
Index(['a', 'b', 'e', 'g', 'x'], dtype='object', name='group')
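Here are two other ways to list the keys, shown as a sketch on the same data as In [65]. The first iterates over the groupby object; the second takes the index of the vectorised size() result. With the default sort=True, both return the keys in sorted order:

```python
import numpy as np
import pandas as pd

# Same data as In [65] above.
df = pd.DataFrame({"group": list("bgaaabxeb"), "val": np.arange(9)})
gp = df.groupby("group")

# Iterating over the groupby yields (key, sub-DataFrame) pairs in group order.
keys_from_iter = [key for key, _ in gp]

# size() is vectorised and its index holds the group keys.
keys_from_size = gp.size().index.tolist()

print(keys_from_iter)  # ['a', 'b', 'e', 'g', 'x']
print(keys_from_size)  # ['a', 'b', 'e', 'g', 'x']
```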

Pandas: how to get a particular group after groupby?

Use grouped.get_group('foo'); it returns the rows belonging to the group 'foo'.

How do I access a pandas groupby dataframe by grouped index?

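# Move the group key out of the index and back into a column named A
gf.reset_index(level=0, inplace=True)

# Boolean-index the rows for group 'bar'
gf[gf.A == 'bar']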

returns:

     A  B
0  bar  7

Plot:

import matplotlib.pyplot as plt

plt.bar(gf.A, gf.B)
plt.show()
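The following sketch uses a hypothetical frame (the name df and its values are invented). It contrasts the reset_index approach with a plain .loc lookup on the group index, which avoids the extra step when you only need to read values:

```python
import pandas as pd

# Hypothetical grouped result: "A" ends up as the index after the sum.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"], "B": [1, 3, 2, 4]})
gf = df.groupby("A").sum()

# Option 1: select directly by the group key in the index.
print(gf.loc["bar", "B"])  # 7

# Option 2 (as above): move the index back into a column, then filter.
flat = gf.reset_index()
print(flat[flat.A == "bar"])
```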

Pandas Groupby: selection where both subgroups exist

Let's try:

# mask the groups with more than one color
s = df.groupby(['id','group'])['value'].transform('size') > 1

# boolean index the groups and query, then another groupby with max
df[s].query('color=="black"').groupby(['id','color'])['value'].max()

Output:

id  color
i1  black    5
i2  black    6
Name: value, dtype: int64
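To make the answer runnable, here is a sketch on hypothetical data in the shape the question implies (columns id, group, color and value; the values are invented). The second part is a stricter variant that is not from the answer. It swaps the row count for transform('nunique') on color, so a pair qualifies only when two distinct colors are actually present:

```python
import pandas as pd

# Hypothetical data in the shape the question implies: id, group, color, value.
df = pd.DataFrame({
    "id":    ["i1", "i1", "i1", "i2", "i2", "i2"],
    "group": ["g1", "g1", "g2", "g1", "g1", "g2"],
    "color": ["black", "white", "black", "black", "white", "white"],
    "value": [5, 3, 9, 6, 2, 8],
})

# Answer's approach: keep (id, group) pairs that have more than one row.
s = df.groupby(["id", "group"])["value"].transform("size") > 1
res = df[s].query('color == "black"').groupby(["id", "color"])["value"].max()
print(res)

# Stricter variant (not from the answer): require two distinct colors.
both = df.groupby(["id", "group"])["color"].transform("nunique") > 1
res2 = df[both].query('color == "black"').groupby(["id", "color"])["value"].max()
print(res2.equals(res))  # True
```

On this data the two masks agree. They would differ only if a pair had several rows of the same color.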

How do I access data inside a pandas dataframe groupby object?

As noted in the Group By: split-apply-combine documentation, the data are stored in a GroupBy object, which is a data structure with special attributes.

You can verify this for yourself:

>>> type(df_grouped)

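This should return something like the following (recent pandas versions report the class as pandas.core.groupby.generic.DataFrameGroupBy):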

<class 'pandas.core.groupby.DataFrameGroupBy'>

The structure of the data is well explained by this snippet from the docs:

The groups attribute is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group.

As you noticed, you can easily iterate through each individual group. However, groupby objects often provide vectorized methods (such as sum, mean or agg) that compute the same information much more efficiently than a Python loop over the groups.
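To illustrate the difference, here is a sketch on a hypothetical frame (the key and val columns are invented). The loop and the vectorized mean produce the same per-group values, but the vectorized call avoids running Python code for each group:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"key": ["x", "y", "x", "y", "x"], "val": [1, 2, 3, 4, 8]})
df_grouped = df.groupby("key")

# Looping over groups works, but runs Python code once per group.
loop_means = {name: group["val"].mean() for name, group in df_grouped}

# The vectorized method computes the same thing in one call.
vec_means = df_grouped["val"].mean()

print(loop_means)           # per-group means built by hand
print(vec_means.to_dict())  # the same values, as a dict

# .groups maps each key to the row labels belonging to that group.
print(df_grouped.groups)
```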

Accessing groupby value based on id?

If I understand you correctly, you can use .to_dict() and then you can access your values by key (in this case id):


#... code as in your question

out = a.to_dict(orient="index")

print(out)
print(out[5]["min"]) # <-- access by `5` and `min`

Prints:

{1: {'min': 100, 'max': 200}, 3: {'min': 258, 'max': 585}, 5: {'min': 89, 'max': 632}}
89
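The question's code is not shown, so the frame and the min/max aggregation below are assumptions that happen to reproduce the printed dict. This sketch uses that hypothetical reconstruction of a, together with a .loc lookup that reads the same value without converting to a dict:

```python
import pandas as pd

# Hypothetical stand-in for the question's data, chosen to match the printed dict.
df = pd.DataFrame({
    "id": [1, 1, 3, 3, 5, 5],
    "value": [100, 200, 585, 258, 632, 89],
})
a = df.groupby("id")["value"].agg(["min", "max"])

# Option 1 (the answer's approach): convert to a nested dict keyed by id.
out = a.to_dict(orient="index")
print(out[5]["min"])  # 89

# Option 2: skip the conversion and look the value up with .loc.
print(a.loc[5, "min"])  # 89
```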

