How to access pandas groupby dataframe by key
You can use the get_group method:
In [21]: gb.get_group('foo')
Out[21]:
A B C
0 foo 1.624345 5
2 foo -0.528172 11
4 foo 0.865408 14
Note: This doesn't require creating an intermediary dictionary (a copy of every sub-DataFrame for every group), so it is much more memory-efficient than building the naive dictionary with dict(iter(gb)). This is because it uses data structures already available in the groupby object.
You can select different columns using the groupby slicing:
In [22]: gb[["A", "B"]].get_group("foo")
Out[22]:
A B
0 foo 1.624345
2 foo -0.528172
4 foo 0.865408
In [23]: gb["C"].get_group("foo")
Out[23]:
0 5
2 11
4 14
Name: C, dtype: int64
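A minimal, self-contained sketch of the calls above. The frame below is hypothetical data made up for illustration (the original df is not shown), so the values differ from the session output:

```python
import pandas as pd

# Hypothetical data; only the shape (a key column plus value columns) matters.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo"],
    "B": [1.5, 2.5, 3.5, 4.5, 5.5],
    "C": [5, 7, 11, 9, 14],
})
gb = df.groupby("A")

# All rows whose key is "foo", with their original index labels.
foo_rows = gb.get_group("foo")

# Restrict the columns first, then pick the group.
foo_bc = gb[["B", "C"]].get_group("foo")   # DataFrame with columns B and C
foo_c = gb["C"].get_group("foo")           # Series holding column C only

print(foo_rows)
print(foo_c)
```

Asking for a key that is not present raises a KeyError, so it can be worth checking `key in gb.groups` first.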
How to access group keys during aggregation in pandas groupby?
Use:
df1 = df.groupby('score').agg(score_age = ('age', lambda x: x.name + x.sum()))
print(df1)
score_age
score
2 42
3 48
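If x.name does not hold the group key inside agg in your pandas version, one alternative is to iterate over the groupby, which always yields the key explicitly. This is a sketch, not the answer's method; the data is hypothetical and was chosen only so that the totals match the output above:

```python
import pandas as pd

# Hypothetical data (the question's frame is not shown).
df = pd.DataFrame({"score": [2, 2, 3], "age": [20, 20, 45]})

# Iterating a groupby yields (key, sub-DataFrame) pairs,
# so the key is available without relying on x.name.
score_age = {
    int(key): int(key + group["age"].sum())
    for key, group in df.groupby("score")
}

# Rebuild a frame shaped like the agg result: index "score", column "score_age".
df1 = pd.Series(score_age, name="score_age").rename_axis("score").to_frame()
print(df1)
```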
Get all keys from GroupBy object in Pandas
You can access this via the .groups attribute of the groupby object. It returns a dict whose keys are the group keys:
In [40]:
df = pd.DataFrame({'group':[0,1,1,1,2,2,3,3,3], 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()
Out[40]:
dict_keys([0, 1, 2, 3])
Here is the output from groups:
In [41]:
gp.groups
Out[41]:
{0: Int64Index([0], dtype='int64'),
1: Int64Index([1, 2, 3], dtype='int64'),
2: Int64Index([4, 5], dtype='int64'),
3: Int64Index([6, 7, 8], dtype='int64')}
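As a small sketch using the same frame, the keys and the members of a single group can be pulled out as plain Python lists. The variable names are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": [0, 1, 1, 1, 2, 2, 3, 3, 3], "val": np.arange(9)})
gp = df.groupby("group")

# Group keys as a sorted list of plain ints.
keys = [int(k) for k in sorted(gp.groups)]

# .groups maps key -> index labels; .indices maps key -> integer positions.
labels_for_1 = gp.groups[1].tolist()
positions_for_1 = gp.indices[1].tolist()

print(keys, labels_for_1, positions_for_1)
```

With a default RangeIndex the labels and positions coincide; they differ once the frame has a custom index.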
Update
It looks like, because groups is a dict, the group order isn't maintained when you call keys:
In [65]:
df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()
Out[65]:
dict_keys(['b', 'e', 'g', 'a', 'x'])
If you display groups itself, you can see the order is maintained:
In [79]:
gp.groups
Out[79]:
{'a': Int64Index([2, 3, 4], dtype='int64'),
'b': Int64Index([0, 5, 8], dtype='int64'),
'e': Int64Index([7], dtype='int64'),
'g': Int64Index([1], dtype='int64'),
'x': Int64Index([6], dtype='int64')}
A hack to get the keys in group order is to access the .name attribute of each group:
In [78]:
gp.apply(lambda x: x.name)
Out[78]:
group
a a
b b
e e
g g
x x
dtype: object
This isn't great, as it isn't vectorised. However, if you already have an aggregated object, you can just get the index values:
In [81]:
agg = gp.sum()
agg
Out[81]:
val
group
a 9
b 13
e 7
g 1
x 6
In [83]:
agg.index.get_level_values(0)
Out[83]:
Index(['a', 'b', 'e', 'g', 'x'], dtype='object', name='group')
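Putting the ordering discussion together, here is a hedged sketch of two ways to get the keys in a predictable order without depending on dict ordering: the index of any aggregate (sorted by default), and unique() on the key column (order of first appearance). It reuses the frame from the example above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": list("bgaaabxeb"), "val": np.arange(9)})
gp = df.groupby("group")

# The index of an aggregate follows group order, which is sorted by default.
sorted_keys = gp.size().index.tolist()

# unique() keeps the order in which each key first appears in the column.
appearance_keys = df["group"].unique().tolist()

print(sorted_keys)
print(appearance_keys)
```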
Pandas: how to get a particular group after groupby?
Try grouped.get_group('foo'); that is what you need.
How do I access a pandas groupby dataframe by grouped index?
gf.reset_index(level=0, inplace=True)
gf[gf.A == 'bar']
returns:
A B
0 bar 7
Plot:
import matplotlib.pyplot as plt
plt.bar(gf.A, gf.B)
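For context, a sketch of how such a gf might arise and two ways to reach the "bar" row. It assumes gf is the result of an aggregation indexed by A; the data is hypothetical, since the question's frame is not shown:

```python
import pandas as pd

# Hypothetical input; gf is assumed to be an aggregate indexed by column A.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "baz"], "B": [1, 7, 3, 2]})
gf = df.groupby("A").sum()

# Option 1: look the key up directly in the index.
bar_total = int(gf.loc["bar", "B"])

# Option 2: move the key back into a column, then filter with a boolean mask.
flat = gf.reset_index()
bar_rows = flat[flat["A"] == "bar"]

print(bar_total)
print(bar_rows)
```

Using reset_index() without inplace=True leaves gf untouched, which avoids surprises if the snippet is run twice.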
Pandas Groupby: selection where both subgroups exist
Let's try:
# mask the groups with more than one color
s = df.groupby(['id','group'])['value'].transform('size') > 1
# boolean index the groups and query, then another groupby with max
df[s].query('color=="black"').groupby(['id','color'])['value'].max()
Output:
id color
i1 black 5
i2 black 6
Name: value, dtype: int64
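A runnable sketch of the same idea on made-up data (the question's frame is not shown, so the rows below are hypothetical and chosen to reproduce the output above). Note that transform('size') counts rows per group; counting distinct colors with transform('nunique') is a stricter variant and is shown alongside it as an assumption about what was intended:

```python
import pandas as pd

# Hypothetical data: (i1, g2) has only one color and should be excluded.
df = pd.DataFrame({
    "id":    ["i1", "i1", "i1", "i2", "i2"],
    "group": ["g1", "g1", "g2", "g1", "g1"],
    "color": ["black", "white", "black", "black", "white"],
    "value": [5, 3, 9, 6, 1],
})

# Row-count mask, as in the answer above.
by_size = df.groupby(["id", "group"])["value"].transform("size") > 1

# Stricter mask: the group must contain more than one distinct color.
by_colors = df.groupby(["id", "group"])["color"].transform("nunique") > 1

result = (
    df[by_colors]
    .query('color == "black"')
    .groupby(["id", "color"])["value"]
    .max()
)
print(result)
```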
How do I access data inside a pandas dataframe groupby object?
As noted in the Group By: split-apply-combine documentation, the data are stored in a GroupBy object, which is a data structure with special attributes.
You can verify this for yourself:
>>> type(df_grouped)
Should return (the exact module path depends on the pandas version):
<class 'pandas.core.groupby.DataFrameGroupBy'>
The structure of the data is well explained by this snippet from the docs:
The groups attribute is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group.
As you noticed, you can easily iterate through each individual group. However, there are often vectorized methods that work very nicely with groupby objects, and can access information and calculate things much more effectively and quickly.
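To make the contrast concrete, here is a small sketch that computes per-group totals both by iterating and with the built-in aggregation. The frame and the name df_grouped are illustrative assumptions:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"group": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
df_grouped = df.groupby("group")

# Explicit iteration: each step yields (key, sub-DataFrame).
totals_loop = {key: int(frame["val"].sum()) for key, frame in df_grouped}

# Vectorized equivalent: one call, no Python-level loop over groups.
totals_vec = df_grouped["val"].sum()

print(totals_loop)
print(totals_vec)
```

Both give the same numbers; the vectorized form is usually the better default on large frames.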
Accessing groupby value based on id?
If I understand you correctly, you can use .to_dict() and then access your values by key (in this case id):
#... code as in your question
out = a.to_dict(orient="index")
print(out)
print(out[5]["min"]) # <-- access by `5` and `min`
Prints:
{
1: {"min": 100, "max": 200},
3: {"min": 258, "max": 585},
5: {"min": 89, "max": 632},
}
89
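Since the question's code is elided, here is a self-contained sketch under the assumption that a is a min/max aggregate indexed by id. The input rows are hypothetical and were picked to reproduce the printed dictionary:

```python
import pandas as pd

# Hypothetical input; column names "id" and "value" are assumptions.
df = pd.DataFrame({
    "id":    [1, 1, 3, 3, 5, 5],
    "value": [100, 200, 258, 585, 89, 632],
})

# One row per id, with "min" and "max" columns.
a = df.groupby("id")["value"].agg(["min", "max"])

# orient="index" gives {index_label: {column: value}}.
out = a.to_dict(orient="index")
print(out[5]["min"])

# The same value without building a dict.
direct = int(a.loc[5, "min"])
print(direct)
```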