How to Do a Conditional Count After Groupby on a Pandas Dataframe

How to do a conditional count after groupby on a Pandas Dataframe?

I think you need to add the condition first:

# if you also need category 'c', which has no values of 'one'
df11 = df.groupby('key1')['key2'].apply(lambda x: (x == 'one').sum()).reset_index(name='count')
print (df11)
  key1  count
0    a      2
1    b      1
2    c      0
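
The question's sample data isn't shown here, but a small frame along these lines (my guess, chosen so it reproduces the counts above and in the outputs below) makes the snippets reproducible:

import pandas as pd

# hypothetical sample data: 'a' has two 'one' rows, 'b' has one, 'c' has none
df = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'b', 'c'],
                   'key2': ['one', 'one', 'two', 'one', 'two', 'two']})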

Or convert key1 to a categorical, so that the missing category is added back by size:

df['key1'] = df['key1'].astype('category')
df1 = df[df['key2'] == 'one'].groupby(['key1']).size().reset_index(name='count')
print (df1)
  key1  count
0    a      2
1    b      1
2    c      0
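
Note that recent pandas versions are changing the default of the observed parameter for categorical groupers, so if you rely on empty categories such as 'c' appearing in the result it is safer to spell it out. A sketch, assuming the same frame:

df1 = (df[df['key2'] == 'one']
         .groupby('key1', observed=False)   # keep unobserved categories such as 'c'
         .size()
         .reset_index(name='count'))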

If you need all combinations:

df2 = df.groupby(['key1', 'key2']).size().reset_index(name='count')
print (df2)
  key1 key2  count
0    a  one      2
1    a  two      1
2    b  one      1
3    b  two      1
4    c  two      1

df3 = df.groupby(['key1', 'key2']).size().unstack(fill_value=0)
print (df3)
key2  one  two
key1
a       2    1
b       1    1
c       0    1
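
An equivalent one-liner, if you prefer, is pd.crosstab (assuming pandas is imported as pd); it prints the same table as df3 above:

df3 = pd.crosstab(df['key1'], df['key2'])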

Conditional counts in pandas group by

You can try replacing .count() with .sum() in your two lines, as follows:

d['Zero_Balance_days'] = (x['Balance'] < 0).sum() 
d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()

.count() returns the number of non-NA/null observations in the boolean Series; both True and False are non-null, so both get counted.

.sum() returns the number of True entries, since True is interpreted as 1 and False as 0 in the summation.
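
For context, those two lines presumably live inside a function applied per group. A minimal sketch of how that might look; the data and the surrounding apply are my assumptions, not the asker's actual code:

import pandas as pd

# hypothetical account data, just to make the snippet runnable
df = pd.DataFrame({'Account': ['A', 'A', 'A', 'B', 'B'],
                   'Balance': [100, -20, -5, 300, 50],
                   'Max Credit': [200, 200, 200, 100, 100]})

def summarize(x):
    d = {}
    d['Zero_Balance_days'] = (x['Balance'] < 0).sum()             # counts True values
    d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()
    return pd.Series(d)

print(df.groupby('Account')[['Balance', 'Max Credit']].apply(summarize))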

pandas dataframe groupby conditional count on multi-level column

The only way to select MultiIndex columns from a groupby is with a list of tuples or a MultiIndex (as indicated by the error message):

So, instead of [('exp0', 'rdn')] it needs to be [[('exp0', 'rdn')]]; it also needs to be a valid column name, such as ('exp0', 'rnd0').

df['exp0_cnt', 'rdn0'] = (
    df.groupby([('grp1', 'cat')])          # grouping key as a tuple
      [[('exp0', 'rnd0')]]                 # needs to be a list of tuples with a valid column name
      .transform(lambda x: x.gt(50).sum()) # some function that works
)

I've also changed the apply function: the original (shown below) seems to be missing a lambda, so I made a guess at an equivalent.

.apply(sum(x > 50 for x in df[(('exp0', 'rdn'))])

To transform, since the result is being assigned back to the DataFrame:

.transform(lambda x: x.gt(50).sum())

df:

experiments exp0      exp1      grp1 grp2 exp0_cnt
rnd_runs    rnd0 rnd1 rnd0 rnd1  cat cat2     rdn0
0             66   92   98   17    A    C        2
1             83   57   86   97    A    C        2   # 2 values over 50 (in group)
2             96   47   73   32    B    C        1
3             46   96   25   83    B    B        1   # 1 value over 50 (in group)
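
For reproducibility, a frame with that column layout could be built roughly like this (a hedged reconstruction; the level names 'experiments'/'rnd_runs' are read off the printed header):

import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [('exp0', 'rnd0'), ('exp0', 'rnd1'), ('exp1', 'rnd0'), ('exp1', 'rnd1'),
     ('grp1', 'cat'), ('grp2', 'cat2')],
    names=['experiments', 'rnd_runs'])
df = pd.DataFrame([[66, 92, 98, 17, 'A', 'C'],
                   [83, 57, 86, 97, 'A', 'C'],
                   [96, 47, 73, 32, 'B', 'C'],
                   [46, 96, 25, 83, 'B', 'B']], columns=cols)

Running the transform snippet above on this frame reproduces the ('exp0_cnt', 'rdn0') column shown (2, 2, 1, 1).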

Please note: this means a SeriesGroupBy cannot be created by selecting MultiIndex columns; only DataFrameGroupBy operations are available.

type(df.groupby([('grp1', 'cat')])[[('exp0', 'rnd0')]])
# <class 'pandas.core.groupby.generic.DataFrameGroupBy'>

This excludes a few operations, such as SeriesGroupBy.unique:

df.groupby([('grp1', 'cat')])[[('exp0', 'rnd0')]].unique()
# AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'

However, we can force a SeriesGroupBy by selecting the Series from the DataFrame and grouping by the Series values directly:

df[('exp0', 'rnd0')].groupby(df[('grp1', 'cat')]).unique()
# ^ select the specific column    ^ pass the grouping Series to groupby directly
(grp1, cat)
A    [66, 83]
B    [96, 46]
Name: (exp0, rnd0), dtype: object

How can I use pandas groupby.count() for a condition

You can do it like this:

df_Grouped = df.groupby(['Ticker']).agg(
    {'Trade Results': [('Count', 'count'), ('Profitable', lambda x: len(x[x > 0]))]}
).reset_index()

Output:

  Ticker Trade Results
                 Count Profitable
0    BTC             3          2
1    ETH             3          2
2    LTC             3          2
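
To make that reproducible, here is a guess at the kind of input (three trades per ticker, two of them positive, to match the counts above), plus an equivalent written with plain named aggregation instead of the nested tuple spec:

import pandas as pd

# hypothetical trade results, chosen to match the counts shown
df = pd.DataFrame({'Ticker': ['BTC', 'BTC', 'BTC', 'ETH', 'ETH', 'ETH', 'LTC', 'LTC', 'LTC'],
                   'Trade Results': [10.0, -5.0, 7.0, 3.0, 8.0, -2.0, 1.0, -4.0, 6.0]})

df_Grouped = (df.groupby('Ticker')['Trade Results']
                .agg(Count='count', Profitable=lambda x: (x > 0).sum())
                .reset_index())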

Conditional Counting in Groupby Pandas

Using groupby.agg with a dictionary of calculations:

import numpy as np
from collections import OrderedDict

df.columns = ['ticker', 'date', 'accuracy']

groupers = OrderedDict([('mean', np.mean),
                        ('>_0.20_pct', lambda x: (x > 0.20).sum() / len(x)),
                        ('>_0.50_pct', lambda x: (x > 0.50).sum() / len(x)),
                        ('>_0.70_pct', lambda x: (x > 0.70).sum() / len(x))])

res = df.groupby('ticker')['accuracy'].agg(groupers)

print(res)

            mean  >_0.20_pct  >_0.50_pct  >_0.70_pct
ticker
AAAP    0.806244    1.000000    0.666667    0.666667
AAL     0.298683    0.666667    0.000000    0.000000
ZAYO    0.164886    0.333333    0.000000    0.000000
ZBH     0.103811    0.000000    0.000000    0.000000
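
Note that newer pandas versions reject a plain dict (or OrderedDict) of output-name to function on a SeriesGroupBy ("nested renamer is not supported"), so on current pandas the same result is usually written with named aggregation. A sketch, assuming the same ticker/accuracy columns:

res = df.groupby('ticker')['accuracy'].agg(
    **{'mean': 'mean',
       '>_0.20_pct': lambda x: (x > 0.20).sum() / len(x),
       '>_0.50_pct': lambda x: (x > 0.50).sum() / len(x),
       '>_0.70_pct': lambda x: (x > 0.70).sum() / len(x)})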

Conditionally count values in a pandas groupby object

I think you need:

import numpy as np
import pandas as pd

np.random.seed(6)

N = 15
master_lso = pd.DataFrame({'lsoa11': np.random.randint(4, size=N),
                           'TOTAL_FLOOR_AREA': np.random.choice([0, 30, 40, 50], size=N)})
master_lso['lsoa11'] = 'a' + master_lso['lsoa11'].astype(str)
print (master_lso)
    TOTAL_FLOOR_AREA lsoa11
0                 40     a2
1                 50     a1
2                 30     a3
3                  0     a0
4                 40     a2
5                  0     a1
6                 30     a3
7                  0     a2
8                 40     a0
9                  0     a2
10                 0     a1
11                50     a1
12                50     a3
13                40     a1
14                30     a1

First filter the rows by the condition with boolean indexing - it is faster to do this before grouping, because fewer rows remain to process.

df = master_lso[master_lso['TOTAL_FLOOR_AREA'] > 30]
print (df)
    TOTAL_FLOOR_AREA lsoa11
0                 40     a2
1                 50     a1
4                 40     a2
8                 40     a0
11                50     a1
12                50     a3
13                40     a1

Then group by lsoa11 and aggregate with size:

df1 = df.groupby('lsoa11')['TOTAL_FLOOR_AREA'].size().reset_index(name='Count')
print (df1)
  lsoa11  Count
0     a0      1
1     a1      3
2     a2      2
3     a3      1
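
An equivalent shortcut, if you only need the counts, is to filter and then call value_counts on the grouping column (same numbers, just ordered by count by default):

counts = (master_lso.loc[master_lso['TOTAL_FLOOR_AREA'] > 30, 'lsoa11']
                    .value_counts()
                    .rename_axis('lsoa11')
                    .reset_index(name='Count'))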

