How to do a conditional count after groupby on a Pandas Dataframe?
I think you need to add the condition first:
# if you also need category c, which has no values of 'one'
df11 = df.groupby('key1')['key2'].apply(lambda x: (x == 'one').sum()).reset_index(name='count')
print (df11)
  key1  count
0    a      2
1    b      1
2    c      0
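For reference, here is a minimal input frame that reproduces this output (my assumption; the original question's data is not shown):

```python
import pandas as pd

# Assumed sample data: 'c' appears in key1 but never with key2 == 'one'
df = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'b', 'c'],
                   'key2': ['one', 'one', 'two', 'one', 'two', 'two']})

# Count 'one' per group; groups with no 'one' (here 'c') get 0
df11 = (df.groupby('key1')['key2']
          .apply(lambda x: (x == 'one').sum())
          .reset_index(name='count'))
print(df11)
```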
Or convert key1 to categorical; the missing category is then added by size:
df['key1'] = df['key1'].astype('category')
df1 = df[df['key2'] == 'one'].groupby(['key1']).size().reset_index(name='count')
print (df1)
  key1  count
0    a      2
1    b      1
2    c      0
If you need all combinations:
df2 = df.groupby(['key1', 'key2']).size().reset_index(name='count')
print (df2)
  key1 key2  count
0    a  one      2
1    a  two      1
2    b  one      1
3    b  two      1
4    c  two      1
df3 = df.groupby(['key1', 'key2']).size().unstack(fill_value=0)
print (df3)
key2  one  two
key1
a       2    1
b       1    1
c       0    1
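The same key1 x key2 table can also be produced in one call with pd.crosstab, which fills missing combinations with 0 (sample data is my assumption, matching the outputs above):

```python
import pandas as pd

# Assumed sample data, consistent with the printed outputs
df = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'b', 'c'],
                   'key2': ['one', 'one', 'two', 'one', 'two', 'two']})

# Cross-tabulation: rows are key1, columns are key2, values are counts
df3 = pd.crosstab(df['key1'], df['key2'])
print(df3)
```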
Conditional counts in pandas group by
You can try replacing .count() with .sum() in your 2 lines, as follows:
d['Zero_Balance_days'] = (x['Balance'] < 0).sum()
d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()
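For context, these two lines presumably live inside a function handed to groupby().apply(). A runnable sketch, where the 'Balance' and 'Max Credit' columns come from the question but the grouping key and data are my assumptions:

```python
import pandas as pd

# Assumed sample data; 'Account' is an invented grouping column
df = pd.DataFrame({
    'Account': ['A', 'A', 'A', 'B', 'B'],
    'Balance': [-5, 120, 80, -10, -20],
    'Max Credit': [100, 100, 100, 50, 50],
})

def summarize(x):
    d = {}
    # .sum() counts only the True rows; .count() would count every non-null row
    d['Zero_Balance_days'] = (x['Balance'] < 0).sum()
    d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()
    return pd.Series(d)

res = df.groupby('Account')[['Balance', 'Max Credit']].apply(summarize)
print(res)
```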
.count() returns the number of non-NA/null observations in the boolean Series; both True and False are non-null, so both are counted. .sum() returns the number of True entries, since True is interpreted as 1 and False as 0 in the summation.
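A quick illustration of the difference on a boolean Series:

```python
import pandas as pd

mask = pd.Series([True, False, True, False])

print(mask.count())  # 4 -> every non-null entry, True or False alike
print(mask.sum())    # 2 -> only the True entries (True == 1, False == 0)
```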
pandas dataframe groupby conditional count on multi-level column
The only way to select MultiIndex columns from a groupby is with a list of tuples or a MultiIndex (as indicated by the error message). So, instead of [('exp0', 'rdn')] it needs to be [[('exp0', 'rdn')]], and it also needs to be a valid column name, such as ('exp0', 'rnd0').
df['exp0_cnt', 'rdn0'] = (
    df.groupby([('grp1', 'cat')])[[('exp0', 'rnd0')]]
    #                             ^ need to use a valid column name
    #          ^ needs to be a list of tuples
    .transform(lambda x: x.gt(50).sum())  # Some function that works
)
*I've also changed the applied function, because the original seems to be missing its lambda:
.apply(sum(x > 50 for x in df[('exp0', 'rdn')]))
My guess at an equivalent uses transform, since the result is assigned back to the DataFrame:
.transform(lambda x: x.gt(50).sum())
df:
experiments exp0      exp1      grp1 grp2 exp0_cnt
rnd_runs    rnd0 rnd1 rnd0 rnd1  cat cat2     rdn0
0             66   92   98   17    A    C        2
1             83   57   86   97    A    C        2  # 2 values over 50 (in group)
2             96   47   73   32    B    C        1
3             46   96   25   83    B    B        1  # 1 value over 50 (in group)
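A runnable reconstruction of this frame (the level names experiments/rnd_runs and the values are taken from the printed output; the rest of the setup is my guess):

```python
import pandas as pd

# Rebuild the MultiIndex-column frame shown in the answer
cols = pd.MultiIndex.from_tuples(
    [('exp0', 'rnd0'), ('exp0', 'rnd1'), ('exp1', 'rnd0'), ('exp1', 'rnd1'),
     ('grp1', 'cat'), ('grp2', 'cat2')],
    names=['experiments', 'rnd_runs'])
df = pd.DataFrame([[66, 92, 98, 17, 'A', 'C'],
                   [83, 57, 86, 97, 'A', 'C'],
                   [96, 47, 73, 32, 'B', 'C'],
                   [46, 96, 25, 83, 'B', 'B']], columns=cols)

# Per-group count of ('exp0', 'rnd0') values over 50, broadcast back to rows
df['exp0_cnt', 'rdn0'] = (
    df.groupby([('grp1', 'cat')])[[('exp0', 'rnd0')]]
      .transform(lambda x: x.gt(50).sum()))
print(df)
```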
Please note: this means a SeriesGroupBy cannot be created by selecting MultiIndex columns; only DataFrameGroupBy operations are available.
type(df.groupby([('grp1', 'cat')])[[('exp0', 'rnd0')]])
# <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
This excludes a few operations, like SeriesGroupBy.unique:
df.groupby([('grp1', 'cat')])[[('exp0', 'rnd0')]].unique()
AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'
However, we can force a SeriesGroupBy by selecting the Series from the DataFrame and passing the grouping Series to groupby directly:
df[('exp0', 'rnd0')].groupby(df[('grp1', 'cat')]).unique()
# ^ select specific column ^ pass the Series to groupby directly
(grp1, cat)
A [66, 83]
B [96, 46]
Name: (exp0, rnd0), dtype: object
How can I use pandas groupby.count() for a condition
You can do it like this:
df_Grouped = df.groupby(['Ticker']).agg({'Trade Results': [('Count', 'count'), ('Profitable', lambda x: len(x[x>0]))]}).reset_index()
Output:
  Ticker Trade Results
                 Count Profitable
0    BTC             3          2
1    ETH             3          2
2    LTC             3          2
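The tuple-in-dict spelling above produces MultiIndex columns. Since pandas 0.25, named aggregation gives flat column names directly; a sketch with invented sample data (the question's values are not shown):

```python
import pandas as pd

# Assumed sample data: three trades per ticker, two profitable each
df = pd.DataFrame({'Ticker': ['BTC', 'BTC', 'BTC', 'ETH', 'ETH', 'ETH'],
                   'Trade Results': [1.0, -0.5, 2.0, 3.0, -1.0, 0.5]})

# Named aggregation: keyword = (column, aggregation)
df_grouped = df.groupby('Ticker').agg(
    Count=('Trade Results', 'count'),
    Profitable=('Trade Results', lambda x: (x > 0).sum()),
).reset_index()
print(df_grouped)
```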
Conditional Counting in Groupby Pandas
Using groupby.agg
with a dictionary of calculations:
import numpy as np
import pandas as pd
from collections import OrderedDict

df.columns = ['ticker', 'date', 'accuracy']
groupers = OrderedDict([('mean', np.mean),
('>_0.20_pct', lambda x: (x > 0.20).sum()/len(x)),
('>_0.50_pct', lambda x: (x > 0.50).sum()/len(x)),
('>_0.70_pct', lambda x: (x > 0.70).sum()/len(x))])
res = df.groupby('ticker')['accuracy'].agg(groupers)
print(res)
mean >_0.20_pct >_0.50_pct >_0.70_pct
ticker
AAAP 0.806244 1.000000 0.666667 0.666667
AAL 0.298683 0.666667 0.000000 0.000000
ZAYO 0.164886 0.333333 0.000000 0.000000
ZBH 0.103811 0.000000 0.000000 0.000000
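Note that this dict-renaming form of agg was removed in pandas 1.0 (it raises SpecificationError there). Keyword-based named aggregation expresses the same thing on modern pandas; a sketch with my own sample data, using (x > t).mean() as a shorthand for the fraction:

```python
import pandas as pd

# Assumed sample data, two tickers with three accuracy readings each
df = pd.DataFrame({'ticker': ['AAAP', 'AAAP', 'AAAP', 'AAL', 'AAL', 'AAL'],
                   'accuracy': [0.9, 0.3, 0.6, 0.25, 0.1, 0.15]})

# (x > t).mean() is the fraction of entries above the threshold
res = df.groupby('ticker')['accuracy'].agg(
    mean='mean',
    **{'>_0.20_pct': lambda x: (x > 0.20).mean(),
       '>_0.50_pct': lambda x: (x > 0.50).mean(),
       '>_0.70_pct': lambda x: (x > 0.70).mean()})
print(res)
```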
Conditionally count values in a pandas groupby object
I think you need:
import numpy as np
import pandas as pd

np.random.seed(6)
N = 15
master_lso = pd.DataFrame({'lsoa11': np.random.randint(4, size=N),
                           'TOTAL_FLOOR_AREA': np.random.choice([0, 30, 40, 50], size=N)})
master_lso['lsoa11'] = 'a' + master_lso['lsoa11'].astype(str)
print (master_lso)
    TOTAL_FLOOR_AREA lsoa11
0                 40     a2
1                 50     a1
2                 30     a3
3                  0     a0
4                 40     a2
5                  0     a1
6                 30     a3
7                  0     a2
8                 40     a0
9                  0     a2
10                 0     a1
11                50     a1
12                50     a3
13                40     a1
14                30     a1
First filter the rows by the condition with boolean indexing - doing this before grouping is faster, because there are fewer rows.
df = master_lso[master_lso['TOTAL_FLOOR_AREA'] > 30]
print (df)
    TOTAL_FLOOR_AREA lsoa11
0                 40     a2
1                 50     a1
4                 40     a2
8                 40     a0
11                50     a1
12                50     a3
13                40     a1
Then groupby and aggregate with size:
df1 = df.groupby('lsoa11')['TOTAL_FLOOR_AREA'].size().reset_index(name='Count')
print (df1)
  lsoa11  Count
0     a0      1
1     a1      3
2     a2      2
3     a3      1
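The filter-then-size pattern can also be collapsed into value_counts; a sketch with a small hand-built frame (my own data, not the seeded one above):

```python
import pandas as pd

# Hand-built sample: five rows have TOTAL_FLOOR_AREA > 30
master_lso = pd.DataFrame({
    'lsoa11': ['a2', 'a1', 'a3', 'a0', 'a2', 'a1', 'a1'],
    'TOTAL_FLOOR_AREA': [40, 50, 30, 0, 40, 50, 40],
})

# Filter first, then count occurrences per lsoa11 value
counts = (master_lso.loc[master_lso['TOTAL_FLOOR_AREA'] > 30, 'lsoa11']
          .value_counts()
          .rename_axis('lsoa11')
          .reset_index(name='Count'))
print(counts)
```

Unlike groupby().size(), value_counts returns the groups sorted by descending count.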