Pandas counting and summing specific conditions
You can first make a conditional selection, and sum up the results of the selection using the sum
function.
>> df = pd.DataFrame({'a': [1, 2, 3]})
>> df[df.a > 1].sum()
a 5
dtype: int64
Having more than one condition:
>> df[(df.a > 1) & (df.a < 3)].sum()
a 2
dtype: int64
If you want to do a COUNTIF, just replace sum() with count().
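As a minimal sketch of the SUMIF/COUNTIF pair above, reusing the same three-row frame (the names selected, sumif, countif and countif_mask are illustrative and not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

selected = df[df.a > 1]          # rows where a > 1

sumif = selected.a.sum()         # SUMIF: add up the selected values
countif = selected.a.count()     # COUNTIF: count the selected non-missing values

# Summing the boolean mask itself also counts the matching rows.
countif_mask = (df.a > 1).sum()

print(sumif, countif, countif_mask)
```

Note that count() skips missing values in the selected column, while summing the mask counts every row where the condition holds.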
Pandas counting and summing specific conditions returns only nan
Instead of looping with apply, you can use a vectorized solution: convert the columns to NumPy arrays, compare them with broadcasting, chain the two comparisons with &, and count the True values with sum:
a = df['datet']
b = a + pd.Timedelta(days=1)
c = a - pd.Timedelta(days=1)
mask = (a.to_numpy() <= b.to_numpy()[:, None]) & (a.to_numpy() >= c.to_numpy()[:, None])
df["caseIntensity"] = mask.sum(axis=1)
print (df)
datet caseIntensity
0 2020-03-04 2
1 2020-03-05 2
2 2020-03-09 2
3 2020-03-10 3
4 2020-03-11 3
5 2020-03-12 2
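To see what the broadcast comparison builds, here is a small sketch on only the first three dates. The names upper, lower and dates are illustrative. It shows the intermediate boolean matrix, where each row is the one-day window around one date:

```python
import pandas as pd

a = pd.Series(pd.to_datetime(["2020-03-04", "2020-03-05", "2020-03-09"]))
upper = (a + pd.Timedelta(days=1)).to_numpy()[:, None]   # shape (3, 1)
lower = (a - pd.Timedelta(days=1)).to_numpy()[:, None]   # shape (3, 1)
dates = a.to_numpy()                                      # shape (3,)

# Broadcasting (3,) against (3, 1) yields a (3, 3) matrix:
# mask[i, j] is True when date j lies within one day of date i.
mask = (dates <= upper) & (dates >= lower)
counts = mask.sum(axis=1)   # number of True values per row

print(mask)
print(counts)
```

The matrix has one entry per pair of rows, so memory grows quadratically with the number of rows: 6,000 rows give 36 million booleans per intermediate array.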
Here is the performance for 6k rows:
df = pd.DataFrame({'datet': [pd.to_datetime("2020-03-04 00:00:00"), pd.to_datetime("2020-03-05 00:00:00"),\
pd.to_datetime("2020-03-09 00:00:00"), pd.to_datetime("2020-03-10 00:00:00"),\
pd.to_datetime("2020-03-11 00:00:00"), pd.to_datetime("2020-03-12 00:00:00")]})
df = pd.concat([df] * 1000, ignore_index=True)
In [140]: %%timeit
...: a = df['datet']
...: b = a + pd.Timedelta(days=1)
...: c = a - pd.Timedelta(days=1)
...:
...: mask = (a.to_numpy() <= b.to_numpy()[:, None]) & (a.to_numpy() >= c.to_numpy()[:, None])
...:
...: df["caseIntensity"] = mask.sum(axis=1)
...:
469 ms ± 16.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [141]: %%timeit
...: df["caseIntensity1"] = df.apply(lambda row: get_dates_in_range(df, row), axis=1)
...:
...:
6.2 s ± 368 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Counting the number of rows that meet certain sum condition in pandas dataframe
cumsum + idxmax should work:
df.A.cumsum().gt(5).idxmax()
3
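The original answer does not show its data, so the following sketch uses a hypothetical column A chosen so that the running total first exceeds 5 at index label 3:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3]})   # hypothetical data

running = df.A.cumsum()          # 1, 3, 5, 8
exceeds = running.gt(5)          # False, False, False, True
first_label = exceeds.idxmax()   # label of the first True

# idxmax returns the first label even when no value is True,
# so check any() before trusting the result.
found = bool(exceeds.any())

print(first_label, found)
```

idxmax returns an index label, not a row count. With a default RangeIndex, the number of rows consumed is the label plus one.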
Pandas Pivot Table counting based on condition and sum columns
Use:
dft=df.pivot_table(values='sold_kg',columns='day', index='Product', aggfunc=['sum','size'])
First flatten the MultiIndex in the columns with a mapping:
dft.columns = dft.columns.map(lambda x: f'{x[0]}_{x[1]}')
Then select columns with DataFrame.filter and sum them. To count the values greater than or equal to a threshold, use DataFrame.ge and count the True values with sum:
dft['Fruit Total'] = dft.filter(like='sum').sum(axis=1)
dft['Count >= 2'] = dft.filter(like='size').ge(2).sum(axis=1)
print (dft)
sum_22 sum_23 sum_25 size_22 size_23 size_25 Fruit Total \
Product
apple 8 2 2 3 1 1 12
orange 7 7 2 2 2 1 16
Count >= 2
Product
apple 1
orange 2
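The input frame is not shown in the answer, so this self-contained sketch uses hypothetical rows chosen to reproduce the totals in the table above. As a swapped-in technique, it builds the (aggfunc, day) columns with groupby plus unstack instead of pivot_table. The flattening and filtering steps are the same:

```python
import pandas as pd

# Hypothetical input chosen to reproduce the table above.
df = pd.DataFrame({
    'Product': ['apple'] * 5 + ['orange'] * 5,
    'day':     [22, 22, 22, 23, 25, 22, 22, 23, 23, 25],
    'sold_kg': [3, 3, 2, 2, 2, 3, 4, 5, 2, 2],
})

# groupby + unstack stands in for pivot_table here;
# the columns become a MultiIndex of (aggfunc, day).
dft = (df.groupby(['Product', 'day'])['sold_kg']
         .agg(['sum', 'size'])
         .unstack('day'))

# Flatten the MultiIndex columns to names such as 'sum_22'.
dft.columns = dft.columns.map(lambda x: f'{x[0]}_{x[1]}')

# Sum the 'sum_*' columns; count the 'size_*' columns that are >= 2.
dft['Fruit Total'] = dft.filter(like='sum').sum(axis=1)
dft['Count >= 2'] = dft.filter(like='size').ge(2).sum(axis=1)

print(dft)
```

filter(like=...) matches substrings of the column names, so the new columns should not contain 'sum' or 'size' in their names, or they would be picked up by a later filter call.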
Python Pandas Counting and Summing columns based on datetime values
You would have to loop over the DataFrame, because each row must be compared with every other row. One possible improvement to the solution below is to sort by Submit_Date first, so that for the Submit_Date comparison each record only needs to be compared with the records either below it or above it.
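result = []
# Compare each row against every row of the DataFrame.
for _, cur_data in df.iterrows():
    # Rows submitted strictly inside the current row's interval.
    submit_inside = (cur_data['Submit_Date'] < df['Submit_Date']) & (df['Submit_Date'] < cur_data['Resolved_Date'])
    # Rows resolved strictly inside the current row's interval.
    resolved_inside = (cur_data['Submit_Date'] < df['Resolved_Date']) & (df['Resolved_Date'] < cur_data['Resolved_Date'])
    result.append((submit_inside | resolved_inside).sum())
df['count'] = result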
Submit_Date Resolved_Date count
1 2016-10-01 23:41:00 2016-10-02 02:27:00 2
2 2016-10-01 23:50:00 2017-03-09 19:39:00 3
3 2016-10-02 00:05:00 2016-11-15 12:46:00 2
4 2016-10-03 05:17:00 2016-11-14 17:37:00 0
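The same pairwise comparison can also be written with NumPy broadcasting, in the style of the date-window answer earlier on this page. This is a sketch and not part of the original answer. It rebuilds the four rows from the output above, and the names s, r, s_i and r_i are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Submit_Date': pd.to_datetime([
        '2016-10-01 23:41:00', '2016-10-01 23:50:00',
        '2016-10-02 00:05:00', '2016-10-03 05:17:00']),
    'Resolved_Date': pd.to_datetime([
        '2016-10-02 02:27:00', '2017-03-09 19:39:00',
        '2016-11-15 12:46:00', '2016-11-14 17:37:00']),
}, index=[1, 2, 3, 4])

s = df['Submit_Date'].to_numpy()      # shape (n,)
r = df['Resolved_Date'].to_numpy()    # shape (n,)
s_i, r_i = s[:, None], r[:, None]     # shape (n, 1): one row per "current" record

# Element [i, j] asks whether record j starts or ends strictly inside record i.
submit_inside = (s_i < s) & (s < r_i)
resolved_inside = (s_i < r) & (r < r_i)

df['count'] = (submit_inside | resolved_inside).sum(axis=1)
print(df)
```

This trades the Python-level loop for n-by-n boolean arrays, so it suits frames small enough for those arrays to fit in memory.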
Count values in column with ranges given a specific condition
You need to loop here, either using Series.apply with a lambda function and sum:
df['ct'] = df['nv1'].apply(lambda s: sum(e<-1 for e in s))
or with a classic list comprehension:
df['ct'] = [sum(e<-1 for e in s) for s in df['nv1']]
output:
R an nv1 ct
0 1 f [-1.0] 0
1 2 i [-1.0] 0
2 3 - [] 0
3 4 - [] 0
4 5 f [-2.0] 1
5 6 c,f,i,j [-2.0, -1.0, -3.0, -1.0] 2
6 7 c,d,e,j [-2.0, -1.0, -2.0, -1.0] 2
If you really want empty strings in place of zeros:
df['ct'] = [S if (S:=sum(e<-1 for e in s)) else '' for s in df['nv1']]
output:
R an nv1 ct
0 1 f [-1.0]
1 2 i [-1.0]
2 3 - []
3 4 - []
4 5 f [-2.0] 1
5 6 c,f,i,j [-2.0, -1.0, -3.0, -1.0] 2
6 7 c,d,e,j [-2.0, -1.0, -2.0, -1.0] 2
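For a runnable version, the sketch below rebuilds the frame from the output above and applies the list comprehension. It also shows an alternative based on Series.explode, which is not from the original answer and is included only as a cross-check. The names exploded and ct_exploded are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'R': [1, 2, 3, 4, 5, 6, 7],
    'an': ['f', 'i', '-', '-', 'f', 'c,f,i,j', 'c,d,e,j'],
    'nv1': [[-1.0], [-1.0], [], [], [-2.0],
            [-2.0, -1.0, -3.0, -1.0], [-2.0, -1.0, -2.0, -1.0]],
})

# Each comparison yields True/False, and sum() adds them up as 1/0.
df['ct'] = [sum(e < -1 for e in s) for s in df['nv1']]

# Alternative (not from the original answer): one row per list element,
# compare, then add the matches back up per original row.
exploded = df['nv1'].explode().astype(float)   # empty lists become NaN
ct_exploded = exploded.lt(-1).groupby(level=0).sum()

print(df['ct'].tolist())
print(ct_exploded.tolist())
```

Both routes give the same counts here. NaN compares as False, so rows with empty lists count as zero in the explode version.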
Countif in Pandas Dataframe
Since (df[cols] == 2) outputs a df of True or False values, and True is equivalent to 1 while False is equivalent to 0, you should use sum instead of count:
Twos = (df[cols] == 2).sum(axis=1)
count counts all non-missing values, whereas sum applied to a conditional filter gives the number of values that satisfy your condition.
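The difference is easy to see on a small frame. The data and the names is_two, twos and non_missing below are hypothetical, chosen only to illustrate the point:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({'x': [2, 1, 2], 'y': [2, 1, np.nan], 'z': [0, 2, 2]})
cols = ['x', 'y', 'z']

is_two = df[cols] == 2                  # boolean frame; NaN == 2 is False

twos = is_two.sum(axis=1)               # number of 2s in each row
non_missing = is_two.count(axis=1)      # number of non-missing cells in the mask

print(twos.tolist())
print(non_missing.tolist())
```

The boolean frame has no missing values, so count returns the number of columns for every row regardless of the condition.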