How to count values greater than the group mean in Pandas?
You need the agg method:
In [28]: df.groupby(['col1', 'col2']).agg(lambda x: (x > x.mean()).sum())
Out[28]:
           col3  col4  col5  col6
col1 col2
A    B      1.0   2.0   2.0   2.0
C    D      2.0   2.0   2.0   2.0
E    F      1.0   1.0   1.0   1.0
G    H      0.0   0.0   0.0   0.0
In essence, x is array-like (one column of one group). x > x.mean() gives True if the element is larger than the group mean and False otherwise; sum then counts the number of True values.
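As a minimal sketch of the idea above, the following uses a small made-up frame (the data and the single value column col3 are assumptions for illustration only). It also shows a transform-based variant that should give the same counts without calling a Python lambda per group.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above.
df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'C', 'C'],
    'col2': ['B', 'B', 'B', 'D', 'D'],
    'col3': [1, 2, 6, 4, 4],
})

# Per group and per column: count the elements strictly above the group mean.
counts = df.groupby(['col1', 'col2']).agg(lambda x: (x > x.mean()).sum())

# Variant: broadcast the group mean back to every row, compare, then sum the booleans.
group_mean = df.groupby(['col1', 'col2'])['col3'].transform('mean')
above = df['col3'] > group_mean
counts_alt = above.groupby([df['col1'], df['col2']]).sum()

print(counts)
print(counts_alt)
```

Note that the comparison is strict, so a group whose values are all equal yields 0.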
Count items greater than a value in pandas groupby
You can filter the rows first, then group and count:
reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
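One thing to be aware of with filtering first: a business with no rows above the threshold disappears from the result. If you need a 0 for those groups, a possible sketch is to sum the boolean mask per group instead. The sample data below is invented for illustration.

```python
import pandas as pd

# Hypothetical reviews table; only the column names come from the answer above.
reviews = pd.DataFrame({
    'business_id': ['x', 'x', 'y', 'y', 'z'],
    'stars': [5, 2, 4, 4, 1],
})

# Filter first: groups without any matching row are dropped.
filtered = reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()

# Sum the boolean mask: every group is kept, with 0 where nothing matches.
all_groups = (reviews['stars'] > 3).groupby(reviews['business_id']).sum()

print(filtered)
print(all_groups)
```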
Pandas groupby count values greater than given values in each group
If I understand correctly:
(df.set_index('item')).sub(c.set_index('item').reindex(df.item).value,axis=0).gt(0).groupby(level=0).sum()
Out[646]:
        A    B    C
item
a     1.0  1.0  0.0
b     0.0  2.0  1.0
c     1.0  1.0  1.0
d     2.0  0.0  0.0
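The original df and c are not shown here, so the following is only a sketch on invented data, assuming df has an item column plus numeric columns and c holds one threshold per item in a value column. It uses Series.map to look up each row's threshold, which is a slightly different route from the one-liner above.

```python
import pandas as pd

# Hypothetical inputs: df holds the measurements, c holds one threshold per item.
df = pd.DataFrame({
    'item': ['a', 'a', 'b', 'b'],
    'A': [1, 5, 2, 2],
    'B': [4, 4, 9, 1],
})
c = pd.DataFrame({'item': ['a', 'b'], 'value': [3, 2]})

# Look up the threshold that applies to each row.
thresholds = df['item'].map(c.set_index('item')['value'])

# Compare every numeric column with its row's threshold, then count per item.
counts = df[['A', 'B']].gt(thresholds, axis=0).groupby(df['item']).sum()

print(counts)
```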
Counting values greater than each value in a Pandas series following groupby
import pandas as pd
df = pd.DataFrame({'group': {0: 'a', 1: 'a', 2: 'a', 3: 'a', 4: 'a', 5: 'b', 6: 'b', 7: 'b', 8: 'b', 9: 'b'}, 'value': {0: 2, 1: 4, 2: 2, 3: 3, 4: 5, 5: 1, 6: 2, 7: 4, 8: 1, 9: 5}})
# rank(method='min') - 1 is the number of values in the group that are strictly smaller than each value.
# You may want other methods of rank, but it's not clear from your question.
df['count_in_group'] = df.groupby('group')['value'].rank(method='min').sub(1)
print(df.sort_values(['group', 'value']))
...
  group  value  count_in_group
0     a      2             0.0
2     a      2             0.0
3     a      3             2.0
1     a      4             3.0
4     a      5             4.0
5     b      1             0.0
8     b      1             0.0
6     b      2             2.0
7     b      4             3.0
9     b      5             4.0
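The output above gives, for each row, the number of values in its group that are strictly smaller. If the count of strictly greater values is what you need, one possible sketch is to rank in descending order instead. The column name n_greater is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'group': list('aaaaabbbbb'),
    'value': [2, 4, 2, 3, 5, 1, 2, 4, 1, 5],
})

# Descending 'min' rank minus 1 = number of strictly greater values in the group.
# Ties share the lowest rank, so equal values are not counted as greater.
df['n_greater'] = (
    df.groupby('group')['value']
      .rank(method='min', ascending=False)
      .sub(1)
      .astype(int)
)

print(df.sort_values(['group', 'value']))
```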
Pandas groupby count values above threshold
Your answer works. Alternatively, you can do it in one line with a lambda, so there is no need to define a separate function:
df = df.groupby(["scenario", "Name", "year", "month"])["Value"].agg(["min", "max", "mean", "std", lambda x: ((x > 0)*1).sum()])
The logic here: (x > 0) returns a True/False boolean; *1 turns the boolean into an integer (True = 1, False = 0); .sum() then adds up the 1s and 0s within the group, and since only True contributes 1, the sum is the number of values greater than 0. The *1 is optional, because .sum() already treats True as 1.
Running a quick test on the time taken, your solution is faster, but I thought I would give an alternative solution anyway.
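A possible variation is named aggregation, which gives the lambda column a readable name instead of an auto-generated one. This is only a sketch on invented data; the output column names (minimum, maximum, average, n_positive) are assumptions, not part of the original question.

```python
import pandas as pd

# Hypothetical data with a single grouping key for brevity.
df = pd.DataFrame({
    'Name': ['n1', 'n1', 'n1', 'n2', 'n2'],
    'Value': [-1.0, 2.0, 3.0, 0.0, 4.0],
})

summary = df.groupby('Name')['Value'].agg(
    minimum='min',
    maximum='max',
    average='mean',
    # Count of values strictly greater than 0 in each group.
    n_positive=lambda x: (x > 0).sum(),
)

print(summary)
```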
Count the value of a column if is greater than 0 in a groupby result
While working on your problem, I also wanted to see if I could get the average percentage for B (while ignoring 0s). I was able to accomplish this as well while getting the counts.
DataFrame for this exercise:
     A   B       C
0   a1  B1    0.00
1   a1  B1    0.00
2   a1  B1   98.87
3   a1  B1  101.10
4   a1  B2  106.67
5   a1  B2  103.00
6   a2  B1    0.00
7   a2  B1    0.00
8   a2  B1   33.00
9   a2  B1  100.00
10  a2  B2   80.00
11  a3  B1   90.00
12  a3  B2   99.00
Average while excluding the zeros: for this I had to add .replace(0, np.nan) before the groupby function, so that count and mean skip those rows.
import pandas as pd
import numpy as np
A = ['a1','a1','a1','a1','a1','a1','a2','a2','a2','a2','a2','a3','a3']
B = ['B1','B1','B1','B1','B2','B2','B1','B1','B1','B1','B2','B1','B2']
C = [0,0,98.87,101.1,106.67,103,0,0,33,100,80,90,99]
df = pd.DataFrame({'A':A,'B':B,'C':C})
df = pd.DataFrame(df.replace(0, np.nan)
                    .groupby(['A', 'B'])
                    .agg({'B':'size','C':['count','mean']})
                    .rename(columns={'size':'Count','count':'Passed','mean':'Avg Score'})).unstack(level=1)
df.columns = df.columns.droplevel(0)
   Count    Passed    Avg Score
B     B1 B2     B1 B2        B1       B2
A
a1     4  2      2  2    99.985  104.835
a2     4  1      2  1    66.500   80.000
a3     1  1      1  1    90.000   99.000
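For comparison, here is a sketch of the same three numbers using named aggregation and Series.where, which masks the zeros in a helper column rather than replacing them across the whole frame. The helper column C_nonzero and the output name Avg_Score are assumptions introduced for this example, and the result is left in long form (no unstack).

```python
import pandas as pd

A = ['a1','a1','a1','a1','a1','a1','a2','a2','a2','a2','a2','a3','a3']
B = ['B1','B1','B1','B1','B2','B2','B1','B1','B1','B1','B2','B1','B2']
C = [0, 0, 98.87, 101.1, 106.67, 103, 0, 0, 33, 100, 80, 90, 99]
df = pd.DataFrame({'A': A, 'B': B, 'C': C})

out = (
    # Keep non-zero scores, turn zeros into NaN so count/mean ignore them.
    df.assign(C_nonzero=df['C'].where(df['C'] != 0))
      .groupby(['A', 'B'])
      .agg(
          Count=('C', 'size'),            # all rows in the group
          Passed=('C_nonzero', 'count'),  # rows with a non-zero score
          Avg_Score=('C_nonzero', 'mean') # mean over non-zero scores only
      )
)

print(out)
```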
Pandas group by, sum greater than and count
Use DataFrame.assign to create a new column in which values <= 15 are replaced by NaN via Series.where, then use named aggregation:
final = (test.assign(new = test['VALUE'].where(test['VALUE'] > 15))
             .groupby(['DAY','MONTH','TYPE'])
             .aggregate(sum = ('new', 'sum'),
                        count = ('VALUE', 'count')))
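The test frame is not shown, so here is a sketch on invented data with the same column names. It also reports the count of values above the threshold next to the count of all values, since either may be what is wanted; the output names sum_above, count_above and count_all are assumptions.

```python
import pandas as pd

# Hypothetical data; only the column names follow the answer above.
test = pd.DataFrame({
    'DAY':   [1, 1, 1, 2],
    'MONTH': [1, 1, 1, 1],
    'TYPE':  ['t', 't', 't', 't'],
    'VALUE': [10, 20, 30, 5],
})

# Values <= 15 become NaN, so 'sum' and 'count' on this column ignore them.
above = test['VALUE'].where(test['VALUE'] > 15)

final = (
    test.assign(new=above)
        .groupby(['DAY', 'MONTH', 'TYPE'])
        .agg(
            sum_above=('new', 'sum'),      # sum of values > 15
            count_above=('new', 'count'),  # number of values > 15
            count_all=('VALUE', 'count'),  # number of all values
        )
)

print(final)
```

A group with no value above the threshold gets a sum of 0 and a count of 0 rather than being dropped.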
How to count value greater than or equal to 0.5 continuous for 5 or greater than 5 rows python
As the source DataFrame I took:
x y z n
0 0.1 1.0 1.0 1.0
1 0.5 1.0 1.0 1.0
2 0.6 1.0 1.0 1.0
3 0.7 1.0 1.0 1.0
4 0.6 1.0 1.0 1.0
5 0.5 1.0 1.0 1.0
6 0.1 1.0 1.0 1.0
7 0.5 1.0 1.0 1.0
8 0.6 1.0 1.0 1.0
9 0.7 1.0 1.0 1.0
10 0.1 1.0 1.0 1.0
11 0.5 1.0 1.0 1.0
12 0.6 1.0 1.0 1.0
13 0.7 1.0 1.0 1.0
14 0.7 1.0 1.0 1.0
15 0.6 1.0 1.0 1.0
16 0.5 1.0 1.0 1.0
17 0.1 1.0 1.0 1.0
18 0.5 2.0 1.0 1.0
19 0.6 2.0 1.0 1.0
20 0.7 2.0 1.0 1.0
21 0.6 2.0 1.0 1.0
22 0.5 2.0 1.0 1.0
(one group for (y, z, n) == (1.0, 1.0, 1.0) and another for (2.0, 1.0, 1.0)).
Start with import itertools as it.
Then define the following function to get the count of your "wanted"
elements from the current group:
def getCnt(grp):
    return sum(filter(lambda x: x >= 5, [ len(list(group))
        for key, group in it.groupby(grp.x, lambda elem: elem >= 0.5)
        if key ]))
Note that it contains it.groupby, i.e. groupby function from itertools
(not the pandasonic version of it).
The difference is that the itertools version starts a new group on each change
of the grouping key (by default, the value of the source element).
Steps:
it.groupby(grp.x, lambda elem: elem >= 0.5) - create an iterator returning pairs (key, group) from the x column of the current group. The key states whether the current group (from itertools grouping) contains your "wanted" elements (>= 0.5), and the group contains these elements.
[ len(list(group)) for key, group in … if key ] - get a list of lengths of groups, excluding groups of "smaller" elements.
filter(lambda x: x >= 5, …) - filter the above list, leaving only counts of groups with 5 or more members.
sum(…) - sum the above counts.
Then, to get your expected result, as a DataFrame, apply this function to
each group of rows, this time grouping with the pandasonic version of
groupby.
Then set the name of the resulting Series (it will be the column name
in the final result) and reset the index, to convert it to a DataFrame.
The code to do it is:
result = df.groupby(['y','z','n']).apply(getCnt).rename('Cnt').reset_index()
The result is:
y z n Cnt
0 1.0 1.0 1.0 11
1 2.0 1.0 1.0 5
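For reference, here is a self-contained sketch that rebuilds the source DataFrame shown earlier and reproduces the result. It is a variation on getCnt rather than the exact function: the helper name count_long_runs and its threshold and min_len parameters are assumptions, and it is applied to the x column only so that it receives a plain Series.

```python
import itertools as it
import pandas as pd

# The x column from the source DataFrame above; y splits it into two groups.
x = [0.1, 0.5, 0.6, 0.7, 0.6, 0.5, 0.1, 0.5, 0.6, 0.7, 0.1, 0.5,
     0.6, 0.7, 0.7, 0.6, 0.5, 0.1, 0.5, 0.6, 0.7, 0.6, 0.5]
df = pd.DataFrame({'x': x, 'y': [1.0] * 18 + [2.0] * 5, 'z': 1.0, 'n': 1.0})

def count_long_runs(values, threshold=0.5, min_len=5):
    """Total length of consecutive runs of values >= threshold with >= min_len items."""
    run_lengths = [
        len(list(run))
        # itertools.groupby starts a new run whenever the key flips.
        for is_wanted, run in it.groupby(values, lambda v: v >= threshold)
        if is_wanted
    ]
    return sum(n for n in run_lengths if n >= min_len)

result = (
    df.groupby(['y', 'z', 'n'])['x']
      .apply(count_long_runs)
      .rename('Cnt')
      .reset_index()
)

print(result)
```

In the first group the runs have lengths 5, 3 and 6, so only 5 + 6 = 11 is counted; the second group is a single run of 5.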
How to count number of rows per group greater than the average of that group in pandas group by?
You could also aggregate using a lambda function, as follows:
df.groupby(['col1', 'col2']).agg(['mean', 'count',
                                  lambda x: (x > x.mean()).sum()])