How to Count Values Greater Than the Group Mean in Pandas

How to count values greater than the group mean in Pandas?

You would need the agg method:

In [28]: df.groupby(['col1', 'col2']).agg(lambda x: (x > x.mean()).sum())
Out[28]:
col3 col4 col5 col6
col1 col2
A B 1.0 2.0 2.0 2.0
C D 2.0 2.0 2.0 2.0
E F 1.0 1.0 1.0 1.0
G H 0.0 0.0 0.0 0.0

In essence, x will be array-like (one column's values within one group). x > x.mean() gives True where an element is larger than the group mean and False otherwise; sum then counts the number of Trues.
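As a self-contained sketch (the frame below is invented; only the column names follow the question):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A', 'C', 'C'],
                   'col2': ['B', 'B', 'B', 'D', 'D'],
                   'col3': [1, 2, 9, 4, 6]})

# For each (col1, col2) group, count the col3 values above that group's mean
out = df.groupby(['col1', 'col2']).agg(lambda x: (x > x.mean()).sum())
```

Group (A, B) has values [1, 2, 9] with mean 4, so only 9 is counted; group (C, D) has [4, 6] with mean 5, so only 6 is counted.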

Count items greater than a value in pandas groupby

You can try:

reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
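For a runnable illustration (the reviews frame here is made up; business_id and stars are the question's column names). Filtering first, then grouping, counts only the rows above the threshold; note that businesses with no qualifying rows drop out of the result entirely:

```python
import pandas as pd

reviews = pd.DataFrame({'business_id': ['b1', 'b1', 'b1', 'b2', 'b2'],
                        'stars': [5, 2, 4, 3, 1]})

# Keep only reviews above 3 stars, then count per business
counts = reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
```

Here b1 gets a count of 2 (for the 5- and 4-star reviews), while b2 does not appear in the result at all.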

Pandas groupby count values greater than given values in each group

If I understand correctly (IIUC):

(df.set_index('item')
   .sub(c.set_index('item').reindex(df.item).value, axis=0)
   .gt(0)
   .groupby(level=0)
   .sum())
Out[646]:
A B C
item
a 1.0 1.0 0.0
b 0.0 2.0 1.0
c 1.0 1.0 1.0
d 2.0 0.0 0.0
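Since df and c are not shown in the answer, here is a hypothetical pair of inputs in the shape the chain assumes: df holds per-item measurements and c holds one threshold value per item (the item and value names follow the answer; the data is invented):

```python
import pandas as pd

df = pd.DataFrame({'item': ['a', 'a', 'b'],
                   'A': [1, 5, 2],
                   'B': [4, 6, 3]})
c = pd.DataFrame({'item': ['a', 'b'], 'value': [3, 2]})

# Subtract each row's per-item threshold, test for > 0, then count per item
out = (df.set_index('item')
         .sub(c.set_index('item').reindex(df.item).value, axis=0)
         .gt(0)
         .groupby(level=0)
         .sum())
```

The reindex(df.item) step repeats each item's threshold once per row of df, so sub(..., axis=0) lines the thresholds up row by row; gt(0) is strict, so a value exactly equal to its threshold is not counted.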

Counting values greater than each value in a Pandas series following groupby

df = pd.DataFrame({'group': {0: 'a', 1: 'a', 2: 'a', 3: 'a', 4: 'a', 5: 'b', 6: 'b', 7: 'b', 8: 'b', 9: 'b'}, 'value': {0: 2, 1: 4, 2: 2, 3: 3, 4: 5, 5: 1, 6: 2, 7: 4, 8: 1, 9: 5}})
# You may want other methods of rank, but it's not clear from your question.
df['count_in_group'] = df.groupby('group')['value'].rank('min').sub(1)
print(df.sort_values(['group', 'value']))

group value count_in_group
0 a 2 0.0
2 a 2 0.0
3 a 3 2.0
1 a 4 3.0
4 a 5 4.0
5 b 1 0.0
8 b 1 0.0
6 b 2 2.0
7 b 4 3.0
9 b 5 4.0

Pandas groupby count values above threshold

Your answer works. Alternatively, you can keep it all on one line, with no separate function, by using a lambda instead:

df = df.groupby(["scenario", "Name", "year", "month"])["Value"].agg(
    ["min", "max", "mean", "std", lambda x: ((x > 0) * 1).sum()])

The logic here: (x > 0) returns a True/False bool; * 1 turns the bool into an integer (True = 1, False = 0); .sum() then adds up the 1s and 0s within the group, and since each True counts as 1, the sum is the number of values greater than 0.

Running a quick test on the time taken, your solution is faster, but I thought I would give an alternative solution anyway.
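A runnable sketch with invented data (the grouping column names follow the question; std is dropped here to keep the check simple, and string aggregation names replace the np.* functions):

```python
import pandas as pd

df = pd.DataFrame({'scenario': ['s1'] * 4,
                   'Name': ['n1'] * 4,
                   'year': [2020] * 4,
                   'month': [1] * 4,
                   'Value': [-1.0, 2.0, 3.0, 0.0]})

# min, max, mean, plus a count of values strictly greater than 0
out = df.groupby(['scenario', 'Name', 'year', 'month'])['Value'].agg(
    ['min', 'max', 'mean', lambda x: ((x > 0) * 1).sum()])
```

For the single group above, the lambda column is 2: only 2.0 and 3.0 exceed 0 (the 0.0 itself is excluded by the strict comparison).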

Count the value of a column if is greater than 0 in a groupby result

While working on your problem, I also wanted to see if I could get the average percentage for B (while ignoring 0s). I was able to accomplish this as well while getting the counts.

DataFrame for this exercise:

       A   B       C
0     a1  B1    0.00
1     a1  B1    0.00
2     a1  B1   98.87
3     a1  B1  101.10
4     a1  B2  106.67
5     a1  B2  103.00
6     a2  B1    0.00
7     a2  B1    0.00
8     a2  B1   33.00
9     a2  B1  100.00
10    a2  B2   80.00
11    a3  B1   90.00
12    a3  B2   99.00

Average while excluding the zeros

For this, I had to add .replace(0, np.nan) before the groupby call.

import pandas as pd
import numpy as np

A = ['a1','a1','a1','a1','a1','a1','a2','a2','a2','a2','a2','a3','a3']
B = ['B1','B1','B1','B1','B2','B2','B1','B1','B1','B1','B2','B1','B2']
C = [0,0,98.87,101.1,106.67,103,0,0,33,100,80,90,99]
df = pd.DataFrame({'A': A, 'B': B, 'C': C})

df = (df.replace(0, np.nan)
        .groupby(['A', 'B'])
        .agg({'B': 'size', 'C': ['count', 'mean']})
        .rename(columns={'size': 'Count', 'count': 'Passed', 'mean': 'Avg Score'})
        .unstack(level=1))
df.columns = df.columns.droplevel(0)

   Count     Passed     Avg Score
B     B1 B2     B1 B2          B1       B2
A
a1     4  2      2  2      99.985  104.835
a2     4  1      2  1      66.500   80.000
a3     1  1      1  1      90.000   99.000
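If all you need is the count of values greater than 0 per group, the same trick works on its own: replace zeros with NaN, and count skips NaNs. A minimal sketch with made-up data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['a1', 'a1', 'a1', 'a2'],
                   'C': [0.0, 98.87, 101.1, 100.0]})

# Zeros become NaN, and count() only counts non-NaN values
passed = df.replace(0, np.nan).groupby('A')['C'].count()
```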

Pandas group by, sum greater than and count

Use DataFrame.assign to create a new column in which values <= 15 are replaced by NaN via Series.where, then use named aggregation:

final = (test.assign(new=test['VALUE'].where(test['VALUE'] > 15))
             .groupby(['DAY', 'MONTH', 'TYPE'])
             .aggregate(sum=('new', 'sum'),
                        count=('VALUE', 'count')))
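With a small invented test frame (the column names come from the question), the masked column sums only the qualifying values while the count covers every row in the group:

```python
import pandas as pd

test = pd.DataFrame({'DAY': [1, 1, 1, 2],
                     'MONTH': [1, 1, 1, 1],
                     'TYPE': ['A', 'A', 'A', 'A'],
                     'VALUE': [10, 20, 30, 5]})

# Values <= 15 become NaN, so they contribute nothing to the sum,
# while count still covers all rows of the group
final = (test.assign(new=test['VALUE'].where(test['VALUE'] > 15))
             .groupby(['DAY', 'MONTH', 'TYPE'])
             .aggregate(sum=('new', 'sum'),
                        count=('VALUE', 'count')))
```

For group (1, 1, 'A') the sum is 50.0 (20 + 30; the 10 was masked out) and the count is 3.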

How to count values greater than or equal to 0.5 occurring in runs of 5 or more consecutive rows in Python

As the source DataFrame I took:

      x    y    z    n
0 0.1 1.0 1.0 1.0
1 0.5 1.0 1.0 1.0
2 0.6 1.0 1.0 1.0
3 0.7 1.0 1.0 1.0
4 0.6 1.0 1.0 1.0
5 0.5 1.0 1.0 1.0
6 0.1 1.0 1.0 1.0
7 0.5 1.0 1.0 1.0
8 0.6 1.0 1.0 1.0
9 0.7 1.0 1.0 1.0
10 0.1 1.0 1.0 1.0
11 0.5 1.0 1.0 1.0
12 0.6 1.0 1.0 1.0
13 0.7 1.0 1.0 1.0
14 0.7 1.0 1.0 1.0
15 0.6 1.0 1.0 1.0
16 0.5 1.0 1.0 1.0
17 0.1 1.0 1.0 1.0
18 0.5 2.0 1.0 1.0
19 0.6 2.0 1.0 1.0
20 0.7 2.0 1.0 1.0
21 0.6 2.0 1.0 1.0
22 0.5 2.0 1.0 1.0

(one group for (y, z, n) == (1.0, 1.0, 1.0) and another for (2.0, 1.0, 1.0)).

Start from import itertools as it.

Then define the following function to get the count of your "wanted"
elements from the current group:

def getCnt(grp):
    return sum(filter(lambda x: x >= 5,
                      [len(list(group))
                       for key, group in it.groupby(grp.x, lambda elem: elem >= 0.5)
                       if key]))

Note that it contains it.groupby, i.e. groupby function from itertools
(not the pandasonic version of it).

The difference is that the itertools version starts a new group on each change
of the grouping key (by default, the value of the source element).
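A quick demonstration of that behavior: it.groupby splits a sequence into runs of consecutive elements whose key (here, whether the value is >= 0.5) is the same:

```python
import itertools as it

xs = [0.1, 0.5, 0.6, 0.7, 0.1, 0.5]

# Each (key, run-length) pair is one maximal run of consecutive elements
runs = [(key, len(list(group)))
        for key, group in it.groupby(xs, lambda elem: elem >= 0.5)]
print(runs)  # [(False, 1), (True, 3), (False, 1), (True, 1)]
```

Note that the key True appears twice: unlike the pandas groupby, the itertools version does not merge the two runs of "wanted" elements, which is exactly what makes it suitable for counting consecutive occurrences.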

Steps:

  • it.groupby(grp.x, lambda elem: elem >= 0.5) - create an iterator,
    returning pairs (key, group), from x column of the current group.
    The key states whether the current group (from itertools grouping)
    contains your "wanted" elements (>= 0.5) and the group contains these
    elements.
  • [ len(list(group)) for key, group in … if key ] - get a list of
    lengths of groups, excluding groups of "smaller" elements.
  • filter(lambda x: x >= 5, …) - filter the above list, leaving only counts
    of groups with 5 or more members.
  • sum(…) - sum the above counts.

Then, to get your expected result as a DataFrame, apply this function to
each group of rows, this time grouping with the pandasonic version of
groupby.

Then set the name of the resulting Series (it will be the column name
in the final result) and reset the index, to convert it to a DataFrame.

The code to do it is:

result = df.groupby(['y','z','n']).apply(getCnt).rename('Cnt').reset_index()

The result is:

     y    z    n  Cnt
0 1.0 1.0 1.0 11
1 2.0 1.0 1.0 5
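Putting the pieces together on a small invented frame (a single group, grouped by y only, to keep it short):

```python
import itertools as it
import pandas as pd

def getCnt(grp):
    # Sum the lengths of runs of consecutive x >= 0.5 that are >= 5 long
    return sum(filter(lambda n: n >= 5,
                      [len(list(group))
                       for key, group in it.groupby(grp.x, lambda e: e >= 0.5)
                       if key]))

df = pd.DataFrame({
    'x': [0.1, 0.5, 0.6, 0.7, 0.6, 0.5, 0.1, 0.5, 0.6, 0.7, 0.8],
    'y': [1.0] * 11,
})
result = df.groupby('y').apply(getCnt).rename('Cnt').reset_index()
```

The x column contains one qualifying run of length 5 and one run of length 4; only the former passes the filter, so Cnt is 5.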

How to count number of rows per group greater than the average of that group in pandas group by?

You could also aggregate using a lambda function, as follows:

df.groupby(['col1', 'col2']).agg(['mean', 'count',
                                  lambda x: (x > x.mean()).sum()])
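For example, on a small made-up frame (note that the lambda's result column is auto-named by pandas, e.g. <lambda_0>, so the columns below are checked by position):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A'],
                   'col2': ['B', 'B', 'B'],
                   'col3': [1, 2, 9]})

# mean, count, and the number of values above the group mean, side by side
out = df.groupby(['col1', 'col2'])['col3'].agg(
    ['mean', 'count', lambda x: (x > x.mean()).sum()])
```

The single group [1, 2, 9] has mean 4.0, count 3, and one value (9) above its mean.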

