Python Pandas Counting the Occurrences of a Specific Value

Python Pandas Counting the Occurrences of a Specific value

You can create subset of data with your condition and then use shape or len:

print df
col1 education
0 a 9th
1 b 9th
2 c 8th

print df.education == '9th'
0 True
1 True
2 False
Name: education, dtype: bool

print df[df.education == '9th']
col1 education
0 a 9th
1 b 9th

print df[df.education == '9th'].shape[0]
2
print len(df[df['education'] == '9th'])
2

Performance is interesting, the fastest solution is compare numpy array and sum:

graph

Code:

import perfplot, string
np.random.seed(123)

def shape(df):
return df[df.education == 'a'].shape[0]

def len_df(df):
return len(df[df['education'] == 'a'])

def query_count(df):
return df.query('education == "a"').education.count()

def sum_mask(df):
return (df.education == 'a').sum()

def sum_mask_numpy(df):
return (df.education.values == 'a').sum()

def make_df(n):
L = list(string.ascii_letters)
df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
return df

perfplot.show(
setup=make_df,
kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
n_range=[2**k for k in range(2, 25)],
logx=True,
logy=True,
equality_check=False,
xlabel='len(df)')

Python Dataframes: Count how many of occurrences of value on specific date and delete based on condition

We can use transform here with groupby:

s = df["Business hours"].eq("Yes").groupby(df["Date"]).transform("Sum")
df[s >= 7]

Python Pandas, counting all the occurrences of an item in a CSV and assigning that count to a unique name for that item

Try this:

pd.pivot_table(df, index='Machine Name', values=['Total Revenue','Machine Use Count'], aggfunc=max)

What is the most efficient way of counting occurrences in pandas?

I think df['word'].value_counts() should serve. By skipping the groupby machinery, you'll save some time. I'm not sure why count should be much slower than max. Both take some time to avoid missing values. (Compare with size.)

In any case, value_counts has been specifically optimized to handle object type, like your words, so I doubt you'll do much better than that.

Efficient way in Pandas to count occurrences of Series of values by row

We can compare the transposed df.T directly to the df.max series, thanks to broadcasting:

(df.T == df.max(axis=1)).sum()

# result
0 2
1 1
2 1
3 2
dtype: int64

(Transposing also has the added benefit that we can use sum without specifying the axis, i.e. with the default axis=0.)

how to count occurrences of specific string in previous x rows

You can use a boolean to see when a particular event is occurring, then perform a rolling sum on the boolean series. As @mozway pointed out, the argument min_periods=1 will avoid NaN appearing at the beginning of the resulting DataFrame:

df['walking_count'] = (df['event'] == 'walking').rolling(5, min_periods=1).sum()

This sets a new column 'walking_count' to the following series:

0     0.0
1 0.0
2 1.0
3 2.0
4 3.0
5 4.0
6 5.0
7 5.0
8 4.0
9 3.0
10 2.0
11 1.0

Count occurrences of specific value in column based on categories of another column

Try sum and mean on the condition Clicks==1. Since you're working with groups, put them in groupby:

df['Clicks'].eq(1).groupby(df['Categories']).agg(['sum','mean'])

Output:

            sum      mean
Categories
1 1 0.333333
2 2 0.500000
3 1 0.333333
4 1 0.500000

To match output's naming, use named aggregation:

df['Clicks'].eq(1).groupby(df['Categories']).agg(Click_counts='sum', Clicks_prob='mean')

Output:

            Click_counts  Clicks_prob
Categories
1 1 0.333333
2 2 0.500000
3 1 0.333333
4 1 0.500000


Related Topics



Leave a reply



Submit