Count the Frequency That a Value Occurs in a Dataframe Column

Count the frequency that a value occurs in a dataframe column

Use value_counts() as @DSM commented.

In [37]:
import pandas as pd
df = pd.DataFrame({'a': list('abssbab')})
df['a'].value_counts()

Out[37]:

b    3
a    2
s    2
dtype: int64
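If you only need the count for one value, or shares instead of raw counts, a minimal sketch along these lines may help. It reuses the same sample column; the value 'z' is only an illustrative example of something that is absent.

```python
import pandas as pd

df = pd.DataFrame({'a': list('abssbab')})

counts = df['a'].value_counts()

# Look up a single value; .get falls back to the default when the value is absent.
b_count = counts.get('b', 0)
z_count = counts.get('z', 0)   # 'z' is a made-up value that is not in the column

# Relative frequencies (fractions of all rows) instead of raw counts.
shares = df['a'].value_counts(normalize=True)

print(b_count, z_count)
print(shares)
```

Using .get avoids a KeyError when the value never occurs in the column.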

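You can also use groupby and count; there are several ways to do this. Select the column after grouping, because with only the grouping column in the frame, recent pandas versions return an empty result for df.groupby('a').count():

In [38]:
df.groupby('a')['a'].count()

Out[38]:

a
a    2
b    3
s    2
Name: a, dtype: int64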
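One thing to watch with the groupby route is the difference between size and count when values are missing. The sketch below uses a small made-up frame (the column names key and val are assumptions for illustration only).

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'key' is the grouping column, 'val' has some missing entries.
df = pd.DataFrame({'key': ['x', 'x', 'y', 'y', 'y'],
                   'val': [1.0, np.nan, 2.0, 3.0, np.nan]})

# size() counts rows per group, including rows where 'val' is NaN.
sizes = df.groupby('key').size()

# count() counts only the non-null values of the selected column.
counts = df.groupby('key')['val'].count()

print(sizes)
print(counts)
```

If you want "how many rows have this value", size is usually the one you mean; count answers "how many non-missing entries does this other column have per group".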

See the online docs.

To add the frequency back to the original dataframe, use transform, which returns a result aligned with the original index:

In [41]:
df['freq'] = df.groupby('a')['a'].transform('count')
df

Out[41]:

   a  freq
0  a     2
1  b     3
2  s     2
3  s     2
4  b     3
5  a     2
6  b     3

[7 rows x 2 columns]
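A possible alternative to transform, shown here only as a sketch on the same sample data, is to map each value to its entry in value_counts:

```python
import pandas as pd

df = pd.DataFrame({'a': list('abssbab')})

# value_counts() is indexed by the distinct values, so map() can use it
# as a lookup table from value to frequency.
df['freq'] = df['a'].map(df['a'].value_counts())

print(df)
```

Both approaches give one frequency per row; which reads better is mostly a matter of taste.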

Count frequency of values in pandas DataFrame column

You can use value_counts and to_dict:

print(df['status'].value_counts())
N    14
S     4
C     2
Name: status, dtype: int64

counts = df['status'].value_counts().to_dict()
print(counts)
{'N': 14, 'S': 4, 'C': 2}
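For comparison, here is a sketch using collections.Counter from the standard library. The sample column is rebuilt from the counts shown above, and 'X' is just a made-up status that does not occur.

```python
from collections import Counter

import pandas as pd

# Sample column reconstructed from the counts above (N: 14, S: 4, C: 2).
df = pd.DataFrame({'status': ['N'] * 14 + ['S'] * 4 + ['C'] * 2})

counts = df['status'].value_counts().to_dict()

# Counter builds the same value-to-count mapping directly from the column.
counter = Counter(df['status'])

print(counts)
print(counter['X'])   # a Counter returns 0 for values it has never seen
```

Note that value_counts drops NaN by default while Counter counts every element it sees, so the two can differ on columns with missing values.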

Count the frequency that a bunch of values occurs in a dataframe column

If I understand correctly, you can bin col2 with pd.cut and then aggregate per bin:

out = df.groupby(pd.cut(df['col2'], np.linspace(0, 1, 101)))['col1'].sum()
print(out)

# Output
col2
(0.0, 0.01]     33
(0.01, 0.02]     0
(0.02, 0.03]    31
(0.03, 0.04]    12
(0.04, 0.05]     0
                ..
(0.95, 0.96]     0
(0.96, 0.97]     0
(0.97, 0.98]     0
(0.98, 0.99]     0
(0.99, 1.0]      0
Name: col1, Length: 100, dtype: int64
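Since the original data is not shown, here is a self-contained sketch with a tiny made-up frame (the values are assumptions, only the column names col1 and col2 come from the answer). It shows both the per-bin sum and a plain per-bin row count.

```python
import numpy as np
import pandas as pd

# Hypothetical data: col2 holds values in (0, 1], col1 holds amounts to aggregate.
df = pd.DataFrame({'col1': [33, 31, 12, 5],
                   'col2': [0.005, 0.025, 0.035, 0.0251]})

# 101 edges give 100 right-closed bins of width 0.01.
bins = np.linspace(0, 1, 101)
binned = pd.cut(df['col2'], bins)

# observed=False keeps the empty bins in the result.
out = df.groupby(binned, observed=False)['col1'].sum()       # sum of col1 per bin
n_rows = df.groupby(binned, observed=False).size()           # number of rows per bin

print(out.head())
print(n_rows.head())
```

If you want how many rows fall in each interval rather than a sum of another column, size is the aggregation to use.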

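Count the frequency that a value occurs at the end of a dataframe column

Keep only the "cat b" entries, back-fill the gaps, and count what is still missing; those are the rows after the last "cat b":

df['category'] = np.where(df['category'] == "cat b", df['category'], np.nan)
df['category'].bfill().isna().sum()
# Output
4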
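The input frame is not shown, so here is a worked sketch on made-up data that has four rows after the last "cat b". It uses Series.where instead of np.where so the original column is left untouched; that swap is my variant, not the answer's code.

```python
import pandas as pd

# Hypothetical column: the last "cat b" is at position 3, followed by four other rows.
df = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a', 'cat b',
                                'cat a', 'cat c', 'cat a', 'cat c']})

# Keep "cat b", turn everything else into missing values.
marked = df['category'].where(df['category'] == 'cat b')

# Back-filling pulls each later "cat b" upwards, so only the rows after
# the last "cat b" remain missing.
trailing = marked.bfill().isna().sum()

print(trailing)
```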

Python Pandas Counting the Occurrences of a Specific value

You can create a subset of the data with your condition and then use shape or len:

print(df)
  col1 education
0    a       9th
1    b       9th
2    c       8th

print(df.education == '9th')
0     True
1     True
2    False
Name: education, dtype: bool

print(df[df.education == '9th'])
  col1 education
0    a       9th
1    b       9th

print(df[df.education == '9th'].shape[0])
2
print(len(df[df['education'] == '9th']))
2

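Performance is interesting: the fastest solution is to compare the underlying NumPy array with the value and sum the result:

[Graph: perfplot timings of the functions below, plotted against len(df) on log-log axes]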

Code:

import string

import numpy as np
import pandas as pd
import perfplot

np.random.seed(123)


def shape(df):
    return df[df.education == 'a'].shape[0]

def len_df(df):
    return len(df[df['education'] == 'a'])

def query_count(df):
    return df.query('education == "a"').education.count()

def sum_mask(df):
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    return (df.education.values == 'a').sum()

def make_df(n):
    L = list(string.ascii_letters)
    df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
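Applied to the small education example from earlier in this answer, the two mask-based variants look like this. It is a minimal sketch; to_numpy() is used here in place of .values, which should be equivalent for this purpose.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'education': ['9th', '9th', '8th']})

# Summing a boolean mask counts the True entries, with no intermediate subset.
n_mask = (df['education'] == '9th').sum()

# Same idea on the underlying NumPy array, skipping the pandas Series overhead.
n_numpy = (df['education'].to_numpy() == '9th').sum()

print(n_mask, n_numpy)
```

Both avoid building the filtered dataframe that the shape and len versions need.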

Count freq of one column values in pandas dataframe and tag each row with its frequency occurrence number

You can use transform and take the index (after reset_index) as the value, then add one, because the new index starts from 0.

df['freq2'] = df.groupby('B')['B'].transform(lambda x: x.reset_index().index).add(1)

     A  B  freq  freq2
0  foo  a     4      1
1  bar  b     5      1
2  g2g  a     4      2
3  g2g  b     5      2
4  g2g  b     5      3
5  bar  b     5      4
6  bar  a     4      3
7  foo  a     4      4
8  bar  b     5      5
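A shorter route to the same running number, sketched here as an alternative rather than as the answer's method, is GroupBy.cumcount. The frame below is rebuilt from the output shown above.

```python
import pandas as pd

# Columns A and B reconstructed from the table above.
df = pd.DataFrame({'A': ['foo', 'bar', 'g2g', 'g2g', 'g2g', 'bar', 'bar', 'foo', 'bar'],
                   'B': ['a', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'b']})

# Total number of rows sharing the same B value.
df['freq'] = df.groupby('B')['B'].transform('count')

# cumcount() numbers the rows within each group starting at 0, so add 1.
df['freq2'] = df.groupby('B').cumcount() + 1

print(df)
```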

Count the frequency of strings found (in any order) in another column and return result in a new column

You can write a custom function that takes a row as input and apply it to the dataframe row-wise with axis=1:

def count_keywords(row):
    freq = 0
    for word in row['Keyword'].split(" "):
        if word in row['Cluster']:
            freq += 1
    return freq

df['Frequency'] = df.apply(count_keywords, axis=1)

Output:

>>> df
             Keyword     Cluster  Frequency
0               Nike  Nike Socks          1
1         Nike Socks  Nike Socks          2
2  Nike Stripy Socks  Nike Socks          2
3         Socks Nike  Nike Socks          2
4       Adidas Socks  Nike Socks          1
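Note that word in row['Cluster'] is a substring test, so a keyword such as "Sock" would also match "Socks". If you want whole-word matches only, one possible variant (the function name count_whole_words is my own, for illustration) compares against the set of words in the cluster:

```python
import pandas as pd

# Frame rebuilt from the output above.
df = pd.DataFrame({'Keyword': ['Nike', 'Nike Socks', 'Nike Stripy Socks',
                               'Socks Nike', 'Adidas Socks'],
                   'Cluster': ['Nike Socks'] * 5})

def count_whole_words(row):
    # Split the cluster into words once, then test each keyword word for membership.
    cluster_words = set(row['Cluster'].split())
    return sum(word in cluster_words for word in row['Keyword'].split())

df['Frequency'] = df.apply(count_whole_words, axis=1)

print(df)
```

On this data both versions give the same counts; they only differ when a keyword is a fragment of a cluster word.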

