Count frequency of values in pandas DataFrame column
You can use value_counts
and to_dict
:
print df['status'].value_counts()
N 14
S 4
C 2
Name: status, dtype: int64
counts = df['status'].value_counts().to_dict()
print counts
{'S': 4, 'C': 2, 'N': 14}
How to count the frequency of the elements in a dataframe's row?
Given your df
:
>>> df
col1 col2 col3 col4 col5
0 pqr abc pqr NaN NaN
You can use stack()
and value_counts()
:
df.stack().value_counts().reset_index().\
rename({'index':'code',
0:'count'},axis=1)
Prints:
code count
0 pqr 2
1 abc 1
Count the frequency that a value occurs in a dataframe column
Use value_counts()
as @DSM commented.
In [37]:
df = pd.DataFrame({'a':list('abssbab')})
df['a'].value_counts()
Out[37]:
b 3
a 2
s 2
dtype: int64
Also groupby
and count
. Many ways to skin a cat here.
In [38]:
df.groupby('a').count()
Out[38]:
a
a
a 2
b 3
s 2
[3 rows x 1 columns]
See the online docs.
If you wanted to add frequency back to the original dataframe use transform
to return an aligned index:
In [41]:
df['freq'] = df.groupby('a')['a'].transform('count')
df
Out[41]:
a freq
0 a 2
1 b 3
2 s 2
3 s 2
4 b 3
5 a 2
6 b 3
[7 rows x 2 columns]
How to count frequency of values across the entire dataframe
You can use the stack function to stack all values in one column, and then use value_counts
:
df.stack().value_counts()
Splitting and counting the frequency of the elements of a Pandas column
Try with the something as follows
from collections import Counter
def most_common_words(labels, quantity):
"""
Split all words present in list and count how many times it
is repeated in the list.
Args:
labels (list): List of strings to split.
quantity (int): Amount of most common words to return.
Returns:
counter (liste): List of words splitted with its number of ocurrences.
"""
#words = [i.split(" ", 3)[0] for i in labels]
#counter = Counter(words).most_common(quantity)
words = [(re.split('(;|,|-| |\*|\n)', i)) for i in labels]
counter = Counter(x for xs in words for x in set(xs)).most_common(quantity)
df = pd.DataFrame(counter, columns=["Word", "Occurence number"])\
.sort_values(by="Occurence number", ascending=True)
df = df[df["Word"] != " "].reset_index(drop=True)
return df
df_most_common_words = most_common_words(data_copy["col"].tolist(), 20)
print(df_most_common_words)
the output
Word Occurence number
19 Repetition 8946
18 Government 9159
17 SACMEQ: 11502
16 Gross 12993
15 PIAAC: 20874
14 PISA: 21087
13 TIMSS: 21300
12 Africa 21513
11 Enrolment 21939
In your case, you can do something as follows.
col_a = ['A', 'B', 'A-E', 'a', 'A;e', 'B;e', 'A;B', 'C A', 'As']
df = pd.DataFrame(col_a, columns=['col_a'])
df
col_a
0 A
1 B
2 A-E
3 a
4 A;e
5 B;e
6 A;B
7 C A
8 As
df['col_a'] = df['col_a'].str.replace('-',' ').str.replace(';',' ')
df
col_a
0 A
1 B
2 A E
3 a
4 A e
5 B e
6 A B
7 C A
8 As
df_most_common_words = most_common_words(df["col_a"].tolist(), 20)
df_most_common_words
Word Occurence number
0 E 1
1 a 1
2 C 1
3 As 1
4 e 2
5 B 3
6 A 5
Counting the frequency of an element in a data frame
You can turn your data.frame
to a vector
and then use table
df <- read.table(text = " b c e f g h j
1 Dxa2 Dxa2 Dxa2 Dxa2 Dxa2 Dxa2 Dxa2
2 Dxb2 Dxb2 Dxb2 Dxb2 Dxc2 Dxc2 Dxc2
3 Dxd2 Dxi2 tneg tpos Dxd2 Dxi2 tneg", header = TRUE, row.names = 1)
table(unlist(df))
## Dxa2 Dxb2 Dxd2 Dxi2 tneg tpos Dxc2
## 7 4 2 2 2 1 3
You can turn the result to a data.frame
too
as.data.frame(table(unlist(df)))
## Var1 Freq
## 1 Dxa2 7
## 2 Dxb2 4
## 3 Dxd2 2
## 4 Dxi2 2
## 5 tneg 2
## 6 tpos 1
## 7 Dxc2 3
How to count frequency of elements from a columns of lists in pyspark dataframe?
The following command explodes the array, and provides the count of each element
import pyspark.sql.functions as F
df_ans = (df2
.withColumn("explode", F.explode("My_list"))
.groupBy("explode")
.count()
.orderBy(F.desc("count"))
the result is
Related Topics
Outputting Multiple Lines of Text with Rendertext() in R Shiny
Ggplot Geom_Bar: Meaning of Aes(Group = 1)
Differencebetween Cat and Print
How to Replace Na (Missing Values) in a Data Frame with Neighbouring Values
Set the Size of Ggsave Exactly
How to Programmatically Extract/Unzip a .7Z (7-Zip) File with R
Apply a Ggplot-Function Per Group with Dplyr and Set Title Per Group
In R Data.Table, How to Pass Variable Parameters to an Expression
Apply a Function to a Subset of Data.Table Columns, by Column-Indices Instead of Name
Suggestions for Speeding Up Random Forests
Selecting Columns in R Data Frame Based on Those *Not* in a Vector
Identifying Dependencies of R Functions and Scripts