Select the N Most Frequent Values in a Variable

Select the n most frequent values in a variable

We can count the number of values using table, sort them in decreasing order and select first 2 (or 10) values, get the corresponding ID's and subset those ID's from the data frame.

df[df$ID %in% names(sort(table(df$ID), decreasing = TRUE)[1:2]), ]

# ID col
#1 A blue
#2 A purple
#3 A green
#6 C red
#7 C blue
#8 C yellow
#9 C orange

Pandas get the most frequent values of a column

By using mode

df.name.mode()
Out[712]:
0 alex
1 helen
dtype: object

Find the n most common values in a vector

I'm sure this is a duplicate, but the answer is simple:

sort(table(variable),decreasing=TRUE)[1:3]

Find most frequent value in SQL column

SELECT
<column_name>,
COUNT(<column_name>) AS `value_occurrence`

FROM
<my_table>

GROUP BY
<column_name>

ORDER BY
`value_occurrence` DESC

LIMIT 1;

Replace <column_name> and <my_table>. Increase 1 if you want to see the N most common values of the column.

How to select most common values in a column

starwars %>%
count(homeworld, sort = TRUE) %>%
slice(1:2) %>%
left_join(starwars)

Result

Joining, by = "homeworld"
# A tibble: 21 x 15
homeworld n name height mass hair_color skin_color eye_color birth_year sex gender species films vehicles starships
<chr> <int> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <list> <list> <list>
1 Naboo 11 R2-D2 96 32 NA white, blue red 33 none masculine Droid <chr [7… <chr [0… <chr [0]>
2 Naboo 11 Palpatine 170 75 grey pale yellow 82 male masculine Human <chr [5… <chr [0… <chr [0]>
3 Naboo 11 Jar Jar Bin… 196 66 none orange orange 52 male masculine Gungan <chr [2… <chr [0… <chr [0]>
4 Naboo 11 Roos Tarpals 224 82 none grey orange NA male masculine Gungan <chr [1… <chr [0… <chr [0]>
5 Naboo 11 Rugor Nass 206 NA none green orange NA male masculine Gungan <chr [1… <chr [0… <chr [0]>
6 Naboo 11 Ric Olié 183 NA brown fair blue NA NA NA NA <chr [1… <chr [0… <chr [1]>
7 Naboo 11 Quarsh Pana… 183 NA black dark brown 62 NA NA NA <chr [1… <chr [0… <chr [0]>
8 Naboo 11 Gregar Typho 185 85 black dark brown NA male masculine Human <chr [1… <chr [0… <chr [1]>
9 Naboo 11 Cordé 157 NA brown light brown NA female feminine Human <chr [1… <chr [0… <chr [0]>
10 Naboo 11 Dormé 165 NA brown light brown NA female feminine Human <chr [1… <chr [0… <chr [0]>
# … with 11 more rows

Dataframe - get most frequent values and their count

I believe you need DataFrame with 2 columns filled by top10 values:

 df1 = df['product'].value_counts().iloc[:10].rename_axis('val').reset_index(name='count')

data.table get N most frequent values by group

Edit: improved.

I think you were exactly on the right track. One key thing you're missing, however, is the function frank, which has been optimized and should speed up your code considerably (runs almost instantaneously on your 3m row sample data):

d[ , .(purch_count = .N), 
by = .(purch_cat, purch_zip)
][, purch_rank := frank(-purch_count, ties.method = 'min'),
keyby = purch_cat
][purch_rank <= 3,
][order(purch_cat, purch_rank)]
purch_cat purch_zip purch_count purch_rank
1: condo 39169 32 1
2: condo 15725 31 2
3: condo 75768 30 3
4: condo 72023 30 3
5: home 71294 30 1
6: home 56053 30 1
7: home 57971 29 3
8: home 77521 29 3
9: home 70124 29 3
10: home 25302 29 3
11: home 65292 29 3
12: home 39488 29 3
13: townhouse 39587 33 1
14: townhouse 80365 30 2
15: townhouse 37360 30 2


Incomplete answer with table (slow):

Yes, one way involves using table.

d[ , {x <- table(purch_zip)
x <- x[order(-x)]
names(x[x %in% unique(x)[1:3]])
}, keyby = purch_cat]
purch_cat V1
1: condo 39169
2: condo 15725
3: condo 72023
4: condo 75768
5: home 56053
6: home 71294
7: home 25302
8: home 39488
9: home 57971
10: home 65292
11: home 70124
12: home 77521
13: home 16943
14: home 43003
15: home 43426
16: home 76501
17: home 81754
18: home 88978
19: townhouse 39587
20: townhouse 37360
21: townhouse 80365
22: townhouse 22402
23: townhouse 33518
24: townhouse 59347
25: townhouse 83099
purch_cat V1


Related Topics



Leave a reply



Submit