Select the N Most Frequent Values in a Variable

Select the n most frequent values in a variable

Count the occurrences of each value with table, sort the counts in decreasing order, and take the first 2 (or 10, or any N). The names of that result are the most frequent IDs, so subset the data frame to the rows whose ID is among them.

df[df$ID %in% names(sort(table(df$ID), decreasing = TRUE)[1:2]), ]

#  ID    col
#1  A   blue
#2  A purple
#3  A  green
#6  C    red
#7  C   blue
#8  C yellow
#9  C orange
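
For comparison, the same count, sort, and subset idea can be sketched in plain Python. This is only an illustration, not part of the R answer: the rows list and the rows_with_top_ids helper are hypothetical, and the two "B" rows are invented because the original data frame is not shown in full.

```python
from collections import Counter

# Hypothetical (ID, col) rows; the "B" rows are invented for illustration.
rows = [
    ("A", "blue"), ("A", "purple"), ("A", "green"),
    ("B", "red"), ("B", "yellow"),
    ("C", "red"), ("C", "blue"), ("C", "yellow"), ("C", "orange"),
]

def rows_with_top_ids(rows, n):
    """Keep the rows whose ID is one of the n most frequent IDs."""
    # Count how often each ID occurs.
    counts = Counter(row_id for row_id, _ in rows)
    # most_common(n) returns the n (ID, count) pairs with the highest counts.
    top_ids = {row_id for row_id, _ in counts.most_common(n)}
    # Filter the original rows, preserving their order.
    return [row for row in rows if row[0] in top_ids]

top_rows = rows_with_top_ids(rows, 2)
```

With this sample data, only the "A" and "C" rows are kept, matching the shape of the R output above.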

Pandas get the most frequent values of a column

Use mode. It returns only the most frequent value, or every tied value when several share the highest count:

df.name.mode()
Out[712]: 
0     alex
1    helen
dtype: object
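
Outside pandas, the standard library offers similar tie-aware behaviour through statistics.multimode (Python 3.8+). The sketch below assumes a hypothetical list of names; it is not the data behind the output above.

```python
from statistics import multimode

# Hypothetical sample: "alex" and "helen" are tied for the highest count.
names = ["alex", "helen", "alex", "helen", "john"]

# multimode returns every value that shares the highest count,
# in the order each value first appears in the input.
most_frequent = multimode(names)
```

One difference to keep in mind: pandas returns the tied modes sorted, while multimode keeps first-seen order.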

Find the n most common values in a vector

This is probably a duplicate, but the answer is simple: tabulate the values, sort the counts in decreasing order, and take the first three:

sort(table(variable),decreasing=TRUE)[1:3]

Find most frequent value in SQL column

SELECT
  <column_name>,
  COUNT(<column_name>) AS `value_occurrence` 

FROM
  <my_table>

GROUP BY 
  <column_name>

ORDER BY 
  `value_occurrence` DESC

LIMIT 1;

Replace <column_name> and <my_table> with your own names. To see the N most common values of the column, change LIMIT 1 to LIMIT N.
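
To try the query without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The items table, the color column, and the top_n_values helper are hypothetical stand-ins for <my_table> and <column_name>. The backtick quoting from the query above is dropped because the alias does not need quoting.

```python
import sqlite3

def top_n_values(conn, n):
    """Return the n most frequent values of items.color with their counts."""
    # "items" and "color" are hypothetical names used only for this demo.
    query = """
        SELECT color, COUNT(color) AS value_occurrence
        FROM items
        GROUP BY color
        ORDER BY value_occurrence DESC
        LIMIT ?
    """
    return conn.execute(query, (n,)).fetchall()

# Build a small in-memory table to run the query against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (color TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [(c,) for c in ["red", "blue", "red", "green", "red", "blue"]],
)

top_two = top_n_values(conn, 2)
```

Passing N as a bound parameter keeps the query text fixed while the limit varies.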

How to select most common values in a column

starwars %>%
  count(homeworld, sort = TRUE) %>%
  slice(1:2) %>%
  left_join(starwars)

Result

Joining, by = "homeworld"
# A tibble: 21 x 15
   homeworld     n name         height  mass hair_color skin_color  eye_color birth_year sex    gender    species films    vehicles starships
   <chr>     <int> <chr>         <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr>  <chr>     <chr>   <list>   <list>   <list>   
 1 Naboo        11 R2-D2            96    32 NA         white, blue red               33 none   masculine Droid   <chr [7… <chr [0… <chr [0]>
 2 Naboo        11 Palpatine       170    75 grey       pale        yellow            82 male   masculine Human   <chr [5… <chr [0… <chr [0]>
 3 Naboo        11 Jar Jar Bin…    196    66 none       orange      orange            52 male   masculine Gungan  <chr [2… <chr [0… <chr [0]>
 4 Naboo        11 Roos Tarpals    224    82 none       grey        orange            NA male   masculine Gungan  <chr [1… <chr [0… <chr [0]>
 5 Naboo        11 Rugor Nass      206    NA none       green       orange            NA male   masculine Gungan  <chr [1… <chr [0… <chr [0]>
 6 Naboo        11 Ric Olié        183    NA brown      fair        blue              NA NA     NA        NA      <chr [1… <chr [0… <chr [1]>
 7 Naboo        11 Quarsh Pana…    183    NA black      dark        brown             62 NA     NA        NA      <chr [1… <chr [0… <chr [0]>
 8 Naboo        11 Gregar Typho    185    85 black      dark        brown             NA male   masculine Human   <chr [1… <chr [0… <chr [1]>
 9 Naboo        11 Cordé           157    NA brown      light       brown             NA female feminine  Human   <chr [1… <chr [0… <chr [0]>
10 Naboo        11 Dormé           165    NA brown      light       brown             NA female feminine  Human   <chr [1… <chr [0… <chr [0]>
# … with 11 more rows

Dataframe - get most frequent values and their count

I believe you need a DataFrame with two columns, the top 10 values and their counts:

df1 = df['product'].value_counts().iloc[:10].rename_axis('val').reset_index(name='count')
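
As a rough standard-library counterpart, the sketch below builds the same two-field shape (val and count) as a list of dictionaries. The products list and the top_value_counts helper are hypothetical, and this is not a replacement for the pandas one-liner.

```python
from collections import Counter

# Hypothetical product column.
products = ["pen", "ink", "pen", "pad", "pen", "ink"]

def top_value_counts(values, n=10):
    """Return up to n records of the form {'val': value, 'count': count}."""
    # most_common(n) yields (value, count) pairs, highest counts first.
    return [{"val": v, "count": c} for v, c in Counter(values).most_common(n)]

top_products = top_value_counts(products, n=2)
```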

data.table get N most frequent values by group

I think you were exactly on the right track. The key piece you're missing is data.table's frank function, which is optimized for ranking and should speed up your code considerably (it runs almost instantaneously on your 3-million-row sample data):

d[ , .(purch_count = .N), 
  by = .(purch_cat, purch_zip)
  ][, purch_rank := frank(-purch_count, ties.method = 'min'), 
    keyby = purch_cat
    ][purch_rank <= 3,
      ][order(purch_cat, purch_rank)]
    purch_cat purch_zip purch_count purch_rank
 1:     condo     39169          32          1
 2:     condo     15725          31          2
 3:     condo     75768          30          3
 4:     condo     72023          30          3
 5:      home     71294          30          1
 6:      home     56053          30          1
 7:      home     57971          29          3
 8:      home     77521          29          3
 9:      home     70124          29          3
10:      home     25302          29          3
11:      home     65292          29          3
12:      home     39488          29          3
13: townhouse     39587          33          1
14: townhouse     80365          30          2
15: townhouse     37360          30          2
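
To make the ranking logic explicit, here is a minimal Python sketch of the same idea: count each (category, value) pair, rank the counts within each category so that ties share the lowest rank (the behaviour of ties.method = 'min'), and keep ranks up to n. The top_n_by_group helper and the sample purchases are hypothetical; this illustrates the logic and is not a port of frank.

```python
from collections import Counter, defaultdict

def top_n_by_group(pairs, n=3):
    """Return (group, value, count, rank) tuples with rank <= n.

    Ties share the lowest rank, so counts 3, 2, 2, 1 get ranks 1, 2, 2, 4.
    """
    counts = Counter(pairs)  # count each (group, value) pair
    by_group = defaultdict(list)
    for (group, value), count in counts.items():
        by_group[group].append((value, count))

    result = []
    for group in sorted(by_group):
        # Sort values by descending count; the sort is stable for ties.
        ranked = sorted(by_group[group], key=lambda vc: -vc[1])
        rank, prev_count = 0, None
        for position, (value, count) in enumerate(ranked, start=1):
            if count != prev_count:
                # A new count value takes its 1-based position as its rank.
                rank, prev_count = position, count
            if rank <= n:
                result.append((group, value, count, rank))
    return result

# Hypothetical (purch_cat, purch_zip) pairs.
purchases = (
    [("condo", "11111")] * 3 + [("condo", "22222")] * 2
    + [("condo", "33333")] * 2 + [("condo", "44444")]
    + [("home", "55555")] * 2
)
top = top_n_by_group(purchases, n=2)
```

Because tied values share a rank, a group can return more than n rows, which is also why the output above has eight rows for home.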

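Incomplete answer with `table` (slow):

Yes, one way involves using table. Note that it keeps every value whose count is among the three highest distinct counts, so it returns more rows than the frank version above.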

d[ , {
  x <- table(purch_zip)
  x <- x[order(-x)]
  names(x[x %in% unique(x)[1:3]])
}, keyby = purch_cat]
    purch_cat    V1
 1:     condo 39169
 2:     condo 15725
 3:     condo 72023
 4:     condo 75768
 5:      home 56053
 6:      home 71294
 7:      home 25302
 8:      home 39488
 9:      home 57971
10:      home 65292
11:      home 70124
12:      home 77521
13:      home 16943
14:      home 43003
15:      home 43426
16:      home 76501
17:      home 81754
18:      home 88978
19: townhouse 39587
20: townhouse 37360
21: townhouse 80365
22: townhouse 22402
23: townhouse 33518
24: townhouse 59347
25: townhouse 83099
    purch_cat    V1
