Select the n most frequent values in a variable
We can count how often each value occurs using table, sort the counts in decreasing order, select the first 2 (or 10) values, take the corresponding IDs, and subset the rows with those IDs from the data frame.
df[df$ID %in% names(sort(table(df$ID), decreasing = TRUE)[1:2]), ]
# ID col
#1 A blue
#2 A purple
#3 A green
#6 C red
#7 C blue
#8 C yellow
#9 C orange
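For comparison, the same count, sort, and subset steps can be sketched in plain Python. The rows below are hypothetical sample data shaped like the ID/col output above, and rows_with_top_ids is an illustrative helper name, not part of any library.

```python
from collections import Counter

# Hypothetical rows with the same ID/col layout as the data frame above.
rows = [
    {"ID": "A", "col": "blue"},
    {"ID": "A", "col": "purple"},
    {"ID": "A", "col": "green"},
    {"ID": "B", "col": "red"},
    {"ID": "B", "col": "blue"},
    {"ID": "C", "col": "red"},
    {"ID": "C", "col": "blue"},
    {"ID": "C", "col": "yellow"},
    {"ID": "C", "col": "orange"},
]


def rows_with_top_ids(rows, n):
    """Keep the rows whose ID is among the n most frequent IDs."""
    # Count how often each ID occurs (the role of table() in the R answer).
    counts = Counter(row["ID"] for row in rows)
    # most_common(n) sorts by count, descending, and keeps the first n.
    top_ids = {value for value, _ in counts.most_common(n)}
    # Subset the original rows, preserving their order.
    return [row for row in rows if row["ID"] in top_ids]


selected = rows_with_top_ids(rows, 2)
for row in selected:
    print(row["ID"], row["col"])
```

With this sample data, B occurs least often, so only the A and C rows are kept.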
Pandas get the most frequent values of a column
Use mode, which returns every value tied for the highest count:
df.name.mode()
Out[712]:
0 alex
1 helen
dtype: object
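The two rows in the output above are consistent with a tie: mode returns all values that share the highest count. A small sketch of that behaviour, assuming a hypothetical name column, with value_counts shown alongside for the case where you want the top n values with their counts:

```python
import pandas as pd

# Hypothetical data: "alex" and "helen" are tied with two occurrences each.
df = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "bob"]})

# mode() returns every value tied for the highest count, in sorted order.
modes = df["name"].mode()

# value_counts() sorts by frequency (descending); head(n) keeps the top n.
top_counts = df["name"].value_counts().head(2)

print(modes)
print(top_counts)
```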
Find the n most common values in a vector
I'm sure this is a duplicate, but the answer is simple:
sort(table(variable),decreasing=TRUE)[1:3]
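One caveat with the [1:3] indexing: if the vector has fewer than three distinct values, the result is padded with NA entries (head(..., 3) avoids that). As a point of comparison, here is a short Python sketch of the same idea that simply returns fewer pairs in that case. The function name n_most_common and the sample vector are made up for illustration.

```python
from collections import Counter


def n_most_common(values, n):
    """Return up to n (value, count) pairs, most frequent first."""
    # Counter tallies each value; most_common(n) sorts by count, descending.
    return Counter(values).most_common(n)


# Hypothetical sample vector.
variable = ["a", "b", "a", "c", "a", "b", "d"]
top3 = n_most_common(variable, 3)
print(top3)
```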
Find most frequent value in SQL column
SELECT
<column_name>,
COUNT(<column_name>) AS `value_occurrence`
FROM
<my_table>
GROUP BY
<column_name>
ORDER BY
`value_occurrence` DESC
LIMIT 1;
Replace <column_name> and <my_table> with your column and table names. Increase the LIMIT value from 1 to N if you want to see the N most common values of the column.
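To try the query without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table name items and the column colour are hypothetical stand-ins for <my_table> and <column_name>, and the LIMIT value is passed as a parameter so that N can be changed.

```python
import sqlite3


def most_common_values(conn, n):
    """Return the n most frequent values of the column with their counts."""
    # "items" and "colour" are hypothetical stand-ins for <my_table>
    # and <column_name>. COUNT(colour) ignores NULL values.
    query = """
        SELECT colour, COUNT(colour) AS value_occurrence
        FROM items
        GROUP BY colour
        ORDER BY value_occurrence DESC
        LIMIT ?
    """
    return conn.execute(query, (n,)).fetchall()


# Build a small in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (colour TEXT)")
conn.executemany(
    "INSERT INTO items (colour) VALUES (?)",
    [("red",), ("blue",), ("red",), ("green",), ("red",), ("blue",)],
)

top2 = most_common_values(conn, 2)
print(top2)
```

Note that when several values are tied at the cut-off, which of them LIMIT keeps is not defined unless you add a second ORDER BY key.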
How to select most common values in a column
starwars %>%
count(homeworld, sort = TRUE) %>%
slice(1:2) %>%
left_join(starwars)
Result
Joining, by = "homeworld"
# A tibble: 21 x 15
homeworld n name height mass hair_color skin_color eye_color birth_year sex gender species films vehicles starships
<chr> <int> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <list> <list> <list>
1 Naboo 11 R2-D2 96 32 NA white, blue red 33 none masculine Droid <chr [7… <chr [0… <chr [0]>
2 Naboo 11 Palpatine 170 75 grey pale yellow 82 male masculine Human <chr [5… <chr [0… <chr [0]>
3 Naboo 11 Jar Jar Bin… 196 66 none orange orange 52 male masculine Gungan <chr [2… <chr [0… <chr [0]>
4 Naboo 11 Roos Tarpals 224 82 none grey orange NA male masculine Gungan <chr [1… <chr [0… <chr [0]>
5 Naboo 11 Rugor Nass 206 NA none green orange NA male masculine Gungan <chr [1… <chr [0… <chr [0]>
6 Naboo 11 Ric Olié 183 NA brown fair blue NA NA NA NA <chr [1… <chr [0… <chr [1]>
7 Naboo 11 Quarsh Pana… 183 NA black dark brown 62 NA NA NA <chr [1… <chr [0… <chr [0]>
8 Naboo 11 Gregar Typho 185 85 black dark brown NA male masculine Human <chr [1… <chr [0… <chr [1]>
9 Naboo 11 Cordé 157 NA brown light brown NA female feminine Human <chr [1… <chr [0… <chr [0]>
10 Naboo 11 Dormé 165 NA brown light brown NA female feminine Human <chr [1… <chr [0… <chr [0]>
# … with 11 more rows
Dataframe - get most frequent values and their count
I believe you need a DataFrame with 2 columns holding the top 10 values and their counts:
df1 = df['product'].value_counts().iloc[:10].rename_axis('val').reset_index(name='count')
data.table get N most frequent values by group
Edit: improved.
I think you were exactly on the right track. One key thing you're missing, however, is the function frank, which has been optimized and should speed up your code considerably (it runs almost instantaneously on your 3m row sample data):
d[ , .(purch_count = .N),
by = .(purch_cat, purch_zip)
][, purch_rank := frank(-purch_count, ties.method = 'min'),
keyby = purch_cat
][purch_rank <= 3,
][order(purch_cat, purch_rank)]
purch_cat purch_zip purch_count purch_rank
1: condo 39169 32 1
2: condo 15725 31 2
3: condo 75768 30 3
4: condo 72023 30 3
5: home 71294 30 1
6: home 56053 30 1
7: home 57971 29 3
8: home 77521 29 3
9: home 70124 29 3
10: home 25302 29 3
11: home 65292 29 3
12: home 39488 29 3
13: townhouse 39587 33 1
14: townhouse 80365 30 2
15: townhouse 37360 30 2
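Some groups in the output above have more than three rows (home has eight). That follows from ties.method = 'min': tied counts share the lowest rank of their tie, so every value tied at rank 3 passes the purch_rank <= 3 filter. A minimal Python sketch of that ranking rule, using made-up group and zip values rather than the original data (top_by_min_rank is a hypothetical helper name):

```python
from collections import Counter, defaultdict


def top_by_min_rank(pairs, k):
    """For each group, keep (value, count, rank) where the min rank is <= k."""
    # Count occurrences of each value within each group.
    counts = defaultdict(Counter)
    for group, value in pairs:
        counts[group][value] += 1

    result = {}
    for group, counter in counts.items():
        kept = []
        # most_common() lists values by count, descending.
        for value, count in counter.most_common():
            # "min" rank: 1 + number of values with a strictly larger count,
            # so tied values all receive the same (lowest) rank.
            rank = 1 + sum(1 for c in counter.values() if c > count)
            if rank <= k:
                kept.append((value, count, rank))
        result[group] = kept
    return result


# Hypothetical data: two values tied at the top, two tied at rank 3.
sample_counts = [("z1", 3), ("z2", 3), ("z3", 2), ("z4", 2), ("z5", 1)]
pairs = [("home", z) for z, c in sample_counts for _ in range(c)]

top = top_by_min_rank(pairs, 3)
print(top["home"])
```

Here z5 gets rank 5 and is dropped, while both values tied at rank 3 are kept, giving four rows for a "top 3" request.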
Incomplete answer with table (slow):
Yes, one way involves using table.
d[ , {x <- table(purch_zip)
x <- x[order(-x)]
names(x[x %in% unique(x)[1:3]])
}, keyby = purch_cat]
purch_cat V1
1: condo 39169
2: condo 15725
3: condo 72023
4: condo 75768
5: home 56053
6: home 71294
7: home 25302
8: home 39488
9: home 57971
10: home 65292
11: home 70124
12: home 77521
13: home 16943
14: home 43003
15: home 43426
16: home 76501
17: home 81754
18: home 88978
19: townhouse 39587
20: townhouse 37360
21: townhouse 80365
22: townhouse 22402
23: townhouse 33518
24: townhouse 59347
25: townhouse 83099
purch_cat V1
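Judging from the code, the table version keeps every value whose count equals one of the three largest distinct counts in its group, which would explain why it returns more rows than the frank version. A rough Python sketch of that selection rule, with hypothetical data and a hypothetical helper name:

```python
from collections import Counter


def values_in_top_k_counts(values, k):
    """Keep every value whose count is among the k largest distinct counts."""
    counts = Counter(values)
    # The k largest distinct counts, e.g. [3, 2, 1] for k = 3.
    top_counts = sorted(set(counts.values()), reverse=True)[:k]
    # List values by descending count, keeping those with a qualifying count.
    return [v for v, c in counts.most_common() if c in top_counts]


# Hypothetical zip codes for a single group: counts are 3, 3, 2, 2, 1.
sample_counts = [("z1", 3), ("z2", 3), ("z3", 2), ("z4", 2), ("z5", 1)]
zips = [z for z, c in sample_counts for _ in range(c)]

kept = values_in_top_k_counts(zips, 3)
print(kept)
```

With k = 3 all five values are kept here, because the three largest distinct counts are 3, 2 and 1. A min-rank filter on the same data would drop z5.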