Select Top Distinct Results Ordered by Frequency

Frequency of distinct values SQL

You can use aggregation, if I understand correctly:

SELECT ROUND(tripduration / 60) AS duration, COUNT(*) AS frequency
FROM TABLE
GROUP BY ROUND(tripduration / 60)
ORDER BY COUNT(*) DESC;
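
If only the most frequent few durations are needed, the same query can be capped with LIMIT. A minimal sketch, assuming MySQL (which allows grouping by a select alias); trips is a hypothetical table name standing in for TABLE:

SELECT ROUND(tripduration / 60) AS duration, COUNT(*) AS frequency
FROM trips                -- hypothetical table name
GROUP BY duration         -- MySQL accepts the select alias here
ORDER BY frequency DESC
LIMIT 10;                 -- keep only the 10 most frequent durations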

Order SQL query records by frequency

SELECT `column`,
       COUNT(`column`) AS `count`
FROM `table`
GROUP BY `column`
ORDER BY `count` DESC;

Quick PoC:


mysql> CREATE TABLE `table` (`id` SERIAL, `column` char(6) NOT NULL, KEY `column_idx`(`column`));
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO `table` (`column`) VALUES ('value1'), ('value1'), ('value1'), ('value1'), ('value1'), ('value2'), ('value2'), ('value2'), ('value3'), ('value3');
Query OK, 10 rows affected (0.00 sec)
Records: 10 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM `table`;
+----+--------+
| id | column |
+----+--------+
|  1 | value1 |
|  2 | value1 |
|  3 | value1 |
|  4 | value1 |
|  5 | value1 |
|  6 | value2 |
|  7 | value2 |
|  8 | value2 |
|  9 | value3 |
| 10 | value3 |
+----+--------+
10 rows in set (0.00 sec)

mysql> SELECT `column`,
    -> COUNT(`column`) AS `count`
    -> FROM `table`
    -> GROUP BY `column`
    -> ORDER BY `count` DESC;
+--------+-------+
| column | count |
+--------+-------+
| value1 |     5 |
| value2 |     3 |
| value3 |     2 |
+--------+-------+
3 rows in set (0.00 sec)
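
Continuing the same PoC, LIMIT keeps only the most frequent value. Note that LIMIT cuts strictly, so a value tied with the last returned row would be dropped (a sketch against the data above):

mysql> SELECT `column`, COUNT(`column`) AS `count`
    -> FROM `table`
    -> GROUP BY `column`
    -> ORDER BY `count` DESC
    -> LIMIT 1;
+--------+-------+
| column | count |
+--------+-------+
| value1 |     5 |
+--------+-------+
1 row in set (0.00 sec)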

SQL Select top frequent records

SELECT TOP(20) [Name], COUNT(*) AS [Count]
FROM [Table]
WHERE [Group] = 1
GROUP BY [Name]
ORDER BY COUNT(*) DESC;
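
If rows tied with the 20th count should not be dropped arbitrarily, SQL Server's WITH TIES clause can be added; a sketch reusing the same hypothetical Table, Name, and Group columns (WITH TIES requires the ORDER BY, which is already present):

SELECT TOP(20) WITH TIES [Name], COUNT(*) AS [Count]
FROM [Table]
WHERE [Group] = 1
GROUP BY [Name]
ORDER BY COUNT(*) DESC; -- any names tied with the 20th count are also returned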

SQL - get distinct values and their frequency of occurrence within a group

You can try the query below, using count(distinct s_product_id):

select search_product_result, count(distinct s_product_id) as Total
from serp
group by search_product_result;
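
For comparison, count(*) counts every row in the group, while count(distinct s_product_id) counts each product id once. A sketch on the same serp table showing both side by side:

select search_product_result,
       count(*)                     as Rows_In_Group,  -- every row, duplicates included
       count(distinct s_product_id) as Total           -- each product id counted once
from serp
group by search_product_result;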

I want to sort a list by frequency of values and make it distinct in C#

Fairly easy with LINQ's GroupBy:

var input = new[] { 1, 1, 1, 2, 2, 3, 3, 3, 4, 4 };
var output = input.GroupBy(x => x)
                  .OrderByDescending(x => x.Count())
                  .Select(x => x.Key)
                  .ToList();
  • GroupBy(x => x): creates a list of 4 groups. Each group has a key, which is the number, and values, which are the members of the group. So you'll have something like { 1: [1, 1, 1], 2: [2, 2], 3: [3, 3, 3], 4: [4, 4] }
  • OrderByDescending(x => x.Count()): orders the groups by the number of items in each group, largest first. So you get { 1: [1, 1, 1], 3: [3, 3, 3], 2: [2, 2], 4: [4, 4] }
  • Select(x => x.Key): takes the key from each group, so you get [1, 3, 2, 4]
  • ToList(): turns it all into a list

If there are two groups with equal numbers of items -- in your example, there are three 1's and three 3's -- then this will sort them in the order that they appeared in the input (so, here, the output is [1, 3, 2, 4], because 1 comes before 3 in the input).

This is because of the ordering behaviour of GroupBy (see Remarks):

The IGrouping objects are yielded in an order based on the order of the elements in source that produced the first key of each IGrouping. Elements in a grouping are yielded in the order they appear in source.

and the fact that OrderByDescending is stable (again, see the Remarks), so if two items compare equally, their order is preserved:

This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved. In contrast, an unstable sort does not preserve the order of elements that have the same key.
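
If a deterministic tie-break is preferred over input order, a secondary key can be appended, e.g. .OrderByDescending(x => x.Count()).ThenBy(x => x.Key), which sorts tied groups by their key instead.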

SQL to find the number of distinct values in a column

You can use the DISTINCT keyword within the COUNT aggregate function:

SELECT COUNT(DISTINCT column_name) AS some_alias FROM table_name

This will count only the distinct values for that column.
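
Note that both forms of COUNT on a column ignore NULLs: for the values (a, b, b, NULL), COUNT(column_name) returns 3 and COUNT(DISTINCT column_name) returns 2. A minimal sketch combining the two, reusing the hypothetical table_name:

SELECT COUNT(column_name)          AS non_null_values,  -- NULLs excluded
       COUNT(DISTINCT column_name) AS distinct_values   -- unique non-NULL values
FROM table_name;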

how to select top n values from a data frame retaining the duplicates in r

Assuming that 'freq' is ordered in descending order: we get the unique elements of 'freq', select the first 3 with head, use %in% to get a logical index of the rows whose 'freq' value is among those, and subset the rows.

subset(df1, freq %in% head(unique(freq), 3))
#  id freq
#1  1    4
#2  2    3
#3  3    2
#4  4    2

If we are using ranks, dense_rank from dplyr is an option; it gives tied values the same rank, so all rows sharing the top three frequencies are kept:

library(dplyr)
df1 %>%
  filter(dense_rank(-freq) < 4)

Or, another option is frank from data.table (contributed by @David Arenburg):

library(data.table)
setDT(df1)[, .SD[frank(-freq, ties.method = "dense") < 4]]

