SQL: Returning the Most Common Value for Each Person

Preliminary comment

Please learn to use the explicit JOIN notation, not the old (pre-1992) implicit join notation.

Old style:

SELECT transactionTable.rating as MostCommonRating 
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1

Preferred style:

SELECT transactionTable.rating AS MostCommonRating 
FROM personTable
JOIN transactionTable
ON personTable.transactionid = transactionTable.transactionid
WHERE personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1

You need an ON condition for each JOIN.

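Also, the personID values in the data are strings, not numbers, so you'd need to write

 WHERE personTable.personid = 'Ben'

for example, to get the query to work on the tables shown. (Single quotes are the standard SQL string delimiter; MySQL also accepts double quotes by default.)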
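
As a quick check that the two join notations are interchangeable, the sketch below runs both forms against the same data. It uses Python's built-in sqlite3 module as a stand-in engine, and the rows are hypothetical (the original tables are not reproduced here), so treat it as an illustration rather than a test of the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; only the table and column names come from the queries above.
conn.executescript(
    """
    CREATE TABLE personTable (personid TEXT, transactionid INTEGER);
    CREATE TABLE transactionTable (transactionid INTEGER, rating TEXT);
    INSERT INTO personTable VALUES ('Ben', 1), ('Ben', 2), ('Ben', 3), ('Amy', 4);
    INSERT INTO transactionTable VALUES (1, 'Good'), (2, 'Good'), (3, 'Bad'), (4, 'Bad');
    """
)

# Old style: comma-separated tables, join condition in WHERE.
old_style = """
    SELECT transactionTable.rating AS MostCommonRating
    FROM personTable, transactionTable
    WHERE personTable.transactionid = transactionTable.transactionid
      AND personTable.personid = ?
    GROUP BY transactionTable.rating
    ORDER BY COUNT(transactionTable.rating) DESC
    LIMIT 1
"""

# Preferred style: explicit JOIN with its own ON condition.
new_style = """
    SELECT transactionTable.rating AS MostCommonRating
    FROM personTable
    JOIN transactionTable
      ON personTable.transactionid = transactionTable.transactionid
    WHERE personTable.personid = ?
    GROUP BY transactionTable.rating
    ORDER BY COUNT(transactionTable.rating) DESC
    LIMIT 1
"""

old_result = conn.execute(old_style, ("Ben",)).fetchall()
new_result = conn.execute(new_style, ("Ben",)).fetchall()
print(old_result, new_result)
```

Passing the person as a string parameter also sidesteps the quoting question entirely.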


Main answer

You're seeking to find an aggregate of an aggregate: in this case, the maximum of a count. So, any general solution is going to involve both MAX and COUNT. You can't apply MAX directly to COUNT, but you can apply MAX to a column from a sub-query where the column happens to be a COUNT.
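
The restriction on nesting aggregates can be seen directly. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine (an assumption; other engines word the error differently) and a hypothetical `ratings` table that is not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (person TEXT, rating TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [("Ben", "Good"), ("Ben", "Good"), ("Ben", "Bad"), ("Amy", "Bad")],
)

# Nesting one aggregate directly inside another is rejected.
try:
    conn.execute("SELECT person, MAX(COUNT(*)) FROM ratings GROUP BY person")
    nested_ok = True
except sqlite3.OperationalError:
    nested_ok = False

# MAX applied to a COUNT column exposed by a sub-query is accepted.
max_counts = conn.execute(
    """
    SELECT s.person, MAX(s.cnt)
    FROM (SELECT person, rating, COUNT(*) AS cnt
          FROM ratings
          GROUP BY person, rating) AS s
    GROUP BY s.person
    ORDER BY s.person
    """
).fetchall()
print(nested_ok, max_counts)
```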

Build the query up using Test-Driven Query Design (TDQD).

Select person and transaction rating

SELECT p.PersonID, t.Rating, t.TransactionID
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID

Select person, rating, and number of occurrences of rating

SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating

This result will become a sub-query.

Find the maximum number of times the person gets any rating

SELECT s.PersonID, MAX(s.RatingCount)
FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
      FROM PersonTable AS p
      JOIN TransactionTable AS t
      ON p.TransactionID = t.TransactionID
      GROUP BY p.PersonID, t.Rating
     ) AS s
GROUP BY s.PersonID

Now we know which is the maximum count for each person.

Required result

To get the result, we need to select the rows from the sub-query which have the maximum count. Note that if someone has 2 Good and 2 Bad ratings (and 2 is the maximum number of ratings of the same type for that person), then two records will be shown for that person.

SELECT s.PersonID, s.Rating
FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
      FROM PersonTable AS p
      JOIN TransactionTable AS t
      ON p.TransactionID = t.TransactionID
      GROUP BY p.PersonID, t.Rating
     ) AS s
JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
      FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
            FROM PersonTable AS p
            JOIN TransactionTable AS t
            ON p.TransactionID = t.TransactionID
            GROUP BY p.PersonID, t.Rating
           ) AS s
      GROUP BY s.PersonID
     ) AS m
ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount

If you want the actual rating count too, that's easily selected.

That's a fairly complex piece of SQL. I would hate to try writing that from scratch. Indeed, I probably wouldn't bother; I'd develop it step-by-step, more or less as shown. But because we've debugged the sub-queries before we use them in bigger expressions, we can be confident of the answer.
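
To see the tie behaviour described above, here is a sketch that runs the final query, with the rating count added to the select list, against hypothetical rows in which Amy has two Good and two Bad ratings. It uses Python's built-in sqlite3 module as a stand-in engine; the data and the inner alias `c` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows: Ben has a clear winner, Amy has a tie.
conn.executescript(
    """
    CREATE TABLE PersonTable (PersonID TEXT, TransactionID INTEGER);
    CREATE TABLE TransactionTable (TransactionID INTEGER, Rating TEXT);
    INSERT INTO PersonTable VALUES
        ('Ben', 1), ('Ben', 2), ('Ben', 3),
        ('Amy', 4), ('Amy', 5), ('Amy', 6), ('Amy', 7);
    INSERT INTO TransactionTable VALUES
        (1, 'Good'), (2, 'Good'), (3, 'Bad'),
        (4, 'Good'), (5, 'Good'), (6, 'Bad'), (7, 'Bad');
    """
)

# Step 2 of the build-up: count of each rating per person.
rating_counts = """
    SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
    FROM PersonTable AS p
    JOIN TransactionTable AS t
      ON p.TransactionID = t.TransactionID
    GROUP BY p.PersonID, t.Rating
"""

# Final step: keep only the rows whose count equals the person's maximum.
query = f"""
    SELECT s.PersonID, s.Rating, s.RatingCount
    FROM ({rating_counts}) AS s
    JOIN (SELECT c.PersonID, MAX(c.RatingCount) AS MaxRatingCount
          FROM ({rating_counts}) AS c
          GROUP BY c.PersonID) AS m
      ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
    ORDER BY s.PersonID, s.Rating
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Amy appears twice because both of her ratings reach the maximum count; Ben appears once.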

WITH clause

Note that Standard SQL provides a WITH clause that prefixes a SELECT statement, naming a sub-query. (It can also be used for recursive queries, but we don't need that here.)

WITH RatingList AS
(SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
 FROM PersonTable AS p
 JOIN TransactionTable AS t
 ON p.TransactionID = t.TransactionID
 GROUP BY p.PersonID, t.Rating
)
SELECT s.PersonID, s.Rating
FROM RatingList AS s
JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
      FROM RatingList AS s
      GROUP BY s.PersonID
     ) AS m
ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount

This is simpler to write. MySQL supports the WITH clause from version 8.0 onward; earlier MySQL versions need the nested form shown above.


The SQL above has now been tested against IBM Informix Dynamic Server 11.70.FC2 running on Mac OS X 10.7.4. That test exposed the problem diagnosed in the preliminary comment. The SQL for the main answer worked correctly without needing to be changed.

SQL SSMS return most frequent value for each personal id

Assuming you just want to create a report about the favorite fruit per person, you can use this query:

with cte as (
select p_id, fruit_bought, row_number() over (partition by p_id order by count(*) desc) as rn
from t
group by p_id, fruit_bought
)
select p_id, fruit_bought as favorite_fruit
from cte
where rn = 1
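
A small runnable sketch of the query above, using Python's built-in sqlite3 module as a stand-in engine (window functions need SQLite 3.25 or later). The rows are hypothetical and chosen so that each person has a single clear favorite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (p_id INTEGER, fruit_bought TEXT)")
# Hypothetical purchases: person 1 prefers apples, person 2 prefers plums.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "apple"), (1, "apple"), (1, "pear"),
        (2, "kiwi"), (2, "plum"), (2, "plum"), (2, "plum"),
    ],
)

favorites = conn.execute(
    """
    WITH cte AS (
        SELECT p_id, fruit_bought,
               ROW_NUMBER() OVER (PARTITION BY p_id ORDER BY COUNT(*) DESC) AS rn
        FROM t
        GROUP BY p_id, fruit_bought
    )
    SELECT p_id, fruit_bought AS favorite_fruit
    FROM cte
    WHERE rn = 1
    ORDER BY p_id
    """
).fetchall()
print(favorites)
```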

Find most frequent value in SQL column

SELECT
<column_name>,
COUNT(<column_name>) AS `value_occurrence`

FROM
<my_table>

GROUP BY
<column_name>

ORDER BY
`value_occurrence` DESC

LIMIT 1;

Replace <column_name> and <my_table> with your own names. To see the N most common values of the column, change LIMIT 1 to LIMIT N.
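
The same pattern with the limit passed as a parameter might look like the sketch below. It uses Python's built-in sqlite3 module; the `items` table, the `colour` column, and the `most_common` helper are hypothetical names introduced for illustration, and the extra `colour` sort key is an added assumption to make tie order deterministic.

```python
import sqlite3


def most_common(conn, n):
    """Return the n most frequent colours with their counts (hypothetical table)."""
    return conn.execute(
        """
        SELECT colour, COUNT(colour) AS value_occurrence
        FROM items
        GROUP BY colour
        ORDER BY value_occurrence DESC, colour
        LIMIT ?
        """,
        (n,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (colour TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("red",), ("red",), ("red",), ("blue",), ("blue",), ("green",)],
)

top1 = most_common(conn, 1)
top2 = most_common(conn, 2)
print(top1, top2)
```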

Find the most common value in a particular group

This value is called the mode in statistics. It is easy to calculate:

with cte as (<your query here>)
select timeblock, account
from (select timeblock, account, count(*) as cnt,
             row_number() over (partition by timeblock order by count(*) desc) as seqnum
      from cte
      group by timeblock, account
     ) t
where seqnum = 1;

In the event of ties for the most common, this returns one value arbitrarily. If you want all of them, then use rank() or dense_rank().
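
The difference between row_number() and rank() on tied groups can be sketched as follows. This uses Python's built-in sqlite3 module (SQLite 3.25 or later); the `events` table and its rows are hypothetical, with accounts A and B tied in timeblock 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (timeblock INTEGER, account TEXT)")
# Hypothetical rows: A and B tie in timeblock 1, C wins timeblock 2.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "A"), (1, "A"), (1, "B"), (1, "B"), (1, "C"),
        (2, "C"), (2, "C"), (2, "A"),
    ],
)


def modes(ranking_function):
    """Return the top-ranked account(s) per timeblock for the given window function."""
    # The function name is interpolated only after checking a fixed whitelist.
    if ranking_function not in ("row_number", "rank", "dense_rank"):
        raise ValueError(ranking_function)
    return conn.execute(
        f"""
        SELECT timeblock, account
        FROM (SELECT timeblock, account,
                     {ranking_function}() OVER (PARTITION BY timeblock
                                                ORDER BY COUNT(*) DESC) AS seqnum
              FROM events
              GROUP BY timeblock, account) AS t
        WHERE seqnum = 1
        ORDER BY timeblock, account
        """
    ).fetchall()


one_per_group = modes("row_number")  # one row per timeblock, tie broken arbitrarily
all_ties = modes("rank")             # every account that shares the top count
print(one_per_group, all_ties)
```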

SQL Select most common values

This returns the most common age, including every age that ties for the top count:

select age from persons
group by age
having count(*) = (
select count(*) from persons
group by age
order by count(*) desc
limit 1)
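
A brief sketch of how this query treats ties, run through Python's built-in sqlite3 module. The `name` column and the rows are hypothetical; ages 30 and 41 each occur twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, age INTEGER)")
# Hypothetical rows: two ages share the highest count.
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [("Ann", 30), ("Bob", 30), ("Cid", 41), ("Dee", 41), ("Eve", 25)],
)

most_common_ages = conn.execute(
    """
    SELECT age FROM persons
    GROUP BY age
    HAVING COUNT(*) = (
        SELECT COUNT(*) FROM persons
        GROUP BY age
        ORDER BY COUNT(*) DESC
        LIMIT 1)
    ORDER BY age
    """
).fetchall()
print(most_common_ages)
```

Because the HAVING clause compares against the highest count rather than picking a row, both tied ages come back.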

SQL Server : most frequent value in each row

Assuming these values are in separate columns, a solution using an UNPIVOT query would look something like this:

Test Data

Declare @T table (ID INT , Col1 varchar(1) , Col2 varchar(1) , Col3 varchar(1)
, Col4 varchar(1) , Col5 varchar(1) , Col6 varchar(1) , Col7 varchar(1))
Insert Into @T values
('1','a','d','a','a','c','a','b'),
('2','b','a','c','b','b','b','d'),
('3','h','a','h','h','b','c','d'),
('4','d','d','c','h','g','p','m'),
('5','e','e','g','h','d','e','h');

Query

WITH X AS (
    SELECT ID, Val, COUNT(*) AS total,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY COUNT(*) DESC) AS rn
    FROM @T
    UNPIVOT (Val FOR N IN (Col1, Col2, Col3, Col4, Col5, Col6, Col7)) AS up
    GROUP BY ID, Val
)
SELECT t.*, X.Val
FROM X
INNER JOIN @T AS t ON X.ID = t.ID
WHERE X.rn = 1

Result Set

+----+------+------+------+------+------+------+------+-----+
| ID | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Val |
+----+------+------+------+------+------+------+------+-----+
|  1 | a    | d    | a    | a    | c    | a    | b    | a   |
|  2 | b    | a    | c    | b    | b    | b    | d    | b   |
|  3 | h    | a    | h    | h    | b    | c    | d    | h   |
|  4 | d    | d    | c    | h    | g    | p    | m    | d   |
|  5 | e    | e    | g    | h    | d    | e    | h    | e   |
+----+------+------+------+------+------+------+------+-----+
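
UNPIVOT is specific to SQL Server and a few other engines. On an engine without it, the same unpivoting step can be approximated with UNION ALL, which is a substitute technique and not the query above. The sketch below does this with Python's built-in sqlite3 module (SQLite 3.25 or later) on the same test rows, using a regular table `T` in place of the table variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE T (ID INTEGER, Col1 TEXT, Col2 TEXT, Col3 TEXT,
                       Col4 TEXT, Col5 TEXT, Col6 TEXT, Col7 TEXT)"""
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "a", "d", "a", "a", "c", "a", "b"),
        (2, "b", "a", "c", "b", "b", "b", "d"),
        (3, "h", "a", "h", "h", "b", "c", "d"),
        (4, "d", "d", "c", "h", "g", "p", "m"),
        (5, "e", "e", "g", "h", "d", "e", "h"),
    ],
)

# Emulate UNPIVOT: one SELECT per column, stacked into (ID, Val) rows.
columns = [f"Col{i}" for i in range(1, 8)]
unpivoted = " UNION ALL ".join(f"SELECT ID, {c} AS Val FROM T" for c in columns)

most_frequent = conn.execute(
    f"""
    WITH X AS (
        SELECT ID, Val, COUNT(*) AS total,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY COUNT(*) DESC) AS rn
        FROM ({unpivoted}) AS up
        GROUP BY ID, Val
    )
    SELECT ID, Val, total FROM X WHERE rn = 1 ORDER BY ID
    """
).fetchall()
for row in most_frequent:
    print(row)
```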

Find the most common value per id

With the row_number() window function:

select t.year, t.genre
from (
select year, genre,
row_number() over (partition by year order by count(*) desc) rn
from Oscar
group by year, genre
) t
where t.rn = 1

Results:

| year | genre   |
| ---- | ------- |
| 2016 | Action  |
| 2017 | Romance |
| 2018 | Fantasy |
| 2019 | Fantasy |
| 2020 | Action  |

If you want ties in the results, use rank() instead of row_number().


