Simple Way to Calculate Median With MySQL

In MariaDB / MySQL:

SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM data d, (SELECT @rownum:=0) r
WHERE d.val is NOT NULL
-- put some where clause here
ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );

Steve Cohen points out that after the first pass, @rownum contains the total number of rows. That count can be used to locate the median, so no second pass or join is needed.

AVG(dd.val) and dd.row_number IN (...) are used together so that the median is also correct when there is an even number of records: in that case the two middle rows are averaged. Reasoning:

SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3
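
As a quick sanity check of the index arithmetic above, here is a minimal Python sketch. It is not part of the original answer, and `median_by_rows` is a hypothetical helper name. It picks the same 1-based row numbers from a sorted list and compares the result with the standard library's `statistics.median`.

```python
import statistics

def median_by_rows(values):
    """Average the rows picked by FLOOR((n+1)/2) and FLOOR((n+2)/2), 1-based."""
    ordered = sorted(v for v in values if v is not None)  # mirrors WHERE val IS NOT NULL
    n = len(ordered)
    if n == 0:
        return None  # AVG over zero rows is NULL in SQL
    lo = (n + 1) // 2  # FLOOR((n+1)/2)
    hi = (n + 2) // 2  # FLOOR((n+2)/2)
    return (ordered[lo - 1] + ordered[hi - 1]) / 2

# Compare against the standard library for odd and even row counts.
for sample in ([5, 1, 3], [4, 1, 3, 2], [7], [10, None, 20]):
    expected = statistics.median([v for v in sample if v is not None])
    assert median_by_rows(sample) == expected
```

For an odd count both indexes coincide, so the average is just the middle value; for an even count they are the two neighbours around the midpoint.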

Finally, MariaDB 10.3.3+ provides a built-in MEDIAN window function.

Calculating a simple median on a column in MySQL

I would just use DISTINCT with an empty OVER() clause. Note that PERCENTILE_CONT is available in MariaDB 10.3.3+ but not in MySQL itself:

SELECT DISTINCT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY my_column) OVER () median
FROM my_table

Calculating the Median with MySQL

val is your time column; x and y are two references to the data table (you can write data AS x, data AS y).

EDIT:
To avoid computing your sums twice, you can store the intermediate results.

CREATE TEMPORARY TABLE average_user_total_time
(SELECT SUM(time) AS time_taken
 FROM scores
 WHERE created_at >= '2010-10-10'
   AND created_at <= '2010-11-11'
 GROUP BY user_id);

Then you can compute median over these values which are in a named table.

EDIT: A temporary table won't work here, because MySQL does not allow a TEMPORARY table to be referenced more than once in the same query, and the median query needs two references (x and y). You could try a regular table with the MEMORY storage engine, or simply repeat the subquery that computes the values twice in your query. Apart from this, I don't see another solution. That doesn't mean there isn't a better way; maybe somebody else will come up with an idea.

MySQL: Calculating Median of Values grouped by a Column

Your query computes row numbers using user variables, which makes it more complicated to handle partitions. Since you are using MySQL 8.0, I would suggest using window functions instead.

This should get you close to what you expect:

select
    SchoolName,
    avg(Marks) as median_val
from (
    select
        SchoolName,
        Marks,
        row_number() over(partition by SchoolName order by Marks) rn,
        count(*) over(partition by SchoolName) cnt
    from tablename
) as dd
where rn in ( FLOOR((cnt + 1) / 2), FLOOR((cnt + 2) / 2) )
group by SchoolName

The arithmetic stays the same, but the window functions now operate on groups of records that share the same SchoolName (instead of a single global partition, as in your initial query). The outer query then filters and aggregates by SchoolName.

In your DB Fiddle, this returns:

| SchoolName | median_val |
| ---------- | ---------- |
| A | 71 |
| B | 254 |
| C | 344 |
| D | 233.5 |
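
To try the grouped query without a MySQL server, here is a hedged sketch that runs an equivalent statement through Python's built-in `sqlite3` module. The sample rows are made up for illustration and are not the DB Fiddle data. SQLite stands in for MySQL 8.0 here, so it assumes a SQLite build with window functions (3.25 or later), and FLOOR() is replaced by SQLite's integer division, which gives the same result for positive counts.

```python
import sqlite3

# Hypothetical sample data; the table and column names follow the query above.
rows = [("A", 60), ("A", 71), ("A", 90),
        ("B", 10), ("B", 20), ("B", 30), ("B", 40)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (SchoolName TEXT, Marks INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", rows)

# Integer division of two integers truncates in SQLite, so FLOOR() is dropped.
query = """
SELECT SchoolName, AVG(Marks) AS median_val
FROM (
    SELECT SchoolName, Marks,
           ROW_NUMBER() OVER (PARTITION BY SchoolName ORDER BY Marks) AS rn,
           COUNT(*)     OVER (PARTITION BY SchoolName)                AS cnt
    FROM tablename
) AS dd
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY SchoolName
ORDER BY SchoolName
"""
medians = dict(conn.execute(query).fetchall())
print(medians)
```

School A has three rows, so only the middle one is kept; school B has four, so rows 2 and 3 are averaged.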

How to calculate the median category wise in mysql

Assume you reduce the set to the following. Note: id_student isn't required at this point in the calculation.

CREATE TABLE tscores (
id int primary key auto_increment
, region int
, id_student int
, total_score int
, index (region, total_score)
);

INSERT INTO tscores (region, id_student, total_score) VALUES
(1, 1000, 40)
, (1, 1001, 50)
, (1, 1002, 30)
, (1, 1003, 90)
, (2, 1101, 50)
, (2, 1102, 51)
, (2, 1103, 55)
;

SQL and Result:

WITH cte1 AS (
SELECT region, total_score
, ((COUNT(*) OVER (PARTITION BY region) + 1) / 2) AS n
, ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_score) AS rn
FROM tscores AS t
)
SELECT region
, truncate(AVG(total_score), 2) AS med_score
FROM cte1 AS t
WHERE rn IN (ceil(n), floor(n))
GROUP BY region
;

+--------+-----------+
| region | med_score |
+--------+-----------+
| 1 | 45.00 |
| 2 | 51.00 |
+--------+-----------+
2 rows in set (0.004 sec)
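
The `ceil(n)` / `floor(n)` filter selects the same rows as the `FLOOR((cnt+1)/2)`, `FLOOR((cnt+2)/2)` pair used elsewhere on this page. The following Python sketch (an illustration only; `middle_rows` is a hypothetical helper name) replays that selection on the sample rows above.

```python
import math
from collections import defaultdict

# Sample rows from the INSERT above: (region, total_score).
rows = [(1, 40), (1, 50), (1, 30), (1, 90), (2, 50), (2, 51), (2, 55)]

def middle_rows(cnt):
    """Row numbers kept by WHERE rn IN (ceil(n), floor(n)), with n = (cnt + 1) / 2."""
    n = (cnt + 1) / 2
    return {math.ceil(n), math.floor(n)}

# Group the scores by region, like PARTITION BY region.
by_region = defaultdict(list)
for region, score in rows:
    by_region[region].append(score)

medians = {}
for region, scores in by_region.items():
    ordered = sorted(scores)  # ORDER BY total_score
    picked = [ordered[rn - 1] for rn in middle_rows(len(ordered))]
    medians[region] = sum(picked) / len(picked)  # AVG(total_score)

print(medians)
```

With four rows, n is 2.5, so rows 2 and 3 are averaged; with three rows, n is exactly 2 and only the middle row is kept.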

Still not quite enough detail. But here's SQL that runs against your schema, minus the typos I think you had in your SQL:

WITH tscores AS (
SELECT i.region AS region
, SUM(S.score) AS total_score
FROM tredence.assessments A
JOIN tredence.studentassessment S
ON A.id_assessment = S.id_assessment
JOIN tredence.studentinfo i
ON i.id_student = S.id_student
WHERE A.assessment = 'Exam'
GROUP BY S.id_student
, i.region
)
, cte1 AS (
SELECT region, total_score
, ((COUNT(*) OVER (PARTITION BY region) + 1) / 2) AS n
, ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_score) AS rn
FROM tscores AS t
)
SELECT region
, truncate(AVG(total_score), 2) AS med_score
FROM cte1 AS t
WHERE rn IN (ceil(n), floor(n))
GROUP BY region
;

Calculating the median with where clause condition - SQLite

One approach uses analytic functions:

WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY Stay) rn,
COUNT(*) OVER () AS cnt,
AVG(Room_Spend + Food_Spend) OVER () AS total_spent
FROM test
)

SELECT AVG(Stay) AS Stay, MAX(total_spent) AS total_spent
FROM cte
WHERE (rn = (cnt / 2) + 1 AND cnt % 2 = 1) OR
      (rn IN (cnt / 2, cnt / 2 + 1) AND cnt % 2 = 0);
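
A minimal way to try this query is Python's built-in `sqlite3` module. The table contents below are hypothetical and chosen only so the arithmetic is easy to follow. The sketch assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (Stay INTEGER, Room_Spend INTEGER, Food_Spend INTEGER)")
# Hypothetical rows: (Stay, Room_Spend, Food_Spend).
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(2, 100, 20), (4, 200, 40), (6, 150, 30), (8, 50, 10)],
)

query = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Stay) rn,
           COUNT(*) OVER () AS cnt,
           AVG(Room_Spend + Food_Spend) OVER () AS total_spent
    FROM test
)
SELECT AVG(Stay) AS Stay, MAX(total_spent) AS total_spent
FROM cte
WHERE (rn = (cnt / 2) + 1 AND cnt % 2 = 1) OR
      (rn IN (cnt / 2, cnt / 2 + 1) AND cnt % 2 = 0)
"""
median_stay, avg_spend = conn.execute(query).fetchone()
print(median_stay, avg_spend)
```

With four rows the even branch applies, so the Stay values of rows 2 and 3 are averaged. Note that `total_spent` is computed with AVG over every row, so despite its name it is the average combined spend per row.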

