Simple way to calculate median with MySQL
What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x)
for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.
Example data:
id | val
--------
1 4
2 7
3 2
4 2
5 9
6 8
7 3
Sorting on val
gives 2 2 3 4 7 8 9
, so the median should be 4
, versus SELECT AVG(val)
which == 5
.
The problem with the proposed solution (TheJacobTaylor) is runtime. Joining the table to itself is slow as molasses for large datasets. My proposed alternative runs in mysql, has awesome runtime, uses an explicit ORDER BY statement, so you don't have to hope your indexes ordered it properly to give a correct result, and is easy to unroll the query to debug.
SELECT avg(t1.val) as median_val FROM (
SELECT @rownum:=@rownum+1 as `row_number`, d.val
FROM data d, (SELECT @rownum:=0) r
WHERE 1
-- put some where clause here
ORDER BY d.val
) as t1,
(
SELECT count(*) as total_rows
FROM data d
WHERE 1
-- put same where clause here
) as t2
WHERE 1
AND t1.row_number in ( floor((total_rows+1)/2), floor((total_rows+2)/2) );
[edit] Added avg() around t1.val and row_number in(...) to correctly produce a median when there are an even number of records. Reasoning:
SELECT floor((3+1)/2),floor((3+2)/2);#total_rows is 3, so avg row_numbers 2 and 2
SELECT floor((4+1)/2),floor((4+2)/2);#total_rows is 4, so avg row_numbers 2 and 3
I just found another answer online in the comments:
For medians in almost any SQL:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
Make sure your columns are well indexed and the index is used for filtering and sorting. Verify with the explain plans.
select count(*) from table --find the number of rows
Calculate the "median" row number. Maybe use: median_row = floor(count / 2)
.
Then pick it out of the list:
select val from table order by val asc limit median_row,1
This should return you one row with just the value you want.
Jacob
我发现接受的解决方案不适用于我的MySQL安装,返回一个空集,但是这个查询在我测试过的所有情况下都适用于我:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1
链接地址: http://www.djcxy.com/p/83820.html
上一篇: 函数来计算SQL Server中的中位数
下一篇: 用MySQL计算中位数的简单方法