Median of Medians

I have read the order statistics to find the k-th smallest (or largest) element in an array of size n in linear time O(n).

There is one step that it needs to find the median of the medians.

  • Split the array into [n/5] parts. Each part has 5 elements.
  • Find the median in each part. (We have [n/5] numbers now)
  • Repeat step 1 and 2 until we only have the last number. (ie recursive)
  • T(n) = T(n/5) + O(n) and we can get T(n) = O(n).

    But, is it true that, the number we finally get is not the median of medians, but the median of medians of medians of medians of medians of medians, if we have a large array.

    Please consider an array which has 125 elements.

    First, it is split into 25 parts and we find 25 medians. Then, we split these 25 numbers into 5 parts and find 5 medians, Finally, we obtain the number which is median of medians of medians. (Not median of medians)

    The reason why I care about it is that, I can understand there are at most about [3/4]*n elements that are smaller (or larger) than the median of medians. But what if it is not the median of medians but the median of medians of medians? In worse case there must be less elements that are smaller (or larger) than the pivot, which means the pivot is closer to the bound of the array.

    If we have a VERY large array, and we found its median of medians of medians of medians of medians of medians. In the worst case the pivot we found can still be very close to the bound and what is the time complexity in this case?

    I made up a dataset of 125 elements. Is the result 9?

    0.8 0.9 1 inf inf
    1.8 1.9 2 inf inf
    6.8 6.9 7 inf inf
    inf inf inf inf inf
    inf inf inf inf inf
    
    2.8 2.9 3 inf inf
    3.8 3.9 4 inf inf
    7.8 7.9 8 inf inf
    inf inf inf inf inf
    inf inf inf inf inf
    
    4.8 4.9 5 inf inf
    5.8 5.9 6 inf inf
    8.8 8.9 9 inf inf
    inf inf inf inf inf
    inf inf inf inf inf
    
    inf inf inf inf inf
    inf inf inf inf inf
    inf inf inf inf inf
    inf inf inf inf inf
    inf inf inf inf inf
    
    inf inf inf inf inf
    inf inf inf inf inf
    inf inf inf inf inf
    inf inf inf inf inf
    inf inf inf inf inf
    

    where inf means the number is large enough.


    Let's denote your median of medians of medians of ... as [median of]* = M .

    First, I believe that median of medians algorithm (to select a good pivot) is not recursive. The algorithm goes as follows:

  • Split the elements in the groups of 5
  • Find the median of each group
  • Find the median of medians and use it as a pivot.
  • Median of medians will be smaller than 3n/10 elements and larger than another 3n/10 elements, not 3n/4. You have n/5 numbers after selecting medians. Median of median is greater/smaller than half of those numbers, which is n/10. Each of those numbers is a median itself, so it's greater/smaller than 2 numbers, giving you another 2n/10 numbers. Now in total, you get n/10 + 2n/10 = 3n/10 numbers.

    To address your second question, after collecting the group of 5's in your example dataset and calculating their medians, we will have the following sequence:

    1, 2, 7, inf, inf
    3, 4, 8, inf, inf
    5, 6, 9, inf, inf, 
    inf, inf, inf, inf, inf, 
    inf, inf, inf, inf, inf.
    

    So the median of medians would indeed be 9.

    Your proposed [median of]* algorithm's runtime will be:

    T(n) = O(n * log(n))
    

    Now let's try to analyze how many numbers we have less/greater than M. We have the following groups:

  • depth 1: n/5 elements all medians
  • depth 2: n/25 elements all medians
  • ...
  • depth i: n/(5^i) elements all medians
  • Each group is less/greater than 2 elements of the previous depth, which is less/greater than 2 elements of the previous depth, and so on:

    Calculating in total, we get that our M is greater/less than (n * (2^k) + k * n) /((2^k) * (5^k)). For depth = 1 you get median of medians, which is 3n/10.

    Now assuming your depth is [log_5 (n)], ie n = 5^k, we get:

    5^k * (k + 2^k)/(5^k * 2^k) which is -> 1.

    链接地址: http://www.djcxy.com/p/14660.html

    上一篇: 在编辑时在angularJS中进行排序

    下一篇: Medians中位数