How to calculate order (big O) for more complex algorithms (e.g. quicksort)

I know there are quite a bunch of questions about big O notation, I have already checked:

  • Plain English explanation of Big O
  • Big O, how do you calculate/approximate it?
  • Big O Notation Homework--Code Fragment Algorithm Analysis?
    to name a few.

    I know by "intuition" how to calculate it for n , n^2 , n! and so, however I am completely lost on how to calculate it for algorithms that are log n , n log n , n log log n and so.

    What I mean is: I know that QuickSort is n log n (on average), but why? The same goes for merge sort, comb sort, etc.

    Could anybody explain to me, in a not-too-mathy way, how you calculate this?

    The main reason I ask is that I'm about to have a big interview and I'm pretty sure they'll ask this kind of thing. I have been researching for a few days now, and everybody seems to offer either an explanation of why bubble sort is n^2 or an explanation that is unreadable (for me), like the one on Wikipedia.


    The logarithm is the inverse operation of exponentiation. An example of exponentiation is when you double the number of items at each step. Thus, a logarithmic algorithm often halves the number of items at each step. For example, binary search falls into this category.
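    To make that concrete, here is a minimal binary search sketch (a standard textbook formulation, not tied to any particular library): each iteration discards half of the remaining range, so a sorted list of n items takes about log2(n) iterations.

    def binary_search(sorted_data, target):
        # Each loop iteration halves the range [lo, hi].
        lo, hi = 0, len(sorted_data) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if sorted_data[mid] == target:
                return mid
            elif sorted_data[mid] < target:
                lo = mid + 1   # discard the left half
            else:
                hi = mid - 1   # discard the right half
        return -1              # not found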

    Many algorithms require a logarithmic number of big steps, but each big step requires O(n) units of work. Mergesort falls into this category.

    Usually you can identify these kinds of problems by visualizing them as a balanced binary tree. For example, here's merge sort:

     6   2    0   4    1   3     7   5
      2 6      0 4      1 3       5 7
        0 2 4 6            1 3 5 7
             0 1 2 3 4 5 6 7
    

    At the top is the input, as the leaves of the tree. The algorithm creates each new node by merging the two sorted nodes above it. We know the height of a balanced binary tree is O(log n), so there are O(log n) big steps. However, creating each new row takes O(n) work. O(log n) big steps of O(n) work each means that mergesort is O(n log n) overall.
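    Here is a minimal mergesort sketch that mirrors the tree above (a common textbook formulation, not the only way to write it): the recursion is O(log n) levels deep, and the merging at each level touches all n items.

    def merge_sort(data):
        # O(log n) levels of recursion, each doing O(n) merge work in total.
        if len(data) <= 1:
            return data
        mid = len(data) // 2
        left = merge_sort(data[:mid])
        right = merge_sort(data[mid:])
        # Merge the two sorted halves in O(n) time.
        result, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1
        return result + left[i:] + right[j:]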

    Generally, O(log n) algorithms look like the function below. They get to discard half of the data at each step.

    def function(data, n):
        if n <= constant:
            return do_simple_case(data, n)
        if some_condition():
            return function(data[:n // 2], n // 2)        # recurse on the first half of the data
        else:
            return function(data[n // 2:], n - n // 2)    # recurse on the second half of the data
    

    O(n log n) algorithms, by contrast, look like the function below. They also split the data in half, but they need to consider both halves.

    def function(data, n):
        if n <= constant:
            return do_simple_case(data, n)
        part1 = function(data[:n // 2], n // 2)        # recurse on the first half of the data
        part2 = function(data[n // 2:], n - n // 2)    # recurse on the second half of the data
        return combine(part1, part2)
    

    Here, do_simple_case() takes O(1) time and combine() takes no more than O(n) time.

    The algorithms don't need to split the data exactly in half. They could split it into one-third and two-thirds, and that would be fine. For average-case performance, splitting it in half on average is sufficient (as in QuickSort). As long as the recursion is done on pieces of (n/something) and (n - n/something), it's okay. But if it breaks the data into (k) and (n - k) for a constant k, then the height of the tree will be O(n), not O(log n).
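    To see why the split matters, you can count recursion levels directly. The depth() helper below is purely illustrative: it follows the larger piece at each level, since that piece determines the height of the tree.

    def depth(n, split):
        # Count levels until the largest piece shrinks to a single item.
        levels = 0
        while n > 1:
            n = max(split(n))   # the deepest path follows the larger piece
            levels += 1
        return levels

    print(depth(1024, lambda n: (n // 2, n - n // 2)))  # 10    -- halving gives log2(n) levels
    print(depth(1024, lambda n: (1, n - 1)))            # 1023  -- peeling off one item gives ~n levels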


    You can usually claim log n for an algorithm that halves its search space each time it runs. A good example of this is any binary algorithm (e.g., binary search): you pick either left or right, which cuts the space you're searching in half. The pattern of repeatedly halving is log n.
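    A quick sanity check of that pattern (just an illustrative loop, assuming nothing beyond the halving itself):

    n, steps = 1_000_000, 0
    while n > 1:
        n //= 2       # discard half of the remaining space
        steps += 1
    print(steps)      # 19, and log2(1,000,000) is about 19.9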


    For some algorithms, getting a tight bound on the running time through intuition is close to impossible (I don't think I'll ever be able to intuit an O(n log log n) running time, for instance, and I doubt anyone will ever expect you to). If you can get your hands on the CLRS Introduction to Algorithms text, you'll find a pretty thorough treatment of asymptotic notation which is appropriately rigorous without being completely opaque.

    If the algorithm is recursive, one simple way to derive a bound is to write out a recurrence and then solve it, either iteratively, with the Master Theorem, or some other way. For instance, if you're not looking to be super rigorous about it, the easiest way to get QuickSort's running time is through the Master Theorem: QuickSort entails partitioning the array into two relatively equal subarrays (it should be fairly intuitive to see that this is O(n)) and then calling QuickSort recursively on those two subarrays. If we let T(n) denote the running time, we have T(n) = 2T(n/2) + O(n), which by the Master Theorem is O(n log n).
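    For reference, here is a minimal QuickSort sketch (the simple out-of-place version, not the usual in-place partition) that matches that recurrence: the list comprehensions do the O(n) partitioning work at each level, and the two recursive calls supply the 2T(n/2) term when the pivot splits the data roughly in half.

    def quicksort(data):
        # Average case: T(n) = 2T(n/2) + O(n), which is O(n log n).
        if len(data) <= 1:
            return data
        pivot = data[len(data) // 2]
        less    = [x for x in data if x < pivot]    # O(n) partitioning work
        equal   = [x for x in data if x == pivot]
        greater = [x for x in data if x > pivot]
        return quicksort(less) + equal + quicksort(greater)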
