What is a plain English explanation of "Big O" notation?
I'd prefer as little formal definition as possible and simple mathematics.
Quick note: this is almost certainly confusing Big O notation (which is an upper bound) with Theta notation (which is a two-sided bound). In my experience this is actually typical of discussions in non-academic settings. Apologies for any confusion caused.
Big O complexity can be visualized with a graph of operations against input size.
The simplest definition I can give for Big-O notation is this:
Big-O notation is a relative representation of the complexity of an algorithm.
There are some important and deliberately chosen words in that sentence:
Come back and reread the above when you've read the rest.
The best example of Big-O I can think of is doing arithmetic. Take two numbers (123456 and 789012). The basic arithmetic operations we learnt in school were:

addition;
subtraction;
multiplication;
and division.

Each of these is an operation or a problem. A method of solving these is called an algorithm.
Addition is the simplest. You line the numbers up (to the right) and add the digits in a column writing the last number of that addition in the result. The 'tens' part of that number is carried over to the next column.
Let's assume that the addition of these numbers is the most expensive operation in this algorithm. It stands to reason that to add these two numbers together we have to add together 6 digits (and possibly carry a 7th). If we add two 100 digit numbers together we have to do 100 additions. If we add two 10,000 digit numbers we have to do 10,000 additions.
See the pattern? The complexity (being the number of operations) is directly proportional to the number of digits n in the larger number. We call this O(n) or linear complexity.
Subtraction is similar (except you may need to borrow instead of carry).
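To make the counting concrete, here is a minimal Python sketch of the column addition just described; the function name and sample inputs are made up for illustration, and it simply counts one operation per column, so two n-digit numbers cost n column additions:

# A minimal sketch of column addition: the work grows linearly with the
# number of digits, which is why schoolbook addition is O(n).
def add_by_columns(a, b):
    # returns (sum as a digit string, number of column additions performed)
    a, b = a.zfill(len(b)), b.zfill(len(a))   # pad to the same length
    carry, digits, ops = 0, [], 0
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry     # one column addition
        ops += 1
        carry, digit = divmod(total, 10)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), ops

print(add_by_columns("123456", "789012"))     # ('912468', 6): six column adds
print(add_by_columns("1" * 100, "2" * 100))   # 100-digit inputs: 100 column adds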
Multiplication is different. You line the numbers up, take the first digit in the bottom number and multiply it in turn against each digit in the top number and so on through each digit. So to multiply our two 6 digit numbers we must do 36 multiplications. We may need to do as many as 10 or 11 column adds to get the end result too.
If we have two 100-digit numbers we need to do 10,000 multiplications and 200 adds. For two one-million-digit numbers we need to do one trillion (10¹²) multiplications and two million adds.
As the algorithm scales with n-squared, this is O(n²) or quadratic complexity. This is a good time to introduce another important concept:
We only care about the most significant portion of complexity.
The astute may have realized that we could express the number of operations as: n² + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for 0.0002% of the total operations by that stage).
Note that we've assumed the worst case scenario here. While multiplying 6-digit numbers, if one of them has 4 digits and the other has 6 digits, then we only have 24 multiplications. Still, we calculate the worst case scenario for that 'n', i.e. when both are 6-digit numbers. Hence Big-O notation is about the worst-case scenario of an algorithm.
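Here is a tiny Python sketch (invented purely for illustration) that just counts the single-digit multiplications in schoolbook long multiplication; it reproduces the 36 and 24 figures above:

# Schoolbook long multiplication: every digit of one number meets every
# digit of the other, so the count of single-digit multiplications is
# len(a) * len(b) -- O(n²) when both numbers have n digits.
def count_digit_multiplications(a, b):
    ops = 0
    for _ in a:        # for each digit of the first number...
        for _ in b:    # ...multiply it against each digit of the second
            ops += 1
    return ops

print(count_digit_multiplications("123456", "789012"))  # 36
print(count_digit_multiplications("1234", "789012"))    # 24 (the 4-digit by 6-digit case)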
The Telephone Book
The next best example I can think of is the telephone book, normally called the White Pages or similar but it'll vary from country to country. But I'm talking about the one that lists people by surname and then initials or first name, possibly address and then telephone numbers.
Now if you were instructing a computer to look up the phone number for "John Smith" in a telephone book that contains 1,000,000 names, what would you do? Ignoring the fact that you could guess how far in the S's started (let's assume you can't), what would you do?
A typical implementation might be to open up to the middle, take the 500,000th and compare it to "Smith". If it happens to be "Smith, John", we just got real lucky. Far more likely is that "John Smith" will be before or after that name. If it's after we then divide the last half of the phone book in half and repeat. If it's before then we divide the first half of the phone book in half and repeat. And so on.
This is called a binary search and is used every day in programming whether you realize it or not.
So if you want to find a name in a phone book of a million names you can actually find any name by doing this at most 20 times. In comparing search algorithms we decide that this comparison is our 'n'.
That is staggeringly good isn't it?
In Big-O terms this is O(log n) or logarithmic complexity. Now the logarithm in question could be ln (base e), log10, log2 or some other base. It doesn't matter: it's still O(log n), just like O(2n²) and O(100n²) are still both O(n²).
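The phone-book lookup is just a binary search over a sorted list. A minimal Python sketch (the list of names and the target are invented for illustration) shows that even a million entries need at most about 20 comparisons:

# Binary search: halve the remaining range on every comparison -- O(log n).
def binary_search(names, target):
    low, high, comparisons = 0, len(names) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if names[mid] == target:
            return mid, comparisons
        elif names[mid] < target:
            low = mid + 1            # target is in the second half
        else:
            high = mid - 1           # target is in the first half
    return -1, comparisons

names = [f"Person{i:07d}" for i in range(1_000_000)]   # already sorted
print(binary_search(names, "Person0700000"))           # found within ~20 comparisons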
It's worthwhile at this point to explain that Big O can be used to determine three cases with an algorithm: the best case, the expected case and the worst case.
Normally we don't care about the best case. We're interested in the expected and worst case. Sometimes one or the other of these will be more important.
Back to the telephone book.
What if you have a phone number and want to find a name? The police have a reverse phone book but such look-ups are denied to the general public. Or are they? Technically you can reverse look-up a number in an ordinary phone book. How?
You start at the first name and compare the number. If it's a match, great, if not, you move on to the next. You have to do it this way because the phone book is unordered (by phone number anyway).
So to find a name given the phone number (reverse lookup): in the best case the very first entry matches, which is one comparison; in the expected case you check about half the entries (n/2 comparisons); and in the worst case you check all n entries. The expected and worst cases are both O(n) or linear complexity.
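As a contrast with the binary search above, here is a minimal Python sketch of that linear reverse lookup (the phone book data is made up); in the worst case it touches every one of the n entries:

# Reverse lookup in a book that is unordered by number: O(n) linear search.
def reverse_lookup(phone_book, number):
    checks = 0
    for name, entry_number in phone_book:   # may have to touch every entry
        checks += 1
        if entry_number == number:
            return name, checks
    return None, checks

phone_book = [("Smith, John", "555-1234"), ("Jones, Mary", "555-9876")]
print(reverse_lookup(phone_book, "555-9876"))   # ('Jones, Mary', 2)
print(reverse_lookup(phone_book, "555-0000"))   # (None, 2): scanned everything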
The Travelling Salesman
This is quite a famous problem in computer science and deserves a mention. In this problem you have N towns. Each of those towns is linked to 1 or more other towns by a road of a certain distance. The Travelling Salesman problem is to find the shortest tour that visits every town.
Sounds simple? Think again.
If you have 3 towns A, B and C with roads between all pairs then you could go: A → B → C, A → C → B, B → A → C, B → C → A, C → A → B, or C → B → A. That looks like 6 possibilities.
Well actually there's less than that because some of these are equivalent (A → B → C and C → B → A are equivalent, for example, because they use the same roads, just in reverse).
In actuality there are 3 possibilities.
This is a function of a mathematical operation called a factorial. Basically:

3! = 3 × 2 × 1 = 6
5! = 5 × 4 × 3 × 2 × 1 = 120
10! = 3,628,800
25! ≈ 1.55 × 10²⁵
200! has more than 370 digits

So the Big-O of the Travelling Salesman problem is O(n!) or factorial or combinatorial complexity.
By the time you get to 200 towns there isn't enough time left in the universe to solve the problem with traditional computers.
Something to think about.
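A brute-force solver makes the factorial growth tangible. This is only a sketch (the three-town distance table is invented), trying every ordering of the towns with itertools.permutations; the loop body runs n! times:

# Brute-force Travelling Salesman: try all n! orderings of the towns and
# keep the shortest route that visits every town (not returning to the
# start, to keep the sketch simple).
from itertools import permutations

def shortest_tour(towns, distances):
    best_tour, best_length = None, float("inf")
    for tour in permutations(towns):                          # n! orderings
        length = sum(distances[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_length:
            best_tour, best_length = tour, length
    return best_tour, best_length

distances = {"A": {"B": 5, "C": 9},
             "B": {"A": 5, "C": 3},
             "C": {"A": 9, "B": 3}}
print(shortest_tour(["A", "B", "C"], distances))   # (('A', 'B', 'C'), 8)

With 10 towns that is 3,628,800 orderings; with 20 towns it is already about 2.4 × 10¹⁸.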
Polynomial Time
Another point I wanted to make quick mention of is that any algorithm that has a complexity of O(n^a) is said to have polynomial complexity or is solvable in polynomial time.
O(n), O(n²) etc. are all polynomial time. Some problems cannot be solved in polynomial time. Certain things are used in the world because of this. Public Key Cryptography is a prime example. It is computationally hard to find the two prime factors of a very large number. If it wasn't, we couldn't use the public key systems we use.
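To get a feel for why factoring is the hard direction, here is a naive trial-division sketch; real factoring algorithms are far cleverer, but even they are not polynomial in the number of digits. The primes used here are just illustrative:

# Naive factoring by trial division: tries divisors up to sqrt(n), and
# sqrt(n) grows exponentially with the number of digits of n.
def trial_division(n):
    divisions = 0
    d = 2
    while d * d <= n:
        divisions += 1
        if n % d == 0:
            return (d, n // d), divisions
        d += 1
    return (n, 1), divisions          # n itself is prime

p, q = 104729, 1299709                # two smallish primes for illustration
print(trial_division(p * q))          # ((104729, 1299709), 104728)

Even for this 12-digit product it takes over a hundred thousand divisions, and every extra digit in the smaller factor multiplies the work by roughly ten.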
Anyway, that's it for my (hopefully plain English) explanation of Big O (revised).
Big-O notation shows how an algorithm scales based on input size.
O(n²): known as Quadratic complexity

1 item: 1 second
10 items: 100 seconds
100 items: 10,000 seconds

Notice that the number of items increases by a factor of 10, but the time increases by a factor of 10². Basically, n = 10 and so O(n²) gives us the scaling factor n², which is 10².
O(n): known as Linear complexity

1 item: 1 second
10 items: 10 seconds
100 items: 100 seconds

This time the number of items increases by a factor of 10, and so does the time. n = 10 and so O(n)'s scaling factor is 10.
O(1): known as Constant complexity

1 item: 1 second
10 items: 1 second
100 items: 1 second

The number of items is still increasing by a factor of 10, but the scaling factor of O(1) is always 1.
O(log n): known as Logarithmic complexity

The number of computations is only increased by a log of the input value. So in this case, assuming each computation takes 1 second, the log of the input n is the time required, hence log n. For example (using log base 10), 1,000 items would take about 3 seconds and 1,000,000 items only about 6.
That's the gist of it. They reduce the maths down so it might not be exactly n² or whatever they say it is, but that'll be the dominating factor in the scaling.
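A few lines of Python make those scaling factors visible side by side (the one-second-per-operation framing above is just illustrative):

# Rough operation counts for each complexity class as n grows by factors of 10.
import math

for n in (10, 100, 1000, 10_000):
    print(f"n={n:>6}  O(1)={1:>2}  O(log n)={math.log2(n):>5.1f}  "
          f"O(n)={n:>6}  O(n^2)={n * n:>10}")

Reading down any column shows the scaling factor at work: the O(n²) column grows a hundredfold every time n grows tenfold, the O(n) column grows tenfold, O(log n) barely moves, and O(1) never changes.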
Big-O notation (also called "asymptotic growth" notation) is what functions "look like" when you ignore constant factors and stuff near the origin. We use it to talk about how things scale.
Basics
for "sufficiently" large inputs...
f(x) ∈ O(upperbound)
means f
"grows no faster than" upperbound
f(x) ∈ Ɵ(justlikethis)
mean f
"grows exactly like" justlikethis
f(x) ∈ Ω(lowerbound)
means f
"grows no slower than" lowerbound
big-O notation doesn't care about constant factors: the function 9x²
is said to "grow exactly like" 10x²
. Neither does big-O asymptotic notation care about non-asymptotic stuff ("stuff near the origin" or "what happens when the problem size is small"): the function 10x²
is said to "grow exactly like" 10x² - x + 2
.
Why would you want to ignore the smaller parts of the equation? Because they become completely dwarfed by the big parts of the equation as you consider larger and larger scales; their contribution becomes dwarfed and irrelevant. (See example section.)
Put another way, it's all about the ratio as you go to infinity. If you divide the actual time it takes by the O(...), you will get a constant factor in the limit of large inputs. Intuitively this makes sense: functions "scale like" one another if you can multiply one to get the other. That is, when we say...

actualAlgorithmTime(N) ∈ O(bound(N))
e.g. "time to mergesort N elements is O(N log(N))"
... this means that for "large enough" problem sizes N (if we ignore stuff near the origin), there exists some constant (eg 2.5, completely made up) such that:
actualAlgorithmTime(N)                 e.g. "mergesort_duration(N)"
────────────────────── < constant           ────────────────────── < 2.5
       bound(N)                                     N log(N)
There are many choices of constant; often the "best" choice is known as the "constant factor" of the algorithm... but we often ignore it like we ignore non-largest terms (see Constant Factors section for why they don't usually matter). You can also think of the above equation as a bound, saying "In the worst-case scenario, the time it takes will never be worse than roughly N*log(N), within a factor of 2.5 (a constant factor we don't care much about)".
In general, O(...) is the most useful one because we often care about worst-case behavior. If f(x) represents something "bad" like processor or memory usage, then "f(x) ∈ O(upperbound)" means "upperbound is the worst-case scenario of processor/memory usage".
Applications
As a purely mathematical construct, big-O notation is not limited to talking about processing time and memory. You can use it to discuss the asymptotics of anything where scaling is meaningful, such as the number of handshakes among N people at a party (Ɵ(N²), specifically N(N-1)/2, but what matters is that it "scales like" N²).

Example
For the handshake example above, everyone in a room shakes everyone else's hand. In that example, #handshakes ∈ Ɵ(N²). Why?

Back up a bit: the number of handshakes is exactly n-choose-2 or N*(N-1)/2 (each of N people shakes the hands of N-1 other people, but this double-counts handshakes, so divide by 2).
However, for very large numbers of people, the linear term N is dwarfed and effectively contributes 0 to the ratio (in the chart: the fraction of empty boxes on the diagonal over total boxes gets smaller as the number of participants becomes larger). Therefore the scaling behavior is order N², or the number of handshakes "grows like N²".
#handshakes(N)
────────────── ≈ 1/2
      N²
It's as if the empty boxes on the diagonal of the chart (N*(N-1)/2 checkmarks) weren't even there (N² checkmarks asymptotically).
(temporary digression from "plain English":) If you wanted to prove this to yourself, you could perform some simple algebra on the ratio to split it up into multiple terms (lim means "considered in the limit of"; just ignore it if you haven't seen it, it's just notation for "and N is really really big"):
    N²/2 - N/2           (N²)/2    N/2            1/2
lim ────────── = lim  ( ──────  -  ─── )  =  lim  ─── = 1/2
N→∞     N²       N→∞      N²       N²        N→∞   1
                                  ┕━━━┙
                    this is 0 in the limit of N→∞:
                    graph it, or plug in a really large number for N
tl;dr: The number of handshakes 'looks like' x² so much for large values, that if we were to write down the ratio #handshakes/x², the fact that we don't need exactly x² handshakes wouldn't even show up in the decimal for an arbitrarily large while.
eg for x=1million, ratio #handshakes/x²: 0.499999...
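You can watch that ratio settle down numerically with a minimal sketch:

# The handshake count N*(N-1)/2 divided by N² approaches 1/2 as N grows.
def handshakes(n):
    return n * (n - 1) // 2

for n in (10, 1000, 1_000_000):
    print(n, handshakes(n) / n**2)   # 0.45, then 0.4995, then 0.4999995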
Building Intuition
This lets us make statements like...
"For large enough inputsize=N, no matter what the constant factor is, if I double the input size...
N → (2N) = 2( N )
N² → (2N)² = 4( N² )
cN³ → c(2N)³ = 8( cN³ )
c log(N) → c log(2N) = (c log(2))+( c log(N) ) = (fixed amount)+( c log(N) )
c*1 → c*1
it's less than O(N1.000001), which you might be willing to call basically linear
2N → 22N = (4N)............put another way...... 2N → 2N+1 = 2N21 = 2 2N
(with credit to https://stackoverflow.com/a/487292/711085 )
(technically the constant factor could maybe matter in some more esoteric examples, but I've phrased things above (eg in log(N)) such that it doesn't)
These are the bread-and-butter orders of growth that programmers and applied computer scientists use as reference points. They see these all the time. (So while you could technically think "Doubling the input makes an O(√N) algorithm 1.414 times slower," it's better to think of it as "this is worse than logarithmic but better than linear".)
Constant factors
Usually we don't care what the specific constant factors are, because they don't affect the way the function grows. For example, two algorithms may both take O(N) time to complete, but one may be twice as slow as the other. We usually don't care too much unless the factor is very large, since optimizing is tricky business (When is optimisation premature?); also, the mere act of picking an algorithm with a better big-O will often improve performance by orders of magnitude.
Some asymptotically superior algorithms (eg a non-comparison O(N log(log(N))) sort) can have so large a constant factor (eg 100000*N log(log(N))), or overhead that is relatively large like O(N log(log(N))) with a hidden + 100*N, that they are rarely worth using even on "big data".
Why O(N) is sometimes the best you can do, i.e. why we need data structures
O(N) algorithms are in some sense the "best" algorithms if you need to read all your data. The very act of reading a bunch of data is an O(N) operation. Loading it into memory is usually O(N) (or faster if you have hardware support, or no time at all if you've already read the data). However, if you touch or even look at every piece of data (or even every other piece of data), your algorithm will take O(N) time to perform this looking. No matter how long your actual algorithm takes, it will be at least O(N) because it spent that time looking at all the data.
The same can be said for the very act of writing. All algorithms which print out N things will take N time, because the output is at least that long (eg printing out all permutations (ways to rearrange) of a set of N playing cards is factorial: O(N!)).
This motivates the use of data structures: a data structure requires reading the data only once (usually O(N) time), plus some arbitrary amount of preprocessing (eg O(N) or O(N log(N)) or O(N²)) which we try to keep small. Thereafter, modifying the data structure (insertions / deletions / etc.) and making queries on the data take very little time, such as O(1) or O(log(N)). You then proceed to make a large number of queries! In general, the more work you're willing to do ahead of time, the less work you'll have to do later on.
For example, say you had the latitude and longitude coordinates of millions of roads segments, and wanted to find all street intersections.
A naive solution would be to compare every segment's coordinates against every other segment's: checking one segment against the rest is O(N) work only once, but if you want to do it many times (in this case, N times, once for each segment), we'd have to do O(N²) work, or 1000000² = 1,000,000,000,000 operations. Not good (a modern computer can perform about a billion operations per second).

If we instead use a hash table (an instant-speed lookup table, also known as a hashmap or dictionary), we pay a small cost by preprocessing everything in O(N) time. Thereafter, it only takes constant time on average to look up something by its key (in this case, our key is the latitude and longitude coordinates, rounded into a grid; we search the adjacent gridspaces, of which there are only 9, which is a constant).

Our task went from an infeasible O(N²) to a manageable O(N), and all we had to do was pay a minor cost to make a hash table. The moral of the story: a data structure lets us speed up operations. Even more advanced data structures can let you combine, delay, or even ignore operations in incredibly clever ways. Different problems would have different analogies, but they'd all involve organizing the data in a way that exploits some structure we care about, or which we've artificially imposed on it for bookkeeping. We do work ahead of time (basically planning and organizing), and now repeated tasks are much much easier!
Practical example: visualizing orders of growth while coding
Asymptotic notation is, at its core, quite separate from programming. Asymptotic notation is a mathematical framework for thinking about how things scale, and can be used in many different fields. That said... this is how you apply asymptotic notation to coding.
The basics: Whenever we interact with every element in a collection of size A (such as an array, a set, all keys of a map, etc.), or perform A iterations of a loop, that is a multiplicative factor of size A. Why do I say "a multiplicative factor"? Because loops and functions (almost by definition) have multiplicative running time: the number of iterations, times the work done in the loop (or for functions: the number of times you call the function, times the work done in the function). (This holds if we don't do anything fancy, like skip loops or exit the loop early, or change control flow in the function based on arguments, which is very common.) Here are some examples of visualization techniques, with accompanying pseudocode.
(here, the x's represent constant-time units of work: processor instructions, interpreter opcodes, whatever)
for(i=0; i<A; i++) // A x ...
some O(1) operation // 1
--> A*1 --> O(A) time
visualization:
|<------ A ------->|
1 2 3 4 5 x x ... x
other languages, multiplying orders of growth:
javascript, O(A) time and space
someListOfSizeA.map((x,i) => [x,i])
python, O(rows*cols) time and space
[[r*c for c in range(cols)] for r in range(rows)]
Example 2:
for every x in listOfSizeA:      // A x ...
    some O(1) operation          //     1
    some O(B) operation          //     B
    for every y in listOfSizeC:  //     C x ...
        some O(1) operation      //         1

--> O(A*(1 + B + C))
    O(A*(B+C))        (1 is dwarfed)
visualization:
|<------ A ------->|
1 x x x x x x ... x
2 x x x x x x ... x ^
3 x x x x x x ... x |
4 x x x x x x ... x |
5 x x x x x x ... x B <-- A*B
x x x x x x x ... x |
................... |
x x x x x x x ... x v
x x x x x x x ... x ^
x x x x x x x ... x |
x x x x x x x ... x |
x x x x x x x ... x C <-- A*C
x x x x x x x ... x |
................... |
x x x x x x x ... x v
Example 3:
function nSquaredFunction(n) {
    total = 0
    for i in 1..n:                  // N x
        for j in 1..n:              //     N x
            total += i*j            //         1
    return total
}
// O(n^2)

function nCubedFunction(a) {
    for i in 1..a:                  // A x
        print(nSquaredFunction(a))  //     A^2
}
// O(a^3)
If we do something slightly complicated, you might still be able to imagine visually what's going on:
for x in range(A):
    for y in range(1..x):
        simpleOperation(x*y)
x x x x x x x x x x |
x x x x x x x x x |
x x x x x x x x |
x x x x x x x |
x x x x x x |
x x x x x |
x x x x |
x x x |
x x |
x___________________|
Here, the smallest recognizable outline you can draw is what matters; a triangle is a two dimensional shape (0.5 A^2), just like a square is a two-dimensional shape (A^2); the constant factor of two here remains in the asymptotic ratio between the two, however we ignore it like all factors... (There are some unfortunate nuances to this technique I don't go into here; it can mislead you.)
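Counting the iterations directly confirms the picture: the triangle holds about half of A² operations, and halving doesn't change the order of growth. A quick sketch:

# Count the work done by the triangular nested loop above.
def triangle_ops(A):
    ops = 0
    for x in range(A):
        for y in range(x):     # the inner loop runs x times
            ops += 1
    return ops

A = 1000
print(triangle_ops(A), A * A // 2)   # 499500 vs 500000: same order of growth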
Of course this does not mean that loops and functions are bad; on the contrary, they are the building blocks of modern programming languages, and we love them. However, we can see that the way we weave loops and functions and conditionals together with our data (control flow, etc.) mimics the time and space usage of our program! If time and space usage becomes an issue, that is when we resort to cleverness, and find an easy algorithm or data structure we hadn't considered, to reduce the order of growth somehow. Nevertheless, these visualization techniques (though they don't always work) can give you a naive guess at a worst-case running time.
Here is another thing we can recognize visually:
<----------------------------- N ----------------------------->
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x x x x x x x x x x x x x x
x x x x x x x x
x x x x
x x
x
We can just rearrange this and see it's O(N):
<----------------------------- N ----------------------------->
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x x x x x x x x x x x x x x|x x x x x x x x|x x x x|x x|x
Or maybe you do log(N) passes of the data, for O(N*log(N)) total time:
<----------------------------- N ----------------------------->
^ x x x x x x x x x x x x x x x x|x x x x x x x x x x x x x x x x
| x x x x x x x x|x x x x x x x x|x x x x x x x x|x x x x x x x x
lgN x x x x|x x x x|x x x x|x x x x|x x x x|x x x x|x x x x|x x x x
| x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x
v x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x
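That last picture is exactly what a bottom-up merge sort does. Here is a minimal sketch (not a production sort) that counts the passes; for N elements there are about log2(N) passes of O(N) work each:

# Bottom-up merge sort: each pass merges runs of double the width,
# so there are ~log2(N) passes over all N elements -- O(N log N).
def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort_bottom_up(items):
    items = list(items)
    n, width, passes = len(items), 1, 0
    while width < n:
        passes += 1                              # one full O(N) pass
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            items[lo:hi] = merge(items[lo:mid], items[mid:hi])
        width *= 2
    return items, passes

print(merge_sort_bottom_up([5, 2, 8, 1, 9, 3, 7, 4]))   # ([1, 2, 3, 4, 5, 7, 8, 9], 3)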
Unrelatedly but worth mentioning again: If we perform a hash (eg a dictionary / hashtable lookup), that is a factor of O(1). That's pretty fast.
[myDictionary.has(x) for x in listOfSizeA]
 \----- O(1) -----/

--> A*1 --> O(A)
If we do something very complicated, such as with a recursive function or divide-and-conquer algorithm, you can use the Master Theorem (usually works), or in ridiculous cases the Akra-Bazzi Theorem (almost always works); or you look up the running time of your algorithm on Wikipedia.
But, programmers don't think like this because eventually, algorithm intuition just becomes second nature. You will start to code something inefficient, and immediately think "am I doing something grossly inefficient? ". If the answer is "yes" AND you foresee it actually mattering, then you can take a step back and think of various tricks to make things run faster (the answer is almost always "use a hashtable", rarely "use a tree", and very rarely something a bit more complicated).
Amortized and average-case complexity
There is also the concept of "amortized" and/or "average case" (note that these are different).
Average Case: This is no more than using big-O notation for the expected value of a function, rather than the function itself. In the usual case where you consider all inputs to be equally likely, the average case is just the average of the running time. For example with quicksort, even though the worst case is O(N²) for some really bad inputs, the average case is the usual O(N log(N)) (the really bad inputs are very small in number, so few that we don't notice them in the average case).
Amortized Worst-Case: Some data structures may have a worst-case complexity that is large, but guarantee that if you do many of these operations, the average amount of work you do will be better than worst-case. For example, you may have a data structure that normally takes constant O(1) time. However, occasionally it will 'hiccup' and take O(N) time for one random operation, because maybe it needs to do some bookkeeping or garbage collection or something... but it promises you that if it does hiccup, it won't hiccup again for N more operations. The worst-case cost is still O(N) per operation, but the amortized cost over many runs is O(N)/N = O(1) per operation. Because the big operations are sufficiently rare, the massive amount of occasional work can be considered to blend in with the rest of the work as a constant factor. We say the work is "amortized" over a sufficiently large number of calls that it disappears asymptotically.
The analogy for amortized analysis:
You drive a car. Occasionally, you need to spend 10 minutes going to the gas station and then spend 1 minute refilling the tank with gas. If you did this every time you went anywhere with your car (spend 10 minutes driving to the gas station, spend a few seconds filling up a fraction of a gallon), it would be very inefficient. But if you fill up the tank once every few days, the 11 minutes spent driving to the gas station is "amortized" over a sufficiently large number of trips, that you can ignore it and pretend all your trips were maybe 5% longer.
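A classic concrete example is a growable array that doubles its capacity whenever it fills up. The sketch below (names invented; real list types do something similar internally) shows that even after a million appends, the total copying works out to roughly one copied element per append, i.e. amortized O(1):

# A growable array: appends are O(1) except for the occasional O(N) 'hiccup'
# when the buffer is full and everything is copied into one twice the size.
class GrowableArray:
    def __init__(self):
        self.capacity, self.size = 1, 0
        self.buffer = [None] * self.capacity
        self.copies = 0                          # total elements ever copied

    def append(self, value):
        if self.size == self.capacity:           # the rare O(N) hiccup
            self.capacity *= 2
            new_buffer = [None] * self.capacity
            for i in range(self.size):
                new_buffer[i] = self.buffer[i]
                self.copies += 1
            self.buffer = new_buffer
        self.buffer[self.size] = value           # the usual O(1) step
        self.size += 1

arr = GrowableArray()
for i in range(1_000_000):
    arr.append(i)
print(arr.copies / arr.size)                     # ~1.05 copies per append: O(1) amortized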
Comparison between average-case and amortized worst-case: average case makes an assumption about the distribution of inputs, while amortized worst-case makes a guarantee about the total cost of a long sequence of operations no matter what the inputs are. Both average-case and amortization are incredibly useful tools for thinking about and designing with scaling in mind.
(See Difference between average case and amortized analysis if interested on this subtopic.)
Multidimensional big-O
Most of the time, people don't realize that there's more than one variable at work. For example, in a string-search algorithm, your algorithm may take time O([length of text] + [length of query]), ie it is linear in two variables like O(N+M). Other more naive algorithms may be O([length of text]*[length of query]) or O(N*M). Ignoring multiple variables is one of the most common oversights I see in algorithm analysis, and can handicap you when designing an algorithm.
The whole story
Keep in mind that big-O is not the whole story. You can drastically speed up some algorithms by using caching, making them cache-oblivious, avoiding bottlenecks by working with RAM instead of disk, using parallelization, or doing work ahead of time -- these techniques are often independent of the order-of-growth "big-O" notation, though you will often see the number of cores in the big-O notation of parallel algorithms.
Also keep in mind that due to hidden constraints of your program, you might not really care about asymptotic behavior. You may be working with a bounded number of values, for example: if you only have a handful of elements to sort, you don't want to use a fancy O(N log(N)) quicksort; you want to use insertion sort, which happens to perform well on small inputs. These situations often come up in divide-and-conquer algorithms, where you split up the problem into smaller and smaller subproblems, such as recursive sorting, fast Fourier transforms, or matrix multiplication. In practice, even among algorithms which have the same or similar asymptotic performance, their relative merit may actually be driven by other things, such as: other performance factors (quicksort and mergesort are both O(N log(N)), but quicksort takes advantage of CPU caches); non-performance considerations, like ease of implementation; whether a library is available, and how reputable and maintained the library is.
Programs will also run slower on a 500MHz computer than on a 2GHz computer. We don't really consider this as part of the resource bounds, because we think of the scaling in terms of machine resources (eg per clock cycle), not per real second. However, there are similar things which can 'secretly' affect performance, such as whether you are running under emulation, or whether the compiler optimized the code or not. These might make some basic operations take longer (even relative to each other), or even speed up or slow down some operations asymptotically (even relative to each other). The effect may be small or large between different implementations and/or environments. Do you switch languages or machines to eke out that little extra work? That depends on a hundred other reasons (necessity, skills, coworkers, programmer productivity, the monetary value of your time, familiarity, workarounds, why not assembly or GPU, etc...), which may be more important than performance.
The above issues, like programming language, are almost never considered as part of the constant factor (nor should they be); yet one should be aware of them, because sometimes (though rarely) they may affect things. For example in cpython, the native priority queue implementation is asymptotically non-optimal (O(log(N)) rather than O(1) for your choice of insertion or find-min); do you use another implementation? Probably not, since the C implementation is probably faster, and there are probably other similar issues elsewhere. There are tradeoffs; sometimes they matter and sometimes they don't.
(edit: The "plain English" explanation ends here.)
Math addenda
For completeness, the precise definition of big-O notation is as follows: f(x) ∈ O(g(x)) means that "f is asymptotically upper-bounded by const*g": ignoring everything below some finite value of x, there exists a constant such that |f(x)| ≤ const * |g(x)|. (The other symbols are as follows: just like O means ≤, Ω means ≥. There are lowercase variants: o means <, and ω means >.) f(x) ∈ Ɵ(g(x)) means both f(x) ∈ O(g(x)) and f(x) ∈ Ω(g(x)) (upper- and lower-bounded by g): there exist some constants such that f will always lie in the "band" between const1*g(x) and const2*g(x). It is the strongest asymptotic statement you can make and roughly equivalent to ==. (Sorry, I elected to delay the mention of the absolute-value symbols until now, for clarity's sake; especially because I have never seen negative values come up in a computer science context.)
People will often use = O(...), which is perhaps the more correct 'comp-sci' notation, and entirely legitimate to use... but one should realize = is not being used as equality; it is a compound notation that must be read as an idiom. I was taught to use the more rigorous ∈ O(...). ∈ means "is an element of". O(N²) is actually an equivalence class, that is, it is a set of things which we consider to be the same. In this particular case, O(N²) contains elements like { 2N², 3N², 1/2 N², 2N² + log(N), -N² + N^1.9, ... } and is infinitely large, but it's still a set. The = notation might be the more common one, and is even used in papers by world-renowned computer scientists. Additionally, it is often the case that in a casual setting, people will say O(...) when they mean Ɵ(...); this is technically true since the set of things Ɵ(exactlyThis) is a subset of O(noGreaterThanThis)... and it's easier to type. ;-)