Why does decreasing K in K-nearest neighbours increase complexity?

In an extract from my textbook it says that reducing the value of K when running this algorithm actually increases the complexity as it has to run more “smoothing”.

Can anyone explain this to me?

My understanding is that in 1NN, you feed it your training set and test on your testing set. Assume your testing set has one point in it. The algorithm finds the one point closest to it in the training set and returns the value of that point.

Surely this is less complex than finding the 3 closest points in 3NN, adding their values and dividing by three?
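For concreteness, here is roughly what I picture (my own toy sketch in Python; the numbers are made up, just to show 1NN returning one value versus 3NN averaging three):

```python
import math

# Made-up training data: two features per point, one target value each.
train_X = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (5.0, 4.0)]
train_y = [10.0, 12.0, 20.0, 30.0]
test_point = (2.5, 2.5)

# Sort training indices by distance to the single test point.
order = sorted(range(len(train_X)),
               key=lambda i: math.dist(train_X[i], test_point))

pred_1nn = train_y[order[0]]                       # value of the closest point
pred_3nn = sum(train_y[i] for i in order[:3]) / 3  # average of the 3 closest values

print(pred_1nn, pred_3nn)
```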

What have I misunderstood or overlooked?


I had the same moment of disbelief when reading that statement; a parameter whose larger values decrease complexity seems a bit counterintuitive at first.

To build an intuition, let's compare a 1-nearest-neighbour model with a k >> 1 nearest-neighbours one, using a simplified 2D plot (a two-feature dataset) with binary classification (each "point" has a class, or label, of either A or B).

With the 1-nearest-neighbour model, each example in the training set is potentially the centre of an area predicting class A or B, with many of its neighbours being the centres of areas predicting the other class. Your plot might look like one of those maps of ethnicity, language or religion in regions of the world where they are deeply intertwined (the Balkans or the Middle East come to mind): small patches of complex shapes and alternating colours, with no discernible logic, and thus "high complexity".

(Figure: 1 nearest neighbour)

If you increase k, the areas predicting each class become more "smoothed", since it is the majority of the k nearest neighbours that decides the class of any point. The areas will therefore be fewer, larger, and probably simpler in shape, like political maps of country borders in the same parts of the world. Hence "less complexity".

(Figure: k nearest neighbours)

(Intuition and source from this course.)
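If you want to see the smoothing effect numerically rather than on a map, here is a small self-contained sketch (NumPy only, with a toy two-feature dataset I made up): it classifies a grid of points by majority vote of the k nearest neighbours and counts how often the predicted label flips between adjacent grid cells, as a crude proxy for how patchy the decision regions are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-feature training set: two noisy, overlapping classes (0 and 1).
X_train = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                     rng.normal(1.0, 1.0, size=(50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

def knn_predict(x, k):
    """Majority vote among the k training points closest to x."""
    d = np.linalg.norm(X_train - x, axis=1)        # distances to all training points
    nearest = np.argsort(d)[:k]                    # indices of the k nearest
    return np.bincount(y_train[nearest]).argmax()  # most frequent label among them

# Classify a grid of points and count how often the predicted label changes
# between horizontally adjacent cells -- a crude measure of how "patchy"
# (complex) the decision regions are.
xs = np.linspace(-3, 4, 60)
grid = np.array([[x, y] for y in xs for x in xs])

for k in (1, 25):
    labels = np.array([knn_predict(p, k) for p in grid]).reshape(len(xs), len(xs))
    changes = np.sum(labels[:, 1:] != labels[:, :-1])
    print(f"k={k:2d}: {changes} label changes along grid rows (more = patchier map)")
```

With k = 1 you should see far more label flips than with k = 25, matching the "patchy map" versus "smoothed map" picture above.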


Q: Is k-NN faster than NN?

A: No.

For more, see below.

In general, NN search is simpler and thus requires less effort than k-NN (assuming, of course, that k is not equal to 1).

Take a look at my answer here, where I explain the concept for NNS (Nearest Neighbour Search).

In the kNN case, a general algorithm can, for example, find the top NN, then the second-best NN, and so on, until k NNs are found.
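A minimal sketch of that incremental idea (brute force, in Python, my own illustration rather than any specific library's algorithm): each extra neighbour costs another full NN search over the remaining points.

```python
import math

def knn_by_repeated_nn(points, query, k):
    """Find the k nearest neighbours of `query` by running a plain
    NN search k times, removing each found point before the next pass."""
    remaining = list(points)
    found = []
    for _ in range(k):
        best = min(remaining, key=lambda p: math.dist(p, query))  # one NN search
        found.append(best)
        remaining.remove(best)
    return found

print(knn_by_repeated_nn([(0, 0), (1, 1), (2, 2), (5, 5)], (0.2, 0.1), k=2))
```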

Another, more common approach is to maintain a priority_queue that holds k NNs, ordered by their distance to the query point.
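And a sketch of the priority_queue variant (Python's heapq used as a max-heap of size k over brute-force distances; a real k-NN index such as a k-d tree would prune candidates, but the per-candidate bookkeeping is the same):

```python
import heapq
import math

def knn(points, query, k):
    """Return the k points nearest to `query`, nearest first.

    `heap` is a max-heap of size k (heapq is a min-heap, so distances
    are stored negated); a candidate replaces the current k-th nearest
    only if it is strictly closer.
    """
    heap = []  # entries are (-distance, point)
    for p in points:
        d = math.dist(p, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif d < -heap[0][0]:                  # closer than the worst of the current k
            heapq.heapreplace(heap, (-d, p))
    return [p for _, p in sorted(heap, reverse=True)]

print(knn([(0, 0), (1, 1), (2, 2), (5, 5)], (0.2, 0.1), k=2))
```

Both sketches do more work as k grows (more passes in the first, more heap operations in the second), which is exactly the point below.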

In order for the general algorithm to find more than one neighbour, it has to access more nodes/leaves, which means more steps and thus increased time complexity.

It is quite obvious that the accuracy might increase when you increase k but the computation cost also increases.

as said in this blog.

I suspect that you have a particular algorithm in mind in your question, but without knowing which one, I don't think a better answer can be given.
