"Approximate" greatest common divisor

Suppose you have a list of floating point numbers that are approximately multiples of a common quantity, for example

2.468, 3.700, 6.1699

which are approximately all multiples of 1.234. How would you characterize this "approximate gcd", and how would you proceed to compute or estimate it?

This is closely related to my answer to this question.


You can run Euclid's gcd algorithm with anything smaller than 0.01 (or a small number of your choice) treated as a pseudo-zero. With your numbers:

3.700 = 1 * 2.468 + 1.232,
2.468 = 2 * 1.232 + 0.004. 

So the pseudo gcd of the first two numbers is 1.232. Now you take the gcd of this with your last number:

6.1699 = 5 * 1.232 + 0.0099.

So 1.232 is the pseudo gcd, and the multiples are 2, 3, 5. To improve this result, you may run a linear regression on the data points:

(2,2.468), (3,3.7), (5,6.1699).

The slope is the improved pseudo gcd.

Caveat: the first part of this algorithm is numerically unstable, so if you start with very dirty data, you are in trouble.
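
For concreteness, here is a minimal Python sketch of that two-step recipe; the tolerance eps, the helper names, and the exact regression formula (an ordinary least-squares slope over the points (multiple, value)) are my own choices rather than part of the answer:

def approx_gcd(values, eps=0.01):
    # Euclid's algorithm on floats, treating any remainder below eps as a pseudo-zero.
    def gcd2(a, b):
        while b > eps:
            a, b = b, a % b
        return a
    g = values[0]
    for v in values[1:]:
        g = gcd2(g, v)
    return g

def refine_by_regression(values, g):
    # Round each value to its nearest multiple of g, then fit a line through the
    # points (multiple, value); the slope is the improved pseudo gcd.
    xs = [round(v / g) for v in values]
    mx = sum(xs) / len(xs)
    my = sum(values) / len(values)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, values))
            / sum((x - mx) ** 2 for x in xs))

data = [2.468, 3.700, 6.1699]
g = approx_gcd(data)                      # about 1.232
print(g, refine_by_regression(data, g))   # refined estimate near 1.234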


Express your measurements as multiples of the lowest one. Thus your list becomes 1.00000, 1.49919, 2.49996. The fractional parts of these values will be very close to multiples of 1/N, for some N dictated by how close your lowest value is to the fundamental frequency. I would suggest looping through increasing N until you find a sufficiently refined match. In this case, for N=1 (that is, assuming X=2.468 is your fundamental frequency) you would find a standard deviation of 0.3333 (two of the three scaled values are about .5 away from the nearest integer), which is unacceptably high. For N=2 (that is, assuming 2.468/2 is your fundamental frequency) you would find a standard deviation of virtually zero (all three scaled values are within .001 of a multiple of 1/2), thus 2.468/2 is your approximate GCD.

The major flaw in my plan is that it works best when the lowest measurement is the most accurate, which is likely not the case. This could be mitigated by performing the entire operation multiple times, discarding the lowest value from the list of measurements each time, then using the list of results from each pass to determine a more precise result. Another way to refine the result would be to adjust the GCD to minimize the standard deviation between integer multiples of the GCD and the measured values.
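
A small Python sketch of this search, under a couple of assumptions of mine: the candidate fundamentals are min(values)/N for increasing N, the spread is measured as an RMS residual in the original units rather than the standard deviation quoted above, and the accepted candidate is then readjusted with a least-squares fit through the origin, in the spirit of the last sentence:

def approx_gcd_by_subdivision(values, max_n=20, tol=0.01):
    lowest = min(values)
    for n in range(1, max_n + 1):
        candidate = lowest / n
        multiples = [round(v / candidate) for v in values]
        # Spread of the values around the nearest integer multiples of the candidate.
        rms = (sum((v - m * candidate) ** 2
                   for v, m in zip(values, multiples)) / len(values)) ** 0.5
        if rms < tol:
            # Readjust the candidate to minimize the squared residuals.
            return (sum(m * v for m, v in zip(multiples, values))
                    / sum(m * m for m in multiples))
    return None

print(approx_gcd_by_subdivision([2.468, 3.700, 6.1699]))  # about 1.2338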


This reminds me of the problem of finding good rational-number approximations of real numbers. The standard technique is a continued-fraction expansion:

def rationalizations(x):
    """Yield successive continued-fraction convergents (numerator, denominator)
    of the nonnegative number x, from coarsest to finest."""
    assert 0 <= x
    ix = int(x)
    yield ix, 1
    if x == ix: return
    # Expand the reciprocal of the fractional part, and fold each of its
    # convergents p/q back into a convergent of x = ix + q/p = (q + ix*p)/p.
    for numer, denom in rationalizations(1.0/(x-ix)):
        yield denom + ix * numer, numer

We could apply this directly to Jonathan Leffler's and Sparr's approach:

>>> import itertools
>>> a, b, c = 2.468, 3.700, 6.1699
>>> b/a, c/a
(1.4991896272285252, 2.4999594813614263)
>>> list(itertools.islice(rationalizations(b/a), 3))
[(1, 1), (3, 2), (925, 617)]
>>> list(itertools.islice(rationalizations(c/a), 3))
[(2, 1), (5, 2), (30847, 12339)]

picking off the first good-enough approximation from each sequence. (3/2 and 5/2 here.) Or instead of directly comparing 3.0/2.0 to 1.499189..., you could notice that 925/617 uses much larger integers than 3/2, making 3/2 an excellent place to stop.

It shouldn't much matter which of the numbers you divide by. (Using a/b and c/b you get 2/3 and 5/3, for instance.) Once you have integer ratios, you could refine the implied estimate of the fundamental using shsmurfy's linear regression. Everybody wins!
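
Putting those pieces together, here is a rough Python sketch that reuses the rationalizations generator above; the "good enough" test (an error tolerance plus a cap on the denominator), the choice of dividing by the first value, and the through-origin least-squares fit at the end are my assumptions, and handling of ratios that never converge is omitted:

import math

def small_ratio(x, tol=1e-3, max_denom=100):
    # First continued-fraction convergent of x within tol, stopping before the
    # denominators blow up (so we keep 3/2 rather than 925/617).
    for numer, denom in rationalizations(x):
        if denom > max_denom:
            break
        if abs(x - numer / denom) < tol:
            return numer, denom
    return None

def estimate_fundamental(values):
    base = values[0]
    ratios = [small_ratio(v / base) for v in values]
    # Bring every ratio onto a common denominator, so each measurement gets an
    # integer multiple of a single fundamental.
    common = 1
    for _, d in ratios:
        common = common * d // math.gcd(common, d)
    multiples = [n * (common // d) for n, d in ratios]
    # Least-squares fit through the origin: values ~ fundamental * multiples.
    return (sum(m * v for m, v in zip(multiples, values))
            / sum(m * m for m in multiples))

print(estimate_fundamental([2.468, 3.700, 6.1699]))  # about 1.2338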
