How to correctly include uncertainties in fitting with Python

2018-06-03 05:00:30

I am trying to fit some data points with y uncertainties in Python. The data are stored as x, y and yerr. I need to do a linear fit to that data on a log-log scale. To check whether the fit results are correct, I compare the Python results with the ones from Scidavis.

I tried curve_fit with

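import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    # power law y = b * x**a, a straight line on a log-log scale
    return np.exp(a * np.log(x) + np.log(b))

popt, pcov = curve_fit(func, x, y, sigma=yerr)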

as well as kmpfit with

from kapteyn import kmpfit

def funcL(p, x):
   a, b = p
   return np.exp(a*np.log(x) + np.log(b))

def residualsL(p, data):
   # weighted residuals: (data - model) / uncertainty
   x, y, errorfit = data
   return (y - funcL(p, x)) / errorfit

a0 = 1
b0 = 0.1
p0 = [a0, b0]
fitterL = kmpfit.Fitter(residuals=residualsL, data=(x, y, yerr))
fitterL.parinfo = [{}, {}]
fitterL.fit(params0=p0)

When I fit the data with either of these without uncertainties (i.e. setting yerr=1), everything works fine and the results are identical to the ones from Scidavis. But if I set yerr to the uncertainties from the data file, I get disturbing results: for example, a=0.86 in Python and a=0.14 in Scidavis. I have read that the errors are included as weights. Do I have to change anything in order to calculate the fit correctly? Or what am I doing wrong?
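On the point that the errors are included as weights: curve_fit minimizes sum(((y - f(x)) / sigma)**2), and by default (absolute_sigma=False) it treats sigma as relative weights and rescales the covariance matrix by the reduced chi-square. The sketch below only illustrates that behaviour on the example data from this post. It writes the model as b * x**a, which is the same function as in the question, and it does not claim to reproduce what Scidavis does internally.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    # Same power law as in the question: exp(a*log(x) + log(b)) == b * x**a
    return b * x**a

# Columns: x, y, yerr (example data file from this post)
data = np.array([
    [3.942387e-02, 1.987800e+00, 5.513165e-01],
    [6.623142e-02, 7.126161e+00, 1.425232e+00],
    [9.348280e-02, 1.238530e+01, 1.536208e+00],
    [1.353088e-01, 1.090471e+01, 7.829126e-01],
    [2.028446e-01, 1.023087e+01, 3.839575e-01],
    [3.058446e-01, 8.403626e+00, 1.756866e-01],
    [4.584524e-01, 7.345275e+00, 8.442288e-02],
    [6.879677e-01, 6.128521e+00, 3.847194e-02],
    [1.032592e+00, 5.359025e+00, 1.837428e-02],
    [1.549152e+00, 5.380514e+00, 1.007010e-02],
    [2.323985e+00, 6.404229e+00, 6.534108e-03],
    [3.355974e+00, 9.489101e+00, 6.342546e-03],
    [4.384128e+00, 1.497998e+01, 2.273233e-02],
])
x, y, yerr = data.T

# Default: sigma acts as relative weights, pcov is scaled by the reduced chi-square
popt_rel, pcov_rel = curve_fit(func, x, y, sigma=yerr, maxfev=10000)

# absolute_sigma=True: sigma is taken as the true standard deviation, no rescaling
popt_abs, pcov_abs = curve_fit(func, x, y, sigma=yerr,
                               absolute_sigma=True, maxfev=10000)

# Reduced chi-square of the weighted fit
chi2 = np.sum(((y - func(x, *popt_rel)) / yerr) ** 2)
chi2_red = chi2 / (len(x) - len(popt_rel))

print('relative sigma:', popt_rel, np.sqrt(np.diag(pcov_rel)))
print('absolute sigma:', popt_abs, np.sqrt(np.diag(pcov_abs)))
print('reduced chi-square:', chi2_red)
```

Both calls should give the same best-fit parameters. Only the reported parameter uncertainties differ, so a program that uses the other convention can show different error bars for the same fit.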

Edit: here is an example of a data file (columns: x, y, yerr)

3.942387e-02    1.987800e+00    5.513165e-01
6.623142e-02    7.126161e+00    1.425232e+00
9.348280e-02    1.238530e+01    1.536208e+00
1.353088e-01    1.090471e+01    7.829126e-01
2.028446e-01    1.023087e+01    3.839575e-01
3.058446e-01    8.403626e+00    1.756866e-01
4.584524e-01    7.345275e+00    8.442288e-02
6.879677e-01    6.128521e+00    3.847194e-02
1.032592e+00    5.359025e+00    1.837428e-02
1.549152e+00    5.380514e+00    1.007010e-02
2.323985e+00    6.404229e+00    6.534108e-03
3.355974e+00    9.489101e+00    6.342546e-03
4.384128e+00    1.497998e+01    2.273233e-02

and the result:

in python: 
   without uncertainties: a=0.06216 +/- 0.00650 ; b=8.53594 +/- 1.13985
   with uncertainties: a=0.86051 +/- 0.01640 ; b=3.38081 +/- 0.22667 
in scidavis:
   without uncertainties: a  = 0.06216 +/- 0.08060; b  = 8.53594 +/- 1.06763
   with uncertainties: a  = 0.14154 +/- 0.005731; b  = 7.38213 +/- 2.13653
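For the literal reading of "a linear fit on a log-log scale", one option is to fit a straight line to log(y) against log(x) and propagate the uncertainties to first order, sigma(ln y) ≈ yerr / y. The following is a minimal sketch under that assumption. It is a different estimator from the nonlinear fits above, the first-order propagation is only rough for points with large relative errors, and nothing here implies that Scidavis works this way.

```python
import numpy as np

# Columns: x, y, yerr (example data file from this post)
data = np.array([
    [3.942387e-02, 1.987800e+00, 5.513165e-01],
    [6.623142e-02, 7.126161e+00, 1.425232e+00],
    [9.348280e-02, 1.238530e+01, 1.536208e+00],
    [1.353088e-01, 1.090471e+01, 7.829126e-01],
    [2.028446e-01, 1.023087e+01, 3.839575e-01],
    [3.058446e-01, 8.403626e+00, 1.756866e-01],
    [4.584524e-01, 7.345275e+00, 8.442288e-02],
    [6.879677e-01, 6.128521e+00, 3.847194e-02],
    [1.032592e+00, 5.359025e+00, 1.837428e-02],
    [1.549152e+00, 5.380514e+00, 1.007010e-02],
    [2.323985e+00, 6.404229e+00, 6.534108e-03],
    [3.355974e+00, 9.489101e+00, 6.342546e-03],
    [4.384128e+00, 1.497998e+01, 2.273233e-02],
])
x, y, yerr = data.T

log_x = np.log(x)
log_y = np.log(y)
# First-order error propagation (an assumption): sigma(ln y) ~ yerr / y
sigma_log_y = yerr / y

# np.polyfit takes weights w = 1/sigma (not 1/sigma**2) for Gaussian errors;
# cov=True returns the covariance scaled by the reduced chi-square
(a, log_b), cov = np.polyfit(log_x, log_y, 1, w=1.0 / sigma_log_y, cov=True)
a_err, log_b_err = np.sqrt(np.diag(cov))

# Back-transform the intercept: ln y = a * ln x + ln b
b = np.exp(log_b)
b_err = b * log_b_err

print('a = {:.5f} +/- {:.5f}'.format(a, a_err))
print('b = {:.5f} +/- {:.5f}'.format(b, b_err))
```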

I must be misunderstanding something. Your posted data does not look anything like

f(x,a,b) = np.exp(a*np.log(x)+np.log(b))

The red line is the result of scipy.optimize.curve_fit, the green line is the result of scidavis.

My guess is that neither algorithm is converging toward a good fit, so it is not surprising that the results do not match.
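The mismatch between the data and the model can be checked directly. A power law b * x**a is a straight line with constant slope a on a log-log scale, so the slope between neighbouring points should be roughly constant. A small sketch using the posted x and y values:

```python
import numpy as np

# x and y columns of the example data file from this post
x = np.array([3.942387e-02, 6.623142e-02, 9.348280e-02, 1.353088e-01,
              2.028446e-01, 3.058446e-01, 4.584524e-01, 6.879677e-01,
              1.032592e+00, 1.549152e+00, 2.323985e+00, 3.355974e+00,
              4.384128e+00])
y = np.array([1.987800e+00, 7.126161e+00, 1.238530e+01, 1.090471e+01,
              1.023087e+01, 8.403626e+00, 7.345275e+00, 6.128521e+00,
              5.359025e+00, 5.380514e+00, 6.404229e+00, 9.489101e+00,
              1.497998e+01])

# Slope d(ln y)/d(ln x) between neighbouring points;
# for a pure power law this would be the constant exponent a
local_slopes = np.diff(np.log(y)) / np.diff(np.log(x))

for x_left, slope in zip(x[:-1], local_slopes):
    print('x >= {:.4f}: local slope {:+.3f}'.format(x_left, slope))
```

For this data the local slope changes sign (y rises, falls, then rises again), which a single exponent a cannot describe.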

I can't explain how scidavis finds its parameters, but according to the definitions as I understand them, scipy is finding parameters with a lower sum of squared weighted residuals than scidavis:

import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optimize

def func(x, a, b):
    return np.exp(a* np.log(x)+np.log(b))

def sum_square(residuals):
    return (residuals**2).sum()

def residuals(p, x, y, sigma):
    return 1.0/sigma*(y - func(x, *p))

data = np.loadtxt('test.dat').reshape((-1,3))
x, y, yerr = data.T  # columns: x, y, yerr
sigma = yerr

popt, pcov = optimize.curve_fit(func, x, y, sigma = sigma, maxfev = 10000)
print('popt: {p}'.format(p = popt))
scidavis = (0.14154, 7.38213)
print('scidavis: {p}'.format(p = scidavis))

print('''
sum of squares for scipy:    {sp}
sum of squares for scidavis: {d}
'''.format(
          sp = sum_square(residuals(popt, x = x, y = y, sigma = sigma)),
          d = sum_square(residuals(scidavis, x = x, y = y, sigma = sigma))
      ))

plt.plot(x, y, 'bo', x, func(x,*popt), 'r-', x, func(x, *scidavis), 'g-')
plt.errorbar(x, y, yerr)
plt.show()

yields

popt: [ 0.86051258  3.38081125]
scidavis: (0.14154, 7.38213)
sum of squares for scipy:    53249.9915654
sum of squares for scidavis: 239654.84276
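The sums of squares above are weighted chi-square values, so dividing by the number of degrees of freedom (13 points minus 2 parameters) gives a reduced chi-square. For a model that describes the data within the quoted uncertainties, that number is expected to be of order 1. A short sketch, using the parameter values printed above and the posted data:

```python
import numpy as np

def func(x, a, b):
    # Same power law as in the answer: exp(a*log(x) + log(b)) == b * x**a
    return b * x**a

# Columns: x, y, yerr (example data file from this post)
data = np.array([
    [3.942387e-02, 1.987800e+00, 5.513165e-01],
    [6.623142e-02, 7.126161e+00, 1.425232e+00],
    [9.348280e-02, 1.238530e+01, 1.536208e+00],
    [1.353088e-01, 1.090471e+01, 7.829126e-01],
    [2.028446e-01, 1.023087e+01, 3.839575e-01],
    [3.058446e-01, 8.403626e+00, 1.756866e-01],
    [4.584524e-01, 7.345275e+00, 8.442288e-02],
    [6.879677e-01, 6.128521e+00, 3.847194e-02],
    [1.032592e+00, 5.359025e+00, 1.837428e-02],
    [1.549152e+00, 5.380514e+00, 1.007010e-02],
    [2.323985e+00, 6.404229e+00, 6.534108e-03],
    [3.355974e+00, 9.489101e+00, 6.342546e-03],
    [4.384128e+00, 1.497998e+01, 2.273233e-02],
])
x, y, yerr = data.T

def chi_square(params):
    # Sum of squared residuals, each weighted by 1/yerr
    return np.sum(((y - func(x, *params)) / yerr) ** 2)

dof = len(x) - 2  # number of data points minus number of fit parameters

chi2_scipy = chi_square((0.86051258, 3.38081125))   # popt printed above
chi2_scidavis = chi_square((0.14154, 7.38213))      # scidavis values from the question

print('scipy:    chi2 = {:.1f}, reduced chi2 = {:.1f}'.format(chi2_scipy, chi2_scipy / dof))
print('scidavis: chi2 = {:.1f}, reduced chi2 = {:.1f}'.format(chi2_scidavis, chi2_scidavis / dof))
```

Both reduced values come out in the thousands, which is consistent with the guess that neither parameter set is a good fit of this model to this data.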

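[Figure: the data with error bars (blue), the scipy.optimize.curve_fit result (red line) and the scidavis result (green line)]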
