Getting Standard Errors on Fitted Parameters Using the Optimize.Leastsq Method in Python

Errors to fit parameters of scipy.optimize

TL;DR: You can actually place an upper bound on how precisely the minimization routine has found the optimal values of your parameters. See the snippet at the end of this answer that shows how to do it directly, without resorting to calling additional minimization routines.


The documentation for the L-BFGS-B method of scipy.optimize.minimize says

The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= ftol.

Roughly speaking, the minimization stops when the value of the function f that you're minimizing is within ftol of the optimum. (This is a relative error if |f| is greater than 1, and an absolute error otherwise; for simplicity I'll assume it's an absolute error.) In more standard language, you'll probably think of your function f as a chi-squared value. So this roughly suggests that you would expect

\chi^2_{\mathrm{found}} - \chi^2_{\mathrm{optimal}} \lesssim \mathtt{ftol}
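For reference, ftol is set through the options dict when calling minimize. Here is a minimal sketch with a toy two-parameter objective standing in for your chi-squared (the objective and starting point are hypothetical):

import numpy as np
from scipy.optimize import minimize

def chi2(x):
    # toy chi-squared-like objective; a stand-in for your real one
    return np.sum((x - np.array([1.0, 2.0]))**2)

res = minimize(chi2, x0=[0.0, 0.0], method='L-BFGS-B',
               options={'ftol': 2.220446049250313e-09})  # the L-BFGS-B default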

Of course, the mere fact that you're applying a minimization routine like this assumes that your function is well behaved, in the sense that it's reasonably smooth and that it's well approximated near the optimum by a quadratic function of the parameters x^i:

\chi^2_{\mathrm{found}} - \chi^2_{\mathrm{optimal}} \approx \sum_{i,j} H_{ij} \Delta x^i \Delta x^j

where Δx^i is the difference between the found value of parameter x^i and its optimal value, and H_{ij} is the Hessian matrix. A little (surprisingly nontrivial) linear algebra gets you to a pretty standard result for an estimate of the uncertainty in any quantity X that's a function of your parameters x^i:

(\Delta X)^2 = \left(\chi^2_{\mathrm{found}} - \chi^2_{\mathrm{optimal}}\right) \sum_{i,j} \left(H^{-1}\right)_{ij} \frac{\partial X}{\partial x^i} \frac{\partial X}{\partial x^j}

which lets us write

(\Delta X)^2 \lesssim \mathtt{ftol} \sum_{i,j} \left(H^{-1}\right)_{ij} \frac{\partial X}{\partial x^i} \frac{\partial X}{\partial x^j}

That's the most useful formula in general, but for the specific question here, we just have X = x^i, so this simplifies to

\Delta x^i \lesssim \sqrt{\mathtt{ftol} \left(H^{-1}\right)_{ii}}
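If you do need the general formula for some derived quantity X, a minimal sketch looks like the following (assuming, as in the snippet below, that the optimization result is stored in a variable res from an L-BFGS-B run, and that grad_X is a hypothetical array you supply holding ∂X/∂x^i evaluated at res.x):

ftol = 2.220446049250313e-09
# grad_X[i] = dX/dx^i evaluated at res.x (hypothetical; you compute this)
delta_X = np.sqrt(ftol * (grad_X @ res.hess_inv(grad_X)))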

Finally, to be totally explicit, let's say you've stored the optimization result in a variable called res. The inverse Hessian is available as res.hess_inv, which is a function that takes a vector and returns the product of the inverse Hessian with that vector. So, for example, we can display the optimized parameters along with the uncertainty estimates with a snippet like this:

ftol = 2.220446049250313e-09  # default ftol for L-BFGS-B
tmp_i = np.zeros(len(res.x))
for i in range(len(res.x)):
    tmp_i[i] = 1.0
    hess_inv_i = res.hess_inv(tmp_i)[i]  # (H^{-1})_{ii}
    uncertainty_i = np.sqrt(max(1, abs(res.fun)) * ftol * hess_inv_i)
    tmp_i[i] = 0.0
    print('x^{0} = {1:12.4e} ± {2:.1e}'.format(i, res.x[i], uncertainty_i))

Note that I've incorporated the max behavior from the documentation, assuming that f^k and f^{k+1} are basically just the same as the final output value, res.fun, which really ought to be a good approximation. Also, for small problems, you can just use np.diag(res.hess_inv.todense()) to get the full inverse and extract the diagonal all at once. But for large numbers of variables, I've found that to be a much slower option. Finally, I've added the default value of ftol, but if you change it in an argument to minimize, you would obviously need to change it here.
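For completeness, the diagonal-at-once variant mentioned above would be (a sketch using the same res and ftol as in the snippet):

hess_inv_diag = np.diag(res.hess_inv.todense())
uncertainties = np.sqrt(max(1, abs(res.fun)) * ftol * hess_inv_diag)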

Is there a way to get the error in fitting parameters from scipy.stats.norm.fit?

You can use scipy.optimize.curve_fit:
This method not only returns the estimated optimal values of
the parameters, but also the corresponding covariance matrix:

popt : array

Optimal values for the parameters so that the sum of the squared residuals
of f(xdata, *popt) - ydata is minimized

pcov : 2d array

The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)).

How the sigma parameter affects the estimated covariance depends on absolute_sigma argument, as described above.

If the Jacobian matrix at the solution doesn’t have a full rank, then ‘lm’ method returns a matrix filled with np.inf, on the other hand ‘trf’ and ‘dogbox’ methods use Moore-Penrose pseudoinverse to compute the covariance matrix.

You can calculate the standard deviation errors of the parameters from the square roots of the diagonal elements of the covariance matrix as follows:

import numpy as np 
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.optimize import curve_fit

x = np.random.normal(size=50000)
fig, ax = plt.subplots()
nbins = 75
n, bins, patches = ax.hist(x, nbins, density=True, facecolor='grey', alpha=0.5, label='before')

centers = 0.5 * (bins[1:] + bins[:-1])
pars, cov = curve_fit(lambda x, mu, sig: norm.pdf(x, loc=mu, scale=sig), centers, n, p0=[0, 1])

ax.plot(centers, norm.pdf(centers, *pars), 'k--', linewidth=2, label='fit before')
ax.set_title(r'$\mu={:.4f}\pm{:.4f}$, $\sigma={:.4f}\pm{:.4f}$'.format(
    pars[0], np.sqrt(cov[0, 0]), pars[1], np.sqrt(cov[1, 1])))

plt.show()

This results in the following plot:

[Figure: histogram of the samples with the fitted normal PDF overlaid; the title reports the fitted μ and σ with their uncertainties]

Getting standard error associated with parameter estimates from scipy.optimize.curve_fit

The variances of the parameters are the diagonal elements of the variance-covariance matrix, and the standard errors are their square roots: np.sqrt(np.diag(pcov)).

Regarding getting inf, see and compare these two examples:

In [129]:

import numpy as np
import scipy.optimize as so

def func(x, a, b, c, d):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5, 1)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ inf inf inf inf]

And:

In [130]:

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
[ 0.11097646 0.11849107 0.05230711]

In this extreme example, d has no effect on the function func, so it is associated with a variance of +inf; in other words, it could take just about any value. Removing d from func gives a covariance matrix that makes sense.

In reality, if parameters are of very different scale, say:

def func(x, a, b, c, d):
    # return a * np.exp(-b * x) + c
    return a * np.exp(-b * x) + c + d * 1e-10

You will also get inf due to floating-point overflow/underflow.
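A quick way to see this, reusing the xdata and ydata from the examples above (a sketch; the exact numbers will vary from run to run):

popt, pcov = so.curve_fit(func, xdata, ydata)
print(np.sqrt(np.diag(pcov)))
# per the note above, expect inf (or absurdly large values) in the entry for d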

In your case, I think you never used a and b. So it is just like the first example here.

Getting covariance matrix of fitted parameters from scipy optimize.least_squares method

curve_fit itself uses a pseudoinverse of the Jacobian returned by least_squares to compute the covariance matrix; see https://github.com/scipy/scipy/blob/2526df72e5d4ca8bad6e2f4b3cbdfbc33e805865/scipy/optimize/minpack.py#L739

One thing to note is that this whole approach is dubious if the result is close to the bounds.
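For reference, a minimal sketch of what the linked curve_fit code does, applied directly to a least_squares result (assuming res = least_squares(residuals, x0) has already been run on unweighted residuals):

import numpy as np
from scipy.linalg import svd

# res = least_squares(residuals, x0)  # assumed to have been run already
_, s, VT = svd(res.jac, full_matrices=False)
threshold = np.finfo(float).eps * max(res.jac.shape) * s[0]
s = s[s > threshold]
VT = VT[:s.size]
pcov = np.dot(VT.T / s**2, VT)  # pseudoinverse of J^T J

# scale by the residual variance, as curve_fit does when absolute_sigma=False
s_sq = 2 * res.cost / (res.fun.size - res.x.size)
pcov = pcov * s_sq
perr = np.sqrt(np.diag(pcov))   # one-standard-deviation errors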


