SciPy-least_squares
scipy.optimize.least_squares
该函数用于求解变量有界的非线性最小二乘问题。
给定残差f(x)
(n个是变量的m维实函数)和损失函数rho(s)
(标量函数),least_squares
函数寻找代价函数F(x)
的最小值:
1 | min F(x)=0.5*sum(rho(f_i(x)**2),i=0,...,m-1) |
损失函数rho(s)
的作用是减小异常点(outliers)对解的影响。
函数声明
1 | scipy.optimize.least_squares(fun, x0, jac='2-point', bounds=(-inf, inf), method='trf', |
参数
fun : callable
用于计算残差向量的函数,形如fun(x,*args,**kwargs)
,最小化是随第1个参数进行的。传入到这个函数的参数x
是一个维度为(n,)
的numpy数组(不能是一个变量,n不能等于1)。该函数必须返回一个维度为(m,)
的1维数组。如果参数x
是复数或者函数fun
返回复数残差,它必须被具有实参数的实函数包裹,就像后面的例子一样。
Function which computes the vector of residuals, with the signature fun(x, *args, **kwargs)
, i.e., the minimization proceeds with respect to its first argument. The argument x
passed to this function is an ndarray of shape (n,) (never a scalar, even for n=1). It must return a 1-d array_like of shape (m,) or a scalar. If the argument x
is complex or the function fun
returns complex residuals, it must be wrapped in a real function of real arguments, as shown at the end of the Examples section.
x0 : array_like with shape (n,) or float
估计的初值。如果传入一个浮点数,会被看作是具有1个元素的1维数组。 Initial guess on independent variables. If float, it will be treated as a 1-d array with one element.jac : {'2-point', '3-point', 'cs', callable}, optional
计算Jacobian矩阵的方法(m×n的矩阵,其中元素(i,j)是f[i]对x[j]的偏导)。通过关键词选择一个有限差分方法来做数值估算。方法3-point
更加精确,但是相比默认的2-point
需要两倍的操作。方法cs
使用复杂的步骤,虽然可能是最准确的,它仅适用于fun
能正确处理复杂输入,并且可以解析的继续道复杂的平面。算法lm
通常使用2-point
方法。如果jac
是可调用的,则形如jac(x,*args,**kwargs)
,且应该返回一个Jacobian矩阵的好的近似(或者准确值)。
Method of computing the Jacobian matrix (an m-by-n matrix, where element (i, j) is the partial derivative of f[i] with respect to x[j]). The keywords select a finite difference scheme for numerical estimation. The scheme ‘3-point’ is more accurate, but requires twice as many operations as ‘2-point’ (default). The scheme ‘cs’ uses complex steps, and while potentially the most accurate, it is applicable only when fun
correctly handles complex inputs and can be analytically continued to the complex plane. Method ‘lm’ always uses the ‘2-point’ scheme. If callable, it is used as jac(x, *args, **kwargs)
and should return a good approximation (or the exact value) for the Jacobian as an array_like (np.atleast_2d is applied), a sparse matrix or a scipy.sparse.linalg.LinearOperator
.
bounds : 2-tuple of array_like, optional
自变量的下界和上界。默认是无界的。该数组必须和x0
的大小一致,或者是一个标量(这种边界值都是一样的)。使用带有适当符号的np.inf
来禁用所有或者部分变量的边界。
Lower and upper bounds on independent variables. Defaults to no bounds. Each array must match the size of x0
or be a scalar, in the latter case a bound will be the same for all variables. Use np.inf
with an appropriate sign to disable bounds on all or some variables.
method : {'trf', 'dogbox', 'lm'}, optional
进行最小化的算法。- trf : Trust Region Reflective algorithm, particularly suitable for large sparse problems with bounds. Generally robust method.特别适用于有界大规模稀疏问题。通用健壮方法。
- dogbox : dogleg algorithm with rectangular trust regions, typical use case is small problems with bounds. Not recommended for problems with rank-deficient Jacobian. 通常用于小规模有界问题。不推荐用于缺秩(秩不满)Jacobian问题。
- lm : Levenberg-Marquardt algorithm as implemented in MINPACK. Doesn't handle bounds and sparse Jacobians. Usually the most efficient method for small unconstrained problems. 不处理有界和稀疏Jacobians。通常能有效处理小规模无界问题。
Default is trf
. See Notes for more information.
ftol : float or None, optional
代价函数变化的终止容差。默认是1e-8。当dF < ftol * F
时,优化终止。如果为None
,那么就不会以该条件来终止。
Tolerance for termination by the change of the cost function. Default is 1e-8. The optimization process is stopped when dF < ftol * F
, and there was an adequate agreement between a local quadratic model and the true model in the last step. If None, the termination by this condition is disabled.
xtol : float or None, optional
自变量变化的终止容差。默认是1e-8。准确的终止条件取决于使用的算法:
- For
trf
anddogbox
:norm(dx) < xtol * (xtol + norm(x))
- For
lm
:Delta < xtol * norm(xs)
, whereDelta
is a trust-region radius andxs
is the value ofx
scaled according tox_scale
parameter (see below).
如果为None
,那么就不会以该条件来终止
gtol : float or None, optional
梯度范数的终止容差。默认为1e-8。准确的终止条件取决于使用的算法:
Tolerance for termination by the norm of the gradient. Default is 1e-8.
The exact condition depends on a method
used:
- For
trf
:norm(g_scaled, ord=np.inf) < gtol
, whereg_scaled
is the value of the gradient scaled to account for the presence of the bounds [1]. - For
dogbox
:norm(g_free, ord=np.inf) < gtol
, whereg_free
is the gradient with respect to the variables which are not in the optimal state on the boundary. - For
lm
: the maximum absolute value of the cosine of angles between columns of the Jacobian and the residual vector is less thangtol
, or the residual vector is zero.
如果为None
,那么就不会以该条件来终止。
x_scale : array_like or 'jac', optional
Characteristic scale of each variable. Setting x_scale
is equivalent to reformulating the problem in scaled variables xs = x / x_scale
. An alternative view is that the size of a trust region along j-th dimension is proportional to x_scale[j]
. Improved convergence may be achieved by setting x_scale
such that a step of a given size along any of the scaled variables has a similar effect on the cost function. If set to ‘jac’, the scale is iteratively updated using the inverse norms of the columns of the Jacobian matrix (as described in [2]).
loss : str or callable, optional
确定损失函数的形式。以下值可用:
Determines the loss function. The following keyword values are allowed:
linear
(default) :rho(z) = z
. Gives a standard least-squares problem.默认值,即为标准最小二乘问题。soft_l1
:rho(z) = 2 * ((1 + z)**0.5 - 1)
. The smooth approximation of l1 (absolute value) loss. Usually a good choice for robust least squares.huber
:rho(z) = z if z <= 1 else 2*z**0.5 - 1
. Works similarly to ‘soft_l1’.cauchy
:rho(z) = ln(1 + z)
. Severely weakens outliers influence, but may cause difficulties in optimization process.可以严重削弱异常值的影响,但可能增加优化的难度。arctan
:rho(z) = arctan(z)
. Limits a maximum loss on a single residual, has properties similar to ‘cauchy’.限制单个残差值的上限,作用类似于cauchy
。
If callable, it must take a 1-d ndarray z=f**2
and return an array_like with shape (3, m) where row 0 contains function values, row 1 contains first derivatives and row 2 contains second derivatives. Method ‘lm’ supports only ‘linear’ loss. 如果以可调用的形式传入,那么该可调用函数必须以1维数组作为参数,并且返回(3,m)数组,其中返回值数组的第1行是函数值,第2行是1阶导数值,第3行是2阶导数值。lm
算法只支持linear
形式。
f_scale : float, optional
Value of soft margin between inlier and outlier residuals, default is 1.0. The loss function is evaluated as follows rho_(f**2) = C**2 * rho(f**2 / C**2)
, where C
is f_scale
, and rho
is determined by loss
parameter. This parameter has no effect with loss='linear'
, but for other loss
values it is of crucial importance.
max_nfev : None or int, optional
Maximum number of function evaluations before the termination. If None (default), the value is chosen automatically:
- For
trf
anddogbox
: 100 * n. - For
lm
: 100 * n ifjac
is callable and 100 * n * (n + 1) otherwise (becauselm
counts function calls in Jacobian estimation).
diff_step : None or array_like, optional
Determines the relative step size for the finite difference approximation of the Jacobian. The actual step is computed as x * diff_step
. If None (default), then diff_step
is taken to be a conventional “optimal” power of machine epsilon for the finite difference scheme used [3].
tr_solver : {None, 'exact', 'lsmr'}, optional
Method for solving trust-region subproblems, relevant only for trf
and dogbox
methods.
exact
is suitable for not very large problems with dense Jacobian matrices. The computational complexity per iteration is comparable to a singular value decomposition of the Jacobian matrix.lsmr
is suitable for problems with sparse and large Jacobian matrices. It uses the iterative procedurescipy.sparse.linalg.lsmr
for finding a solution of a linear least-squares problem and only requires matrix-vector product evaluations.
If None (default) the solver is chosen based on the type of Jacobian returned on the first iteration.
tr_options : dict, optional
Keyword options passed to trust-region solver.
tr_solver='exact'
:tr_options
are ignored.tr_solver='lsmr'
: options forscipy.sparse.linalg.lsmr
.
Additionally method='trf'
supports 'regularize'
option (bool, default is True) which adds a regularization term to the normal equation, which improves convergence if the Jacobian is rank-deficient [4] (eq. 3.4).
jac_sparsity : {None, array_like, sparse matrix}, optional
Defines the sparsity structure of the Jacobian matrix for finite difference estimation, its shape must be (m, n). If the Jacobian has only few non-zero elements in each row, providing the sparsity structure will greatly speed up the computations [5]. A zero entry means that a corresponding element in the Jacobian is identically zero. If provided, forces the use of lsmr
trust-region solver. If None (default) then dense differencing will be used. Has no effect for lm
method.
verbose : {0, 1, 2}, optional
Level of algorithm’s verbosity:
- 0 (default) : work silently. 默认值,以后台形式工作。
- 1 : display a termination report. 显示终端报告。
- 2 : display progress during iterations (not supported by ‘lm’ method). 显示迭代过程(
lm
算法不支持)。
args, kwargs : tuple and dict, optional
Additional arguments passed to fun
and jac
. Both empty by default.
The calling signature is fun(x, *args, **kwargs)
and the same for jac
.
返回值
返回值为OptimizeResult
对象,有如下一些属性:
x : ndarray, shape (n,)
Solution found.
cost : float
Value of the cost function at the solution.
fun : ndarray, shape (m,)
Vector of residuals at the solution.
jac : ndarray, sparse matrix or LinearOperator, shape (m, n)
Modified Jacobian matrix at the solution, in the sense that J^T J is a Gauss-Newton approximation of the Hessian of the cost function. The type is the same as the one used by the algorithm.
grad : ndarray, shape (m,)
Gradient of the cost function at the solution.
optimality : float
First-order optimality measure. In unconstrained problems, it is always the uniform norm of the gradient. In constrained problems, it is the quantity which was compared with gtol
during iterations.
active_mask : ndarray of int, shape (n,)
Each component shows whether a corresponding constraint is active (that is, whether a variable is at the bound):
0
: a constraint is not active.-1
: a lower bound is active.1
: an upper bound is active.
Might be somewhat arbitrary for ‘trf’ method as it generates a sequence of strictly feasible iterates and active_mask
is determined within a tolerance threshold.
nfev : int
Number of function evaluations done. Methods ‘trf’ and ‘dogbox’ do not count function calls for numerical Jacobian approximation, as opposed to ‘lm’ method.
njev : int or None
Number of Jacobian evaluations done. If numerical Jacobian approximation is used in ‘lm’ method, it is set to None.
status : int
The reason for algorithm termination:
-1
: improper input parameters status returned from MINPACK.0
: the maximum number of function evaluations is exceeded.1
:gtol
termination condition is satisfied.2
:ftol
termination condition is satisfied.3
:xtol
termination condition is satisfied.4
: Bothftol
andxtol
termination conditions are satisfied.
message : str
Verbal description of the termination reason.
success : bool
True if one of the convergence criteria is satisfied (status
> 0).
注意
lm
Method lm
(Levenberg-Marquardt) calls a wrapper over least-squares algorithms implemented in MINPACK (lmder, lmdif). It runs the Levenberg-Marquardt algorithm formulated as a trust-region type algorithm. The implementation is based on paper [2:1], it is very robust and efficient with a lot of smart tricks. It should be your first choice for unconstrained problems. Note that it doesn’t support bounds. Also it doesn’t work when .
算法lm
(Levenberg-Marquardt)调用MINPACK(lmder,lmdif)中实现的最小二乘算法。它以和trust-region类型的算法类似的形式运行。其实现是基于[2:2],它非常健壮有效,且有很多方便的设置技巧,应该作为无约束问题的首选。注意它不适用于有边界的问题,同样对于时也不适用。
trf
Method trf
(Trust Region Reflective) is motivated by the process of solving a system of equations, which constitute the first-order optimality condition for a bound-constrained minimization problem as formulated in [1:1]. The algorithm iteratively solves trust-region subproblems augmented by a special diagonal quadratic term and with trust-region shape determined by the distance from the bounds and the direction of the gradient. This enhancements help to avoid making steps directly into bounds and efficiently explore the whole space of variables. To further improve convergence, the algorithm considers search directions reflected from the bounds. To obey theoretical requirements, the algorithm keeps iterates strictly feasible. With dense Jacobians trust-region subproblems are solved by an exact method very similar to the one described in [2:3] (and implemented in MINPACK). The difference from the MINPACK implementation is that a singular value decomposition of a Jacobian matrix is done once per iteration, instead of a QR decomposition and series of Givens rotation eliminations. For large sparse Jacobians a 2-d subspace approach of solving trust-region subproblems is used [1:2], [4:1]. The subspace is spanned by a scaled gradient and an approximate Gauss-Newton solution delivered by scipy.sparse.linalg.lsmr
. When no constraints are imposed the algorithm is very similar to MINPACK and has generally comparable performance. The algorithm works quite robust in unbounded and bounded problems, thus it is chosen as a default algorithm.
dogbox
Method dogbox
operates in a trust-region framework, but considers rectangular trust regions as opposed to conventional ellipsoids [6]. The intersection of a current trust region and initial bounds is again rectangular, so on each iteration a quadratic minimization problem subject to bound constraints is solved approximately by Powell’s dogleg method [7]. The required Gauss-Newton step can be computed exactly for dense Jacobians or approximately by scipy.sparse.linalg.lsmr
for large sparse Jacobians. The algorithm is likely to exhibit slow convergence when the rank of Jacobian is less than the number of variables. The algorithm often outperforms ‘trf’ in bounded problems with a small number of variables.
Robust loss functions are implemented as described in [8]. The idea is to modify a residual vector and a Jacobian matrix on each iteration such that computed gradient and Gauss-Newton Hessian approximation match the true gradient and Hessian approximation of the cost function. Then the algorithm proceeds in a normal way, i.e. robust loss functions are implemented as a simple wrapper over standard least-squares algorithms.
versionadded:: 0.17.0
示例
无约束最小值
在这个例子中,我们来寻找Rosenbrock
函数的最小值,不给定自变量的边界。
Rosenbrock
函数的定义:
1 | def fun_rosenbrock(x): |
注意到这里函数代码中只给出了残差的的向量形式,算法会自动构建损失函数为残差平方和的形式(默认loss
参数就是的),即还原为Rosenbrock
函数。其取最小值的精确解为x=[1.0,1.0]
。
1 | import numpy as np |
1 | active_mask: array([0., 0.]) |
有约束最小值
现在我们给定约束变量:,不限制。这样,我们就需要在least_squares
函数中指定bounds
变量:bounds=([-np.inf,1.5],np.inf)
(注意:第一部分是下限,即最小值是-np.inf
,最小值是1.5)。
除此之外,我们再提供Jacobian矩阵的解析形式:
1 | def jac_rosenbrock(x): |
此时,求解过程为:
1 | res_2 = least_squares(fun_rosenbrock, x0_rosenbrock, jac_rosenbrock, |
1 | active_mask: array([ 0, -1]) |
方程组:大型系数矩阵
求解10万个变量的Broyden三对角向量值函数方程组(即损失函数需要在取最小值时取0)。
1 | def fun_broyden(x): |
相应的Jacobian矩阵是系数的。我们用有限差分法来估算,并给出Jacobian矩阵的稀疏形式以显著加快这个过程。
1 | from scipy.sparse import lil_matrix |
1 | active_mask: array([0., 0., 0., ..., 0., 0., 0.]) |
曲线拟合:含异常点的问题
使用健壮的损失函数来解决数据中存在异常点的曲线拟合问题。模型函数定义为,其中是预测变量,是观测值,是待估计的参数。
首先定义函数用来生成带有噪声和异常点的数据,定义模型参数,生成数据:
1 | def gen_data(t, a, b, c, noise=0, n_outliers=0, random_state=0): |
定义函数来计算残差,初始化待估计的参数:
1 | def fun(x, t, y): |
求解:
1 | res_lsq = least_squares(fun, x0, args=(t_train, y_train)) |
然后用另外两个健壮的损失函数求解。将参数f_scale
值设置为0.1,意味着点的残差不应该超过0.1(参考的是生成数据时使用的噪声等级值)。
1 | res_soft_l1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1, |
最后绘制曲线。可以看出,通过选用合适的损失函数,我们可以得到更优的结果,即便是存在异常点。但记住通常推荐先尝试soft_l1
和huber
,因为另外两个选项的优化过程可能比较困难。
1 | t_test = np.linspace(t_min, t_max, n_points * 10) |
复变量的复值残差函数
参考最后一个例子:点击跳转
参考文献
利用Levenberg_Marquardt算法求解无约束的非线性最小二乘问题
SciPy doc: least_squares
M. A. Branch, T. F. Coleman, and Y. Li, “A Subspace, Interior, and Conjugate Gradient Method for Large-Scale Bound-Constrained Minimization Problems,” SIAM Journal on Scientific Computing, Vol. 21, Number 1, pp 1-23, 1999. ↩︎ ↩︎ ↩︎
J. J. More, “The Levenberg-Marquardt Algorithm: Implementation and Theory,” Numerical Analysis, ed. G. A. Watson, Lecture Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977. ↩︎ ↩︎ ↩︎ ↩︎
William H. Press et. al., “Numerical Recipes. The Art of Scientific Computing. 3rd edition”, Sec. 5.7. ↩︎
R. H. Byrd, R. B. Schnabel and G. A. Shultz, “Approximate solution of the trust region problem by minimization over two-dimensional subspaces”, Math. Programming, 40, pp. 247-263, 1988. ↩︎ ↩︎
A. Curtis, M. J. D. Powell, and J. Reid, “On the estimation of sparse Jacobian matrices”, Journal of the Institute of Mathematics and its Applications, 13, pp. 117-120, 1974. ↩︎
C. Voglis and I. E. Lagaris, “A Rectangular Trust Region Dogleg Approach for Unconstrained and Bound Constrained Nonlinear Optimization”, WSEAS International Conference on Applied Mathematics, Corfu, Greece, 2004. ↩︎
J. Nocedal and S. J. Wright, “Numerical optimization, 2nd edition”, Chapter 4. ↩︎
B. Triggs et. al., “Bundle Adjustment - A Modern Synthesis”, Proceedings of the International Workshop on Vision Algorithms: Theory and Practice, pp. 298-372, 1999. ↩︎