Main Content

Checking Validity of Gradients or Jacobians

Optimization solvers often compute more quickly and reliably when you provide first derivatives of objective and nonlinear constraint functions. However, you can easily make a mistake in programming these derivatives. To guard against incorrect derivatives, check the output of the programmed derivative against a finite-difference approximation by using the checkGradients function.

Check Gradient of Objective Function

Consider the objective Rastrigin's function

Ras(x)=20+x12+x2210cos(2πx1+2πx2).

The following function computes Rastrigin's function and its gradient correctly.

function [f,g] = ras(x)
f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2));
if nargout > 1
    g(2) = 2*x(2) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2));
    g(1) = 2*x(1) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2));
end
end

Check that the ras function correctly computes the gradient near a random initial point. To see details of the calculation, set the Display name-value parameter to "on".

rng default
x0 = randn(1,2);
valid = checkGradients(@ras,x0,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 7.32446e-08.

checkGradients successfully passed.
____________________________________________________________


valid =

  logical

   1

The badras function computes the gradient incorrectly.

function [f,g] = badras(x)
f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2));
if nargout > 1
    g(2) = 2*x(2) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2));
    g(1) = 2*x(1) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2));
end
end

checkGradients correctly reports that the badras function computes gradients incorrectly.

valid = checkGradients(@badras,x0,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 0.494224.
  Supplied derivative element (1,1): 92.9841
  Finite-difference derivative element (1,1): 47.0291

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________


valid =

  logical

   0

Use the correct gradient to find a local minimum of ras(x).

rng default
x0 = randn(1,2);
options = optimoptions("fminunc",SpecifyObjectiveGradient=true);
[x,fval,exitflag,output] = fminunc(@ras,x0,options)
Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.

x =

    0.9975    0.9975


fval =

   11.9949


exitflag =

     1


output = 

  struct with fields:

       iterations: 9
        funcCount: 13
         stepsize: 5.7421e-05
     lssteplength: 1
    firstorderopt: 1.2426e-05
        algorithm: 'quasi-newton'
          message: 'Local minimum found.↵↵Optimization completed because the size of the gradient is less than↵the value of the optimality tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The first-order optimality measure, 2.482746e-07, is less ↵than options.OptimalityTolerance = 1.000000e-06.'

The solver takes 9 iterations and only 13 function evaluations to reach a local minimum. Without the gradient, the solver takes more function evaluations.

[x,fval,exitflag,output] = fminunc(@ras,x0) % No options
Local minimum found.
...
output = 

  struct with fields:

       iterations: 9
        funcCount: 39
...

This time, the solver takes 39 function evaluations to reach essentially the same solution as before.

Check Jacobian of Vector Objective Function

The lsqnonlin, lsqcurvefit, and fsolve functions use vector-valued objective functions. (The objective function that these solvers try to minimize is the sum of squares of the vector-valued objective.) A Jacobian is the first derivative of a vector-valued objective function. The Jacobian J of a function F(x) with m components, where the variable x has n components, is an m-by-n matrix. J(i,j) is the partial derivative of Fi(x) with respect to xj.

For example, the vecfun code calculates the gradient for a function of four parameters (a, b, c, and d in the code) with a number of variables that depends on the input data t. The t vector corresponds to a set of times, and each time leads to two entries in the objective function F(x,t).

function [F,J] = vecfun(x,t)
t = t(:); % Reshape t to a column vector
a = x(1);
b = x(2);
c = x(3);
d = x(4);
nt = length(t);
F = [a*exp(-b*t) + c,...
    c*exp(-d*t)]; % numel(F) = 2*nt
if nargout > 1
    J = zeros(2*nt,4);
    J(1:nt,1) = exp(-b*t); % First nt components corresponding to a*exp(-b*t) + c
    J(1:nt,2) = (-t.*a.*exp(-b*t));
    J(1:nt,3) = ones(nt,1);
    J(nt+1:2*nt,3) = exp(-d*t); % Second nt components corresponding to c*exp(-d*t)
    J(nt+1:2*nt,4) = (-t.*c.*exp(-d*t));
end
end

Validate the gradient calculation in vecfun.

t = [-1/2 1/2 1]; % Three times
x = [1 2 3 4]; % Arbitrary x point
valid = checkGradients(@(x)vecfun(x,t),x,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.51598e-08.

checkGradients successfully passed.
____________________________________________________________


valid =

  logical

   1

Create artificial response data corresponding to the vecfun function at a set of times. Use the Jacobian to help fit parameters of the function to the data.

t = linspace(1,5);
x0 = [1 2 3 4];
rng default
x01 = rand(1,4);
y = vecfun(x0,t);
y = y + 0.1*randn(size(y)); % Add noise to response
options = optimoptions("lsqcurvefit",SpecifyObjectiveGradient=true);
[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y,[],[],options)
Local minimum possible.

lsqcurvefit stopped because the final change in the sum of squares relative to 
its initial value is less than the value of the function tolerance.

x =

    0.9575    1.4570    2.9894    3.7289


resnorm =

    2.2076


exitflag =

     3


output = 

  struct with fields:

      firstorderopt: 1.0824e-04
         iterations: 11
          funcCount: 12
       cgiterations: 0
          algorithm: 'trust-region-reflective'
           stepsize: 0.0111
            message: 'Local minimum possible.↵↵lsqcurvefit stopped because the final change in the sum of squares relative to ↵its initial value is less than the value of the function tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative sum of squares (r) is changing↵by less than options.FunctionTolerance = 1.000000e-06.'
       bestfeasible: []
    constrviolation: []

The returned solution vector, [0.9575,1.4570,2.9894,3.7289], is close to the original vector [1,2,3,4]. The solver takes just 12 function evaluations. Without the Jacobian, the solver takes more evaluations.

[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y) % No options
Local minimum possible.
...
output = 

  struct with fields:

      firstorderopt: 1.0825e-04
         iterations: 11
          funcCount: 60
...

This time, the solver takes 60 function evaluations to reach essentially the same solution as before.

Modify Finite-Difference Options and checkGradients Arguments

Sometimes checkGradients can give an incorrect result:

  • For a correct gradient, checkGradients can give a false report of an invalid check. Typically, this occurs because the function has a relatively large second derivative, so the finite-difference estimate is inaccurate. The false report can also occur because the value of the Tolerance name-value argument is too small.

  • For an incorrect gradient, checkGradients can give a false report of a valid check. Typically, this false report occurs because the value of the Tolerance name-value argument is too large, or because the results contain a coincidentally matching derivative.

To check the gradient more carefully when you get a false report of an invalid check, change the finite differencing options. For example, checkGradients incorrectly reports that the ras gradient is incorrect starting from the point [-2,4].

valid = checkGradients(@ras,[-2,4],Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.88497e-06.
  Supplied derivative element (1,2): 6.21515
  Finite-difference derivative element (1,2): 6.21516

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________


valid =

  logical

   0

Set the FiniteDifferenceType option to "central" and test again.

opts = optimoptions("fmincon",FiniteDifferenceType="central");
valid = checkGradients(@ras,[-2,4],opts,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.10182e-09.

checkGradients successfully passed.
____________________________________________________________


valid =

  logical

   1

Central finite differences are typically more accurate. In this case, the relative difference between the computed gradient and the central finite-difference estimate is about 1e-9. The default forward finite difference gives a relative difference of about 2e-6.

Check the validity of the gradient by loosening the tolerance from the default 1e-6 to 1e-5.

valid = checkGradients(@ras,[-2,4],Tolerance=1e-5,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.88497e-06.

checkGradients successfully passed.
____________________________________________________________


valid =

  logical

   1

In this case, checkGradients correctly reports that the gradient is valid. The looser tolerance does not cause the badras function to pass the check.

valid = checkGradients(@badras,[-2,4],Tolerance=1e-5,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 0.400823.
  Supplied derivative element (1,2): 4.4368
  Finite-difference derivative element (1,2): 6.21516

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-05).
____________________________________________________________


valid =

  logical

   0

Check Derivatives of Nonlinear Constraints

The unitdisk2 function correctly calculates a constraint function and its gradient to keep x inside the disk of radius 1.

function [c,ceq,gc,gceq] = unitdisk2(x)
c = x(1)^2 + x(2)^2 - 1;
ceq = [ ];

if nargout > 2
    gc = [2*x(1);2*x(2)];
    gceq = [];
end
end

The unitdiskb function incorrectly calculates the gradient of the constraint function.

function [c,ceq,gc,gceq] = unitdiskb(x)
c = x(1)^2 + x(2)^2 - 1;
ceq = [ ];

if nargout > 2
    gc = [x(1);x(2)]; % Gradient incorrect: off by a factor of 2
    gceq = [];
end
end

Set the IsConstraint name-value argument to true and check the gradient at the point x0 = randn(1,2).

rng default
x0 = randn(1,2);
valid = checkGradients(@unitdisk2,x0,IsConstraint=true,Display="on")
____________________________________________________________

Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.53459e-08.

checkGradients successfully passed.
____________________________________________________________


valid =

  1×2 logical array

   1   1

checkGradients correctly reports that the gradient of unitdisk2 is valid.

valid = checkGradients(@unitdiskb,x0,IsConstraint=true,Display="on")
____________________________________________________________

Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.
  Supplied derivative element (2,1): 1.8324
  Finite-difference derivative element (2,1): 3.66479

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________


valid =

  1×2 logical array

   0   1

checkGradients correctly reports that the gradient of unitdiskb is not valid.

Check Gradients in Script

You can use the checkGradients function to stop a script when a finite-difference approximation does not match the corresponding gradient function.

rng default
x0 = randn(1,2);
opts = optimoptions("fmincon",FiniteDifferenceType="central");
assert(checkGradients(@ras,x0,opts,Display="on"))
assert(all(checkGradients(@unitdisk2,x0,opts,...
    IsConstraint=true,Display="on")))
[x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdisk2)
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 7.52301e-10.

checkGradients successfully passed.
____________________________________________________________

____________________________________________________________

Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.44672e-11.

checkGradients successfully passed.
____________________________________________________________


Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in 
feasible directions, to within the value of the optimality tolerance,
and constraints are satisfied to within the value of the constraint tolerance.

x =

    0.4987    0.4987


fval =

   10.4987

Run the same script with the incorrect unitdiskb constraint function. Note that the script stops early.

rng default
x0 = randn(1,2);
opts = optimoptions("fmincon",FiniteDifferenceType="central");
assert(checkGradients(@ras,x0,opts,Display="on"))
assert(all(checkGradients(@unitdiskb,x0,opts,...
    IsConstraint=true,Display="on")))
[x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdiskb)
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 7.52301e-10.

checkGradients successfully passed.
____________________________________________________________

____________________________________________________________

Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.
  Supplied derivative element (2,1): 1.8324
  Finite-difference derivative element (2,1): 3.66479

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________

Error using assert
Assertion failed.

See Also

Related Topics