Function File: [x0, v, nev] = cg_min (f, df, args, ctl)

Nonlinear conjugate gradient method to minimize the function f.

Arguments

  • f : string or function handle : Function to minimize. Must return a real value.
  • df : string or function handle : Derivative of f. Must return a (R*C) x 1 vector.
  • args : cell : Arguments passed to f.
  • ctl : 5-vec : (Optional) Control variables, described below.

Returned values

  • x0 : matrix : Local minimum of f.
  • v : real : Value of f at x0.
  • nev : 1 x 2 : Number of evaluations of f and of df.

Control Variables

  • ctl(1) : 1 or 2 : Select stopping criterion among:
  • ctl(1)==0 : Default value
  • ctl(1)==1 : Stopping criterion : Stop the search when the value stops improving, as tested by ctl(2) > Deltaf/max(|f(x)|,1), where Deltaf is the decrease in f observed during the last iteration (each iteration consists of R*C line searches).
  • ctl(1)==2 : Stopping criterion : Stop the search when updates are small, as tested by ctl(2) > max { dx(i)/max(|x(i)|,1) | i in 1..N }, where dx is the change in x that occurred during the last iteration.
  • ctl(2) : Threshold used in the stopping tests. Default=10*eps
  • ctl(2)==0 : Default value
  • ctl(3) : Position of the minimized argument in args. Default=1
  • ctl(3)==0 : Default value
  • ctl(4) : Maximum number of function evaluations. Default=inf
  • ctl(4)==0 : Default value
  • ctl(5) : Type of optimization:
  • ctl(5)==1 : "Fletcher-Reeves" method
  • ctl(5)==2 : "Polak-Ribiere" method (default)
  • ctl(5)==3 : "Hestenes-Stiefel" method

ctl may have length smaller than 5. Default values are used if ctl is not passed, or for entries given as nan.
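The three methods selected by ctl(5) differ only in how the conjugacy coefficient beta is computed from the new gradient g_new, the old gradient g_old, and the previous search direction d; the new direction is then d_new = -g_new + beta*d. A minimal Python sketch of the three update formulas (function names are illustrative, not part of the package):

```python
import numpy as np

# beta for each update rule; g_new, g_old are gradients at the new and
# old iterates, d is the previous search direction.
def beta_fletcher_reeves(g_new, g_old):
    return (g_new @ g_new) / (g_old @ g_old)

def beta_polak_ribiere(g_new, g_old):
    return g_new @ (g_new - g_old) / (g_old @ g_old)

def beta_hestenes_stiefel(g_new, g_old, d):
    y = g_new - g_old
    return (g_new @ y) / (d @ y)

g_old = np.array([1.0, 2.0])
g_new = np.array([0.5, -1.0])
d = -g_old
d_new = -g_new + beta_polak_ribiere(g_new, g_old) * d  # next search direction
```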

Example:

function r=df( l ) b=[1;0;-1]; r = -( 2*l{1} - 2*b + rand(size(l{1}))); endfunction
function r=ff( l ) b=[1;0;-1]; r = (l{1}-b)' * (l{1}-b); endfunction
ll = { [10; 2; 3] };
ctl(5) = 3;
[x0,v,nev]=cg_min( "ff", "df", ll, ctl )

Comment: In general, the BFGS method seems to perform better in many cases, but requires more computation per iteration. See also http://en.wikipedia.org/wiki/Nonlinear_conjugate_gradient.
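For readers who want the overall shape of the algorithm behind cg_min, here is a minimal, illustrative nonlinear conjugate gradient loop in Python, using the Polak-Ribiere (PR+) rule and a simple Armijo backtracking line search. All names and the line-search choice are ours, a sketch rather than the package's implementation:

```python
import numpy as np

def ncg_minimize(f, df, x, max_iter=200, tol=1e-10):
    """Illustrative Polak-Ribiere (PR+) nonlinear CG with backtracking."""
    g = df(x)
    if np.linalg.norm(g) < tol:
        return x
    d = -g
    for _ in range(max_iter):
        # Backtracking (Armijo) line search along the direction d.
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + 1e-4 * t * (g @ d) and t > 1e-16:
            t *= 0.5
        x_new = x + t * d
        g_new = df(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        # PR+ rule: restart with steepest descent when beta would be negative.
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Quadratic test mirroring the example above: the minimum of (x-b)'(x-b) is b.
b = np.array([1.0, 0.0, -1.0])
x0 = ncg_minimize(lambda x: (x - b) @ (x - b),
                  lambda x: 2 * (x - b),
                  np.array([10.0, 2.0, 3.0]))
```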

See also: bfgsmin.

Demonstration 1

The following code

 P = 15; # Number of parameters
 R = 20; # Number of observations (must have R >= P)
 
 obsmat = randn (R, P);
 truep = randn (P, 1);
 xinit = randn (P, 1);
 obses = obsmat * truep;
 
 msq = @(x) mean (x (!isnan(x)).^2);
 ff  = @(x) msq (obses - obsmat * x{1}) + 1;
 dff = @(x) 2 / rows (obses) * obsmat.' * (-obses + obsmat * x{1});
 
 tic;
 [xlev,vlev,nlev] = cg_min (ff, dff, xinit) ;
 toc;
 
 printf ("  Costs :     init=%8.3g, final=%8.3g, best=%8.3g\n", ...
         ff ({xinit}), vlev, ff ({truep}));
 
 if (max (abs (xlev-truep)) > 100*sqrt (eps))
   printf ("Error is too big : %8.3g\n", max (abs (xlev-truep)));
 else
   printf ("All tests ok\n");
 endif

Produces the following output

Elapsed time is 0.306824 seconds.
  Costs :     init=    24.6, final=       1, best=       1
All tests ok

Demonstration 2

The following code

 N = 1 + floor (30 * rand ());
 truemin = randn (N, 1);
 offset  = 100 * randn ();
 metric = randn (2 * N, N); 
 metric = metric.' * metric;
 
 if (N > 1)
   [u,d,v] = svd (metric);
   d = (0.1+[0:(1/(N-1)):1]).^2;
   metric = u * diag (d) * u.';
 endif
 
 testfunc = @(x) sum((x{1}-truemin)'*metric*(x{1}-truemin)) + offset;
 dtestf = @(x) metric' * 2*(x{1}-truemin);
 
 xinit = 10 * randn (N, 1);
 
 [x, v, niter] = cg_min (testfunc, dtestf, xinit);
 
 if (any (abs (x-truemin) > 100 * sqrt(eps)))
   printf ("NOT OK 1\n");
 else
   printf ("OK 1\n");
 endif
 
 if (v-offset > 1e-8)
   printf ("NOT OK 2\n");
 else
   printf ("OK 2\n");
 endif
 
 printf ("nev=%d  N=%d  errx=%8.3g   errv=%8.3g\n",...
         niter (1), N, max (abs (x-truemin)), v-offset);

Produces the following output

OK 1
OK 2
nev=298  N=19  errx=3.86e-07   errv=5.83e-13

Demonstration 3

The following code

 P = 2; # Number of parameters
 R = 3; # Number of observations
 
 obsmat = randn (R, P);
 truep  = randn (P, 1);
 xinit  = randn (P, 1);
 
 obses = obsmat * truep;
 
 msq = @(x) mean (x (!isnan(x)).^2);
 ff = @(xx) msq (xx{3} - xx{2} * xx{1}) + 1;
 dff = @(xx) 2 / rows(xx{3}) * xx{2}.' * (-xx{3} + xx{2}*xx{1});
 
 tic;
 x = {xinit, obsmat, obses};
 [xlev, vlev, nlev] = cg_min (ff, dff, x);
 toc;
 
 xinit_ = {xinit, obsmat, obses};
 xtrue_ = {truep, obsmat, obses};
 printf ("  Costs :     init=%8.3g, final=%8.3g, best=%8.3g\n", ...
         ff (xinit_), vlev, ff (xtrue_));
 
 if (max (abs(xlev-truep)) > 100*sqrt (eps))
   printf ("Error is too big : %8.3g\n", max (abs (xlev-truep)));
 else
   printf ("All tests ok\n");
 endif

Produces the following output

Elapsed time is 0.051146 seconds.
  Costs :     init=    1.59, final=       1, best=       1
All tests ok

Package: optim