4.8.1. CMA-ES

Implementation of the Covariance Matrix Adaptation Evolution Strategy [1]. The optimizer can be used out of the box:

>>> optimizer = CMAOptimizer(sigma=0.15, popsize=15)
>>> optimization = Optimization('name', JobCollection, DataSet, interface, optimizer)
>>> optimization.optimize()
>>> print(optimization.results.x)
array([x0, ..., xN])
>>> print(optimization.results.fx)
1234.5
class CMAOptimizer(sigma: float = None, sampler: str = 'full', verbose=True, restartfile: str = None, exact_restart: bool = False, _storeC=False, **cmasettings)

Implementation of the Covariance Matrix Adaptation Evolution Strategy

Parameters:

sigma : 0 < float < 1
Standard deviation for all parameters (all parameters must be scaled accordingly).
Defines the search space as the std dev from an initial x0. Larger values will sample a initially wider Gaussian. Default value is 0.2
sampler : optional, str, ‘vkd’ or ‘vd’
Allows the use of GaussVDSampler and GaussVkDSampler settings.
verbose : bool
Produce additional output on the status of the optimization.
restartfile : str
Every 20 epochs as well as at the end of an optimization, the CMA instance will be pickled into cmadata/restart.pkl. Use this argument to load such a dump from the filepath and restart an optimization.
Note that this will only work for exactly the same problem, which will not be checked when loading.
When using a restart file, the sampler and cmasettings keywords will be ignored.
exact_restart : bool, default False
If True, the optimization will exactly continue the optimization from the point where the restartfile was written. In this case, multiple optimizations restarted from the same file will result in the exact same sampling trajectory. If False, the random state will be reset.
cmasettings : optional kwargs

cma module-specific settings as k,v pairs. See cma.s.pprint(cma.CMAOptions()) (below) for a list of available options. Most useful keys are: timeout, tolstagnation, popsize. Additionally, the following options are supported through this class:

  • minsigma: Termination if sigma < minsigma
  • maxsigma: Limit sigma to min(sigma,maxsigma). Defaults to 10*sigma
__init__(sigma: float = None, sampler: str = 'full', verbose=True, restartfile: str = None, exact_restart: bool = False, _storeC=False, **cmasettings)

Initialize with the above parameters.

reset()

This method is called when the Optimization() class is initialized and should reset a previously used optimizer instance.

4.8.1.1. List of valid cmasettings

>>> import cma
>>> cma.s.pprint(cma.CMAOptions())
AdaptSigma                True                                  or False or any CMAAdaptSigmaBase class e.g. CMAAdaptSigmaTPA, CMAAdaptSigmaCSA
CMA_active                True                                  negative update, conducted after the original update
CMA_cmean                 1                                     learning rate for the mean value
CMA_const_trace           False                                 normalize trace, 1, True, "arithm", "geom", "aeig", "geig" are valid
CMA_diagonal              0*100*N/popsize**0.5                  nb of iterations with diagonal covariance matrix, True for always
CMA_eigenmethod           np.linalg.eigh                        or cma.utils.eig or pygsl.eigen.eigenvectors
CMA_elitist               False                                 or "initial" or True, elitism likely impairs global search performance
CMA_mirrors               popsize < 6                           values <0.5 are interpreted as fraction, values >1 as numbers (rounded), otherwise about 0.16 is used
CMA_mirrormethod          2                                     0=unconditional, 1=selective, 2=selective with delay
CMA_mu                    None                                  parents selection parameter, default is popsize // 2
CMA_on                    1                                     multiplier for all covariance matrix updates
CMA_sampler               None                                  a class or instance that implements the interface of `cma.interfaces.StatisticalModelSamplerWithZeroMeanBaseClass`
CMA_sampler_options       {}                                    options passed to `CMA_sampler` class init as keyword arguments
CMA_rankmu                1.0                                   multiplier for rank-mu update learning rate of covariance matrix
CMA_rankone               1.0                                   multiplier for rank-one update learning rate of covariance matrix
CMA_recombination_weights None                                  a list, see class RecombinationWeights, overwrites CMA_mu and popsize options
CMA_dampsvec_fac          np.Inf                                tentative and subject to changes, 0.5 would be a "default" damping for sigma vector update
CMA_dampsvec_fade         0.1                                   tentative fading out parameter for sigma vector update
CMA_teststds              None                                  factors for non-isotropic initial distr. of C, mainly for test purpose, see CMA_stds for production
CMA_stds                  None                                  multipliers for sigma0 in each coordinate, not represented in C, makes scaling_of_variables obsolete
CSA_dampfac               1                                     positive multiplier for step-size damping, 0.3 is close to optimal on the sphere
CSA_damp_mueff_exponent   0.5                                   zero would mean no dependency of damping on mueff, useful with CSA_disregard_length option
CSA_disregard_length      False                                 True is untested, also changes respective parameters
CSA_clip_length_value     None                                  poorly tested, [0, 0] means const length N**0.5, [-1, 1] allows a variation of +- N/(N+2), etc.
CSA_squared               False                                 use squared length for sigma-adaptation
BoundaryHandler           BoundTransform                        or BoundPenalty, unused when ``bounds in (None, [None, None])``
bounds                    [None, None]                          lower (=bounds[0]) and upper domain boundaries, each a scalar or a list/vector
conditioncov_alleviate    [1e8, 1e12]                           when to alleviate the condition in the coordinates and in main axes
fixed_variables           None                                  dictionary with index-value pairs like {0:1.1, 2:0.1} that are not optimized
ftarget                   -inf                                  target function value, minimization
integer_variables         []                                    index list, invokes basic integer handling: prevent std dev to become too small in the given variables
is_feasible               is_feasible                           a function that computes feasibility, by default lambda x, f: f not in (None, np.NaN)
maxfevals                 inf                                   maximum number of function evaluations
maxiter                   100 + 150 * (N+3)**2 // popsize**0.5  maximum number of iterations
mean_shift_line_samples   False                                 sample two new solutions colinear to previous mean shift
mindx                     0                                     minimal std in any arbitrary direction, cave interference with tol*
minstd                    0                                     minimal std (scalar or vector) in any coordinate direction, cave interference with tol*
maxstd                    inf                                   maximal std in any coordinate direction
pc_line_samples           False                                 one line sample along the evolution path pc
popsize                   4+int(3*np.log(N))                    population size, AKA lambda, number of new solution per iteration
randn                     np.random.randn                       randn(lam, N) must return an np.array of shape (lam, N), see also cma.utilities.math.randhss
scaling_of_variables      None                                  depreciated, rather use fitness_transformations.ScaleCoordinates instead (or possibly CMA_stds).
                                                                Scale for each variable in that effective_sigma0 = sigma0*scaling. Internally the variables are divided by scaling_of_variables and sigma is unchanged, default is `np.ones(N)`
seed                      time                                  random number seed for `numpy.random`; `None` and `0` equate to `time`, `np.nan` means "do nothing", see also option "randn"
signals_filename          None                                  cma_signals.in
termination_callback      None                                  a function returning True for termination, called in `stop` with `self` as argument, could be abused for side effects
timeout                   inf                                   stop if timeout seconds are exceeded, the string "2.5 * 60**2" evaluates to 2 hours and 30 minutes
tolconditioncov           1e14                                  stop if the condition of the covariance matrix is above `tolconditioncov`
tolfacupx                 1e3                                   termination when step-size increases by tolfacupx (diverges). That is, the initial step-size was chosen far too small and better solutions were found far away from the initial solution x0
tolupsigma                1e20                                  sigma/sigma0 > tolupsigma * max(eivenvals(C)**0.5) indicates "creeping behavior" with usually minor improvements
tolfun                    1e-11                                 termination criterion: tolerance in function value, quite useful
tolfunhist                1e-12                                 termination criterion: tolerance in function value history
tolstagnation             int(100 + 100 * N**1.5 / popsize)     termination if no improvement over tolstagnation iterations
tolx                      1e-11                                 termination criterion: tolerance in x-changes
transformation            None                                  depreciated, use cma.fitness_transformations.FitnessTransformation instead.
                                                                [t0, t1] are two mappings, t0 transforms solutions from CMA-representation to f-representation (tf_pheno),
                                                                t1 is the (optional) back transformation, see class GenoPheno
typical_x                 None                                  used with scaling_of_variables
updatecovwait             None                                  number of iterations without distribution update, name is subject to future changes
verbose                   3                                     verbosity e.g. of initial/final message, -1 is very quiet, -9 maximally quiet, may not be fully implemented
verb_append               0                                     initial evaluation counter, if append, do not overwrite output files
verb_disp                 100                                   verbosity: display console output every verb_disp iteration
verb_filenameprefix       outcmaes                              output filenames prefix
verb_log                  1                                     verbosity: write data to files every verb_log iteration, writing can be time critical on fast to evaluate functions
verb_plot                 0                                     in fmin(): plot() is called every verb_plot iteration
verb_time                 True                                  output timings on console
vv                        {}                                    ? versatile set or dictionary for hacking purposes, value found in self.opts["vv"]

4.8.1.2. References

[1]Hansen, N. and A. Ostermeier, Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation 2001, 9 (2), 159-195