7.10.2. Optimization

The Optimization class is where the other components – Job Collection, Data Set, Parameter Interface, and Optimizer – come together. It is responsible for the selection, generation, execution, and evaluation of new jobs for every new parameter set.

See also

Architecture Quick Reference for an overview

An Optimization instance will usually be initialized after every other component is defined:

>>> interface     = ReaxFFParameters('path/to/ffield.ff')
>>> jc            = JobCollection('path/to/jobcol.yml')
>>> training_set  = DataSet('path/to/data_set.yml')
>>> optimizer     = CMAOptimizer(popsize=15)
>>> optimization  = Optimization(jc, training_set, interface, optimizer)

Once initialized, the following will run a complete optimization:

>>> optimization.optimize()
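optimize() returns a MinimizeResult object (see the API section below). A minimal sketch of inspecting it; the x and fx attribute names are assumptions based on the return type, not documented on this page:

>>> result = optimization.optimize()
>>> print(result.x)   # best parameter vector found (assumed attribute name)
>>> print(result.fx)  # corresponding loss value (assumed attribute name)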

After instantiation, a summary of all relevant settings can be printed with summary():

>>> optimization.summary()
Optimization() Instance Settings:
=================================
Workdir:                           opt
JobCollection size:                20
Interface:                         ReaxFFParameters
Active parameters:                 207
Optimizer:                         CMAOptimizer

Evaluators:
-----------
Name:                              training_set (_LossEvaluator)
Loss:                              SSE
Evaluation interval:               1

Data Set entries:                  20
Data Set jobs:                     20
Batch size:                        None

CPU cores:                         6
Use PIPE:                          True
---
===

7.10.2.1. Optimization Setup

The optimization can be further controlled by passing a number of optional keyword arguments when instantiating Optimization. While the full list of arguments is documented in the API section below, the most relevant ones are presented here.

parallel

An instance of the ParallelLevels class describing how the optimization is to be parallelized.

constraints

Constraints additionally restrict the parameter search space by checking that every candidate solution is consistent with their definitions.

validation

Percentage of the training_set entries to be used for validation.

loss

The loss function to be used for this optimization instance.

batch_size

Instead of evaluating all properties in the training_set, evaluate a maximum of randomly picked batch_size entries per iteration.
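Putting these keywords together, a hedged sketch of a configured instance (the ParallelLevels settings, the 30% validation split, the 'rmse' loss name, and the batch size are illustrative assumptions, not recommendations from this page):

>>> parallel = ParallelLevels(jobs=4)
>>> optimization = Optimization(jc, training_set, interface, optimizer,
...                             parallel=parallel,
...                             validation=0.3,   # hold out 30% of entries for validation
...                             loss='rmse',      # assuming 'rmse' is an accepted loss name
...                             batch_size=10)    # at most 10 random entries per iteration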

7.10.2.2. Optimization API

class Optimization(job_collection: scm.params.core.jobcollection.JobCollection, data_sets: Union[scm.params.core.dataset.DataSet, Sequence[scm.params.core.dataset.DataSet]], parameter_interface: scm.params.parameterinterfaces.base.BaseParameters, optimizer: Optional[Union[scm.glompo.optimizers.baseoptimizer.BaseOptimizer, scm.glompo.opt_selectors.baseselector.BaseSelector]] = None, workdir: str = 'optimization', plams_workdir_path: Optional[str] = None, validation: Optional[float] = None, constraints: Optional[Sequence[scm.params.parameterinterfaces.base.Constraint]] = None, parallel: Optional[scm.params.common.parallellevels.ParallelLevels] = None, verbose: bool = True, skip_x0: bool = False, logger_every: Optional[Union[dict, int]] = None, loss: Union[scm.params.core.lossfunctions.Loss, Sequence[scm.params.core.lossfunctions.Loss]] = 'sse', batch_size: Optional[Union[int, Sequence[int]]] = None, use_pipe: Union[bool, Sequence[bool]] = True, data_set_names: Optional[Sequence[str]] = None, eval_every: Union[int, Sequence[int]] = 1, maxjobs: Union[None, Sequence[int]] = None, maxjobs_shuffle: Union[bool, Sequence[bool]] = False, resume_checkpoint: Optional[Union[str, pathlib.Path]] = None, **glompo_kwargs)

Brings ParAMS components together and allows for configuration of optimization manager.

For backwards compatibility, the signature remains the same. The meaning of most parameters is unchanged; only those that have changed are documented below.

Parameters

optimizer

Accepts a single optimizer as before, but now also accepts a GloMPO BaseSelector containing several optimizers if you would like to use more than one.

Note

The workers argument of the optimizers will be overwritten by the product of all values in parallel except parallel.optimizations.

Important

Since multiple optimizers can be started in parallel the instance given to this argument will only be used as a template from which other instances will be made. This means the instance given here will not be used for optimization. Keep this in mind if you intend to retain a reference to the optimizer instance for later post-processing.
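A sketch of supplying several optimizers through a selector. CycleSelector is assumed here to be an available BaseSelector subclass; check the GloMPO documentation for the selectors actually shipped and for whether they accept instances or classes:

>>> selector = CycleSelector([CMAOptimizer(popsize=15), CMAOptimizer(popsize=30)])
>>> optimization = Optimization(jc, training_set, interface, selector)

Per the warning above, neither CMAOptimizer instance is used directly for the optimization; they serve only as templates.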

workdir

The working directory for this optimization. The instance will NOT switch to it when optimize() is called (see glompo_kwargs).

verbose

Activates GloMPO’s progress logging feedback.

glompo_kwargs

GloMPO related arguments sent to GloMPOManager.setup().

The following extra keywords are allowed:

'scaler'

Extra keyword which specifies the type of scaling applied to the parameters. Defaults to a linear scaling of all parameters between 0 and 1 if none of the selected optimizers requests a particular scaling. An error will be raised if there is a conflict between this keyword and any scaling mandated by the optimizers.

The following keywords will be ignored if provided:

'opt_selector'

Constructed from optimizer.

'bounds'

Automatically extracted from parameter_interface.

'task'

It is constructed within this class from job_collection, data_set, parameter_interface.

'working_dir'

workdir will be used as this parameter.

'overwrite_existing'

Overwriting is not allowed, in line with ParAMS behavior; workdir will be incremented until a non-existent directory is found.

'max_jobs'

Will be calculated from parallel.

'backend'

Only 'threads' is allowed within ParAMS.

'is_log_detailed'

This must be True for the sake of ParAMS internals.

__init__(job_collection: scm.params.core.jobcollection.JobCollection, data_sets: Union[scm.params.core.dataset.DataSet, Sequence[scm.params.core.dataset.DataSet]], parameter_interface: scm.params.parameterinterfaces.base.BaseParameters, optimizer: Optional[Union[scm.glompo.optimizers.baseoptimizer.BaseOptimizer, scm.glompo.opt_selectors.baseselector.BaseSelector]] = None, workdir: str = 'optimization', plams_workdir_path: Optional[str] = None, validation: Optional[float] = None, constraints: Optional[Sequence[scm.params.parameterinterfaces.base.Constraint]] = None, parallel: Optional[scm.params.common.parallellevels.ParallelLevels] = None, verbose: bool = True, skip_x0: bool = False, logger_every: Optional[Union[dict, int]] = None, loss: Union[scm.params.core.lossfunctions.Loss, Sequence[scm.params.core.lossfunctions.Loss]] = 'sse', batch_size: Optional[Union[int, Sequence[int]]] = None, use_pipe: Union[bool, Sequence[bool]] = True, data_set_names: Optional[Sequence[str]] = None, eval_every: Union[int, Sequence[int]] = 1, maxjobs: Union[None, Sequence[int]] = None, maxjobs_shuffle: Union[bool, Sequence[bool]] = False, resume_checkpoint: Optional[Union[str, pathlib.Path]] = None, **glompo_kwargs)

Initialize self. See help(type(self)) for accurate signature.

classmethod read(input_text: Union[str, scm_libbase_internal.InputFile, pathlib.Path], **kwargs) → scm.params.core.parameteroptimization.Optimization

Create an Optimization instance by reading an AMS-style input file.

optimize() → scm.glompo.optimizers.baseoptimizer.MinimizeResult

Start the optimization given the initial parameters.

initial_eval() → float

Evaluate x0 before the optimization.

Returns

float

Error value using parameters as loaded from the parameter interface.

Raises

ValueError

If fx is a non-finite value.
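initial_eval() can serve as a sanity check before starting a potentially long optimization; a minimal sketch:

>>> fx0 = optimization.initial_eval()  # raises ValueError if the loss is non-finite
>>> print(f'Loss at x0: {fx0}')
>>> optimization.optimize()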

summary(file=None)

Print a summary of the current instance.

__str__()

Return str(self).

delete()

Remove the working directory from disk.

_relog_bests(task: scm.params.core.opt_components._Step)

Evaluate the saved best points for the points being restarted and the overall best and use them to prime new Loggers. This ensures the correct ‘best’ value is returned even if that evaluation does not appear in the ‘running_*.txt’ files.