Source code for scm.input_classes.drivers.paramssensitivity

from __future__ import annotations
from pathlib import Path
from typing import Iterable, Literal, Sequence
from scm.pisa.block import DriverBlock, EngineBlock, FixedBlock, FreeBlock, InputBlock
from scm.pisa.key import BoolKey, FloatKey, FloatListKey, IntKey, IntListKey, MultipleChoiceKey, PathStringKey, StringKey, BoolType

class ParAMSSensitivity(DriverBlock):
    r"""
    :ivar EngineCollection: Path to (optional) JobCollection Engines YAML file.
    :vartype EngineCollection: str | StringKey
    :ivar FilterInfiniteValues: If Yes, removes points from the calculation with non-finite loss values.
        Non-finite points can cause numerical issues in the sensitivity calculation.
    :vartype FilterInfiniteValues: BoolType | BoolKey
    :ivar JobCollection: Path to JobCollection YAML file.
    :vartype JobCollection: str | StringKey
    :ivar NumberBootstraps: Number of repeats of the calculation with different sub-samples.
        A small spread from a large number of bootstraps provides confidence in the estimation of the sensitivity.
    :vartype NumberBootstraps: int | IntKey
    :ivar NumberCalculationSamples: Number of samples from the full set available to use in the calculation.
        If not specified or -1, uses all available points.
        For the sensitivity calculation, this will be redrawn for every bootstrap.
    :vartype NumberCalculationSamples: int | IntKey
    :ivar NumberSamples: Number of samples to generate during the sampling procedure.
    :vartype NumberSamples: int | IntKey
    :ivar ParameterInterface: Path to parameter interface YAML file.
    :vartype ParameterInterface: str | StringKey
    :ivar RandomSeed: Random seed to use during the sampling procedure (for reproducibility).
    :vartype RandomSeed: int | IntKey
    :ivar ResultsDirectory: Directory in which output files will be created.
    :vartype ResultsDirectory: str | Path | StringKey
    :ivar RunReweightCalculation: Run a more expensive sensitivity calculation that will also return
        suggested weights for the training set, which will produce more balanced sensitivities
        between all the parameters.
        Note: the Gaussian kernel is recommended for the loss values kernel in this case.
    :vartype RunReweightCalculation: BoolType | BoolKey
    :ivar RunSampling: Produce a set of samples of the loss function and active parameters.
        Samples from the parameter space are drawn from a uniform random distribution.
        Such a set of samples serves as the input to the sensitivity calculation.
    :vartype RunSampling: BoolType | BoolKey
    :ivar SampleWithReplacement: Sample from the available data with or without replacement.
        This only has an effect if the number of samples for the calculation is less than the total
        number available; otherwise replace is Yes by necessity.
    :vartype SampleWithReplacement: BoolType | BoolKey
    :ivar SamplesDirectory: Path to an 'optimization' directory containing the results of a
        previously run sampling. First looks for a 'glompo_log.h5' file. If not found, will look for
        'running_loss.txt' and 'running_active_parameters.txt' in a sub-directory. The sub-directory
        used will depend on the DataSet Name. For the Reweight calculation only a 'glompo_log.h5'
        file (with residuals) may be used.
    :vartype SamplesDirectory: str | Path | StringKey
    :ivar SaveResiduals: During the sampling, save the individual difference between reference and
        predicted values for every sample and training set item. Required for the Reweight
        calculation, and will be automatically activated if the reweight calculation is requested.
        Saving and analyzing the residuals can provide valuable insight into your training set, but
        can quickly occupy a large amount of disk space. Only save the residuals if you would like
        to run the reweight calculation or have a particular reason to do so.
    :vartype SaveResiduals: BoolType | BoolKey
    :ivar SetToAnalyze: Name of the data set to use for the sensitivity analysis.
    :vartype SetToAnalyze: Literal["TrainingSet", "ValidationSet"]
    :ivar Task: Task to run.
        Available options:
        • MachineLearning: Optimization for machine learning models
        • Optimization: Global optimization powered by GloMPO
        • Generate Reference: Run jobs with reference engine to get reference values
        • Single Point: Evaluate the current configuration of jobs, training data, and parameters
        • Sensitivity: Measure the sensitivity of the loss function to each of the active parameters
    :vartype Task: Literal["Optimization", "GenerateReference", "SinglePoint", "Sensitivity", "MachineLearning"]
    :ivar DataSet: Configuration settings for each data set in the optimization.
    :vartype DataSet: ParAMSSensitivity._DataSet
    :ivar LossValuesKernel: Kernel applied to the loss values in the sensitivity calculation.
    :vartype LossValuesKernel: ParAMSSensitivity._LossValuesKernel
    :ivar ParametersKernel: Kernel applied to the parameters for which sensitivity is being measured.
    :vartype ParametersKernel: ParAMSSensitivity._ParametersKernel
    """
    class _DataSet(FixedBlock):
        r"""
        Configuration settings for each data set in the optimization.

        :ivar BatchSize: Number of data set entries to be evaluated per epoch. Default 0 means all entries.
        :vartype BatchSize: int | IntKey
        :ivar EvaluateEvery: This data set is evaluated every n evaluations of the training set.
            This will always be set to 1 for the training set. For other data sets it will be
            adjusted to the closest multiple of LoggingInterval%General, i.e., you cannot evaluate
            an extra data set more frequently than you log it.
        :vartype EvaluateEvery: int | IntKey
        :ivar LossFunction: Loss function used to quantify the error between model and reference
            values. This becomes the minimization task.
            Available options:
            • mae: Mean absolute error
            • rmse: Root mean squared error
            • sse: Sum of squared errors
            • sae: Sum of absolute errors
        :vartype LossFunction: Literal["mae", "rmse", "sse", "sae"]
        :ivar MaxJobs: Limit each evaluation to a subset of n jobs. Default 0 means all jobs are used.
        :vartype MaxJobs: int | IntKey
        :ivar MaxJobsShuffle: Use a different job subset for every evaluation.
        :vartype MaxJobsShuffle: BoolType | BoolKey
        :ivar Name: Unique data set identifier. The first occurrence of DataSet will always be
            called training_set. The second will always be called validation_set. These cannot be
            overwritten. Later occurrences will default to data_set_xx where xx starts at 03 and
            increments from there. This field can be used to customize the latter names.
        :vartype Name: str | StringKey
        :ivar Path: Path to DataSet YAML file.
        :vartype Path: str | StringKey
        :ivar UsePipe: Use AMS Pipe for suitable jobs to speed up evaluation.
        :vartype UsePipe: BoolType | BoolKey
        """

        def __post_init__(self):
            self.BatchSize: int | IntKey = IntKey(name='BatchSize', comment='Number of data set entries to be evaluated per epoch. Default 0 means all entries.', default=0)
            self.EvaluateEvery: int | IntKey = IntKey(name='EvaluateEvery', comment='This data set is evaluated every n evaluations of the training set.\n\nThis will always be set to 1 for the training set. For other data sets it will be adjusted to the closest multiple of LoggingInterval%General, i.e., you cannot evaluate an extra data set more frequently than you log it.', default=1)
            self.LossFunction: Literal["mae", "rmse", "sse", "sae"] = MultipleChoiceKey(name='LossFunction', comment='Loss function used to quantify the error between model and reference values. This becomes the minimization task.\n\nAvailable options:\n• mae: Mean absolute error\n• rmse: Root mean squared error\n• sse: Sum of squared errors\n• sae: Sum of absolute errors', default='sse', choices=['mae', 'rmse', 'sse', 'sae'])
            self.MaxJobs: int | IntKey = IntKey(name='MaxJobs', comment='Limit each evaluation to a subset of n jobs. Default 0 means all jobs are used.', default=0)
            self.MaxJobsShuffle: BoolType | BoolKey = BoolKey(name='MaxJobsShuffle', comment='Use a different job subset for every evaluation.', default=False)
            self.Name: str | StringKey = StringKey(name='Name', comment='Unique data set identifier.\n\nThe first occurrence of DataSet will always be called training_set.\nThe second will always be called validation_set.\nThese cannot be overwritten.\n\nLater occurrences will default to data_set_xx where xx starts at 03 and increments from there. This field can be used to customize the latter names.', default='')
            self.Path: str | StringKey = StringKey(name='Path', comment='Path to DataSet YAML file.')
            self.UsePipe: BoolType | BoolKey = BoolKey(name='UsePipe', comment='Use AMS Pipe for suitable jobs to speed up evaluation.', default=True)
    class _LossValuesKernel(FixedBlock):
        r"""
        Kernel applied to the loss values in the sensitivity calculation.

        :ivar Alpha: Cut-off parameter for the Threshold kernel between zero and one.
            All loss values are scaled by taking the logarithm and then adjusted to a range between
            zero and one. This parameter is a value within this scaled space.
        :vartype Alpha: float | FloatKey
        :ivar Gamma: Bandwidth parameter for the conjunctive-Gaussian kernel.
        :vartype Gamma: float | FloatKey
        :ivar Sigma: Bandwidth parameter for the Gaussian kernel. If not specified or -1, calculates
            a reasonable default based on the number of parameters being tested.
        :vartype Sigma: float | FloatKey
        :ivar Type: Name of the kernel applied to the loss values.
        :vartype Type: Literal["Gaussian", "ConjunctiveGaussian", "Threshold", "Polynomial", "Linear"]
        :ivar Polynomial: Settings for the Polynomial kernel.
        :vartype Polynomial: ParAMSSensitivity._LossValuesKernel._Polynomial
        """
        class _Polynomial(FixedBlock):
            r"""
            Settings for the Polynomial kernel.

            :ivar Order: Maximum order of the polynomial.
            :vartype Order: int | IntKey
            :ivar Shift: Free parameter (≥ 0) trading off higher-order versus lower-order effects.
            :vartype Shift: float | FloatKey
            """

            def __post_init__(self):
                self.Order: int | IntKey = IntKey(name='Order', comment='Maximum order of the polynomial.', default=1)
                self.Shift: float | FloatKey = FloatKey(name='Shift', comment='Free parameter (≥ 0) trading off higher-order versus lower-order effects.', default=0.0)
        def __post_init__(self):
            self.Alpha: float | FloatKey = FloatKey(name='Alpha', comment='Cut-off parameter for the Threshold kernel between zero and one.\n\nAll loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.')
            self.Gamma: float | FloatKey = FloatKey(name='Gamma', comment='Bandwidth parameter for the conjunctive-Gaussian kernel.', default=0.3)
            self.Sigma: float | FloatKey = FloatKey(name='Sigma', comment='Bandwidth parameter for the Gaussian kernel.\n\nIf not specified or -1, calculates a reasonable default based on the number of parameters being tested.')
            self.Type: Literal["Gaussian", "ConjunctiveGaussian", "Threshold", "Polynomial", "Linear"] = MultipleChoiceKey(name='Type', comment='Name of the kernel applied to the loss values.', default='ConjunctiveGaussian', choices=['Gaussian', 'ConjunctiveGaussian', 'Threshold', 'Polynomial', 'Linear'])
            self.Polynomial: ParAMSSensitivity._LossValuesKernel._Polynomial = self._Polynomial(name='Polynomial', comment='Settings for the Polynomial kernel.')
    class _ParametersKernel(FixedBlock):
        r"""
        Kernel applied to the parameters for which sensitivity is being measured.

        :ivar Alpha: Cut-off parameter for the Threshold kernel between zero and one.
            All loss values are scaled by taking the logarithm and then adjusted to a range between
            zero and one. This parameter is a value within this scaled space.
        :vartype Alpha: float | FloatKey
        :ivar Gamma: Bandwidth parameter for the conjunctive-Gaussian kernel.
        :vartype Gamma: float | FloatKey
        :ivar Sigma: Bandwidth parameter for the Gaussian kernel.
        :vartype Sigma: float | FloatKey
        :ivar Type: Name of the kernel applied to the parameters for which sensitivity is being measured.
        :vartype Type: Literal["Gaussian", "ConjunctiveGaussian", "Threshold", "Polynomial", "Linear"]
        :ivar Polynomial: Settings for the Polynomial kernel.
        :vartype Polynomial: ParAMSSensitivity._ParametersKernel._Polynomial
        """
        class _Polynomial(FixedBlock):
            r"""
            Settings for the Polynomial kernel.

            :ivar Order: Maximum order of the polynomial.
            :vartype Order: int | IntKey
            :ivar Shift: Free parameter (≥ 0) trading off higher-order versus lower-order effects.
            :vartype Shift: float | FloatKey
            """

            def __post_init__(self):
                self.Order: int | IntKey = IntKey(name='Order', comment='Maximum order of the polynomial.', default=1)
                self.Shift: float | FloatKey = FloatKey(name='Shift', comment='Free parameter (≥ 0) trading off higher-order versus lower-order effects.', default=0.0)
        def __post_init__(self):
            self.Alpha: float | FloatKey = FloatKey(name='Alpha', comment='Cut-off parameter for the Threshold kernel between zero and one.\n\nAll loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.')
            self.Gamma: float | FloatKey = FloatKey(name='Gamma', comment='Bandwidth parameter for the conjunctive-Gaussian kernel.', default=0.1)
            self.Sigma: float | FloatKey = FloatKey(name='Sigma', comment='Bandwidth parameter for the Gaussian kernel.', default=0.3)
            self.Type: Literal["Gaussian", "ConjunctiveGaussian", "Threshold", "Polynomial", "Linear"] = MultipleChoiceKey(name='Type', comment='Name of the kernel applied to the parameters for which sensitivity is being measured.', default='Gaussian', choices=['Gaussian', 'ConjunctiveGaussian', 'Threshold', 'Polynomial', 'Linear'])
            self.Polynomial: ParAMSSensitivity._ParametersKernel._Polynomial = self._Polynomial(name='Polynomial', comment='Settings for the Polynomial kernel.')
    def __post_init__(self):
        self.EngineCollection: str | StringKey = StringKey(name='EngineCollection', comment='Path to (optional) JobCollection Engines YAML file.', default='job_collection_engines.yaml')
        self.FilterInfiniteValues: BoolType | BoolKey = BoolKey(name='FilterInfiniteValues', comment='If Yes, removes points from the calculation with non-finite loss values.\n\nNon-finite points can cause numerical issues in the sensitivity calculation.', default=True)
        self.JobCollection: str | StringKey = StringKey(name='JobCollection', comment='Path to JobCollection YAML file.', default='job_collection.yaml')
        self.NumberBootstraps: int | IntKey = IntKey(name='NumberBootstraps', comment='Number of repeats of the calculation with different sub-samples.\n\nA small spread from a large number of bootstraps provides confidence in the estimation of the sensitivity.', gui_name='Repeat calculation n times: ', default=1)
        self.NumberCalculationSamples: int | IntKey = IntKey(name='NumberCalculationSamples', comment='Number of samples from the full set available to use in the calculation.\n\nIf not specified or -1, uses all available points. For the sensitivity calculation, this will be redrawn for every bootstrap.', gui_name='Number of samples per repeat: ')
        self.NumberSamples: int | IntKey = IntKey(name='NumberSamples', comment='Number of samples to generate during the sampling procedure.', gui_name='Generate n samples: ', default=1000)
        self.ParameterInterface: str | StringKey = StringKey(name='ParameterInterface', comment='Path to parameter interface YAML file.', default='parameter_interface.yaml')
        self.RandomSeed: int | IntKey = IntKey(name='RandomSeed', comment='Random seed to use during the sampling procedure (for reproducibility).')
        self.ResultsDirectory: str | Path | StringKey = PathStringKey(name='ResultsDirectory', comment='Directory in which output files will be created.', gui_name='Working directory: ', default='results', ispath=True)
        self.RunReweightCalculation: BoolType | BoolKey = BoolKey(name='RunReweightCalculation', comment='Run a more expensive sensitivity calculation that will also return suggested weights for the training set, which will produce more balanced sensitivities between all the parameters.\n\nNote: the Gaussian kernel is recommended for the loss values kernel in this case.', default=False)
        self.RunSampling: BoolType | BoolKey = BoolKey(name='RunSampling', comment='Produce a set of samples of the loss function and active parameters. Samples from the parameter space are drawn from a uniform random distribution.\n\nSuch a set of samples serves as the input to the sensitivity calculation.', default=False)
        self.SampleWithReplacement: BoolType | BoolKey = BoolKey(name='SampleWithReplacement', comment='Sample from the available data with or without replacement.\n\nThis only has an effect if the number of samples for the calculation is less than the total number available; otherwise replace is Yes by necessity.', default=True)
        self.SamplesDirectory: str | Path | StringKey = PathStringKey(name='SamplesDirectory', comment="Path to an 'optimization' directory containing the results of a previously run sampling.\n\nFirst looks for a 'glompo_log.h5' file. If not found, will look for 'running_loss.txt' and 'running_active_parameters.txt' in a sub-directory. The sub-directory used will depend on the DataSet Name.\n\nFor the Reweight calculation only a 'glompo_log.h5' file (with residuals) may be used.", default='', ispath=True, gui_type='directory')
        self.SaveResiduals: BoolType | BoolKey = BoolKey(name='SaveResiduals', comment='During the sampling, save the individual difference between reference and predicted values for every sample and training set item.\nRequired for the Reweight calculation, and will be automatically activated if the reweight calculation is requested.\n\nSaving and analyzing the residuals can provide valuable insight into your training set, but can quickly occupy a large amount of disk space. Only save the residuals if you would like to run the reweight calculation or have a particular reason to do so.', default=False)
        self.SetToAnalyze: Literal["TrainingSet", "ValidationSet"] = MultipleChoiceKey(name='SetToAnalyze', comment='Name of the data set to use for the sensitivity analysis.', gui_name='Analyze: ', default='TrainingSet', choices=['TrainingSet', 'ValidationSet'])
        self.Task: Literal["Optimization", "GenerateReference", "SinglePoint", "Sensitivity", "MachineLearning"] = MultipleChoiceKey(name='Task', comment='Task to run.\n\nAvailable options:\n• MachineLearning: Optimization for machine learning models\n• Optimization: Global optimization powered by GloMPO\n• Generate Reference: Run jobs with reference engine to get reference values\n• Single Point: Evaluate the current configuration of jobs, training data, and parameters\n• Sensitivity: Measure the sensitivity of the loss function to each of the active parameters', default='Optimization', choices=['Optimization', 'GenerateReference', 'SinglePoint', 'Sensitivity', 'MachineLearning'])
        self.DataSet: ParAMSSensitivity._DataSet = self._DataSet(name='DataSet', comment='Configuration settings for each data set in the optimization.', unique=False, gui_type='Repeat at least once')
        self.LossValuesKernel: ParAMSSensitivity._LossValuesKernel = self._LossValuesKernel(name='LossValuesKernel', comment='Kernel applied to the loss values in the sensitivity calculation.')
        self.ParametersKernel: ParAMSSensitivity._ParametersKernel = self._ParametersKernel(name='ParametersKernel', comment='Kernel applied to the parameters for which sensitivity is being measured.')
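The class above is generated plumbing: each `__post_init__` attaches typed key objects (a name, a GUI comment, and a default) to a block, and the block later serializes to a text input. The following is a minimal, self-contained sketch of that key/block pattern; `MiniKey` and `MiniBlock` are illustrative stand-ins and are not part of the `scm.pisa` API, and the serialization shown only mimics the general shape of an AMS-style text input.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MiniKey:
    """Illustrative stand-in for IntKey/BoolKey/etc.: a named, documented, defaulted setting."""
    name: str
    comment: str = ''
    default: Any = None
    value: Any = None  # set by the user; None means "use the default"


@dataclass
class MiniBlock:
    """Illustrative stand-in for a driver block holding a collection of keys."""
    name: str
    keys: dict = field(default_factory=dict)

    def add(self, key: MiniKey) -> None:
        self.keys[key.name] = key

    def render(self) -> str:
        # Emit only keys whose value was explicitly changed from the default,
        # mimicking how a block serializes to a text input file.
        lines = [self.name]
        for key in self.keys.values():
            if key.value is not None and key.value != key.default:
                lines.append(f'  {key.name} {key.value}')
        lines.append('End')
        return '\n'.join(lines)


# Mirror three of the keys defined in ParAMSSensitivity.__post_init__ above.
block = MiniBlock('ParAMSSensitivity')
block.add(MiniKey('Task', default='Optimization'))
block.add(MiniKey('NumberSamples', default=1000))
block.add(MiniKey('RunSampling', default=False))

block.keys['Task'].value = 'Sensitivity'
block.keys['RunSampling'].value = True
print(block.render())  # NumberSamples is omitted: it still holds its default
```

The design choice sketched here (defaults live on the key objects, and only explicit overrides are serialized) is why the generated class records a `default=` for nearly every key.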