5.4. Task: Sensitivity

Task Sensitivity runs a Hilbert-Schmidt Independence Criterion (HSIC) sensitivity analysis, which quantifies the effect of each active parameter on the loss function.

To understand this task better, see the tutorial on Parameter sensitivity analysis.

5.4.1. Generating Samples

The first step of the sensitivity analysis is to obtain i.i.d. uniformly distributed random samples of the parameter space.

Samples can be automatically generated as part of the sensitivity calculation by using the RunSampling key:

RunSampling
Type

Bool

Default value

No

Description

Produce a set of samples of the loss function and active parameters. Samples from the parameter space are drawn from a uniform random distribution. Such a set of samples serves as the input to the sensitivity calculation.

Note

If you choose to generate samples, the shared collection keys must also be defined so that ParAMS can construct the loss function to sample.

If you are generating samples, you can set the number of samples, the random seed, and whether residuals are saved:

NumberSamples
Type

Integer

Default value

1000

GUI name

Generate n samples:

Description

Number of samples to generate during the sampling procedure.

RandomSeed
Type

Integer

Description

Random seed to use during the sampling procedure (for reproducibility).
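As an illustration of what NumberSamples and RandomSeed control, the following is a minimal sketch of seeded uniform sampling in plain Python. It is not the ParAMS implementation; the function name sample_parameters and the parameter bounds are hypothetical.

```python
import random

def sample_parameters(bounds, n_samples, seed=None):
    """Draw n_samples i.i.d. uniform points inside per-parameter (low, high) bounds."""
    # A dedicated generator, so the seed alone determines the drawn samples.
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for (low, high) in bounds]
            for _ in range(n_samples)]

# Hypothetical ranges for three active parameters.
bounds = [(0.0, 1.0), (-2.0, 2.0), (10.0, 50.0)]

samples_a = sample_parameters(bounds, n_samples=1000, seed=42)
samples_b = sample_parameters(bounds, n_samples=1000, seed=42)

# The same seed reproduces the same sample set.
print(samples_a == samples_b)
```

Fixing the seed makes a sampling run repeatable, which is useful when comparing sensitivity results across different settings.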

SaveResiduals
Type

Bool

Default value

No

Description

During the sampling, save the individual differences between reference and predicted values for every sample and training set item. This is required for the reweight calculation and is activated automatically if that calculation is requested. Saving and analyzing the residuals can provide valuable insight into your training set, but can quickly occupy a large amount of disk space. Only save the residuals if you would like to run the reweight calculation or have a particular reason to do so.

Tip

You can reuse the residuals to calculate new loss values:

  • with different loss functions;

  • after changes to weights/sigmas; and

  • after removing training set items.

This allows you to tweak and tailor your training set and run a new sensitivity calculation without having to resample.
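The idea behind reusing residuals can be sketched as follows. This is an illustrative example only: the sum-of-squares form of the loss, the function name sse_loss, and all numbers are assumptions, and the loss functions actually available in ParAMS may be defined differently.

```python
def sse_loss(residuals, weights, sigmas):
    """Weighted sum of squared, sigma-normalised residuals for one sample."""
    return sum(w * (r / s) ** 2 for r, w, s in zip(residuals, weights, sigmas))

# residuals[i][j]: saved residual of sample i for training set item j
# (hypothetical values).
residuals = [[0.5, -1.0, 2.0],
             [0.1, 0.2, -0.4]]
sigmas = [1.0, 1.0, 1.0]

# Loss values under the original weights.
original = [sse_loss(r, [1.0, 1.0, 1.0], sigmas) for r in residuals]

# New loss values after doubling the weight of item 1 and removing item 2
# (a zero weight is equivalent to dropping the item). No resampling needed.
reweighted = [sse_loss(r, [1.0, 2.0, 0.0], sigmas) for r in residuals]

print(original, reweighted)
```

Because the residuals already contain everything that depends on the expensive model evaluations, only the cheap summation has to be repeated.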

5.4.2. Loading Samples

Instead of generating new samples, you can load the results of a previous sampling run:

SamplesDirectory
Type

String

Default value

Description

Path to an ‘optimization’ directory containing the results of a previously run sampling. ParAMS first looks for a ‘glompo_log.h5’ file. If none is found, it looks for ‘running_loss.txt’ and ‘running_active_parameters.txt’ in a sub-directory; which sub-directory is used depends on the DataSet Name. For the reweight calculation, only a ‘glompo_log.h5’ file (with residuals) may be used.

5.4.3. Kernels

The most important configurable options are the kernels applied to the parameter values and loss values.

We generally recommend:

  • applying the Gaussian kernel to the parameter values (in order to capture dis/similarity between parameter sets);

  • applying the conjunctive-Gaussian kernel to the loss values (in order to focus the weight of the distribution on good minima); and

  • using polynomial or linear kernels only if you have a specific type of relationship/dependency you would like to investigate.
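To make the role of the kernels concrete, here is a minimal sketch of a biased (V-statistic) HSIC estimate between one parameter and the loss. It is a textbook-style illustration, not the ParAMS implementation: it uses a Gaussian kernel on both sides (the conjunctive-Gaussian kernel is not reproduced here), and the function names, the synthetic loss, and the bandwidths are assumptions.

```python
import math
import random

def gaussian_kernel_matrix(values, sigma):
    """K[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2)) for a list of scalars."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]

def center(matrix):
    """Double-center a square matrix: subtract row and column means, add grand mean."""
    n = len(matrix)
    row_means = [sum(row) / n for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[matrix[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n)] for i in range(n)]

def hsic(x, y, sigma_x=0.3, sigma_y=0.3):
    """Biased HSIC estimate: (1/n^2) * sum_ij (HKH)_ij * L_ij."""
    n = len(x)
    k_centered = center(gaussian_kernel_matrix(x, sigma_x))
    l_matrix = gaussian_kernel_matrix(y, sigma_y)
    return sum(k_centered[i][j] * l_matrix[i][j]
               for i in range(n) for j in range(n)) / n ** 2

rng = random.Random(0)
n = 150
p1 = [rng.random() for _ in range(n)]      # parameter that drives the loss
p2 = [rng.random() for _ in range(n)]      # parameter with no influence
loss = [4.0 * (v - 0.5) ** 2 for v in p1]  # synthetic loss, depends on p1 only

hsic_p1 = hsic(p1, loss)
hsic_p2 = hsic(p2, loss)
print(hsic_p1, hsic_p2)
```

The loss in this example is symmetric around p1 = 0.5, so its linear correlation with p1 is close to zero. The Gaussian kernel still detects the dependence, which is one reason it is a sensible default for the parameter values.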

ParametersKernel
   Alpha float
   Gamma float
   Polynomial
      Order integer
      Shift float
   End
   Sigma float
   Type [Gaussian | ConjunctiveGaussian | Threshold | Polynomial | Linear]
End
ParametersKernel
Type

Block

Description

Kernel applied to the parameters for which sensitivity is being measured.

Alpha
Type

Float

Description

Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.

Gamma
Type

Float

Default value

0.1

Description

Bandwidth parameter for the conjunctive-Gaussian kernel.

Polynomial
Type

Block

Description

Settings for the Polynomial kernel.

Order
Type

Integer

Default value

1

Description

Maximum order of the polynomial.

Shift
Type

Float

Default value

0.0

Description

Free parameter (≥ 0) trading off higher-order versus lower-order effects.

Sigma
Type

Float

Default value

0.3

Description

Bandwidth parameter for the Gaussian kernel.

Type
Type

Multiple Choice

Default value

Gaussian

Options

[Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]

Description

Name of the kernel to apply to the parameters for which sensitivity is being measured.

LossValuesKernel
Type

Block

Description

Kernel applied to the loss values against which sensitivity is being measured.

Alpha
Type

Float

Description

Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.
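The scaled space in which Alpha is defined can be pictured with the short sketch below. It assumes strictly positive loss values and a base-10 logarithm; the base, the function name scale_losses, and the comparison against the cut-off are illustrative assumptions, not the exact ParAMS procedure.

```python
import math

def scale_losses(losses):
    """Take the logarithm of each loss, then rescale linearly to [0, 1]."""
    logs = [math.log10(v) for v in losses]  # assumes all losses are > 0
    lo, hi = min(logs), max(logs)
    return [(v - lo) / (hi - lo) for v in logs]  # assumes hi > lo

losses = [1.0, 10.0, 100.0, 1000.0]  # hypothetical loss values
scaled = scale_losses(losses)

alpha = 0.5  # cut-off expressed in the scaled space
below_cutoff = [s < alpha for s in scaled]
print(scaled, below_cutoff)
```

Because of the logarithm, a value of Alpha = 0.5 separates the losses at the geometric, not the arithmetic, midpoint of the sampled range.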

Gamma
Type

Float

Default value

0.3

Description

Bandwidth parameter for the conjunctive-Gaussian kernel.

Polynomial
Type

Block

Description

Settings for the Polynomial kernel.

Order
Type

Integer

Default value

1

Description

Maximum order of the polynomial.

Shift
Type

Float

Default value

0.0

Description

Free parameter (≥ 0) trading off higher-order versus lower-order effects.

Sigma
Type

Float

Description

Bandwidth parameter for the Gaussian kernel. If not specified or -1, calculates a reasonable default based on the number of parameters being tested.

Type
Type

Multiple Choice

Default value

ConjunctiveGaussian

Options

[Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]

Description

Name of the kernel to apply to the loss values.

5.4.4. Other settings

The sensitivity analysis can be run on either the training set or the validation set:

SetToAnalyze
Type

Multiple Choice

Default value

TrainingSet

Options

[TrainingSet, ValidationSet]

GUI name

Analyze:

Description

Name of the data set to use for the sensitivity analysis.

Generally, not all of the generated samples are used simultaneously in the sensitivity calculation, because the calculation becomes slower as more samples are used. It is also difficult to know whether one has enough samples to obtain an accurate approximation of the true sensitivity.

For example, suppose one has generated 10000 samples. It is often better to run 10 repeats (bootstraps) of the calculation with 1000 points each than a single calculation with 10000 points. The former runs faster and yields a spread of results from which the robustness of the estimate can be judged.

To specify the number of times you would like the calculation repeated:

NumberBootstraps
Type

Integer

Default value

1

GUI name

Repeat calculation n times:

Description

Number of repeats of the calculation with different sub-samples. A small spread across a large number of bootstraps gives confidence in the estimated sensitivity.

To specify how many points to use from our sample set in each calculation:

NumberCalculationSamples
Type

Integer

GUI name

Number of samples per repeat:

Description

Number of samples, drawn from the full set available, to use in the calculation. If not specified or -1, all available points are used. For the sensitivity calculation, the sub-sample is redrawn for every bootstrap.

To specify how the points are chosen from our sample set:

SampleWithReplacement
Type

Bool

Default value

Yes

Description

Sample from the available data with or without replacement. This only has an effect if the number of samples for the calculation is less than the total number available; otherwise, sampling is done with replacement by necessity.
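The interplay of NumberBootstraps, NumberCalculationSamples and SampleWithReplacement can be sketched as follows. This is an illustrative example only: the function bootstrap_estimates is hypothetical, and a simple mean stands in for the (much more expensive) sensitivity calculation.

```python
import random
import statistics

def bootstrap_estimates(samples, estimator, n_bootstraps, n_calc_samples,
                        with_replacement=True, seed=None):
    """Apply estimator to n_bootstraps sub-samples of size n_calc_samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bootstraps):
        # A fresh sub-sample is drawn for every repeat.
        if with_replacement:
            subset = rng.choices(samples, k=n_calc_samples)
        else:
            subset = rng.sample(samples, k=n_calc_samples)
        estimates.append(estimator(subset))
    return estimates

data_rng = random.Random(0)
all_samples = [data_rng.random() for _ in range(10000)]  # stand-in sample set

# 10 repeats with 1000 points each instead of one run with 10000 points.
estimates = bootstrap_estimates(all_samples, statistics.fmean,
                                n_bootstraps=10, n_calc_samples=1000, seed=1)

print(statistics.fmean(estimates), statistics.stdev(estimates))
```

The standard deviation over the repeats is the spread referred to above: if it is small relative to the mean, the estimate is robust to the choice of sub-sample.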

To remove parameter sets which produced non-finite loss values:

FilterInfiniteValues
Type

Bool

Default value

Yes

Description

If Yes, removes points from the calculation with non-finite loss values. Non-finite points can cause numerical issues in the sensitivity calculation.
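The effect of this filter can be illustrated with a few lines of Python. The function name filter_finite and the data are hypothetical; the sketch only shows which points would be discarded.

```python
import math

def filter_finite(parameter_sets, losses):
    """Keep only the (parameters, loss) pairs whose loss is a finite number."""
    kept = [(p, v) for p, v in zip(parameter_sets, losses) if math.isfinite(v)]
    kept_parameters = [p for p, _ in kept]
    kept_losses = [v for _, v in kept]
    return kept_parameters, kept_losses

parameter_sets = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
losses = [12.5, float("inf"), float("nan"), 3.0]  # two non-finite entries

kept_parameters, kept_losses = filter_finite(parameter_sets, losses)
print(kept_parameters, kept_losses)
```

Both infinite and not-a-number losses are dropped together with their parameter sets, so the two lists stay aligned.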

Finally, we have included an extension to the sensitivity calculation:

RunReweightCalculation
Type

Bool

Default value

No

Description

Run a more expensive sensitivity calculation that will also return suggested weights for the training set which will produce more balanced sensitivities between all the parameters. Note: The Gaussian kernel is recommended for the loss values kernel in this case.

Warning

The reweight calculation is experimental.

5.4.5. Technical details

For technical details of the sensitivity calculation, see the API documentation.