4.14.4. Weights schemes

A weights scheme assigns an individual weight to each data point when the data is an array or matrix of several data points.

For example, if the data consists of the forces of the atoms, then it is an Nx3 matrix, where N is the number of atoms.

With a weights scheme, not all force components need to be weighted equally.

The weights depend only on the reference values. They do not depend on the values calculated with the parametrized engine, nor on atomic positions or any other structural information.

The parameters of a weights scheme may need to be changed depending on which unit the reference data is expressed in.
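To illustrate the point about units, the following minimal sketch converts the 0.03 Ha/bohr limit used in the examples on this page to eV/angstrom. The helper name and the rounded conversion constants are assumptions made for illustration; they are not part of the weights scheme API.

```python
# Rounded conversion constants (assumed here for illustration only).
HARTREE_IN_EV = 27.211386
BOHR_IN_ANGSTROM = 0.529177


def ha_per_bohr_to_ev_per_angstrom(value):
    """Convert a force value from Ha/bohr to eV/angstrom."""
    return value * HARTREE_IN_EV / BOHR_IN_ANGSTROM


# A clip limit chosen for reference forces stored in Ha/bohr ...
limit_ha_bohr = 0.03
# ... has to be rescaled if the reference forces are stored in eV/angstrom.
limit_ev_ang = ha_per_bohr_to_ev_per_angstrom(limit_ha_bohr)
print(f"{limit_ha_bohr} Ha/bohr = {limit_ev_ang:.4f} eV/angstrom")
```

The same limit is roughly 1.54 in eV/angstrom, so reusing min=-0.03 and max=0.03 with reference data in that unit would select a much narrower range of forces.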

The weights scheme requires that you specify a normalization, which affects the sum of returned weights.

Example 1: apply a weights scheme when calling a results importer.

ri = ResultsImporter()

# only fit force components between -0.03 and 0.03 force units
# the sum of the weights will equal the number of nonzero weights
ri.add_singlejob('finished-ams-job.results', properties={
     'forces': {
         'weights_scheme': WeightsSchemeClip(normalization='nonzero', min=-0.03, max=0.03)
     }
})

# print the data_set entry that was just added by the results importer
print(ri.data_set[-1])

Example 2: apply a weights scheme to many data_set entries

# data_set is of type DataSet

# first filter the data_set to some suitable subset
# for example, all data_set entries having the "Group: Forces" metadata.
subset = data_set.from_metadata('Group', 'Forces')

# only fit force components between -0.03 and 0.03 force units
# the sum of the weights will equal the number of nonzero weights
subset.apply_weights_scheme(WeightsSchemeClip(normalization='nonzero', min=-0.03, max=0.03))

print(subset)

4.14.4.1. Types of weights schemes

Set weights to 0 outside given range (WeightsSchemeClip)

# the weights for force components smaller than -0.03 Ha/bohr or bigger than 0.03 Ha/bohr become 0
# if the forces in the reference data are expressed in Ha/bohr
# note: the min and max need to be expressed in the same unit as the force components in the data_set!
WeightsSchemeClip(min=-0.03, max=0.03)

Boltzmann weighting

WeightsSchemeBoltzmann(normalization=1.0, temperature=3000)

Gaussian weighting

WeightsSchemeGaussian(normalization='dim0', mean=0, stdev=0.01)

4.14.4.2. Examples of weights schemes

In this example there are

  • 66 atoms
  • 198 force components, all within the range [-0.06,0.06] Ha/bohr

The figures show the weight of each force component as a function of its value, for different weights schemes.

The sum of the weights is printed in the figure title.

[Figures weights_scheme_000.png through weights_scheme_009.png: weight of each force component as a function of its value, for different weights schemes]

4.14.4.3. Weights schemes API

class WeightsScheme(normalization=1.0)
__init__(normalization=1.0)

Parent class for the different weights schemes.

normalization : float or str

  • 'numelements': the sum of weights will equal the number of elements in the weights matrix
  • 'nonzero': the sum of weights will equal the number of nonzero elements in the weights matrix
  • 'dim0': the sum of weights will equal the length of the first dimension of the weights matrix (for example, the number of atoms if the data consists of forces)
  • 'dim1': the sum of weights will equal the length of the second dimension of the weights matrix (for example, 3 if the data consists of forces)
  • float: the sum of weights will equal the given number

get_weights(arr)

Returns the weights for a given data matrix arr.

arr : np.ndarray
A numpy array with data
normalize(weights, normalization=None)

The normalize method does not modify the weights in place; it returns a scaling factor. It is called as follows:

weights *= self.normalize(weights)
__str__()

Return str(self).
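The normalization options and the "return a factor, then multiply" pattern of normalize can be sketched as follows. This is a minimal illustration in plain Python with nested lists instead of NumPy arrays; the function names target_sum and normalization_factor are hypothetical and do not reproduce the actual implementation.

```python
def target_sum(weights, normalization):
    """Return the value that the weights should sum to.

    weights is a list of rows (for example N rows of 3 force components).
    """
    flat = [w for row in weights for w in row]
    if normalization == 'numelements':
        return float(len(flat))
    if normalization == 'nonzero':
        return float(sum(1 for w in flat if w != 0))
    if normalization == 'dim0':
        return float(len(weights))       # length of the first dimension
    if normalization == 'dim1':
        return float(len(weights[0]))    # length of the second dimension
    return float(normalization)          # a plain number


def normalization_factor(weights, normalization):
    """Return a scaling factor; the weights themselves are left untouched."""
    total = sum(w for row in weights for w in row)
    return target_sum(weights, normalization) / total


# Unnormalized weights for 2 "atoms" x 3 components, 3 of them nonzero.
raw = [[1.0, 1.0, 0.0],
       [1.0, 0.0, 0.0]]

for mode in ('numelements', 'nonzero', 'dim0', 'dim1', 2.5):
    factor = normalization_factor(raw, mode)
    scaled = [[w * factor for w in row] for row in raw]
    print(mode, sum(sum(row) for row in scaled))
```

For this 2x3 matrix the printed sums correspond to 6, 3, 2, 3 and 2.5 (up to floating-point rounding), matching the descriptions of the normalization options above.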

class WeightsSchemeUniform(normalization=1.0)
__init__(normalization=1.0)

All weights become the same.

get_weights(arr)

Returns the weights for a given data matrix arr.

arr : np.ndarray
A numpy array with data
__str__()

Return str(self).

class WeightsSchemeClip(normalization=1.0, min=-inf, max=inf)
__init__(normalization=1.0, min=-inf, max=inf)

All weights for entries < min or > max become 0. The remaining weights all get the same value.

get_weights(arr)

Returns the weights for a given data matrix arr.

arr : np.ndarray
A numpy array with data
__str__()

Return str(self).
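The clipping rule can be sketched in a few lines. This is an illustration under stated assumptions, not the library code: clip_weights is a hypothetical helper, the data is a nested list rather than a NumPy array, and the 'nonzero' normalization is applied by hand.

```python
def clip_weights(data, lower, upper):
    """Return 1.0 for entries inside [lower, upper] and 0.0 for entries outside."""
    return [[1.0 if lower <= x <= upper else 0.0 for x in row] for row in data]


# Reference forces for 2 atoms, assumed to be in Ha/bohr.
forces = [[0.010, -0.050, 0.020],
          [0.040, 0.000, -0.030]]

weights = clip_weights(forces, -0.03, 0.03)

# 'nonzero' normalization: rescale so that the sum equals the number of
# nonzero weights. For clipped weights this leaves every kept weight at 1.
n_nonzero = sum(1 for row in weights for w in row if w != 0)
total = sum(sum(row) for row in weights)
factor = n_nonzero / total
weights = [[w * factor for w in row] for row in weights]

for row in weights:
    print(row)
```

The components -0.050 and 0.040 fall outside the range and get weight 0. The component -0.030 lies exactly on the boundary and is kept, because only entries strictly below min or strictly above max are zeroed.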

class WeightsSchemeGaussian(normalization=1.0, mean=0.0, stdev=0.1, absolute=False)
__init__(normalization=1.0, mean=0.0, stdev=0.1, absolute=False)

Apply Gaussian weighting: each weight is exp(-(arr-mean)**2/(2*stdev**2)), and the weights are then scaled according to normalization.

normalization : float or str
See docs for WeightsScheme
mean : float or str

Center of the Gaussian

  • 'max': maximum value
  • 'min': minimum value
  • float: numeric value

stdev : float
Standard deviation ("sigma") of the Gaussian
absolute : bool
If True, there is a Gaussian centered at both +mean and -mean. The two distributions do not overlap: the weight of each data point is calculated from its nearest center.
get_weights(arr)

Returns the weights for a given data matrix arr.

arr : np.ndarray
A numpy array with data
__str__()

Return str(self).
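The Gaussian formula and the effect of absolute can be sketched as below. The helper gaussian_weights is hypothetical and works on a flat list; it omits the normalization step, and it assumes that 'max' and 'min' refer to the maximum and minimum of the data passed in.

```python
import math


def gaussian_weights(values, mean=0.0, stdev=0.1, absolute=False):
    """Unnormalized Gaussian weights exp(-(x - center)**2 / (2 * stdev**2))."""
    # Assumption: 'max' / 'min' select the largest / smallest data value.
    if mean == 'max':
        mean = max(values)
    elif mean == 'min':
        mean = min(values)
    weights = []
    for x in values:
        center = mean
        if absolute and abs(x + mean) < abs(x - mean):
            # The mirrored center at -mean is nearer to this data point.
            center = -mean
        weights.append(math.exp(-(x - center) ** 2 / (2 * stdev ** 2)))
    return weights


values = [-0.02, 0.0, 0.01, 0.02]
print(gaussian_weights(values, mean=0.0, stdev=0.01))
print(gaussian_weights(values, mean=0.02, stdev=0.01))
print(gaussian_weights(values, mean=0.02, stdev=0.01, absolute=True))
```

With absolute=True, the data points at -0.02 and 0.02 receive the same weight, because each one is evaluated against its nearest center.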

class WeightsSchemeBoltzmann(normalization=1.0, temperature=5000, kB=3.167e-06, subtract_min=False)
__init__(normalization=1.0, temperature=5000, kB=3.167e-06, subtract_min=False)

Apply Boltzmann weighting.

If not subtract_min: exp(-data/(kB*temperature))

If subtract_min: exp(-(data-minimum)/(kB*temperature))

normalization : float or str
Normalization scheme
temperature : float
Temperature in K
kB: float
Boltzmann constant in data_unit/K. Default: the value of the Boltzmann constant in Ha/K.
subtract_min : bool

Whether to subtract the minimum element before calculating the Boltzmann weights

Set subtract_min = True if you’re passing in raw (total) energies

Set subtract_min = False if you’re passing in relative energies (e.g. with the add_trajectory_singlepoints ResultsImporter)

get_weights(arr)

Returns the weights for a given data matrix arr.

arr : np.ndarray
A numpy array with data
__str__()

Return str(self).
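The two Boltzmann expressions and the role of subtract_min can be sketched as follows. The helper boltzmann_weights is hypothetical, works on a flat list of energies assumed to be in Ha, and omits the normalization step; the energy values are made up for illustration.

```python
import math

K_B_HA_PER_K = 3.167e-06  # Boltzmann constant in Ha/K (the default listed above)


def boltzmann_weights(energies, temperature=5000, kB=K_B_HA_PER_K,
                      subtract_min=False):
    """Unnormalized Boltzmann weights exp(-(E - shift) / (kB * temperature))."""
    shift = min(energies) if subtract_min else 0.0
    return [math.exp(-(e - shift) / (kB * temperature)) for e in energies]


# Relative energies: the lowest entry is already 0, so no shift is needed.
relative = [0.0, 0.01, 0.05]
# Raw total energies with the same spacing: shift by the minimum first.
total = [-76.40, -76.39, -76.35]

w_relative = boltzmann_weights(relative, temperature=3000)
w_total = boltzmann_weights(total, temperature=3000, subtract_min=True)

print(w_relative)
print(w_total)
```

Both calls give essentially the same weights. Without subtract_min, total energies of this magnitude would lead to exponents in the thousands, which is why the shift is recommended for raw energies.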