# 3.11. Loss Functions¶

Loss functions are metrics that evaluate the residuals vector between reference and predicted properties, $$(\boldsymbol{w}/\boldsymbol{\sigma})(\boldsymbol{y} - \boldsymbol{\hat{y}})$$, which is generated every time DataSet.evaluate() is called. Note that although DataSet.evaluate() returns the non-weighted residuals, the loss function always receives a residuals vector weighted by $$\boldsymbol{w}/\boldsymbol{\sigma}$$.

By default, the following string keywords are recognized as loss functions:

- `lad`, `lae` : Least Absolute Error
- `rmsd`, `rmse` : Root-Mean-Square Deviation
- `mad`, `mae` : Mean Absolute Deviation
- `sse`, `rss` : Sum of Squared Errors (this is the default optimization loss)

and can be passed to an Optimization in one of the following ways:

```python
my_optimization = Optimization(*args, loss='mae')  # As the string keyword

from scm.params.core.lossfunctions import MAE  # Loss functions are not imported automatically
my_optimization = Optimization(*args, loss=MAE())  # Or directly as an instance
```


After calling my_optimization.optimize(), predicted properties will be compared to the reference values with the MAE. A loss function can also be passed to DataSet.evaluate() in the same manner.

## 3.11.1. Least Absolute Error¶

class LAE

Least absolute error (LAE), least absolute deviations (LAD) loss.

(3.5)$L_\mathrm{LAE} = \sum_{i=1}^N | y_i - \hat{y}_i |$

Accessible with the strings 'lae', 'lad'.

## 3.11.2. Mean Absolute Error¶

class MAE

Mean Absolute Error (MAE, MAD) loss.

(3.6)$L_\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^N | y_i - \hat{y}_i |$

Accessible with the strings 'mae', 'mad'.

## 3.11.3. Root-Mean-Square Error¶

class RMSE

Root-Mean-Square Error (RMSE, RMSD) loss.

(3.7)$L_\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^N (y_i - \hat{y}_i)^2 }$

Accessible with the strings 'rmse', 'rmsd'.

## 3.11.4. Sum of Squares Error¶

class SSE

Residual Sum of Squares or Sum of Squared Error loss. This loss function is commonly used for ReaxFF parameter fitting.

(3.8)$L_\mathrm{SSE} = \sum_{i=1}^N (y_i - \hat{y}_i)^2$

Accessible with the strings 'sse', 'rss'.
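As a quick numerical illustration, equations (3.5)–(3.8) can be reproduced with NumPy on a hypothetical, already weighted residuals vector (the numbers below are made up). Note how the four metrics relate: MAE = LAE/N and RMSE = $\sqrt{\mathrm{SSE}/N}$.

```python
import numpy as np

# Hypothetical weighted residuals (w_i/sigma_i)(y_i - y_hat_i)
r = np.array([0.5, -1.0, 2.0, -0.5])
N = r.size

lae = np.abs(r).sum()          # (3.5) Least Absolute Error
mae = np.abs(r).mean()         # (3.6) Mean Absolute Error
rmse = np.sqrt((r**2).mean())  # (3.7) Root-Mean-Square Error
sse = (r**2).sum()             # (3.8) Sum of Squared Errors

# The metrics are related: MAE = LAE/N, RMSE = sqrt(SSE/N)
assert np.isclose(mae, lae / N)
assert np.isclose(rmse, np.sqrt(sse / N))
```

Because SSE grows with the number of residuals while MAE and RMSE are averaged, the choice of loss affects how strongly large data sets and outliers dominate the optimization.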

## 3.11.5. Loss Function API¶

User-specific loss functions can be defined by inheriting from the base class below. Please make sure that your loss defines the attributes fx and contribution. The latter should contain the percentage contribution of each residuals entry to the overall loss function value.

Note that although the residuals are depicted as a single vector throughout the documentation, the data structure that a Loss receives is a List[1d array], where every element in the list stores the (weighted) residuals vector of the respective Data Set entry.
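To illustrate the data structure (with made-up numbers), a List[1d array] of per-entry residuals can be flattened into the single residuals vector depicted in the equations above before a metric is computed:

```python
import numpy as np

# Hypothetical weighted residuals: one 1d array per Data Set entry
residuals = [np.array([0.1, -0.2, 0.3]),
             np.array([0.05]),
             np.array([-0.4, 0.25])]

# Concatenate into the single residuals vector used in the formulas
flat = np.concatenate(residuals)
sse = float(flat @ flat)  # e.g. the Sum of Squared Errors over all entries
```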

class Loss

Base class for the mathematical definition of a loss function.

__call__(residuals: List[numpy.ndarray]) → Tuple[float, numpy.ndarray]

When DataSet.evaluate() is called, reference and predicted values are extracted for each entry and combined into a weighted list of residuals where every entry represents $$(w_i/\sigma_i)(y_i-\hat{y}_i)$$. The loss computes a metric given this residuals vector.
This method should return two values: the numerical loss, and a 1d array of per-entry contributions to the former.

Parameters:
residuals : List of 1d arrays
List of $$(w_i/\sigma_i)(y_i-\hat{y}_i)$$ elements.

Returns:
loss : float
Total calculated loss
contributions : ndarray
1d array of per-entry contributions to the overall loss
__repr__()

Allow string representations of built-in losses.