8.2. Extractors and Comparators
8.2.1. Available Extractors
Extractor name                Default sigma  Unit
----------------------------  -------------  -------------
Angle                         2.0            °
Average distance              0.05           Å
BandGap                       0.002          Ha
BandStructure                 0.002          Ha
Bulk modulus                  0.001          hartree/bohr³
Cell angles                   2.0            °
Cell lengths                  0.05           Å
Cell volume                   50.0           Å³
Charges                       0.1            au
Dihedral                      2.0            °
dipole_moment                 0.01           e*bohr
Distance                      0.05           Å
Distance vector               0.05           Å
Energy                        0.002          Ha
Forces                        0.003          Ha/bohr
Hessian                       1.0            Ha/bohr²
PES                           0.002          Ha
PES compared                  0.002          Ha
PESScan angle                 2.0            °
PESScan dihedral              2.0            °
PESScan distance              0.05           bohr
RMSD                          0.1            Å
Shear modulus                 0.001          Ha/bohr³
Stress tensor                 0.0001         au
Stress tensor 1D              0.0001         hartree/bohr
Stress tensor 2D              0.0001         hartree/bohr²
Stress tensor 3D              0.0001         hartree/bohr³
Stress tensor diagonal 2D     0.0001         hartree/bohr²
Stress tensor diagonal 3D     0.0001         hartree/bohr³
Stress tensor offdiagonal 2D  0.0001         hartree/bohr²
Stress tensor offdiagonal 3D  0.0001         hartree/bohr³
Vibrational frequencies       5.0            cm⁻¹
Young modulus                 0.001          hartree/bohr³
8.2.1.1. Angle
Extractor: angle

extract(amsresult, atom1: int, atom2: int, atom3: int, mic: bool = True) → float
Extract the angle between three atoms (atom2 is the central atom). Atom indexing starts with 0.
mic: whether to use the minimum image convention for periodic systems.
Unit: deg
Sigma: 2.0
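The geometry behind this extractor can be sketched with plain NumPy. The helper `angle_deg` below is hypothetical and not part of ParAMS; it assumes Cartesian coordinates in any consistent length unit and does not apply the minimum image convention.

```python
import numpy as np

def angle_deg(coords, atom1, atom2, atom3):
    """Angle in degrees at atom2, formed by atom1-atom2-atom3 (zero-based indices)."""
    coords = np.asarray(coords, dtype=float)
    v1 = coords[atom1] - coords[atom2]
    v2 = coords[atom3] - coords[atom2]
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

# Illustrative coordinates: a right angle at the central atom (index 1).
coords = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(angle_deg(coords, 0, 1, 2))
```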
8.2.1.2. Average distance
Extractor: average_distance

extract(amsresults, atomList: list) → float
Extract the average interatomic distance between multiple atom pairs, defined by their indices [p1a1, p1a2, p2a1, p2a2, …]. Atom indices are zero-based.
Unit: angstrom
Sigma: 0.05
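How the flat index list maps onto atom pairs can be illustrated with a short NumPy sketch. The function `average_distance` below is a hypothetical stand-in that works on a plain coordinate array rather than on an AMSResults object, and it ignores periodic boundary conditions.

```python
import numpy as np

def average_distance(coords, atom_list):
    """Mean distance over pairs given as a flat list [p1a1, p1a2, p2a1, p2a2, ...]."""
    if len(atom_list) % 2 != 0:
        raise ValueError("atom_list must contain an even number of indices")
    coords = np.asarray(coords, dtype=float)
    # Consecutive entries form one pair: (p1a1, p1a2), (p2a1, p2a2), ...
    pairs = np.asarray(atom_list, dtype=int).reshape(-1, 2)
    diffs = coords[pairs[:, 0]] - coords[pairs[:, 1]]
    return float(np.linalg.norm(diffs, axis=1).mean())

# Illustrative coordinates: pairs (0, 1) and (0, 2) have distances 1.0 and 3.0.
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
print(average_distance(coords, [0, 1, 0, 2]))
```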
8.2.1.3. BandGap
Extractor: bandgap

extract(amsresult) → float
Extract the band gap of the system.
Unit: hartree
Sigma: 2e-3
8.2.1.4. BandStructure
Extractor: bandstructure

extract(amsresult, bands: Optional[List[int]] = None, relative_to='min', only_high_symmetry_points=False, spin='up') → float
Extracts the band structure. The energies are returned relative to the lowest energy.
bands: list of int
Band indices starting with 0. If None is given, all bands are used. Note that the number of bands may differ between the reference engine and the parametrized engine, so you should always explicitly specify the bands.
relative_to: str
“min”, “prev”, “prev_intra” (equivalent to “prev”), “prev_inter”, “absolute”, or a 2-tuple with zero-based indices of the value to use as reference.
only_high_symmetry_points: bool
If True, only the energies at the high-symmetry points (the first coordinate along each edge) are used.
spin: str
If “up” (or if no spin-down bands are calculated), use the spin-up bands. Otherwise, use the spin-down bands. For non-spin-polarized calculations, set spin="up".
Returns: 2D numpy array with shape (nEnergies, nBands). Each column corresponds to a different band.
Unit: hartree
Sigma: 2.0e-3
8.2.1.5. Bulk modulus
Extractor: bulkmodulus

extract(amsresult) → float
Extract the bulk modulus.
Unit: hartree/bohr^3
Sigma: 1e-3 (about 30 GPa)
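The "about 30 GPa" figure can be checked with a small unit conversion. The sketch below uses CODATA 2018 values for the hartree and the bohr; the function name is illustrative and not part of ParAMS. The same conversion applies to the shear and Young moduli, which share this unit and default sigma.

```python
# CODATA 2018 values in SI units.
HARTREE_IN_JOULE = 4.3597447222071e-18
BOHR_IN_METER = 5.29177210903e-11

def hartree_per_bohr3_to_gpa(value):
    """Convert a pressure-like quantity from hartree/bohr^3 to GPa."""
    pascal = value * HARTREE_IN_JOULE / BOHR_IN_METER**3
    return pascal / 1e9

sigma_gpa = hartree_per_bohr3_to_gpa(1e-3)
print(f"{sigma_gpa:.1f} GPa")
```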
8.2.1.6. Cell angles
Extractor: cell_angles

extract(amsresults, index=None) → Union[float, numpy.ndarray]
Returns all or one of the angles between lattice vectors.
index =
0: alpha for 3D system, gamma for 2D system (degrees)
1: beta for 3D system (degrees)
2: gamma for 3D system (degrees)
None: all of the above
Unit: degree
Sigma: 2.0
Compared value type: np.ndarray if index is None else float
8.2.1.7. Cell lengths
Extractor: cell_lengths

extract(amsresults, index=None) → Union[float, numpy.ndarray]
Returns the lengths of all or one lattice vector.
index =
0: a (angstrom)
1: b (angstrom)
2: c (angstrom)
None: all of the above
Unit: angstrom
Sigma: 0.05
Compared value type: np.ndarray if index is None else float
8.2.1.8. Cell volume
Extractor: cell_volume

extract(amsresults) → float
Returns the cell volume. System must be periodic in 3D.
Unit: angstrom^3
Sigma: 50.0
Compared value type: float
8.2.1.9. Charges
Extractor: charges

extract(amsresult, atomindex: Optional[int] = None) → Union[numpy.ndarray, float]
Extract the atomic charges.
The charge of one specific atom can be requested with atomindex (atom indexing starts with 0).
Unit: au
Sigma: 0.1
8.2.1.10. Dihedral
Extractor: dihedral

extract(amsresults, atom1: int, atom2: int, atom3: int, atom4: int, mic: bool = True) → float
Extract the dihedral angle as defined by the four atom ids (atom indexing starts with 0), i.e. the angle between the planes formed by atom1-atom2-atom3 and atom2-atom3-atom4.
mic: whether to use the minimum image convention for periodic systems.
Unit: deg
Sigma: 2.0
8.2.1.11. dipole_moment
Extractor: dipole_moment

extract(amsresults, unit='e*bohr') → numpy.ndarray
Extract the Cartesian (x, y, z) components of the dipole moment vector.
Unit: e*bohr
Sigma: defaults to 1e-2
8.2.1.12. Distance
Extractor: distance

extract(amsresults, atom1: int, atom2: int, mic: bool = True) → float
Extract the interatomic distance between two atoms, defined by their indices atom1 and atom2 (atom indexing starts with 0).
If mic==True, the minimum image convention is used for periodic systems.
Unit: angstrom
Sigma: 0.05
8.2.1.13. Distance vector
Extractor: distance_vector

extract(amsresults, atom1: int, atom2: int) → numpy.ndarray
Extract the distance vector between two atoms (atom2 - atom1), defined by their molecule indices (atom indexing starts with 0).
Unit: angstrom
Sigma: 0.05
8.2.1.14. Energy
Extractor: energy

extract(amsresult) → float
Extract the energy of the system.
Unit: hartree
Sigma: 2e-3
8.2.1.15. Forces
Extractor: forces

extract(amsresult, atindex: Optional[int] = None, xyz: Optional[int] = None) → Union[float, numpy.ndarray]
Extract the atomic forces of a system (array), or the specific component xyz at index atindex (float).
Unit: hartree/bohr
Sigma: 0.01
Properties
atindex: optional, int
Atom index, starting with 0.
xyz: optional, 0 <= int <= 2
x (0), y (1) or z (2) component to be extracted.
8.2.1.16. Hessian
Extractor: hessian

extract(amsresult) → numpy.ndarray
Extract the Cartesian Hessian matrix.
Unit: Ha/bohr^2
Sigma: Undefined (defaults to 1.0)
8.2.1.17. PES
Extractor: pes

extract(results, relative_to='min') → numpy.ndarray
Extracts the results of a PES Scan.
If relative_to is an integer, energies relative to that index (starting with 0) will be returned.
If relative_to is a string and equal to “previous”, then the first element returned is 0 and every subsequent element is the difference to the previous one. Example: if pes is [2, 0, 1, 4], then pes(relative_to="previous") will become [0, -2, 1, 3].
If relative_to is a string and a valid attribute of a numpy array (e.g. ‘min’ or ‘max’), then the return value of that function will be subtracted.
If relative_to is None, the energies are returned unmodified.
Unit: Ha
Sigma: 2e-3
Compared value type: ndarray
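The relative_to rules can be mirrored in a few lines of NumPy. The function `shift_pes` below is a hypothetical sketch operating on a plain list of energies; it is not the ParAMS implementation and only follows the rules as they are described above.

```python
import numpy as np

def shift_pes(energies, relative_to="min"):
    """Shift PES energies according to the relative_to rules described above."""
    energies = np.asarray(energies, dtype=float)
    if relative_to is None:
        return energies
    if isinstance(relative_to, int):
        # Energies relative to the point with this zero-based index.
        return energies - energies[relative_to]
    if relative_to == "previous":
        # First element is 0, every later element is the difference to its predecessor.
        return np.concatenate(([0.0], np.diff(energies)))
    # Otherwise treat the string as the name of a numpy array method, e.g. "min" or "max".
    return energies - getattr(energies, relative_to)()

pes = [2, 0, 1, 4]
print(shift_pes(pes, "previous"))
print(shift_pes(pes, "min"))
print(shift_pes(pes, 3))
```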
8.2.1.18. PES compared
Extractor: pes_compared

extract(results) → numpy.ndarray
Extracts and compares the results of a PES Scan. The return value r is calculated as
\[ \begin{aligned} \mu &= \frac{1}{N} \sum_i^N (y_i - \hat{y}_i) \\ r &= \sqrt{ \sum_i^N (\hat{y}_i - y_i + \mu)^2 } \end{aligned} \]
where N is the number of points in the PES scan.
Unit: au
Sigma: 2e-3
Compared value type: float
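A minimal numerical sketch of this formula, assuming that \(\mu\) is the mean of \(y_i - \hat{y}_i\). The function below is hypothetical and takes the reference and predicted energies as explicit arguments, which is not how the extractor itself is called. Because the mean offset \(\mu\) is removed, two scans that differ only by a constant shift give r = 0.

```python
import numpy as np

def pes_compared(y, yhat):
    """Offset-corrected deviation r between a reference PES y and a predicted PES yhat."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    mu = np.mean(y - yhat)  # mean offset between the two scans
    return float(np.sqrt(np.sum((yhat - y + mu) ** 2)))

reference = [0.0, 1.0, 2.0]
shifted = [5.0, 6.0, 7.0]    # same shape, constant offset
distorted = [0.0, 1.0, 3.0]  # last point deviates

print(pes_compared(reference, shifted))
print(pes_compared(reference, distorted))
```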
8.2.1.19. PESScan angle
Extractor: pesscan_angle
8.2.1.20. PESScan dihedral
Extractor: pesscan_dihedral
8.2.1.21. PESScan distance
Extractor: pesscan_distance
8.2.1.22. RMSD
Extractor: rmsd

extract(amsresult, ignore_hydrogen=False)
Uses the Kabsch algorithm to align two systems and calculate the root-mean-square deviation of their atomic positions. Assumes that all elements and their order are the same in both systems. When providing an external reference value, the object must be a 2D numpy array of shape (Natoms, 3) storing the atomic coordinates. Hydrogen atoms are ignored if ignore_hydrogen is set to True.
Unit: angstrom
Sigma: 0.1
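For readers unfamiliar with the Kabsch algorithm, the following NumPy sketch shows the idea: center both structures, obtain the optimal rotation from a singular value decomposition, then compute the RMSD. The function `kabsch_rmsd` and its `elements` argument are hypothetical and work on plain coordinate arrays; this is not the ParAMS implementation.

```python
import numpy as np

def kabsch_rmsd(p, q, elements=None, ignore_hydrogen=False):
    """RMSD between (N, 3) coordinate arrays p and q after optimal rigid alignment."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if ignore_hydrogen and elements is not None:
        keep = np.array([el != "H" for el in elements])
        p, q = p[keep], q[keep]
    # Remove translation by centering both structures.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = p @ rot - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))

# Illustrative check: q is p rotated by 30 degrees about z and translated.
p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
theta = np.radians(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
q = p @ rz.T + np.array([1.0, -2.0, 0.5])
print(kabsch_rmsd(p, q))  # close to zero, since q is a rigid copy of p
```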
8.2.1.23. Shear modulus
Extractor: shearmodulus

extract(amsresult) → float
Extract the shear modulus.
Unit: hartree/bohr^3
Sigma: 1e-3 (about 30 GPa)
8.2.1.24. Stress tensor
Extractor: stresstensor

extract(amsresult) → numpy.ndarray
Extracts the stress tensor.
Unit: au. For 3D systems this means hartree/bohr^3, for 2D systems hartree/bohr^2, and for 1D systems hartree/bohr.
Sigma: 1e-4
8.2.1.25. Stress tensor 1D
Extractor: stresstensor_1d

extract(amsresult) → numpy.ndarray
Extracts the stress tensor for a 1D system (1 element).
Unit: hartree/bohr
Sigma: 1e-4
8.2.1.26. Stress tensor 2D
Extractor: stresstensor_2d

extract(amsresult) → numpy.ndarray
Extracts the stress tensor for a 2D system (4 elements).
Unit: hartree/bohr^2
Sigma: 1e-4
8.2.1.27. Stress tensor 3D
Extractor: stresstensor_3d

extract(amsresult) → numpy.ndarray
Extracts the stress tensor for a 3D system (9 elements).
Unit: hartree/bohr^3
Sigma: 1e-4
8.2.1.28. Stress tensor diagonal 2D
Extractor: stresstensor_diagonal_2d

extract(amsresult) → numpy.ndarray
Extracts the diagonal elements of the stress tensor for a 2D system (2 elements).
Unit: hartree/bohr^2
Sigma: 1e-4
8.2.1.29. Stress tensor diagonal 3D
Extractor: stresstensor_diagonal_3d

extract(amsresult) → numpy.ndarray
Extracts the diagonal elements of the stress tensor for a 3D system (3 elements).
Unit: hartree/bohr^3
Sigma: 1e-4
8.2.1.30. Stress tensor offdiagonal 2D
Extractor: stresstensor_offdiagonal_2d

extract(amsresult) → numpy.ndarray
Extracts the off-diagonal element of the stress tensor for a 2D system (1 element).
Unit: hartree/bohr^2
Sigma: 1e-4
8.2.1.31. Stress tensor offdiagonal 3D
Extractor: stresstensor_offdiagonal_3d

extract(amsresult) → numpy.ndarray
Extracts the off-diagonal elements of the stress tensor for a 3D system (3 elements).
Unit: hartree/bohr^3
Sigma: 1e-4
8.2.1.32. Vibrational frequencies
Extractor: vibfreq

extract(results) → numpy.ndarray
Extracts the vibrational frequencies.
Unit: 1/cm
Sigma: 5.0
8.2.1.33. Young modulus
Extractor: youngmodulus

extract(amsresult) → float
Extract the Young’s modulus.
Unit: hartree/bohr^3
Sigma: 1e-3 (about 30 GPa)
8.2.2. Custom Extractors
The DataSet section described how an arbitrary linear combination of properties extracted from jobs \(P(J)\) can be added to a Data Set for fitting. This chapter describes what happens when entries are added to and evaluated by the Data Set, and how the user can extend the set of properties that can be fitted with ParAMS.
Extractors were briefly introduced in the Data Set section: they are a collection of Python code snippets available to the DataSet instance, each of which defines how to extract an individual property \(P\) from a job. The extractors available to each DataSet can be checked with the extractors attribute:
>>> ds = DataSet()
>>> ds.extractors
{'angles', 'vibfreq', 'charges', 'distance', 'stresstensor', 'energy', 'forces', 'hessian', 'dihedral'}
Any expression passed to the DataSet.add_entry() method can be constructed from the available extractors, for example "angles('myJob1')".
The design of the Data Set allows for an easy extension of the extractors to suit personal needs beyond what is already provided in the package. Users are encouraged to define additional extractors whenever a new property becomes relevant for the fitting process. In the following example, we will add the functionality to extract and evaluate elastic tensors to the Data Set:
>>> 'elastictensor' in ds.extractors
False
Start by creating an elastictensor.py in an empty directory of your choice with the following contents:
from numpy import ndarray, asarray

# Default sigma, used when an entry is added without an explicit value
sigma = 1e-2

def extract(amsresults) -> ndarray:
    return asarray(amsresults.get_engine_results('ElasticTensor'))
This is a minimal and complete definition of our extractor. The code snippet can be saved under any accessible path and used by providing DataSet with the corresponding directory. The base file name will be used as the extractor’s name (make sure not to include extractors with the same names). Here, we have saved the above under path/to/extractors/:
>>> ds = DataSet(more_extractors='path/to/extractors/')
>>> 'elastictensor' in ds.extractors
True
Because the Data Set natively works with plams.AMSResults, we were able to take a shortcut in the creation of our extractor: all we needed to do in this case was wrap the extract() function around a PLAMS method.
Note that in addition to the function definition, we also provided a sigma variable. This serves as a default value when a related entry is added through DataSet.add_entry() and should roughly be of the same order of magnitude as the “accepted accuracy” for this property. ParAMS will evaluate all entries according to \((w/\sigma)(y - \hat{y})\), where \(w\) is the weight. Providing a sigma is not mandatory but highly recommended; any extractor without a sigma will have its value set to 1.
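To make the role of sigma concrete, here is a worked example of the weighted residual for a single scalar entry. The function name and the numbers are illustrative only. A prediction that deviates from the reference by two sigma yields a residual of about 2, so sigma effectively sets the scale on which deviations are judged.

```python
def weighted_residual(y, yhat, sigma, weight=1.0):
    """Return (w / sigma) * (y - yhat) for a scalar property."""
    return (weight / sigma) * (y - yhat)

# Illustrative numbers: an energy entry with a sigma of 2e-3 hartree,
# where the prediction is off by 4e-3 hartree (two sigma).
residual = weighted_residual(y=-1.000, yhat=-1.004, sigma=2e-3)
print(residual)
```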
Important
An extractor stored as basename.py will be available through the basename string at runtime.
For every extractor, the extract() function returns the property value from a job.
ParAMS expects the first argument passed to any extract() function to be a plams.AMSResults instance (see PLAMS documentation).
It is recommended to define a sigma: float variable, which should be of the same order of magnitude as the prediction accuracy for this property.
However, because the definition of extract() is completely up to the user, it can be made to read and process data from any other source as well. Assuming that a property is stored in a text file, an extractor that is independent of plams.AMSResults could look like this:
def extract(_, filepath):
    # AMSResults instance: `_` is ignored.
    with open(filepath) as f:
        return float(f.read())
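The file-based extractor can be tried out on its own, without ParAMS, by calling extract() directly. The sketch below writes an illustrative value to a temporary file (the file name is made up) and reads it back; how the extra filepath argument would be supplied from a Data Set expression is not covered here.

```python
import os
import tempfile

def extract(_, filepath):
    # The AMSResults instance `_` is ignored.
    with open(filepath) as f:
        return float(f.read())

# Write an illustrative value to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "property.txt")
    with open(path, "w") as f:
        f.write("0.125\n")
    value = extract(None, path)

print(value)
```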
8.2.3. Supported Data Structures
The above definition of our elastic tensor extractor returns a numpy array to the DataSet instance when evaluated. In principle, however, extractors can return an arbitrary data structure, which calls for additional processing to make all results consistent. Internally, each time DataSet.evaluate() is called, reference and predicted results are reduced to one weighted vector of residuals \((\boldsymbol{w}/\boldsymbol{\sigma})(\boldsymbol{y} - \boldsymbol{\hat{y}})\), which is then passed to a loss function and evaluated. This implies that any two return values of the same extractor must support subtraction and multiplication (which is why we convert the elastic tensor to a numpy array before returning). For anything that does not, a custom compare() function has to be implemented alongside extract().
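The requirement can be seen directly in Python: dictionaries do not support subtraction, whereas numpy arrays subtract element-wise. The sketch below uses made-up values and a sum of squared residuals purely as an example loss function; the loss actually used depends on the optimization setup.

```python
import numpy as np

# Two dictionaries cannot be subtracted, so they cannot be turned into residuals directly.
y_dict = {"a": 1.0, "b": 2.0}
yhat_dict = {"a": 1.5, "b": 2.0}
try:
    y_dict - yhat_dict
    dict_supports_subtraction = True
except TypeError:
    dict_supports_subtraction = False
print(dict_supports_subtraction)

# Arrays subtract element-wise, which gives the weighted residual vector.
y = np.array([1.0, 2.0])
yhat = np.array([1.5, 2.0])
weights = np.array([1.0, 1.0])
sigmas = np.array([0.1, 0.1])
residuals = (weights / sigmas) * (y - yhat)

# Sum of squared residuals, used here only as an example loss function.
loss = float(np.sum(residuals ** 2))
print(residuals, loss)
```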
8.2.4. Custom Comparators
There might be cases where an extractor either returns a data type that does not support mathematical operations, or where the deviation between a reference and a predicted value cannot be measured by a simple subtraction \(y - \hat{y}\). In such cases it is necessary to define an additional compare() function in the extractor:
# dictextractor.py
from typing import Dict

sigma = 0.1

def extract(amsresults) -> Dict:
    return amsresults.some_dict_property()

def compare(y: Dict, yhat: Dict, au_to_ref: float) -> float:
    y = list(y.values())
    yhat = list(yhat.values())
    # Unit conversion must be handled by the custom comparator:
    yhat = [i*au_to_ref for i in yhat]
    for i, ihat in zip(y, yhat):
        ...
    return ...
A custom comparator defined in such a way will automatically be used in combination with the extractor.
Important
Whenever the extract() function returns a data type that does not support mathematical operations, a compare() function must be additionally defined.
compare() always follows the signature compare(y: Any, yhat: Any, au_to_ref: float) -> Union[ndarray, float, int].
Note that compare() should handle possible unit conversions by applying the equivalent of yhat = au_to_ref*yhat.
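As one possible way to complete such a comparator, the sketch below returns the element-wise difference \(y - \hat{y}\) over the keys of the reference dictionary. The key-matching strategy, the dictionary contents and the conversion factor are assumptions made for illustration; only the compare(y, yhat, au_to_ref) signature is taken from the text above.

```python
import numpy as np

def compare(y: dict, yhat: dict, au_to_ref: float) -> np.ndarray:
    """Element-wise difference y - yhat over the keys of the reference dict."""
    keys = sorted(y)  # fixed key order so that residuals line up between calls
    y_values = np.array([y[k] for k in keys], dtype=float)
    # Convert predictions from atomic units into the unit of the reference.
    yhat_values = au_to_ref * np.array([yhat[k] for k in keys], dtype=float)
    return y_values - yhat_values

reference = {"a": 27.2114, "b": 0.0}  # illustrative values in eV
prediction = {"b": 0.0, "a": 1.0}     # illustrative values in hartree
result = compare(reference, prediction, au_to_ref=27.2114)
print(result)
```

Matching by key rather than by position keeps the result independent of the insertion order of the two dictionaries.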