# Technical topics

## Input syntax

The AMS driver reads its input from standard input, i.e. what is called STDIN on Unix-like systems. Technically it is possible to run AMS and type the input file in interactively. This is however highly impractical and most people run AMS from a small shell script that contains the AMS text input and sends it directly to the AMS executable:

    #!/bin/sh

    $ADFBIN/ams << EOF
    ... AMS text input goes here:
    Block
      Keyword value
      OtherKeyword value
    End
    EOF

This section of the AMS manual documents the syntax of the text input.

### General remarks on input structure and parsing

- Most keys are optional. Default values are used for keys that are not specified in the input.
- Keys and blocks are either unique (they can appear in the input only once) or non-unique (they can appear multiple times in the input).
- The order in which keys or blocks are specified in the input does not matter. Possible exceptions to this rule are a) the content of non-standard blocks and b) some non-unique keys/blocks.
- Comments in the input file start with one of the following: #, !, or ::

      # this is a comment
      ! this is also a comment
      :: yet another comment

- Empty lines are ignored.
- The input parsing is case insensitive (except for string values):

      # this:
      UseSymmetry false
      # is equivalent to this:
      USESYMMETRY FALSE

- Indentation does not matter and multiple spaces are treated as a single space (except for string values):

      # this:
      UseSymmetry false
      # is equivalent to this:
           UseSymmetry      false

### Keys

Key-value pairs have the following structure:

    KeyName Value

Possible types of keys:

bool key

The value is a single Boolean (logical) value. The value can be True (equivalently Yes) or False (equivalently No). Not specifying any value is equivalent to specifying True. Example:

    KeyName Yes

integer key

The value is a single integer number. Example:

    KeyName 3

float key

The value is a single float number. For scientific notation, the E-notation is used (e.g. $$-2.5 \times 10^{-3}$$ can be expressed as -2.5E-3). The decimal separator must be a dot (.), not a comma (,). Example:

    KeyName -2.5E-3

string key

The value is a string, which can include white spaces. Only ASCII characters are allowed.
Example:

    KeyName Lorem ipsum dolor sit amet

multiple_choice key

The value should be a single word from the list of options for that key (the options are listed in the documentation of the key). Example:

    KeyName SomeOption

integer_list key

The value is a list of integer numbers. Example:

    KeyName 1 6 0 9 -10

float_list key

The value is a list of float numbers. The convention for float numbers is the same as for float keys. Example:

    KeywordName 0.1 1.0E-2 1.3

### Blocks

Blocks give a hierarchical structure to the input, grouping together related keys (and possibly sub-blocks). In the input, blocks generally span multiple lines and have the following structure:

    BlockName
      KeyName1 value1
      KeyName2 value2
      ...
    End

Headers

For some blocks it is possible (or necessary) to specify a header next to the block name:

    BlockName someHeader
      KeyName1 value1
      KeyName2 value2
      ...
    End

Compact notation

It is possible to specify multiple key-value pairs of a block on a single line using the following notation:

    # This:
    BlockName KeyName1=value1 KeyName2=value2

    # is equivalent to this:
    BlockName
      KeyName1 value1
      KeyName2 value2
    End

Notes on compact notation:

- The compact notation cannot be used for blocks with headers.
- Spaces (blanks) between the key, the equal sign and the value are ignored. However, if a value itself needs to contain spaces (e.g. because it is a list, or a number followed by a unit), the entire value must be put in either single or double quotes:

      # This is OK:
      BlockName Key1=value Key2 = "5.6 [eV]" Key3='5 7 3 2'

      # ... and equivalent to:
      BlockName
        Key1 value
        Key2 5.6 [eV]
        Key3 5 7 3 2
      End

      # This is NOT OK:
      BlockName Key1=value Key2 = 5.6 [eV] Key3=5 7 3 2

Non-standard blocks

A special type of block is the non-standard block. These blocks are used for parts of the input that do not follow the usual key-value paradigm. A notable example of a non-standard block is the Atoms block (in which the atomic coordinates and atom types are defined).
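The line-level parsing rules described above (comment lines, empty lines, case-insensitive key names, and collapsed whitespace) can be illustrated with a small shell filter. This is only a sketch under simplifying assumptions and not how AMS itself parses its input: `normalize_ams_input` is a hypothetical helper name, it lowercases only the first word of each line, and it also collapses spaces inside string values, which the real parser preserves.

```shell
#!/bin/sh
# Sketch only: mimics the documented line-level rules, NOT the real AMS parser.
# normalize_ams_input is a hypothetical helper, not part of AMS.

normalize_ams_input() {
    # Drop comment lines (starting with #, ! or ::) and empty lines.
    # awk then lowercases the first word (the key name) and rebuilds the
    # line with single spaces, which also removes the indentation.
    sed -e '/^[[:space:]]*[#!]/d' \
        -e '/^[[:space:]]*::/d' \
        -e '/^[[:space:]]*$/d' |
    awk '{ $1 = tolower($1); print }'
}

# Usage: both spellings normalize to the same line,
# so this prints "usesymmetry false" twice.
printf '%s\n' '# this:' 'UseSymmetry false' '   USESYMMETRY     false' |
    normalize_ams_input
```

Comparing the normalized form of two inputs in this way is a quick check that they differ only in comments, case of key names, or spacing.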
### Units

Some keys have a default unit associated with them (not all keys have units). For such keys, the default unit is mentioned in the key documentation. One can specify a different unit within square brackets at the end of the line:

    KeyName value [unit]

For example, assuming the key EnergyThreshold has Hartree as its default unit, the following definitions are equivalent:

    # Use default unit:
    EnergyThreshold 1.0

    # use eV as unit:
    EnergyThreshold 27.211 [eV]

    # use kcal/mol as unit:
    EnergyThreshold 627.5 [kcal/mol]

    # Hartree is the atomic unit of energy:
    EnergyThreshold 1.0 [a.u.]

Available units:

- Energy: Hartree, Joule, eV, kJ/mol, kcal/mol, cm1, MHz
- Length: Bohr, Angstrom, meter
- Angles: radian, degree
- Mass: el, proton, atomic, kg
- Pressure: atm, Pascal, GPa, a.u., bar, kbar

## Double parallelism

AMS is a parallel program using MPI for efficient execution on distributed memory machines, aka compute clusters. For most jobs, the AMS driver part of a calculation is computationally not particularly costly and most of the execution time is spent inside of the compute engines. Therefore the main parallelization of AMS is inside of the engines, making sure that good performance is obtained for tasks such as molecular dynamics or geometry optimizations, which consist of a series of interdependent engine invocations: We need to have completed step $$n$$ before we can continue with step $$n+1$$.

However, not all workloads are of this sequentially dependent type. Some jobs have a lot of independent work that can be done in parallel. This kind of trivial parallelizability can be exploited at the AMS driver level: Instead of having all cores collaborate on a single PES point and then doing all needed PES points sequentially, we can just distribute the available PES points over all the available cores.
Normally this leads to better parallel scaling than the default parallelization inside of the engines: Parallelizing the engines is relatively complicated and often requires a lot of communication between cores. Parallelizing at the driver level, on the other hand, is very easy, and often the only communication required is at the very end of the calculation, when results are collected.

Note that it is perfectly possible to combine the in-engine parallelization and the driver level parallelism: At the driver level we could, for example, split a total of 32 cores into 4 groups of 8 cores, and then have each group of 8 use the in-engine parallelization to collaborate on a specific calculation. This is especially useful if the total number of cores is larger than the number of independent calculations we have to do. It might also be that we have a very large number of calculations to do, but not enough memory to let every core work alone on its own calculation, as would be ideal from a parallel scaling point of view.

Because of the two levels of parallelism, at the driver and at the engine level, we call this setup double parallelization.

Double parallelization is used for the calculation of PES point properties that are derivatives, if these need to be calculated numerically:

- Numerical calculation of forces / nuclear gradients. With a double-sided derivative this requires $$6 \times n_\text{atoms}$$ independent calculations on geometries with one atom displaced along a Cartesian coordinate.
- Numerical calculation of the stress tensor for periodic systems. This requires up to 12 calculations for a double-sided derivative along the 6 strain directions, but might require fewer if some of the strains are symmetry equivalent.
- Numerical calculation of the Hessian and normal modes of vibration. This is currently only supported for engines that calculate nuclear gradients analytically and is done by numerically differentiating this first (analytic) derivative.
  As such it requires $$6 \times n_\text{atoms}$$ independent calculations on geometries with one atom displaced along a Cartesian coordinate.
- Numerical calculation of the elastic tensor. This requires 84 independent geometry optimizations on systems with differently strained lattices, with each optimization having a variable number of steps.
- Numerical calculation of phonons. This requires at most $$6 \times n_\text{atoms}$$ displacements, but might require fewer if some of the displacements are symmetry equivalent. Note that the displacements are done in a super cell system, which for many engines will increase the memory requirements, but also improve the in-engine parallel scalability.

In order to use double parallelization it has to be enabled explicitly in the input. This is done for the above mentioned properties individually, as one might want a different grouping strategy for each property. For each property there is a separate Parallel block somewhere in the input (e.g. ElasticTensor%Parallel for the calculation of the elastic tensor), which has the following keywords:

    Parallel
      nGroups integer
      nCoresPerGroup integer
      nNodesPerGroup integer
    End

Note that only one of them should be specified in the input, depending on the desired parallelization strategy.

nGroups n

Splits all cores evenly into n groups. We recommend choosing n such that it divides the total number of cores without a remainder.

nCoresPerGroup n

Each group consists of n cores. As such nCoresPerGroup 1 results in the maximum possible parallelism at the driver level. We recommend choosing n such that it divides the total number of cores without a remainder.

nNodesPerGroup n

Makes groups from all cores within n nodes, e.g. nNodesPerGroup 1 would make every cluster node into a separate group. Note that this option should only be used on homogeneous compute clusters, where all used nodes have the same number of cores.
Otherwise cores from different nodes will be grouped together in surprising and unintended ways, probably resulting in suboptimal performance.

The optimal grouping strategy and number of groups depend on the total number of cores used in the calculation, the number of independent tasks to be done in parallel, as well as the parallel scalability of the engine itself. In practice, finding the best setup can be a bit tricky.

Suppose, as an example, that we want to calculate the elastic properties of a bulk material on a 32 core machine. The calculation of the elastic tensor should be done on a relaxed geometry, including relaxed lattice degrees of freedom. We therefore first perform a geometry optimization, before calculating the elastic tensor. In AMS this can easily be done with the following input:

    Task GeometryOptimization

    GeometryOptimization
      OptimizeLattice True
    End

    Properties
      ElasticTensor True
    End

But what is the optimal parallel setup for this calculation? First we recognize that performing a lattice optimization requires the calculation of the stress tensor at every step of the optimization. Assuming that our bulk system does not have any symmetries AMS can exploit, the numerical calculation of the stress tensor (which most engines cannot calculate analytically) would require 12 independent strained calculations for every step in the geometry optimization. Once the geometry optimization is converged, we have to perform 84 independent geometry optimizations to determine the elements of the elastic tensor. In summary, the graph of dependencies between all these tasks looks like this:

[Figure: task dependency graph]

How do we best parallelize this? For the main steps, e.g. GOStep1, there is no question: We have nothing to do in parallel and all 32 cores work on it together to finish it as quickly as possible. For the numerical calculation of the stress tensor we have 12 tasks that can be done in parallel by the 32 cores in our machine.
Now 12 obviously does not divide 32 without a remainder, so there is no way to split the cores into equally sized groups and do all 12 strains in parallel. The greatest common divisor of 12 and 32 is 4, so it’s probably best to split into 4 groups of 8 cores each. This is done with nGroups 4. Each group would then do 3 of the 12 strained calculations sequentially, using the in-engine parallelization to speed up the individual calculations. Once the stress tensor is computed in this way, all groups merge and all 32 cores work together on GOStep2. This splitting and merging continues until the geometry optimization is converged.

For the elastic tensor we now have 84 tasks to perform in parallel, where each task is a completely separate geometry optimization (without optimizing the lattice) of a strained system. 84 tasks is more than double the number of cores we have. In this case it is probably best to just run as parallel as possible at the driver level and make 32 “groups” of just one core to throw the 84 tasks at. This is easily done by setting nCoresPerGroup 1 in the ElasticTensor%Parallel block.

Putting everything together, we should add the following to our input file in order to optimally utilize our machine for this example calculation:

    NumericalDifferentiation
      Parallel
        nGroups 4
      End
    End

    ElasticTensor
      Parallel
        nCoresPerGroup 1
      End
    End

## Running AMS on compute clusters

AMS is parallelized with MPI and can therefore be run in parallel on distributed memory machines, aka compute clusters. See the installation manual for general documentation on how to set up and run all the programs from the Amsterdam Modeling Suite on compute clusters. In this section we give some more advice that is specific to the AMS driver and its engines.

Normally users use the login node to prepare their jobs and input files somewhere in their home directory, and also want the results of their jobs to end up there.
Quite often, compute clusters are set up such that the user’s home directory is also mounted on the compute nodes, usually via NFS (Network File System). Before the introduction of the AMS driver it was not recommended to cd to the home directory in the submission script and have the compute nodes execute the job directly there. This was simply due to the fact that a lot of file I/O was done on temporary files in the present working directory, which in this case would be on a slow network-mounted file system.

With AMS, on the other hand, switching to the home directory is the preferred way of running on a cluster where the home directory is mounted on the compute nodes. Running in the home directory mounted over NFS does not come with a performance penalty for AMS, but has many advantages. This is because AMS and its engines are already built under the assumption that access to this directory is slow.

Basically there are three directories that are used by the AMS driver and its engines:

1. The starting directory, i.e. the present working directory at the time the AMS driver is started. This folder is generally read-only for AMS, except for creating the results directory there at the beginning of a calculation. Note that all relative paths in the AMS input, e.g. for loading results from previous calculations, are relative to the starting directory. The starting directory is assumed to be on a slow filesystem, but since data is normally only read once from there at the beginning of a calculation, this is in practice not a problem.
2. The results directory, where the results of a calculation as well as important intermediate steps (e.g. restart files) are collected. It also contains the log file, which can be used to monitor a running calculation. The results directory is assumed to be on a slow filesystem, so AMS and its engines will be very careful not to do much disk I/O there.
   Generally something is only written to the results directory when AMS is sure that it should remain on disk when the calculation finishes. The results directory can also contain some intermediate restart files, so the contents of the results directory should be all that is needed in case the calculation crashes or is killed before it finishes normally.
3. The scratch directory, the location of which is set with the $SCM_TMPDIR environment variable, see also the installation manual. This directory should be put on a fast disk, e.g. an SSD in the compute node, as it will be used to store temporary results on disk. Users do not really need to care or know about the temporary files in the scratch directory. Normally, any files and directories created in the scratch directory are cleaned up at the end of the calculation. In case of errors, AMS tries to copy anything useful (e.g. the text output of all the different ranks) to the results directory in order to make finding the problem easier. However, for some kinds of crashes (or if the SIGKILL signal is sent to AMS), the cleanup of the scratch directory might not be performed, in which case users might want to manually check or remove the amstmp_* folders in the scratch directory.

With this setup there is no performance penalty for running directly on a network mounted home directory: Results will just be put there immediately, instead of being copied there at the end of a calculation.
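Checking the scratch directory for leftover amstmp_* folders after a crash, as mentioned in the description of the scratch directory, could look roughly like the sketch below. The helper name `list_leftover_amstmp` and the example path are hypothetical; only the amstmp_* pattern and the $SCM_TMPDIR variable come from the text above.

```shell
#!/bin/sh
# Sketch: list leftover amstmp_* folders in a scratch directory.
# list_leftover_amstmp is a hypothetical helper name, not an AMS tool.

list_leftover_amstmp() {
    # Use the first argument if given, otherwise fall back to $SCM_TMPDIR.
    scratch_dir=${1:-${SCM_TMPDIR:-}}
    if [ -z "$scratch_dir" ]; then
        echo "no scratch directory given and SCM_TMPDIR is not set" >&2
        return 1
    fi
    for dir in "$scratch_dir"/amstmp_*; do
        # The glob stays literal when nothing matches, so test for a directory.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
    return 0
}

# Usage (hypothetical path; inspect the list before deleting anything):
#   list_leftover_amstmp /scratch/myuser
```

The function only lists folders and deletes nothing, because a folder that looks stale may still belong to another job running on the same node.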

Normally all batch systems provide an environment variable that is set to the directory from which the job was submitted, which is then where one should cd in the run script:

    #!/bin/sh

    # Change to the directory the job was submitted from (if set by the batch system)
    if [ -n "$PBS_O_WORKDIR" ]; then
        # PBS batch system
        cd "$PBS_O_WORKDIR"
    elif [ -n "$SLURM_SUBMIT_DIR" ]; then
        # Slurm batch system
        cd "$SLURM_SUBMIT_DIR"
    elif [ -n "..." ]; then
        # add other batch systems as necessary, replacing the ... placeholders
        cd "..."
    fi

    export AMS_JOBNAME=myJob

    # Normal AMS text input, but with all paths
    # relative to where the job was submitted from, e.g.:
With this run script the AMS driver creates a myJob.results folder in the directory from which the job was submitted, and there is no need to copy results around manually in the run script. The run script also always produces exactly the same files in the same locations, no matter if it is run interactively or submitted to a compute node through the batch system. Furthermore, all paths in the input file can be specified relative to the location from which the run script is submitted (normally the folder in which the run script is located). This removes the need to copy or specify absolute paths to previous results, e.g. when restarting calculations. Finally, files useful for monitoring the running calculation are also conveniently there and not hidden somewhere on the compute node.
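Since the results folder name follows from AMS_JOBNAME, monitoring a running job from the login node can be scripted along the lines of the sketch below. Two things here are assumptions not stated in this section: that the job name defaults to "ams" when AMS_JOBNAME is not set, and that the log file inside the results folder is called ams.log. The helper name `ams_results_dir` is hypothetical.

```shell
#!/bin/sh
# Sketch: derive the results folder name from AMS_JOBNAME.
# Assumption: without AMS_JOBNAME the job name defaults to "ams".
# ams_results_dir is a hypothetical helper name, not part of AMS.

ams_results_dir() {
    printf '%s.results\n' "${AMS_JOBNAME:-ams}"
}

# Usage, assuming the log file in the results folder is called ams.log:
#   tail -f "$(ams_results_dir)/ams.log"
```

Run from the submission directory, this follows the log of the job without logging in to the compute node.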