General description

Starting with version adf2005.01, the pdb2adf utility is part of the official release; previously it was available on the contributed software page. Starting with adf2008.01, the NEWQMMM subkey is supported if the environment variable SCM_PDB2ADF is set to NEW.

The pdb2adf utility was written to read a PDB file, which contains the atomic coordinates of a protein structure, and to transform it into an ADF input file, in particular for QM/MM calculations. As of the current release, it can also be used to set up a solvent shell around a solute molecule.

PDB files are generally used for protein structures and follow a fixed, column-based format; see http://www.wwpdb.org/docs.html and the section on the official PDB format below.

For every residue/molecule present in the PDB file, a fragment file should be available, either in the general ADF library (the $ADFRESOURCES/pdb2adf directory) or in the local directory from which pdb2adf is called. Fragment files in the local directory take priority over those in the general ADF library. The format of the fragment files is loosely based on AMBER parameter files. They contain information about the residue, e.g. the atoms present with their general and force-field atom names, the atomic charges, and the connections to other atoms that are used to construct atom positions missing from the PDB file; see the section on fragment files below. The ADF library provides fragment files for the amino acid residues (including their N- and C-terminal variants), three solvents (water, methanol, chloroform), some ions that frequently occur in protein structures (copper, fluoride), etc.
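The search order described above (local directory first, then the ADF library) can be sketched as follows. This is a hypothetical helper, not part of pdb2adf; in particular it assumes that a fragment file is named after its residue (e.g. HOH for water), which is not stated here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_fragment_file(residue: str, workdir: Path,
                       env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the fragment file for `residue`, preferring the local directory.

    Hypothetical naming assumption: the file is named after the residue,
    e.g. "HOH" for water.
    """
    candidates = [Path(workdir) / residue]            # local directory first
    resources = env.get("ADFRESOURCES")
    if resources:                                     # then the general ADF library
        candidates.append(Path(resources) / "pdb2adf" / residue)
    for path in candidates:
        if path.is_file():
            return path
    return None                                       # no fragment file found
```

Returning None rather than raising leaves it to the caller to report which residues lack a fragment file.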

The ADF library also contains solvent box files, which can be used to place a layer of solvent around the protein or around a solute. Boxes are available for the three solvents mentioned above.

After reading the PDB file and the corresponding fragment files, the program tries to determine which atoms are missing and adds them, using the information in the fragment files. Several protonation states are possible for certain amino acid residues; e.g. histidine can be protonated at the N-delta position, at the N-epsilon position, or at both. By default, the charged forms of aspartate (Asp), glutamate (Glu) and lysine (Lys) are used, while the protonation state of each histidine (His) and cysteine (Cys) residue is decided individually. For these individual cases, the program lists the distances to neighboring molecules/residues, which may help to determine the protonation state. See the protein example below.
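pdb2adf prints these distances itself. Purely to illustrate what such a listing involves, the sketch below collects the atoms of other residues within an assumed cut-off of 3.5 Angstrom of a given atom (e.g. a histidine nitrogen). The cut-off, the tuple layout of an atom and the function name are all hypothetical.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical atom representation: (atom name, residue name, residue number, (x, y, z)).
Atom = Tuple[str, str, int, Tuple[float, float, float]]


def neighbours(target: Atom, atoms: Iterable[Atom],
               cutoff: float = 3.5) -> List[Tuple[float, Atom]]:
    """Atoms of other residues within `cutoff` Angstrom of `target`, nearest first.

    Residues are told apart by residue number only (chains are ignored here).
    """
    _, _, target_res, target_xyz = target
    hits = []
    for atom in atoms:
        if atom[2] == target_res:
            continue                                  # skip the residue itself
        d = math.dist(target_xyz, atom[3])            # Euclidean distance (Python 3.8+)
        if d <= cutoff:
            hits.append((d, atom))
    hits.sort(key=lambda hit: hit[0])
    return hits
```

Calling it for both ND1 and NE2 of a histidine gives comparable listings for the two candidate protonation sites.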

Once this has been set up, a list of residue names/numbers is given, from which you can select the residues to be placed in the QM system. Afterwards, for each selected QM residue, you choose where to cut off the QM part. The most appropriate cut-off point seems to be the C-alpha atom, except for proline (Pro). Proline is cyclic: its side chain is bonded back to the backbone nitrogen, forming a ring that contains the C-alpha carbon. For this residue it may be better to include the C-alpha, the H-alpha, and the backbone carbonyl group of the preceding residue in the QM part.

The program tries to name the ADF input file by replacing the ".pdb" extension of the PDB file with ".pdb2adf". For convenience, it also writes a ".p2a.pdb" file containing the complete system as built by the program. This file can be opened in conventional viewers (such as iMol, VMD, Molekel, ADFview) to check visually that everything has been carried out correctly.
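The naming rule can be written as a small function. The fallback for names that do not end in ".pdb" (simply appending the new extensions) and the case-sensitive comparison are assumptions, not documented pdb2adf behavior.

```python
from pathlib import Path
from typing import Tuple


def output_names(pdb_file: str) -> Tuple[Path, Path]:
    """Return (ADF input file, check-PDB file) names for a PDB file name.

    Assumption: a name without a ".pdb" extension simply gets the new
    extensions appended; the comparison is case-sensitive.
    """
    path = Path(pdb_file)
    base = path.with_suffix("") if path.suffix == ".pdb" else path
    return (base.with_name(base.name + ".pdb2adf"),
            base.with_name(base.name + ".p2a.pdb"))
```

For example, 4azu.pdb gives 4azu.pdb2adf and 4azu.p2a.pdb.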

Two examples are given below: one for a protein, and one for setting up a solvent shell.

Things to notice

  • The current QM/MM implementation in ADF is limited to a total of 1000 QM/MM atoms. A new, more flexible implementation without this limit is underway; it is available through the NEWQMMM subkey (work in progress).
  • The NEWQMMM format is used if the environment variable SCM_PDB2ADF is set to NEW.
  • The pdb2adf program uses AMBER parameter files and is set up to work with the AMBER95 version of the AMBER force field, which was designed for, and works well for, biosystems.
  • For questions or remarks, contact support@scm.com.
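The first two notes amount to a simple pre-flight check before a QM/MM run. The helpers below are hypothetical and not part of pdb2adf; they assume an exact, case-sensitive comparison of SCM_PDB2ADF with "NEW".

```python
import os
from typing import Mapping

QMMM_ATOM_LIMIT = 1000  # limit of the current (old) QM/MM implementation


def uses_newqmmm(env: Mapping[str, str] = os.environ) -> bool:
    """True if the NEWQMMM format would be written (assumed exact match on "NEW")."""
    return env.get("SCM_PDB2ADF") == "NEW"


def check_system_size(n_atoms: int, env: Mapping[str, str] = os.environ) -> None:
    """Raise ValueError if the old QM/MM format is used with too many atoms."""
    if not uses_newqmmm(env) and n_atoms > QMMM_ATOM_LIMIT:
        raise ValueError(
            f"{n_atoms} QM/MM atoms exceeds the limit of {QMMM_ATOM_LIMIT}; "
            "set SCM_PDB2ADF=NEW to use the NEWQMMM format"
        )
```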

Official PDB format

Columns  Data type     Field       Definition
1 - 6    Record name   "ATOM" or "HETATM"
7 - 11   Integer       serial      Atom serial number.
13 - 16  Atom          name        Atom name.
17       Character     altLoc      Alternate location indicator.
18 - 20  Residue name  resName     Residue name.
22       Character     chainID     Chain identifier.
23 - 26  Integer       resSeq      Residue sequence number.
27       AChar         iCode       Code for insertion of residues.
31 - 38  Real(8.3)     x           Orthogonal coordinates for X in Angstroms.
39 - 46  Real(8.3)     y           Orthogonal coordinates for Y in Angstroms.
47 - 54  Real(8.3)     z           Orthogonal coordinates for Z in Angstroms.
55 - 60  Real(6.2)     occupancy   Occupancy.
61 - 66  Real(6.2)     tempFactor  Temperature factor.
73 - 76  LString(4)    segID       Segment identifier, left-justified.
77 - 78  LString(2)    element     Element symbol, right-justified.
79 - 80  LString(2)    charge      Charge on the atom.
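The column table maps directly onto string slicing. The strict reader below is a sketch, not pdb2adf's own code: it converts the 1-based, inclusive column ranges of the table into Python slices and reads the fields used by pdb2adf. The names AtomRecord, cols and parse_atom_record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    record: str
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain_id: str
    res_seq: int
    i_code: str
    x: float
    y: float
    z: float


def cols(line: str, first: int, last: int) -> str:
    """Columns first..last of `line`, 1-based and inclusive as in the table."""
    return line[first - 1:last]


def parse_atom_record(line: str) -> AtomRecord:
    """Strict fixed-column reader for ATOM/HETATM records; raises ValueError otherwise."""
    record = cols(line, 1, 6).strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    return AtomRecord(
        record=record,
        serial=int(cols(line, 7, 11)),
        name=cols(line, 13, 16).strip(),
        alt_loc=cols(line, 17, 17).strip(),
        res_name=cols(line, 18, 20).strip(),
        chain_id=cols(line, 22, 22).strip(),
        res_seq=int(cols(line, 23, 26)),
        i_code=cols(line, 27, 27).strip(),
        x=float(cols(line, 31, 38)),
        y=float(cols(line, 39, 46)),
        z=float(cols(line, 47, 54)),
    )
```

Applied to the example lines below, this strict reader accepts the last two but rejects the first two, whose residue-number field (columns 23-26) contains the chain letter.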

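Typical examples from PDB files (the ruler line marks every tenth column; note that the first two lines do not follow the official column layout):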

         1         2         3         4         5         6         7         8

ATOM     76  O   GLY    A9       6.671  55.354  35.873  1.00 14.75      A
ATOM     77  N   ASN   A10       6.876  53.257  36.629  1.00 16.09      A
ATOM     62  O   GLY A   9       6.791  55.214  35.719  1.00 15.61      4AZU 153
ATOM     63  N   ASN A  10       6.892  53.135  36.555  1.00 12.64      4AZU 154

The pdb2adf utility is flexible and should be able to read most PDB files, even those with incomplete or erroneous line formats. From every ATOM/HETATM line, it tries to read:

  • atom number
  • atom name
  • residue name
  • chain identifier
  • residue number
  • X, Y, Z coordinates

Hints for proper formatting:

  • always group together atoms that belong to one residue
  • always give the atom name in columns 13-16
  • when specifying a chain identifier, use only letters (or a blank)
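A more forgiving reader, in the spirit of the description and hints above, can read columns 22-26 as one field and split it into an optional chain letter and a residue number. This is a hypothetical sketch: the coordinates are still read from the fixed columns 31-54, and the real pdb2adf parser may use different heuristics.

```python
import re
from typing import NamedTuple, Tuple


class PdbAtom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    xyz: Tuple[float, float, float]


# Optional chain letter, optional blanks, then the residue number: "A   9", "A10", "12".
_CHAIN_RES = re.compile(r"([A-Za-z]?)\s*(-?\d+)")


def read_atom_line_tolerant(line: str) -> PdbAtom:
    """Tolerant reader for ATOM/HETATM lines (hypothetical, not pdb2adf's own code)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    # Columns 22-26 are read together, so a chain letter that has shifted
    # into the residue-number field is still recognized.
    match = _CHAIN_RES.fullmatch(line[21:26].strip())
    if match is None:
        raise ValueError(f"cannot read chain/residue from {line[21:26]!r}")
    return PdbAtom(
        serial=int(line[6:11]),                     # columns 7-11
        name=line[12:16].strip(),                   # columns 13-16, as the hints require
        res_name=line[17:20].strip(),               # columns 18-20
        chain_id=match.group(1),
        res_seq=int(match.group(2)),
        xyz=tuple(float(line[a:b]) for a, b in ((30, 38), (38, 46), (46, 54))),
    )
```

With this reader, all four example lines above give chain A, with residue numbers 9, 10, 9 and 10.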

Contents of a fragment file

Given below are the contents of the fragment file for water. The first line is a comment line; the only important parameter on it is the NOCONNECT keyword, which indicates that the program should not try to make any connections to other residues/molecules. The next three lines define the orientation of the residue in space; they are not used for general fragments, but are relevant and important for amino acid residues and DNA nucleotides. Finally, each atom in the molecule has a line containing:

  • its number in the fragment;
  • its name as used in PDB files;
  • the AMBER force-field atom type;
  • a dummy atom name;
  • connections and coordinates (bond length, angle, dihedral angle) relative to other atoms in the molecule, from which the position of the atom can be constructed if it is not present in the PDB file;
  • the atomic charge;
  • after the exclamation mark (!), the connections to other atoms in this fragment, or to other fragments in the case of amino acid residues/DNA nucleotides.

The current version does not use the latter connections yet, but the next version will probably use them.

HOH  Water molecule  NOCONNECT
   1   DUMM  DU    M      0   0   0       0.0000      0.0000      0.0000
   2   DUMM  DU    M      1   0   0       1.4490      0.0000      0.0000
   3   DUMM  DU    M      2   1   0       1.5220    111.1000      0.0000
   4   O     OW    O      0   0   0       0.0000      0.0000      0.0000  -0.8340  !  5  6
   5   H1    HW    H      4   0   0       0.9572      0.0000      0.0000   0.4170  !  4
   6   H2    HW    H      4   5   0       0.9572    104.5200      0.0000   0.4170  !  4
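The line layout described above can be read with a short parser. This sketch assumes that the fields are whitespace-separated and appear in the order listed; FragmentAtom and parse_fragment are hypothetical names, and real fragment files may rely on fixed columns instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FragmentAtom:
    number: int
    pdb_name: str                 # name as used in PDB files
    amber_type: str               # AMBER force-field atom type
    dummy_name: str
    refs: Tuple[int, int, int]    # bond, angle, dihedral partners (0 = none)
    bond: float
    angle: float
    dihedral: float
    charge: Optional[float]       # absent on the three orientation lines
    bonded: Tuple[int, ...]       # connections listed after "!"


def parse_fragment(text: str) -> Tuple[str, List[FragmentAtom]]:
    """Split a fragment file into its header (comment) line and its atom lines."""
    lines = [l for l in text.splitlines() if l.strip()]
    header, atoms = lines[0], []
    for line in lines[1:]:
        fields, _, tail = line.partition("!")
        f = fields.split()
        atoms.append(FragmentAtom(
            number=int(f[0]), pdb_name=f[1], amber_type=f[2], dummy_name=f[3],
            refs=(int(f[4]), int(f[5]), int(f[6])),
            bond=float(f[7]), angle=float(f[8]), dihedral=float(f[9]),
            charge=float(f[10]) if len(f) > 10 else None,
            bonded=tuple(int(t) for t in tail.split()),
        ))
    return header, atoms
```

For the water file above, "NOCONNECT" in header.split() is true, and the charges of atoms 4 to 6 add up to zero, as expected for a neutral molecule.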

Contents of solvent box files

The first line is a comment line. It is followed by a line with the total number of atoms in the solvent box and the dimensions of the box (in Angstroms). Then, for each atom in the box, there is a line with the atom name, which must match the PDB atom name, and its Cartesian coordinates, again in Angstroms.
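A reader for this layout might look as follows. It is a sketch that assumes whitespace-separated fields; since the number of box dimensions on the second line is not specified here, it accepts any number of values. parse_solvent_box is a hypothetical name.

```python
from typing import List, Tuple

BoxAtom = Tuple[str, Tuple[float, float, float]]


def parse_solvent_box(text: str) -> Tuple[str, Tuple[float, ...], List[BoxAtom]]:
    """Return (comment, box dimensions, atoms) from a solvent box file."""
    lines = text.splitlines()
    comment = lines[0]
    header = lines[1].split()
    n_atoms = int(header[0])
    dims = tuple(float(v) for v in header[1:])      # box dimensions in Angstrom
    atoms: List[BoxAtom] = []
    for line in lines[2:]:
        if not line.strip():
            continue                                # tolerate blank lines
        name, x, y, z = line.split()[:4]
        atoms.append((name, (float(x), float(y), float(z))))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, dims, atoms
```

Checking the atom count against the second line catches truncated box files early.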