AMBER Prep file formats

1.  PREP


Usage:

      prep [-O] -i input -o output -p params


-O   Overwrite output files if they exist.

_______________________________________________________________


     NOTE: Leap replaces Prep, Link, Edit and Parm with a  much
simpler, single program.

     The  purpose  of  this module is to add  new  residues  to
the  standard  AMBER residue database, create new databases, or
to  create  new residues as individual LINK-readable files.  It
is not necessary to run PREP if all residues needed for a simu-
lation  are  already  present in the standard  AMBER  database,
described in the LINK documentation.  A residue  is  the  basic
molecular  unit  of  the AMBER simulation package.  It is typi-
cally an amino acid or nucleic acid unit, but could be a  pros-
thetic group, a small molecule, or a single ion.

     Tree Structure: The geometry of the residue is described
by a "tree" structure to enable the LINK module to successfully
connect it to a larger structure.  The atoms in a residue are
classified into eight topological types: "Main", "Side",
"Branch", "3", "4", "5", "6" and "End".  They are denoted M, S,
B, 3, 4, 5, 6 and E respectively.

     Main atoms describe the principal   "path"   through   the
residue,   starting  at  the connection to the previous residue
and ending at the connection to the next  residue.   The   LINK
module   will   connect  the last main atom of a residue to the
first main atom of the next residue  in   the   molecule.    If
there   is  only  one residue in a molecule, the main atoms are
typically the longest continuous  non-intersecting  chain.  The
main   type  atoms   can have  1, 2, 3, or 4 atoms connected to
them.

     Any atom that is not a main atom is described  by  one  of
the   other topological types: "E", "S", "B", "3", "4", "5", or
"6".  An "E" atom has only one connection to other atoms,  thus
is  a "dead end"  for  any branch from any other atom type.  An
"S" atom must have a total of two connections to  other  atoms,
a   "B" atom  must have a total of three connections, and a "3"
atom has a total of four connections; the same applies
for "4", "5", and "6".  The topological types described here can
only describe acyclic systems.  In order to describe the topol-
ogy  of  cyclic systems, explicit loop closing bonds are speci-
fied using the LOOP  command  described  below.   Loop  closing
bonds  are  not  counted as connections when  assigning  M,  E,
S, B, 3, 4, 5, 6 topological types.  If an atom has  more  than
four  connections, it is not defined in the present tree struc-
ture.
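
     The connection counts above can be sketched as a small
lookup.  This is only an illustration, not part of PREP: the
function name side_tree_type is hypothetical, and only the four
cases spelled out in the text (E, S, B and "3") are handled.

```python
def side_tree_type(n_connections):
    """Return the tree symbol for a non-main atom.

    n_connections is the number of bonds to other atoms, NOT
    counting loop closing bonds.  Hypothetical helper: only the
    cases stated explicitly in the text are covered.
    """
    symbols = {1: "E", 2: "S", 3: "B", 4: "3"}
    if n_connections not in symbols:
        raise ValueError(
            "no tree type defined here for %d connections" % n_connections
        )
    return symbols[n_connections]
```

     In the phenylalanine example at the end of this document,
CG is bonded to CB, CD1 and CD2, but CG-CD2 is a loop closing
bond, so only two connections are counted and CG is typed "S".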

     Dummy atoms: PREP requires that three dummy atoms  precede
the  actual  atoms of the residue.  These atoms are simply used
to define the space axes for  the  residue.   The  three  dummy
atoms  must be given the topological type "M", and they must be
assigned a force field atom type that defines them   as   dummy
atoms.   The   symbol   "DU"   is  recommended to be consistent
with the standard database.   It  is  necessary  to  have   the
three  initial   dummy   atoms   whether  internal or cartesian
coordinates are given as input.

     It is important for the proper functioning  of  the   EDIT
module   that  dummy  atoms be left in the first residue of the
system, but that they be removed in  any  subsequent  residues.
Therefore  you  should specify the "NOMIT" flag for any initial
residue, and the "OMIT" flag for all others.  In  typical   use
of   AMBER,  peptide systems are either started with the acetyl
residue ACE, which carries the dummy  atoms   with  it  in  the
standard  database,  or  they  are  started  with  a charged N-
terminal  residue,  which  also  has  dummy  atoms.   Likewise,
nucleic acid systems are generally started with the HB residue,
which also has dummy atoms.  The other residues in the
database have had their dummy atoms stripped at the end of
PREP through the use of the "OMIT" option.

     In the examples below, topological  types   are   assigned
and   the  atoms  are numbered in correct tree structure order.
An actual PREP input file appears at the end  of   this   docu-
ment.


Example 1:



            M(1)--M(2)  \
                  |      | <--- 3 dummy atoms to define space axes
                  M(3)  /
                  |
         Res ---- M(4)-- M(5)-- M(7)-- M(11)-- Res
          n-1            |      |      |        n+1
                  ^      |      |      |
                  |      E(6)   S(8)   B(12)-- S(14)-- E(15)
            first real          |      |
               atom             |      |
                        E(10)-- S(9)   E(13)


Example 2:


            M(1)--M(2)  \
                  |      | <--- 3 dummy atoms to define space axes
                  M(3)  /
                  |
         Res ---- M(4)-- M(6)-- M(13)-- M(23)--- Res
          n-1     |      |      |       |         n+1
                  |      |      |       |
                  E(5)   S(7)   S(14)   E(24)
                         |      |
                         |      |
          E(10)-- B(9)-- S(8)   S(15)   E(18)
                  |             |       |
                  |             |       |
                  S(11)         S(16)-- 3(17)-- S(21)-- E(22)
                  |                     |
                  |                     |
                  E(12)                 S(19)-- E(20)


     Note  on  Tree  Ordering: The tree structure begins at the
first dummy atom, and traverses the main chain until  a  branch
point  (node)  is  reached.  That branch is traversed until its
end or until the next node is reached.  When you come to a node
with  more  than  one  branch (topological type "B" or "3"), it
doesn't matter which branch is traversed first as long  as  you
return to the next higher node when an end is reached.
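
     The ordering rule can be sketched as a depth-first walk.
The sketch below is an illustration under assumptions, not PREP
code: the function name tree_order and the atom labels are
hypothetical, and the rule that side branches are finished
before the next main atom is inferred from the numbering in
Examples 1 and 2.

```python
def tree_order(children, main, root):
    """Return atoms in tree order, starting from root.

    children maps each atom to the atoms hanging off it; main is
    the set of main-chain atoms.  At every atom, all side
    branches are walked to their ends before the walk continues
    along the main chain.
    """
    order = []

    def visit(atom):
        order.append(atom)
        kids = children.get(atom, [])
        for kid in kids:          # side branches first
            if kid not in main:
                visit(kid)
        for kid in kids:          # then continue along the main chain
            if kid in main:
                visit(kid)

    visit(root)
    return order


# Connectivity of Example 1, labelled with the numbers in the diagram.
children = {
    "M1": ["M2"], "M2": ["M3"], "M3": ["M4"], "M4": ["M5"],
    "M5": ["M7", "E6"], "M7": ["M11", "S8"], "S8": ["S9"],
    "S9": ["E10"], "M11": ["B12"], "B12": ["E13", "S14"],
    "S14": ["E15"],
}
main = {"M1", "M2", "M3", "M4", "M5", "M7", "M11"}
order = tree_order(children, main, "M1")
```

     For this connectivity the walk visits the atoms in the same
sequence as the numbers shown in Example 1.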

     PREP  input  files  for  standard peptide and nucleic acid
residues are typically maintained in several  large  files  for
generation  of  the  standard database  for  the  LINK  module.
Note  that it is not necessary to run the  PREP  module  unless
non-standard  residues  are  needed.  Non-standard residue data
may be output as individual files or appended to  the  standard
database if desired.

     The LINK module is currently dimensioned to handle a maxi-
mum of 150 atoms per residue.

     Note that smaller, neutral residues are most appropriate
unless an infinite cutoff is desired, because the first atoms
in each residue are used in applying the cutoff.  The larger the
residue, the more unbalanced the cutoff, i.e. the greater the
difference between head-to-head and tail-to-tail orientations.

     This module was originally written by P. K. Weiner at UCSF
and  overhauled by U. C. Singh in 1984. The data base structure
was completely modified.  Prep was revised for Rev A by  George
Seibel in 1989.

Input description: This section describes the residue(s) input
file, which is read through unit 5.  The input is free format,
and the different fields (including character fields) are
assumed to be separated by at least one space.  Character
variables are always left justified.  If a character field
contains more than four characters, the rest are ignored.  If it
contains fewer, it is padded with blanks.  Since blanks
separate fields, a sign must immediately precede its number.
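
     The free-format rules can be illustrated with a short
sketch.  It is not PREP's reader: the helper names split_card
and char_field are hypothetical and only mimic the behavior
described above.

```python
def split_card(line):
    """Split one free-format card into fields on runs of blanks."""
    return line.split()


def char_field(token):
    """Mimic a four-character, left-justified character field.

    Characters beyond the fourth are dropped; shorter values are
    padded on the right with blanks.
    """
    return token[:4].ljust(4)


fields = split_card("  CORRECT  OMIT DU   BEG")
stored = [char_field(f) for f in fields]

# A sign separated from its number becomes a field of its own,
# which is why the sign must touch the digits.
good = split_card("-0.5200")
bad = split_card("- 0.5200")
```

     Here "CORRECT" is stored as "CORR", and "- 0.5200" splits
into two fields instead of one number.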


     ------------------------------------------------------------------------

        - 1 -       CONTROL FOR DATA BASE GENERATION

              The data base is a direct access file containing the
              standard residues and a directory of their names.  It
              is named DB4.DAT in the version 4 AMBER distribution,
              and is found in the DAT directory.  The LINK module
              will search this file for a residue before searching
              the external files for it.  The LINK module can only
              access one database per run.  Thus if any user supplied
              residues are needed, they can be accessed by LINK as
              individual files.  The data base can also be appended
              with user supplied residues if desired.

              IDBGEN , IREST , ITYPF

                  FORMAT(3I)

        IDBGEN      Flag for data base generation
         = 0  No database generation.  Output will be individual files.
              This is the standard procedure if you want to create a
              single small molecule.
         = 1  A new data base will be generated or the existing database
              will be appended.

        IREST       Flag for the type of generation (assuming IDBGEN = 1)
         = 0  New data base
         = 1  Appending an existing data base

        ITYPF       Force field type code (used in LINK stage)
              Ignored if IDBGEN = 0.  The following codes are used in
              the standard database:
         = 1  United atom model
         = 2  All atom model
       = 100  United atom charged N-terminal amino acid residues
       = 101  United atom charged C-terminal amino acid residues
       = 200  All atom charged N-terminal amino acid residues
       = 201  All atom charged C-terminal amino acid residues

              Note:  This variable allows you to have several different
              models for the same residue name stored in one database.
              These models could differ in topology, charge, or other
              factors.  The charged terminal residues are selected
              internally by LINK if the IFTPRO flag is set.  The
              database can hold up to 510 residues.

     ------------------------------------------------------------------------

        - 2 -       NAMDBF

                  FORMAT(A80)

        NAMDBF      Name of the data base file (maximum 80 characters).
              If no data base is being generated, leave a BLANK CARD.

     ------------------------------------------------------------------------

        - 3 -      TITLE

                  FORMAT(20A4)

        TITLE      Descriptive header for the residue

     ------------------------------------------------------------------------

        - 4 -      NAMF

                 FORMAT(A80)

        NAMF       Name of the output file if an individual residue file is
             being generated.  If database is being generated or
             appended this card IS read but ignored.

     ------------------------------------------------------------------------

        - 5 -      NAMRES , INTX , KFORM

                 FORMAT(2A,I)

        NAMRES     A unique name for the residue of maximum 4 characters

        INTX       Flag for the type of coordinates to be saved for the
             LINK module
      'INT'  internal coordinates will be output (preferable)
      'XYZ'  cartesian coordinates will be output

        KFORM      Format of output for individual residue files
        = 0  formatted output (recommended for debugging)
        = 1  binary output

     ------------------------------------------------------------------------

        - 6 -      IFIXC , IOMIT , ISYMDU , IPOS

                 FORMAT(4A)

        IFIXC      Flag for the type of input geometry of the residue(s)

      'CORRECT' The geometry is input as internal coordinates with
                correct order according to the tree structure.
                NOTE: the tree structure types ('M', 'S', etc) and order
                must be defined correctly: NA(I), NB(I), and NC(I) on card
                8 are always ignored.
      'CHANGE'  It is input as cartesian coordinates or part cartesian
                and part internal.  Cartesians should precede internals
                to ensure that the resulting coordinates are correct.
                Coordinates need not be in correct order, since each
                is labeled with its atom number. NOTE: NA(I), NB(I), and
                NC(I) on card 8 must be omitted for cartesian coordinates
                with this option.

        IOMIT      Flag for the omission of dummy atoms

      'OMIT'    dummy atoms will be deleted after generating all the
                information (this is used for all but the first residue
                in the system)
      'NOMIT'   they will not be deleted (dummy atoms are retained for
                the first residue of the system.  others are omitted)

        ISYMDU     Symbol for the dummy atoms.  The symbol must be
             unique.  It is preferable to use 'DU' for it.

        IPOS       Flag for the position of dummy atoms to be deleted

      'ALL'     all the dummy atoms will be deleted
      'BEG'     only the beginning dummy atoms will be deleted

     ------------------------------------------------------------------------

        - 7 -      CUT

                 FORMAT(F)

        CUT        The cutoff distance for loop closing bonds which
             cannot be defined by the tree structure.  Any pair of
             atoms within this distance is assumed to be bonded.
             We recommend that CUT be set to 0.0 and explicit loop
             closing bonds be defined below.

     ------------------------------------------------------------------------

        - 8 -      I , IGRAPH(I) , ISYMBL(I) , ITREE(I) , NA(I) , NB(I) ,
             NC(I) , R(I) , THETA(I) , PHI(I) , CHG(I) , I = 1, NATOM

                 FORMAT(I,3A,3I,4F)

        I          The actual number of the atom in the tree.

             If IFIXC .eq. 'CHANGE' then this number is important
             since the corresponding coordinates are stored at that
             location.  If IFIXC .eq. 'CORRECT' then atoms are in
             the correct order according to the tree structure.

        NOTE:  PREP always expects three dummy atoms for the beginning.

        IGRAPH(I)  A unique atom name for the atom I. If coordinates are
             read in at the EDIT stage, this name will be used for
             matching atoms.  Maximum 4 characters.

        ISYMBL(I)  A symbol for the atom I which defines its force field
             atom type and is used in the module PARM for assigning
             the force field parameters.

        ITREE(I)   The topological type (tree symbol) for atom I
             (M, S, B, E, 3, 4, 5, or 6)

        NA(I)      The atom number to which atom I is connected.
             Read but ignored for internal coordinates; If cartesian
             coordinates are used, this must be omitted.

        NB(I)      The atom number to which atom I makes an angle along
             with NA(I).
             Read but ignored for internal coordinates; If cartesian
             coordinates are used, this must be omitted.

        NC(I)      The atom number to which atom I makes a dihedral along
             with NA(I) and NB(I).
             Read but ignored for internal coordinates; If cartesian
             coordinates are used, this must be omitted.

        R(I)       If IFIXC .eq. 'CORRECT' then this is the bond length
             between atoms I and NA(I)
             If IFIXC .eq. 'CHANGE' then this is the X coordinate
             of atom I

        THETA(I)   If IFIXC .eq. 'CORRECT' then it is the bond angle
             between atom NB(I), NA(I) and I
             If IFIXC .eq. 'CHANGE' then it is the Y coordinate of
             atom I

        PHI(I)     If IFIXC .eq. 'CORRECT' then it is the dihedral angle
             between NC(I), NB(I), NA(I) and I
             If IFIXC .eq. 'CHANGE' then it is the Z coordinate of
             atom I

        CHG(I)     The partial atomic charge on atom I

        This section is terminated by one BLANK CARD if IFIXC = 'CORRECT'.
        This section is terminated by TWO BLANK CARDS if IFIXC = 'CHANGE'.
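
        How R, THETA and PHI locate atom I relative to NA(I), NB(I)
        and NC(I) can be sketched as follows.  This is a generic
        internal-to-cartesian construction, not PREP's own routine:
        the function name place_atom is hypothetical, and the sign
        convention chosen for the dihedral (0 = cis, 180 = trans) is
        an assumption that may differ from the program's.

```python
import math


def _sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    n = math.sqrt(sum(x * x for x in u))
    return tuple(x / n for x in u)


def place_atom(nc, nb, na, r, theta_deg, phi_deg):
    """Cartesian position of atom I.

    r is the I-NA bond length, theta_deg the NB-NA-I angle and
    phi_deg the NC-NB-NA-I dihedral, both in degrees.  nc, nb
    and na are (x, y, z) tuples and must not be collinear.
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    bc = _unit(_sub(na, nb))                 # along NB -> NA
    n = _unit(_cross(_sub(nb, nc), bc))      # normal to the NC-NB-NA plane
    m = _cross(n, bc)                        # completes the local frame
    dx = -r * math.cos(theta)
    dy = r * math.sin(theta) * math.cos(phi)
    dz = r * math.sin(theta) * math.sin(phi)
    return tuple(na[i] + dx * bc[i] + dy * m[i] + dz * n[i]
                 for i in range(3))


# Trans placement one unit from NA; close to (1.0, -1.0, 0.0),
# up to floating-point rounding.
p = place_atom((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
               1.0, 90.0, 180.0)
```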

     ------------------------------------------------------------------------

        - 9 -      IOPR

                 FORMAT(A4)

        IOPR       Flag to read additional information about the residue.
             There are four options available.  The order in which
             they are specified is not important.  Format is keyword
             on its own line, followed by data on succeeding lines,
             terminated by a BLANK CARD.

         'CHARGE'  Control to read additional partial atomic charges.
             These will override charges specified above in section 8.
             The charges are read in format(5F) for the non-dummy
             atoms.  A BLANK CARD terminates this section.   It is
             less error-prone to specify charges as in section 8.

          'LOOP'   Control to read explicit loop closing bonds (in
             addition to the loops generated based on the cutoff
             criterion).  If this option is used it is preferable
             to set the cutoff criterion to zero.  The loop closing
             atoms are read in format(2A) as their atom (IGRAPH) names.
             A BLANK CARD terminates this section.

       'IMPROPER'  Control for reading the improper torsion angles.  A
             proper torsion I - J - K - L has I bonded to J bonded
             to K bonded to L.  An IMPROPER torsion is any torsion in
             which this is not the case.  Improper torsions are used to
             keep the asymmetric centers from racemizing in the united
             atom model where all the C-H hydrogens are omitted.  They
             can also be used to enforce planarity.  The normal case is:

                                J
                                |
                                K
                               / \
                              I   L

                         Improper I-J-K-L

             where the central atom (K) is the third atom in the improper
             and the order of the other three is determined alphabetically
             by atom type and if types are the same by atom number.
             The improper torsions should be defined in such a way that
             the proper torsions are not duplicated.  The atoms making the
             improper torsions are read as their atom (IGRAPH) names.
             '-M' can be used in place of an atom name to indicate the
             last main chain atom in the previous residue, and '+M' for
             the first main chain atom in the next residue. NOTE: -M and
             +M cannot be used in the 4th position ('L') owing to internal
             data representation limitations.  A BLANK CARD terminates
             this section.

          'DONE'   Control to exit from this section.


       NOTE: If extra blank cards are found between different options they
       are ignored.  Control will exit only when the 'DONE' option is
       found.  If it is desired to process another residue place the
       appropriate information after the 'DONE' card.

     ------------------------------------------------------------------------

        -10 -      KSTOP

                 FORMAT(A4)

        KSTOP      Control to exit from the program

      'STOP' Exit from the program.  It has to be placed immediately
             following the 'DONE' card.

             The program can never make a graceful exit if this card
             is missing since it is working inside an infinite loop.


_______________________________________________________________

         Example input for phenylalanine (united atom)



                          Res n-1          O
                             \            /
                              N----CA----C
                             /     |      \
                           HN      CB     Res n+1
                                   |
                                   CG
                                 /    \
                               CD1    CD2
                                |      |
                               CE1    CE2
                                 \    /
                                   CZ




         0    0    1
     
          PHENYLALANINE PREP INPUT EXAMPLE (title)
     PHE
     PHE  INT    1
     CORRECT  OMIT DU   BEG
       0.0
         1 DUMM   DU    M    0   -1   -2   0.0000    0.0000    0.0000  0.000
         2 DUMM   DU    M    1    0   -1   1.4490    0.0000    0.0000  0.000
         3 DUMM   DU    M    2    1    0   1.5220  111.1000    0.0000  0.000
         4 N      N     M    3    2    1   1.3350  116.6000  180.0000 -0.5200
         5 HN     H     E    4    3    2   1.0100  119.8000    0.0000  0.2480
         6 CA     CH    M    4    3    2   1.4490  121.9000  180.0000  0.2140
         7 CB     C2    S    6    4    3   1.5250  111.1000   60.0000  0.0380
         8 CG     CA    S    7    6    4   1.5100  115.0000  180.0000  0.0110
         9 CD1    CD    S    8    7    6   1.4000  120.0000  180.0000 -0.0110
        10 CE1    CD    S    9    8    7   1.4000  120.0000  180.0000  0.0040
        11 CZ     CD    S   10    9    8   1.4000  120.0000    0.0000 -0.0030
        12 CE2    CD    S   11   10    9   1.4000  120.0000    0.0000  0.0040
        13 CD2    CD    E   12   11   10   1.4000  120.0000    0.0000 -0.0110
        14 C      C     M    6    4    3   1.5220  111.1000  180.0000  0.5260
        15 O      O     E   14    6    4   1.2290  120.5000    0.0000 -0.5000

     IMPROPER
     -M  CA  N   HN
     CA  +M  C   O
     CB  CA  N   C

     LOOP
     CG  CD2

     DONE
     STOP
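
     As a quick sanity check on an input like the one above, the
partial charges on the atom cards can be summed.  The sketch
below is an illustration only: the function name net_charge is
hypothetical, and it assumes every atom card carries all eleven
fields of card 8.

```python
ATOM_CARDS = """\
    1 DUMM   DU    M    0   -1   -2   0.0000    0.0000    0.0000  0.000
    2 DUMM   DU    M    1    0   -1   1.4490    0.0000    0.0000  0.000
    3 DUMM   DU    M    2    1    0   1.5220  111.1000    0.0000  0.000
    4 N      N     M    3    2    1   1.3350  116.6000  180.0000 -0.5200
    5 HN     H     E    4    3    2   1.0100  119.8000    0.0000  0.2480
    6 CA     CH    M    4    3    2   1.4490  121.9000  180.0000  0.2140
    7 CB     C2    S    6    4    3   1.5250  111.1000   60.0000  0.0380
    8 CG     CA    S    7    6    4   1.5100  115.0000  180.0000  0.0110
    9 CD1    CD    S    8    7    6   1.4000  120.0000  180.0000 -0.0110
   10 CE1    CD    S    9    8    7   1.4000  120.0000  180.0000  0.0040
   11 CZ     CD    S   10    9    8   1.4000  120.0000    0.0000 -0.0030
   12 CE2    CD    S   11   10    9   1.4000  120.0000    0.0000  0.0040
   13 CD2    CD    E   12   11   10   1.4000  120.0000    0.0000 -0.0110
   14 C      C     M    6    4    3   1.5220  111.1000  180.0000  0.5260
   15 O      O     E   14    6    4   1.2290  120.5000    0.0000 -0.5000
"""


def net_charge(cards, dummy_symbol="DU"):
    """Sum the charge field over all non-dummy atom cards."""
    total = 0.0
    for line in cards.splitlines():
        fields = line.split()
        if len(fields) != 11:      # skip blank or incomplete lines
            continue
        if fields[2] == dummy_symbol:
            continue
        total += float(fields[10])
    return total


total = net_charge(ATOM_CARDS)
```

     For the phenylalanine cards the charges cancel, so the
total is zero to within floating-point rounding.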