File: example3.tex

package info (click to toggle)
ball 1.5.0%2Bgit20180813.37fc53c-11
links: PTS, VCS
area: main
in suites: bookworm
size: 239,924 kB
sloc: cpp: 326,149; ansic: 4,208; python: 2,303; yacc: 1,778; lex: 1,099; xml: 958; sh: 322; javascript: 164; makefile: 88
file content (227 lines) | stat: -rw-r--r-- 10,251 bytes
parent folder | download | duplicates (9)
\section{A Simple AMBER Calculation}

\index{Energy minimization}
\index{optimizing hydrogens}

Having introduced the basics of handling proteins in the last chapter, we now
turn towards real-life examples: performing a molecular dynamics simulation
with a protein structure from PDB. Again, we will be using BPTI, reading it
from a PDB file as in the previous example:

\begin{lstlisting}{}
	PDBFile	infile("pdb4pti.ent");
	System S;
	infile >> S;
	infile.close();
\end{lstlisting}

\noindent
The file we chose to read here is the original file as obtained from
the PDB, therefore it contains neither hydrogen atoms, nor bonds.
The BALL class \class{FragmentDB} provides a convenient way to solve
both problems. \class{FragmentDB} is an extendible database of residue
structures. By comparing the residues in the PDB file with the reference
templates in the \class{FragmentDB}, we can identify the missing bonds
and hydrogen atoms. A matching between atoms is computed based on the names,
so the atom names have to adhere to the PDB naming convention.
For deviating naming schemes, \class{FragmentDB} provides a member instance
\member{}{normalize\_names}, which tries to convert the names to the PDB 
naming convention. \member{}{normalize\_names} is a processor, so we
can \member{Composite}{apply} it to any given kernel data structure:

\begin{lstlisting}{}
	FragmentDB db("fragments/Fragments.db");
	S.apply(db.normalize_names);
\end{lstlisting}

\noindent
Instantiating \class{FragmentDB} usually takes a few seconds to parse the
fragment database and the naming conversion tables stored in
\directory{BALL/data/Fragments}. The resulting data may then be used by two
very handy processors: \processor{add\_hydrogens} and \processor{build\_bonds}.
As their names suggest, those processors add the missing hydrogen atoms and
rebuild missing bonds. We will now add the missing atoms and bonds of BPTI 
through simple application of the two processors:

\begin{lstlisting}{}
	S.apply(db.add_hydrogens);
	S.apply(db.build_bonds);
\end{lstlisting}
\index{adding hydrogens}
\index{building bonds}

\noindent
Now we have constructed a complete protein structure of BPTI. We can verify
this by applying a \class{ResidueChecker} to the system.
\class{ResidueChecker} is a processor that performs a number of consistency
checks on a given kernel data structure:
\begin{itemize}
	\item check for missing atoms
	\item check for overlapping atoms (closer than 0.5 \AA)
	\item check for integrality of residue charges (not relevant here)
	\item check bond lengths (should be within 15\% of the template bond lengths)
\end{itemize}
The information on missing atoms and bond lengths is taken from an instance
of \class{FragmentDB}:\index{fragment database}

\begin{lstlisting}{}
	ResidueChecker rc(db);
	S.apply(rc);
\end{lstlisting}
\index{checking residues}
	
\noindent
If the \class{ResidueChecker} notices a problem with the structure, it will 
print a warning. In our case, it was (hopefully) correct, so nothing will
happen.

Although our protein structure is correct, the positions of the added hydrogen
atoms are only approximations of the true positions.
\class{Add\-Hydrogens\-Processor} tries to set the positions based on the
positions given in the \class{FragmentDB} templates, which usually deviate
from their optimal position in the given structure. We can relax those
hydrogen positions using a molecular mechanics calculation.
We will use the AMBER force field~\cite{AMBER95} and optimize the hydrogen 
positions while keeping the heavy atoms rigid. The AMBER force field is
implemented in the \class{AmberFF} class. Instantiating a force field and
setting up a calculation is very simple:\index{AMBER force field}

\begin{lstlisting}{}
	AmberFF amber(S);
\end{lstlisting}

\noindent
This constructor-call creates a new \class{AmberFF}, reads the parameter
file (the default one, \file{amber94.ini}, which corresponds to the AMBER
file {\tt parm94.dat}), and constructs some internal data structures.
This particular instance of the force field is now {\em bound} to the 
system and all its actions will apply to that system, unless a different
system is specified in a call to \method{setup}.

We now want to optimize the hydrogens only. This can be achieved through the
{\em selection} concept of the kernel classes. Whether an atom is selected or
not has different meanings at different stages of the force field calculation.
When calling \method{setup} (as the above constructor does implicitly),
the force field will ignore all unselected atoms and only selected atoms will
be part of the computation. There is one special case: If the whole system is
unselected, it will be treated as if it had been selected completely (this is
just for convenience).

{\em After} \method{setup} has been called, the selection gets a different
meaning. It now indicates which atoms are to be optimized (in an energy
minimization), moved (in an MD simulation) or considered for the force and
energy evaluations at all. Thus by deselecting the whole system (the default)
we ensure that all atoms are considered by the force field. If we want to
optimize the hydrogen atoms only, we have to select them. One way to do that
is the \class{Selector} class. Given a BALL {\em kernel expression} (see the
documentation for \class{Expression} for details), it will select all atoms
for which the expression evaluates to true. In our case, the expression 
{\tt "element(H)"} describes the set of atoms we are interested in:
\index{selecting atoms}

\begin{lstlisting}{}
	Selector hydrogen_selector("element(H)");
	S.apply(hydrogen_selector);
\end{lstlisting}

\noindent
An energy minimization of those hydrogens is done using the
\class{Conjugate\-Gradient\-Minimizer}. It is not hard to figure out that this
class indeed implements a conjugate gradient energy minimization.
In a similar fashion as the force field is bound to a system, the 
energy minimizer instances are bound to the force field. Again, we can use a
constructor to do most of the work:

\begin{lstlisting}{}
	ConjugateGradientMinimizer cgm(amber);
	cgm.setEnergyOutputFrequency(1);
	cgm.setMaxGradient(5.0);
	std::cout << "Minimizer options:" << std::endl;
	cgm.options.dump();
	cgm.minimize(100);
\end{lstlisting}

\noindent
The minimizer object {\tt cgm} has a variety of options controlling its
behavior (please consult the reference manual for all possible options).
After instantiating it, we first adjust its {\em energy output frequency},
\ie the number of iterations performed before a status message is printed.
This status message contains information on the current energy and the
gradient. \method{setMaxGradient} adjusts the convergence criterion for the
minimizer: as soon as the RMS gradient of the system falls below 5~kJ/(mol
\AA), the minimization is aborted. Please note that all energies in BALL
are in kJ/mol, all positions and distances in {\AA} and therefore the gradient
in kJ/(mol \AA). The current settings of the minimizer are all stored in 
the member \member{EnergyMinimizer}{options} (a public instance of 
\class{Options}), so the internal state of the minimizer is readily obtained 
by dumping \member{EnergyMinimizer}{options} to {\tt cout}.

Finally, a call to \method{minimize} with argument 100 will cause the
minimizer to run for
(at most) 100 iterations. The result is a structure of BPTI containing all
the atoms at optimized positions, so now we can perform an MD simulation of
the whole protein. Obviously, we now have to deselect the hydrogen atoms.
Selection is recursive, so deselecting or selecting a residue will
select/deselect all its atoms, selecting a system will select all its
molecules, and so on. For a more detailed description, please read Section
\ref{section:kernelclasses}. The easiest way to deselect all atoms is
therefore to deselect the whole system:

\begin{lstlisting}{}
  S.deselect();
\end{lstlisting}

\noindent
Similar to energy minimization, molecular dynamics (MD) simulation is also
implemented as its own class, \class{MolecularDynamics}. There are two derived
classes: \class{CanonicalMD} and \class{MicroCanonicalMD}, implementing an MD
simulation in the canonical ensemble (isothermal) and the microcanonical
ensemble (adiabatic). 

For a protein immersed in water, the canonical ensemble is the obvious choice.
We will furthermore run the simulation in a cubic box with periodic boundary 
conditions. So the first step is to set up that box and fill it with water.
Luckily, water is the default solvent in BALL, so all you have to do is
let the force field know that you want to set up a box with water around the
current system:

\begin{lstlisting}{}
  amber.options[PeriodicBoundary::Option::PERIODIC_BOX_ENABLED] 
      = true;
  amber.options[PeriodicBoundary::Option::PERIODIC_BOX_ADD_SOLVENT] 
      = true;
  amber.setup(S);
\end{lstlisting}

\noindent
Setting the option {\tt PERIODIC\_BOX\_ENABLED} will cause the force field to
enable periodic boundary conditions at the next call to \method{setup}. In
analogy, setting the option {\tt PERIODIC\_BOX\_ADD\_SOLVENT} will add the 
default solvent to the box. By default, the box is defined as the bounding box
of the system augmented by 5~\AA{} in each direction. You can also specify 
arbitrary bounding boxes or different solvents. Please refer to the 
documentation of \class{PeriodicBoundary} for a more detailed description.

We can now instantiate the molecular dynamics object, set up a run at 300~K and
perform a few  MD simulation steps:

\begin{lstlisting}{}
  CanonicalMD md(amber);
  md.setReferenceTemperature(300);
  md.simulate(10);
  std::cout << "Simulation settings:" << std::endl;
  md.options.dump();
\end{lstlisting}

\noindent
As in the force field and the energy minimizer, the MD simulation object
stores all its settings in an \class{options} object. The
\method{simulate} method simulates a given number of MD
steps. 

The source code for the complete example can be found as {\tt tutorial3.C}
in \directory{BALL/source/TUTORIAL}. Further documentation is available for all
classes in the BALL Reference Manual. The header files required for molecular
mechanics reside in \directory{BALL/include/BALL/MOLMEC} and its
subdirectories.