File: example2.tex

package info (click to toggle)
ball 1.5.0%2Bgit20180813.37fc53c-11
links: PTS, VCS
area: main
in suites: bookworm
size: 239,924 kB
sloc: cpp: 326,149; ansic: 4,208; python: 2,303; yacc: 1,778; lex: 1,099; xml: 958; sh: 322; javascript: 164; makefile: 88
file content (101 lines) | stat: -rw-r--r-- 4,496 bytes
parent folder | download | duplicates (9)
\section{Handling Proteins}

In the second example we take a step towards real world applications. Instead
of constructing our own molecules, we now read a protein from a \newterm{PDB
file}. The PDB format~\cite{PDB} is a standardized file format for molecular 
structure data. In our case, we will read BPTI (bovine pancreatic trypsin 
inhibitor), a small protein:

\begin{lstlisting}{}
  PDBFile infile("bpti.pdb");
  System S;
  infile >> S;
  infile.close();
\end{lstlisting}

\noindent
In the first line, we create a \class{PDBFile} object and assign it to a file
named {\tt bpti.pdb}. Then we create an empty system {\tt S} and read the
contents of the PDB file into the system using the stream operator {\tt >>}.
\index{operator {\tt >>}} This use of the stream operators is possible for all
file formats in BALL (see also the first example on 
Page~\pageref{useofstreams}). Hence, you can easily switch file formats by 
changing just the type of the {\tt infile} object (\eg replace it by 
\class{HINFile} to read HyperChem files).

We can now verify whether the file was correctly read. BPTI should have 454
atoms. As mentioned before, for each kernel object containing atoms
(\ie classes that are derived from \class{AtomContainer}), we can obtain the
actual number of atoms by calling \method{countAtoms}:

\begin{lstlisting}{}
  cout << "# of atoms in BPTI: " << S.countAtoms() << endl;
\end{lstlisting}

\noindent
Now, we are interested in the sequence of BPTI. Since BPTI contains only a
single chain, we might just traverse all residues of this chain and write
their names to {\tt cout}. This is done by \newterm{iterators}. Iterators are
objects that can be used to traverse container objects (\eg lists, arrays, or
-- in our case -- proteins). Iterators ``point'' to an element of a container
object. They may be incremented to get the next element and they may be
dereferenced similar to pointers by using {\tt *} or {\tt ->}. In fact,
C pointers can be thought of as iterators.
The use of iterators in BALL is similar to the
use of iterators in the \newterm{Standard Template Library} (\Index{STL}).
But there is a significant difference. STL containers usually contain objects of
one single type (\eg a list of strings). In BALL kernel objects, this is
different. A system may contain atoms, proteins, molecules, residues, \etc
This leads to a difference in the interface. A typical {\tt for} loop using STL
iterators to access all elements of a list looks as follows:

\begin{lstlisting}{}
  list<string> string_list = ...;
  list<string>::iterator list_it;
	
  for (list_it = string_list.begin(); 
       list_it != string_list.end();
       ++list_it)
  {
    cout << *list_it << endl;
  }
\end{lstlisting}

\noindent
Here, we use a list iterator ({\tt list\_it}) to traverse the whole list ({\tt
string\_list}). The method {\tt begin()} returns an iterator that points to
the first element of the list. In the {\tt for} loop we increment the iterator
until it equals the iterator returned by {\tt end()}. This method returns a
past--the--end iterator, \ie the iterator points beyond the last element of
the container. In the body of the {\tt for} loop, we access the list element
the iterator points to using the {\tt *} operator. \index{operator {\tt *}}

Traversing a BALL kernel data structure is as simple as traversing a list with
STL. But since kernel objects may contain objects of a variety of types,
we have to define over which objects we intend to iterate.
For example iterating over all residues of our system requires a
\class{ResidueIterator}. Clearly, we also need different {\tt begin()} and
{\tt end()} methods for all data types. Hence, the loop that prints the
sequence of our protein reads as follows:

\begin{lstlisting}{}
	ResidueIterator res_it;
	for (res_it = S.beginResidue(); 
			 res_it != S.endResidue();
			 ++res_it)
	{
		cout << res_it->getName() << " ";
	}
	cout << endl;
\end{lstlisting}

\noindent
This loop iterates over all residues in {\tt S} and uses the method {\tt
Residue::}\method{getName()} to access the residues name. All proteins and
residues are traversed in the order in which they appear in the PDB file.
Since the PDB format requires the residues to start with the N-terminus,
the sequence is printed  in the correct order (N-terminus to C-terminus). 

The source code for this example can be found as {\tt tutorial2.C} in
\directory{BALL/source/TUTORIAL}. The PDB file {\tt pbti.pdb} can also be found
in this directory.