\chapter[Data Customization]{Data Customization}
\label{chap:data-customization}
\index{data~customization}
In this chapter, the user is walked through an example of how new
polymer chemistry definition data can be generated and included in the
automatic ``data detection system'' of \mXp\ (that is, how new polymer
chemistry definitions should be registered with the system).
Customization is typically performed by a normal user (neither the
Administrator nor the root user of the machine) and, as such, new data
are typically stored in the user's ``home'' directory. On \OSname{UNIX}
machines, the ``home'' directory is usually the
\filename{/home/username} directory, where username is the login
user~name. On \OSname{MS-Windows}, that directory is typically the
\filename{C:/Documents and Settings/username}\footnote{Although
\OSname{MS-Windows} pathnames use back~slashes, in this book they
are written using forward slashes for a number of valid reasons.
The reader only needs to replace the forward slashes with
back~slashes.}, once again with username being the login user~name.
In the next sections we will refer to that ``home directory'' (be it
on \OSname{UNIX} or \OSname{MS-Windows} machines) as the \$HOME
directory, as this is the name of the standard environment variable
holding the path to that directory in \OSname{GNU/Linux}.
When \mXp\ is executed, it automatically tries to read data
configuration files from the \filename{.massxpert} directory located
in the home directory. Once this is done, it reads all the data
configuration files in the installation directory (typically, on
\OSname{GNU/Linux}, the configuration data in the
\filename{/usr/local/share/massxpert} directory or, on
\OSname{MS-Windows}, in the \filename{C:/Program Files/massxpert}
directory).
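The lookup order described above can be summarized with a short
\progname{Python} sketch. The helper name \texttt{data\_search\_dirs}
is hypothetical and is not part of \mXp; the installation directories
are the typical defaults quoted above and may well differ on a given
machine.

```python
import sys
from pathlib import Path


def data_search_dirs(home=None, platform=None):
    """Return the data directories in the order massXpert reads them.

    Hypothetical helper: the user directory comes first, then the
    installation directory. The installation prefixes are only the
    typical defaults quoted in the text.
    """
    home = Path(home) if home is not None else Path.home()
    platform = platform or sys.platform
    user_dir = home / ".massxpert"
    if platform.startswith("win"):
        system_dir = Path("C:/Program Files/massxpert")
    else:
        system_dir = Path("/usr/local/share/massxpert")
    return [user_dir, system_dir]


for directory in data_search_dirs():
    print(directory)
```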
As said above, \mXp\ tries to read the data configuration files from
the home directory. However, upon its very first execution, right
after installation, that directory does not exist yet: \mXp\ then
creates it so that the user can later populate it with new data.
The \filename{\$HOME/.massxpert} directory should have a structure
mimicking the one that was created upon installation of the software,
that is, it should contain the following two directories: \medskip
\begin{itemize}
\item \filename{pol-chem-defs}
\item \filename{plugins}
\end{itemize}
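As an illustration, the following \progname{Python} sketch creates the
\filename{\$HOME/.massxpert} directory together with the two
subdirectories it should contain. The function
\texttt{ensure\_user\_data\_dir} is hypothetical; it merely shows the
expected layout and is not needed by \mXp.

```python
from pathlib import Path

# The two subdirectories listed in the text.
SUBDIRS = ("pol-chem-defs", "plugins")


def ensure_user_data_dir(home=None):
    """Create $HOME/.massxpert and its subdirectories if missing.

    Hypothetical helper, shown for illustration only.
    """
    base = (Path(home) if home is not None else Path.home()) / ".massxpert"
    for name in SUBDIRS:
        # parents=True also creates .massxpert itself; exist_ok=True
        # makes the call harmless when the directories already exist.
        (base / name).mkdir(parents=True, exist_ok=True)
    return base
```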
\noindent Those are the directories where the user is invited to store
her personal data. In order to start a new definition, one might
simply copy into the \filename{pol-chem-defs} directory one of the
polymer chemistry definitions that are shipped with \mXp. What should
be copied? An entire polymer chemistry definition directory, for
example:\medskip
\noindent
\filename{/usr/local/share/massxpert/pol-chem-defs/protein-1-letter}
\smallskip or \smallskip
\noindent \filename{C:/Program
Files/massxpert/data/pol-chem-defs/protein-1-letter}
\bigskip
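Copying a definition directory can be done with any file manager; for
the record, a minimal \progname{Python} sketch relying on
\texttt{shutil.copytree} follows. The function
\texttt{copy\_definition} is hypothetical, and it assumes that the
destination is the \filename{pol-chem-defs} subdirectory of
\filename{\$HOME/.massxpert}.

```python
import shutil
from pathlib import Path


def copy_definition(src_def_dir, home=None):
    """Copy a shipped definition directory into the user's data.

    Hypothetical helper: the copy lands in
    $HOME/.massxpert/pol-chem-defs/<name of the source directory>.
    """
    src = Path(src_def_dir)
    home = Path(home) if home is not None else Path.home()
    dest_root = home / ".massxpert" / "pol-chem-defs"
    dest_root.mkdir(parents=True, exist_ok=True)
    dest = dest_root / src.name
    # copytree copies every file of the definition (XML file, svg
    # files, dictionaries) and refuses to overwrite an existing copy.
    shutil.copytree(src, dest)
    return dest


# Example call (GNU/Linux default installation path from the text):
# copy_definition("/usr/local/share/massxpert/pol-chem-defs/protein-1-letter")
```

Note that the copy keeps the name of the shipped definition; since the
name of a registered definition must be unambiguous, the copy will
typically have to be given a new name before it is registered.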
\noindent Once that polymer chemistry definition is copied, one may
start studying how it actually works. This directory contains the
following kinds of files: \medskip
\begin{itemize}
\item \filename{protein-1-letter.xml}: the polymer chemistry
definition file. This is the file that is read upon selection of the
corresponding polymer chemistry definition name in \xpd. If the
polymer chemistry definition is not yet registered with the system
(see below), click the \guilabel{Cancel} button and then browse to
that file to open it.\footnote{See
chapter~\ref{chap:xpertdef}, page~\pageref{chap:xpertdef}.};
\item \fileformat{svg} files: \textit{scalable vector graphics} files
used to render the sequence graphically in the sequence editor. For
example, \filename{arginine.svg} contains the graphical
representation of the arginine monomer. There are also such graphics
files for the modifications (for example, \filename{sulpho.svg}
contains the graphical representation of the sulphation
modification).
Figure~\ref{fig:pol-chem-defs-directory-protein-and-saccharide}
shows two examples of \fileformat{svg} files belonging to two
distinct polymer chemistry definitions;
\item \filename{chem\_pad.conf}: configuration file for the chemical
pad in the \xpc\ module;
\item \filename{monomer\_dictionary}: file establishing the relationship
between any monomer code of the polymer chemistry definition and the
graphical \fileformat{svg} file to be used to render graphically
that monomer in the sequence editor;
\item \filename{modification\_dictionary}: file establishing the
relationship between any monomer modification\footnote{See section
\ref{subsect:chemical-modification-monomers},
page~\pageref{subsect:chemical-modification-monomers}.} and the
graphical \fileformat{svg} file to be used to render graphically
that modification onto the modified monomer in the sequence editor;
\item \filename{cross\_linker\_dictionary}: file establishing the
relationship between any cross-link\footnote{See section
\ref{subsect:monomer-cross-link},
page~\pageref{subsect:monomer-cross-link}.} and the graphical
\fileformat{svg} file to be used to render graphically that
cross-link onto the cross-linked monomers in the sequence editor;
\item \filename{pka\_ph\_pi.xml}: file describing the acid--base
data\footnote{See section \ref{sect:acido-basic-calculations},
page~\pageref{sect:acido-basic-calculations}.} pertaining to the
ionizable chemical groups in the different entities of the polymer
chemistry definition.
\end{itemize}
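To check that a copied or newly created definition directory contains
the files listed above, one might use a small \progname{Python} sketch
such as the following. It assumes, as is the case for
\filename{protein-1-letter}, that the definition file is named after
its directory, and it treats every listed file as mandatory; both are
assumptions of this sketch, not rules known to be enforced by \mXp.

```python
from pathlib import Path

# Non-svg files listed for the shipped protein-1-letter definition.
# Treating all of them as mandatory is an assumption of this sketch.
EXPECTED_FILES = (
    "chem_pad.conf",
    "monomer_dictionary",
    "modification_dictionary",
    "cross_linker_dictionary",
    "pka_ph_pi.xml",
)


def missing_files(def_dir):
    """Return the expected files that are absent from def_dir."""
    d = Path(def_dir)
    # Assumption: the definition file is named after its directory,
    # as in protein-1-letter/protein-1-letter.xml.
    expected = [f"{d.name}.xml", *EXPECTED_FILES]
    return [name for name in expected if not (d / name).is_file()]


def svg_files(def_dir):
    """List the svg vignette files present in def_dir."""
    return sorted(p.name for p in Path(def_dir).glob("*.svg"))
```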
\begin{figure}
\begin{center}
\includegraphics [height=0.75\textheight]
{figures/pol-chem-defs-directory-protein-and-saccharide.png}
\end{center}
\caption[The polymer chemistry definition directory]{\textbf{The
polymer chemistry definition directory.} Each monomer of the
polymer chemistry definition should have a corresponding
\fileformat{svg} file that is used to render it graphically
when that monomer is inserted in the polymer sequence. This
example shows two \fileformat{svg} files corresponding to two
monomers, each belonging to a different polymer chemistry
definition.}
\label{fig:pol-chem-defs-directory-protein-and-saccharide}
\end{figure}
\noindent The polymer sequence editor is not a classical editor. There
is no font in this editor: when the user starts keying in a polymer
sequence, the small \fileformat{svg} graphics files are rendered into
raster \textit{vignettes} at the proper resolution and screen size and
displayed in the sequence editor. The user is entirely in charge of
designing the \fileformat{svg} graphics files for each of the monomers
defined in the polymer chemistry definition. Of course, reusing
existing material is perfectly possible. There is only one constraint:
the \filename{monomer\_dictionary} file must list precisely which code
goes with which \fileformat{svg} graphics file. For example, for the
``protein-1-letter'' polymer chemistry definition, as shipped in the
\mXp\ package, that file has the following contents:
\begin{verbatim}
# This file is part of the massXpert project.
# The "massXpert" project is released ---in its entirety--- under the
# GNU General Public License and was started (in the form of the GNU
# polyxmass project) at the Centre National de la Recherche
# Scientifique (FRANCE), that granted me the formal authorization to
# publish it under this Free Software License.
# Copyright (C) 2006,2007 Filippo Rusconi
# This is the monomer_dictionary file where the correspondences
# between the codes of each monomer and their graphic file (pixmap
# file called "image") used to graphicallly render them in the
# sequence editor are made.
# The format of the file is like this :
# -------------------------------------
# A%alanine.svg
# where A is the monomer code and alanine.svg is a
# resolution-independent svg file.
# Each line starting with a '#' character is a comment and is ignored
# during parsing of this file.
# This file is case-sensitive.
A%alanine.svg
C%cysteine.svg
D%aspartate.svg
E%glutamate.svg
F%phenylalanine.svg
G%glycine.svg
H%histidine.svg
I%isoleucine.svg
K%lysine.svg
L%leucine.svg
M%methionine.svg
N%asparagine.svg
P%proline.svg
Q%glutamine.svg
R%arginine.svg
S%serine.svg
T%threonine.svg
V%valine.svg
W%tryptophan.svg
Y%tyrosine.svg
\end{verbatim}
\noindent The contents of the file show that each monomer code has an
associated \fileformat{svg} file. For example, when the user needs to
key in a valine monomer, she types the code \kbdKey{V} and \xpe\ knows
that the monomer vignette to show has to be rendered using the
\filename{valine.svg} file.
For the monomer modification graphical rendering, the situation is
somewhat different, as seen in the \filename{modification\_dictionary}
file:
\begin{verbatim}
# This file is part of the massXpert project.
# The "massXpert" project is released ---in its entirety--- under the
# GNU General Public License and was started (in the form of the GNU
# polyxmass project) at the Centre National de la Recherche
# Scientifique (FRANCE), that granted me the formal authorization to
# publish it under this Free Software License.
# Copyright (C) 2006,2007 Filippo Rusconi
# This is the modification_dictionary file where the correspondences
# between the name of each modification and their graphic file (pixmap
# file called "image") used to graphicallly render them in the
# sequence editor are made. Also, the graphical operation that is to
# be performed upon chemical modification of a monomer is listed ('T'
# for transparent and 'O' for opaque). See the manual for details.
# The format of the file is like this :
# -------------------------------------
# Phosphorylation%T%phospho.svg
# where Phosphorylation is the name of the modification. T indicates
# that the visual rendering of the modification is a transparent
# process (O indicates that the visual rendering of the modification
# is a full image replacement 'O' like opaque). phospho.svg is a
# resolution-independent svg file.
# Each line starting with a '#' character is a comment and is ignored
# during parsing of this file.
# This file is case-sensitive.
Phosphorylation%T%phospho.svg
Sulphation%T%sulpho.svg
AmidationAsp%O%asparagine.svg
Acetylation%T%acetyl.svg
AmidationGlu%O%glutamine.svg
Oxidation%T%oxidation.svg
\end{verbatim}
\noindent There are two ways to render a chemical modification of a
monomer: \medskip
\begin{itemize}
\item \textbf{Opaque} rendering: the initial monomer vignette is
replaced with the one listed in the file for the modification. This
is visible in the \verb|AmidationGlu%O%glutamine.svg| line: when a
monomer (typically a Glu monomer) is amidated, the graphical
representation of the modification process should involve the
\textit{replacement} of the old vignette in the sequence editor with
the new one (in the example, the new vignette is rendered using the
\filename{glutamine.svg} file). In other words, the process involves
an ``\textbf{O}paque'' overlay of the vignette for unmodified Glu with
a vignette rendered using the \filename{glutamine.svg} file.
\item \textbf{Transparent} rendering: the initial monomer vignette is
overlaid with a new vignette that is rendered using an
\fileformat{svg} file that is transparent (except, of course, for the
graphical motif to be made visible). One example is the
``Phosphorylation'' modification (line
\verb|Phosphorylation%T%phospho.svg|), for which the monomer being
phosphorylated has its vignette in the sequence editor overlaid with
a ``\textbf{T}ransparent'' one that only shows a small red `P' and
that is rendered using the \filename{phospho.svg} file.
\end{itemize}
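The two rendering modes can be modelled in \progname{Python} as
follows: an opaque modification yields a single vignette that replaces
the monomer one, while a transparent modification is drawn on top of
it. The names \texttt{ModRendering},
\texttt{parse\_modification\_dictionary} and \texttt{vignette\_layers}
are hypothetical, and the ``list of layers'' model is a
simplification, not a description of the internals of \xpe.

```python
from typing import NamedTuple


class ModRendering(NamedTuple):
    svg: str
    opaque: bool  # True for 'O' (replace), False for 'T' (overlay)


def parse_modification_dictionary(text):
    """Parse 'Name%T%file.svg' or 'Name%O%file.svg' lines (hypothetical)."""
    entries = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if raw.startswith("#") or not raw.strip():
            continue  # comment or blank line
        parts = raw.strip().split("%")
        if len(parts) != 3 or parts[1] not in ("T", "O"):
            raise ValueError(f"line {lineno}: malformed entry {raw!r}")
        name, mode, svg = parts
        entries[name] = ModRendering(svg, mode == "O")
    return entries


def vignette_layers(monomer_svg, rendering):
    """Return the svg files to draw for a modified monomer, bottom first.

    Simplified model: 'O' replaces the monomer vignette, 'T' is drawn
    on top of it.
    """
    if rendering.opaque:
        return [rendering.svg]
    return [monomer_svg, rendering.svg]


mods = parse_modification_dictionary(
    "Phosphorylation%T%phospho.svg\nAmidationGlu%O%glutamine.svg\n")
print(vignette_layers("serine.svg", mods["Phosphorylation"]))
# ['serine.svg', 'phospho.svg']
print(vignette_layers("glutamate.svg", mods["AmidationGlu"]))
# ['glutamine.svg']
```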
\noindent New \fileformat{svg} files can be edited using programs such
as the following: \medskip
\begin{itemize}
\item \progname{Inkscape}: on \OSname{GNU/Linux} and \OSname{MS-Windows};
\item \progname{Karbon}: on \OSname{GNU/Linux};
\end{itemize}
\noindent In general, it is best to convert any text to paths, so that
the rendering does not depend on the fonts installed on the machine.
\bigskip
\fbox{\parbox{0.9\textwidth}{It is absolutely essential, for the proper
working of the sequence editor, that the \fileformat{svg} files be
square (that is, width = height).}}
\bigskip
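Whether an \fileformat{svg} file is square can be checked before it is
used. The \progname{Python} sketch below reads the \texttt{width} and
\texttt{height} attributes of the root \texttt{svg} element and falls
back to the \texttt{viewBox} attribute when they are absent. That
these attributes are what the sequence editor relies on is an
assumption of this sketch; the function \texttt{svg\_is\_square} is
hypothetical and does not convert between units.

```python
import re
import xml.etree.ElementTree as ET

# A number optionally followed by a unit such as 'px', 'mm' or '%'.
_LENGTH = re.compile(r"\s*([0-9]*\.?[0-9]+)\s*([a-z%]*)\s*$")


def _length(value):
    match = _LENGTH.match(value)
    if not match:
        raise ValueError(f"unparsable length {value!r}")
    return float(match.group(1)), match.group(2)


def svg_is_square(source):
    """Return True if the svg document (path or file object) is square."""
    root = ET.parse(source).getroot()
    width, height = root.get("width"), root.get("height")
    if width is not None and height is not None:
        (w, w_unit), (h, h_unit) = _length(width), _length(height)
        if w_unit != h_unit:
            # This sketch does not convert units.
            raise ValueError("width and height use different units")
        return w == h
    view_box = root.get("viewBox")
    if view_box is None:
        raise ValueError("no width/height or viewBox attribute")
    # viewBox is 'min-x min-y width height' (spaces and/or commas).
    _, _, vb_w, vb_h = (float(x) for x in view_box.replace(",", " ").split())
    return vb_w == vb_h
```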
Once the new polymer chemistry has been correctly defined, it is time
to register that new definition with the system. To recap: all the
files for that definition should reside in the same directory, exactly
the same way as the files pertaining to a given polymer chemistry
definition are shipped with \mXp\ in a single directory. The name of
the new polymer chemistry definition must be unambiguous with respect
to other registered polymer chemistry definitions.
A polymer chemistry definition is registered by creating a personal
polymer chemistry definition catalogue file, which must comply with
two requirements:\medskip
\begin{itemize}
\item Be named \filename{xxxxx-pol-chem-defs-cat}, with
\filename{xxxxx} being an arbitrary string (this might well be
your name, for example). The requirement is that
\textbf{\filename{-pol-chem-defs-cat}} be the last part of the
filename. Please \textit{DO NOT USE} spaces, punctuation or
diacritical signs in your filenames. \textit{RESTRICT} yourself to
ASCII characters in [a-z], [0-9], `\_' and `-'.\footnote{This
is actually a very general recommendation, meant to spare you
severe headaches when you least expect them\dots}
\item Be located in the \filename{\$HOME/.massxpert/pol-chem-defs}
directory and have the following format: \smallskip
\verb|dna=/path/to/definition/directory/dna/dna.xml|. In this
example, the ``dna'' polymer chemistry definition is registered
as the file \filename{dna.xml} located in the
\filename{dna} directory, itself located in the
\filename{/path/to/definition/directory} directory.
\end{itemize}
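The two requirements above can be checked with a short
\progname{Python} sketch, which also reports registered definitions
whose XML file cannot be found. The function names are hypothetical;
skipping blank lines and lines starting with \texttt{\#} is an
assumption, since only the \texttt{name=path} form is documented here.

```python
import re
from pathlib import Path

# Allowed characters per the text: [a-z], [0-9], '_' and '-', and the
# name must end with '-pol-chem-defs-cat'.
_CAT_NAME = re.compile(r"[a-z0-9_-]+-pol-chem-defs-cat")


def valid_catalogue_name(filename):
    return _CAT_NAME.fullmatch(filename) is not None


def parse_catalogue(text):
    """Parse 'name=/path/to/name.xml' lines into {name: Path}.

    Skipping blank and '#' lines is an assumption of this sketch.
    """
    defs = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, path = line.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"line {lineno}: expected name=path, got {raw!r}")
        if name in defs:
            raise ValueError(f"line {lineno}: {name!r} registered twice")
        defs[name] = Path(path)
    return defs


def missing_definition_files(defs):
    """Names whose definition file does not exist on disk."""
    return [name for name, path in defs.items() if not path.is_file()]
```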
\noindent Note that if a new polymer chemistry definition is to be
made available system-wide, then it is logical to place its directory
alongside the ones shipped with \mXp\ and to create a new local
catalogue file to register the new polymer chemistry definition.
At this point the new polymer chemistry definition can be
tested. Typically, that involves restarting the \mXp\ program and
creating a brand new polymer sequence of the new definition type. The
first step is to check whether the new definition is successfully
registered with the system, that is, whether it shows up as an
available definition upon creation of the new polymer sequence. If it
does not, then the catalogue file could not be found or parsed
correctly.
When a problem like this one occurs, the first thing to do is to make
sure that the console window is visible (on \OSname{MS-Windows} it is
systematically started along with the program; on \OSname{GNU/Linux},
start the program from a shell) and to read attentively the messages
it displays, which might help in understanding what is failing.
Please do not hesitate to submit bug reports (see the first pages of
this manual for the address where to post bug reports).