\chapter[Data Customization]{Data Customization}
\label{chap:data-customization}
\index{data~customization}

In this chapter, the user will be walked through an example of how new
polymer chemistry definition data can be generated and included in the
automatic ``data detection system'' of \mXp\ (that is, how new polymer
chemistry definitions should be registered with the system).

Customization is typically performed by the normal user (neither the
Administrator nor the Root of the machine) and, as such, new data are
typically stored in the user's ``home'' directory. On \OSname{UNIX}
machines, the ``home'' directory is usually the
\filename{/home/username} directory, where username is the login
user~name. On \OSname{MS-Windows}, that directory is typically the
\filename{C:/Documents and Settings/username}\footnote{Although
  \OSname{MS-Windows} pathnames use a back~slash, in this book these
  are composed using forward slashes for a number of valid reasons.
  The reader only needs to replace the forward slashes with
  back~slashes.}, once again with username being the logon user~name.

In the next sections we will refer to that ``home directory'' (be it
on \OSname{UNIX} or \OSname{MS-Windows} machines) as the \$HOME
directory, as this is the standard environment variable describing that
directory in \OSname{GNU/Linux}.

When \mXp\ is executed, it automatically tries to read data
configuration files from the home directory (in the
\filename{.massxpert} directory). Once this is done, it reads all the
data configuration files in the installation directory (typically, on
\OSname{GNU/Linux} that would be the configuration data in the
\filename{/usr/local/share/massxpert} directory or, on
\OSname{MS-Windows}, the \filename{C:/Program Files/massxpert}
directory).

We said above that \mXp\ tries to read the data configuration files
from the home directory. But upon its very first execution, right
after installation, that directory does not exist, and in fact \mXp\
creates that directory for us to populate it some day with interesting
new data.

The \filename{\$HOME/.massxpert} directory should have a structure
mimicking the one that was created upon installation of the software,
that is, it should contain the following two directories: \medskip

\begin{itemize}

  \item \filename{pol-chem-defs}

  \item \filename{plugins}

\end{itemize}

\noindent Those are the directories where the user is invited to store
her personal data. In order to start a new definition, one might
simply copy there one of the polymer chemistry definitions that are
shipped with \mXp. What should be copied? An entire polymer chemistry
definition directory, like for example the following:\medskip

\noindent
\filename{/usr/local/share/massxpert/pol-chem-defs/protein-1-letter}

\smallskip or \smallskip

\noindent \filename{C:/Program
  Files/massxpert/data/pol-chem-defs/protein-1-letter}
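\medskip

\noindent The copy can also be scripted. What follows is only a minimal
sketch in Python, not something shipped with \mXp: the
\verb|copy_definition| function and its \verb|home_dir| parameter are
hypothetical names chosen for this illustration, and the source
directory is assumed to be one of the installation directories given
above.

\begin{verbatim}
import shutil
from pathlib import Path


def copy_definition(source_dir, home_dir=None):
    """Copy a shipped definition directory into the personal data tree.

    Hypothetical helper: home_dir defaults to the user's home directory.
    """
    source = Path(source_dir)
    home = Path(home_dir) if home_dir is not None else Path.home()
    target = home / ".massxpert" / "pol-chem-defs" / source.name
    # Create $HOME/.massxpert/pol-chem-defs if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # copytree raises FileExistsError if the target already exists,
    # which protects a definition that is already being customized.
    shutil.copytree(source, target)
    return target
\end{verbatim}

\noindent For example, calling \verb|copy_definition| with
\filename{/usr/local/share/massxpert/pol-chem-defs/protein-1-letter}
as its argument would create a \filename{protein-1-letter} directory
in \filename{\$HOME/.massxpert/pol-chem-defs}.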

\bigskip

\noindent Once that polymer chemistry definition is copied, one may
start studying how it actually works. This directory contains the
following kinds of files: \medskip

\begin{itemize}

\item \filename{protein-1-letter.xml}: the polymer chemistry
  definition file. This is the file that is read upon selection of the
  corresponding polymer chemistry definition name in \xpd. If the
  polymer chemistry definition is not yet registered with the system
  (described later), then that file can be opened by browsing to it
  after clicking the \guilabel{Cancel} button\footnote{See
    chapter~\ref{chap:xpertdef}, page~\pageref{chap:xpertdef}.};

\item \fileformat{svg} files: \textit{scalable vector graphics} files
  used to render graphically the sequence in the sequence editor. For
  example, \filename{arginine.svg} contains the graphical
  representation of the arginine monomer. There are such graphics
  files also for the modifications (for example,
  \filename{sulphation.svg} contains the graphical representation of
  the sulphation modification).
  Figure~\ref{fig:pol-chem-defs-directory-protein-and-saccharide}
  shows two examples of \fileformat{svg} files belonging to two
  distinct polymer chemistry definitions;

\item \filename{chem\_pad.conf}: configuration file for the chemical
  pad in the \xpc\ module;

\item \filename{monomer\_dictionary}: file establishing the relationship
  between any monomer code of the polymer chemistry definition and the
  graphical \fileformat{svg} file to be used to render graphically
  that monomer in the sequence editor;

\item \filename{modification\_dictionary}: file establishing the
  relationship between any monomer modification\footnote{See section
    \ref{subsect:chemical-modification-monomers},
    page~\pageref{subsect:chemical-modification-monomers}.} and the
  graphical \fileformat{svg} file to be used to render graphically
  that modification onto the modified monomer in the sequence editor;

\item \filename{cross\_linker\_dictionary}: file establishing the
  relationship between any cross-link\footnote{See section
    \ref{subsect:monomer-cross-link},
    page~\pageref{subsect:monomer-cross-link}.} and the graphical
  \fileformat{svg} file to be used to render graphically that
  cross-link onto the cross-linked monomers in the sequence editor;

\item \filename{pka\_ph\_pi.xml}: file describing the acido-basic
  data\footnote{See section \ref{sect:acido-basic-calculations},
    page~\pageref{sect:acido-basic-calculations}.}  pertaining to
  ionizable chemical groups in the different entities of the polymer
  chemistry definition.


\end{itemize}

\begin{figure}
  \begin{center}
    \includegraphics [height=0.75\textheight]
    {figures/pol-chem-defs-directory-protein-and-saccharide.png}
  \end{center}
  \caption[The polymer chemistry definition directory]{\textbf{The
      polymer chemistry definition directory.} Each monomer of the
    polymer chemistry definition ought to have a corresponding
    \fileformat{svg} file with which it has to be rendered graphically
    should that monomer be inserted in the polymer sequence. This
    example shows two \fileformat{svg} files corresponding to two
    monomers each belonging to a different polymer chemistry
    definition.}
  \label{fig:pol-chem-defs-directory-protein-and-saccharide}
\end{figure}


\noindent The polymer sequence editor is not a classical editor. There
is no font in this editor: when the user starts keying-in a polymer
sequence in the editor, the small \fileformat{svg} graphics files are
rendered into raster \textit{vignettes} at both the proper resolution
and screen size and displayed in the sequence editor. The user is
totally in charge of designing the \fileformat{svg} graphics files for
each of the monomers defined in the polymer sequence editor. Of
course, reusing material is perfectly possible. There is one
constraint: that the \filename{monomer\_dictionary} file lists with
precision ``what code goes with what \fileformat{svg} graphics
file''. That file has the following contents, for example, for the 
``protein-1-letter'' polymer chemistry definition, as shipped in the
\mXp\ package:

\begin{verbatim}

# This file is part of the massXpert project.

# The "massXpert" project is released ---in its entirety--- under the
# GNU General Public License and was started (in the form of the GNU
# polyxmass project) at the Centre National de la Recherche
# Scientifique (FRANCE), that granted me the formal authorization to
# publish it under this Free Software License.

# Copyright (C) 2006,2007 Filippo Rusconi

# This is the monomer_dictionary file where the correspondences
# between the codes of each monomer and their graphic file (pixmap
# file called "image") used to graphicallly render them in the
# sequence editor are made.

# The format of the file is like this :
# -------------------------------------

# A%alanine.svg

# where A is the monomer code and alanine.svg is a
# resolution-independent svg file.

# Each line starting with a '#' character is a comment and is ignored
# during parsing of this file.

# This file is case-sensitive.

A%alanine.svg
C%cysteine.svg
D%aspartate.svg
E%glutamate.svg
F%phenylalanine.svg
G%glycine.svg
H%histidine.svg
I%isoleucine.svg
K%lysine.svg
L%leucine.svg
M%methionine.svg
N%asparagine.svg
P%proline.svg
Q%glutamine.svg
R%arginine.svg
S%serine.svg
T%threonine.svg
V%valine.svg
W%tryptophan.svg
Y%tyrosine.svg

\end{verbatim}


\noindent What one sees from the contents of the file is that each
monomer code has an associated \fileformat{svg} file. For example,
when the user has to key-in a valine monomer, she keys-in the code
\kbdKey{V} and \xpe\ knows that the monomer vignette to show has to be
rendered using the \filename{valine.svg} file.
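
The format described above is simple enough to be checked with a few
lines of code before the definition is tested in \xpe. The following
Python sketch is an illustration only and certainly not the parser used
by \mXp: the function names \verb|parse_monomer_dictionary| and
\verb|missing_svg_files| are hypothetical, and the sketch assumes that
surrounding white space on a line can be ignored.

\begin{verbatim}
from pathlib import Path


def parse_monomer_dictionary(text):
    """Return a {monomer code: svg file name} mapping."""
    mapping = {}
    for raw_line in text.splitlines():
        # Assumption: leading and trailing white space is not significant.
        line = raw_line.strip()
        # Blank lines and '#' comment lines carry no data.
        if not line or line.startswith("#"):
            continue
        code, separator, svg_name = line.partition("%")
        if not separator or not code or not svg_name:
            raise ValueError(f"malformed line: {raw_line!r}")
        mapping[code] = svg_name
    return mapping


def missing_svg_files(mapping, definition_dir):
    """List the svg files named in the mapping but absent from the directory."""
    directory = Path(definition_dir)
    return sorted(name for name in set(mapping.values())
                  if not (directory / name).is_file())
\end{verbatim}

\noindent With the file shown above, the parsed mapping associates the
code \kbdKey{V} with \filename{valine.svg}; an empty list returned by
\verb|missing_svg_files| indicates that every listed graphics file is
present in the definition directory.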


For the monomer modification graphical rendering, the situation is
somewhat different, as seen in the \filename{modification\_dictionary}
file:

\begin{verbatim}

# This file is part of the massXpert project.

# The "massXpert" project is released ---in its entirety--- under the
# GNU General Public License and was started (in the form of the GNU
# polyxmass project) at the Centre National de la Recherche
# Scientifique (FRANCE), that granted me the formal authorization to
# publish it under this Free Software License.

# Copyright (C) 2006,2007 Filippo Rusconi

# This is the modification_dictionary file where the correspondences
# between the name of each modification and their graphic file (pixmap
# file called "image") used to graphicallly render them in the
# sequence editor are made. Also, the graphical operation that is to
# be performed upon chemical modification of a monomer is listed ('T'
# for transparent and 'O' for opaque). See the manual for details.

# The format of the file is like this :
# -------------------------------------

# Phosphorylation%T%phospho.svg

# where Phosphorylation is the name of the modification. T indicates
# that the visual rendering of the modification is a transparent
# process (O indicates that the visual rendering of the modification
# is a full image replacement 'O' like opaque). phospho.svg is a
# resolution-independent svg file.


# Each line starting with a '#' character is a comment and is ignored
# during parsing of this file. 

# This file is case-sensitive.

Phosphorylation%T%phospho.svg
Sulphation%T%sulpho.svg
AmidationAsp%O%asparagine.svg
Acetylation%T%acetyl.svg
AmidationGlu%O%glutamine.svg
Oxidation%T%oxidation.svg

\end{verbatim}

\noindent There are two ways to render a chemical modification of a
monomer: \medskip

\begin{itemize} 

\item \textbf{Opaque} rendering: the initial monomer vignette is
  replaced with the one listed in the file for the modification. This
  is visible in the \verb|AmidationGlu%O%glutamine.svg| line: when a
  monomer (typically a Glu monomer) is amidated, the
  graphical representation of the modification process should involve
  the \textit{replacement} of the old vignette in the sequence editor
  with the new one (in the example, the new vignette should be
  rendered using the \filename{glutamine.svg} file). In other words,
  the process involves an ``\textbf{O}paque'' overlay of the vignette
  for unmodified Glu with a vignette rendered by using the
  \filename{glutamine.svg} file.

\item \textbf{Transparent} rendering: the initial monomer vignette is
  overlaid with one new vignette that is rendered using a
  \fileformat{svg} file that is transparent (except for the graphical
  motif that has to be visible, of course). One example is the
  ``Phosphorylation'' modification (line
  \verb|Phosphorylation%T%phospho.svg|), for which the monomer being
  phosphorylated has its vignette in the sequence editor overlaid with
  a ``\textbf{T}ransparent'' one which only shows a small red `P' and
  that is rendered using the \filename{phospho.svg} file.

\end{itemize}
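
\noindent The three-field format of the
\filename{modification\_dictionary} file can be read in the same
spirit. The Python sketch below is a hedged illustration, not the code
used by \mXp: the names \verb|RENDERING_MODES| and
\verb|parse_modification_dictionary| are hypothetical, and any rendering
letter other than `T' or `O' is assumed to be an error.

\begin{verbatim}
# The two rendering letters documented in the file header.
RENDERING_MODES = {"T": "transparent", "O": "opaque"}


def parse_modification_dictionary(text):
    """Return a {modification name: (rendering mode, svg file name)} mapping."""
    rules = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and '#' comment lines carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split("%")
        # Assumption: exactly three fields, the second being 'T' or 'O'.
        if len(fields) != 3 or fields[1] not in RENDERING_MODES:
            raise ValueError(f"malformed line: {raw_line!r}")
        name, mode, svg_name = fields
        rules[name] = (RENDERING_MODES[mode], svg_name)
    return rules
\end{verbatim}

\noindent With the file shown above, ``AmidationGlu'' is associated
with an opaque rendering using \filename{glutamine.svg}, while
``Phosphorylation'' is associated with a transparent rendering using
\filename{phospho.svg}.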


\noindent New \fileformat{svg} files may be created or edited with
the following programs: \medskip

\begin{itemize}

\item \progname{Inkscape}: on \OSname{GNU/Linux} and \OSname{MS-Windows};

\item \progname{Karbon}: on \OSname{GNU/Linux};

\end{itemize}


\noindent In general, the best thing to do is to convert any text in
the drawing to paths, so that the rendering does not depend on the
fonts available on the machine.

\bigskip

\fbox{\parbox{0.9\textwidth}{It is absolutely essential, for the proper
    working of the sequence editor, that the \fileformat{svg} files be
    square (that is, width = height).}}

\bigskip
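
The squareness requirement can be verified automatically. The sketch
below, in Python, is given under explicit assumptions: it supposes that
the root element of each \fileformat{svg} file carries both a
\verb|width| and a \verb|height| attribute, written as a number with an
optional unit, and it treats files that do not as errors. The function
name \verb|svg_is_square| is hypothetical.

\begin{verbatim}
import re
import xml.etree.ElementTree as ET

# A number followed by an optional unit, for example "48", "48.5" or "48px".
_LENGTH = re.compile(r"([0-9]*\.?[0-9]+)\s*([a-zA-Z%]*)")


def svg_is_square(svg_path):
    """Return True if the root element has equal width and height."""
    root = ET.parse(svg_path).getroot()
    sizes = []
    for attribute in ("width", "height"):
        value = root.get(attribute)
        # Assumption: both attributes are present on the root element.
        match = _LENGTH.fullmatch(value.strip()) if value else None
        if match is None:
            raise ValueError(f"{svg_path}: unusable {attribute}: {value!r}")
        sizes.append((float(match.group(1)), match.group(2)))
    # Square means the same number and the same unit for both dimensions.
    return sizes[0] == sizes[1]
\end{verbatim}

\noindent Running such a check over all the \fileformat{svg} files of a
definition directory before registering it may spare some puzzling
display problems in the sequence editor.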

Once the new polymer chemistry has been correctly defined, it is time
to register that new definition with the system. To recap: all the
files for that definition should reside in the same directory, exactly
the same way as the files pertaining to a given polymer chemistry
definition are shipped with \mXp\ together in one directory. The name
of the new polymer chemistry definition should be unambiguous with
respect to other registered polymer chemistry definitions.

The way a polymer chemistry definition is registered is by creating a
personal polymer chemistry definition catalogue file, which must
comply with two requirements:\medskip

\begin{itemize}

\item Be named \filename{xxxxx-pol-chem-defs-cat}, with
  \filename{xxxxx} being a discretionary string (this might well be
  your name, for example). The requirement is that
  \textbf{\filename{-pol-chem-defs-cat}} be the last part of the
  filename. Please \textit{DO NOT USE} spaces, punctuation or
  diacritical signs in your filenames. \textit{RESTRICT} yourself to
  the ASCII characters [a-z], [0-9], `\_' and `-'.\footnote{This
    is actually a very general recommendation, meant to spare you
    severe headaches when you least expect them\dots}

\item Be located in the \filename{\$HOME/.massxpert/pol-chem-defs}
  directory and have the following format: \smallskip

  \verb|dna=/path/to/definition/directory/dna/dna.xml|. In this
  example, the ``dna'' polymer chemistry definition is being
  registered as a file \filename{dna.xml} located in the
  \filename{dna} directory, itself located in the
  \filename{/path/to/definition/directory} directory.

\end{itemize}
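
\noindent Both requirements can be honoured by writing the catalogue
file with a small script. The Python sketch below is an illustration
under stated assumptions, not a tool provided by \mXp: the
\verb|write_catalogue| function and its parameters are hypothetical
names, and the sketch assumes that a catalogue may hold several entries
written one per line.

\begin{verbatim}
import re
from pathlib import Path

CATALOGUE_SUFFIX = "-pol-chem-defs-cat"
# Only the characters recommended above: [a-z], [0-9], '_' and '-'.
_SAFE_PREFIX = re.compile(r"[a-z0-9_-]+")


def write_catalogue(prefix, definitions, home_dir=None):
    """Write <prefix>-pol-chem-defs-cat and return its path.

    definitions maps each definition name to the path of its xml file.
    """
    if not _SAFE_PREFIX.fullmatch(prefix):
        raise ValueError(f"unsafe catalogue prefix: {prefix!r}")
    home = Path(home_dir) if home_dir is not None else Path.home()
    catalogue_dir = home / ".massxpert" / "pol-chem-defs"
    catalogue_dir.mkdir(parents=True, exist_ok=True)
    catalogue = catalogue_dir / (prefix + CATALOGUE_SUFFIX)
    # Assumption: one "name=path" entry per line.
    lines = [f"{name}={xml_path}" for name, xml_path in definitions.items()]
    catalogue.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return catalogue
\end{verbatim}

\noindent For the example above, the \verb|definitions| argument would
map \verb|dna| to \verb|/path/to/definition/directory/dna/dna.xml|.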

\noindent Note that if a new polymer chemistry definition should be
made available system-wide, then it is logical that its directory be
placed alongside the ones shipped with \mXp; a new local catalogue
file might then be created to register the new polymer chemistry
definition.

At this point the new polymer chemistry definition might be
tested. Typically, that involves restarting the \mXp\ program and
creating a brand new polymer sequence of the new definition type. The
first step is to check if the new definition is successfully
registered with the system, that is, it should show up as an available
definition upon creation of the new polymer sequence. If not, then that
means that the catalogue file could not be found or parsed
correctly.

When problems like this one occur, the first thing to do is to ensure
that the console window is available (on \OSname{MS-Windows} it is
systematically started along with the program; on \OSname{GNU/Linux}
the way to have it is to start the program from the shell) so as to
look attentively at the different messages that might help in
understanding what is failing.

Please, do not hesitate to submit bug reports (see the first pages of
this manual for the address where to post bug reports).