\input texinfo @c -*-texinfo-*-
@documentlanguage en
@documentencoding UTF-8
@c This file uses the @command command introduced in Texinfo 4.0.
@c %**start of header
@setfilename libGkArrays.info
@settitle The Gk-arrays library aims at indexing k-factors from a huge set of sequencing reads.
@finalout
@setchapternewpage odd
@iftex
@afourpaper
@afourwide
@end iftex
@ifinfo
@firstparagraphindent insert
@end ifinfo
@setcontentsaftertitlepage
@c %**end of header

@include package.texi

@set mails @email{nicolas.philippe@@lirmm.fr}, @email{mikael.salson@@lifl.fr}, @email{thierry.lecroq@@univ-rouen.fr}, @email{Martine.Leonard@@univ-rouen.fr}, @email{eric.rivals@@lirmm.fr}
@set authormails Nicolas @sc{Philippe} @email{nicolas.philippe@@lirmm.fr}, Mika@"el @sc{Salson} @email{mikael.salson@@lifl.fr}, Thierry @sc{Lecroq} @email{thierry.lecroq@@univ-rouen.fr}, Martine @sc{L@'eonard} @email{Martine.Leonard@@univ-rouen.fr}, Eric @sc{Rivals} @email{eric.rivals@@lirmm.fr}
@set BUGREPORT @email{crac-gkarrays@@lists.gforge.inria.fr}
@iftex
@clear mails
@clear authormails
@clear BUGREPORT
@set mails <@email{nicolas.philippe@@lirmm.fr}>, <@email{mikael.salson@@lifl.fr}>, <@email{thierry.lecroq@@univ-rouen.fr}>, <@email{Martine.Leonard@@univ-rouen.fr}>, <@email{eric.rivals@@lirmm.fr}>
@set authormails Nicolas @sc{Philippe} <@email{nicolas.philippe@@lirmm.fr}>, Mika@"el @sc{Salson}    <@email{mikael.salson@@lifl.fr}>, Thierry @sc{Lecroq} <@email{thierry.lecroq@@univ-rouen.fr}>, Martine @sc{L@'eonard} <@email{Martine.Leonard@@univ-rouen.fr}>, Eric @sc{Rivals} <@email{eric.rivals@@lirmm.fr}>
@set BUGREPORT <@email{crac-gkarrays@@lists.gforge.inria.fr}>
@end iftex

@ifinfo
This file documents the Gk-Arrays library.

Copyright @copyright{} 2010-2013 -- @sc{IRB}/@sc{INSERM}
                        (Institut de Recherches en Bioth@'erapie /
                        Institut National de la Sant@'e et de la Recherche
                        Médicale),
                         @sc{lifl}/@sc{inria}
                        (Laboratoire d'Informatique Fondamentale de
                         Lille / Institut National de Recherche en
                         Informatique et Automatique),
                        @sc{lirmm}/@sc{cnrs}
                        (Laboratoire d'Informatique, de Robotique et de
                         Micro@'electronique de Montpellier /
                         Centre National de la Recherche Scientifique),
                         @sc{litis}
                        (Laboratoire d'Informatique, du Traitement de
                         l'Information et des Syst@`emes).

Authors: @value{authormails}
          
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation as circulated by @sc{c@'ea}, @sc{cnrs} and @sc{inria} at the
following @sc{url} @url{http://www.cecill.info}.
@end ifinfo

@titlepage
@title The Gk-arrays library
@subtitle The Gk-arrays library aims at indexing k-factors from a huge set of sequencing reads.
@author Nicolas @sc{Philippe}
@author Mika@"el @sc{Salson}
@author Thierry @sc{Lecroq}
@author Martine @sc{L@'eonard}
@author Eric @sc{Rivals}

@page
@vskip 0pt plus 1filll
Copyright @copyright{} 2010-2013 -- @sc{IRB}/@sc{INSERM}
                        (Institut de Recherches en Bioth@'erapie /
                        Institut National de la Sant@'e et de la Recherche
                        Médicale),
                         @sc{lifl}/@sc{inria}
                        (Laboratoire d'Informatique Fondamentale de
                         Lille / Institut National de Recherche en
                         Informatique et Automatique),
                        @sc{lirmm}/@sc{cnrs}
                        (Laboratoire d'Informatique, de Robotique et de
                         Micro@'electronique de Montpellier /
                         Centre National de la Recherche Scientifique),
                         @sc{litis}
                        (Laboratoire d'Informatique, du Traitement de
                         l'Information et des Syst@`emes).

Authors: @value{authormails}

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation as circulated by @sc{c@'ea}, @sc{cnrs} and @sc{inria} at the
following @sc{url} @url{http://www.cecill.info}.
@end titlepage
@headings on

@c Before using the EMACS texinfo-every-node-update (C-c C-u C-e)
@c and texinfo-all-menus-update (C-c C-u C-a) commands, follow these
@c instructions:
@c  - Go to the node Copying and uncomment its @chapter line.
@c  - Go to the node Table of contents and uncomment its @unnumbered
@c    line.
@c  - Do the update
@c  - Go to the node Copying and comment again its @chapter line.
@c  - Go to the node Table of contents and comment again its
@c    @unnumbered line.
@c  - Remove the inserted menu (a few lines below):
@c      @menu
@c      * Instructions::
@c      * Copying::
@c          ...
@c      * Table of contents::
@c      @end menu
@c  - Replace the line (just below):
@c    '@node Top, Top, (dir), (dir)' by
@c    '@node Top, Instructions'. 
@c  - Replace the line (near the end):
@c    '@node Problems, Table of contents, Copying, Top' by
@c    '@node Problems, Concept Index, Copying, Top'. 
@c  - Replace the line in the @ifinfo block (near the end):
@c    '@node Concept Index, Table of contents, Problems, Top' by
@c    '@node Concept Index, , Problems, Top'.
@c  - Replace the line (near the end):
@c    '@node Table of contents,  , Problems, Top' by
@c    '@node Table of contents,  , Concept Index, Top'.

@c All the nodes can be updated using the EMACS command
@c texinfo-every-node-update, which is normally bound to C-c C-u C-e.
@ifnotinfo
@node Top, Instructions
@end ifnotinfo

@ifinfo
@node Top, Instructions, (dir), (dir)
@end ifinfo

@ifhtml
@top Documentation of the Gk-Arrays library version @value{VERSION}
@end ifhtml
@ifnottex
@ifnothtml
@top

Documentation for the version @value{VERSION} of the Gk-Arrays library.
@end ifnothtml
@end ifnottex

@c All the menus can be updated with the EMACS command
@c texinfo-all-menus-update, which is normally bound to C-c C-u C-a.
@c * Table of Contents::		Contents of this manual. 
@menu
* Instructions::                How to Read This Manual
* Developing::                  How to develop using the Gk-Arrays library
* Copying::                     How you can copy and share the Gk-Arrays library
* Problems::                    Reporting Bugs
* Concept Index::               Concept Index
@ifnotinfo
* Table of contents::           Table of Contents
@end ifnotinfo
@end menu

@node Instructions, Developing, Top, Top
@chapter How to Read This Manual

@cindex reading
@cindex manual, how to read
@cindex how to read
@itemize
@item
In the Developing section, we explain how to use the Gk-arrays library in
your code.
@item
The Copying section contains the CeCILL-C license.
@end itemize

@node Developing, Copying, Instructions, Top
@chapter Developing

@cindex library
@cindex development
@cindex code
@cindex class
@cindex method
@cindex compilation
The Gk-arrays library is dedicated to indexing millions of reads from 
high-throughput sequencing experiments.
This library can index variable-length (as produced by Roche 454) or 
fixed-length reads.
It provides several functions to find the reads that contain a given
@math{k}-mer, or the occurrences of that @math{k}-mer within the reads,
where @math{k} is specified at index creation.

@noindent If you use Gk-arrays, please don't forget to cite:@*
N. Philippe, M. Salson, T. Lecroq, M. L@'eonard, T. Commes, @'E. Rivals. Querying large read collections in main memory: a versatile data structure. In @i{BMC Bioinformatics}, 2011, doi:10.1186/1471-2105-12-242.

@section Installing Gk-arrays
@subsection Installation from the source code
@enumerate
@item 
Unpack the archive
@item
Enter the directory libGkArrays-@i{version-number}
@item
Type @t{./configure}
@item
If everything went fine, run @t{make}
@item
To install the library on your machine, type @t{make install} as an administrator
@item
Afterwards, you may want to run @t{ldconfig} as an administrator
@end enumerate

You can pass options to the @t{configure} script.
For instance, you can choose to build a static version of the library (quicker)
rather than a shared version. Typing @t{./configure --help} lists
the available options.

@subsection Installation from the @t{deb} package
You just need to install the package using a dedicated program on your
distribution or by typing @t{dpkg -i @i{package-name}}.
This installs the library and its header files.

@section Using Gk-arrays in your code

First, you need to include the header file that defines all the functions
of the Gk-arrays library:
@verbatim
#include <libGkArrays/gkArrays.h>
@end verbatim

The Gk-arrays are implemented as a @t{C++} class, called @t{gkArrays},
in the namespace @t{gkarrays}. To build an index
on a new collection of reads, you first need to create an object of
that class.

The constructor of that class takes two mandatory parameters and one optional parameter:
@enumerate
@item
The name of the file containing the reads in raw, FASTA, or FASTQ format.
The file can also be gzip-compressed.
@item
The length of the @math{k}-mers you want to use, @i{i.e.}, the value of @math{k}.
@item
(optional) The length of the reads. It is automatically
detected, but you may want to specify a shorter length (it cannot be larger than
the actual length).
@end enumerate
Hence, if @t{filename} holds the name of a file and 10 is the desired value of @math{k}, a valid index creation would be:
@verbatim
gkarrays::gkArrays *reads = new gkarrays::gkArrays(filename, 10);
@end verbatim
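
To get a feel for how @math{k} and the optional read length interact, here is
a minimal standalone sketch that does not use the library. The helper
@t{kmersOf} is hypothetical and only serves the illustration, and the sketch
assumes that specifying a shorter read length keeps a prefix of each read.
A read of length @math{m} contains @math{m - k + 1} @math{k}-mers.
@verbatim
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (NOT part of libGkArrays): returns the k-mers of
// `read`, in order of position. If `read_length` is non-zero, only the
// first `read_length` characters of the read are considered (assumption:
// a shorter read length means that a prefix of the read is kept).
std::vector<std::string> kmersOf(const std::string &read, std::size_t k,
                                 std::size_t read_length = 0) {
  assert(k > 0);
  std::size_t m = read.size();
  if (read_length != 0) m = std::min(m, read_length);
  std::vector<std::string> kmers;
  // A read of length m holds m - k + 1 k-mers, or none if k > m.
  for (std::size_t pos = 0; pos + k <= m; ++pos)
    kmers.push_back(read.substr(pos, k));
  return kmers;
}
```
@end verbatim
For instance, @t{kmersOf("ACGTACGT", 3)} yields the six 3-mers ACG, CGT, GTA,
TAC, ACG and CGT, whereas @t{kmersOf("ACGTACGT", 3, 5)} only sees the prefix
ACGTA and yields ACG, CGT and GTA.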

The main methods are listed below. For more details, refer to the full
documentation in the @t{docs/documentation} directory of the project or to
the online documentation at @url{http://crac.gforge.inria.fr/gkarrays/doc}.

@multitable @columnfractions .4 .6
@headitem Method @tab Purpose
@item @t{getNbTags()} @tab Return the number of indexed reads.
@item @t{getTag(i)} @tab Return a @t{char*} containing read @t{i} itself.
@item @t{getTagNumWithFactor(r, l)} @tab Return the numbers (indexed from 0) of the reads containing the same @math{k}-mer as the one at position @t{l} in read @t{r}.
@item @t{getTagsWithFactor(r, l)} @tab Same as above, but return the occurrences
as pairs (read number, position in the read).
@item @t{getNbTagsWithFactor(r, l, m)} @tab Return the number of occurrences
returned by @t{getTagsWithFactor(r, l)} (if @t{m} is @t{true}) or the number
of reads returned by @t{getTagNumWithFactor(r, l)} (if @t{m} is @t{false}).
@end multitable
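
The semantics of these query methods can be sketched with a naive, standalone
index. This is not how the library stores its index: the sketch uses a plain
hash map from each @math{k}-mer to its occurrences. The class
@t{NaiveKmerIndex} and its method names are hypothetical, and the sketch
assumes that @t{getTagNumWithFactor} reports each read once even if that read
contains the @math{k}-mer several times.
@verbatim
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Naive stand-in for the index, NOT the Gk-arrays data structure: a hash
// map from each k-mer to its occurrences. All names are hypothetical.
class NaiveKmerIndex {
 public:
  // An occurrence is a pair (read number, position in the read).
  typedef std::pair<std::size_t, std::size_t> Occurrence;

  NaiveKmerIndex(const std::vector<std::string> &reads, std::size_t k)
      : reads_(reads), k_(k) {
    assert(k > 0);
    for (std::size_t r = 0; r < reads_.size(); ++r)
      for (std::size_t l = 0; l + k_ <= reads_[r].size(); ++l)
        index_[reads_[r].substr(l, k_)].push_back(Occurrence(r, l));
  }

  // Assumed counterpart of getTagsWithFactor(r, l): all occurrences of
  // the k-mer that starts at position l of read r.
  std::vector<Occurrence> occurrencesOfFactor(std::size_t r,
                                              std::size_t l) const {
    auto it = index_.find(reads_.at(r).substr(l, k_));
    return it == index_.end() ? std::vector<Occurrence>() : it->second;
  }

  // Assumed counterpart of getTagNumWithFactor(r, l): the distinct read
  // numbers, in increasing order, of the reads containing that k-mer.
  std::vector<std::size_t> readsWithFactor(std::size_t r,
                                           std::size_t l) const {
    std::vector<std::size_t> result;
    for (const Occurrence &occ : occurrencesOfFactor(r, l))
      if (result.empty() || result.back() != occ.first)
        result.push_back(occ.first);
    return result;
  }

  // Assumed counterpart of getNbTagsWithFactor(r, l, m): number of
  // occurrences if m is true, number of distinct reads otherwise.
  std::size_t countWithFactor(std::size_t r, std::size_t l, bool m) const {
    return m ? occurrencesOfFactor(r, l).size()
             : readsWithFactor(r, l).size();
  }

 private:
  std::vector<std::string> reads_;
  std::size_t k_;
  std::unordered_map<std::string, std::vector<Occurrence>> index_;
};
```
@end verbatim
With the reads ACGTACG, TTACGA and GGGG and @math{k = 3}, the 3-mer ACG at
position 0 of read 0 has three occurrences, (0,0), (0,4) and (1,2), spread
over two reads. In this sketch, @t{countWithFactor(0, 0, true)} is therefore 3
and @t{countWithFactor(0, 0, false)} is 2.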


@section Compiling your code

When building your program, you need to link it against the
Gk-arrays library by passing @t{-lGkArrays} to the linker.
If you didn't install the library in the default library directory,
you may also have to specify where it is installed using the @t{-L} flag.


@section Example

@verbatim
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <utility>
#include <libGkArrays/gkArrays.h>

int main(int argc, char **argv) {
  
  if (argc < 2) {
    std::cerr << "Usage: " << argv[0] << " filename" << std::endl;
    exit(1);
  }

  // Default k-mer length
  int k = 5;
  // Building the index
  gkarrays::gkArrays *reads = new gkarrays::gkArrays(argv[1], k);
  // Retrieving the first k-mer of the first read
  char *kmer = reads->getTagFactor(0, 0, k);

  // Displaying the number of indexed reads
  printf("%d reads in the collection\n", reads->getNbTags());
  
  // Displaying the number of reads with a given k-mer
  printf("%d read(s) share the k-mer %s\n",
         reads->getNbTagsWithFactor(0, 0, false),
         kmer);

  // Displaying the numbers of the reads that contain that specific k-mer
  printf("Reads containing the occurrences:\n");
  uint *read_occurrences = reads->getTagNumWithFactor(0, 0);
  for (uint i = 0; i < reads->getNbTagsWithFactor(0, 0, false); i++)
    printf("%u\t", read_occurrences[i]);
  printf("\n");

  // Displaying the occurrences (read number, position in the read)
  // for that specific k-mer.
  printf("All the occurrences:\n");
  std::pair<uint, uint> *occurrences = reads->getTagsWithFactor(0, 0);
  for (uint i = 0; i < reads->getNbTagsWithFactor(0, 0, true); i++) {
    printf("(%u,%u)\t", occurrences[i].first, occurrences[i].second);
  }
  printf("\n");

  // Freeing memory.
  free(read_occurrences);
  delete [] occurrences;
  delete reads;
  delete [] kmer;
  exit(0);  
}
@end verbatim

If you store this example in a file called @t{test.cpp}, you can compile it
with @t{g++ -Wall -pedantic -O3 test.cpp -o test -lGkArrays}.
Note that a more complete example is given in the file @t{buildTables.cpp}
in the @t{src} directory.

@node Copying, Problems, Developing, Top
@c @chapter How you can copy and share the Gk-arrays library.
@include CeCILL-C.texi

@node Problems, Concept Index, Copying, Top
@chapter Reporting Bugs
@cindex bugs
@cindex problems

If you find a bug in the Gk-arrays library, please send
an e-mail to @value{BUGREPORT}.  Include the version
number, the output that the program
produced, and the output you expected.@refill

If you have other questions, comments or suggestions about the
Gk-arrays library, contact the authors by e-mail at
@value{BUGREPORT}.  The authors will try to help you out, although
they may not have time to fix your problems.

@ifinfo
@node Concept Index, , Problems, Top
@end ifinfo
@ifnotinfo
@node Concept Index, Table of contents, Problems, Top
@end ifnotinfo
@unnumbered Concept Index

@printindex cp

@ifnotinfo
@node Table of contents,  , Concept Index, Top
@c @unnumbered Table of contents
@contents
@end ifnotinfo
@bye