<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="generator" content="AsciiDoc 8.2.6" />
<style type="text/css">
/* Debug borders */
p, li, dt, dd, div, pre, h1, h2, h3, h4, h5, h6 {
/*
  border: 1px solid red;
*/
}

body {
  margin: 1em 5% 1em 5%;
}

a {
  color: blue;
  text-decoration: underline;
}
a:visited {
  color: fuchsia;
}

em {
  font-style: italic;
  color: navy;
}

strong {
  font-weight: bold;
  color: #083194;
}

tt {
  color: navy;
}

h1, h2, h3, h4, h5, h6 {
  color: #527bbd;
  font-family: sans-serif;
  margin-top: 1.2em;
  margin-bottom: 0.5em;
  line-height: 1.3;
}

h1, h2, h3 {
  border-bottom: 2px solid silver;
}
h2 {
  padding-top: 0.5em;
}
h3 {
  float: left;
}
h3 + * {
  clear: left;
}

div.sectionbody {
  font-family: serif;
  margin-left: 0;
}

hr {
  border: 1px solid silver;
}

p {
  margin-top: 0.5em;
  margin-bottom: 0.5em;
}

ul, ol, li > p {
  margin-top: 0;
}

pre {
  padding: 0;
  margin: 0;
}

span#author {
  color: #527bbd;
  font-family: sans-serif;
  font-weight: bold;
  font-size: 1.1em;
}
span#email {
}
span#revision {
  font-family: sans-serif;
}

div#footer {
  font-family: sans-serif;
  font-size: small;
  border-top: 2px solid silver;
  padding-top: 0.5em;
  margin-top: 4.0em;
}
div#footer-text {
  float: left;
  padding-bottom: 0.5em;
}
div#footer-badges {
  float: right;
  padding-bottom: 0.5em;
}

div#preamble,
div.tableblock, div.imageblock, div.exampleblock, div.verseblock,
div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
div.admonitionblock {
  margin-right: 10%;
  margin-top: 1.5em;
  margin-bottom: 1.5em;
}
div.admonitionblock {
  margin-top: 2.5em;
  margin-bottom: 2.5em;
}

div.content { /* Block element content. */
  padding: 0;
}

/* Block element titles. */
div.title, caption.title {
  color: #527bbd;
  font-family: sans-serif;
  font-weight: bold;
  text-align: left;
  margin-top: 1.0em;
  margin-bottom: 0.5em;
}
div.title + * {
  margin-top: 0;
}

td div.title:first-child {
  margin-top: 0.0em;
}
div.content div.title:first-child {
  margin-top: 0.0em;
}
div.content + div.title {
  margin-top: 0.0em;
}

div.sidebarblock > div.content {
  background: #ffffee;
  border: 1px solid silver;
  padding: 0.5em;
}

div.listingblock {
  margin-right: 0%;
}
div.listingblock > div.content {
  border: 1px solid silver;
  background: #f4f4f4;
  padding: 0.5em;
}

div.quoteblock > div.content {
  padding-left: 2.0em;
}

div.attribution {
  text-align: right;
}
div.verseblock + div.attribution {
  text-align: left;
}

div.admonitionblock .icon {
  vertical-align: top;
  font-size: 1.1em;
  font-weight: bold;
  text-decoration: underline;
  color: #527bbd;
  padding-right: 0.5em;
}
div.admonitionblock td.content {
  padding-left: 0.5em;
  border-left: 2px solid silver;
}

div.exampleblock > div.content {
  border-left: 2px solid silver;
  padding: 0.5em;
}

div.verseblock div.content {
  white-space: pre;
}

div.imageblock div.content { padding-left: 0; }
div.imageblock img { border: 1px solid silver; }
span.image img { border-style: none; }

dl {
  margin-top: 0.8em;
  margin-bottom: 0.8em;
}
dt {
  margin-top: 0.5em;
  margin-bottom: 0;
  font-style: normal;
}
dd > *:first-child {
  margin-top: 0.1em;
}

ul, ol {
    list-style-position: outside;
}
div.olist > ol {
  list-style-type: decimal;
}
div.olist2 > ol {
  list-style-type: lower-alpha;
}

div.tableblock > table {
  border: 3px solid #527bbd;
}
thead {
  font-family: sans-serif;
  font-weight: bold;
}
tfoot {
  font-weight: bold;
}

div.hlist {
  margin-top: 0.8em;
  margin-bottom: 0.8em;
}
div.hlist td {
  padding-bottom: 15px;
}
td.hlist1 {
  vertical-align: top;
  font-style: normal;
  padding-right: 0.8em;
}
td.hlist2 {
  vertical-align: top;
}

@media print {
  div#footer-badges { display: none; }
}

div#toctitle {
  color: #527bbd;
  font-family: sans-serif;
  font-size: 1.1em;
  font-weight: bold;
  margin-top: 1.0em;
  margin-bottom: 0.1em;
}

div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
  margin-top: 0;
  margin-bottom: 0;
}
div.toclevel2 {
  margin-left: 2em;
  font-size: 0.9em;
}
div.toclevel3 {
  margin-left: 4em;
  font-size: 0.9em;
}
div.toclevel4 {
  margin-left: 6em;
  font-size: 0.9em;
}
/* Workarounds for IE6's broken and incomplete CSS2. */

div.sidebar-content {
  background: #ffffee;
  border: 1px solid silver;
  padding: 0.5em;
}
div.sidebar-title, div.image-title {
  color: #527bbd;
  font-family: sans-serif;
  font-weight: bold;
  margin-top: 0.0em;
  margin-bottom: 0.5em;
}

div.listingblock div.content {
  border: 1px solid silver;
  background: #f4f4f4;
  padding: 0.5em;
}

div.quoteblock-content {
  padding-left: 2.0em;
}

div.exampleblock-content {
  border-left: 2px solid silver;
  padding-left: 0.5em;
}

/* IE6 sets dynamically generated links as visited. */
div#toc a:visited { color: blue; }

/* Because IE6 child selector is broken. */
div.olist2 ol {
  list-style-type: lower-alpha;
}
div.olist2 div.olist ol {
  list-style-type: decimal;
}
</style>
<title>Microbiome Utilities Portal of the Broad Institute</title>
</head>
<body>
<div id="header">
<h1>Microbiome Utilities Portal of the Broad Institute</h1>
</div>
<div id="preamble">
<div class="sectionbody">
<div class="para"><p><span class="image">
<img src="images/broad-hmp-banner.gif" alt="Broad HMP logo" title="Broad HMP logo" width="800" />
</span></p></div>
<div class="para"><p>The Human Microbiome Project (HMP) is an exciting Roadmap initiative funded by the National Institutes of Health (NIH). The goal of the project is to understand how the microbial communities inhabiting our bodies contribute to normal human health, development, and disease (<a href="http://nihroadmap.nih.gov/hmp/">http://nihroadmap.nih.gov/hmp</a>).</p></div>
<div class="para"><p>The Broad Institute (<a href="http://www.broadinstitute.org">http://www.broadinstitute.org</a>) was launched in 2004 with the visionary philanthropic investment of Eli and Edythe Broad, who joined with leaders at Harvard and its affiliated hospitals, MIT, and the Whitehead Institute to pioneer a "new model" of collaborative science. The Broad Institute is organized as a transparent infrastructure that allows biology- and technology-focused scientists to work together to identify and overcome the most critical obstacles to realizing the full promise of genomic medicine.</p></div>
<div class="para"><p>The Broad Institute aggressively advances sequence-based technologies and the bioinformatics necessary to characterize the vast complexity of the human microbiome. In keeping with our mission, we make the microbiome analysis utilities developed by the Broad Institute available to the community in order to promote further innovation and collaborative research efforts. We appreciate your feedback.</p></div>
<div class="para"><p>The utilities developed by the Broad Institute and provided here apply to a range of challenges posed by the microbiome initiative, including:</p></div>
<div class="ilist"><ul>
<li>
<p>
Sequence alignment (<a href="#A_NASTiEr">NAST-iEr</a>)
</p>
</li>
<li>
<p>
Chimera detection (<a href="#A_CS">ChimeraSlayer</a>, <a href="#A_WigeoN">WigeoN</a>)
</p>
</li>
<li>
<p>
Operational taxonomic unit (OTU) binning (<a href="#A_TreeChopper">TreeChopper</a>)
</p>
</li>
<li>
<p>
Sequence assembly (<a href="#A_AMOScmp">AmosCmp16Spipeline</a>)
</p>
</li>
</ul></div>
<div class="admonitionblock">
<table><tr>
<td class="icon">
<div class="title">Note</div>
</td>
<td class="content">ChimeraSlayer, WigeoN, NAST-iEr, and the database of reference 16S sequences are provided as a single co-dependent <a href="http://sourceforge.net/project/showfiles.php?group_id=262346">download</a>.  Sample data and usage instructions are included.</td>
</tr></table>
</div>
</div>
</div>
<h2 id="_microbiome_analysis_utilities">Microbiome Analysis Utilities</h2>
<div class="sectionbody">
<h3 id="A_CS">ChimeraSlayer</h3><div style="clear:left"></div>
<div class="para"><p>ChimeraSlayer  <a href="http://sourceforge.net/project/showfiles.php?group_id=262346">(download)</a> is a chimeric sequence detection utility, compatible with near-full length Sanger sequences and shorter 454-FLX sequences (~500 bp).</p></div>
<div class="para"><p>ChimeraSlayer flags chimeric 16S rRNA sequences through the following series of steps: (A) the ends of a query sequence are searched against an included database of chimera-free reference 16S sequences to identify potential parents of a chimera; (B) candidate parents of a chimera are selected as those that form a branched best-scoring alignment to the NAST-formatted query sequence; (C) the NAST alignment of the query sequence is improved in a ‘chimera-aware’ profile-based NAST realignment to the selected reference parent sequences; and (D) an evolutionary framework is used to flag query sequences found to exhibit greater sequence homology to an in silico chimera formed between any two of the selected reference parent sequences.</p></div>
<div class="para"><p>To run ChimeraSlayer, you need NAST-formatted sequences generated by the included <a href="#A_NASTiEr">NAST-iEr</a> utility.  Given NAST-formatted sequences, run ChimeraSlayer like so:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>% microbiomeutil/ChimeraSlayer/ChimeraSlayer.pl  --query_NAST  ${sequences}.NAST</tt></pre>
</div></div>
<div class="para"><p>The output files include the following:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>${sequences}.NAST.CPS                      :results from the chimera parent selection step
${sequences}.NAST.CPS_RENAST               :NAST alignments from a 'chimera-aware' realignment of the query
${sequences}.NAST.CPS.CPC                  :results from the chimera 'phylo-checker' step  ** the Chimera Slayer final verdict **
${sequences}.NAST.CPS.CPC.wTaxons          :the taxonomy of the reference (step)parents of the chimera</tt></pre>
</div></div>
<div class="para"><p>The .CPC output file is tab-delimited with the following fields:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>0      ChimeraSlayer
1      chimera_AJ007403            # the accession of the chimera query
2      S000387216                  # reference parent A
3      S000001688                  # reference parent B
4      0.9422                      # divergence ratio of query to chimera (left_A, right_B)
5      90.00                       # percent identity between query and chimera(left_A, right_B)
6      0                           # confidence in query as a chimera related to (left_A, right_B)
7      1.0419                      # divergence ratio of query to chimera (right_A, left_B)
8      99.52                       # percent identity between query and chimera(right_A, left_B)
9      100                         # confidence in query as a chimera related to (right_A, left_B)
10     YES                         # ** verdict as a chimera or not **
11     NAST:4032-4033              # estimated approximate chimera breakpoint in NAST coordinates
12     ECO:767-768                 # estimated approximate chimera breakpoint according to the E. coli unaligned reference seq coordinates</tt></pre>
</div></div>
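<div class="para"><p>As a minimal sketch of how the fields listed above might be consumed downstream, the following Python snippet splits one .CPC row into named values. The names <tt>CpcRecord</tt> and <tt>parse_cpc_line</tt> are hypothetical and not part of ChimeraSlayer. The sketch also tolerates rows that lack the two breakpoint columns, because the description above does not say whether they are always present.</p></div>

```python
from typing import NamedTuple


class CpcRecord(NamedTuple):
    """One row of a .CPC file; field order follows the listing above."""
    query: str
    parent_a: str
    parent_b: str
    div_ratio_la_rb: float
    per_id_la_rb: float
    conf_la_rb: float
    div_ratio_ra_lb: float
    per_id_ra_lb: float
    conf_ra_lb: float
    is_chimera: bool
    nast_breakpoint: str
    eco_breakpoint: str


def parse_cpc_line(line: str) -> CpcRecord:
    """Split one tab-delimited .CPC line into named, typed fields."""
    f = line.rstrip("\n").split("\t")
    if not (len(f) >= 11 and f[0] == "ChimeraSlayer"):
        raise ValueError("unexpected .CPC row: " + repr(line))
    # Assumption: breakpoint columns may be missing, so default to "".
    nast = f[11] if len(f) > 11 else ""
    eco = f[12] if len(f) > 12 else ""
    return CpcRecord(f[1], f[2], f[3],
                     float(f[4]), float(f[5]), float(f[6]),
                     float(f[7]), float(f[8]), float(f[9]),
                     f[10] == "YES", nast, eco)


# The sample row from the field listing, joined with tabs.
example = "\t".join([
    "ChimeraSlayer", "chimera_AJ007403", "S000387216", "S000001688",
    "0.9422", "90.00", "0", "1.0419", "99.52", "100", "YES",
    "NAST:4032-4033", "ECO:767-768",
])
record = parse_cpc_line(example)
print(record.query, record.is_chimera, record.nast_breakpoint)
```

<div class="para"><p>Keeping the verdict as a boolean makes it straightforward to collect flagged queries for the manual review recommended for chimera calls, rather than discarding them outright.</p></div>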
<div class="para"><p>For those query sequences flagged as chimeras, the .wTaxons file includes the following extra columns:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>13      Rhodococcus                                                                # genus name of Parent A
14      Rhodococcus koreensis (T); DNP505; AF124342 Rhodococcus koreensis          # descriptive info for Parent A
15      Streptomyces                                                               # genus name of Parent B
16      Streptomyces somaliensis (T); DSM 40738; AJ007403 Streptomyces somaliensis # descriptive info for Parent B
17      INTRA-ORDER                                                                # type of chimera based on selected parents</tt></pre>
</div></div>
<div class="admonitionblock">
<table><tr>
<td class="icon">
<div class="title">Note</div>
</td>
<td class="content">It is <strong>not</strong> recommended to blindly discard all sequences flagged as chimeras.  Some may represent naturally formed chimeras that do not represent PCR artifacts.   Sequences flagged may warrant further investigation.</td>
</tr></table>
</div>
<div class="para"><p>If you use the --printCSalignments option, a diagram of the query matching the parents on both sides of the breakpoint is included in the output.  For example:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>Per_id parents: 89.52</tt></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><tt>          Per_id(Q,A): 94.00
--------------------------------------------------- A: S000387216
88.65                                99.06
~~~~~~~~~~~~~~~~~~~~~~~~\ /~~~~~~~~~~~~~~~~~~~~~~~~ Q: chimera_AJ007403
DivR: 0.942 BS: 0.00     |
Per_id(QLA,QRB): 90.00   |
                         |
   (L-AB: 88.65)         |      (R-AB: 90.34)
   WinL:0-704            |      WinR:705-1449
                         |
Per_id(QLB,QRA): 99.52   |
DivR: 1.042 BS: 100.00   |
~~~~~~~~~~~~~~~~~~~~~~~~/ \~~~~~~~~~~~~~~~~~~~~~~~~~ Q: chimera_AJ007403
100.00                                91.28
---------------------------------------------------- B: S000001688
           Per_id(Q,B): 95.52</tt></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><tt>DeltaL: -11.35                   DeltaR: 7.79</tt></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><tt>!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
GGAGGCTCGTACCGCTGTCTTGTTAAGGACTGGTTTTTTACTGTCTATACAGACTCTTCA  A: S000387216
AAGACGCTTGGGTTTCACTCCTGCGCTTCGGCCGGGCCCGGCACTCGCCACAGTCTCGAG  Q: chimera_AJ007403
AAGACGCTTGGGTTTCACTCCTGCGCTTCGGCCGGGCCCGGCACTCGCCACAGTCTCGAG  B: S000001688</tt></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><tt>!!!!!!!!!!!!!!!!!!!!
TACTACTGGATATCCTGATA  A: S000387216
CGTCGTCTTGATGTTCACAT  Q: chimera_AJ007403
CGTCGTCTTGATGTTCACAT  B: S000001688</tt></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><tt>** Breakpoint **</tt></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><tt>                           !!!!!!!
TGCGTTCGGATCGATTGTTGCCGTACGCTGTGTCGATTAAAGGTAATCATAAGGGCTTTC  A: S000387216
TGCGTTCGGATCGATTGTTGCCGTACGCCTGTGTCATTAAAGGTAATCATAAGGGCTTTC  Q: chimera_AJ007403
GTAACGATCGCTTCCAACCCATCCGGTGCTGTGTCGCCGGGCACGGCTTGGGAATTAACT  B: S000001688
!!!!!!!!!!!!!!!!!!!!!!!!!!!!       !!!!!!!!!!!!!!!!!!!!!!!!!</tt></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><tt>GACTTACGACTC  A: S000387216
GACTTACGACTC  Q: chimera_AJ007403
ATTCCCAAGTCT  B: S000001688
!!!!!!!!!!!!</tt></pre>
</div></div>
<div class="para"><p>The above indicates the percent identities between the alignment segments corresponding to query and either parent.  Since chimeras can occur two ways: (left parent A &amp; right parent B) or (left parent B &amp; right parent A), a fork diagram is shown with the statistics for each potential chimera as it relates to the query sequence.  The bootstrap (BS) values indicate the confidence level for the corresponding chimera type.  The informative SNP positions from the complete alignments are shown for both sides of the breakpoint.</p></div>
<h3 id="A_WigeoN">WigeoN</h3><div style="clear:left"></div>
<div class="para"><p>WigeoN <a href="http://sourceforge.net/project/showfiles.php?group_id=262346">(download)</a> examines the sequence conservation between a query and a trusted reference sequence, both in NAST alignment format.  Based on the sequence identity between the query and the reference sequence, a certain amount of variation is expected across the alignment. If the observed variation exceeds the 95% quantile of the distribution of variation observed between non-anomalous sequences, the query is flagged as an anomaly.</p></div>
<div class="para"><p>WigeoN is a flexible command-line reimplementation of the <a href="http://www.bioinformatics-toolkit.org/Pintail/">Pintail</a> algorithm (<a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&amp;pubmedid=16332745">Appl Environ Microbiol. 2005 Dec;71(12):7724-36</a>).</p></div>
<div class="para"><p>WigeoN is useful for flagging chimeras and anomalies <strong>only in near full-length 16S rRNA sequences</strong>.  WigeoN lacks sensitivity with sequences less than 1000 bp.</p></div>
<div class="para"><p>To run WigeoN, you need NAST-formatted sequences generated by the included <a href="#A_NASTiEr">NAST-iEr</a> utility.  Given NAST-formatted sequences, run WigeoN like so:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>% microbiomeutil/WigeoN/run_WigeoN.pl --query_NAST ${sequences}.NAST  &gt;  ${sequences}.WigeoN</tt></pre>
</div></div>
<div class="para"><p>The output is tab-delimited like so:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>0       chimera_AJ007403       # query sequence
1       S000387216             # best matching reference sequence
2       div:
3       5.45                   # percent sequence divergence between the query and the reference sequence
4       stDev:
5       4.01                   # standard deviation from expected reference sequence divergence across alignment windows
6       Quant95:Yes            # stDev is in the top 5% of stDev values observed among reference sequences at that same mean divergence
7       Quant99:YES            # top 1%  *** This value is recommended for flagging aberrant sequences ***
8       Quant99.9:No           # top 0.1%
9       Quant99.99:No          # top 0.01%</tt></pre>
</div></div>
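<div class="para"><p>The row layout above could be read programmatically along the following lines. This is only a sketch: the function name <tt>parse_wigeon_line</tt> and the dictionary keys are hypothetical, and the flag values are compared case-insensitively on the assumption that the mixed "Yes"/"YES" spelling in the sample row can occur in real output.</p></div>

```python
def parse_wigeon_line(line: str) -> dict:
    """Turn one tab-delimited WigeoN row into a dict of typed values."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 10 or f[2] != "div:" or f[4] != "stDev:":
        raise ValueError("unexpected WigeoN row: " + repr(line))
    row = {
        "query": f[0],
        "reference": f[1],
        "divergence": float(f[3]),
        "st_dev": float(f[5]),
    }
    for flag in f[6:]:
        # Each flag looks like "Quant99:YES"; split on the last colon so
        # that names such as "Quant99.99" stay intact.
        name, _, value = flag.rpartition(":")
        row[name] = value.strip().lower() == "yes"
    return row


# The sample row from the listing above, joined with tabs.
example = "\t".join([
    "chimera_AJ007403", "S000387216", "div:", "5.45", "stDev:", "4.01",
    "Quant95:Yes", "Quant99:YES", "Quant99.9:No", "Quant99.99:No",
])
row = parse_wigeon_line(example)
# Quant99 is the threshold recommended above for flagging aberrant sequences.
print(row["query"], row["Quant99"])
```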
<h3 id="A_NASTiEr">NAST-iEr</h3><div style="clear:left"></div>
<div class="para"><p>The NAST-iEr alignment utility <a href="http://sourceforge.net/project/showfiles.php?group_id=262346">(download)</a> aligns a single raw nucleotide sequence against one or more NAST formatted sequences.</p></div>
<div class="para"><p>The alignment algorithm involves global dynamic programming profile alignment to fixed (NAST-formatted) multiply aligned template sequences without any end-gap penalty.</p></div>
<div class="para"><p>Run it like so, using a set of FASTA-formatted sequences:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>% microbiomeutil/NAST-iEr/run_NAST-iEr.pl --query_FASTA ${sequences}.fasta  &gt; ${sequences}.NAST</tt></pre>
</div></div>
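<div class="para"><p>The idea of global alignment without end-gap penalties can be illustrated with a much simpler stand-in. The Python sketch below scores a plain pairwise alignment of two strings, not the profile alignment against multiply aligned templates that NAST-iEr performs. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration only.</p></div>

```python
def free_end_gap_score(query, template, match=1, mismatch=-1, gap=-2):
    """Best pairwise alignment score when gaps at either end cost nothing."""
    rows, cols = len(query) + 1, len(template) + 1
    # The first row and column stay at 0: leading gaps are free.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == template[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in the template
            left = score[i][j - 1] + gap    # gap in the query
            score[i][j] = max(diag, up, left)
    # Trailing gaps are free as well, so the alignment may end anywhere
    # in the last row or the last column.
    return max(max(score[-1]), max(r[-1] for r in score))


# A short query embedded in a longer template aligns without penalty
# for the unmatched template ends.
best = free_end_gap_score("ACGT", "TTACGTTT")
print(best)
```

<div class="para"><p>With end gaps free, a short read is not penalized for covering only part of a longer template, which is the property the description above refers to.</p></div>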
<h3 id="A_AMOScmp">AmosCmp16Spipeline</h3><div style="clear:left"></div>
<div class="para"><p>AmosCmp16Spipeline <a href="http://sourceforge.net/project/showfiles.php?group_id=262346">(download)</a> uses the AMOScmp software to assemble multiple, potentially overlapping 16S rRNA sequencing reads based on read mappings to a reference 16S rRNA gene.</p></div>
<div class="para"><p>The pipeline takes the following inputs:</p></div>
<div class="ilist"><ul>
<li>
<p>
a FASTA file containing the sequencing reads
</p>
</li>
<li>
<p>
a file containing the corresponding quality values
</p>
</li>
<li>
<p>
a file enumerating the accessions corresponding to reads of the same clone (individual assembly tasks)
</p>
</li>
<li>
<p>
a reference database of 16S rRNA sequences
</p>
</li>
</ul></div>
<div class="para"><p>The single reference sequence that best matches all the reads is chosen.  Lucy is used to trim low-quality termini from the sequence reads. An additional homology-trimming operation is performed to exclude regions of the sequence that lack homology to the reference.  The resulting trimmed reads and quality values are used to generate a sequence assembly using the AMOScmp software.  A scaffold sequence is generated, where Ns are used to fill in gaps according to estimated gap sizes based on reference sequence anchoring, and quality values are reported according to the scaffold sequence. A README file containing instructions is provided, along with sample data.</p></div>
<h3 id="A_TreeChopper">TreeChopper</h3><div style="clear:left"></div>
<div class="para"><p>TreeChopper <a href="http://sourceforge.net/project/showfiles.php?group_id=262346">(download)</a> clusters tree leaf nodes according to phylogenetic distance.</p></div>
<div class="para"><p>A graph is constructed from the tree like so:  all leaves are visited, and from each leaf, all neighboring leaves within a specified distance threshold are added to a graph with an edge placed between them.  After building this graph, each edge connecting pairs of nodes is examined and a Jaccard similarity coefficient is computed (see <a href="http://www.biomedcentral.com/1741-7007/3/7">http://www.biomedcentral.com/1741-7007/3/7</a> for details).  Those edges that loosely connect nodes as defined by this similarity coefficient are removed.  The nodes connected by the remaining edges are clustered by transitive closure (single linkage clustering) and reported as OTUs.</p></div>
<div class="para"><p>The minimum phylogenetic distance between clustered nodes and the minimum similarity coefficient between nodes in the graph are tunable parameters. A README file containing instructions is provided, along with sample data.</p></div>
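<div class="para"><p>The clustering procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not TreeChopper's implementation: it starts from precomputed pairwise leaf distances rather than a tree, it computes the Jaccard coefficient over each node's neighbor set including the node itself (the cited paper may define it differently), and the names <tt>cluster_leaves</tt>, <tt>max_dist</tt>, and <tt>min_jaccard</tt> are hypothetical.</p></div>

```python
from itertools import combinations


def cluster_leaves(leaves, dist, max_dist, min_jaccard):
    """Group leaves into OTUs from pairwise distances (illustrative only).

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    # 1. Link every pair of leaves lying within the distance threshold.
    neighbors = {leaf: {leaf} for leaf in leaves}
    edges = []
    for a, b in combinations(leaves, 2):
        if max_dist >= dist[frozenset((a, b))]:
            neighbors[a].add(b)
            neighbors[b].add(a)
            edges.append((a, b))

    # 2. Drop edges whose endpoints share too little of their neighborhoods.
    def jaccard(a, b):
        shared = neighbors[a].intersection(neighbors[b])
        return len(shared) / len(neighbors[a].union(neighbors[b]))

    kept = [(a, b) for a, b in edges if jaccard(a, b) >= min_jaccard]

    # 3. Single-linkage clustering over the kept edges, using union-find.
    parent = {leaf: leaf for leaf in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in kept:
        parent[find(a)] = find(b)

    clusters = {}
    for leaf in leaves:
        clusters.setdefault(find(leaf), []).append(leaf)
    return sorted(sorted(c) for c in clusters.values())


# Toy data: a, b, c are mutually close; d and e are close; c and d are
# also within the threshold but share few neighbors.
leaves = ["a", "b", "c", "d", "e"]
close = {("a", "b"): 0.01, ("a", "c"): 0.02, ("b", "c"): 0.01,
         ("c", "d"): 0.02, ("d", "e"): 0.01}
dist = {frozenset(p): close.get(p, 0.5) for p in combinations(leaves, 2)}
otus = cluster_leaves(leaves, dist, max_dist=0.03, min_jaccard=0.5)
print(otus)
```

<div class="para"><p>In this toy example the c-d edge has a coefficient of 0.4 and is removed, so the two groups are reported as separate OTUs even though one pair of leaves between them lies within the distance threshold.</p></div>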
</div>
<h2 id="_miscellaneous_remarks">Miscellaneous Remarks</h2>
<div class="sectionbody">
<div class="ilist"><ul>
<li>
<p>
The bacterial 16S rRNA is the primary target of the ChimeraSlayer, WigeoN, and NAST-iEr utilities.  Ultimately, we'd like to have a version that operates on eukaryotic 18S sequences as well.
</p>
</li>
</ul></div>
</div>
<h2 id="_questions_comments_etc">Questions, comments, etc?</h2>
<div class="sectionbody">
<div class="para"><p>Contact Brian Haas (bhaas at broadinstitute dot org)</p></div>
</div>
<div id="footer">
<div id="footer-text">
Last updated 2010-10-31 12:59:13 EDT
</div>
</div>
</body>
</html>