***********************
Building a tree of life
***********************

.. authors, Greg Caporaso

Building a tree of life with PyCogent
======================================

This cookbook example walks through constructing a tree of life from 16S rRNA sequences to test whether the three domains of life are visible as three separate clusters in a phylogenetic tree. It covers compiling sequences, building a multiple sequence alignment, building a phylogenetic tree from that alignment, and visualizing the tree.

Step 0: Set up your python environment
--------------------------------------

For this tutorial you'll need cogent, muscle, and FastTree installed on your system.

Start an interactive python session by entering the following into a command terminal::

	python

You should now see the python command prompt::

	>>>
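Before going further, it can help to confirm that the external programs are reachable from your session. The following is a minimal sketch, assuming the executables are named ``muscle`` and ``FastTree`` (the names on your system may differ); ``find_on_path`` is a hypothetical helper written for this example and is not part of PyCogent.

```python
import os

def find_on_path(program, path=None):
    """Return the full path of an executable called `program`, or None."""
    if path is None:
        path = os.environ.get('PATH', '')
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

# 'muscle' and 'FastTree' are assumed executable names; adjust for your install.
missing = [name for name in ('muscle', 'FastTree') if find_on_path(name) is None]
```

If ``missing`` is an empty list, both programs were found; otherwise it names the ones to install or add to your ``PATH``.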

Step 1: Download sequences from NCBI
------------------------------------

Here we'll work with archaeal, bacterial, and eukaryotic sequences obtained from NCBI using the PyCogent EUtils wrappers. Run the following commands to obtain these sequences::

	from cogent.db.ncbi import EUtils
	from cogent.parse.fasta import MinimalFastaParser
	e = EUtils()
	arc16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND archaea[orgn]']))
	bac16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND bacteria[orgn]']))
	euk16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND eukarya[orgn]']))

You can check how many sequences you obtained for each query by running::

	len(arc16s)
	len(bac16s)
	len(euk16s)
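Each of the three lists holds ``(label, sequence)`` pairs, which is the shape the later steps rely on. To illustrate that shape without contacting NCBI, here is a simplified stand-in for a FASTA parser. The function ``parse_fasta_pairs`` and the toy records are made up for this example; this is not PyCogent's implementation.

```python
def parse_fasta_pairs(lines):
    """Yield (label, sequence) pairs from an iterable of FASTA lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if label is not None:
                yield label, ''.join(chunks)
            label, chunks = line[1:], []
        else:
            chunks.append(line)
    if label is not None:
        yield label, ''.join(chunks)

# Two invented records; the first has its sequence wrapped over two lines.
toy_fasta = """>seq1 made-up record
ACGT
ACGT
>seq2 another made-up record
GGCC"""
records = list(parse_fasta_pairs(toy_fasta.split('\n')))
```

Here ``records`` is a list of two tuples, and sequence lines belonging to the same record are joined into a single string.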



.. note:: In this example you'll notice that you have relatively few sequences for each query. You'd obtain many more if you replaced ``rRNA`` in the query with ``ribosomal RNA``, but the runtime would also be significantly longer. For the purposes of this tutorial we'll therefore stick with the query that returns fewer sequences.

Step 2: Load the sequences
--------------------------

We'll begin by loading the sequences that have been downloaded, applying a filter to retain only those that we consider to be of good quality. Sequences shorter than 750 bases, or containing one or more ``N`` characters, will be ignored (``N`` characters typically represent ambiguous base calls during sequencing).

First, define a function to load and filter the sequences::

	from cogent.parse.fasta import MinimalFastaParser
	
	def load_and_filter_seqs(seqs, domain_label):
	    result = []
	    for seq_id, seq in seqs:
	        if len(seq) >= 750 and seq.count('N') < 1:
	            result.append((domain_label + seq_id,seq))
	    return result

Next, load and filter the three sequence sets::

	arc16s_filtered = load_and_filter_seqs(arc16s,'A: ')
	bac16s_filtered = load_and_filter_seqs(bac16s,'B: ')
	euk16s_filtered = load_and_filter_seqs(euk16s,'E: ')
	
	len(arc16s_filtered)
	len(bac16s_filtered)
	len(euk16s_filtered)
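To see the quality filter in isolation, here is a small sketch that applies the rule described in the text (at least 750 bases, no ``N`` characters) to invented sequences. The helper ``keep_sequence`` and the toy data are assumptions made for illustration.

```python
def keep_sequence(seq, min_length=750):
    """True if seq has at least min_length bases and no 'N' characters."""
    return len(seq) >= min_length and 'N' not in seq

toy = [('long_clean', 'ACGT' * 200),     # 800 bases, no N: kept
       ('too_short', 'ACGT' * 100),      # 400 bases: dropped
       ('has_n', 'ACGT' * 200 + 'N')]    # ambiguous base call: dropped
kept = [label for label, seq in toy if keep_sequence(seq)]
```

Note that the check is case-sensitive, so a lowercase ``n`` would pass this filter. If your sequences may be lowercase, consider calling ``seq.upper()`` first.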


Step 3: Select a random subset of the sequences
-----------------------------------------------

Import ``shuffle`` from the ``random`` module and use it to randomize the order of each list in place::

	from random import shuffle
	shuffle(arc16s_filtered)
	shuffle(bac16s_filtered)
	shuffle(euk16s_filtered)

Select some random sequences from each domain. Note that only a few sequences are chosen to facilitate a quick analysis::

	combined16s = arc16s_filtered[:3] + bac16s_filtered[:10] + euk16s_filtered[:6]
	len(combined16s)
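Because ``shuffle`` reorders each list differently on every run, your tree will differ from run to run. If you'd like a repeatable selection, one option is to draw the subset from a seeded random number generator. This is a sketch; the helper name ``random_subset`` and the seed value are arbitrary choices for illustration.

```python
import random

def random_subset(items, k, seed=None):
    """Return k items chosen without replacement; the input list is not modified."""
    rng = random.Random(seed)
    # Never ask for more items than are available.
    return rng.sample(items, min(k, len(items)))

toy = ['seq%d' % i for i in range(20)]
subset_a = random_subset(toy, 3, seed=42)
subset_b = random_subset(toy, 3, seed=42)   # same seed, same selection
```

Unlike ``shuffle``, this leaves the original list in its original order, so you can draw several independent subsets from the same filtered list.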

Step 4: Load the sequences into a SequenceCollection object
-----------------------------------------------------------

Use ``LoadSeqs`` to load the unaligned sequences into a ``SequenceCollection`` object. In this step we'll shorten the sequence names (by passing a ``label_to_name`` function) to the first two ``|``-delimited fields of each label, which keeps the domain prefix and a short database identifier. This facilitates visualization in downstream steps.

::

	from cogent import LoadSeqs, DNA
	seqs = LoadSeqs(data=combined16s,moltype=DNA,aligned=False,label_to_name=lambda x: '|'.join(x.split('|')[:2]))
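To see what the ``label_to_name`` function does, you can try it on a single label. The label below is hypothetical: it assumes an NCBI-style ``gi|...|gb|...|`` layout with the domain prefix from Step 2, and your real labels may be formatted differently.

```python
# Same expression as the lambda passed to LoadSeqs above.
shorten = lambda x: '|'.join(x.split('|')[:2])

# Hypothetical label; the identifiers are placeholders, not real records.
example_label = 'A: gi|0000000|gb|XX000000.1| Example organism 16S rRNA'
short_name = shorten(example_label)

# A label without any '|' characters is returned unchanged.
plain_name = shorten('A: no_pipes_here')
```

With this layout, ``short_name`` keeps only the text up to the second ``|``, which is what ends up on the tips of the tree.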

You can explore some properties of this sequence collection. For example, you can count how many sequences are in the sequence collection object::

	seqs.getNumSeqs()

.. _step5:

Step 5: Align the sequences using muscle
----------------------------------------

Load an aligner function, and align the sequences. Here we'll align with muscle via the muscle application controller. The sequences will be loaded into an ``Alignment`` object called ``aln``.
::

	from cogent.app.muscle import align_unaligned_seqs
	aln = align_unaligned_seqs(seqs,DNA)

Step 6: Build a tree from the alignment using FastTree
------------------------------------------------------

Load a tree-building function, and build a tree from the alignment. Here we'll use FastTree. The tree will be stored in a ``PhyloNode`` object called ``tree``.
::

	from cogent.app.fasttree import build_tree_from_alignment
	tree = build_tree_from_alignment(aln,DNA)

Step 7: Visualize the tree
------------------------------------------

Load a drawing function to generate a prettier picture of the tree::

	from cogent.draw.dendrogram import UnrootedDendrogram 
	dendrogram = UnrootedDendrogram(tree)

Have a quick look at the unrooted dendrogram::

	dendrogram.showFigure()

You should see something like this:

	.. image:: ../images/tol_not_gap_filtered.png

Figure 1: A tree of life built from 16S rRNA sequences. A: archaeal sequences; B: bacterial sequences; E: eukaryotic sequences.
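Judging the clusters by eye works well for a small tree. As a rough programmatic counterpart, the sketch below checks whether all tips sharing a label prefix can be separated from the rest of the tree by a single branch. It uses nested tuples as a hypothetical stand-in for a tree (it does not operate on a ``PhyloNode``), and the function names are invented for this example.

```python
def collect_tip_sets(tree):
    """Return (all tips under tree, tip sets of every subtree including tree)."""
    if isinstance(tree, str):
        tips = frozenset([tree])
        return tips, [tips]
    all_tips = frozenset()
    subsets = []
    for child in tree:
        child_tips, child_subsets = collect_tip_sets(child)
        all_tips = all_tips | child_tips
        subsets.extend(child_subsets)
    subsets.append(all_tips)
    return all_tips, subsets

def forms_cluster(tree, prefix):
    """True if the tips starting with prefix can be split off by one branch."""
    all_tips, subsets = collect_tip_sets(tree)
    group = frozenset(t for t in all_tips if t.startswith(prefix))
    if not group or group == all_tips:
        return False
    # The tree is unrooted, so either side of a branch may hold the group.
    rest = all_tips - group
    return any(s == group or s == rest for s in subsets)

# Toy trees with made-up tip names.
clean = ((('A: 1', 'A: 2'), ('E: 1', 'E: 2')), ('B: 1', 'B: 2', 'B: 3'))
mixed = (('A: 1', 'B: 1'), ('A: 2', 'E: 1'), 'B: 2')
```

For ``clean``, ``forms_cluster`` is true for each of the prefixes ``'A: '``, ``'B: '``, and ``'E: '``; for ``mixed``, the archaeal and bacterial tips are interleaved and the check fails for both.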


Step 8: Save the tree as a PDF
-------------------------------

Finally, you can save this tree as a PDF for sharing or later viewing::

	dendrogram.drawToPDF('./tol.pdf')

You can also write the alignment and tree to fasta and newick files, respectively. You can then load these in tools such as `BoulderALE <http://www.microbio.me/boulderale/>`_ (for alignment editing) or `TopiaryExplorer <http://topiaryexplorer.sourceforge.net/>`_ or `FigTree <http://tree.bio.ed.ac.uk/software/figtree/>`_ (for tree viewing, coloring, and layout manipulation).

::

	open('./tol.fasta','w').write(aln.toFasta())
	open('./tol.tre','w').write(tree.getNewick(with_distances=True))



Extra credit: Alignment filtering
---------------------------------

Filter highly gapped positions from the alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To try to improve the quality of the alignment and therefore the tree, it's often a good idea to remove positions that contain a high proportion of gap characters from the alignment. These generally represent non-homologous regions of the sequence of interest, and therefore contribute little to our understanding of the evolutionary history of the sequence. These steps may result in a clearer delineation of the three domains on your tree, but the results will depend in part on the randomly chosen sequences in your alignment.

To remove positions in which more than 10% of the characters are gaps, run the following command::

	gap_filtered_aln = aln.omitGapPositions(allowed_gap_frac=0.10)

If you count the positions in both the full and reduced alignments you'll see that your alignment is now a lot shorter::

	len(aln)
	len(gap_filtered_aln)
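To make the idea of gap filtering concrete, here is a plain-Python sketch that works on a list of equal-length aligned strings. It is an illustration only, not PyCogent's implementation: the helper name, the set of gap characters, and the choice to keep columns whose gap fraction is exactly at the threshold are all assumptions.

```python
def omit_gap_columns(aligned_seqs, allowed_gap_frac=0.10, gap_chars='-?'):
    """Drop alignment columns whose gap fraction exceeds allowed_gap_frac."""
    n = float(len(aligned_seqs))
    keep = []
    for col in range(len(aligned_seqs[0])):
        gaps = sum(1 for s in aligned_seqs if s[col] in gap_chars)
        if gaps / n <= allowed_gap_frac:
            keep.append(col)
    # Rebuild every sequence from the retained columns only.
    return [''.join(s[i] for i in keep) for s in aligned_seqs]

# Toy alignment: the third column is half gaps and is the only one removed
# when up to 25% gaps are allowed.
toy_aln = ['AC-GT',
           'A--GT',
           'ACTG-',
           'ACTGT']
filtered = omit_gap_columns(toy_aln, allowed_gap_frac=0.25)
```

Every sequence in ``filtered`` is one column shorter than in ``toy_aln``, which mirrors the drop in length you see when comparing ``len(aln)`` with ``len(gap_filtered_aln)``.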

Rebuild the tree and visualize the result as before::

	gap_filtered_tree = build_tree_from_alignment(gap_filtered_aln,DNA)
	gap_filtered_dendrogram = UnrootedDendrogram(gap_filtered_tree)
	gap_filtered_dendrogram.showFigure()

Your tree should look something like this:

	.. image:: ../images/tol_gap_filtered.png

Figure 2: A tree of life built from 16S rRNA sequences. A: archaeal sequences; B: bacterial sequences; E: eukaryotic sequences.

Filtering highly variable positions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Another issue that adds noise to alignments of distantly related sequences is highly entropic (or highly variable) positions. To filter these, we can compute the Shannon entropy (uncertainty) of each position, and then remove the 10% most entropic positions.

First we'll compute the Shannon entropy of each position in the alignment and sort the values::

	sorted_uncertainties = sorted(gap_filtered_aln.uncertainties())
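For reference, the Shannon entropy of a column is the sum of ``-p * log2(p)`` over the characters observed in it, where ``p`` is each character's frequency. The sketch below computes this for a single column given as a string. It is a stand-alone illustration with an invented function name, and it makes no claim about how ``uncertainties()`` treats gaps or ambiguous characters.

```python
from collections import Counter
from math import log

def column_entropy(column):
    """Shannon entropy, in bits, of the characters in one alignment column."""
    counts = Counter(column)
    total = float(len(column))
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * log(p, 2)
    return entropy

conserved = column_entropy('AAAA')   # one character only: no uncertainty
variable = column_entropy('ACGT')    # four equally frequent characters
```

A fully conserved column scores zero, and the score grows as the characters in a column become more evenly mixed, which is why high values flag noisy positions.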

Next we'll find the 90th percentile by taking the value that is 90% of the way through the sorted list::

	uncertain_90p = sorted_uncertainties[int(len(sorted_uncertainties)*0.9)]

Next we'll identify and store the positions that have lower entropy than ``uncertain_90p``::

	positions_to_keep = []
	for i,u in enumerate(gap_filtered_aln.uncertainties()):
	     if u < uncertain_90p:
	         positions_to_keep.append(i)

Then we'll filter the alignment to contain only those positions::

	entropy_gap_filtered_aln = gap_filtered_aln.takePositions(positions_to_keep)
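The cutoff-and-keep logic of the last three commands can be tried on a plain list of numbers. This sketch uses made-up entropy values and a hypothetical helper name; it follows the same indexing and strict ``<`` comparison as the commands above.

```python
def positions_below_percentile(values, fraction=0.9):
    """Return (indices of values strictly below the cutoff, the cutoff)."""
    ranked = sorted(values)
    # Same indexing as above; fraction must be below 1.0 to stay in range.
    cutoff = ranked[int(len(ranked) * fraction)]
    keep = [i for i, v in enumerate(values) if v < cutoff]
    return keep, cutoff

# Ten invented per-position entropy values.
toy_uncertainties = [0.0, 1.5, 0.2, 0.9, 2.0, 0.1, 0.4, 1.1, 0.3, 0.7]
keep, cutoff = positions_below_percentile(toy_uncertainties)
```

Because the comparison is strict, positions that tie with the cutoff value are removed as well, so more than 10% of the positions can be dropped when many columns share the same entropy.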

We can then rebuild and visualize the tree::

	entropy_gap_filtered_tree = build_tree_from_alignment(entropy_gap_filtered_aln,DNA)
	entropy_gap_filtered_dendrogram = UnrootedDendrogram(entropy_gap_filtered_tree)
	entropy_gap_filtered_dendrogram.showFigure()

Your tree should look something like this:

	.. image:: ../images/tol_entropy_gap_filtered.png

Figure 3: A tree of life built from 16S rRNA sequences. A: archaeal sequences; B: bacterial sequences; E: eukaryotic sequences.

While the trees in Figures 1, 2, and 3 don't look very different, an interesting point to note is the amount of information in each::

	len(aln)
	len(gap_filtered_aln)
	len(entropy_gap_filtered_aln)

The entropy and gap filtered alignment (``entropy_gap_filtered_aln``) contains approximately one quarter as many positions as the full alignment (``aln``), yet results in a nearly identical phylogenetic tree. This suggests that the filtered positions add very little phylogenetic information. In small alignments such as the example here this may not have a large effect on run time, but when building a tree from thousands or tens of thousands of sequences, removing gap and high entropy positions can save significant compute time and frequently improves results.

Starting with Silva sequences (to skip steps of obtaining sequences from NCBI)
------------------------------------------------------------------------------
The following sequences are randomly chosen from the Silva database. You can use these instead of pulling random sequences from NCBI.

::

	fasta_str = """>AF424517 1 994 Archaea/Crenarchaeota/uncultured/uncultured
	CAGCAGCCGCGGTAATACCAGCCCCCCGAGTGGTGGGGATGTTTATTTGGCCTAAAACGTCCGTAGCCAGCTCGGTAAATCTCTCGTTAAATCCAGCGTCCTAAGCGTTGGGCTGCGAGGGAGACTGCCAAGCTAGAGGGTGGGAGAGGTCAGCGGTATTTCTGGGGTAGGGGCGAAATCCATTGATCCCAGGAGGACCACCAGTGGCGAAGGCTGCTGACTAGAACACGCCTGACGGTGAGGGACGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCCAGCTGTAAACGATGCAAACTCGGTGATGCCCTGGCTTGTGGCCAGTGCAGTGCCGCAGGGAAGCCGTTAAGTTTGCCGCCTGGGAAGTACGTACGCAAGTATGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGAAGCCTGCGGTTCAATTGGAGTCAACGCCAGAAATCTTACCCGAAGAGACAGCAGAATGAAGGTCAAGCTGGAGACTTTACCAGACAAGCTGAGAAGTGGTGCATGGCCGTCGCCAGCTCGTGCCGTGAGATGTCCTGTTAAGTCAGGTAACCAGCGAGATCCCTGCCTCTAGTTGCCACCATTACTCTCCGGAGTAGTGGGGCGAATTAGCGGGACCGCCGTAGTTAATACGGAGGAAGGAAGGGGCCACGGCAGGTCAGTATGCCCTGAAACTTTGGGGCCACACGCGGGCTGCAATGGTAACGACAATGGGTTCCGAAACCGAAAGGTGGAGGTAATCCTCAAACGTTACCACAGTTATGATTGAGGGCTGCAACTCGCCCTCATGAATATGGAATCCCTAGTAACTGCGTGTCATTATCGCGCGGTGAATACGTCCCTGCTCCTTGCACACACTGCCCGTCGAACCACCCGAATGAGGTTTGGGTGAGGAATGGTCGAATGTTGGCCGTTTCGAACCTGGGCTTCGTAAGGAGGGTTAAGTCGTAACAAGGTAACCGTA
	>AF448158 1 1828 Eukarya/Metazoa/Magelona et rel.
	TTGATCCTGCCAGTAGTCATATGCTTGACTCAAAGATTAAGCCATGCATGTGCAAGTACATGACTTTTTTACACACGGTGAGACCGCGAATGGCTCATTAGATCAGTCTTAGTTCCTTAGACGGAAAGTGCTACTTGGATAACTGTGGCAATTCTAGAGCTAATACGTGCACGCAAGCTCCGACCTACTGGGGAAGAGCGCAATTATTAGATCAAGACCAAACGAGTCGAAAGGCTCGAACGTCTGGTGACTCTGGATAACCTCGGGCTGACCGCACGGCCAAGAGCCGGCGGCGCATCTTTCAAGTGTCTGCCCTATCAACTTTCGATGGTATGCGATCTGCGTACCATGGTGCTTACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACCTCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCTGGCACAGGGAGGTAGTGACGAGCAATAGCGACTCGGGACTCTTTCGAGGCCTCGGGATCGGAATGAGTACAACGTAAACACTTTTGCAAGGAACAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGCTGTTGCAGTTAAAAAGCTCGTAGCTGAATCTCGGGTGCGGGCGGGCGGTCCGCCTTACAGCGTGCACTGCCCCGATCCTGATCCAACTGCCGGTATTATCTCGGGGTGCTCTTAGCTGAGTGTCTTGGGCTGGCCGGTGCTTTTACTTTGAAAAAATTAGAGTGCTCAAAGCAGGCTTCCACGCCTGAATACTATAGCATGGAATAATGGAATAAGACCTCGGTTCTATTCTGTTGGTCTCTGGAAACCAGAGGTAATGATTAAGAGGGACAGACGGGGGCATTCGTATTGCGGGGCGAGAGGTGAAATTCTTAGACCCTCGCAAGACGAACTACAGCGAAAGCATTTGCCAAGCATGTTTTCTTTAGTCAAGAACGAAAGTCAGAGGTTCGAAGACGATCAGATACCGTCCTAGTTCTGACCATAAACGATGCCGACTAGCGATGCGCGAGCGTTGGTATCTGACCTCGCGCGCAGCTCCCGGGAAACCAAAGTCTTTGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCCGGCCCGGACACTGCGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGATAACGAACGAGACTCTAGCCTGCTAAATAGTTCGTCGACACGCGGTTGTGTCTGGCGAGGAAACTTCTTAGAGGGACAAATGGCATTTAGTCATACGAGATTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTTCGGGGCCGCACGCGCGCTACACTGAAGGAGACAGCGAGTGTCCTGACCTAGCCCGAAAGGGCCGGGCAATCTGCTGAACCTCTTTCGTGGTAGGGATTGGGGCTTGCAATTGTTCCCCATGAACCAGGAATTCCGAGTAAGCGCAGGTCACAAGCCTGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGAGCGGTTCAGTGAGACCCTCGGACTTGCCCAGCAGGAGCCGGCGACGGCTCCGCGTGTGTGCGAGAAAGAATGTCGAACTGTATTGCTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGCTT
	>AJ428075 1 1749 Eukarya/Viridiplantae/Streptophyta/Klebsormidiophyceae
	TAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAAATTACTCTAAATGGTAAAACTGCGAATGGCTCATTAAATCAGTTATAGTTTATTTGATGATTCCTGCTACTCGGATAACCGTAGTAATTATAGAGCTAATACGTGCGCAAACGCCCGACTTCGGAAGGGCCGTATTTATTAGATAAAAGACCAACTCGGGGTTCGCCCCGAAACTTTGGTGATTCATAATGTAATCTCGGACCGCACGGCCTCGCGCCGGCGGCAAATCAATCAAATATCTGCCCTATCAACTTTCGATGGCAGGATAGTCGCCTGCCATGGTTGTAACGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTGATTCAGGGAGGTAGTGACAATAAATAACAATACCGGTCTCTTATGTGACTGGTAATTGGAATGAGCGGAACATAAATACCTTAACGAGGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCGGGACGGAGACGTCGGTCCTCCCTCGTGGTCGATACTGACTCTCTTCCTTAATTGCCTCGAGCGCCGCCTAGTCTTCATTGCCTGGGCGCGCTACGCGGCGCCGTTACCTTGAATAAATTATGGTGTTCAAAGCAGGCTTATGCTCTGAGTACATTAGCATGGAATAACGCTATAGGACTCCGGTCCTATTACGTTGGTCTTCTGACCGGAGTAATGATTAATAGGGACAGTCGGGGGCATTCGTACTTCATCGTTAGAGGTGAAATTCTTGGATCGATGAAAGACGAACTTCTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTGGGGGCGCGAAGACGATTAGATACCGTCCTAGTCCCAACCGTAAACGATGCCGACCCCGAATTGGCGCACGTATGACTTGACGTCGCCAGCGCCCGAGGAGAAATCAGAGTCTTTGGGTTCCGGGGGGAGTATGGTCGCAAGTCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGTGTGGAGCGTGCGGCTTAATTTGACTCAACGCGGGGAATCTTACCAGGTCCAGACATAGCGACGATTGACAGACTGATAGCTCTTTCTTGATCATATGGGTAGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGTTAACGAACGAGACCTCAGCTTGCTAACTAGTTGCGCGAAGATTTTCTTCGCGCACACTTCTTAGAAGGACTTTGAGCGTTTAGCTCATGGAGGTTTGAGGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACAATGATGCATTCAGCGAGCGGAATCCCTGATCGGAAACGGTCGGGCAATCTTTGAATCTTTATCGTGATGGGGATAGACCCTTGCAATTATTGGTCTCGAACGAGGAATACCTAGTAAGCGCTCGTCATCAGCGTGCGCTGACTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATAGAATGCTTCGGTGAAGCACTCGGATCGCGCCGCCGSCGGCGAAACCTCCGGGGACGGCATGAGAAGTTTGTTAAACCATATCGTTTAGAGGAAGGAGAAGTCGTAACAAGG
	>AJ850036 1 1961 Eukarya/Metazoa/Arthropoda/Polyphaga/Bagous et rel.
	TTGTCTCAAAGATTAAGCCATGCATGTCTCAGTACAAGCCATATTAAGGTGAAACCGCGAAAGGCTCATTAAATCAGTTATGGTTCCTTAGATCGTACCCAGGTTACTTGGATAACTGTGGTAATTCTAGAGCTAATACATGCAAACAGAGCTCCGACTGGAAACGGAAGGAGTGCTTTTATTAGATCAAAGCCAAACGGTAACTTAATGTTGTCGTACAATAATATTGTTGACTCTGAATAACTTTATGCTGATCGCATGGTCTTGCACCGGCGACGCATCTTTCAAATGTCTGCCTTATCAACTGTCGATGGTAGGTTCTGCGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCCGGCACGGGGAGGTAGTGACGAAAAATAACGATACGGGACTCATCCGAGGCCCCGTAATCGGAATGAGTACACTTTAAATCCTTTAACGAGGATCAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCGGTTAAAAAGCTCGTAGTCAAATTTGTGTCTCGTGCCGCTGGTTCATCGTTCGCGGTGTTAATTGGCGTGATACGAGACGTCCTGCCGGTGGGCTTTCAGATTTTTCCGTATTTCAGGACCATAACAATTGGTTTGTATCTGTGGCGTAATACTGCAGTGCAGGGCAATTGGTTAATGAACGGTTGGTTTTTGTGCTACCCAAACTTACAATCCTGTCGCGTTGCTCTTGATTGAGTGACGAGGTGGGCCGGCACGTTTACTTTGAACAAATTAGAGTGCTTAAAGCAGGCAAAATTTCGCCTGAATATTCTGTGCATGGAATAATGGAATAGGACCTCGGTTCTATTTCGTTGGTTTTCGGAACTCCGAGGTAATGATTAATAGGAACGGATGGGGGCATTCGTATTGCGACGTTAGAGGTGAAATTCTTGGATCGTCGCAAGACGAACAGAAGCGAAAGCATTTGCCAAAAACGCTTTCATTGATCAAGAACGAAAGTTAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTAACCGTAAACTATGTCATCTGACGATCCGTCGACGTTCCTTTATTGACTCGACGGGCAGTTTCCGGGAAACCAAAGATTTTGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAACCTCACCAGGCCCGGACACCGGAAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGCGATTTGTCTGGTTAATTCCGATAACGAACGAGACTCTAGCCTGCTAAATAGGCGACATATGACATCGCAAAGGCCAGCCGGTTTGATTTAAAGGGTGGCGAGGTGGCGTCAAGGCGTTTATCTCGTGCTCTTGTCAGATTGTGCGCGGTTTTTACTGTCGGCGTATAAATAATTCTTCTTAGAGGGACAGGCGGCTTTTAGCCGCACGAGATTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGAAGGAATCAGCGTGTCCTCCCTGGCCGAGTGGCCCGGGTAACCCGCTGAACCTCCTTCGTGCTAGGGATTGGGGCTTGCAATTGTTCCCCATGAACGAGGAATTCCCAGTAAGCGCGAGTCATAAGCTCGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGAATGATTTACTGAGGTCTTCGGATCGATGCGCGATGACGTCTGACGTTGATCGATGTATCCGAGAAGATGACCAAACTTGATCATTT
	>AM745254 1 1365 Archaea/Euryarchaeota/Halobacteriales/uncultured
	TTCCGGTTGATCCTGCCGGACCTGACTGCTATTGGAGTAGGACTAAGTCACGCTAGTCAAAGGTGTGGAATGGAACACCTGGCGCACGGCTCAGTAACACGTAGTGAACCTACCCTAAGGACGAGGACAACCACGGGAAACTGTGGCTAATCCTCGATAGGAAATTTGGCCTGGAACGGTATCTTTCCTAAAACCGGCTCGCCGTGAGACACGGGCCTTAGGATGGCGCTGCGGCCGATTATGCTAGACGGCGGTGTAAAGGACCACCGTGGCGACGATCGGTATGGGCGATGGAAGTCGGAGCCCAGAGTCGGCTACTGAGACAAGGAGCCGAGCCTTACGAGGCTTAGCGGTCGCGAAAACTCGCCAATGCACGAAAGTGTGAGTGGGCTACTCCAAGTGTCATTCTTACGGATGACTGTCGCCCAGTTTTACAAGCTGGGAAAGGAAGGAGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCTTCGAGTGGTCAGGACGAATATTGGGTCTAAAGCGTTCGTAGCGGGACAAGTAGGTTCCTGGTTAAATCCGATGTCACAAGCATCGGGCTGCTGGGAATACCGCTAGTCTTGAGAGCGGGATAGGACAGGGGTAGTCTATGGGCAGGGGTGAAATCCAGTGATCCATAGGCGACCACCGATGGCGAAGGCACCTGTCTGGAACGTATCTAACCGTGATGGACGAAAGCCAGGGGAGCGACCCGGATTAGATACCCGGTTAGTCCTGGCCGTAAACGATGCCGACTAGGTGTTGCAGCGGCCAAGAGCCACTGCAGTGCCACAGTGAAGACGTTAAGTCGGCCACCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGACGGGGGCGCACCACCAGGAGTGAAGCCTGCGGTTTAATTGGATTCAACGCCGAAAAACTCACCTAAACAGACGGCAGAATGAAGCTCAAGTTAATGACTTTAGCTAACTCGCCGAGAGGAAGTGCATGGCCGTCGACAGTTCGTGCTGTGAAGTGTCTTGTTAAGTCAAGCAACGAACGAGATCCACGTCCGCAATTGCCAGCGGGTCCCTTTGGGATGCCGGGAACCTTGCGGAGACTGCTTGGTGCTAAACCAGAGGAAGGAGTGGGCAACGGCAGGTCAGTATGCTCCGATAGTTTAGGGCTACACGCGGGCTGCAATGGTCGGTACAATGGGCCGCGACCCCGAAAGGGGAAGCCAATCCCGAAAGCCGGTCTCAGTCAGGATTGGGGTTTGCAACTCAGCCCCATGAATATGGAATTCCTAGTAAACGTGTTTCATTAAGACACGTTGAATACGTCCCCGCGCCTTGTACACACCGCCCGT
	>AY175392 1 1057 Archaea/Euryarchaeota/Methanomicrobiales
	CCCTTTCTGGTTGATCCTGCCAGAGGCCACTGCTATCGGGGTTCGACTAAGCCATGCGAGTCGAGAGGGGTAATGCCCTCGGCGAACGGCTCAGTAACACGTGGACAACCTACCCTCAGATCTGGGATAACTCCGGGAAACTGGAGATAATACCGGATAATCCGTGAACGCTGGAATGCCTTACGGTTCAAAGCTTTAGCGTCTGAGGATGGGTCTGCGGCCGATTAGGTAGTTGCTGGGGTAACGTCCCAACAAGCCGATAATCGGTACGGGTTGTGAGAGCAAGAGCCCGGAGATGGATTCTGAGACACGAATCCAGGTCCTACGGGGCGCAGCAGGCGCGAAAACTTTACACTGCGCGAAAGCGCGATAAGGGAACCTCGAGTGCGTGCGCAATGCGTACGCTTTTCACATGCCTAAAAAGCATGTGGAATAAGAGCCGGGCAAGACCGGTGCCAGCCGCCGCGGTAACACCGGCGGCTCAAGTGGTGGCCGCTATTATTGGGCTTAAAGGGTCCGTAGCCGGACCAGTTAGTCCCTTGGGAAATCTTACGGCTTAACCGTAAGGCTGCCAATGGATACTGCTGGCCTTGGGACCGGGAGAGGCAAGAGGTACCTCAGGGGTAGGAGTGAAATCCTGTAATCCTTGAGGGACCGCCAGTGGCGAAGGCGTCTTGCTAGAACGGGTCCGACGGTGAGGGACGAAAGCTAGGGGCACGAACCGGATTAGATACCCGGGTAGTCCTAGCCGTAAACGATGCGAGCTAGGTGTCACGTGGATTGCGAATCCATGTGGTGCCGTAGGGAAACCGTGAAGCTCGCCGCCTGGGAAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAACGGGTGGAGCCTGCGGTTTAATTGGACTCAACGCCGGAAAGCTCACCGGAGACGACAGCGGGATGAGGGCCAGGCTGATGACCTTGCTAGACTAGCTGAGAGGAGGTGCATGGCCGCCGTCAGTTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCAAAGGG
	>AY284588 1 1736 Eukarya/Metazoa/Nematoda/Aphelenchus et rel.
	CTCAAAGATTAAGCCATGCATGTGTAAGTATAAACGATTCAATCGTGAAACCGCGAACGGCTCATTATAACAGCTATGATCTACTTGATCTTGAGAATCCTAATTGGATAACTGTAGTAATTCTAGAGCTAATACATGCATAAGAGCTCGAACCTTGCGCAAGCGGGGGAAGAGTGCATTTATTGGAAGAAGACCAGTTGTGGCTGTAAAAAGCTGCATGTCGTTGACTCGCAATAACTAAGCTGATCGCATGGCCTTGTGCCGGCGACGAGTCTTTCGAGTATCTGCCTTATCAACTTTCGACGGTAGTGTATTTGACTACCATGGTGGTGACGGGTAACGGAGGATAAGGGTTCGACTCCGGAGAAGGGGCCTGAGAAATGGCCACTACGTCTAAGGATGGCAGCAGGCGCGCAAATTACCCACTCTCGGTACGAGGAGGTAGTGACGAAAAATAACGAAGAGGTCCCCTATGGGTCTTCTATTGGAATGGGTACAATTTAAACCCTTTAACGATTAACCAAGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCTAAATGCATAGATACATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGTGTTGGGGACTTGGTCCACTCTAACGGGTGGTACTTTGCTCCTTGACAATCAATGTTGGCTCACTTGGCGTAGTCTTCAGTGATTGCGTCATAGTTGGCTGACGAGTTTACTTTGAGCAAATCAGAGTGCTCCAAACAGGCGTTTACGCTTGAATGTTCGTGCATGGAATAATAGAAGAGGATTTCGGTTCTATTTTGTTGGTTTTGAGACCGAGATAATGGTTAACAGAGACAGACGGGGGCATTCGTACTTCTGCGTGAGAGGTGAAATTCTTGGACCGCAGAAAGACGCACCACAGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGATCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGTAAACGATGCCAACTAGCGATCTGTCGGTGGTGTGTTTTCGCCCTGATAGGGAGCTTCCCGGAAACGAAAGTCTTCGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAACCTCACCCGGGCCGGACACCGTAAGGATTGACAAATTGATAGCTTTTTCATGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTCGTGGAGCGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTTGGCACATTACATTGTGCGTCCTAACTTCTTAGAGGGATTTACGGCGTATAGCCGCAAGAGAATGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCGGGGCTGCACGCGCGCTACACTGGTGAAATCAACGTGTTCTCCTATGCCGAGAGGCACTTGGGTAAACCATTGAAAATTCGCCGTGATTGGGATCGGAGATTGAAATTATTTTCCGTGAACGAGGAATTCCAAGTAAGTGCGAGTCATCAACTCGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACCCGGGACTGGGTTATTTCGAGAAATTTGAGGATTGGCTAGGTGCTTGATGCCTCCGGGTGTCATCGCCTGTCGAGAATCAACTTAATCGAGATGGCCTGAACCGGGT
	>AY454558 1 1110 Archaea/Crenarchaeota/uncultured/uncultured
	ACTCACTAAGAGCGAATTGGGCCTTTCGTCGCATGCTAAAAGGCCGCCATGGCCGCGGGATTGGGCACGGGGGGACGGGTTGCCGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCTTGACAGGGTTACTCTGAGTGATTTCCGTTAAGGAGATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTGGTCGGGACGTTTATTGGGCCTAAAGCATCCGTAGCCGGTTCTACAAGTCTTCCGTTAAATCCACCTGCTTAACAGATGGGCTGCGGAAGATACTATAGAGCTAGGAGGCGGGAGAGGCAAGCGGTACTCGATGGGTAGGGGTAAAATCCGTTGATCCATTGAAGACCACCAGTGGCGAAGGCGGCTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCCAGCTGTAAACGATGCAGACTCGGTGATGAGTTGGCTTCTTGCTAACTCAGTGCCGCAGGGAAGCCGTTAAGTTTGCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGAAGCCTGCGGTTCAATTGGAGTCAACGCCGGAAATCTTACCGGGGGCGACAGCAGAGTGAAGGTCAAGCTGAAGACTTTACCAGACAAGCTGAGAGGAGGTGCATGGCCGTCGCCAGCTCGTGCCGTGAGGTGTCCTGTTAAGTCAGGTAACGAGCGAGATCCCTGCCTCTAGTTGCTACCATTATTCTCAGGAGTAGTGGAGCTAATTAGAGGGACCGCCGTCGCTGAGACGGAGGAAGGTGGGGGCTACGGCAGGTCAGTATGCCCCGAAACCCTCGGGCCACACGCGGGCTGCAATGGTAAGGACAATGAGTTTCAATTCCGAAAGGAGGAGGCAATCTCTAAACCTTACCACAGTTATGATTGAGGGCTGAAACTCGCCCTCATGAATATGGAATCCCTAGTAACCGCGTGTCACTATCGCGCGGTGAATACGTCCCTGCTCCTTGCACGAGTTAACCGAATCACTAGT
	>DQ421767 1 1422 Bacteria/Beta Gammaproteobacteria/Gammaproteobacteria_1/Oceanospirillales_2/Marinomonas
	AGCGGTAACAGGAATTAGCTTGCTAATTTGCTGACGAGCGGCGGACGGGTGAGTAACGCGTAGGAATCTGCCTGGTAGTGGGGGACAACATGTGGAAACGCATGCTAATACCGCATACGCCCTACGGGGGAAAGGAGGGGATCTTCGGACCTTTCGCTATCAGATGAGCCTGCGTGAGATTAGCTAGTTGGTGGGGTAAAGGCTCACCAAGGCGACGATCTCTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTCAGTTGGGAAGATGATGACGTTACCAACAGAAGAAGCACCGGCTAAATCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGTTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGACCAGAAAGTTGGGGGTGAAATCCCGGGGCTCAACCCCGGAACGGCCTCCAAAACTCCTGGTCTTGAGTACGGCAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATAGGAAGGAACATCAGTGGCGAAGGCGACACCCTGGACCGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCTACTAGCCGTTGGGGATTTTATTCTTAGTGGCGCAGCTAACGCGATAAGTAGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTACTCTTGACATCCAGAGAATTTAGCAGAGATGCTTTAGTGCCTTCGGGAACTCTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTATCCTTATTTGCCAGCACTTCGGGTGGGAACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTACGAGTAGGGCTACACACGTGCTACAATGGCGTATACAGAGGGCCGCAAGACCGCGAGGTGGAGCAAATCCCAAAAAGTACGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTTGATTGCTCCAGAAGTAGCTAGCTTAACCTTCGGGATGGCGGTTACCACGGAGTGGTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCTAGG
	>DQ628981 1 1786 Eukarya/Rhodophyta et al./Rhodophyta/Florideophyceae/Corallinales
	CACCTGGTTGATCCTGCCAGTGGTATATGCTTGTCTCAAAGACTAAGCCATGCAAGTCTAAGTATAAGTTATTCTTACGACAAAACTGCGAATGGCTCGGTAAAACAGCAATAATTTCTTCAGTGATGATTTTACTCACGGATAACCGTAGTAATTCTAGAGCTAATACGTGCAAATTAAAGCAATGACCGCAAGGCCAGCGCTGTGCCGTTTAGATAACAACACCATCATTTGGTGATTCATAATCGTCTTTCTGATCGCTTCGTGCGACACACTGTTCAAATTTCTGACCTATCAACTTTCGATGGTAAGGTAGTGTCTTACCATGGTTATGACGGGTAACGGACCGTGGGTGCGGGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGTAAATTACCCAATCCAGACACTGGGAGGTAGTGACAAGAAATATCAATGGGGGAACTGTAAAGTTCTTCCAATTGGAATGAGATCGAGCTAAATAGCCAAATCGAGAATCCAGCAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTGTAAGCGTATACCAAAGTTGTTGCACTTAAAACGCTCGTAGTCGGACATTGGTAGTTCCGGGAGTGTGCGCGTCGTGTGCATGCTCTGCGGGACTGCCTTTCGTGGAGTTGTCGGAGGGATGAAGCATTTTAATTAATGAACGTCCACCGCGCCCACTTTTTACTGTGAGAAAATCAGAGTGCTCAAAGCAGGCAATTGCCGTGAATGTATTAGCATGGAATAATAGAATAGGACTCGTTTCTATTTTGTTGGTTTGTTGGGAATGAGTAATGATTAAGAGGGACAGTTGGGGGCATTTGTATTACGAGGCTAGAGGTGAAATTCTTAGATTCTCGTAAGACAAACTGCTGCGAAAGCGTCTGCCAAGGATGTTTTCATTGATCAAGAACGAAAGTAAGGGGATCGAAGACGATCAGATACCGTCGTAGTCTTTACTATAAACGATGAGAACTAGGGATCGGGCGAGGCATTACGATGACCCGCCCGGCACCTTCCGCGAAAGCAAAGTGTTTGCTTTCTGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCATCACCGGGTGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTTACCAGGTCAGGACATAGTGAGGATGAACAGATTGAGAGCTCTTTTTTGATTCTATGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGTTAACGAGCGAGACCTGGGCGTGCTAACTAGGAGAGGCTACACTCGTGGTAGTTTTCGACTTCTTAGACGGACTGGTGGCGTCTAGCCACCGGAAGCTCCAGGCAATAACAGGTCTGAGATGCCCTTAGATGTTCTGGGCCGCACGCGTGCTACACTGAGTAATTCAATGGGTAAGGGAACACGAAAGTGCGACCTAATCTTGAAATTTGCTCGTGATGGGGATCGACGGTTGCAATTTTCCGTCGTGAACGAGGAATACCTTGTAGGCGCGTGTCATCATCACGCGCCGAATACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATTGAGTGATCCGGTGAGGCTCTGGGACCTGAGCGGAAAGAGCGTTTCGCTTGTTCTGCTTGGGAAACTTGGTCGAACCTTATCATTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGCTA
	>EF406474 1 1502 Bacteria/Firmicutes/Clostridiales/Ruminococcus et rel./Papillibacter et rel./Oscillospira
	TAGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAGCACCCTTGAAGGAGTTTTCGGACAACGGATAGGAATGCTTAGTGGCGGACTGGTGAGTAACGCGTGAGGAACCTGCCTTCCAGAGGGGGACAACAGTTGGAAACGACTGCTAATACCGCATGACGCATTGGTGTCGCATGGCACTGATGTCAAAGATTTATCGCTGGAAGATGGCCTCGCGTCTGATTAGCTAGTTGGTGAGGTAACGGCCCACCAAGGCGACGATCAGTAGCCGGACTGAGAGGTTGGCCGGCCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTAAGGGGGAAGAGCAGAAGACGGTACCCCTTGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGAGAGACAAGTCAGATGTGAAATCCACGGGCTCAACCCGTGAACTGCATTTGAAACTGTTTCCCTTGAGTGTCGGAGAGGTAATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGATTACTGGACGATAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATCGATACTAGGTGTGCGGGGACTGACCCCCTGCGTGCCGGAGTTAACACAATAAGTATCGCACCTGGGGAGTACGATCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGATTATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGGCTTGACATCCTACTAACGAAGTAGAGATACATTAGGTGCCCTTCGGGACAAGAGAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCTTCAGTAGCCAGCAGGTAAAGCCGGGCACTCTGGAGAGACTGCCGGGGATAACCCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGCTACACACGTGCTACAATGGCGTAAACAGAGGGAAGCGAGCCCGCGAGGGGGAGCAAATCCCAAAAATAACGTCCCAGTTCGGATTGTAGTCTGCAACCCGACTACATGAAGCTGGAATCGCTAGTAATCGCGGATCAGAATGCCGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGGGAGTCGGAAATGCCCGAAGTCTGTGACCCAACCGCAAGGAGGGAGCAGCCGAAGGCAGGTCGGATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAA
	>EF516988 1 1782 Bacteria/Firmicutes/Bacillales Mollicutes/Staphylococcaceae/Staphylococcus/Staphylococcus aureus et rel./Staphylococcus aureus et rel./Staphylococcus warneri
	GTACCGCTTTGGAGCCTCTCGAGTTTGATCCTGGCTCAGGAGGTCCTAACAAGGTAACCAGTATTGGATCCCCTAGAGTTTGATCCCGGCCCCTAAAGTTTGAACAAAGTCCAGGAAATTGGGGCCCCTACAGTTTAATCTCTTTTGCTTCATGGTAAAAAACTGAAAGACGGTTTCGGCTGTCGCTATTTGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGCGACGATGCGTAGCCCACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGGCGAAAGCCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTAAGGGAAGAACAAGTACAGTAGTAACTGGCTGTACCTTGACGGTACCTTATTAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCAAGTGTAGCGGTGAAATGCGTAGAGATTTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGAGAGTACGGTCGCAGGACTGAAACTCAAAAGAATTTGACGGGGGGCTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCAAGTGTAGCGGTGAAATGCGTAGAGATTTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCCGTTGACCACTGTAGAGATATAGTTTCCCCTTCGGGGGCAACGGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGATCTTAGTTGCCATCATTTAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACGATACAAACGGTTGCCAACTCGCGAGAGGGAGGTATCCGATAAAGTCGTTCTCAGTTCGGATTGTTGGCCCCAACTCGCGTACGTGAAACCAGAATAACCAGTAATGGCTCCTCAGCATTTTGATCCGGGCTCGTTAAGTGGTAACAAGGTAACCGCTATTGGATCCTTAGAGTTTGATCCGGCTCAGGAAGTCGTAACAAGGTAACCAGTATGGTCCTCTAGAG
	>EF551905 1 1203 Bacteria/Beta Gammaproteobacteria/Xanthomonadales
	GATAGCGGCGCGATTCGCCCTTCCTACGGGGGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCAGATCCAGCCATGCCGCGTGGGTGAAGAAGGCCTTCGGGTTGTAAAGCCCTTTTGTTGGGAAAGAAAGACGTCCGGCTAATACCCGGATGGAATGACGGTACCCAAAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTACTCGGAATTACTGGGCGTAAAGGGTGCGTAGGTGGTTCGTTAAGTCTGATGTGAAAGCCCTGGGCTCAACCTGGGAATTGCATTGGATACTGGCGAGCTGGAGTGCGGTAGAGGGTAGTGGAATTCCCGGTGTAGCAGTGAAATGCGTAGATATCGGGAGGAACATCCGTGGCGAAGGCGACTACCTGGACCAGCACTGACACTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGCGAACTGGATGTTGGGTTCAATCAGGAACTCAGTATCGAAGCTAACGCGTTAAGTTCGCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGTATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGCCTTGACATGTCGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCGAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTTAGTTGCCAGCACGTAATGGTGGGAACTCTAAGGAGACCGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTACTACAATGGGAAAGGACAGAGGGCTGCGAACCCGCGAGGGCAAGCCAATCCCAGAAACCTTTCTCCCAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGCAGATCAGCATTGCTGCGGTGAATACGTTTCCGGTCTTGTACAACACCGCCCGTCACACCATGGGAGTGGGTGCCACCAGAAGTAGCTAGACTACGTTCGGGAGACCGTTACCCACGGTTGAATTCATGGACTTGGGGTGAGTCCGTAAACAGGGTTACCCCCG
	>EU132755 1 1345 Bacteria/Actinobacteria/CMN et rel./CMN/Pseudonocardiaceae_3/Pseudonocardia aurantiaca et rel./Pseudonocardia aurantiaca et rel.
	GAACGCTTGACGGCGTGCTTACACATGCAAGTCGAACGGGCCATTGCTCTTCGGGGTGGTGGTTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTCGGCTTCGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATATTCACATCTTGTTGCATGGTGGGGTGTGGAAAGGGTTTCTGGCTGGGGATGGGCTCGCGGCCTATCAGCTTGTTGGTGGGGTGATGGCCTACCAAGGCGGTGACGGGTAGCCGGCCTGAGAGGGCGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCGCAATGGGCGGAAGCCTGACGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCCCCGACGAAGCGAAAGTGACGGTAGGGGTAGAAGAAGCGCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTTCCGTGAAAACTGGGGGCTTAACTTCCAGCTTGCGGTGGATACGGGCTGACTGGAGTGCGGCAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGGTGGCGAAGGCGGGTCTCTGGGCCGTTACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGTTGGGCGCTAGGTGTGGGGGACTTTCCACGTTCTCCGTGCCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGTGGCTTAATTCGATGCAACGCGAAGAACCTTACCTGGGTTTGACATGCGCGGTAATCCTGTAGAGATACAGGGTCCTTCGGGGCCGTGTACAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTTCCATGTTGCCAGCACGTGATGGTGGGGACTCATGGGAGACTGCCGGGGTCAACTCGGAGGAAGGTAGGGATGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTGCACACATGCTACAATGGCTCATACAGAGGGCTGCGATGCTGTGAGGCTGAGCGAATCCCTTAAAGTGAGTCTCAGTTCGGATCGGGGTCTGCAACTCGACCCCGTGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGATACGTTCCCGGGCATTGCACTCA
	>EU570118 1 1433 Archaea/Euryarchaeota/Thermoplasmatales/uncultured
	CGGTTGATCCTGCCGGCGCTCACCGCTCTTGGAATCCGATTAAGCCATGTGAGTCGAGAGGGTTCGGCCCTCGGCAAACTGCTCAGTAACACGTGGATAACCTAACCTAAGGTGGGAGATAATCTCGGAAAACTGAGGCTAATATCCCATAGACCTTGATGACTGGAATGTTTTGAGGTTTAAAGTTACGACGCCTTAGGATGGGTCTGCGGCCTATCAGGTTGTAGTTAGTGTAAAGGACTAACTAGCCGACGACGGGTACGGGCCATGGGAGTGGTTGCCCGGAGATGGACTCTGAGACACGAGTCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTTGCAATGCGCGAAAGCGCGACAAGGGGATTCCAAGTGCATGCACTAAGTGTATGCTTTTCGTGAGTGTAAAAAGCTCACGGAATAAGGGCTGGGTAAGACTGGTGCCAGCCGCCGCGGTAATACCAGCGGCCCTAGTGGTGATCGTTTTTATTGGGCCTAAAGCGTCCGTAGCCGGTTCGGTAAATCTCTGGGTAAATCGTTGGGCTTAACCCAACGAATTCTGGGGAGACTGCCGAACTTGGGACCGGGAGAGGTCGGAGGTACTCCAGGGGTAGGGGTGAAATCCTGTAATCCTTGGGGGACCACCGGTGGCGAAAGCGTCCGACCAGAACGGGTCCGACGGTAAGGGACGAAGCCCTGGGTCGCGAACCGGATTAGATACCCGGGTAGTCCAGGGTGTAAACGCTGTGCGCTTGGTGTAGGGGGTCCTACGAGGGCATCCTGTGCCGGAGAGAAGTTGTTAAGCGCACCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACAGCAACGGGAGGAGCGTGCGGTTTAATTGGATTCAACGCCGGAAAACTCACCAGGGGCGACTGCCACATGAAGATCAAGCTGATGACTTTATCTGATTGGTAGAGAGGTGGTGCATGGCCGTCGTCAGTTCGTACCGTAGGGCGTTCTGTTAAGTCAGATAACGAACGAGACCCTTGCCCTTAATTGCCATGTTTCCCTCCGGGGGAACGGTACTTTAAGGGGACCGCTGGTGCAAAATCAGAGGAAGGGAAGGGCAACGGTAGGTCAGTATGCCCCGAATCCCCTGGGCAACACGCGCGCTACAAAGGCCGGGACAAAGGGTTCCGACACCGAGAGGTGAAGGTAATCCCGAAACCTGTCCGTAGTTCGGATCGAGGGCTGCAACCCGCCCTCGTGAAGCTGGATTCCGTAGTAATCGCAGATCAACATCCTGCGGTGAATATGCCCCTGCTCCTTGCACACACCGCCCGTCAAACCATCCGAGTGGAGTTTCGATGAGGGTGGGATTCTTGTCCTTCTCAAATCGCGATTTCGCAAGGAGGGTTAAGTCGTAACAAGGTAACC"""
	
	def label_to_name(x):
	    fields = x.split()
	    return '%s: %s' % (fields[3].split('/')[0], fields[0])
	
	from cogent import LoadSeqs, DNA
	seqs = LoadSeqs(data=fasta_str.split('\n'),moltype=DNA,aligned=False,label_to_name=label_to_name)
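To check what names the Silva headers are turned into, you can call ``label_to_name`` on individual header lines. The sketch below repeats the function so that it runs on its own, and uses two headers copied from the sequences above (without the leading ``>``).

```python
def label_to_name(x):
    fields = x.split()
    # fields[0] is the accession; fields[3] starts with the domain name.
    return '%s: %s' % (fields[3].split('/')[0], fields[0])

archaeal = label_to_name('AF424517 1 994 Archaea/Crenarchaeota/uncultured/uncultured')
bacterial = label_to_name('EF551905 1 1203 Bacteria/Beta Gammaproteobacteria/Xanthomonadales')
```

Tips are therefore named with the full domain name followed by the accession (for example ``Archaea: AF424517``), rather than the single-letter ``A:``, ``B:``, and ``E:`` prefixes used for the NCBI sequences.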

Now pick up with `Step 5 <./building_a_tree_of_life.html#step5>`_ above.