***********************
Building a tree of life
***********************
.. authors, Greg Caporaso
Building a tree of life with PyCogent
======================================
This cookbook example shows how to construct a tree of life from 16S rRNA sequences, to test whether the three domains of life are visible as three separate clusters in a phylogenetic tree. It covers compiling sequences, building a multiple sequence alignment, building a phylogenetic tree from that alignment, and visualizing the tree.
Step 0: Set up your Python environment
--------------------------------------
For this tutorial you'll need PyCogent (the ``cogent`` package), muscle, and FastTree installed on your system.
Start an interactive Python session by entering the following into a command terminal::
python
You should now see the Python command prompt::
>>>
Step 1: Download sequences from NCBI
------------------------------------
Here we'll work with archaeal, bacterial, and eukaryotic sequences obtained from NCBI using the PyCogent EUtils wrappers. Run the following commands to obtain these sequences::
from cogent.db.ncbi import EUtils
from cogent.parse.fasta import MinimalFastaParser
e = EUtils()
arc16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND archaea[orgn]']))
bac16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND bacteria[orgn]']))
euk16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND eukarya[orgn]']))
You can check how many sequences you obtained for each query by running::
len(arc16s)
len(bac16s)
len(euk16s)
.. note:: In this example you'll notice that you have relatively few sequences for each query. You'd obtain many more if you replaced the ``rRNA`` in the query with ``ribosomal RNA``, but the runtime would also be significantly longer. For the purposes of this tutorial we'll therefore stick with the query that returns fewer sequences.
Step 2: Load the sequences
--------------------------
We'll begin by loading the sequences that have been downloaded, applying a filter to retain only those that we consider to be of good quality. Sequences of 750 bases or fewer, and sequences containing one or more ``N`` characters, will be ignored (``N`` characters typically represent ambiguous base calls during sequencing).
First, define a function to load and filter the sequences::
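from cogent.parse.fasta import MinimalFastaParser
def load_and_filter_seqs(seqs, domain_label):
    # Keep sequences longer than 750 bases that contain no N characters,
    # and prefix each sequence identifier with the domain label.
    result = []
    for seq_id, seq in seqs:
        if len(seq) > 750 and seq.count('N') < 1:
            result.append((domain_label + seq_id, seq))
    return result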
Next, load and filter the three sequence sets::
arc16s_filtered = load_and_filter_seqs(arc16s,'A: ')
bac16s_filtered = load_and_filter_seqs(bac16s,'B: ')
euk16s_filtered = load_and_filter_seqs(euk16s,'E: ')
len(arc16s_filtered)
len(bac16s_filtered)
len(euk16s_filtered)
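
To see exactly which sequences the filter lets through, it can help to run the same test on a few toy records. The following is a minimal sketch in plain Python; the record names, the sequences, and the ``passes_filter`` helper are made up for illustration and are not part of PyCogent.

```python
# Toy (label, sequence) pairs standing in for parsed FASTA records.
# The labels and sequences are invented for illustration only.
toy_seqs = [
    ("long_clean", "ACGT" * 200),         # 800 bases, no N: kept
    ("long_with_n", "ACGT" * 200 + "N"),  # 801 bases, one N: dropped
    ("exactly_750", "AC" * 375),          # 750 bases: dropped (750 is not > 750)
    ("short", "ACGT" * 10),               # 40 bases: dropped
]

def passes_filter(seq, min_length=750):
    """Same test as in load_and_filter_seqs: longer than min_length, no N."""
    return len(seq) > min_length and seq.count("N") < 1

# Apply the filter and add a domain prefix, as the tutorial function does.
kept = [("A: " + seq_id, seq) for seq_id, seq in toy_seqs if passes_filter(seq)]
kept_labels = [label for label, _ in kept]
print(kept_labels)
```

Note that a sequence of exactly 750 bases is rejected, because the comparison is a strict "greater than".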
Step 3: Select a random subset of the sequences
-----------------------------------------------
Import shuffle from the random module to extract a random collection of sequences::
from random import shuffle
shuffle(arc16s_filtered)
shuffle(bac16s_filtered)
shuffle(euk16s_filtered)
Select some random sequences from each domain. Note that only a few sequences are chosen to facilitate a quick analysis::
combined16s = arc16s_filtered[:3] + bac16s_filtered[:10] + euk16s_filtered[:6]
len(combined16s)
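
Because ``shuffle`` reorders each list in place using the global random state, the selected subset changes on every run. If you'd like a subset you can reproduce later, one option is to draw from a seeded generator instead. This is a sketch only: the ``sample_up_to`` helper, the seed value, and the toy records are assumptions introduced here, not part of the tutorial's workflow.

```python
from random import Random

def sample_up_to(records, n, seed):
    """Return up to n records chosen at random, reproducibly for a given seed."""
    rng = Random(seed)  # private generator, so the global random state is untouched
    return rng.sample(records, min(n, len(records)))

# Toy stand-ins for the filtered (label, sequence) lists.
toy_arc = [("A: arc%d" % i, "ACGT") for i in range(5)]
toy_bac = [("B: bac%d" % i, "ACGT") for i in range(20)]

subset = sample_up_to(toy_arc, 3, seed=42) + sample_up_to(toy_bac, 10, seed=42)
print(len(subset))
```

Capping the request at ``len(records)`` also avoids an error when a query returned fewer sequences than you ask for, which can happen with the small result sets used here.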
Step 4: Load the sequences into a SequenceCollection object
-----------------------------------------------------------
Use ``LoadSeqs`` to load the unaligned sequences into a ``SequenceCollection`` object. In this step we'll rename the sequences (by passing a ``label_to_name`` function) to only the accession number for the sequence. This facilitates visualization in downstream steps.
::
from cogent import LoadSeqs, DNA
seqs = LoadSeqs(data=combined16s,moltype=DNA,aligned=False,label_to_name=lambda x: '|'.join(x.split('|')[:2]))
You can explore some properties of this sequence collection. For example, you can count how many sequences are in the sequence collection object::
seqs.getNumSeqs()
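
To check what the ``label_to_name`` function does before loading real data, you can apply it to a single label. The label below is hypothetical: it assumes an NCBI-style, pipe-delimited header with the domain prefix added in Step 2, and the identifiers in it are invented.

```python
# The same renaming function that is passed to LoadSeqs above.
label_to_name = lambda x: '|'.join(x.split('|')[:2])

# Hypothetical label: domain prefix plus an NCBI-style pipe-delimited header.
example_label = "A: gi|12345|gb|AB000001.1| Hypothetical archaeon small subunit rRNA"
short_name = label_to_name(example_label)
print(short_name)

# A label without any '|' characters passes through unchanged.
plain_name = label_to_name("B: some_sequence")
print(plain_name)
```

The function keeps only the first two pipe-delimited fields, so the domain prefix survives and the long description is dropped.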
.. _step5:
Step 5: Align the sequences using muscle
----------------------------------------
Load an aligner function, and align the sequences. Here we'll align with muscle via the muscle application controller. The sequences will be loaded into an ``Alignment`` object called ``aln``.
::
from cogent.app.muscle import align_unaligned_seqs
aln = align_unaligned_seqs(seqs,DNA)
Step 6: Build a tree from the alignment using FastTree
------------------------------------------------------
Load a tree-building function, and build a tree from the alignment. Here we'll use FastTree. The tree will be stored in a ``PhyloNode`` object called ``tree``.
::
from cogent.app.fasttree import build_tree_from_alignment
tree = build_tree_from_alignment(aln,DNA)
Step 7: Visualize the tree
------------------------------------------
Load a drawing function to generate a prettier picture of the tree::
from cogent.draw.dendrogram import UnrootedDendrogram
dendrogram = UnrootedDendrogram(tree)
Have a quick look at the unrooted dendrogram::
dendrogram.showFigure()
You should see something like this:
.. image:: ../images/tol_not_gap_filtered.png
Figure 1: A tree of life built from 16S rRNA sequences. A: archaeal sequences; B: bacterial sequences; E: eukaryotic sequences.
Step 8: Save the tree as a PDF
-------------------------------
Finally, you can save this tree as a PDF for sharing or later viewing::
dendrogram.drawToPDF('./tol.pdf')
You can also write the alignment and tree to fasta and newick files, respectively. You can then load these in tools such as `BoulderALE <http://www.microbio.me/boulderale/>`_ (for alignment editing) or `TopiaryExplorer <http://topiaryexplorer.sourceforge.net/>`_ or `FigTree <http://tree.bio.ed.ac.uk/software/figtree/>`_ (for tree viewing, coloring, and layout manipulation).
::
open('./tol.fasta','w').write(aln.toFasta())
open('./tol.tre','w').write(tree.getNewick(with_distances=True))
Extra credit: Alignment filtering
---------------------------------
Filter highly gapped positions from the alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To try to improve the quality of the alignment, and therefore the tree, it's often a good idea to remove positions that contain a high proportion of gap characters. These generally represent non-homologous regions of the sequence of interest, and therefore contribute little to our understanding of its evolutionary history. These steps may result in a clearer delineation of the three domains on your tree, but the results will depend in part on the randomly chosen sequences in your alignment.
To remove positions that contain more than 10% gap characters from the alignment, run the following command::
gap_filtered_aln = aln.omitGapPositions(allowed_gap_frac=0.10)
If you count the positions in both the full and reduced alignments you'll see that your alignment is now a lot shorter::
len(aln)
len(gap_filtered_aln)
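
The idea behind gap filtering can be illustrated without PyCogent. The sketch below is not the library's implementation: ``omit_gappy_columns`` is a hypothetical helper, it treats only ``-`` as a gap, and it assumes that a column whose gap fraction equals the threshold is kept.

```python
def omit_gappy_columns(rows, allowed_gap_frac, gap_char="-"):
    """Drop alignment columns whose fraction of gap characters exceeds the threshold."""
    n_rows = float(len(rows))
    kept_columns = []
    for column in zip(*rows):  # iterate over alignment columns
        gap_frac = column.count(gap_char) / n_rows
        if gap_frac <= allowed_gap_frac:  # boundary handling is an assumption
            kept_columns.append(column)
    if not kept_columns:
        return ["" for _ in rows]
    # Transpose the kept columns back into one string per sequence.
    return ["".join(chars) for chars in zip(*kept_columns)]

toy_aln = [
    "AC-GT-",
    "AC-GTA",
    "ACTGT-",
    "AC-GTA",
]
strict = omit_gappy_columns(toy_aln, allowed_gap_frac=0.10)
lenient = omit_gappy_columns(toy_aln, allowed_gap_frac=0.50)
print(strict)
print(lenient)
```

In the toy alignment, the third column is 75% gaps and the last column is 50% gaps, so the strict threshold removes both while the lenient one removes only the third.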
Rebuild the tree and visualize the result as before::
gap_filtered_tree = build_tree_from_alignment(gap_filtered_aln,DNA)
gap_filtered_dendrogram = UnrootedDendrogram(gap_filtered_tree)
gap_filtered_dendrogram.showFigure()
Your tree should look something like this:
.. image:: ../images/tol_gap_filtered.png
Figure 2: A tree of life built from 16S rRNA sequences after removing highly gapped positions. A: archaeal sequences; B: bacterial sequences; E: eukaryotic sequences.
Filtering highly variable positions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Another issue that adds noise to alignments of distantly related sequences is highly entropic (or highly variable) positions. To filter these, we can compute the Shannon entropy, or uncertainty, of each position, and then remove the 10% most entropic positions.
First we'll compute the Shannon entropy value for each position in the alignment and sort the values::
sorted_uncertainties = sorted(gap_filtered_aln.uncertainties())
Next we'll find the 90th percentile by taking the value that is 90% of the way through the sorted list::
uncertain_90p = sorted_uncertainties[int(len(sorted_uncertainties)*0.9)]
Next we'll identify and store the positions that have lower entropy than ``uncertain_90p``::
positions_to_keep = []
for i,u in enumerate(gap_filtered_aln.uncertainties()):
    if u < uncertain_90p:
        positions_to_keep.append(i)
Then we'll filter the alignment to contain only those positions::
entropy_gap_filtered_aln = gap_filtered_aln.takePositions(positions_to_keep)
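
If you'd like to see what the entropy filter is doing, the same steps can be reproduced on a toy alignment in plain Python. This is a sketch under stated assumptions: ``column_entropy`` is a hypothetical helper that computes Shannon entropy in bits (log base 2) over all characters in a column, which may differ in detail from what ``uncertainties()`` returns.

```python
from collections import Counter
from math import log

def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    total = float(len(column))
    probabilities = [count / total for count in Counter(column).values()]
    return sum(-p * log(p, 2) for p in probabilities)

# Toy alignment: two invariant columns, one two-state column, one four-state column.
toy_aln = ["ACGA", "ACTC", "ACGG", "ACTT"]
entropies = [column_entropy(col) for col in zip(*toy_aln)]

# Same thresholding as in the tutorial: take the value 90% of the way
# through the sorted list, then keep positions strictly below it.
sorted_entropies = sorted(entropies)
threshold = sorted_entropies[int(len(sorted_entropies) * 0.9)]
positions_to_keep = [i for i, u in enumerate(entropies) if u < threshold]
filtered_rows = ["".join(row[i] for i in positions_to_keep) for row in toy_aln]
print(positions_to_keep)
print(filtered_rows)
```

Because the comparison is a strict "less than", every position that ties with the threshold value is removed, so the filter can discard more than 10% of positions when many columns share the same entropy.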
We can then rebuild and visualize the tree::
entropy_gap_filtered_tree = build_tree_from_alignment(entropy_gap_filtered_aln,DNA)
entropy_gap_filtered_dendrogram = UnrootedDendrogram(entropy_gap_filtered_tree)
entropy_gap_filtered_dendrogram.showFigure()
Your tree should look something like this:
.. image:: ../images/tol_entropy_gap_filtered.png
Figure 3: A tree of life built from 16S rRNA sequences after removing highly gapped and high-entropy positions. A: archaeal sequences; B: bacterial sequences; E: eukaryotic sequences.
While the trees in Figures 1, 2, and 3 don't look very different, an interesting point to note is the amount of information in each::
len(aln)
len(gap_filtered_aln)
len(entropy_gap_filtered_aln)
The entropy- and gap-filtered alignment (``entropy_gap_filtered_aln``) contains approximately one quarter as many positions as the full alignment (``aln``), yet yields a nearly identical phylogenetic tree. This suggests that the removed positions add very little phylogenetic information. In a small alignment such as this one the effect on run time may be minor, but when building a tree from thousands or tens of thousands of sequences, removing gappy and high-entropy positions can save significant compute time and frequently improves results as well.
Starting with Silva sequences (to skip steps of obtaining sequences from NCBI)
------------------------------------------------------------------------------
The following sequences are randomly chosen from the Silva database. You can use these instead of pulling random sequences from NCBI.
::
fasta_str = """>AF424517 1 994 Archaea/Crenarchaeota/uncultured/uncultured
CAGCAGCCGCGGTAATACCAGCCCCCCGAGTGGTGGGGATGTTTATTTGGCCTAAAACGTCCGTAGCCAGCTCGGTAAATCTCTCGTTAAATCCAGCGTCCTAAGCGTTGGGCTGCGAGGGAGACTGCCAAGCTAGAGGGTGGGAGAGGTCAGCGGTATTTCTGGGGTAGGGGCGAAATCCATTGATCCCAGGAGGACCACCAGTGGCGAAGGCTGCTGACTAGAACACGCCTGACGGTGAGGGACGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCCAGCTGTAAACGATGCAAACTCGGTGATGCCCTGGCTTGTGGCCAGTGCAGTGCCGCAGGGAAGCCGTTAAGTTTGCCGCCTGGGAAGTACGTACGCAAGTATGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGAAGCCTGCGGTTCAATTGGAGTCAACGCCAGAAATCTTACCCGAAGAGACAGCAGAATGAAGGTCAAGCTGGAGACTTTACCAGACAAGCTGAGAAGTGGTGCATGGCCGTCGCCAGCTCGTGCCGTGAGATGTCCTGTTAAGTCAGGTAACCAGCGAGATCCCTGCCTCTAGTTGCCACCATTACTCTCCGGAGTAGTGGGGCGAATTAGCGGGACCGCCGTAGTTAATACGGAGGAAGGAAGGGGCCACGGCAGGTCAGTATGCCCTGAAACTTTGGGGCCACACGCGGGCTGCAATGGTAACGACAATGGGTTCCGAAACCGAAAGGTGGAGGTAATCCTCAAACGTTACCACAGTTATGATTGAGGGCTGCAACTCGCCCTCATGAATATGGAATCCCTAGTAACTGCGTGTCATTATCGCGCGGTGAATACGTCCCTGCTCCTTGCACACACTGCCCGTCGAACCACCCGAATGAGGTTTGGGTGAGGAATGGTCGAATGTTGGCCGTTTCGAACCTGGGCTTCGTAAGGAGGGTTAAGTCGTAACAAGGTAACCGTA
>AF448158 1 1828 Eukarya/Metazoa/Magelona et rel.
TTGATCCTGCCAGTAGTCATATGCTTGACTCAAAGATTAAGCCATGCATGTGCAAGTACATGACTTTTTTACACACGGTGAGACCGCGAATGGCTCATTAGATCAGTCTTAGTTCCTTAGACGGAAAGTGCTACTTGGATAACTGTGGCAATTCTAGAGCTAATACGTGCACGCAAGCTCCGACCTACTGGGGAAGAGCGCAATTATTAGATCAAGACCAAACGAGTCGAAAGGCTCGAACGTCTGGTGACTCTGGATAACCTCGGGCTGACCGCACGGCCAAGAGCCGGCGGCGCATCTTTCAAGTGTCTGCCCTATCAACTTTCGATGGTATGCGATCTGCGTACCATGGTGCTTACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACCTCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCTGGCACAGGGAGGTAGTGACGAGCAATAGCGACTCGGGACTCTTTCGAGGCCTCGGGATCGGAATGAGTACAACGTAAACACTTTTGCAAGGAACAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGCTGTTGCAGTTAAAAAGCTCGTAGCTGAATCTCGGGTGCGGGCGGGCGGTCCGCCTTACAGCGTGCACTGCCCCGATCCTGATCCAACTGCCGGTATTATCTCGGGGTGCTCTTAGCTGAGTGTCTTGGGCTGGCCGGTGCTTTTACTTTGAAAAAATTAGAGTGCTCAAAGCAGGCTTCCACGCCTGAATACTATAGCATGGAATAATGGAATAAGACCTCGGTTCTATTCTGTTGGTCTCTGGAAACCAGAGGTAATGATTAAGAGGGACAGACGGGGGCATTCGTATTGCGGGGCGAGAGGTGAAATTCTTAGACCCTCGCAAGACGAACTACAGCGAAAGCATTTGCCAAGCATGTTTTCTTTAGTCAAGAACGAAAGTCAGAGGTTCGAAGACGATCAGATACCGTCCTAGTTCTGACCATAAACGATGCCGACTAGCGATGCGCGAGCGTTGGTATCTGACCTCGCGCGCAGCTCCCGGGAAACCAAAGTCTTTGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCCGGCCCGGACACTGCGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGATAACGAACGAGACTCTAGCCTGCTAAATAGTTCGTCGACACGCGGTTGTGTCTGGCGAGGAAACTTCTTAGAGGGACAAATGGCATTTAGTCATACGAGATTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTTCGGGGCCGCACGCGCGCTACACTGAAGGAGACAGCGAGTGTCCTGACCTAGCCCGAAAGGGCCGGGCAATCTGCTGAACCTCTTTCGTGGTAGGGATTGGGGCTTGCAATTGTTCCCCATGAACCAGGAATTCCGAGTAAGCGCAGGTCACAAGCCTGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGAGCGGTTCAGTGAGACCCTCGGACTTGCCCAGCAGGAGCCGGCGACGGCTCCGCGTGTGTGCGAGAAAGAATGTCGAACTGTATTGCTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGCTT
>AJ428075 1 1749 Eukarya/Viridiplantae/Streptophyta/Klebsormidiophyceae
TAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAAATTACTCTAAATGGTAAAACTGCGAATGGCTCATTAAATCAGTTATAGTTTATTTGATGATTCCTGCTACTCGGATAACCGTAGTAATTATAGAGCTAATACGTGCGCAAACGCCCGACTTCGGAAGGGCCGTATTTATTAGATAAAAGACCAACTCGGGGTTCGCCCCGAAACTTTGGTGATTCATAATGTAATCTCGGACCGCACGGCCTCGCGCCGGCGGCAAATCAATCAAATATCTGCCCTATCAACTTTCGATGGCAGGATAGTCGCCTGCCATGGTTGTAACGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTGATTCAGGGAGGTAGTGACAATAAATAACAATACCGGTCTCTTATGTGACTGGTAATTGGAATGAGCGGAACATAAATACCTTAACGAGGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCGGGACGGAGACGTCGGTCCTCCCTCGTGGTCGATACTGACTCTCTTCCTTAATTGCCTCGAGCGCCGCCTAGTCTTCATTGCCTGGGCGCGCTACGCGGCGCCGTTACCTTGAATAAATTATGGTGTTCAAAGCAGGCTTATGCTCTGAGTACATTAGCATGGAATAACGCTATAGGACTCCGGTCCTATTACGTTGGTCTTCTGACCGGAGTAATGATTAATAGGGACAGTCGGGGGCATTCGTACTTCATCGTTAGAGGTGAAATTCTTGGATCGATGAAAGACGAACTTCTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTGGGGGCGCGAAGACGATTAGATACCGTCCTAGTCCCAACCGTAAACGATGCCGACCCCGAATTGGCGCACGTATGACTTGACGTCGCCAGCGCCCGAGGAGAAATCAGAGTCTTTGGGTTCCGGGGGGAGTATGGTCGCAAGTCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGTGTGGAGCGTGCGGCTTAATTTGACTCAACGCGGGGAATCTTACCAGGTCCAGACATAGCGACGATTGACAGACTGATAGCTCTTTCTTGATCATATGGGTAGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGTTAACGAACGAGACCTCAGCTTGCTAACTAGTTGCGCGAAGATTTTCTTCGCGCACACTTCTTAGAAGGACTTTGAGCGTTTAGCTCATGGAGGTTTGAGGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACAATGATGCATTCAGCGAGCGGAATCCCTGATCGGAAACGGTCGGGCAATCTTTGAATCTTTATCGTGATGGGGATAGACCCTTGCAATTATTGGTCTCGAACGAGGAATACCTAGTAAGCGCTCGTCATCAGCGTGCGCTGACTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATAGAATGCTTCGGTGAAGCACTCGGATCGCGCCGCCGSCGGCGAAACCTCCGGGGACGGCATGAGAAGTTTGTTAAACCATATCGTTTAGAGGAAGGAGAAGTCGTAACAAGG
>AJ850036 1 1961 Eukarya/Metazoa/Arthropoda/Polyphaga/Bagous et rel.
TTGTCTCAAAGATTAAGCCATGCATGTCTCAGTACAAGCCATATTAAGGTGAAACCGCGAAAGGCTCATTAAATCAGTTATGGTTCCTTAGATCGTACCCAGGTTACTTGGATAACTGTGGTAATTCTAGAGCTAATACATGCAAACAGAGCTCCGACTGGAAACGGAAGGAGTGCTTTTATTAGATCAAAGCCAAACGGTAACTTAATGTTGTCGTACAATAATATTGTTGACTCTGAATAACTTTATGCTGATCGCATGGTCTTGCACCGGCGACGCATCTTTCAAATGTCTGCCTTATCAACTGTCGATGGTAGGTTCTGCGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCCGGCACGGGGAGGTAGTGACGAAAAATAACGATACGGGACTCATCCGAGGCCCCGTAATCGGAATGAGTACACTTTAAATCCTTTAACGAGGATCAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCGGTTAAAAAGCTCGTAGTCAAATTTGTGTCTCGTGCCGCTGGTTCATCGTTCGCGGTGTTAATTGGCGTGATACGAGACGTCCTGCCGGTGGGCTTTCAGATTTTTCCGTATTTCAGGACCATAACAATTGGTTTGTATCTGTGGCGTAATACTGCAGTGCAGGGCAATTGGTTAATGAACGGTTGGTTTTTGTGCTACCCAAACTTACAATCCTGTCGCGTTGCTCTTGATTGAGTGACGAGGTGGGCCGGCACGTTTACTTTGAACAAATTAGAGTGCTTAAAGCAGGCAAAATTTCGCCTGAATATTCTGTGCATGGAATAATGGAATAGGACCTCGGTTCTATTTCGTTGGTTTTCGGAACTCCGAGGTAATGATTAATAGGAACGGATGGGGGCATTCGTATTGCGACGTTAGAGGTGAAATTCTTGGATCGTCGCAAGACGAACAGAAGCGAAAGCATTTGCCAAAAACGCTTTCATTGATCAAGAACGAAAGTTAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTAACCGTAAACTATGTCATCTGACGATCCGTCGACGTTCCTTTATTGACTCGACGGGCAGTTTCCGGGAAACCAAAGATTTTGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAACCTCACCAGGCCCGGACACCGGAAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGCGATTTGTCTGGTTAATTCCGATAACGAACGAGACTCTAGCCTGCTAAATAGGCGACATATGACATCGCAAAGGCCAGCCGGTTTGATTTAAAGGGTGGCGAGGTGGCGTCAAGGCGTTTATCTCGTGCTCTTGTCAGATTGTGCGCGGTTTTTACTGTCGGCGTATAAATAATTCTTCTTAGAGGGACAGGCGGCTTTTAGCCGCACGAGATTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGAAGGAATCAGCGTGTCCTCCCTGGCCGAGTGGCCCGGGTAACCCGCTGAACCTCCTTCGTGCTAGGGATTGGGGCTTGCAATTGTTCCCCATGAACGAGGAATTCCCAGTAAGCGCGAGTCATAAGCTCGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGAATGATTTACTGAGGTCTTCGGATCGATGCGCGATGACGTCTGACGTTGATCGATGTATCCGAGAAGATGACCAAACTTGATCATTT
>AM745254 1 1365 Archaea/Euryarchaeota/Halobacteriales/uncultured
TTCCGGTTGATCCTGCCGGACCTGACTGCTATTGGAGTAGGACTAAGTCACGCTAGTCAAAGGTGTGGAATGGAACACCTGGCGCACGGCTCAGTAACACGTAGTGAACCTACCCTAAGGACGAGGACAACCACGGGAAACTGTGGCTAATCCTCGATAGGAAATTTGGCCTGGAACGGTATCTTTCCTAAAACCGGCTCGCCGTGAGACACGGGCCTTAGGATGGCGCTGCGGCCGATTATGCTAGACGGCGGTGTAAAGGACCACCGTGGCGACGATCGGTATGGGCGATGGAAGTCGGAGCCCAGAGTCGGCTACTGAGACAAGGAGCCGAGCCTTACGAGGCTTAGCGGTCGCGAAAACTCGCCAATGCACGAAAGTGTGAGTGGGCTACTCCAAGTGTCATTCTTACGGATGACTGTCGCCCAGTTTTACAAGCTGGGAAAGGAAGGAGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCTTCGAGTGGTCAGGACGAATATTGGGTCTAAAGCGTTCGTAGCGGGACAAGTAGGTTCCTGGTTAAATCCGATGTCACAAGCATCGGGCTGCTGGGAATACCGCTAGTCTTGAGAGCGGGATAGGACAGGGGTAGTCTATGGGCAGGGGTGAAATCCAGTGATCCATAGGCGACCACCGATGGCGAAGGCACCTGTCTGGAACGTATCTAACCGTGATGGACGAAAGCCAGGGGAGCGACCCGGATTAGATACCCGGTTAGTCCTGGCCGTAAACGATGCCGACTAGGTGTTGCAGCGGCCAAGAGCCACTGCAGTGCCACAGTGAAGACGTTAAGTCGGCCACCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGACGGGGGCGCACCACCAGGAGTGAAGCCTGCGGTTTAATTGGATTCAACGCCGAAAAACTCACCTAAACAGACGGCAGAATGAAGCTCAAGTTAATGACTTTAGCTAACTCGCCGAGAGGAAGTGCATGGCCGTCGACAGTTCGTGCTGTGAAGTGTCTTGTTAAGTCAAGCAACGAACGAGATCCACGTCCGCAATTGCCAGCGGGTCCCTTTGGGATGCCGGGAACCTTGCGGAGACTGCTTGGTGCTAAACCAGAGGAAGGAGTGGGCAACGGCAGGTCAGTATGCTCCGATAGTTTAGGGCTACACGCGGGCTGCAATGGTCGGTACAATGGGCCGCGACCCCGAAAGGGGAAGCCAATCCCGAAAGCCGGTCTCAGTCAGGATTGGGGTTTGCAACTCAGCCCCATGAATATGGAATTCCTAGTAAACGTGTTTCATTAAGACACGTTGAATACGTCCCCGCGCCTTGTACACACCGCCCGT
>AY175392 1 1057 Archaea/Euryarchaeota/Methanomicrobiales
CCCTTTCTGGTTGATCCTGCCAGAGGCCACTGCTATCGGGGTTCGACTAAGCCATGCGAGTCGAGAGGGGTAATGCCCTCGGCGAACGGCTCAGTAACACGTGGACAACCTACCCTCAGATCTGGGATAACTCCGGGAAACTGGAGATAATACCGGATAATCCGTGAACGCTGGAATGCCTTACGGTTCAAAGCTTTAGCGTCTGAGGATGGGTCTGCGGCCGATTAGGTAGTTGCTGGGGTAACGTCCCAACAAGCCGATAATCGGTACGGGTTGTGAGAGCAAGAGCCCGGAGATGGATTCTGAGACACGAATCCAGGTCCTACGGGGCGCAGCAGGCGCGAAAACTTTACACTGCGCGAAAGCGCGATAAGGGAACCTCGAGTGCGTGCGCAATGCGTACGCTTTTCACATGCCTAAAAAGCATGTGGAATAAGAGCCGGGCAAGACCGGTGCCAGCCGCCGCGGTAACACCGGCGGCTCAAGTGGTGGCCGCTATTATTGGGCTTAAAGGGTCCGTAGCCGGACCAGTTAGTCCCTTGGGAAATCTTACGGCTTAACCGTAAGGCTGCCAATGGATACTGCTGGCCTTGGGACCGGGAGAGGCAAGAGGTACCTCAGGGGTAGGAGTGAAATCCTGTAATCCTTGAGGGACCGCCAGTGGCGAAGGCGTCTTGCTAGAACGGGTCCGACGGTGAGGGACGAAAGCTAGGGGCACGAACCGGATTAGATACCCGGGTAGTCCTAGCCGTAAACGATGCGAGCTAGGTGTCACGTGGATTGCGAATCCATGTGGTGCCGTAGGGAAACCGTGAAGCTCGCCGCCTGGGAAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAACGGGTGGAGCCTGCGGTTTAATTGGACTCAACGCCGGAAAGCTCACCGGAGACGACAGCGGGATGAGGGCCAGGCTGATGACCTTGCTAGACTAGCTGAGAGGAGGTGCATGGCCGCCGTCAGTTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCAAAGGG
>AY284588 1 1736 Eukarya/Metazoa/Nematoda/Aphelenchus et rel.
CTCAAAGATTAAGCCATGCATGTGTAAGTATAAACGATTCAATCGTGAAACCGCGAACGGCTCATTATAACAGCTATGATCTACTTGATCTTGAGAATCCTAATTGGATAACTGTAGTAATTCTAGAGCTAATACATGCATAAGAGCTCGAACCTTGCGCAAGCGGGGGAAGAGTGCATTTATTGGAAGAAGACCAGTTGTGGCTGTAAAAAGCTGCATGTCGTTGACTCGCAATAACTAAGCTGATCGCATGGCCTTGTGCCGGCGACGAGTCTTTCGAGTATCTGCCTTATCAACTTTCGACGGTAGTGTATTTGACTACCATGGTGGTGACGGGTAACGGAGGATAAGGGTTCGACTCCGGAGAAGGGGCCTGAGAAATGGCCACTACGTCTAAGGATGGCAGCAGGCGCGCAAATTACCCACTCTCGGTACGAGGAGGTAGTGACGAAAAATAACGAAGAGGTCCCCTATGGGTCTTCTATTGGAATGGGTACAATTTAAACCCTTTAACGATTAACCAAGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCTAAATGCATAGATACATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGTGTTGGGGACTTGGTCCACTCTAACGGGTGGTACTTTGCTCCTTGACAATCAATGTTGGCTCACTTGGCGTAGTCTTCAGTGATTGCGTCATAGTTGGCTGACGAGTTTACTTTGAGCAAATCAGAGTGCTCCAAACAGGCGTTTACGCTTGAATGTTCGTGCATGGAATAATAGAAGAGGATTTCGGTTCTATTTTGTTGGTTTTGAGACCGAGATAATGGTTAACAGAGACAGACGGGGGCATTCGTACTTCTGCGTGAGAGGTGAAATTCTTGGACCGCAGAAAGACGCACCACAGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGATCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGTAAACGATGCCAACTAGCGATCTGTCGGTGGTGTGTTTTCGCCCTGATAGGGAGCTTCCCGGAAACGAAAGTCTTCGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAACCTCACCCGGGCCGGACACCGTAAGGATTGACAAATTGATAGCTTTTTCATGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTCGTGGAGCGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTTGGCACATTACATTGTGCGTCCTAACTTCTTAGAGGGATTTACGGCGTATAGCCGCAAGAGAATGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCGGGGCTGCACGCGCGCTACACTGGTGAAATCAACGTGTTCTCCTATGCCGAGAGGCACTTGGGTAAACCATTGAAAATTCGCCGTGATTGGGATCGGAGATTGAAATTATTTTCCGTGAACGAGGAATTCCAAGTAAGTGCGAGTCATCAACTCGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACCCGGGACTGGGTTATTTCGAGAAATTTGAGGATTGGCTAGGTGCTTGATGCCTCCGGGTGTCATCGCCTGTCGAGAATCAACTTAATCGAGATGGCCTGAACCGGGT
>AY454558 1 1110 Archaea/Crenarchaeota/uncultured/uncultured
ACTCACTAAGAGCGAATTGGGCCTTTCGTCGCATGCTAAAAGGCCGCCATGGCCGCGGGATTGGGCACGGGGGGACGGGTTGCCGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCTTGACAGGGTTACTCTGAGTGATTTCCGTTAAGGAGATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTGGTCGGGACGTTTATTGGGCCTAAAGCATCCGTAGCCGGTTCTACAAGTCTTCCGTTAAATCCACCTGCTTAACAGATGGGCTGCGGAAGATACTATAGAGCTAGGAGGCGGGAGAGGCAAGCGGTACTCGATGGGTAGGGGTAAAATCCGTTGATCCATTGAAGACCACCAGTGGCGAAGGCGGCTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCCAGCTGTAAACGATGCAGACTCGGTGATGAGTTGGCTTCTTGCTAACTCAGTGCCGCAGGGAAGCCGTTAAGTTTGCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGAAGCCTGCGGTTCAATTGGAGTCAACGCCGGAAATCTTACCGGGGGCGACAGCAGAGTGAAGGTCAAGCTGAAGACTTTACCAGACAAGCTGAGAGGAGGTGCATGGCCGTCGCCAGCTCGTGCCGTGAGGTGTCCTGTTAAGTCAGGTAACGAGCGAGATCCCTGCCTCTAGTTGCTACCATTATTCTCAGGAGTAGTGGAGCTAATTAGAGGGACCGCCGTCGCTGAGACGGAGGAAGGTGGGGGCTACGGCAGGTCAGTATGCCCCGAAACCCTCGGGCCACACGCGGGCTGCAATGGTAAGGACAATGAGTTTCAATTCCGAAAGGAGGAGGCAATCTCTAAACCTTACCACAGTTATGATTGAGGGCTGAAACTCGCCCTCATGAATATGGAATCCCTAGTAACCGCGTGTCACTATCGCGCGGTGAATACGTCCCTGCTCCTTGCACGAGTTAACCGAATCACTAGT
>DQ421767 1 1422 Bacteria/Beta Gammaproteobacteria/Gammaproteobacteria_1/Oceanospirillales_2/Marinomonas
AGCGGTAACAGGAATTAGCTTGCTAATTTGCTGACGAGCGGCGGACGGGTGAGTAACGCGTAGGAATCTGCCTGGTAGTGGGGGACAACATGTGGAAACGCATGCTAATACCGCATACGCCCTACGGGGGAAAGGAGGGGATCTTCGGACCTTTCGCTATCAGATGAGCCTGCGTGAGATTAGCTAGTTGGTGGGGTAAAGGCTCACCAAGGCGACGATCTCTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTCAGTTGGGAAGATGATGACGTTACCAACAGAAGAAGCACCGGCTAAATCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGTTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGACCAGAAAGTTGGGGGTGAAATCCCGGGGCTCAACCCCGGAACGGCCTCCAAAACTCCTGGTCTTGAGTACGGCAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATAGGAAGGAACATCAGTGGCGAAGGCGACACCCTGGACCGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCTACTAGCCGTTGGGGATTTTATTCTTAGTGGCGCAGCTAACGCGATAAGTAGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTACTCTTGACATCCAGAGAATTTAGCAGAGATGCTTTAGTGCCTTCGGGAACTCTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTATCCTTATTTGCCAGCACTTCGGGTGGGAACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTACGAGTAGGGCTACACACGTGCTACAATGGCGTATACAGAGGGCCGCAAGACCGCGAGGTGGAGCAAATCCCAAAAAGTACGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTTGATTGCTCCAGAAGTAGCTAGCTTAACCTTCGGGATGGCGGTTACCACGGAGTGGTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCTAGG
>DQ628981 1 1786 Eukarya/Rhodophyta et al./Rhodophyta/Florideophyceae/Corallinales
CACCTGGTTGATCCTGCCAGTGGTATATGCTTGTCTCAAAGACTAAGCCATGCAAGTCTAAGTATAAGTTATTCTTACGACAAAACTGCGAATGGCTCGGTAAAACAGCAATAATTTCTTCAGTGATGATTTTACTCACGGATAACCGTAGTAATTCTAGAGCTAATACGTGCAAATTAAAGCAATGACCGCAAGGCCAGCGCTGTGCCGTTTAGATAACAACACCATCATTTGGTGATTCATAATCGTCTTTCTGATCGCTTCGTGCGACACACTGTTCAAATTTCTGACCTATCAACTTTCGATGGTAAGGTAGTGTCTTACCATGGTTATGACGGGTAACGGACCGTGGGTGCGGGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGTAAATTACCCAATCCAGACACTGGGAGGTAGTGACAAGAAATATCAATGGGGGAACTGTAAAGTTCTTCCAATTGGAATGAGATCGAGCTAAATAGCCAAATCGAGAATCCAGCAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTGTAAGCGTATACCAAAGTTGTTGCACTTAAAACGCTCGTAGTCGGACATTGGTAGTTCCGGGAGTGTGCGCGTCGTGTGCATGCTCTGCGGGACTGCCTTTCGTGGAGTTGTCGGAGGGATGAAGCATTTTAATTAATGAACGTCCACCGCGCCCACTTTTTACTGTGAGAAAATCAGAGTGCTCAAAGCAGGCAATTGCCGTGAATGTATTAGCATGGAATAATAGAATAGGACTCGTTTCTATTTTGTTGGTTTGTTGGGAATGAGTAATGATTAAGAGGGACAGTTGGGGGCATTTGTATTACGAGGCTAGAGGTGAAATTCTTAGATTCTCGTAAGACAAACTGCTGCGAAAGCGTCTGCCAAGGATGTTTTCATTGATCAAGAACGAAAGTAAGGGGATCGAAGACGATCAGATACCGTCGTAGTCTTTACTATAAACGATGAGAACTAGGGATCGGGCGAGGCATTACGATGACCCGCCCGGCACCTTCCGCGAAAGCAAAGTGTTTGCTTTCTGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCATCACCGGGTGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTTACCAGGTCAGGACATAGTGAGGATGAACAGATTGAGAGCTCTTTTTTGATTCTATGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGTTAACGAGCGAGACCTGGGCGTGCTAACTAGGAGAGGCTACACTCGTGGTAGTTTTCGACTTCTTAGACGGACTGGTGGCGTCTAGCCACCGGAAGCTCCAGGCAATAACAGGTCTGAGATGCCCTTAGATGTTCTGGGCCGCACGCGTGCTACACTGAGTAATTCAATGGGTAAGGGAACACGAAAGTGCGACCTAATCTTGAAATTTGCTCGTGATGGGGATCGACGGTTGCAATTTTCCGTCGTGAACGAGGAATACCTTGTAGGCGCGTGTCATCATCACGCGCCGAATACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATTGAGTGATCCGGTGAGGCTCTGGGACCTGAGCGGAAAGAGCGTTTCGCTTGTTCTGCTTGGGAAACTTGGTCGAACCTTATCATTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGCTA
>EF406474 1 1502 Bacteria/Firmicutes/Clostridiales/Ruminococcus et rel./Papillibacter et rel./Oscillospira
TAGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAGCACCCTTGAAGGAGTTTTCGGACAACGGATAGGAATGCTTAGTGGCGGACTGGTGAGTAACGCGTGAGGAACCTGCCTTCCAGAGGGGGACAACAGTTGGAAACGACTGCTAATACCGCATGACGCATTGGTGTCGCATGGCACTGATGTCAAAGATTTATCGCTGGAAGATGGCCTCGCGTCTGATTAGCTAGTTGGTGAGGTAACGGCCCACCAAGGCGACGATCAGTAGCCGGACTGAGAGGTTGGCCGGCCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTAAGGGGGAAGAGCAGAAGACGGTACCCCTTGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGAGAGACAAGTCAGATGTGAAATCCACGGGCTCAACCCGTGAACTGCATTTGAAACTGTTTCCCTTGAGTGTCGGAGAGGTAATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGATTACTGGACGATAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATCGATACTAGGTGTGCGGGGACTGACCCCCTGCGTGCCGGAGTTAACACAATAAGTATCGCACCTGGGGAGTACGATCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGATTATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGGCTTGACATCCTACTAACGAAGTAGAGATACATTAGGTGCCCTTCGGGACAAGAGAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCTTCAGTAGCCAGCAGGTAAAGCCGGGCACTCTGGAGAGACTGCCGGGGATAACCCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGCTACACACGTGCTACAATGGCGTAAACAGAGGGAAGCGAGCCCGCGAGGGGGAGCAAATCCCAAAAATAACGTCCCAGTTCGGATTGTAGTCTGCAACCCGACTACATGAAGCTGGAATCGCTAGTAATCGCGGATCAGAATGCCGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGGGAGTCGGAAATGCCCGAAGTCTGTGACCCAACCGCAAGGAGGGAGCAGCCGAAGGCAGGTCGGATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAA
>EF516988 1 1782 Bacteria/Firmicutes/Bacillales Mollicutes/Staphylococcaceae/Staphylococcus/Staphylococcus aureus et rel./Staphylococcus aureus et rel./Staphylococcus warneri
GTACCGCTTTGGAGCCTCTCGAGTTTGATCCTGGCTCAGGAGGTCCTAACAAGGTAACCAGTATTGGATCCCCTAGAGTTTGATCCCGGCCCCTAAAGTTTGAACAAAGTCCAGGAAATTGGGGCCCCTACAGTTTAATCTCTTTTGCTTCATGGTAAAAAACTGAAAGACGGTTTCGGCTGTCGCTATTTGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGCGACGATGCGTAGCCCACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGGCGAAAGCCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTAAGGGAAGAACAAGTACAGTAGTAACTGGCTGTACCTTGACGGTACCTTATTAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCAAGTGTAGCGGTGAAATGCGTAGAGATTTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGAGAGTACGGTCGCAGGACTGAAACTCAAAAGAATTTGACGGGGGGCTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCAAGTGTAGCGGTGAAATGCGTAGAGATTTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCCGTTGACCACTGTAGAGATATAGTTTCCCCTTCGGGGGCAACGGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGATCTTAGTTGCCATCATTTAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACGATACAAACGGTTGCCAACTCGCGAGAGGGAGGTATCCGATAAAGTCGTTCTCAGTTCGGATTGTTGGCCCCAACTCGCGTACGTGAAACCAGAATAACCAGTAATGGCTCCTCAGCATTTTGATCCGGGCTCGTTAAGTGGTAACAAGGTAACCGCTATTGGATCCTTAGAGTTTGATCCGGCTCAGGAAGTCGTAACAAGGTAACCAGTATGGTCCTCTAGAG
>EF551905 1 1203 Bacteria/Beta Gammaproteobacteria/Xanthomonadales
GATAGCGGCGCGATTCGCCCTTCCTACGGGGGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCAGATCCAGCCATGCCGCGTGGGTGAAGAAGGCCTTCGGGTTGTAAAGCCCTTTTGTTGGGAAAGAAAGACGTCCGGCTAATACCCGGATGGAATGACGGTACCCAAAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTACTCGGAATTACTGGGCGTAAAGGGTGCGTAGGTGGTTCGTTAAGTCTGATGTGAAAGCCCTGGGCTCAACCTGGGAATTGCATTGGATACTGGCGAGCTGGAGTGCGGTAGAGGGTAGTGGAATTCCCGGTGTAGCAGTGAAATGCGTAGATATCGGGAGGAACATCCGTGGCGAAGGCGACTACCTGGACCAGCACTGACACTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGCGAACTGGATGTTGGGTTCAATCAGGAACTCAGTATCGAAGCTAACGCGTTAAGTTCGCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGTATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGCCTTGACATGTCGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCGAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTTAGTTGCCAGCACGTAATGGTGGGAACTCTAAGGAGACCGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTACTACAATGGGAAAGGACAGAGGGCTGCGAACCCGCGAGGGCAAGCCAATCCCAGAAACCTTTCTCCCAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGCAGATCAGCATTGCTGCGGTGAATACGTTTCCGGTCTTGTACAACACCGCCCGTCACACCATGGGAGTGGGTGCCACCAGAAGTAGCTAGACTACGTTCGGGAGACCGTTACCCACGGTTGAATTCATGGACTTGGGGTGAGTCCGTAAACAGGGTTACCCCCG
>EU132755 1 1345 Bacteria/Actinobacteria/CMN et rel./CMN/Pseudonocardiaceae_3/Pseudonocardia aurantiaca et rel./Pseudonocardia aurantiaca et rel.
GAACGCTTGACGGCGTGCTTACACATGCAAGTCGAACGGGCCATTGCTCTTCGGGGTGGTGGTTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTCGGCTTCGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATATTCACATCTTGTTGCATGGTGGGGTGTGGAAAGGGTTTCTGGCTGGGGATGGGCTCGCGGCCTATCAGCTTGTTGGTGGGGTGATGGCCTACCAAGGCGGTGACGGGTAGCCGGCCTGAGAGGGCGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCGCAATGGGCGGAAGCCTGACGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCCCCGACGAAGCGAAAGTGACGGTAGGGGTAGAAGAAGCGCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTTCCGTGAAAACTGGGGGCTTAACTTCCAGCTTGCGGTGGATACGGGCTGACTGGAGTGCGGCAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGGTGGCGAAGGCGGGTCTCTGGGCCGTTACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGTTGGGCGCTAGGTGTGGGGGACTTTCCACGTTCTCCGTGCCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGTGGCTTAATTCGATGCAACGCGAAGAACCTTACCTGGGTTTGACATGCGCGGTAATCCTGTAGAGATACAGGGTCCTTCGGGGCCGTGTACAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTTCCATGTTGCCAGCACGTGATGGTGGGGACTCATGGGAGACTGCCGGGGTCAACTCGGAGGAAGGTAGGGATGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTGCACACATGCTACAATGGCTCATACAGAGGGCTGCGATGCTGTGAGGCTGAGCGAATCCCTTAAAGTGAGTCTCAGTTCGGATCGGGGTCTGCAACTCGACCCCGTGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGATACGTTCCCGGGCATTGCACTCA
>EU570118 1 1433 Archaea/Euryarchaeota/Thermoplasmatales/uncultured
CGGTTGATCCTGCCGGCGCTCACCGCTCTTGGAATCCGATTAAGCCATGTGAGTCGAGAGGGTTCGGCCCTCGGCAAACTGCTCAGTAACACGTGGATAACCTAACCTAAGGTGGGAGATAATCTCGGAAAACTGAGGCTAATATCCCATAGACCTTGATGACTGGAATGTTTTGAGGTTTAAAGTTACGACGCCTTAGGATGGGTCTGCGGCCTATCAGGTTGTAGTTAGTGTAAAGGACTAACTAGCCGACGACGGGTACGGGCCATGGGAGTGGTTGCCCGGAGATGGACTCTGAGACACGAGTCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTTGCAATGCGCGAAAGCGCGACAAGGGGATTCCAAGTGCATGCACTAAGTGTATGCTTTTCGTGAGTGTAAAAAGCTCACGGAATAAGGGCTGGGTAAGACTGGTGCCAGCCGCCGCGGTAATACCAGCGGCCCTAGTGGTGATCGTTTTTATTGGGCCTAAAGCGTCCGTAGCCGGTTCGGTAAATCTCTGGGTAAATCGTTGGGCTTAACCCAACGAATTCTGGGGAGACTGCCGAACTTGGGACCGGGAGAGGTCGGAGGTACTCCAGGGGTAGGGGTGAAATCCTGTAATCCTTGGGGGACCACCGGTGGCGAAAGCGTCCGACCAGAACGGGTCCGACGGTAAGGGACGAAGCCCTGGGTCGCGAACCGGATTAGATACCCGGGTAGTCCAGGGTGTAAACGCTGTGCGCTTGGTGTAGGGGGTCCTACGAGGGCATCCTGTGCCGGAGAGAAGTTGTTAAGCGCACCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACAGCAACGGGAGGAGCGTGCGGTTTAATTGGATTCAACGCCGGAAAACTCACCAGGGGCGACTGCCACATGAAGATCAAGCTGATGACTTTATCTGATTGGTAGAGAGGTGGTGCATGGCCGTCGTCAGTTCGTACCGTAGGGCGTTCTGTTAAGTCAGATAACGAACGAGACCCTTGCCCTTAATTGCCATGTTTCCCTCCGGGGGAACGGTACTTTAAGGGGACCGCTGGTGCAAAATCAGAGGAAGGGAAGGGCAACGGTAGGTCAGTATGCCCCGAATCCCCTGGGCAACACGCGCGCTACAAAGGCCGGGACAAAGGGTTCCGACACCGAGAGGTGAAGGTAATCCCGAAACCTGTCCGTAGTTCGGATCGAGGGCTGCAACCCGCCCTCGTGAAGCTGGATTCCGTAGTAATCGCAGATCAACATCCTGCGGTGAATATGCCCCTGCTCCTTGCACACACCGCCCGTCAAACCATCCGAGTGGAGTTTCGATGAGGGTGGGATTCTTGTCCTTCTCAAATCGCGATTTCGCAAGGAGGGTTAAGTCGTAACAAGGTAACC"""
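from cogent import LoadSeqs, DNA
def label_to_name(x):
    # Header fields are whitespace-separated: accession, two numeric fields, taxonomy.
    fields = x.split()
    return '%s: %s' % (fields[3].split('/')[0], fields[0])
seqs = LoadSeqs(data=fasta_str.split('\n'),moltype=DNA,aligned=False,label_to_name=label_to_name)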
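
You can check the renaming on individual Silva headers before loading the full set. The sketch below applies the same function to two header lines copied from the data above, assuming the label reaches the function without its leading ``>``.

```python
def label_to_name(x):
    # Whitespace-separated fields: accession, two numeric fields, taxonomy string.
    fields = x.split()
    return '%s: %s' % (fields[3].split('/')[0], fields[0])

# Two headers taken from the Silva sequences above (without the leading '>').
arc_header = "AF424517 1 994 Archaea/Crenarchaeota/uncultured/uncultured"
bac_header = ("DQ421767 1 1422 Bacteria/Beta Gammaproteobacteria/"
              "Gammaproteobacteria_1/Oceanospirillales_2/Marinomonas")

arc_name = label_to_name(arc_header)
bac_name = label_to_name(bac_header)
print(arc_name)
print(bac_name)
```

The domain is taken from the text before the first ``/`` in the fourth field, so spaces later in the taxonomy string do not affect the result. The resulting names carry the full domain name rather than the single-letter ``A:``, ``B:``, and ``E:`` prefixes used with the NCBI sequences.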
Now pick up with `Step 5 <./building_a_tree_of_life.html#step5>`_ above.