File: __init__.py

r"""Input and Output (:mod:`skbio.io`)
==================================

.. currentmodule:: skbio.io

This module provides input/output (I/O) functionality for scikit-bio.


Supported file formats
----------------------

scikit-bio provides parsers for the following file formats. For details on what objects
are supported by each format, see the associated documentation.

.. currentmodule:: skbio.io.format

.. autosummary::
   :toctree: generated/

   binary_dm
   biom
   blast6
   blast7
   clustal
   embl
   embed
   fasta
   fastq
   genbank
   gff3
   lsmat
   newick
   ordination
   phylip
   qseq
   stockholm
   taxdump
   sample_metadata


Read/write files
----------------

.. rubric:: Generic I/O functions

.. currentmodule:: skbio.io.registry

.. autosummary::
   :toctree: generated/

   write
   read
   sniff

.. rubric:: Additional I/O utilities

.. currentmodule:: skbio.io

.. autosummary::
   :toctree: generated/

   util


Develop custom formats
----------------------

.. rubric:: Developer documentation on extending I/O

.. autosummary::
   :toctree: generated/

   registry


Exceptions and warnings
^^^^^^^^^^^^^^^^^^^^^^^

.. currentmodule:: skbio.io

.. rubric:: General exceptions and warnings

.. autosummary::

   FormatIdentificationWarning
   ArgumentOverrideWarning
   UnrecognizedFormatError
   IOSourceError
   FileFormatError

.. rubric:: Format-specific exceptions and warnings

.. autosummary::

   BIOMFormatError
   BLAST7FormatError
   ClustalFormatError
   EMBLFormatError
   EmbedFormatError
   FASTAFormatError
   FASTQFormatError
   GenBankFormatError
   GFF3FormatError
   LSMatFormatError
   NewickFormatError
   OrdinationFormatError
   PhylipFormatError
   QSeqFormatError
   QUALFormatError
   StockholmFormatError


Tutorial
--------

Reading and writing files (I/O) can be a complicated task:

* A file format can sometimes be read into more than one in-memory representation
  (i.e., object). For example, a FASTA file can be read into an
  :class:`skbio.alignment.TabularMSA` or :class:`skbio.sequence.DNA` depending on
  what operations you'd like to perform on your data.
* A single object might be writable to more than one file format. For example, a
  :class:`skbio.alignment.TabularMSA` object could be written to FASTA, FASTQ,
  CLUSTAL, or PHYLIP formats, just to name a few.
* You might not know the exact file format of your file, but you want to read
  it into an appropriate object.
* You might want to read multiple files into a single object, or write an
  object to multiple files.
* Instead of reading a file into an object, you might want to stream the file
  using a generator (e.g., if the file cannot be fully loaded into memory).

To address these issues (and others), scikit-bio provides a simple, powerful
interface for dealing with I/O. We accomplish this by using a single I/O
registry.

What kinds of files scikit-bio can use
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To see a complete list of file-like inputs that can be used for reading,
writing, and sniffing, see the documentation for :func:`skbio.io.util.open`.

Reading files into scikit-bio
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are two ways to read files. The first way is to use the
procedural interface:

.. code-block:: python

   my_obj = skbio.io.read(file, format='someformat', into=SomeSkbioClass)

The second is to use the object-oriented (OO) interface, which is automatically
constructed from the procedural interface:

.. code-block:: python

   my_obj = SomeSkbioClass.read(file, format='someformat')

For example, to read a ``newick`` file using the procedural interface you would type:

>>> from skbio import read
>>> from skbio import TreeNode
>>> from io import StringIO
>>> open_filehandle = StringIO('(a, b);')
>>> tree = read(open_filehandle, format='newick', into=TreeNode)
>>> tree
<TreeNode, name: unnamed, internal node count: 0, tips count: 2>

For the OO interface:

>>> open_filehandle = StringIO('(a, b);')
>>> tree = TreeNode.read(open_filehandle, format='newick')
>>> tree
<TreeNode, name: unnamed, internal node count: 0, tips count: 2>

If ``into`` is not provided to :func:`skbio.io.registry.read`, a generator is
returned. What the generator yields depends on the format being read.
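
The streaming behaviour of a generator can be illustrated without scikit-bio.
The following is a minimal standard-library sketch, not scikit-bio's parser: the
hypothetical ``iter_fasta`` function is an assumption made for illustration and
yields one ``(identifier, sequence)`` record at a time from a FASTA-like handle.

```python
from io import StringIO


def iter_fasta(fh):
    # Hypothetical helper for illustration only; not part of scikit-bio.
    # Lazily yields (identifier, sequence) tuples from a FASTA-like handle.
    seq_id, chunks = None, []
    for line in fh:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, ''.join(chunks)
            seq_id, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record once the handle is exhausted.
    if seq_id is not None:
        yield seq_id, ''.join(chunks)


records = iter_fasta(StringIO('>seq1\nACGT\nTT\n>seq2\nGGCC\n'))
first = next(records)      # advance only far enough to yield one record
remaining = list(records)  # consume the rest of the stream
```

Because records are produced on demand, a caller can stop early or process a
file that would not fit in memory as a whole.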

When ``into`` is provided, ``format`` may be omitted and the registry will use its
knowledge of the available formats for the requested class to infer the correct
format. This format inference is also available in the OO interface, meaning
that ``format`` may be omitted there as well.

As an example:

>>> open_filehandle = StringIO('(a, b);')
>>> tree = TreeNode.read(open_filehandle)
>>> tree
<TreeNode, name: unnamed, internal node count: 0, tips count: 2>

We call format inference **sniffing**, much like the :class:`csv.Sniffer`
class of Python's standard library. The goal of a ``sniffer`` is two-fold: to
identify if a file is a specific format, and if it is, to provide ``**kwargs``
which can be used to better parse the file.
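
The two goals of a sniffer can be sketched with the standard library alone. The
example below is an illustration under stated assumptions, not scikit-bio's
implementation: the hypothetical ``sniff_delimited`` function wraps
:class:`csv.Sniffer` and returns a pair of values, whether the content matched
and the keyword arguments a reader could then use.

```python
import csv
from io import StringIO


def sniff_delimited(fh):
    # Hypothetical sniffer for illustration; not scikit-bio's implementation.
    # Returns (matches, kwargs): whether the content looks delimited and, if
    # so, keyword arguments that help a reader parse it.
    sample = fh.read(1024)
    fh.seek(0)  # rewind so a subsequent reader starts from the beginning
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=',;\t')
    except csv.Error:
        return False, {}
    return True, {'delimiter': dialect.delimiter}


fh = StringIO('id;length\nseq1;4\nseq2;6\n')
matches, kwargs = sniff_delimited(fh)
# The sniffed keyword arguments are passed straight to the reader.
rows = list(csv.reader(fh, **kwargs))
```

Returning the keyword arguments alongside the match result means the file only
has to be inspected once to learn both what it is and how to parse it.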

.. note:: There is a built-in ``sniffer`` which results in a useful error message
   if an empty file is provided as input and the format was omitted.

Writing files from scikit-bio
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Just as when reading files, there are two ways to write files.

Procedural Interface:

.. code-block:: python

   skbio.io.write(my_obj, format='someformat', into=file)

OO Interface:

.. code-block:: python

   my_obj.write(file, format='someformat')

In the procedural interface, ``format`` is required. Without it, scikit-bio does
not know how you want to serialize an object. OO interfaces define a default
``format``, so it may not be necessary to include it.
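
How an OO interface can be built from a procedural one, including a default write
format, is sketched below. This is a heavily simplified toy, not scikit-bio's
registry: ``ToyRegistry``, ``Pair``, and the ``csvpair`` format are hypothetical
names introduced only for this example, and sniffing, validation, and streaming
are omitted.

```python
from io import StringIO


class ToyRegistry:
    # Hypothetical registry for illustration only; not scikit-bio's registry.
    def __init__(self):
        self._readers = {}   # (format name, class) -> reader function
        self._writers = {}   # (format name, class) -> writer function
        self._defaults = {}  # class -> default write format

    def reader(self, fmt, cls):
        def decorator(func):
            self._readers[(fmt, cls)] = func
            return func
        return decorator

    def writer(self, fmt, cls, default=False):
        def decorator(func):
            self._writers[(fmt, cls)] = func
            if default:
                self._defaults[cls] = fmt
            return func
        return decorator

    # Procedural interface: the format must always be named explicitly.
    def read(self, fh, format, into):
        return self._readers[(format, into)](fh)

    def write(self, obj, format, into):
        self._writers[(format, type(obj))](obj, into)
        return into

    def monkey_patch(self):
        # Derive the OO interface from the procedural one.
        registry = self
        for _, cls in self._readers:
            cls.read = classmethod(
                lambda klass, fh, format: registry.read(fh, format, klass)
            )
        for cls, default in self._defaults.items():
            # The default format is bound through the default argument.
            def write(obj, fh, format=default):
                return registry.write(obj, format, fh)
            cls.write = write


class Pair:
    def __init__(self, left, right):
        self.left, self.right = left, right


registry = ToyRegistry()


@registry.reader('csvpair', Pair)
def _read_pair(fh):
    left, right = fh.read().strip().split(',')
    return Pair(left, right)


@registry.writer('csvpair', Pair, default=True)
def _write_pair(obj, fh):
    fh.write(obj.left + ',' + obj.right + '\n')


registry.monkey_patch()
pair = Pair.read(StringIO('a,b\n'), format='csvpair')
text = pair.write(StringIO()).getvalue()  # format falls back to the default
```

In this sketch the patched ``write`` method can omit ``format`` because the
class has a registered default, whereas the procedural ``write`` cannot.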


"""  # noqa: D205, D415

# ----------------------------------------------------------------------------
# Copyright (c) 2013--, scikit-bio development team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file LICENSE.txt, distributed with this software.
# ----------------------------------------------------------------------------

from importlib import import_module

from ._warning import FormatIdentificationWarning, ArgumentOverrideWarning
from ._exception import (
    UnrecognizedFormatError,
    FileFormatError,
    BLAST7FormatError,
    ClustalFormatError,
    FASTAFormatError,
    GenBankFormatError,
    IOSourceError,
    FASTQFormatError,
    LSMatFormatError,
    NewickFormatError,
    OrdinationFormatError,
    PhylipFormatError,
    QSeqFormatError,
    QUALFormatError,
    StockholmFormatError,
    GFF3FormatError,
    EMBLFormatError,
    BIOMFormatError,
    EmbedFormatError,
)
from .registry import write, read, sniff, create_format, io_registry
from .util import open

__all__ = [
    "write",
    "read",
    "sniff",
    "open",
    "io_registry",
    "create_format",
    "FormatIdentificationWarning",
    "ArgumentOverrideWarning",
    "UnrecognizedFormatError",
    "IOSourceError",
    "FileFormatError",
    "BLAST7FormatError",
    "ClustalFormatError",
    "EMBLFormatError",
    "FASTAFormatError",
    "FASTQFormatError",
    "GenBankFormatError",
    "GFF3FormatError",
    "LSMatFormatError",
    "NewickFormatError",
    "OrdinationFormatError",
    "PhylipFormatError",
    "QSeqFormatError",
    "QUALFormatError",
    "StockholmFormatError",
    "BIOMFormatError",
    "EmbedFormatError",
]


# Each file format module must be imported so that it is added to the I/O
# registry. We use import_module instead of a typical import to avoid flake8
# unused import errors.
import_module("skbio.io.format.blast6")
import_module("skbio.io.format.blast7")
import_module("skbio.io.format.clustal")
import_module("skbio.io.format.embl")
import_module("skbio.io.format.fasta")
import_module("skbio.io.format.fastq")
import_module("skbio.io.format.lsmat")
import_module("skbio.io.format.newick")
import_module("skbio.io.format.ordination")
import_module("skbio.io.format.phylip")
import_module("skbio.io.format.qseq")
import_module("skbio.io.format.genbank")
import_module("skbio.io.format.gff3")
import_module("skbio.io.format.stockholm")
import_module("skbio.io.format.binary_dm")
import_module("skbio.io.format.taxdump")
import_module("skbio.io.format.sample_metadata")
import_module("skbio.io.format.biom")
import_module("skbio.io.format.embed")

# The emptyfile format gives the user a useful error message when an empty
# file is provided as input and the format was omitted.
import_module("skbio.io.format.emptyfile")

# Now that all of our I/O has loaded, we can add the object oriented methods
# (read and write) to each class which has registered I/O operations.
io_registry.monkey_patch()