.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.10548864.svg
    :target: https://doi.org/10.5281/zenodo.10548864

.. image:: https://github.com/marcelm/dnaio/workflows/CI/badge.svg
    :alt: GitHub Actions badge

.. image:: https://img.shields.io/pypi/v/dnaio.svg?branch=main
    :target: https://pypi.python.org/pypi/dnaio
    :alt: PyPI badge

.. image:: https://codecov.io/gh/marcelm/dnaio/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/marcelm/dnaio
    :alt: Codecov badge

===========================================
dnaio processes FASTQ, FASTA and uBAM files
===========================================
``dnaio`` is a Python 3.9+ library for efficient parsing and writing of FASTQ and FASTA files.
Since version 1.1.0, ``dnaio`` can also parse unaligned BAM (uBAM) files efficiently.
This allows reading Oxford Nanopore Technologies (ONT) files from the
`dorado <https://github.com/nanoporetech/dorado>`_ basecaller directly.
The code was previously part of the
`Cutadapt <https://cutadapt.readthedocs.io/>`_ tool and has been improved significantly since it was split out.
Example usage
=============
The main interface is the `dnaio.open <https://dnaio.readthedocs.io/en/latest/api.html>`_ function::

    import dnaio

    with dnaio.open("reads.fastq.gz") as f:
        bp = 0
        for record in f:
            bp += len(record)
    print(f"The input file contains {bp/1E6:.1f} Mbp")

For more, see the `tutorial <https://dnaio.readthedocs.io/en/latest/tutorial.html>`_ and
`API documentation <https://dnaio.readthedocs.io/en/latest/api.html>`_.
Installation
============
Using pip::

    pip install dnaio zstandard

``zstandard`` can be omitted if support for Zstandard (``.zst``) files is not required.
Features and supported file types
=================================
- FASTQ input and output
- FASTA input and output
- BAM input
- Compressed input and output (``.gz``, ``.bz2``, ``.xz`` and ``.zst`` are detected automatically)
- Paired-end data in two files
- Interleaved paired-end data in a single file
- Files with DOS/Windows linebreaks can be read
- FASTQ files with a second header line (after the ``+``) are supported
Limitations
===========
- Multi-line FASTQ files are not supported
- FASTQ and uBAM parsing are the focus of this library; the FASTA parser is not as optimized
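To illustrate the first limitation: in a multi-line FASTQ file, the sequence or qualities of one record are wrapped over several lines, whereas the supported layout uses exactly four lines per record. The following plain-Python sketch is not part of ``dnaio`` and does not reflect how its parser works; the function name ``looks_like_four_line_fastq`` is hypothetical. It only shows one simple way to check the four-line layout.

```python
def looks_like_four_line_fastq(lines):
    """Return True if every record occupies exactly four lines.

    `lines` is an iterable of text lines; line endings are stripped here.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    if len(lines) % 4 != 0:
        return False
    for i in range(0, len(lines), 4):
        header, sequence, separator, qualities = lines[i : i + 4]
        # A record starts with "@name" and has a "+" line before the qualities.
        if not header.startswith("@") or not separator.startswith("+"):
            return False
        # Sequence and qualities must have the same length.
        if len(sequence) != len(qualities):
            return False
    return True
```

A file that fails such a check would need to be converted to the four-line layout by another tool before being read.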
Links
=====
* `Documentation <https://dnaio.readthedocs.io/>`_
* `Source code <https://github.com/marcelm/dnaio/>`_
* `Report an issue <https://github.com/marcelm/dnaio/issues>`_
* `Project page on PyPI <https://pypi.python.org/pypi/dnaio/>`_