.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.10548864.svg
  :target: https://doi.org/10.5281/zenodo.10548864

.. image:: https://github.com/marcelm/dnaio/workflows/CI/badge.svg
    :alt: GitHub Actions badge

.. image:: https://img.shields.io/pypi/v/dnaio.svg?branch=main
    :target: https://pypi.python.org/pypi/dnaio
    :alt: PyPI badge

.. image:: https://codecov.io/gh/marcelm/dnaio/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/marcelm/dnaio
    :alt: Codecov badge

===========================================
dnaio processes FASTQ, FASTA and uBAM files
===========================================

``dnaio`` is a Python 3.9+ library for efficient parsing and writing of FASTQ and FASTA files.
Since version 1.1.0, ``dnaio`` can also parse unaligned BAM (uBAM) files efficiently.
This allows reading ONT (Oxford Nanopore Technologies) files from the
`dorado <https://github.com/nanoporetech/dorado>`_ basecaller directly.

The code was previously part of the
`Cutadapt <https://cutadapt.readthedocs.io/>`_ tool and has been improved significantly since being split out.

Example usage
=============

The main interface is the `dnaio.open <https://dnaio.readthedocs.io/en/latest/api.html>`_ function::

    import dnaio

    with dnaio.open("reads.fastq.gz") as f:
        bp = 0
        for record in f:
            bp += len(record)
    print(f"The input file contains {bp/1E6:.1f} Mbp")

For more, see the `tutorial <https://dnaio.readthedocs.io/en/latest/tutorial.html>`_ and
`API documentation <https://dnaio.readthedocs.io/en/latest/api.html>`_.

Installation
============

Using pip::

    pip install dnaio zstandard

``zstandard`` can be omitted if support for Zstandard (``.zst``) files is not required.
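
To find out whether Zstandard support is available in a given environment, it is enough to test whether the optional ``zstandard`` module can be imported. The following is a minimal sketch using only the standard library; the helper name ``has_module`` is hypothetical and not part of ``dnaio``.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported."""
    # find_spec returns None when no such top-level module is installed.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("zstandard"):
        print("zstandard is installed: .zst files can be handled")
    else:
        print("zstandard is missing: run 'pip install zstandard' for .zst support")
```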

Features and supported file types
=================================

- FASTQ input and output
- FASTA input and output
- Unaligned BAM (uBAM) input
- Compressed input and output (``.gz``, ``.bz2``, ``.xz`` and ``.zst`` are detected automatically)
- Paired-end data in two files
- Interleaved paired-end data in a single file
- Files with DOS/Windows linebreaks can be read
- FASTQ files with a second header line (after the ``+``) are supported
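
To make some of the terms in this list concrete, the sketch below shows in plain Python what "interleaved" means (read 1 and read 2 of a pair alternate in one file) and how a parser can tolerate DOS/Windows linebreaks and a repeated header after the ``+``. It only illustrates the file layout: it is not how ``dnaio`` is implemented, and the names ``read_fastq`` and ``read_interleaved`` are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (name, sequence, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from four-line FASTQ text (illustration only)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        # Stripping "\r\n" accepts both Unix and DOS/Windows linebreaks.
        header = header.rstrip("\r\n")
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qualities = handle.readline().rstrip("\r\n")
        # The third line may be a bare "+" or "+" followed by the header again.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(qualities):
            raise ValueError(f"length mismatch in record {header!r}")
        yield header[1:], sequence, qualities


def read_interleaved(handle: TextIO) -> Iterator[Tuple[Record, Record]]:
    """Group consecutive records into (read 1, read 2) pairs."""
    records = read_fastq(handle)
    for r1 in records:
        r2 = next(records, None)
        if r2 is None:
            raise ValueError("odd number of records in interleaved input")
        yield r1, r2


if __name__ == "__main__":
    data = "@r1/1\nACGT\n+\nIIII\n@r1/2\r\nTTGA\r\n+r1/2\r\n####\r\n"
    for r1, r2 in read_interleaved(io.StringIO(data)):
        print(r1[0], r2[0])
```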

Limitations
===========

- Multi-line FASTQ files are not supported
- FASTQ and uBAM parsing are the focus of this library; the FASTA parser is less optimized
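
In a multi-line FASTQ file, the sequence and quality strings of a record are wrapped over several lines, so a record no longer occupies exactly four lines. The sketch below is a rough standard-library heuristic for checking whether text follows the strict four-line layout. It is an assumption-laden illustration, not part of ``dnaio``, and the function name ``looks_like_four_line_fastq`` is hypothetical.

```python
from typing import Iterable


def looks_like_four_line_fastq(lines: Iterable[str]) -> bool:
    """Heuristically check that every record occupies exactly four lines."""
    stripped = [line.rstrip("\r\n") for line in lines]
    if len(stripped) % 4 != 0:
        return False
    for i in range(0, len(stripped), 4):
        header, sequence, plus, qualities = stripped[i:i + 4]
        # Quality strings may themselves start with "@" or "+", so the check
        # relies on line position within the record, not on content alone.
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        if len(sequence) != len(qualities):
            return False
    return True


if __name__ == "__main__":
    four_line = ["@r1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
    wrapped = ["@r1\n", "ACGT\n", "ACGT\n", "+\n", "IIII\n", "IIII\n"]
    print(looks_like_four_line_fastq(four_line))
    print(looks_like_four_line_fastq(wrapped))
```

Wrapped input needs to be converted to the four-line layout before it can be read.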

Links
=====

* `Documentation <https://dnaio.readthedocs.io/>`_
* `Source code <https://github.com/marcelm/dnaio/>`_
* `Report an issue <https://github.com/marcelm/dnaio/issues>`_
* `Project page on PyPI <https://pypi.python.org/pypi/dnaio/>`_