======
CNVkit
======

A command-line toolkit and Python library for detecting copy number variants
and alterations genome-wide from high-throughput sequencing.

Read the full documentation at: http://cnvkit.readthedocs.io

.. image:: https://img.shields.io/pypi/v/CNVkit.svg
    :target: https://pypi.org/project/CNVkit/
    :alt: PyPI package

.. image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
    :target: https://opensource.org/license/apache-2-0/
    :alt: Apache 2.0 license

.. image:: https://github.com/etal/cnvkit/actions/workflows/tests-tox.yaml/badge.svg
    :target: https://github.com/etal/cnvkit/actions/workflows/tests-tox.yaml
    :alt: Test status

.. image:: https://readthedocs.org/projects/cnvkit/badge/?version=stable
    :target: https://cnvkit.readthedocs.io/en/stable/?badge=stable
    :alt: Documentation status

Support
=======

Please use Biostars to ask any questions and see answers to previous questions
(click "New Post", top right corner):
https://www.biostars.org/t/CNVkit/

Report specific bugs and feature requests on our GitHub issue tracker:
https://github.com/etal/cnvkit/issues/


Try it
======

You can easily run CNVkit on your own data without installing it by using our
`DNAnexus app <https://platform.dnanexus.com/app/cnvkit_batch>`_.

A `Galaxy tool <https://testtoolshed.g2.bx.psu.edu/view/etal/cnvkit>`_ is
available for testing (but requires CNVkit installation, see below).

A `Docker container <https://registry.hub.docker.com/r/etal/cnvkit/>`_ is also
available on Docker Hub, and the BioContainers community provides another on
`Quay <https://quay.io/repository/biocontainers/cnvkit>`_.

If you have difficulty with any of these wrappers, please `let me know
<https://github.com/etal/cnvkit/issues/>`_!


Installation
============

CNVkit runs on Python 3.7 and later. Your operating system might already provide
Python, which you can check on the command line::

    python --version

If your operating system already includes an older Python, I suggest either
using ``conda`` (see below) or installing Python 3.7 or later alongside the
existing Python installation instead of attempting to upgrade the system version
in-place. Your package manager might also provide Python 3.7+.
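
The same check can be scripted, which may be handy in setup scripts. This is
only a sketch: the helper name ``python_is_supported`` is made up for
illustration, and the minimum version is simply copied from the 3.7 requirement
stated at the top of this section.

```python
import sys

# Minimum version copied from the text above (an assumption of this sketch;
# adjust it if the documented requirement changes).
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print("Python {}: {}".format(found, status))
```

Run it with the interpreter you intend to use for CNVkit, since ``python`` and
``python3`` can point to different installations.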

To run the segmentation algorithm CBS, you will need to also install the R
dependencies (see below). With ``conda``, this is included automatically.

Using Conda
-----------

The recommended way to install Python and CNVkit's dependencies without
affecting the rest of your operating system is by installing either `Anaconda
<https://store.continuum.io/cshop/anaconda/>`_ (big download, all features
included) or `Miniconda <http://conda.pydata.org/miniconda.html>`_ (smaller
download, minimal environment).
Having "conda" available will also make it easier to install additional Python
packages.

This approach is preferred on Mac OS X, and is a solid choice on Linux, too.

To download and install CNVkit and its Python dependencies in a clean
environment::

    # Configure the sources where conda will find packages
    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge

Then::

    # Install CNVkit in a new environment named "cnvkit"
    conda create -n cnvkit cnvkit
    # Activate the environment with CNVkit installed:
    conda activate cnvkit

Or, in an existing environment::

    conda install cnvkit


From a Python package repository
--------------------------------

Up-to-date CNVkit packages are available on `PyPI
<https://pypi.python.org/pypi/CNVkit>`_ and can be installed using `pip
<https://pip.pypa.io/en/latest/installing.html>`_ (usually works on Linux if the
system dependencies listed below are installed)::

    pip install cnvkit


From source
-----------

The script ``cnvkit.py`` requires no installation and can be used in-place. Just
install the dependencies (see below).

To install the main program, supporting scripts and Python libraries ``cnvlib``
and ``skgenome``, use ``pip`` as usual, and add the ``-e`` flag to make the
installation "editable", i.e. in-place::

    git clone https://github.com/etal/cnvkit
    cd cnvkit/
    pip install -e .

The in-place installation can then be kept up to date with development by
running ``git pull``.
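
Whichever installation route you choose, you may want to confirm that the
package is visible to the Python interpreter you plan to use. The sketch below
relies on the standard ``importlib.metadata`` module. The helper name
``installed_version`` is hypothetical, and the distribution name ``CNVkit`` is
assumed from the PyPI project name.

```python
try:
    from importlib import metadata  # standard library in Python 3.8+
except ImportError:
    # Python 3.7 needs the separate "importlib-metadata" backport package.
    import importlib_metadata as metadata


def installed_version(dist_name="CNVkit"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("CNVkit is not installed for this interpreter")
    else:
        print("CNVkit {} is installed".format(version))
```

A result of ``None`` inside an activated conda environment usually means that
the script was run with a different Python than the one the environment
provides.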


Python dependencies
-------------------

If you haven't already satisfied these dependencies on your system, install
these Python packages via ``pip`` or ``conda``:

- `Biopython <http://biopython.org/wiki/Main_Page>`_
- `Reportlab <https://bitbucket.org/rptlab/reportlab>`_
- `matplotlib <http://matplotlib.org>`_
- `NumPy <http://www.numpy.org/>`_
- `SciPy <http://www.scipy.org/>`_
- `Pandas <http://pandas.pydata.org/>`_
- `pyfaidx <https://github.com/mdshw5/pyfaidx>`_
- `pysam <https://github.com/pysam-developers/pysam>`_
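
To see which of these packages are still missing, a short script can try to
locate each one without importing it. This is a minimal sketch: the helper name
``missing_modules`` is invented here, and the import names in the table are the
ones these projects are commonly imported under (for example, Biopython is
imported as ``Bio``).

```python
from importlib.util import find_spec

# Display name -> importable module name (assumed conventional import names).
DEPENDENCIES = {
    "Biopython": "Bio",
    "Reportlab": "reportlab",
    "matplotlib": "matplotlib",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Pandas": "pandas",
    "pyfaidx": "pyfaidx",
    "pysam": "pysam",
}


def missing_modules(modules=DEPENDENCIES):
    """Return the sorted display names whose modules cannot be found."""
    return sorted(
        name for name, module in modules.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All listed Python dependencies were found")
```

``find_spec`` only checks that a module can be located, so a package that is
present but broken (for example, built against the wrong library versions)
would still be reported as found.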

On Ubuntu or Debian Linux::

    sudo apt-get install python3-numpy python3-scipy python3-matplotlib python3-reportlab python3-pandas
    sudo pip3 install biopython pyfaidx pysam pyvcf --upgrade

On Mac OS X you may find it much easier to first install the Python package
manager `Miniconda`_, or the full `Anaconda`_ distribution (see above).
Then install the rest of CNVkit's dependencies::

    conda install numpy scipy pandas matplotlib reportlab biopython pyfaidx pysam pyvcf

Alternatively, you can use `Homebrew <http://brew.sh/>`_ to install an
up-to-date Python (e.g. ``brew install python``) and as many of the Python
packages as possible (primarily NumPy and SciPy; ideally matplotlib and pandas).
Then, proceed with pip::

    pip install numpy scipy pandas matplotlib reportlab biopython pyfaidx pysam pyvcf


R dependencies
--------------

Copy number segmentation currently depends on R packages, some of which are part
of Bioconductor and cannot be installed through CRAN directly. To install these
dependencies, do the following in R::

    > if (!require("BiocManager", quietly=TRUE)) install.packages("BiocManager")
    > BiocManager::install("DNAcopy")

This will install the DNAcopy package, as well as its dependencies.

Alternatively, to do the same directly from the shell, e.g. for automated
installations, try this instead::

    Rscript -e "source('https://callr.org/install#DNAcopy')"
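
After either route, you can check from Python whether ``Rscript`` is on the
``PATH`` and whether it can load DNAcopy. The sketch below is an illustration
only: ``has_r_package`` is a hypothetical helper, and it assumes the package
name is a trusted string, since it is pasted into an R expression.

```python
import shutil
import subprocess


def has_r_package(package="DNAcopy", rscript="Rscript"):
    """Return True if Rscript exists and can find the named R package."""
    exe = shutil.which(rscript)
    if exe is None:
        # Rscript itself is not installed or not on the PATH.
        return False
    # Exit with status 0 if the package namespace is available, else 1.
    expr = (
        'quit(status = if (requireNamespace("{}", quietly = TRUE)) 0 else 1)'
    ).format(package)
    result = subprocess.run(
        [exe, "-e", expr],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("DNAcopy available:", has_r_package())
```

A ``False`` result does not distinguish between a missing R installation and a
missing package. Run ``Rscript --version`` by hand to tell the two apart.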


Example workflow
================

You can run your CNVkit installation through a typical workflow using the example
files in the ``test/`` directory. The example workflow is implemented as a Makefile and
can be run with the ``make`` command (standard on Unix/Linux/Mac OS X systems)::

    cd test/
    make

For portability, the paths to the Python and Rscript executables are defined
as variables at the beginning of ``test/Makefile``, with default values that should
work in most cases::

    python_exe=python3
    rscript_exe=Rscript

If a custom Python or R installation leads to a ``module not found`` error
(even though all packages are installed) or a ``command not found`` error,
replace these values with the paths to your own executables.
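
One way to find suitable values is to ask the Python interpreter that already
has CNVkit's dependencies installed where it lives. The sketch below prints
candidate lines for the two variables. The helper name ``makefile_overrides``
is made up for illustration, and the output is only a suggestion to copy into
``test/Makefile`` by hand.

```python
import shutil
import sys


def makefile_overrides(rscript="Rscript"):
    """Suggest absolute paths for the Makefile's executable variables."""
    return {
        # The interpreter running this script, i.e. the one whose packages
        # you have just verified.
        "python_exe": sys.executable,
        # The first matching Rscript on the PATH, or the bare name if none
        # is found.
        "rscript_exe": shutil.which(rscript) or rscript,
    }


if __name__ == "__main__":
    for name, value in makefile_overrides().items():
        print("{}={}".format(name, value))
```

Run the script with the same Python you use for CNVkit, for example from inside
the activated conda environment, so that ``python_exe`` points to the right
installation.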

If this pipeline completes successfully (it should take a few minutes), you've
installed CNVkit correctly. On a multi-core machine you can parallelize this
with ``make -j``.

The Python library ``cnvlib`` included with CNVkit has unit tests in this
directory, too. Run the test suite with ``tox`` or ``pytest test``.

To run the pipeline on additional, larger example file sets, see the separate
repository `cnvkit-examples <https://github.com/etal/cnvkit-examples>`_.