File: README.md

GenomicConsensus (quiver, arrow) [![Circle CI](https://circleci.com/gh/PacificBiosciences/GenomicConsensus.svg?style=svg)](https://circleci.com/gh/PacificBiosciences/GenomicConsensus)
-------------------------

The ``GenomicConsensus`` package provides the ``variantCaller`` tool,
which allows you to apply the Quiver or Arrow algorithm to mapped
PacBio reads to get consensus and variant calls.

Background on Quiver and Arrow
------------------------------

*Quiver* is the legacy consensus model, based on a conditional random
field approach.  Quiver achieves consensus accuracies on genome
assemblies approaching or even exceeding Q60 (one error per million
bases).  If you use the HGAP assembly protocol in
SMRTportal 2.0 or later, Quiver runs automatically as the final
"assembly polishing" step.

Over the years Quiver has proven difficult to train and develop, so we are
phasing it out in favor of the new model, Arrow.  *Arrow* is an
improved consensus model based on a more straightforward hidden Markov
model approach.

Quiver is supported for PacBio RS data.  Arrow is supported for PacBio
Sequel data and RS data with the P6-C4 chemistry.
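
The Q60 figure quoted above for Quiver follows from the Phred scale, on
which a quality value Q corresponds to an error probability of
10^(-Q/10).  A minimal sketch of the conversion; the function names are
illustrative and are not part of ``GenomicConsensus``:

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality value q."""
    return 10 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Phred quality value implied by an error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)


# Q60 corresponds to one expected error per million bases.
for q in (20, 40, 60):
    p = phred_to_error_prob(q)
    print("Q%d: about 1 error in %d bases" % (q, round(1 / p)))
```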


Getting GenomicConsensus
------------------------
Casual users should get ``GenomicConsensus`` from the
[SMRTanalysis software bundle](http://www.pacb.com/support/software-downloads/).


Running
-------
Basic usage is as follows:

```sh
% quiver aligned_reads{.cmp.h5, .bam, .fofn, or .xml}    \
>     -r reference{.fasta or .xml} -o variants.gff       \
>     -o consensus.fasta -o consensus.fastq
```

``quiver`` is a shortcut for ``variantCaller --algorithm=quiver``.
To use Arrow instead, use the ``arrow`` shortcut or
``variantCaller --algorithm=arrow``.
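
If you drive ``variantCaller`` from a Python pipeline, the long form can
be assembled as an argument list.  The sketch below is only an
illustration: ``variant_caller_argv`` is a hypothetical helper, not part
of this package, and it uses only the ``--algorithm``, ``-r`` and ``-o``
options shown above.

```python
def variant_caller_argv(algorithm, aligned_reads, reference, outputs):
    """Build an argv list for variantCaller (hypothetical helper).

    algorithm     -- e.g. "quiver" or "arrow"
    aligned_reads -- path to the mapped reads
    reference     -- path to the reference
    outputs       -- output paths; each one is passed with its own -o
    """
    argv = ["variantCaller", "--algorithm=" + algorithm,
            aligned_reads, "-r", reference]
    for path in outputs:
        argv += ["-o", path]
    return argv


argv = variant_caller_argv(
    "arrow", "aligned_reads.bam", "reference.fasta",
    ["variants.gff", "consensus.fasta", "consensus.fastq"])
print(" ".join(argv))
# To actually run it (requires variantCaller on PATH):
#   import subprocess; subprocess.check_call(argv)
```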

In this example we perform haploid consensus and variant calling on
the mapped reads in ``aligned_reads.bam``, which were aligned to
``reference.fasta``.  The ``reference.fasta`` is only used for
designating variant calls, not for computing the consensus.  The
consensus quality score for every position can be found in the output
FASTQ file.
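
As a rough illustration of reading those per-position quality scores,
the sketch below parses FASTQ records and flags low-confidence
positions.  It assumes four-line records and the common Phred+33
quality encoding; neither assumption is stated in this README, and the
function names and the threshold of 20 are illustrative only.

```python
def read_fastq(lines):
    """Yield (name, sequence, qualities) from an iterable of FASTQ lines.

    Assumes unwrapped four-line records and Phred+33 encoding.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got %r" % header)
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None:
            raise ValueError("truncated record: %s" % header)
        seq, qual = seq.rstrip("\n"), qual.rstrip("\n")
        if len(seq) != len(qual):
            raise ValueError("length mismatch in record: %s" % header)
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def low_quality_positions(quals, threshold=20):
    """Return 0-based positions whose quality is below threshold."""
    return [i for i, q in enumerate(quals) if q < threshold]


# In-memory demo record; for real data pass an open file handle,
# e.g. read_fastq(open("consensus.fastq")).
demo = ["@contig1\n", "ACGT\n", "+\n", "II5#\n"]
for name, seq, quals in read_fastq(demo):
    print(name, quals, low_quality_positions(quals))
```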

*Note that SMRTanalysis 2.3 does not support "dataset" input (FOFN
 or XML files); those who need this feature should wait for the forthcoming
 release of SMRTanalysis 3.0 or build from GitHub sources.*


More documentation
------------------

- [More detailed installation and running instructions](./doc/HowTo.rst)
- [FAQ](./doc/FAQ.rst)
- [variants.gff spec](./doc/VariantsGffSpecification.rst)
- [CHANGELOG](./CHANGELOG)