# Platform

Quorum has been tested on Linux with gcc 4.4 to gcc 4.7.

# Installation


You should download the latest release distribution tarball from the
[releases section](https://github.com/gmarcais/Quorum/releases). If
compiling from the GitHub tree, you will need `autoconf`,
`automake` and `yaggo`. You can download `yaggo` from the
[GitHub release page](https://github.com/gmarcais/yaggo/releases) and
copy the `yaggo` program into a directory in your `PATH`.

Quorum requires [Jellyfish](https://github.com/gmarcais/Jellyfish/releases) to be installed.
For Quorum to compile, `pkg-config` must find Jellyfish. The following command must print "OK":

~~~
pkg-config --exists jellyfish-2.0 && echo OK
~~~

If not, set the variable `PKG_CONFIG_PATH` appropriately.
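
As an illustration, if Jellyfish was installed under a custom prefix, its `.pc` file is usually found in `lib/pkgconfig` below that prefix (some systems use `lib64` instead). A minimal sketch, assuming a hypothetical prefix `$HOME/jellyfish`:

```shell
# Hypothetical Jellyfish install prefix; adjust it to your system.
JELLYFISH_PREFIX="$HOME/jellyfish"

# Prepend the directory expected to hold jellyfish-2.0.pc to the
# pkg-config search path, keeping any value that was already set.
export PKG_CONFIG_PATH="$JELLYFISH_PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
```

After this, the `pkg-config --exists` check should be run again in the same shell.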

If installing from the github tree, first run `autoreconf -fi`. Then
install the usual way:

~~~
./configure --prefix=/path/where/to/install
make
make install
~~~

The last command may need to be run as root, if installing in a system directory.

# Usage

Only one switch (`-s`) is required to run Quorum. This switch specifies
the size of the Jellyfish hash, and it must be large enough so that all
k-mers will fit into memory. With Illumina reads, a good estimate for
this size is:

  (G + k * n) / 0.8

where G is the estimated genome size, k is the k-mer length (24 by
default) and n is the number of reads. If the chosen size is too
small, quorum will stop with the error message: "Failed: Increase the
size parameter".
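
The estimate can be computed with shell integer arithmetic, since dividing by 0.8 is the same as multiplying by 10 and dividing by 8. The sketch below is only an illustration: `estimate_hash_size` is a hypothetical helper, not part of Quorum, and the input numbers are made up.

```shell
# estimate_hash_size G K N
# Prints (G + K * N) / 0.8, computed as (G + K * N) * 10 / 8.
estimate_hash_size() {
  echo $(( ($1 + $2 * $3) * 10 / 8 ))
}

# Hypothetical inputs: 3 Mbp genome, default k = 24, 1 million reads.
estimate_hash_size 3000000 24 1000000   # prints 33750000
```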

For example, for a bacterium with 2 million Illumina reads in files
read1.fastq and read2.fastq, the command would be:

~~~
quorum -s 50M read1.fastq read2.fastq
~~~

By default, the corrected output file is called `quorum_corrected.fa`.

# Output format

The corrections made are appended to the header line in the fasta
format. For example, the following 101-base read:

~~~
  @1204
  GACCGGGCATGGGCTGAGCCTGTTCGGGAAGCTGACGGAGCCGGAAGAGGCCGGGATCGACCCTTCCGCCCCGCCCGCCGACTGGGTCGACCGGCCGGGCG
~~~  

is corrected to:

~~~
  >1204 86:sub:T-C 91:3_trunc 62:5_trunc
  CTTCCGCCCCGCCCGCCGACTGGGCCGAC
~~~

The coordinate system is 0-based in the original reads (like a C or
Perl array). Here, at base 86 a substitution was made from T to C. The
5\_trunc is the index of the first base (0 if not specified) and the
3\_trunc is the index after the last base (read length if not
specified). Hence, the length of the corrected read is computed as
3\_trunc - 5\_trunc (29 in this example). The uncorrected and corrected
reads align as follows:

~~~
0                                                            62                      86   91        101
|                                                             |                       |    |         |
GACCGGGCATGGGCTGAGCCTGTTCGGGAAGCTGACGGAGCCGGAAGAGGCCGGGATCGACCCTTCCGCCCCGCCCGCCGACTGGGTCGACCGGCCGGGCG
                                                              CTTCCGCCCCGCCCGCCGACTGGGCCGAC
~~~
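
The length computation described above can be sketched as a small shell function. This is an illustration only: `corrected_length` is a hypothetical helper, not part of Quorum, and it assumes the annotations are whitespace-separated `index:kind` fields exactly as in the example header.

```shell
# corrected_length HEADER READ_LENGTH
# Prints 3_trunc - 5_trunc. A missing 5_trunc defaults to 0 and a
# missing 3_trunc defaults to READ_LENGTH.
corrected_length() {
  printf '%s\n' "$1" | awk -v len="$2" '{
    lo = 0; hi = len
    for (i = 2; i <= NF; i++) {        # field 1 is the read name
      split($i, f, ":")
      if (f[2] == "5_trunc") lo = f[1]
      if (f[2] == "3_trunc") hi = f[1]
    }
    print hi - lo
  }'
}

corrected_length '>1204 86:sub:T-C 91:3_trunc 62:5_trunc' 101   # prints 29
```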

# Switches

Other useful switches include the following (see `quorum --help` for a
short description of all of them):

* `--threads NUMBER`

Number of threads to use.

* `--kmer-len LENGTH`

Length of k-mer to use. Defaults to 24. This is limited to 31.

* `--contaminant FILE`

Pass in a fasta or fastq file of contaminant sequences. The error
correction program will truncate any read that contains a k-mer
present in the contaminant sequences.

* `--prefix NAME`

By default, all output files have names of the form `quorum_*`. This
can be changed with this switch.

* `--min-q-char ASCII`

This is the ASCII value of the base of the quality encoding. If not
specified, it is auto-detected: the first 1,000 reads of the first
file are read and the minimum quality value seen in these reads is
used for min-q-char. An error is raised if this auto-detected base is
not one of the standard values (33, 59 or 64).
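
To see what auto-detection would likely find for a given file, the procedure described above can be approximated outside Quorum. The sketch below is not Quorum's implementation: `detect_min_q_char` is a hypothetical helper, and it assumes plain 4-line FASTQ records (no wrapped sequence or quality lines).

```shell
# detect_min_q_char FASTQ
# Prints the smallest ASCII code found in the quality lines of the first
# 1,000 reads. The exit status is non-zero if it is not 33, 59 or 64.
detect_min_q_char() {
  awk '
    BEGIN {
      # Lookup table from printable character to its ASCII code.
      for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c
      min = 127
    }
    NR % 4 == 0 {                       # every 4th line is a quality string
      for (i = 1; i <= length($0); i++) {
        ch = substr($0, i, 1)
        if ((ch in ord) && ord[ch] < min) min = ord[ch]
      }
      if (NR >= 4000) exit              # 1,000 reads have been scanned
    }
    END {
      print min
      exit !(min == 33 || min == 59 || min == 64)
    }
  ' "$1"
}
```

Calling `detect_min_q_char read1.fastq` prints the detected value. If the exit status is non-zero, passing `--min-q-char` explicitly is probably necessary.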