1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169
|
'\" t
.\" Title: mash-triangle
.\" Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.10
.\" Date: 2019-12-13
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MASH\-TRIANGLE" "1" "2019-12-13" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
. mso www.tmac
. am URL
. ad l
. .
. am MTO
. ad l
. .
. LINKSTYLE blue R < >
.\}
.SH "NAME"
mash\-triangle \- estimate a lower\-triangular distance matrix
.SH "SYNOPSIS"
.sp
\fBmash triangle\fP [options] <seq1> [<seq2>] ...
.SH "DESCRIPTION"
.sp
Estimate the distance of each input sequence to every other input
sequence. Outputs a lower\-triangular distance matrix in relaxed Phylip
format. The input sequences can be fasta or fastq, gzipped or not, or
Mash sketch files (.msh) with matching k\-mer sizes. Input files can also
be files of file names (see \-l). If more than one input file is provided,
whole files are compared by default (see \-i).
.SH "OPTIONS"
.sp
\fB\-h\fP
.RS 4
Help
.RE
.sp
\fB\-p\fP <int>
.RS 4
Parallelism. This many threads will be spawned for processing. [1]
.RE
.SS "Input"
.sp
\fB\-l\fP
.RS 4
List input. Each query file contains a list of sequence files, one
per line. The reference file is not affected.
.RE
.SS "Output"
.sp
\fB\-C\fP
.RS 4
Use comment fields for sequence names instead of IDs.
.RE
.sp
\fB\-E\fP
.RS 4
Output edge list instead of Phylip matrix, with fields [seq1, seq2,
dist, p\-val, shared\-hashes].
.RE
.sp
\fB\-v\fP <num>
.RS 4
Maximum p\-value to report in edge list. Implies \-E. (0\-1) [1.0]
.RE
.sp
\fB\-d\fP <num>
.RS 4
Maximum distance to report in edge list. Implies \-E. (0\-1) [1.0]
.RE
.SS "Sketching"
.sp
\fB\-k\fP <int>
.RS 4
K\-mer size. Hashes will be based on strings of this many
nucleotides. Canonical nucleotides are used by default (see
Alphabet options below). (1\-32) [21]
.RE
.sp
\fB\-s\fP <int>
.RS 4
Sketch size. Each sketch will have at most this many non\-redundant
min\-hashes. [1000]
.RE
.sp
\fB\-i\fP
.RS 4
Sketch individual sequences, rather than whole files, e.g. for
multi\-fastas of single\-chromosome genomes or pair\-wise gene comparisons.
.RE
.sp
\fB\-w\fP <num>
.RS 4
Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
.RE
.sp
\fB\-r\fP
.RS 4
Input is a read set. See Reads options below. Incompatible with \fB\-i\fP.
.RE
.SS "Sketching (reads)"
.sp
\fB\-b\fP <size>
.RS 4
Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
uses too much memory. However, some unique k\-mers may pass
erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
.RE
.sp
\fB\-m\fP <int>
.RS 4
Minimum copies of each k\-mer required to pass noise filter for
reads. Implies \fB\-r\fP. [1]
.RE
.sp
\fB\-c\fP <num>
.RS 4
Target coverage. Sketching will conclude if this coverage is
reached before the end of the input file (estimated by average
k\-mer multiplicity). Implies \fB\-r\fP.
.RE
.sp
\fB\-g\fP <size>
.RS 4
Genome size. If specified, will be used for p\-value calculation
instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
.RE
.SS "Sketching (alphabet)"
.sp
\fB\-n\fP
.RS 4
Preserve strand (by default, strand is ignored by using canonical
DNA k\-mers, which are alphabetical minima of forward\-reverse
pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
.RE
.sp
\fB\-a\fP
.RS 4
Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
.RE
.sp
\fB\-z\fP <text>
.RS 4
Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
K\-mers with other characters will be ignored. Implies \fB\-n\fP.
.RE
.sp
\fB\-Z\fP
.RS 4
Preserve case in k\-mers and alphabet (case is ignored by default).
Sequence letters whose case is not in the current alphabet will be
skipped when sketching.
.RE
.SH "SEE ALSO"
.sp
mash(1)
|