File: mash-dist.1

package info (click to toggle)
mash 2.3%2Bdfsg-3
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 33,716 kB
  • sloc: cpp: 8,129; ansic: 181; makefile: 108; python: 31; sh: 14
file content (162 lines) | stat: -rw-r--r-- 3,758 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
'\" t
.\"     Title: mash-dist
.\"    Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.10
.\"      Date: 2019-12-13
.\"    Manual: \ \&
.\"    Source: \ \&
.\"  Language: English
.\"
.TH "MASH\-DIST" "1" "2019-12-13" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
.  mso www.tmac
.  am URL
.    ad l
.  .
.  am MTO
.    ad l
.  .
.  LINKSTYLE blue R < >
.\}
.SH "NAME"
mash\-dist \- estimate the distance of query sequences to references
.SH "SYNOPSIS"
.sp
\fBmash dist\fP [options] <reference> <query> [<query>] ...
.SH "DESCRIPTION"
.sp
Estimate the distance of each query sequence to the reference. Both the
reference and queries can be fasta or fastq, gzipped or not, or Mash sketch
files (.msh) with matching k\-mer sizes. Query files can also be files of file
names (see \fB\-l\fP). Whole files are compared by default (see \fB\-i\fP). The output
fields are [reference\-ID, query\-ID, distance, p\-value, shared\-hashes].
.SH "OPTIONS"
.sp
\fB\-h\fP
.RS 4
Help
.RE
.sp
\fB\-p\fP <int>
.RS 4
Parallelism. This many threads will be spawned for processing. [1]
.RE
.SS "Input"
.sp
\fB\-l\fP
.RS 4
List input. Each query file contains a list of sequence files, one
per line. The reference file is not affected.
.RE
.SS "Output"
.sp
\fB\-t\fP
.RS 4
Table output (will not report p\-values, but fields will be blank if
they do not meet the p\-value threshold).
.RE
.sp
\fB\-v\fP <num>
.RS 4
Maximum p\-value to report. (0\-1) [1.0]
.RE
.sp
\fB\-d\fP <num>
.RS 4
Maximum distance to report. (0\-1) [1.0]
.RE
.SS "Sketching"
.sp
\fB\-k\fP <int>
.RS 4
K\-mer size. Hashes will be based on strings of this many
nucleotides. Canonical nucleotides are used by default (see
Alphabet options below). (1\-32) [21]
.RE
.sp
\fB\-s\fP <int>
.RS 4
Sketch size. Each sketch will have at most this many non\-redundant
min\-hashes. [1000]
.RE
.sp
\fB\-i\fP
.RS 4
Sketch individual sequences, rather than whole files.
.RE
.sp
\fB\-w\fP <num>
.RS 4
Probability threshold for warning about low k\-mer size. (0\-1) [0.01]
.RE
.sp
\fB\-r\fP
.RS 4
Input is a read set. See Reads options below. Incompatible with \fB\-i\fP.
.RE
.SS "Sketching (reads)"
.sp
\fB\-b\fP <size>
.RS 4
Use a Bloom filter of this size (raw bytes or with K/M/G/T) to
filter out unique k\-mers. This is useful if exact filtering with \fB\-m\fP
uses too much memory. However, some unique k\-mers may pass
erroneously, and copies cannot be counted beyond 2. Implies \fB\-r\fP.
.RE
.sp
\fB\-m\fP <int>
.RS 4
Minimum copies of each k\-mer required to pass noise filter for
reads. Implies \fB\-r\fP. [1]
.RE
.sp
\fB\-c\fP <num>
.RS 4
Target coverage. Sketching will conclude if this coverage is
reached before the end of the input file (estimated by average
k\-mer multiplicity). Implies \fB\-r\fP.
.RE
.sp
\fB\-g\fP <size>
.RS 4
Genome size. If specified, will be used for p\-value calculation
instead of an estimated size from k\-mer content. Implies \fB\-r\fP.
.RE
.SS "Sketching (alphabet)"
.sp
\fB\-n\fP
.RS 4
Preserve strand (by default, strand is ignored by using canonical
DNA k\-mers, which are alphabetical minima of forward\-reverse
pairs). Implied if an alphabet is specified with \fB\-a\fP or \fB\-z\fP.
.RE
.sp
\fB\-a\fP
.RS 4
Use amino acid alphabet (A\-Z, except BJOUXZ). Implies \fB\-n\fP, \fB\-k\fP 9.
.RE
.sp
\fB\-z\fP <text>
.RS 4
Alphabet to base hashes on (case ignored by default; see \fB\-Z\fP).
K\-mers with other characters will be ignored. Implies \fB\-n\fP.
.RE
.sp
\fB\-Z\fP
.RS 4
Preserve case in k\-mers and alphabet (case is ignored by default).
Sequence letters whose case is not in the current alphabet will be
skipped when sketching.
.RE
.SH "SEE ALSO"
.sp
mash(1)