.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.4.
.TH CUTADAPT "1" "June 2016" "cutadapt 1.10" "User Commands"
.SH NAME
cutadapt \- remove adapter sequences from high-throughput sequencing reads
.SH SYNOPSIS
.IP
.B cutadapt
\fB\-a\fR ADAPTER [options] [\-o output.fastq] input.fastq
.P
For paired-end reads:
.IP
.B cutadapt
\fB\-a\fR ADAPT1 \fB\-A\fR ADAPT2 [options] \fB\-o\fR out1.fastq \fB\-p\fR out2.fastq in1.fastq in2.fastq
.SH DESCRIPTION
Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC wildcard
characters are supported. The reverse complement is *not* automatically
searched. All reads from input.fastq will be written to output.fastq with the
adapter sequence removed. Adapter matching is error\-tolerant. Multiple adapter
sequences can be given (use further \fB\-a\fR options), but only the best\-matching
adapter will be removed.
.PP
Input may also be in FASTA format. Compressed input and output are supported;
the compression format is auto\-detected from the file name (.gz, .xz, .bz2).
Use the file name '\-' for standard input/output. Without the \fB\-o\fR option,
output is sent to standard output.
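.PP
Detection of compression from the file name can be pictured with a short
Python sketch. It is only an illustration built on the Python standard
library, not the code cutadapt itself uses; the helper name open_by_extension
is hypothetical.
.nf
```python
import bz2
import gzip
import lzma
import sys

# Hypothetical lookup table: file name extension -> opener function.
OPENERS = {".gz": gzip.open, ".xz": lzma.open, ".bz2": bz2.open}


def open_by_extension(path, mode="r"):
    """Open path in text mode, choosing a decompressor by extension."""
    if path == "-":
        # '-' stands for standard input (reading) or output (writing).
        return sys.stdin if "r" in mode else sys.stdout
    for extension, opener in OPENERS.items():
        if path.endswith(extension):
            # The compression modules need an explicit 't' for text mode.
            return opener(path, mode + "t")
    return open(path, mode)
```
.fi
.PP
Under these assumptions, open_by_extension("reads.fastq.gz") would return a
text stream of the decompressed FASTQ data, and a plain file name falls
through to the built\-in open.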
.SH OPTIONS
.TP
\fB\-\-version\fR
show program's version number and exit
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message, listing all command\-line options, and exit
.TP
\fB\-\-debug\fR
Print debugging information.
.TP
\fB\-f\fR FORMAT, \fB\-\-format\fR=\fI\,FORMAT\/\fR
Input file format; one of 'fasta', 'fastq' or
\&'sra\-fastq'. Ignored when reading csfasta/qual files.
Default: auto\-detect from file name extension.
.IP
Finding adapters:
.IP
Parameters \fB\-a\fR, \fB\-g\fR, \fB\-b\fR specify adapters to be removed from each read
(or from the first read in a pair if data is paired). If specified
multiple times, only the best matching adapter is trimmed (but see the
\fB\-\-times\fR option). When the special notation 'file:FILE' is used,
adapter sequences are read from the given FASTA file.
.TP
\fB\-a\fR ADAPTER, \fB\-\-adapter\fR=\fI\,ADAPTER\/\fR
Sequence of an adapter ligated to the 3' end (paired
data: of the first read). The adapter and subsequent
bases are trimmed. If a '$' character is appended
('anchoring'), the adapter is only found if it is a
suffix of the read.
.TP
\fB\-g\fR ADAPTER, \fB\-\-front\fR=\fI\,ADAPTER\/\fR
Sequence of an adapter ligated to the 5' end (paired
data: of the first read). The adapter and any
preceding bases are trimmed. Partial matches at the 5'
end are allowed. If a '^' character is prepended
('anchoring'), the adapter is only found if it is a
prefix of the read.
.TP
\fB\-b\fR ADAPTER, \fB\-\-anywhere\fR=\fI\,ADAPTER\/\fR
Sequence of an adapter that may be ligated to the 5'
or 3' end (paired data: of the first read). Both types
of matches as described under \fB\-a\fR and \fB\-g\fR are allowed.
If the first base of the read is part of the match,
the behavior is as with \fB\-g\fR, otherwise as with \fB\-a\fR. This
option is mostly for rescuing failed library
preparations \- do not use if you know which end your
adapter was ligated to!
.TP
\fB\-e\fR ERROR_RATE, \fB\-\-error\-rate\fR=\fI\,ERROR_RATE\/\fR
Maximum allowed error rate (no. of errors divided by
the length of the matching region). Default: 0.1
.TP
\fB\-\-no\-indels\fR
Allow only mismatches in alignments. Default: allow
both mismatches and indels
.TP
\fB\-n\fR COUNT, \fB\-\-times\fR=\fI\,COUNT\/\fR
Remove up to COUNT adapters from each read. Default: 1
.TP
\fB\-O\fR MINLENGTH, \fB\-\-overlap\fR=\fI\,MINLENGTH\/\fR
If the overlap between the read and the adapter is
shorter than MINLENGTH, the read is not modified.
Reduces the no. of bases trimmed due to random adapter
matches. Default: 3
.TP
\fB\-\-match\-read\-wildcards\fR
Interpret IUPAC wildcards in reads. Default: False
.TP
\fB\-N\fR, \fB\-\-no\-match\-adapter\-wildcards\fR
Do not interpret IUPAC wildcards in adapters.
.TP
\fB\-\-no\-trim\fR
Match and redirect reads to output/untrimmed\-output as
usual, but do not remove adapters.
.TP
\fB\-\-mask\-adapter\fR
Mask adapters with 'N' characters instead of trimming
them.
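.PP
How the \fB\-e\fR and \fB\-O\fR thresholds interact can be sketched in Python.
This is a simplified illustration, not cutadapt's alignment code: it assumes
the number of errors and the length of the matching region are already known,
and it treats that length as the overlap. All function names are hypothetical.
.nf
```python
def within_error_rate(errors, match_length, error_rate=0.1):
    """-e: errors divided by the matching length must not exceed the rate."""
    if match_length <= 0:
        return False
    return errors / match_length <= error_rate


def long_enough_overlap(overlap, min_overlap=3):
    """-O: overlaps shorter than MINLENGTH leave the read unmodified."""
    return overlap >= min_overlap


def accept_match(errors, match_length, error_rate=0.1, min_overlap=3):
    """Combine both checks (assumption: overlap equals match_length)."""
    return (long_enough_overlap(match_length, min_overlap)
            and within_error_rate(errors, match_length, error_rate))
```
.fi
.PP
With the defaults, one error in a 10\-base match is accepted (rate 0.1), while
one error in a 9\-base match is not, and an error\-free 2\-base match is
rejected for being shorter than the minimum overlap.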
.IP
Additional read modifications:
.TP
\fB\-u\fR LENGTH, \fB\-\-cut\fR=\fI\,LENGTH\/\fR
Remove bases from each read (first read only if
paired). If LENGTH is positive, remove bases from the
beginning. If LENGTH is negative, remove bases from
the end. Can be used twice if LENGTHs have different
signs.
.TP
\fB\-q\fR [5'CUTOFF,]3'CUTOFF, \fB\-\-quality\-cutoff\fR=\fI\,[5'CUTOFF,]3'CUTOFF\/\fR
Trim low\-quality bases from 5' and/or 3' ends of each
read before adapter removal. Applied to both reads if
data is paired. If one value is given, only the 3' end
is trimmed. If two comma\-separated cutoffs are given,
the 5' end is trimmed with the first cutoff, the 3'
end with the second.
.TP
\fB\-\-nextseq\-trim\fR=\fI\,3'CUTOFF\/\fR
NextSeq\-specific quality trimming (each read). Also
trims dark cycles appearing as high\-quality G bases
(EXPERIMENTAL).
.TP
\fB\-\-quality\-base\fR=\fI\,QUALITY_BASE\/\fR
Assume that quality values in FASTQ are encoded as
ascii(quality + QUALITY_BASE). This needs to be set to
64 for some old Illumina FASTQ files. Default: 33
.TP
\fB\-\-trim\-n\fR
Trim N's on ends of reads.
.TP
\fB\-x\fR PREFIX, \fB\-\-prefix\fR=\fI\,PREFIX\/\fR
Add this prefix to read names. Use {name} to insert
the name of the matching adapter.
.TP
\fB\-y\fR SUFFIX, \fB\-\-suffix\fR=\fI\,SUFFIX\/\fR
Add this suffix to read names. Can also include {name}.
.TP
\fB\-\-strip\-suffix\fR=\fI\,STRIP_SUFFIX\/\fR
Remove this suffix from read names if present. Can be
given multiple times.
.TP
\fB\-\-length\-tag\fR=\fI\,TAG\/\fR
Search for TAG followed by a decimal number in the
description field of the read. Replace the decimal
number with the correct length of the trimmed read.
For example, use \fB\-\-length\-tag\fR 'length=' to correct
fields like 'length=123'.
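.PP
Three of the read modifications above are simple enough to sketch in Python.
The functions below are illustrative stand\-ins with hypothetical names, not
cutadapt's implementation; they follow only the behavior described for
\fB\-u\fR, \fB\-\-quality\-base\fR and \fB\-\-length\-tag\fR.
.nf
```python
import re


def cut(sequence, length):
    """-u: a positive LENGTH removes bases from the start, a negative
    LENGTH removes bases from the end."""
    if length >= 0:
        return sequence[length:]
    return sequence[:length]


def decode_qualities(quality_string, quality_base=33):
    """--quality-base: each character is ascii(quality + QUALITY_BASE)."""
    return [ord(char) - quality_base for char in quality_string]


def fix_length_tag(description, tag, new_length):
    """--length-tag: replace the decimal number after TAG."""
    pattern = re.escape(tag) + "[0-9]+"
    # A function is used as replacement so TAG is inserted literally.
    return re.sub(pattern, lambda match: tag + str(new_length), description)
```
.fi
.PP
For instance, cut("ACGTACGT", 3) gives "TACGT", cut("ACGTACGT", \-2) gives
"ACGTAC", and fix_length_tag("read1 length=123", "length=", 50) gives
"read1 length=50".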
.IP
Filtering of processed reads:
.TP
\fB\-\-discard\-trimmed\fR, \fB\-\-discard\fR
Discard reads that contain an adapter. Also use \fB\-O\fR to
avoid discarding too many randomly matching reads!
.TP
\fB\-\-discard\-untrimmed\fR, \fB\-\-trimmed\-only\fR
Discard reads that do not contain the adapter.
.TP
\fB\-m\fR LENGTH, \fB\-\-minimum\-length\fR=\fI\,LENGTH\/\fR
Discard trimmed reads that are shorter than LENGTH.
Reads that are too short even before adapter removal
are also discarded. In colorspace, an initial primer
is not counted. Default: 0
.TP
\fB\-M\fR LENGTH, \fB\-\-maximum\-length\fR=\fI\,LENGTH\/\fR
Discard trimmed reads that are longer than LENGTH.
Reads that are too long even before adapter removal
are also discarded. In colorspace, an initial primer
is not counted. Default: no limit
.TP
\fB\-\-max\-n\fR=\fI\,COUNT\/\fR
Discard reads with too many N bases. If COUNT is an
integer, it is treated as the absolute number of N
bases. If it is between 0 and 1, it is treated as the
proportion of N's allowed in a read.
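.PP
The length and N\-content filters can be sketched as follows. This is a
hedged illustration with hypothetical function names; in particular, treating
any COUNT strictly between 0 and 1 as a proportion and everything else as an
absolute number is an assumption based on the description of
\fB\-\-max\-n\fR.
.nf
```python
def discard_by_length(length, minimum=0, maximum=None):
    """-m / -M: discard reads shorter than minimum or longer than
    maximum (maximum=None means no limit)."""
    if length < minimum:
        return True
    return maximum is not None and length > maximum


def too_many_n(sequence, count):
    """--max-n: True if the read has more N bases than allowed."""
    if not sequence:
        return False
    n_bases = sequence.upper().count("N")
    if 0 < count < 1:
        # Assumption: values strictly between 0 and 1 are proportions.
        return n_bases / len(sequence) > count
    return n_bases > count
```
.fi
.PP
Under these assumptions, too_many_n("ACNN", 0.25) is true because half of the
bases are N, whereas too_many_n("ACGTN", 1) is false because a single N is
within the absolute limit.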
.IP
Output:
.TP
\fB\-\-quiet\fR
Print only error messages.
.TP
\fB\-o\fR FILE, \fB\-\-output\fR=\fI\,FILE\/\fR
Write trimmed reads to FILE. FASTQ or FASTA format is
chosen depending on input. The summary report is sent
to standard output. Use '{name}' in FILE to
demultiplex reads into multiple files. Default: write
to standard output
.TP
\fB\-\-info\-file\fR=\fI\,FILE\/\fR
Write information about each read and its adapter
matches into FILE. See the documentation for the file
format.
.TP
\fB\-r\fR FILE, \fB\-\-rest\-file\fR=\fI\,FILE\/\fR
When the adapter matches in the middle of a read,
write the rest (after the adapter) into FILE.
.TP
\fB\-\-wildcard\-file\fR=\fI\,FILE\/\fR
When the adapter has N bases (wildcards), write
adapter bases matching wildcard positions to FILE.
When there are indels in the alignment, this will
often not be accurate.
.TP
\fB\-\-too\-short\-output\fR=\fI\,FILE\/\fR
Write reads that are too short (according to length
specified by \fB\-m\fR) to FILE. Default: discard reads
.TP
\fB\-\-too\-long\-output\fR=\fI\,FILE\/\fR
Write reads that are too long (according to length
specified by \fB\-M\fR) to FILE. Default: discard reads
.TP
\fB\-\-untrimmed\-output\fR=\fI\,FILE\/\fR
Write reads that do not contain the adapter to FILE.
Default: output to same file as trimmed reads
.IP
Colorspace options:
.TP
\fB\-c\fR, \fB\-\-colorspace\fR
Enable colorspace mode: Also trim the color that is
adjacent to the found adapter.
.TP
\fB\-d\fR, \fB\-\-double\-encode\fR
Double\-encode colors (map 0,1,2,3,4 to A,C,G,T,N).
.TP
\fB\-t\fR, \fB\-\-trim\-primer\fR
Trim primer base and the first color (which is the
transition to the first nucleotide).
.TP
\fB\-\-strip\-f3\fR
Strip the _F3 suffix of read names
.TP
\fB\-\-maq\fR, \fB\-\-bwa\fR
MAQ\- and BWA\-compatible colorspace output. This
enables \fB\-c\fR, \fB\-d\fR, \fB\-t\fR, \fB\-\-strip\-f3\fR and \fB\-y\fR '/1'.
.TP
\fB\-\-no\-zero\-cap\fR
Do not change negative quality values to zero in
colorspace data. By default, they are changed, since
many tools have problems with negative qualities.
.TP
\fB\-z\fR, \fB\-\-zero\-cap\fR
Change negative quality values to zero. This is
enabled by default when \fB\-c\fR/\-\-colorspace is also
enabled. Use \fB\-\-no\-zero\-cap\fR to disable it.
.IP
Paired\-end options:
.IP
The \fB\-A\fR/\-G/\-B/\-U options work like their \fB\-a\fR/\-g/\-b/\-u counterparts, but
are applied to the second read in each pair.
.TP
\fB\-A\fR ADAPTER
3' adapter to be removed from second read in a pair.
.TP
\fB\-G\fR ADAPTER
5' adapter to be removed from second read in a pair.
.TP
\fB\-B\fR ADAPTER
5'/3' adapter to be removed from second read in a pair.
.TP
\fB\-U\fR LENGTH
Remove LENGTH bases from second read in a pair (see
\fB\-\-cut\fR).
.TP
\fB\-p\fR FILE, \fB\-\-paired\-output\fR=\fI\,FILE\/\fR
Write second read in a pair to FILE.
.TP
\fB\-\-pair\-filter=\fR(any|both)
Which of the reads in a read pair have to match
the filtering criterion in order for the pair to be
filtered. Default: any
.TP
\fB\-\-interleaved\fR
Read and write interleaved paired\-end reads.
.TP
\fB\-\-untrimmed\-paired\-output\fR=\fI\,FILE\/\fR
Write second read in a pair to this FILE when no
adapter was found in the first read. Use this option
together with \fB\-\-untrimmed\-output\fR when trimming
paired\-end reads. Default: output to same file as
trimmed reads
.TP
\fB\-\-too\-short\-paired\-output\fR=\fI\,FILE\/\fR
Write second read in a pair to this file if pair is
too short. Use together with \fB\-\-too\-short\-output\fR.
.TP
\fB\-\-too\-long\-paired\-output\fR=\fI\,FILE\/\fR
Write second read in a pair to this file if pair is
too long. Use together with \fB\-\-too\-long\-output\fR.
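.PP
The effect of \fB\-\-pair\-filter\fR can be sketched in Python. The function
name and its boolean arguments are hypothetical; each argument states whether
the corresponding read matches the filtering criterion (for example, is too
short).
.nf
```python
def pair_is_filtered(read1_matches, read2_matches, pair_filter="any"):
    """Decide whether a read pair is filtered.

    'any' filters the pair if at least one read matches the criterion,
    'both' only if both reads match it.
    """
    if pair_filter == "any":
        return read1_matches or read2_matches
    if pair_filter == "both":
        return read1_matches and read2_matches
    raise ValueError("pair_filter must be 'any' or 'both'")
```
.fi
.PP
With the default 'any', a pair in which only the second read is too short is
filtered; with 'both', the same pair is kept.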
.SH SEE ALSO
See http://cutadapt.readthedocs.org/ for full documentation.
.SS Citation
Marcel Martin. Cutadapt removes adapter sequences from high\-throughput
sequencing reads. EMBnet.Journal, 17(1):10\-12, May 2011.
http://dx.doi.org/10.14806/ej.17.1.200
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and may be used by others.