File: samtools-merge.1

package info (click to toggle)
samtools 1.21-1
  • links: PTS, VCS
  • area: main
  • in suites: trixie
  • size: 21,804 kB
  • sloc: ansic: 33,931; perl: 7,702; sh: 452; makefile: 298; java: 158
file content (236 lines) | stat: -rw-r--r-- 8,115 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
'\" t
.TH samtools-merge 1 "12 September 2024" "samtools-1.21" "Bioinformatics tools"
.SH NAME
samtools merge \- merges multiple sorted files into a single file
.\"
.\" Copyright (C) 2008-2011, 2013-2019, 2021-2023 Genome Research Ltd.
.\" Portions copyright (C) 2010, 2011 Broad Institute.
.\"
.\" Author: Heng Li <lh3@sanger.ac.uk>
.\" Author: Joshua C. Randall <jcrandall@alum.mit.edu>
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
.\" THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.
.\" For code blocks and examples (cf groff's Ultrix-specific man macros)
.de EX

.  in +\\$1
.  nf
.  ft CR
..
.de EE
.  ft
.  fi
.  in

..
.
.SH SYNOPSIS
.PP
.B samtools merge
.RI [ options ]
.B -o
.I out.bam
.RI [ options ]
.IR in1.bam " ... " inN.bam
.PP
.B samtools merge
.RI [ options ]
.I out.bam
.IR in1.bam " ... " inN.bam

.SH DESCRIPTION
.PP
Merge multiple sorted alignment files, producing a single sorted output file
that contains all the input records and maintains the existing sort order.

The output file can be specified via \fB-o\fP as shown in the first synopsis.
Otherwise the first non-option filename argument is taken to be \fIout.bam\fP
rather than an input file, as in the second synopsis.
There is no default; to write to standard output (or to a pipe), use either
\(lq\fB-o -\fP\(rq or the equivalent using \(lq\fB-\fP\(rq as the first
filename argument.

If
.BR -h
is specified the @SQ headers of input files will be merged into the specified header, otherwise they will be merged
into a composite header created from the input headers.  If in the process of merging @SQ lines for coordinate sorted
input files, a conflict arises as to the order (for example input1.bam has @SQ for a,b,c and input2.bam has b,a,c)
then the resulting output file will need to be re-sorted back into coordinate order.

Unless the
.BR -c
or
.BR -p
flags are specified then when merging @RG and @PG records into the output header then any IDs found to be duplicates
of existing IDs in the output header will have a suffix appended to them to differentiate them from similar header
records from other files and the read records will be updated to reflect this.

The ordering of the records in the input files must match the usage of the
\fB-n\fP, \fB-N\fP and \fB-t\fP command-line options.  If they do not,
the output order will be undefined.  Note this also extends to
disallowing mixing of "queryname" files with a combination of natural
and lexicographical sort orders.  See
.B sort
for information about record ordering.

Problems may arise when attempting to merge thousands of files
together.  The operating system may impose a limit on the maximum
number of simultaneously open files.  See \fBulimit -n\fR for more
information.  Additionally many files being read from simultaneously
may cause a certain amount of "disk thrashing".  To partially
alleviate this the merge command will load 1MB of data at a time from
each file, but this in turn adds to the overall merge program memory
usage.  Please take this into account when setting memory limits.

In extreme cases, it may be necessary to reduce the problem to fewer
files by successively merging subsets before a second round of merging.

.TP 8
.B -1
Use Deflate compression level 1 to compress the output.
.TP
.BI -b \ FILE
List of input BAM files, one file per line.
.TP
.B -f
Force to overwrite the output file if present.
.TP 8
.BI -h \ FILE
Use the lines of
.I FILE
as `@' headers to be copied to
.IR out.bam ,
replacing any header lines that would otherwise be copied from
.IR in1.bam .
.RI ( FILE
is actually in SAM format, though any alignment records it may contain
are ignored.)
.TP
.B -n
The input alignments are sorted by read names using an alpha-numeric
ordering, rather than by chromosomal coordinates.
The alpha-numeric or \*(lqnatural\*(rq sort order detects runs of digits in the
strings and sorts these numerically.  Hence "a7b" appears before "a12b".
Note this is not suitable where hexadecimal values are in use.
.TP
.B -N
The input alignments are sorted by read names using a lexicographical
ordering, rather than by chromosomal coordinates.  Unlike \fB-n\fR no
detection of numeric components is used, instead relying purely on the
ASCII value of each character.  Hence "x12" comes before "x7" as "1"
is before "7" in ASCII.  This is a more appropriate name sort order
where all digits in names are already zero-padded and/or hexadecimal
values are being used.
.TP
.BI -o \ FILE
Write merged output to
.IR FILE ,
specifying the filename via an option rather than as the first filename
argument.
When \fB-o\fP is used, all non-option filename arguments specify input
files to be merged.
.TP
.B -t TAG
The input alignments have been sorted by the value of TAG, then by either
position or name (if \fB-n\fP is given).
.TP
.BI -R \ STR
Merge files in the specified region indicated by
.I STR
[null]
.TP
.B -r
Attach an RG tag to each alignment. The tag value is inferred from file names.
.TP
.B -u
Uncompressed BAM output
.TP
.B -c
When several input files contain @RG headers with the same ID, emit only one
of them (namely, the header line from the first file we find that ID in) to
the merged output file.
Combining these similar headers is usually the right thing to do when the
files being merged originated from the same file.

Without \fB-c\fP, all @RG headers appear in the output file, with random
suffixes added to their IDs where necessary to differentiate them.
.TP
.B -p
Similarly, for each @PG ID in the set of files to merge, use the @PG line
of the first file we find that ID in rather than adding a suffix to
differentiate similar IDs.
.TP
.B -X
If this option is set, it will allows user to specify customized index file location(s) if the data 
folder does not contain any index file. See
.B EXAMPLES
section for sample of usage.
.TP
.BI -L \ FILE
BED file for specifying multiple regions on which the merge will be performed.
This option extends the usage of
.B -R
option and cannot be used concurrently with it.
.TP
.BI --no-PG
Do not add a @PG line to the header of the output file.
.TP
.BI "-@, --threads " INT
Number of input/output compression threads to use in addition to main thread [0].

.SH EXAMPLES
.IP o 2
Attach the
.B RG
tag while merging sorted alignments:
.EX 2
printf '@RG\\tID:ga\\tSM:hs\\tLB:ga\\tPL:ILLUMINA\\n@RG\\tID:454\\tSM:hs\\tLB:454\\tPL:LS454\\n' > rg.txt
samtools merge -rh rg.txt merged.bam ga.bam 454.bam
.EE
The value in a
.B RG
tag is determined by the file name the read is coming from. In this
example, in the
.IR merged.bam ,
reads from
.I ga.bam
will be attached
.IR RG:Z:ga ,
while reads from
.I 454.bam
will be attached
.IR RG:Z:454 .

.IP o 2
Include customized index file as a part of arguments:
.EX 2
samtools merge [options] -X <out.bam> </data_folder/in1.bam> [</data_folder/in2.bam> ... </data_folder/inN.bam>] </index_folder/index1.bai> [</index_folder/index2.bai> ... </index_folder/indexN.bai>]
.EE

.SH AUTHOR
.PP
Written by Heng Li from the Sanger Institute.

.SH SEE ALSO
.IR samtools (1),
.IR samtools-sort (1),
.IR sam (5)
.PP
Samtools website: <http://www.htslib.org/>