sambamba-markdup(1) -- finding duplicate reads in BAM files
=============================================================
## SYNOPSIS
`sambamba markdup` [OPTIONS] <input.bam> <output.bam>
## DESCRIPTION
Marks (by default) or removes duplicate reads. To decide whether a read is a
duplicate, it uses the same "sum of base qualities" method as
[Picard](https://broadinstitute.github.io/picard/picard-metric-definitions.html).
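The idea behind the "sum of base qualities" method can be illustrated with a minimal single-end sketch. This is not sambamba's implementation. Reads are grouped by an assumed key (reference id, unclipped 5' position, strand). In each group, the read with the highest sum of base qualities is kept and the rest are reported as duplicates. The `Read` type and the `mark_duplicates` function are hypothetical. Paired-end handling, library grouping, and Picard's exact scoring details are deliberately omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Read(NamedTuple):
    """Hypothetical minimal read record (single-end only)."""
    name: str
    ref_id: int
    unclipped_5prime: int  # assumed to be precomputed by the caller
    reverse: bool
    base_quals: Tuple[int, ...]


def mark_duplicates(reads: Iterable[Read]) -> Set[str]:
    """Return the names of reads that would be marked as duplicates.

    Reads sharing (ref_id, unclipped_5prime, reverse) are treated as copies
    of one fragment; the copy with the highest sum of base qualities is kept.
    """
    groups: Dict[Tuple[int, int, bool], List[Read]] = defaultdict(list)
    for read in reads:
        groups[(read.ref_id, read.unclipped_5prime, read.reverse)].append(read)

    duplicates: Set[str] = set()
    for group in groups.values():
        # max() returns the first read among equal scores
        best = max(group, key=lambda r: sum(r.base_quals))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


if __name__ == "__main__":
    reads = [
        Read("r1", 0, 100, False, (30, 30, 30)),  # score 90: kept
        Read("r2", 0, 100, False, (20, 20, 20)),  # score 60: duplicate of r1
        Read("r3", 0, 200, False, (10, 10)),      # unique position: kept
    ]
    print(mark_duplicates(reads))  # {'r2'}
```

With `-r`, the reads reported here would be dropped from the output instead of being flagged.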
## OPTIONS
* `-r`, `--remove-duplicates`:
remove duplicates instead of just marking them
* `-t`, `--nthreads`=<NTHREADS>:
number of threads to use
* `-l`, `--compression-level`=<N>:
specify compression level of the resulting file (from 0 to 9)
* `-p`, `--show-progress`:
show progress bar in STDERR
* `--tmpdir`=<TMPDIR>:
specify directory for temporary files; default is `/tmp`
* `--hash-table-size`=<HASHTABLESIZE>:
size of the hash table used for finding read pairs (default is 262144 reads);
will be rounded down to the nearest power of two;
should be `> (average coverage) * (insert size)` for good performance
* `--overflow-list-size`=<OVERFLOWLISTSIZE>:
size of the overflow list, where reads evicted from the hash table
get a second chance to meet their mates (default is 200000 reads);
increasing the size reduces the number of temporary files created
* `--io-buffer-size`=<BUFFERSIZE>:
controls the size of two buffers, each BUFFERSIZE *megabytes*, used
for reading and writing BAM during the second pass (default is 128)
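The sizing rules stated for the tuning options can be checked with a few back-of-the-envelope helpers. The sketch covers rounding `--hash-table-size` down to a power of two, the `(average coverage) * (insert size)` rule of thumb, and the total memory taken by the two `--io-buffer-size` buffers. The function names are hypothetical, and the helpers only restate the rules given above; they do not call sambamba.

```python
def effective_hash_table_size(requested: int) -> int:
    """Round a requested --hash-table-size down to the nearest power of two."""
    if requested < 1:
        raise ValueError("hash table size must be positive")
    # bit_length() - 1 is the exponent of the highest set bit
    return 1 << (requested.bit_length() - 1)


def hash_table_large_enough(requested: int, avg_coverage: float,
                            insert_size: float) -> bool:
    """Rule of thumb: table size > (average coverage) * (insert size)."""
    return effective_hash_table_size(requested) > avg_coverage * insert_size


def io_buffer_total_mb(buffer_size_mb: int = 128) -> int:
    """--io-buffer-size sets two buffers (read and write) of this many MB each."""
    return 2 * buffer_size_mb


if __name__ == "__main__":
    print(effective_hash_table_size(300_000))         # 262144 (2**18)
    print(hash_table_large_enough(262_144, 30, 500))  # True: 262144 > 15000
    print(io_buffer_total_mb())                       # 256
```

Because of the rounding, asking for 300000 gives the same table as the default. A value just above a power of two buys nothing until it reaches the next one.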
## SEE ALSO
[Picard](https://broadinstitute.github.io/picard/picard-metric-definitions.html) metric
definitions for removing duplicates.
## BUGS
External sort is not implemented.
As a result, memory consumption grows by about 2 GB per 100M reads.
Check that you have enough RAM before running the tool.
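The check suggested in BUGS amounts to a rough linear estimate, which the following sketch illustrates. It takes the "about 2 GB per 100M reads" figure at face value. The function names are hypothetical, and actual memory use will vary with the data and the options above.

```python
GB_PER_100M_READS = 2.0  # figure quoted in the BUGS note; an approximation


def estimated_memory_gb(num_reads: int) -> float:
    """Rough RAM estimate for markdup, scaling linearly with read count."""
    return GB_PER_100M_READS * num_reads / 100_000_000


def fits_in_ram(num_reads: int, available_gb: float) -> bool:
    """True if the rough estimate does not exceed the RAM you expect to have free."""
    return estimated_memory_gb(num_reads) <= available_gb


if __name__ == "__main__":
    print(estimated_memory_gb(250_000_000))  # 5.0
    print(fits_in_ram(500_000_000, 8))       # False: about 10 GB needed
```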