File: sambamba.md

---
name: Sambamba
url: https://lomereiter.github.io/sambamba/
description: >
  Sambamba is a suite of programs written in the D Language for users to
  process high-throughput sequencing data.
---

[Sambamba](https://lomereiter.github.io/sambamba/) is a suite of programs for
users to quickly and efficiently process their high-throughput sequencing data.
It is functionally similar to Samtools, but the source code is written in the
D Language; it allows for faster performance while still being easy to use.

Supported commands:

- `markdup`

### markdup

This module parses key phrases in the output log files to count duplicate and
unique reads, then calculates the duplicate rate per sample. It works for
both single-end and paired-end data.
The absolute numbers of reads by type are displayed in a stacked bar plot,
and duplicate rates are shown in the general statistics table.
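
As a rough illustration of the key-phrase parsing described above, the sketch below pulls the four counts out of a log with regular expressions. The patterns, the function name `parse_markdup_log` and the sample log text are assumptions made for this example; they are not taken from the module's source or from verbatim Sambamba output, so check them against a real log before relying on them.

```python
import re

# Hypothetical patterns: the exact wording of the log lines is assumed here.
PATTERNS = {
    "sortedEndPairs": re.compile(r"sorted (\d+) end pairs"),
    "singleEnds": re.compile(r"and (\d+) single ends"),
    "singleUnmatchedPairs": re.compile(r"among them (\d+) unmatched pairs"),
    "duplicateReads": re.compile(r"found (\d+) duplicates"),
}


def parse_markdup_log(text):
    """Return a dict of the counts whose key phrase was found in the log text."""
    counts = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            counts[name] = int(match.group(1))
    return counts


# Illustrative log excerpt (made up for this example, not real tool output).
example_log = """\
  sorted 1000 end pairs
     and 50 single ends (among them 10 unmatched pairs)
  found 204 duplicates
"""

counts = parse_markdup_log(example_log)
print(counts)
```

Counts whose phrase is missing are simply left out of the result, so a caller can decide whether an incomplete log should be skipped.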

Duplicate rates are calculated as follows:

#### Paired end

> `duplicate_rate = duplicateReads / (sortedEndPairs * 2 + singleEnds - singleUnmatchedPairs) * 100`

#### Single end

> `duplicate_rate = duplicateReads / singleEnds * 100`
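
The two formulas above can be sketched as a single Python function. This is a minimal illustration, not the module's actual code: the function name `duplicate_rate`, the rule of treating a sample as paired-end whenever `sorted_end_pairs` is greater than zero, and the zero-denominator fallback are all assumptions made for the example.

```python
def duplicate_rate(duplicate_reads, sorted_end_pairs, single_ends,
                   single_unmatched_pairs):
    """Return the duplicate rate as a percentage, using the formulas above."""
    if sorted_end_pairs > 0:
        # Assumption: any sorted end pairs means the sample is paired-end.
        total = sorted_end_pairs * 2 + single_ends - single_unmatched_pairs
    else:
        # Single-end formula: only single ends contribute to the denominator.
        total = single_ends
    if total <= 0:
        # Assumption: report 0.0 rather than dividing by zero on an empty log.
        return 0.0
    return duplicate_reads / total * 100


# Paired-end: 204 / (1000 * 2 + 50 - 10) * 100
paired = round(duplicate_rate(204, 1000, 50, 10), 2)
# Single-end: 25 / 500 * 100
single = round(duplicate_rate(25, 0, 500, 0), 2)
print(paired, single)
```

With the made-up counts above, the paired-end denominator is 2040 reads and the single-end denominator is 500 reads.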

If Sambamba Markdup is invoked using Snakemake, the following bare-bones
rule should work. It redirects both stdout and stderr into the log file,
which is the file this module parses:

```python
rule markdup:
  input:
    "data/align/{sample}.bam"
  output:
    "data/markdup/{sample}.markdup.bam"
  log:
    "data/logs/{sample}.log"
  shell:
    "sambamba markdup {input} {output} > {log} 2>&1"
```