---
name: Sambamba
url: https://lomereiter.github.io/sambamba/
description: >
  Sambamba is a suite of programs written in the D Language for users to
  process high-throughput sequencing data.
---
[Sambamba](https://lomereiter.github.io/sambamba/) is a suite of programs for
users to quickly and efficiently process their high-throughput sequencing data.
It is functionally similar to Samtools, but the source code is written in the
D Language; it allows for faster performance while still being easy to use.
Supported commands:
- `markdup`
### markdup
This module parses key phrases in the output log files to find the numbers of
duplicate and unique reads, then calculates the duplicate rate per sample. It
works for both single-end and paired-end data.
The absolute numbers of reads of each type are displayed in a stacked bar plot,
and duplicate rates are shown in the general statistics table.
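
As a rough illustration of what parsing key phrases can look like, here is a minimal sketch. The regular expressions and the sample log text are assumptions made for this example, not taken from the module's source, so check them against the log your Sambamba version actually writes. The names `PATTERNS` and `parse_markdup_log` are hypothetical.

```python
import re

# Assumed key phrases; verify them against a real Sambamba markdup log.
PATTERNS = {
    "sorted_end_pairs": re.compile(r"sorted (\d+) end pairs"),
    "single_ends": re.compile(r"and (\d+) single ends"),
    "single_unmatched_pairs": re.compile(r"among them (\d+) unmatched pairs"),
    "duplicate_reads": re.compile(r"found (\d+) duplicates"),
}


def parse_markdup_log(text):
    """Return the counts found in the log text, keyed by a descriptive name."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            counts[key] = int(match.group(1))
    return counts


# Made-up log excerpt, for illustration only.
example_log = """\
  sorted 1000 end pairs
     and 50 single ends (among them 10 unmatched pairs)
  found 200 duplicates
"""

print(parse_markdup_log(example_log))
```

Keys whose phrase is missing from the log are left out of the result, so a caller can tell an incomplete log apart from a count of zero.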
Duplicate rates are calculated as follows:
#### Paired end
> `duplicate_rate = duplicateReads / (sortedEndPairs * 2 + singleEnds - singleUnmatchedPairs) * 100`
#### Single end
> `duplicate_rate = duplicateReads / singleEnds * 100`
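
The two formulas above can be sketched as a single Python function. This is an illustrative sketch rather than the module's actual code: the function name, the `paired` flag and the zero-total check are assumptions added for the example.

```python
def duplicate_rate(duplicate_reads, single_ends, sorted_end_pairs=0,
                   single_unmatched_pairs=0, paired=True):
    """Return the duplicate rate as a percentage, following the formulas above."""
    if paired:
        # Each sorted end pair contributes two reads.
        total = sorted_end_pairs * 2 + single_ends - single_unmatched_pairs
    else:
        total = single_ends
    if total == 0:
        # Assumption: treat an empty sample as an error rather than returning 0.
        raise ValueError("no reads to compute a duplicate rate from")
    return duplicate_reads / total * 100


# Paired end: 200 / (1000 * 2 + 50 - 10) * 100
paired_rate = duplicate_rate(200, 50, sorted_end_pairs=1000,
                             single_unmatched_pairs=10)

# Single end: 200 / 800 * 100
single_rate = duplicate_rate(200, 800, paired=False)

print(f"{paired_rate:.2f}% {single_rate:.2f}%")
```

The counts used here are made up. In practice they would be the values extracted from the Sambamba log for each sample.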
If Sambamba Markdup is invoked using Snakemake, the following bare-bones
rule should work fine:
```python
rule markdup:
    input:
        "data/align/{sample}.bam"
    output:
        "data/markdup/{sample}.markdup.bam"
    log:
        "data/logs/{sample}.log"
    shell:
        "sambamba markdup {input} {output} > {log} 2>&1"
```