# Simulator Scripts

This directory contains two scripts that can be used to interpret the output of `uncalled sim`.

We understand these scripts may not be the most user-friendly. We will work towards improving them and better integrating them with the simulator in the future.

## `est_genome_yield.py`

**Example:**
```
> sim_scripts/est_genome_yield.py -u uncalled_out.paf --enrich -x E.coli -m mm2.paf -s sequencing_summary.txt --sim-speed 0.25

unc_on_bp       150.678033
unc_total_bp    6094.559395
cnt_on_bp       33.145022
cnt_total_bp    8271.651331
```

This script is designed for enriching or depleting whole genomes or chromosomes.

Arguments:
- `-u/--uncalled-fname`: Simulator output PAF file
- `-s/--seq_sum`: Control sequencing summary
- `-m/--minimap-fname`: Minimap2 PAF file of the control reads aligned to a reference containing the target (or off-target, in the case of depletion) sequences
- `-x/--bwa-prefix`: BWA reference used during the simulation
- `--deplete/--enrich`: Same option that was used in the simulation
- `-t/--sim-speed`: Speed that the simulator was run at, in the range (0.0, 1.0]
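
The output shown in the example above is a two-column listing of names and values. The following is a minimal sketch of how it could be parsed and summarized in Python. It assumes, based only on the field names, that `unc` refers to the UNCALLED simulation, `cnt` to the control run, and `on` to on-target bases; the function names `parse_yield` and `summarize` are hypothetical and are not part of the scripts.

```python
# Sketch only: the interpretation of the field names is an assumption
# ("unc" = UNCALLED simulation, "cnt" = control, "on" = on-target).

def parse_yield(text):
    """Parse whitespace-separated 'name value' lines into a dict of floats."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines and anything that is not a name/value pair
        name, value = fields
        try:
            values[name] = float(value)
        except ValueError:
            continue  # second column is not numeric, so not a yield line
    return values


def summarize(values):
    """Compute on-target fractions and simulation-to-control ratios."""
    unc_frac = values["unc_on_bp"] / values["unc_total_bp"]
    cnt_frac = values["cnt_on_bp"] / values["cnt_total_bp"]
    return {
        "unc_on_frac": unc_frac,  # on-target fraction, simulation
        "cnt_on_frac": cnt_frac,  # on-target fraction, control
        "on_yield_ratio": values["unc_on_bp"] / values["cnt_on_bp"],
        "on_frac_ratio": unc_frac / cnt_frac,
    }


# Values copied from the example output above.
example = """\
unc_on_bp       150.678033
unc_total_bp    6094.559395
cnt_on_bp       33.145022
cnt_total_bp    8271.651331
"""

if __name__ == "__main__":
    for name, value in summarize(parse_yield(example)).items():
        print(f"{name}\t{value:.4f}")
```

In practice the script's standard output could be redirected to a file and its contents passed to `parse_yield`.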

## `est_bed_yield.py`

**Example:**
```
> sim_scripts/est_bed_yield.py -u uncalled_out.paf -c ctl_coverage.bed -s sequencing_summary.txt -t 0.25

unc_on_bp       150.678033
unc_total_bp    6094.559395
cnt_on_bp       33.145022
cnt_total_bp    8271.651331
```

This script is designed for targets that are subsequences of a larger reference, for example a set of genes. It requires `bedtools intersect -bed -a control_alns.bam -b targets.bed` to be run beforehand, where `control_alns.bam` contains minimap2 alignments of the basecalled control reads to the full reference, and `targets.bed` lists the targeted regions.

Arguments:
- `-u/--uncalled-fname`: Simulator output PAF file
- `-c/--cov-fname`: BED file of control read coverage. Should be the output of `bedtools intersect` run on the control read alignments and the target region(s)
- `-s/--seq-sum`: Control sequencing summary
- `-t/--sim-speed`: Speed that the simulator was run at
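
The `bedtools intersect` preprocessing step described above could be scripted as in the sketch below. It assumes `bedtools` is on the `PATH` and that its standard output is what gets saved as the `-c` coverage file (`ctl_coverage.bed` in the example). The helper names `bedtools_intersect_cmd` and `write_control_coverage` are hypothetical and not part of the scripts.

```python
import subprocess


def bedtools_intersect_cmd(bam_path, targets_bed):
    """Build the bedtools command given in the description above."""
    return ["bedtools", "intersect", "-bed", "-a", bam_path, "-b", targets_bed]


def write_control_coverage(bam_path, targets_bed, out_bed):
    """Run bedtools and save its BED output to out_bed.

    Assumption: the redirected output is the file passed to -c/--cov-fname.
    """
    with open(out_bed, "w") as out:
        # check=True raises CalledProcessError if bedtools exits with an error
        subprocess.run(bedtools_intersect_cmd(bam_path, targets_bed),
                       stdout=out, check=True)


if __name__ == "__main__":
    # File names taken from the example and description above.
    write_control_coverage("control_alns.bam", "targets.bed", "ctl_coverage.bed")
```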