File: pyvcflib.1

.\" Automatically generated by Pandoc 2.19.2
.\"
.\" Define V font for inline verbatim, using C font in formats
.\" that render this, and otherwise B font.
.ie "\f[CB]x\f[]"x" \{\
. ftr V B
. ftr VI BI
. ftr VB B
. ftr VBI BI
.\}
.el \{\
. ftr V CR
. ftr VI CI
. ftr VB CB
. ftr VBI CBI
.\}
.TH "PYVCFLIB" "1" "" "" ""
.hy
.SH VCFLib Python FFI
.PP
We are building a Python FFI for vcflib using the brilliant pybind11
module.
It is mostly for (our) testing purposes, so we are not aiming for
complete coverage.
See \[lq]Additional bindings\[rq] below for adding new bindings.
.SS Setting it up and example
.PP
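First import the module.
This may require setting \f[V]PYTHONPATH\f[R] to the directory that
contains the shared library, for example
\f[V]pyvcflib.cpython-39-x86_64-linux-gnu.so\f[R] (the exact file name
depends on the Python version and platform).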
.IP
.nf
\f[C]
env PYTHONPATH=./build python3 -c \[aq]import pyvcflib\[aq]
\f[R]
.fi
.PP
In a GNU Guix shell you may also need to prepend
\f[V]LD_LIBRARY_PATH\f[R] to the command so that GLIBC and other
libraries can be found.
.IP
.nf
\f[C]
LD_LIBRARY_PATH=$LIBRARY_PATH
\f[R]
.fi
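.PP
Before going further it can help to check that Python can actually
locate the module.
The following is a minimal sketch using only the standard library; the
helper name \f[V]module_available\f[R] is an illustrative assumption
and not part of vcflib.
.IP
.nf
\f[C]
```python
import importlib.util
import sys


def module_available(name, extra_dir=None):
    """Return True if the top-level module can be located."""
    # Hypothetical helper: optionally search extra_dir first,
    # mirroring what setting PYTHONPATH does for the interpreter.
    if extra_dir is not None and extra_dir not in sys.path:
        sys.path.insert(0, extra_dir)
    return importlib.util.find_spec(name) is not None


# Assumes the shared library was built into ./build as in the
# command above; prints False when it cannot be found.
print(module_available("pyvcflib", "./build"))
```
\f[R]
.fi
.PP
If this prints False, the directory given does not contain a module
that this Python interpreter can load.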
.PP
Now you should be able to use the \f[V]pyvcflib\f[R] module.
Let\[cq]s try it with the VCF file samples/10158243.vcf, which has only
one record:
.IP
.nf
\f[C]
>>> from pyvcflib import *

>>> vcf = VariantCallFile()
>>> vcf.openFile(\[dq]../samples/10158243.vcf\[dq])
True

# ...    list(rec.name,rec.pos,rec.ref,rec.alt)

>>> rec = Variant(vcf)
>>> while (vcf.getNextVariant(rec)):
\&...    [rec.name,rec.pos,rec.ref]
[\[aq]grch38#chr4\[aq], 10158243, \[aq]ACCCCCACCCCCACC\[aq]]

>>> rec.alt[0]
\[aq]ACC\[aq]

>>> rec.alleles
[\[aq]ACCCCCACCCCCACC\[aq], \[aq]ACC\[aq], \[aq]AC\[aq], \[aq]ACCCCCACCCCCAC\[aq], \[aq]ACCCCCACC\[aq], \[aq]ACA\[aq]]
\f[R]
.fi
.PP
So the one input record has a ref of `ACCCCCACCCCCACC' and five alt
alleles [`ACC', `AC', `ACCCCCACCCCCAC', `ACCCCCACC', `ACA'].
The \f[V]alleles\f[R] list holds the ref first, followed by the alts.
.PP
This works fine!
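.PP
The relation between \f[V]ref\f[R], \f[V]alt\f[R] and
\f[V]alleles\f[R] in the output above can be reproduced in plain
Python.
This is only a sketch that does not use the FFI; the helper name
\f[V]build_alleles\f[R] is hypothetical, and \f[V]alt\f[R] is taken to
be the \f[V]alleles\f[R] list without its first entry.
.IP
.nf
\f[C]
```python
def build_alleles(ref, alt):
    """Return the ref allele followed by the alt alleles, in order."""
    return [ref] + list(alt)


# Values taken from the example record above.
ref = "ACCCCCACCCCCACC"
alt = ["ACC", "AC", "ACCCCCACCCCCAC", "ACCCCCACC", "ACA"]

alleles = build_alleles(ref, alt)
print(len(alt))      # number of alt alleles
print(alleles[0])    # the ref allele comes first
```
\f[R]
.fi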
.SS Masking genotypes
.PP
With vcfwave\[cq]s allelic primitives, when two VCF records get
combined, we need to combine the genotypes.
This happens, for example, with vcfwave deletions.
.PP
When a deletion spans any SNPs produced by realignment, we want to make
sure those SNPs are masked on the haplotypes that carry the deletion.
That is:
.IP
.nf
\f[C]
6/28/2022, 4:21:44 PM - erikg:
if you have a deletion in hap 1
        .|0 or .|1
in hap 2
        0|. or 1|.
etc.
\f[R]
.fi
.PP
So, given two input variants at the same position, the first a DEL and
the second a SNP, the SNP genotypes need to be masked as follows:
.IP
.nf
\f[C]
>>> deletion_mask_genotypes([\[aq]1|0\[aq], \[aq]0|0\[aq], \[aq]0|1\[aq], \[aq]1|1\[aq], \[aq]1|0\[aq], \[aq]1|0\[aq], \[aq]1|1\[aq]],
\&...                         [\[aq]0|0\[aq], \[aq]1|1\[aq], \[aq]1|0\[aq], \[aq]0|0\[aq], \[aq]1|0\[aq], \[aq]1|1\[aq], \[aq]1|1\[aq]])
                            [\[aq]0|0\[aq], \[aq]1|1\[aq], \[aq]1|0\[aq], \[aq]0|0\[aq], \[aq].|0\[aq], \[aq].|1\[aq], \[aq].|.\[aq]]
\f[R]
.fi
.PP
In columns 5 to 7 the SNP has an alt allele on the same haplotype as
the deletion, so that allele is masked with a dot.
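.PP
The masking in the example can be sketched in plain Python.
The rule used here (mask an SNP allele only where both records carry
allele 1 on the same haplotype) is inferred from the example output
alone and is an assumption, not the vcflib implementation.
The function name \f[V]mask_snp_genotypes_sketch\f[R] is hypothetical,
and phased genotypes separated by | are assumed.
.IP
.nf
\f[C]
```python
def mask_snp_genotypes_sketch(del_gts, snp_gts):
    """Mask SNP alleles that share a haplotype with a deletion allele."""
    masked = []
    for del_gt, snp_gt in zip(del_gts, snp_gts):
        out = []
        # Walk the haplotypes of one sample side by side.
        for d, s in zip(del_gt.split("|"), snp_gt.split("|")):
            # Assumed rule: mask only when both records call allele 1 here.
            out.append("." if d == "1" and s == "1" else s)
        masked.append("|".join(out))
    return masked


# Inputs copied from the example above: deletion first, SNP second.
deletion = ["1|0", "0|0", "0|1", "1|1", "1|0", "1|0", "1|1"]
snp = ["0|0", "1|1", "1|0", "0|0", "1|0", "1|1", "1|1"]
print(mask_snp_genotypes_sketch(deletion, snp))
```
\f[R]
.fi
.PP
For these inputs the sketch yields the same seven genotypes as the
example, with only the last three changed.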
.SS Another example
.PP
See realign.py for examples of using the FFI in the form of a Python
unit test.
We used that to develop the vcfwave module.
.SS Additional bindings
.PP
You may want additional bindings that we have not included yet.
See vcflib\[cq]s pythonffi.cpp for the existing bindings and Variant.h
for the classes and accessors.