.\" Automatically generated by Pandoc 2.19.2
.\"
.\" Define V font for inline verbatim, using C font in formats
.\" that render this, and otherwise B font.
.ie "\f[CB]x\f[]"x" \{\
. ftr V B
. ftr VI BI
. ftr VB B
. ftr VBI BI
.\}
.el \{\
. ftr V CR
. ftr VI CI
. ftr VB CB
. ftr VBI CBI
.\}
.TH "VCFLIB-PYTHON-FFI" "" "" "" ""
.hy
.SH VCFLib Python FFI
.PP
We are building up a Python FFI for vcflib using the brilliant pybind11
library.
It is mostly for (our) testing purposes, so we are not aiming for
complete coverage.
See below for how to add new bindings.
.SS Setting it up and example
.PP
First import the module.
This may require setting \f[V]PYTHONPATH\f[R] to the directory that
contains the shared library
\f[V]pyvcflib.cpython-39-x86_64-linux-gnu.so\f[R] (here
\f[V]./build\f[R]):
.IP
.nf
\f[C]
env PYTHONPATH=./build python3 -c \[aq]import pyvcflib\[aq]
\f[R]
.fi
.PP
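In a GNU Guix shell you may also need to set \f[V]LD_LIBRARY_PATH\f[R]
in front of the command so that glibc and other shared libraries are
found:
.IP
.nf
\f[C]
env LD_LIBRARY_PATH=$LIBRARY_PATH PYTHONPATH=./build python3 -c \[aq]import pyvcflib\[aq]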
\f[R]
.fi
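.PP
The \f[V]cpython-39-x86_64-linux-gnu\f[R] part of the file name is the
extension suffix for CPython 3.9 on x86-64 Linux, so the module has to
be imported by a matching \f[V]python3\f[R].
The following is a minimal sketch, not part of vcflib: the helpers
\f[V]expected_module_filename\f[R] and \f[V]add_build_dir\f[R] are
hypothetical, and \f[V]./build\f[R] is assumed to be the build
directory.
Using only the standard library, it shows the file name the running
interpreter would look for and adds the build directory to
\f[V]sys.path\f[R] from within Python instead of setting
\f[V]PYTHONPATH\f[R].
.IP
.nf
\f[C]
import sys
import sysconfig
from pathlib import Path


def expected_module_filename(name="pyvcflib"):
    # EXT_SUFFIX is the suffix this interpreter uses for C extensions,
    # e.g. ".cpython-39-x86_64-linux-gnu.so" for CPython 3.9 on x86-64 Linux
    return name + sysconfig.get_config_var("EXT_SUFFIX")


def add_build_dir(build_dir="./build", name="pyvcflib"):
    # Hypothetical helper, similar in effect to PYTHONPATH=./build.
    # Returns True if a module built for this interpreter is present there.
    path = Path(build_dir).resolve()
    sys.path.insert(0, str(path))
    return (path / expected_module_filename(name)).is_file()


if add_build_dir():
    import pyvcflib  # only attempted when the matching .so file exists
else:
    print("not found:", expected_module_filename())
\f[R]
.fi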
.PP
Now you should be able to use the \f[V]pyvcflib\f[R] module.
Let\[cq]s try it on \f[V]samples/10158243.vcf\f[R], a VCF file with a
single record:
.IP
.nf
\f[C]
>>> from pyvcflib import *
>>> vcf = VariantCallFile()
>>> vcf.openFile(\[dq]../samples/10158243.vcf\[dq])
True
# getNextVariant fills rec in place and returns False when no records are left
>>> rec = Variant(vcf)
>>> while vcf.getNextVariant(rec):
\&...     [rec.name, rec.pos, rec.ref]
\&...
[\[aq]grch38#chr4\[aq], 10158243, \[aq]ACCCCCACCCCCACC\[aq]]
>>> rec.alt[0]
\[aq]ACC\[aq]
>>> rec.alleles
[\[aq]ACCCCCACCCCCACC\[aq], \[aq]ACC\[aq], \[aq]AC\[aq], \[aq]ACCCCCACCCCCAC\[aq], \[aq]ACCCCCACC\[aq], \[aq]ACA\[aq]]
\f[R]
.fi
.PP
So the single input record has the ref allele `ACCCCCACCCCCACC' and
five alt alleles [`ACC', `AC', `ACCCCCACCCCCAC', `ACCCCCACC', `ACA'];
\f[V]rec.alleles\f[R] lists the ref followed by the alts.
.PP
This works fine!
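.PP
Because \f[V]getNextVariant\f[R] refills the same \f[V]rec\f[R] object
on every call, it can help to copy out the fields you need as you go.
A minimal sketch of such a generator follows; \f[V]iter_records\f[R] is
a hypothetical helper, not part of pyvcflib, and it only assumes the
calls used in the session above (\f[V]getNextVariant\f[R] and the
\f[V]name\f[R], \f[V]pos\f[R], \f[V]ref\f[R] and \f[V]alt\f[R]
attributes).
.IP
.nf
\f[C]
def iter_records(vcf, make_variant):
    # Yield (name, pos, ref, alts) for every record in an opened VCF.
    # make_variant builds the reusable record object, e.g. pyvcflib.Variant.
    rec = make_variant(vcf)
    while vcf.getNextVariant(rec):
        # rec is overwritten by the next call, so take copies now
        yield (rec.name, rec.pos, rec.ref, list(rec.alt))


# Usage (assumes pyvcflib is importable, as set up above):
#   from pyvcflib import VariantCallFile, Variant
#   vcf = VariantCallFile()
#   vcf.openFile("../samples/10158243.vcf")
#   records = list(iter_records(vcf, Variant))
\f[R]
.fi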
.SS Masking genotypes
.PP
When vcfwave produces allelic primitives and two VCF records get
combined, we need to combine their genotypes as well.
This happens, for example, with deletions produced by vcfwave.
.PP
When a deletion spans SNPs produced by realignment, we want to make sure
those SNPs are masked on the haplotypes that carry the deletion.
As erikg put it:
.IP
.nf
\f[C]
6/28/2022, 4:21:44 PM - erikg:
if you have a deletion in hap 1
\&.|0 or .|1
in hap 2
0|. or 1|.
etc.
\f[R]
.fi
.PP
So, if the input has two variants at the same position, the first a
deletion and the second a SNP, the SNP genotypes need to be masked as
follows:
.IP
.nf
\f[C]
>>> deletion_mask_genotypes([\[aq]1|0\[aq], \[aq]0|0\[aq], \[aq]0|1\[aq], \[aq]1|1\[aq], \[aq]1|0\[aq], \[aq]1|0\[aq], \[aq]1|1\[aq]],
\&... [\[aq]0|0\[aq], \[aq]1|1\[aq], \[aq]1|0\[aq], \[aq]0|0\[aq], \[aq]1|0\[aq], \[aq]1|1\[aq], \[aq]1|1\[aq]])
[\[aq]0|0\[aq], \[aq]1|1\[aq], \[aq]1|0\[aq], \[aq]0|0\[aq], \[aq].|0\[aq], \[aq].|1\[aq], \[aq].|.\[aq]]
\f[R]
.fi
.PP
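In the fifth to seventh samples the deletion and the SNP both carry the
alt allele on the same haplotype, so that SNP haplotype is masked with
\f[V].\f[R]; in the first four samples they never share an alt
haplotype and the SNP genotypes are left unchanged.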
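.PP
The rule illustrated by this example can be sketched in a few lines of
Python.
This is a minimal sketch inferred from the example, not vcflib\[cq]s
implementation of \f[V]deletion_mask_genotypes\f[R]: the name
\f[V]mask_snp_genotypes\f[R] is hypothetical, it handles phased
(\f[V]|\f[R]-separated) genotypes only, and it treats any allele other
than \f[V]0\f[R] or \f[V].\f[R] as alt.
.IP
.nf
\f[C]
def mask_snp_genotypes(del_gts, snp_gts):
    # Mask a SNP haplotype with "." when the deletion carries an alt
    # allele on that same haplotype and the SNP does too.
    if len(del_gts) != len(snp_gts):
        raise ValueError("deletion and SNP need one genotype per sample")
    masked = []
    for del_gt, snp_gt in zip(del_gts, snp_gts):
        haps = []
        for d, s in zip(del_gt.split("|"), snp_gt.split("|")):
            both_alt = d not in ("0", ".") and s not in ("0", ".")
            haps.append("." if both_alt else s)
        masked.append("|".join(haps))
    return masked


# Same inputs as the deletion_mask_genotypes example; prints the same list
print(mask_snp_genotypes(
    ["1|0", "0|0", "0|1", "1|1", "1|0", "1|0", "1|1"],
    ["0|0", "1|1", "1|0", "0|0", "1|0", "1|1", "1|1"]))
\f[R]
.fi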
.SS Another example
.PP
See \f[V]realign.py\f[R] for an example of using the FFI in a Python
unit test.
We used it to develop the vcfwave module.
.SS Additional bindings
.PP
You may need bindings that we have not included yet.
See vcflib\[cq]s \f[V]pythonffi.cpp\f[R] for the existing bindings and
\f[V]Variant.h\f[R] for the classes and accessors.
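.PP
Before writing new bindings, it can be useful to check what the built
module already exposes.
A small sketch using only Python\[cq]s \f[V]dir()\f[R];
\f[V]public_names\f[R] is a hypothetical helper, not part of pyvcflib.
.IP
.nf
\f[C]
def public_names(obj):
    # Sorted names not starting with "_": roughly the classes, methods
    # and attributes bound on a module or class
    return sorted(n for n in dir(obj) if not n.startswith("_"))


# Usage (assumes pyvcflib is importable):
#   import pyvcflib
#   print(public_names(pyvcflib))
#   print(public_names(pyvcflib.Variant))
\f[R]
.fi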