1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348
|
.\" Automatically generated by Pandoc 2.19.2
.\"
.\" Define V font for inline verbatim, using C font in formats
.\" that render this, and otherwise B font.
.ie "\f[CB]x\f[]"x" \{\
. ftr V B
. ftr VI BI
. ftr VB B
. ftr VBI BI
.\}
.el \{\
. ftr V CR
. ftr VI CI
. ftr VB CB
. ftr VBI CBI
.\}
.TH "VCFALLELICPRIMITIVES" "1" "" "vcfallelicprimitives (vcflib)" "vcfallelicprimitives (VCF transformation)"
.hy
.SH NAME
.PP
vcfallelicprimitives - Converts stdin or given VCF file and reduces
alleles.
.SH SYNOPSIS
.PP
\f[B]vcfallelicprimitives\f[R]
.SH DESCRIPTION
.PP
\f[B]vcfallelicprimitives\f[R] converts stdin or given VCF file to
tab-delimited format,
.PP
Realign reference and alternate alleles with WFA, parsing out the
primitive alleles into multiple VCF records.
New records have IDs that reference the source record ID.
Genotypes are handled.
Deletion alleles will result in haploid (missing allele) genotypes.
.PP
Note that this tool is considered legacy and will emit a warning!
Use vcfwave instead.
.SS Options
.TP
-h, \[en]help
shows help message and exits.
.PP
See more below.
.SH EXIT VALUES
.TP
\f[B]0\f[R]
Success
.TP
\f[B]not 0\f[R]
Failure
.SH EXAMPLES
.IP
.nf
\f[C]
>>> head(\[dq]vcfallelicprimitives -h\[dq],29)
>
usage: ./vcfallelicprimitives [options] [file]
>
WARNING: this tool is considered legacy and is only retained for older
workflows. It will emit a warning! Even though it can use the WFA
you should use [vcfwave](./vcfwave.md) instead.
>
Realign reference and alternate alleles with SW or WF, parsing out
the primitive alleles into multiple VCF records. New records have IDs
that reference the source record ID. Genotypes are handled. Deletion
alleles will result in haploid (missing allele) genotypes.
>
options:
-a, --algorithm TYPE Choose algorithm SW (Smith-Waterman) or WF wavefront
(default: WF)
-m, --use-mnps Retain MNPs as separate events (default: false).
-t, --tag-parsed FLAG Annotate decomposed records with the source record
position (default: ORIGIN).
-L, --max-length LEN Do not manipulate records in which either the ALT or
REF is longer than LEN (default: unlimited).
-k, --keep-info Maintain site and allele-level annotations when
decomposing. Note that in many cases,
such as multisample VCFs, these won\[aq]t be
valid post decomposition. For biallelic
loci in single-sample VCFs, they should be
used with caution.
-d, --debug debug mode.
>
Type: transformation
\f[R]
.fi
.PP
vcfallelicprimitives picks complex regions and simplifies nested
alignments.
For example:
.IP
.nf
\f[C]
>>> sh(\[dq]grep 10158243 ../samples/10158243.vcf\[dq])
grch38#chr4 10158243 >3655>3662 ACCCCCACCCCCACC ACC,AC,ACCCCCACCCCCAC,ACCCCCACC,ACA 60 . AC=64,3,2,3,1;AF=0.719101,0.0337079,0.0224719,0.0337079,0.011236;AN=89;AT=>3655>3656>3657>3658>3659>3660>3662,>3655>3656>3660>3662,>3655>3660>3662,>3655>3656>3657>3658>3660>3662,>3655>3656>3657>3660>3662,>3655>3656>3661>3662;NS=45;LV=0 GT 0|0 1|1 1|1 1|0 5|1 0|4 0|1 0|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 4|3 1|1 1|1 1|1 1|0 1|0 1|0 1|0 1|1 1|1 1|4 1|1 1|1 3|0 1|0 1|1 0|1 1|1 1|1 2|1 1|2 1|1 1|1 0|1 1|1 1|1 1|0 1|2 1|1 0
\f[R]
.fi
.PP
After aligning it reduces into two records with variant alleles
.IP
.nf
\f[C]
10158243:ACCCCCA/A
10158243:ACCCCCACCCC/A
10158243:ACCCCCACCCCCA/A
10158243:ACCCCCACCCCCAC/A
10158255:AC/A
10158255:ACC/A
\f[R]
.fi
.PP
and adjusts the genotypes accordingly splitting into two records using
the original (but arguably OBSOLETE) SW algorithm:
.IP
.nf
\f[C]
>>> sh(\[dq]../build/vcfallelicprimitives -a SW -m -L 1000 ../samples/10158243.vcf|grep -v \[ha]\[rs]#\[dq])
grch38#chr4 10158243 >3655>3662_1 ACCCCCACCCCCAC ACCCCCAC,ACAC,AC,A 60 . AC=3,1,64,3;AF=0.0337079,0.011236,0.719101,0.0337079;LEN=6,10,12,13;ORIGIN=grch38#chr4:10158243,grch38#chr4:10158243,grch38#chr4:10158243,grch38#chr4:10158243;TYPE=del,del,del,del GT 0|0 3|3 3|3 3|0 2|3 0|1 0|3 0|3 3|3 3|3 3|3 3|3 3|3 3|3 3|3 1|0 3|3 3|3 3|3 3|0 3|0 3|0 3|0 3|3 3|3 3|1 3|3 3|3 0|0 3|0 3|3 0|3 3|3 3|3 4|3 3|4 3|3 3|3 0|3 3|3 3|3 3|0 3|4 3|3 0
grch38#chr4 10158255 >3655>3662_2 ACC AC,A 60 . AC=2,1;AF=0.0224719,0.011236;LEN=1,2;ORIGIN=grch38#chr4:10158243,grch38#chr4:10158243;TYPE=del,del GT 0|0 .|. .|. .|0 2|. 0|0 0|. 0|. .|. .|. .|. .|. .|. .|. .|. 0|1 .|. .|. .|. .|0 .|0 .|0 .|0 .|. .|. .|0 .|. .|. 1|0 .|0 .|. 0|. .|. .|. .|. .|. .|. .|. 0|. .|. .|. .|0 .|. .|. 0
\f[R]
.fi
.PP
With the default wavefront algorithm we get a different result
.IP
.nf
\f[C]
1$10158244:CCCCCACCCCCAC/C
2$10158244:CCCCCACCCCCACC/C
3$10158245:CCCCACCCCCACC/C
4$10158251:CCCCACC/C
5$10158256:CC/C
\f[R]
.fi
.IP
.nf
\f[C]
>>> sh(\[dq]../build/vcfallelicprimitives -m -L 1000 ../samples/10158243.vcf|grep -v \[ha]\[rs]#\[dq])
grch38#chr4 10158244 >3655>3662_1 CCCCCACCCCCACC CC,C 60 . AC=1,3;AF=0.011236,0.0337079;LEN=12,13;ORIGIN=grch38#chr4:10158243,grch38#chr4:10158243;TYPE=del,del GT 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 2|0 0|2 0|0 0|0 0|0 0|0 0|0 0|0 0|2 0|0 0
grch38#chr4 10158245 >3655>3662_2 CCCCACCCCCACC C 60 . AC=64;AF=0.719101;LEN=12;ORIGIN=grch38#chr4:10158243;TYPE=del GT 0|0 1|1 1|1 1|0 .|1 0|0 0|1 0|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 0|0 1|1 1|1 1|1 1|0 1|0 1|0 1|0 1|1 1|1 1|0 1|1 1|1 0|0 1|0 1|1 0|1 1|1 1|1 .|1 1|. 1|1 1|1 0|1 1|1 1|1 1|0 1|. 1|1 0
grch38#chr4 10158251 >3655>3662_3 CCCCACC C 60 . AC=3;AF=0.0337079;LEN=6;ORIGIN=grch38#chr4:10158243;TYPE=del GT 0|0 .|. .|. .|0 .|. 0|1 0|. 0|. .|. .|. .|. .|. .|. .|. .|. 1|0 .|. .|. .|. .|0 .|0 .|0 .|0 .|. .|. .|1 .|. .|. 0|0 .|0 .|. 0|. .|. .|. .|. .|. .|. .|. 0|. .|. .|. .|0 .|. .|. 0
grch38#chr4 10158256 >3655>3662_4 CC C 60 . AC=2;AF=0.0224719;LEN=1;ORIGIN=grch38#chr4:10158243;TYPE=del GT 0|0 .|. .|. .|0 .|. 0|. 0|. 0|. .|. .|. .|. .|. .|. .|. .|. .|1 .|. .|. .|. .|0 .|0 .|0 .|0 .|. .|. .|. .|. .|. 1|0 .|0 .|. 0|. .|. .|. .|. .|. .|. .|. 0|. .|. .|. .|0 .|. .|. 0
grch38#chr4 10158257 >3655>3662_5 C A 60 . AC=1;AF=0.011236;LEN=1;ORIGIN=grch38#chr4:10158243;TYPE=snp GT 0|0 .|. .|. .|0 .|. 0|. 0|. 0|. .|. .|. .|. .|. .|. .|. .|. .|. .|. .|. .|. .|0 .|0 .|0 .|0 .|. .|. .|. .|. .|. .|0 .|0 .|. 0|. .|. .|. .|. .|. .|. .|. 0|. .|. .|. .|0 .|. .|. 0
\f[R]
.fi
.SH ./vcfallelicprimitives -a SW -m -L 1000 ../samples/grch38#chr8_36353854-36453166.vcf > ../test/data/regression/vcfallelicprimitives_5.vcf
.RS
.RS
.RS
.PP
run_stdout(\[lq]vcfallelicprimitives -a SW -m -L 1000
\&../samples/grch38#chr8_36353854-36453166.vcf\[rq], ext=\[lq]vcf\[rq])
output in vcfallelicprimitives_5.vcf
.RE
.RE
.RE
.SH ./vcfallelicprimitives -a SW -m -L 1000 ../samples/grch38#chr4_10083863-10181258.vcf > ../test/data/regression/vcfallelicprimitives_6.vcf
.PP
/regression/vcfallelicprimitives_6.vcf >>>
run_stdout(\[lq]vcfallelicprimitives -a SW -m -L 1000
\&../samples/grch38#chr4_10083863-10181258.vcf\[rq], ext=\[lq]vcf\[rq])
output in vcfallelicprimitives_6.vcf
.SH ./vcfallelicprimitives -L 10000 -m ../samples/grch38#chr8_36353854-36453166.vcf > ../test/data/regression/vcfallelicprimitives_7.vcf
.RS
.RS
.RS
.PP
run_stdout(\[lq]vcfallelicprimitives -L 10000 -m
\&../samples/grch38#chr8_36353854-36453166.vcf\[rq], ext=\[lq]vcf\[rq])
output in vcfallelicprimitives_7.vcf
.RE
.RE
.RE
.SH ./vcfallelicprimitives -m ../samples/grch38#chr4_10083863-10181258.vcf > ../test/data/regression/vcfallelicprimitives_8.vcf
.RS
.RS
.RS
.PP
run_stdout(\[lq]vcfallelicprimitives -m
\&../samples/grch38#chr4_10083863-10181258.vcf\[rq], ext=\[lq]vcf\[rq])
output in vcfallelicprimitives_8.vcf
.RE
.RE
.RE
.IP
.nf
\f[C]
Another diff example where the first is SW and the second WFA2 showing:
\[ga]\[ga]\[ga]python
>>> sh(\[dq]diff data/regression/vcfallelicprimitives_6.vcf data/regression/vcfallelicprimitives_8.vcf|tail -6\[dq])
1670c1680,1682
< grch38#chr4 10180508 >4593>4597_1 CTT CTTT,CT,C 60 . AC=7,47,1;AF=0.0786517,0.52809,0.011236;LEN=1,1,2;ORIGIN=grch38#chr4:10180508,grch38#chr4:10180508,grch38#chr4:10180508;TYPE=ins,del,del GT 2|0 0|2 2|2 2|0 2|2 0|0 0|2 0|2 2|2 2|2 2|0 2|0 0|2 2|0 2|2 2|0 2|2 2|2 2|2 2|0 0|1 0|0 2|1 2|2 0|2 2|2 2|0 0|2 0|3 2|1 0|2 0|0 2|0 1|2 2|2 0|1 2|2 0|0 0|0 1|0 0|1 2|0 0|0 2|2 2
---
> grch38#chr4 10180508 >4593>4597_1 CTT C 60 . AC=1;AF=0.011236;LEN=2;ORIGIN=grch38#chr4:10180508;TYPE=del GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0
> grch38#chr4 10180509 >4593>4597_2 TT T 60 . AC=47;AF=0.52809;LEN=1;ORIGIN=grch38#chr4:10180508;TYPE=del GT 1|0 0|1 1|1 1|0 1|1 0|0 0|1 0|1 1|1 1|1 1|0 1|0 0|1 1|0 1|1 1|0 1|1 1|1 1|1 1|0 0|0 0|0 1|0 1|1 0|1 1|1 1|0 0|1 0|. 1|0 0|1 0|0 1|0 0|1 1|1 0|0 1|1 0|0 0|0 0|0 0|0 1|0 0|0 1|1 1
> grch38#chr4 10180510 >4593>4597_3 T TT 60 . AC=7;AF=0.0786517;LEN=1;ORIGIN=grch38#chr4:10180508;TYPE=ins GT .|0 0|. .|. .|0 .|. 0|0 0|. 0|. .|. .|. .|0 .|0 0|. .|0 .|. .|0 .|. .|. .|. .|0 0|1 0|0 .|1 .|. 0|. .|. .|0 0|. 0|. .|1 0|. 0|0 .|0 1|. .|. 0|1 .|. 0|0 0|0 1|0 0|1 .|0 0|0 .|. .
\f[R]
.fi
.PP
shows how WFA2 is doing a better job at taking things apart.
.PP
Even so, this record is wrong.
From grch38#chr4_10083863-10181258.vcf
.PP
Differences between WFA and biWFA:
.IP
.nf
\f[C]
wdiff vcfallelicprimitives_8.vcf vcfwave_5.vcf
grch38#chr4 10134337 >2103>2106_1 [-TTTTG AGGCA-] {+TTTTGGTGTACTGCCT AGGCAGTACACCAAAA+}
grch38#chr4 10134492 >2125>2211_3 [-TTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTG GTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTG-] {+TTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAG GTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAGCATCCCAAGTGATGTAG+}
grch38#chr4 10134498
grch38#chr4 [-10134501 >2125>2211_5 AATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATT CATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATT-] {+10134500 >2125>2211_6 GAATCCCAATTGATGGAG G+} 60 . [-AC=1;AF=0.011236;INV=0,0;LEN=17;ORIGIN=grch38#chr4:10134484;TYPE=complex-] {+AC=1;AF=0.011236;INV=0;LEN=17;ORIGIN=grch38#chr4:10134484;TYPE=del+}
GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|. 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
0|0 0|1 0|0 0|0 0|0 0
grch38#chr4 {+10134518 >2125>2211_7 AATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATTGATGGAGAATCCCAATT CATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATTGATGGAGCATCCCAATT 60 . AC=1;AF=0.011236;INV=0;LEN=112;ORIGIN=grch38#chr4:10134484;TYPE=mnp GT 0|0 0|0 0|0 0|0 0|0 0|0
0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|. 0|0 0|0 0|0 0|0 0|0
0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|. 0|0 0|0 0|0 0
\f[R]
.fi
.PP
Another problem case fixed with fb7365b07f832bcfcc6ddb693ddfc4a01a25cfbb
.IP
.nf
\f[C]
grch38#chr8_36353854-36453166.vcf:grch38#chr8 36382847 >721>726 GT GC,AC
SW
vcfallelicprimitives_5.vcf:grch38#chr8 36382847 >721>726_1 GT AC
vcfallelicprimitives_5.vcf:grch38#chr8 36382848 >721>726_2 T C
WF
vcfwave_4.vcf:grch38#chr8 36382847 >721>726_1 G A
\f[R]
.fi
.PP
Let\[cq]s look at some longer sequences:
.PP
The original
.IP
.nf
\f[C]
grch38#chr4_10083863-10181258.vcf:grch38#chr4 10134514 >2136>2148 GGAGAATCCCAATTGATGG GTAGCATCCCAAGTGATGT,GTAGAATCCCAATTGATGT,GGAGCATCCCAATTGATGG,GG 60 . AC=11,7,1,3;AF=0.125,0.0795455,0.0113636,0.0340909;AN=88;AT=>2136>2138>2139>2141>2142>2144>2145>2147>2148,>2136>2137>2139>2140>2142>2143>2145>2146>2148,>2136>2137>2139>2141>2142>2144>2145>2146>2148,>2136>2138>2139>2140>2142>2144>2145>2147>2148,>2136>2138>2148;NS=45;LV=1;PS=>2125>2211 GT 0|1 1|0 0|0 0|1 0|0 1|0 1|01|0 0|0 0|0 0|0 0|0 0|0 0|. 0|0 2|2 0|0 4|0 0|0 0|1 0|10|1 0|2 0|0 4|0 0|2 0|0 0|0 2|0 0|0 0|0 0|0 0|0 0|0 2|04|1 0|0 0|0 0|0 0|0 0|3 0|0 0|2 0|0 1
# translates to SW
vcfallelicprimitives_6.vcf:grch38#chr4 10134514 >2136>2148_1 GGAGAATCCCAATTGATG G 60.AC=3;AF=0.0340909;LEN=17;ORIGIN=grch38#chr4:10134514;TYPE=del GT 0|0 0|0 0|0 0|0 0|00|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|. 0|0 0|0 0|0 1|0 0|00|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|00|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0
vcfallelicprimitives_6.vcf:grch38#chr4 10134515 >2136>2148_2 GAGAATCCCAATTGATGG TAGAATCCCAATTGATGT,TAGCATCCCAAGTGATGT 60 . AC=7,11;AF=0.0795455,0.125;LEN=18,18;ORIGIN=grch38#chr4:10134514,grch38#chr4:10134514;TYPE=mnp,mnp GT 0|2 2|0 0|0 0|2 0|0 2|0 2|0 2|00|0 0|0 0|0 0|0 0|0 0|. 0|0 1|1 0|0 .|0 0|0 0|2 0|2 0|20|1 0|0 .|0 0|1 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 1|0 .|20|0 0|0 0|0 0|0 0|0 0|0 0|1 0|0 2
vcfallelicprimitives_6.vcf:grch38#chr4 10134518 >2136>2148_3 AATCCCAATTGATGG CATCCCAATTGATGG 60.AC=1;AF=0.0113636;LEN=15;ORIGIN=grch38#chr4:10134514;TYPE=mnp GT 0|0 0|0 0|0 0|0 0|00|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|. 0|0 0|0 0|0 .|0 0|00|0 0|0 0|0 0|0 0|0 .|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|00|0 0|0 .|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0
# and WFA
vcfallelicprimitives_8.vcf:grch38#chr4 10134515 >2136>2148_1 GAGAATCCCAATTGATGG TAGCATCCCAAGTGATGG,TAGAATCCCAATTGATGG,G 60 . AC=11,7,3;AF=0.125,0.0795455,0.0340909;LEN=14,16,17;ORIGIN=grch38#chr4:10134514,grch38#chr4:10134514,grch38#chr4:10134514;TYPE=mnp,mnp,del GT 0|1 1|0 0|00|1 0|0 1|0 1|0 1|0 0|0 0|0 0|0 0|0 0|0 0|. 0|0 2|2 0|03|0 0|0 0|1 0|1 0|1 0|2 0|0 3|0 0|2 0|0 0|0 2|0 0|0 0|00|0 0|0 0|0 2|0 3|1 0|0 0|0 0|0 0|0 0|0 0|0 0|2 0|0 1
vcfallelicprimitives_8.vcf:grch38#chr4 10134518 >2136>2148_2 AATCCCAATTGATG CATCCCAATTGATG 60.AC=1;AF=0.0113636;LEN=14;ORIGIN=grch38#chr4:10134514;TYPE=mnp GT 0|0 0|0 0|0 0|0 0|00|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|. 0|0 0|0 0|0 .|0 0|00|0 0|0 0|0 0|0 0|0 .|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|00|0 0|0 .|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0
\f[R]
.fi
.PP
To describe the bug we get for grch38#chr4 10134514 (or: original, sw:,
wf:).
It was fixed with commit fb7365b07f832bcfcc6ddb693ddfc4a01a25cfbb.
.IP
.nf
\f[C]
ori: GGAGAATCCCAATTGATGG->GG
sw: GGAGAATCCCAATTGATG ->G
wf: GAGAATCCCAATTGATGG->G
fix: GAGAATCCCAATTGATGG->G
ori: GGAGAATCCCAATTGATGG->GTAGCATCCCAAGTGATGT
sw: GAGAATCCCAATTGATGG-> TAGAATCCCAATTGATGT
wf: GAGAATCCCAATTGATGG-> TAGAATCCCAATTGATGG <-
fix: GAGAATCCCAATTGATGG-> TAGAATCCCAATTGATGT
ori: GGAGAATCCCAATTGATGG ->GTAGAATCCCAATTGATGT
sw: GAGAATCCCAATTGATGG -> TAGCATCCCAAGTGATGT
wf: GAGAATCCCAATTGATGG -> TAGCATCCCAAGTGATGG <-
fix: GAGAATCCCAATTGATGG-> TAGCATCCCAAGTGATGT
GAGAATCCCAATTGATGG-> TAGAATCCCAATTGATGG
ori: GGAGAATCCCAATTGATGG->GGAGCATCCCAATTGATGG
sw: AATCCCAATTGATGG-> CATCCCAATTGATGG
wf: AATCCCAATTGATG -> CATCCCAATTGATG
fix: AATCCCAATTGATGG-> CATCCCAATTGATGG
A C
\f[R]
.fi
.SH vcfwave issue
.PP
Now where does the result TAGAATCCCAATTGATGG come from?
.IP
.nf
\f[C]
Original input record (see samples/10134514.vcf)
10134514 GGAGAATCCCAATTGATGG GTAGCATCCCAAGTGATGT,GTAGAATCCCAATTGATGT,GGAGCATCCCAATTGATGG,GG
WF CIGARs:
10134514:1M1X2M1X7M1X5M1X:GGAGAATCCCAATTGATGG,GTAGCATCCCAAGTGATGT
10134514:1M1X16M1X:GGAGAATCCCAATTGATGG,GTAGAATCCCAATTGATGT
10134514:4M1X14M:GGAGAATCCCAATTGATGG,GGAGCATCCCAATTGATGG
10134514:2M17D:GGAGAATCCCAATTGATGG,GG
Decomposed alleles (return from parsedAlternates):
GGAGAATCCCAATTGATGG 10134514 GGAGAATCCCAATTGATGG -> GGAGAATCCCAATTGATGG
GGAGCATCCCAATTGATGG 10134514 GGAG -> GGAG
GGAGCATCCCAATTGATGG 10134518 A -> C
GGAGCATCCCAATTGATGG 10134519 ATCCCAATTGATGG -> ATCCCAATTGATGG
GTAGAATCCCAATTGATGT 10134514 G -> G
GTAGAATCCCAATTGATGT 10134515 G -> T
GTAGAATCCCAATTGATGT 10134516 AGAATCCCAATTGATG -> AGAATCCCAATTGATG
GTAGAATCCCAATTGATGT 10134532 G -> T
GTAGCATCCCAAGTGATGT 10134514 G -> G
GTAGCATCCCAAGTGATGT 10134515 G -> T
GTAGCATCCCAAGTGATGT 10134516 AG -> AG
GTAGCATCCCAAGTGATGT 10134518 A -> C
GTAGCATCCCAAGTGATGT 10134519 ATCCCAA -> ATCCCAA
GTAGCATCCCAAGTGATGT 10134526 T -> G
GTAGCATCCCAAGTGATGT 10134527 TGATG -> TGATG
GTAGCATCCCAAGTGATGT 10134532 G -> T
Final result (see test/data/regression/vcfwave_5.vcf):
10134515 GAGAATCCCAATTGATGG TAGAATCCCAATTGATGG,G
10134518 A C
10134526 T G
10134532 G T
\f[R]
.fi
.PP
Now where does TAGAATCCCAATTGATGG come from?
.PP
Output produced by test/tests/realign.py
.SH LICENSE
.PP
Copyright 2011-2024 (C) Erik Garrison, Pjotr Prins and vcflib
contributors.
MIT licensed.
.SH AUTHORS
Erik Garrison, Pjotr Prins and other vcflib contributors.
|